{"id":12147,"date":"2023-05-23T23:18:12","date_gmt":"2023-05-23T15:18:12","guid":{"rendered":"https:\/\/www.tejwin.com\/?post_type=insight&#038;p=12147"},"modified":"2026-02-25T17:46:23","modified_gmt":"2026-02-25T09:46:23","slug":"comparison-of-the-funds-similarity","status":"publish","type":"insight","link":"https:\/\/www.tejwin.com\/en\/insight\/comparison-of-the-funds-similarity\/","title":{"rendered":"Comparison of the fund\u2019s similarity"},"content":{"rendered":"\n<figure class=\"wp-block-image caption-align-center\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/0ODl-ISAo8ZDdgn_g.jpg\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Photo by <a href=\"https:\/\/unsplash.com\/@sharonmccutcheon?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noreferrer noopener\">Alexander Grey<\/a> on&nbsp;<a href=\"https:\/\/unsplash.com?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"ek-link\">Unsplash<\/a><\/figcaption><\/figure>\n\n\n\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_81 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<label for=\"ez-toc-cssicon-toggle-item-69f7787b4963f\" class=\"ez-toc-cssicon-toggle-label\"><span class=\"ez-toc-cssicon\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path 
d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/label><input type=\"checkbox\"  id=\"ez-toc-cssicon-toggle-item-69f7787b4963f\"  aria-label=\"Toggle\" \/><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.tejwin.com\/en\/insight\/comparison-of-the-funds-similarity\/#Highlight\" >Highlight<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.tejwin.com\/en\/insight\/comparison-of-the-funds-similarity\/#Preface\" >Preface<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.tejwin.com\/en\/insight\/comparison-of-the-funds-similarity\/#Programming_environment_and_Module_required\" >Programming environment and Module required<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.tejwin.com\/en\/insight\/comparison-of-the-funds-similarity\/#Database\" >Database<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.tejwin.com\/en\/insight\/comparison-of-the-funds-similarity\/#Data_Import\" >Data Import<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.tejwin.com\/en\/insight\/comparison-of-the-funds-similarity\/#Prepossessing\" >Prepossessing<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.tejwin.com\/en\/insight\/comparison-of-the-funds-similarity\/#Radar_Chart\" >Radar Chart<\/a><\/li><li class='ez-toc-page-1 
ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/www.tejwin.com\/en\/insight\/comparison-of-the-funds-similarity\/#Euclidean_Distance\" >Euclidean Distance<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/www.tejwin.com\/en\/insight\/comparison-of-the-funds-similarity\/#Cosine_Similarity\" >Cosine Similarity<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/www.tejwin.com\/en\/insight\/comparison-of-the-funds-similarity\/#Conclusion\" >Conclusion<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/www.tejwin.com\/en\/insight\/comparison-of-the-funds-similarity\/#Source_Code\" >Source Code<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/www.tejwin.com\/en\/insight\/comparison-of-the-funds-similarity\/#Extended_Reading\" >Extended Reading<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/www.tejwin.com\/en\/insight\/comparison-of-the-funds-similarity\/#Related_Link\" >Related Link<\/a><\/li><\/ul><\/nav><\/div>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Highlight\"><\/span>Highlight<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Difficulty\uff1a\u2605\u2605\u2605\u2606\u2606<\/li>\n\n\n\n<li>Using the data of the fund\u2019s basic information to implement similarity analysis.<\/li>\n\n\n\n<li>Advice\uff1aThe primary focal point for today\u2019s article is to codify the Euclidean Distance and the Cosine Similarity via Python. Detailed introductions for formulas and attributes are not included in this article. 
We therefore suggest reviewing Euclidean Distance and Cosine Similarity before reading this article.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Preface\"><\/span>Preface<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Investment always comes with risk, but not every risk is unavoidable. By diversifying across different asset types, investors can manage risk efficiently and reduce the influence of market volatility on their portfolios. Today\u2019s article discusses how to use data to compare the similarity of funds from a scientific point of view.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Programming_environment_and_Module_required\"><\/span>Programming environment and Module required<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>This article uses macOS as the operating system and Jupyter as the editor.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code># Load the required packages\nimport pandas as pd\nimport re\nimport numpy as np\nimport tejapi\nimport plotly.graph_objects as go\nimport random\nimport seaborn as sns\nimport math\n\n# Log in to the TEJ API\napi_key = 'YOUR_KEY'\ntejapi.ApiConfig.api_key = api_key\ntejapi.ApiConfig.ignoretz = True<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Database\"><\/span>Database<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/api.tej.com.tw\/columns.html?idCode=TWN\/AATT\" target=\"_blank\" rel=\"noreferrer noopener\">Basic Information of Funds (TWN\/AATT)<\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Data_Import\"><\/span>Data Import<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>fund = tejapi.get('TWN\/AATT',\n                   paginate = True,\n                   opts 
= {\n                       'columns':['coid', 'mdate', 'isin',\n                                  'fld006_c', 'fld007_c', \n                                  'fld014_c','fld015',\n                                  'fld016_c','un_name_c',\n                                  'risk', 'main_flag', \n                                  'fld021', 'currency', 'aunit1',\n                                 ]\n                   }\n                  )<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter is-resized\"><img fetchpriority=\"high\" decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1nvEGQsxDV8t6sBgLhEm4Hw.png\" alt=\"\u57fa\u91d1\u8cc7\u6599\" width=\"782\" height=\"408\"\/><figcaption class=\"wp-element-caption\">preview of the dataframe<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Prepossessing\"><\/span>Preprocessing<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"b3e4\">In this article, we will use the following columns.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>isin :&nbsp;<\/strong>ISIN Code<\/li>\n\n\n\n<li><strong>fld014_c :&nbsp;<\/strong>Type (open-end fund\/closed-end fund)<\/li>\n\n\n\n<li><strong>fld015 :&nbsp;<\/strong>Type (fundraising place and investment area)<\/li>\n\n\n\n<li><strong>fld016_c :&nbsp;<\/strong>Investment target<\/li>\n\n\n\n<li><strong>risk :&nbsp;<\/strong>Risk Return<\/li>\n\n\n\n<li><strong>fld021<\/strong>&nbsp;: Initial asset size (thousand NTD)<\/li>\n\n\n\n<li><strong>currency :&nbsp;<\/strong>Currency<\/li>\n<\/ul>\n\n\n\n<p id=\"0c76\">Because most of these columns hold strings, we must encode them as ordinal numbers. 
Typically, an ordinal-encoding package would do; however, we want the encoded order to preserve the relative risk of each type, so in this part we define our own mappings to ensure that this information is not lost.<\/p>\n\n\n\n<p id=\"8f55\">To find out how many ordinal numbers are needed, we print all unique investment targets in our data frame.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>fund[\"fld016_c\"].unique()<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter is-resized\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1ZP9hQIK17olhzpyS4nkJIA.png\" alt=\"\u57fa\u91d1\u6295\u8cc7\u6a19\u7684\" width=\"840\" height=\"65\"\/><figcaption class=\"wp-element-caption\">types of investment target<\/figcaption><\/figure>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>style = {\n    '':0,\n    '\u4fdd\u672c\u578b': 1,\n    '\u8ca8\u5e63\u578b': 2,\n    '\u50b5\u5238\u578b': 3,\n    '\u5e73\u8861\u578b': 4,\n    'ETF': 5,\n    '\u6307\u6578\u578b\u57fa\u91d1': 6,\n    '\u57fa\u91d1': 7,\n    '\u591a\u91cd\u8cc7\u7522': 8,\n    '\u80a1\u7968\u578b': 9,\n    '\u623f\u5730\u7522': 10,\n    '\u7522\u8b49\u5238\u5316': 11,\n    '\u4e0d\u52d5\u7522\u8b49\u5238\u5316': 12,\n    '\u79d1\u6280\u80a1': 13,\n    '\u5c0f\u578b\u80a1\u8cc7': 14,\n    '\u80fd\u6e90\u80a1\u7968': 15,\n    '\u671f\u8ca8\u5546\u54c1': 16,\n}\n\nrisk = {\n    \"\":0,\n    \"RR1\":1,\n    \"RR2\":2,\n    \"RR3\":3,\n    \"RR4\":4,\n    \"RR5\":5,\n}\n\narea = {\n    \"\u570b\u5167\u52df\u96c6,\u6295\u8cc7\u570b\u5167\":1,\n    \"\u570b\u5167\u52df\u96c6,\u6295\u8cc7\u570b\u5167\u5916\":2,\n    \"\u570b\u5916\u52df\u96c6,\u6295\u8cc7\u570b\u5167\":3,   \n}\n\nOorC = {\n    \"\u5c01\u9589\":0,\n    \"\u958b\u653e\":1,\n}\n\n# convert string data to ordinal-encoded data\nfund_adj = fund.copy()\nfund_adj[\"fld015\"] = fund_adj[\"fld015\"].apply(lambda x: 
area.get(x))\nfund_adj[\"fld016_c\"] = fund_adj[\"fld016_c\"].apply(lambda x: style.get(x))\nfund_adj[\"risk\"] = fund_adj[\"risk\"].apply(lambda x: risk.get(x))\nfund_adj[\"fld014_c\"] = fund_adj[\"fld014_c\"].apply(lambda x: OorC.get(x))<\/code><\/pre>\n\n\n\n<p>Next, because we want to visualize the similarity of funds in a chart, we rescale \u201cInitial asset size\u201d with min-max normalization. After normalization, the size ranges from 0 to 17, the number of entries in the style mapping (whose ordinal codes run from 0 to 16), so it sits on roughly the same scale as the encoded columns. \u201cInitial asset size\u201d now no longer carries an actual monetary value; it only reflects a relative ratio.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code># min-max normalization\nsize = np.array(fund_adj[\"fld021\"].fillna(0))\nsize = (size - size.min()) \/ (size.max() - size.min())*len(style)\nfund_adj[\"fld021\"] = size<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1xhTosAFPk3GEbskT-ctMvA.png\" alt=\"\u57fa\u91d1\u8cc7\u6599\"\/><figcaption class=\"wp-element-caption\">preview of the dataframe after preprocessing<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Radar_Chart\"><\/span>Radar Chart<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>To demonstrate the visualization, we select the funds whose \u201ccurrency\u201d is \u201cTWD\u201d and use a radar chart to show the differences among funds.<br>Let\u2019s randomly pick ten funds to generate the chart.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>fund = fund[fund[\"currency\"].str.contains(\"TWD\")]\n\n# collect the unique ISIN codes\nisin_lst = list(fund[\"isin\"].unique())\n\n# set the random seed for reproducibility, then randomly pick 10 funds\nrandom.seed(1)\nrandom_isin_lst = random.sample(isin_lst, 10)\n\ncheck_lst = random_isin_lst\ncategories = 
['\u958b\u653e\u578b\u57fa\u91d1\/\u5c01\u9589\u578b\u57fa\u91d1','\u52df\u96c6\u5730\u53ca\u6295\u8cc7\u5340\u57df','\u6295\u8cc7\u6a19\u7684',\n              '\u98a8\u96aa', '\u6210\u7acb\u8cc7\u7522']\n\nfig = go.Figure()\ndata_lst = []\nfor num, isin in enumerate(check_lst):\n    \n    data = list(fund_adj[fund_adj[\"isin\"] == isin][[\"fld014_c\", \"fld015\", \"fld016_c\", \"risk\", \"fld021\"]].iloc[0, :])\n    data_lst.append(data)\n    \n    fig.add_trace(go.Scatterpolar(\n          r=data,\n          theta=categories,\n          fill='toself',\n          name=isin\n    ))\n    \n    \nfig.update_layout(\n  polar=dict(\n    radialaxis=dict(\n      visible=True,\n      range=[0, len(style)]\n    )),\n  showlegend=True\n)\n\nfig.show()<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img decoding=\"async\" width=\"985\" height=\"525\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/image-115.png\" alt=\"\u57fa\u91d1\u76f8\u4f3c\u5ea6\" class=\"wp-image-9629\" srcset=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/image-115.png 985w, https:\/\/www.tejwin.com\/wp-content\/uploads\/image-115-300x160.png 300w, https:\/\/www.tejwin.com\/wp-content\/uploads\/image-115-150x80.png 150w, https:\/\/www.tejwin.com\/wp-content\/uploads\/image-115-768x409.png 768w\" sizes=\"(max-width: 985px) 100vw, 985px\" \/><figcaption class=\"wp-element-caption\">Radar Chart<\/figcaption><\/figure>\n\n\n\n<p>A radar chart can readily and clearly help us understand the difference among funds, but it can not offer us a specific value of how similar they are. 
Hence, in the next part, we will use the <strong>Euclidean distance<\/strong> and the <strong>Cosine Similarity<\/strong> to solve this problem.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Euclidean_Distance\"><\/span>Euclidean Distance<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The Euclidean distance is a common way to measure the straight-line distance between any two points in a multi-dimensional space; we implement the Euclidean distance formula in Python.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1fNQ9aO13Wn-DPVdL2N9MtA.png\" alt=\"\u6b50\u5f0f\u8ddd\u96e2\"\/><figcaption class=\"wp-element-caption\">The Euclidean distance formula<\/figcaption><\/figure>\n\n\n\n<pre class=\"wp-block-preformatted\"><code># calculate the Euclidean distance between each pair of funds\nED_matrix = np.empty(shape=[len(random_isin_lst), len(random_isin_lst)])\nfor i in range(len(data_lst)):\n    for j in range(len(data_lst)):\n        dist = math.dist(data_lst[i], data_lst[j])\n        ED_matrix[i, j] = round(dist,5)\nprint(ED_matrix)\nsns.heatmap(ED_matrix, xticklabels = random_isin_lst, yticklabels = random_isin_lst, annot=True, cmap = \"flare\")<\/code><\/pre>\n\n\n\n<p>With this, the calculation of the Euclidean distance is complete. However, the result is not easy to read when presented as a raw matrix. 
We therefore use another chart, a \u201cheatmap,\u201d to make it easier to read.<\/p>\n\n\n\n<figure class=\"wp-block-image caption-align-center\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1W0lvugd4TrCXK2BQhzSGnw.png\" alt=\"\u57fa\u91d1\u6b50\u5f0f\u8ddd\u96e2\u76f8\u95dc\u77e9\u9663\"\/><figcaption class=\"wp-element-caption\">matrix of Euclidean distances<\/figcaption><\/figure>\n\n\n\n<p>Without standardization or normalization, the Euclidean distance has no upper bound, and its minimum is 0, which means the two funds are identical on these features. The larger the value, the farther apart the two elements are, and the more the funds differ. In the heatmap, the x-axis and y-axis are ISIN codes, and each cell on the diagonal from upper-left to lower-right pairs an ISIN code with itself, so its distance is always 0.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1ZhDoaUMKkgmwztMXJ_45Rg.png\" alt=\"\u57fa\u91d1\u6b50\u5f0f\u8ddd\u96e2\u71b1\u529b\u5716\" width=\"410\" height=\"319\"\/><figcaption class=\"wp-element-caption\">heatmap of the funds\u2019 Euclidean distance<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Cosine_Similarity\"><\/span>Cosine Similarity<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Cosine Similarity evaluates how similar two vectors are by measuring the cosine of the angle between them. The cosine value lies between -1 and 1. The cosine of a 0-degree angle is 1, the highest possible similarity. 
The cosine value thus indicates whether two vectors point in the same direction.<\/p>\n\n\n\n<figure class=\"wp-block-image caption-align-center\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1oOD0U00Tgy4sUccZn3Qhxg.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Cosine Similarity Formula<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1b4XIJ3lhxecohx1VQ_Ehpw.jpg\" alt=\"\u9918\u5f26\u76f8\u4f3c\u5ea6\"\/><figcaption class=\"wp-element-caption\">cosine value in geometric space<\/figcaption><\/figure>\n\n\n\n<p>Let\u2019s implement the formula. The method is the same as for the Euclidean distance. One thing worth noting is that the Cosine Similarity does not change with the length of the vectors, because its formula divides by the vectors\u2019 norms, which has the same effect as normalization.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code># The measure of cosine similarity is not affected by the length of the vectors\nCS_matrix = np.empty(shape=[len(random_isin_lst), len(random_isin_lst)])\nfor i in range(len(data_lst)):\n    for j in range(len(data_lst)):\n        A = np.array(data_lst[i])\n        B = np.array(data_lst[j])\n        # np.linalg.norm returns the Euclidean length of each vector\n        cosine = np.dot(A,B)\/(np.linalg.norm(A)*np.linalg.norm(B))\n        CS_matrix[i, j] = round(cosine,5)\nprint(CS_matrix)\nsns.heatmap(CS_matrix, xticklabels = random_isin_lst, yticklabels = random_isin_lst, annot=True, cmap = \"flare_r\")<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1HyLpHvx3FEVP55-S3oLyvA.png\" alt=\"\u9918\u5f26\u76f8\u4f3c\u5ea6\" width=\"625\" height=\"349\"\/><figcaption class=\"wp-element-caption\">matrix of cosine similarities<\/figcaption><\/figure>\n\n\n\n<p>At this point, observant readers should have 
noticed that the Euclidean distance is smaller for more similar values, while cosine similarity is larger for more similar values. The two move in opposite directions, so when presenting them as heatmaps, it is more convenient to reverse the color scheme of one of them to compare the two results. Therefore, relative to the visualization code for Euclidean distance, the cosine similarity code changes cmap = \u201cflare\u201d to cmap = \u201cflare_r\u201d.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/10z5YX_S_92uzpKue85R33g.png\" alt=\"\u9918\u5f26\u76f8\u4f3c\u5ea6\"\/><figcaption class=\"wp-element-caption\">heatmap of cosine similarity<\/figcaption><\/figure>\n\n\n\n<p>Comparing the two images, the overall distribution trends correspond to each other. In fact, for vectors normalized to unit length, the Euclidean distance is equivalent to cosine similarity.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter caption-align-center\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/13_hb_K395_9mvaaBBrUywA.png\" alt=\"\u9918\u5f26\u76f8\u4f3c\u5ea6\"\/><figcaption class=\"wp-element-caption\">Proof of Equivalence between Cosine Similarity and Euclidean Distance<\/figcaption><\/figure>\n\n\n\n<p id=\"fb1c\">Let\u2019s assume there are two points in space: A and B. By normalizing A and B separately, we obtain unit vectors. The cosine similarity between the two unit vectors is simply their dot product, since the denominator is 1 and can be omitted. We can also calculate the Euclidean distance between the two unit vectors and simplify the equation: the squared distance equals 2(1 - cosine similarity), which proves the equivalence.<\/p>\n\n\n\n<p id=\"bdde\">So, what is the difference between Euclidean distance and cosine similarity?<\/p>\n\n\n\n<p id=\"2115\">Euclidean distance measures the straight-line distance between two points. 
Therefore, when two points trend in a similar direction but their vector lengths differ, the Euclidean distance cannot reflect this similarity. For example, in this implementation, if two funds are highly similar but one has a much larger asset size than the other, the Euclidean distance will still show a large discrepancy because of the difference in asset size.<\/p>\n\n\n\n<p id=\"bcb4\">On the other hand, cosine similarity calculates the cosine of the angle between two vectors. Therefore, if two funds have a similar nature, the angle between them will be small, effectively indicating their similarity.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1ZL_75DZiPklB3T8vnrPb-w.png\" alt=\"\u6b50\u5f0f\u8ddd\u96e2\u8207\u9918\u5f26\u76f8\u4f3c\u5ea6\"\/><figcaption class=\"wp-element-caption\">Euclidean distance and cosine similarity in geometric space<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span><strong>Conclusion<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"f584\">In the similarity results, we can observe that most domestic funds are very similar to one another. This is likely because the current implementation only considers basic information about the funds, reflecting their initial state at establishment. 
Readers have the flexibility to incorporate additional information for their calculations, such as returns, expense ratios, and more.&nbsp;<a href=\"https:\/\/api.tej.com.tw\/\" rel=\"noreferrer noopener\" target=\"_blank\">TEJ API<\/a>&nbsp;provides comprehensive fund data and various access methods, allowing readers to customize their own comparison modules according to their preferences.<\/p>\n\n\n\n<p id=\"61b7\"><strong>Please note that this introduction and the underlying asset are for reference only and do not represent any recommendations for commodities or investments.<\/strong>&nbsp;We will also introduce the use of TEJ\u2019s database to construct various option pricing models. Therefore, readers interested in options trading are welcome to purchase relevant solutions from&nbsp;<a href=\"https:\/\/eshop.tej.com.tw\/E-Shop\/index\" rel=\"noreferrer noopener\" target=\"_blank\">TEJ E-Shop<\/a>&nbsp;and use high-quality databases to construct pricing models that are suitable for them.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Source_Code\"><\/span>Source Code<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/github.com\/tejtw\/TEJAPI_Python_Medium_DataAnalysis\/blob\/main\/fund_similarity.ipynb\" class=\"ek-link\" target=\"_blank\" rel=\"noopener\">Github<\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"43b5\"><span class=\"ez-toc-section\" id=\"Extended_Reading\"><\/span>Extended Reading<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"5af5\">\u25cf&nbsp;<a href=\"https:\/\/medium.com\/tej-api-financial-data-anlaysis\/data-analysis-gru-and-lstm-31d73954dc23\" target=\"_blank\" rel=\"noopener\">\u3010Data Analysis\u3011GRU and LSTM<\/a><br>\u25cf&nbsp;<a href=\"https:\/\/medium.com\/tej-api-financial-data-anlaysis\/quant-black-scholes-model-and-greeks-f00dc82bcb81\" target=\"_blank\" rel=\"noopener\">\u3010Quant\u3011Black Scholes 
model and Greeks<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"e2ab\"><span class=\"ez-toc-section\" id=\"Related_Link\"><\/span>Related Link<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"0be1\">\u25cf&nbsp;<a href=\"https:\/\/api.tej.com.tw\/index.html\" rel=\"noreferrer noopener\" target=\"_blank\">TEJ API<\/a><br>\u25cf&nbsp;<a href=\"https:\/\/eshop.tej.com.tw\/E-Shop\/Edata_intro\" rel=\"noreferrer noopener\" target=\"_blank\">TEJ E-Shop<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Investments always come with risks, and not all are inevitable. Through risk diversification, the distribution of different asset types could help investors efficiently manage risk and reduce the influence of market volatility on their portfolios. Today\u2019s article will mainly discuss how to use data, comparing the similarity of funds from a scientific point of view.<\/p>\n","protected":false},"featured_media":12148,"template":"","tags":[2962,3160],"insight-category":[690,50],"class_list":["post-12147","insight","type-insight","status-publish","has-post-thumbnail","hentry","tag-market-data","tag-tej-api-2","insight-category-data-analysis","insight-category-fintech"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight\/12147","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight"}],"about":[{"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/types\/insight"}],"version-history":[{"count":1,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight\/12147\/revisions"}],"predecessor-version":[{"id":43572,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight\/12147\/revisions\/43572"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/media\/12148"}],"wp:attachment":[{"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/media?parent=12147"}],"wp:term":[{"taxonomy":"post_tag","embeddable":tru
e,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/tags?post=12147"},{"taxonomy":"insight-category","embeddable":true,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight-category?post=12147"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}