{"id":17862,"date":"2023-02-23T09:24:00","date_gmt":"2023-02-23T01:24:00","guid":{"rendered":"https:\/\/www.tejwin.com\/?post_type=insight&#038;p=17862"},"modified":"2023-09-07T14:31:52","modified_gmt":"2023-09-07T06:31:52","slug":"pca-feature-portfolio","status":"publish","type":"insight","link":"https:\/\/www.tejwin.com\/en\/insight\/pca-feature-portfolio\/","title":{"rendered":"PCA Feature Portfolio"},"content":{"rendered":"\n<p id=\"ceae\">Optimizing investment portfolios using PCA (Principal Component Analysis)<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1oAOMDP0-pDGV2U7raYe6uQ.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Photo by&nbsp;<a href=\"https:\/\/unsplash.com\/@kmuza\" rel=\"noreferrer noopener\" target=\"_blank\">Carlos Muza<\/a>&nbsp;on&nbsp;<a href=\"https:\/\/unsplash.com\/s\/photos\/finance\" rel=\"noreferrer noopener\" target=\"_blank\">Unsplash<\/a><\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"b23c\"><span class=\"ez-toc-section\" id=\"Summary_of_key_points_of_this_article\"><\/span>Summary of key points of this article<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"4c4d\">Article 
Difficulty: \u2605\u2605\u2605\u2605\u2606<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"9271\"><span class=\"ez-toc-section\" id=\"Introdution\"><\/span>Introduction<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>&#8220;The essence of mathematics is not to complicate simple things but to simplify complex things.&#8221; &#8211; Stan Gudder<\/p>\n\n\n\n<p>Principal Component Analysis (PCA), a key technique in unsupervised learning, is widely used in machine learning and statistics to analyze data and reduce its dimensionality. Its core idea is to decompose the original data into representative principal components, which reduces the dimensionality and provides a new description of the data.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"9573\"><span class=\"ez-toc-section\" id=\"The_key_points_of_this_article\"><\/span>The key points of this article<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>This article applies PCA to daily stock return data to obtain principal components and uses them to construct an investment portfolio. 
This article covers the following key points:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Understanding the eigenvalues and eigenvectors of PCA and using them to design an investment portfolio.<\/li>\n\n\n\n<li>Methods for backtesting portfolio performance, applicable to various investment strategies.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"abd2\"><span class=\"ez-toc-section\" id=\"Editing_Environment_and_Module_Requirements\"><\/span>Editing Environment and Module Requirements<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"18a0\">This article uses Windows as the operating system and Jupyter as the editor.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import tejapi<br>import pandas as pd<br>import numpy as np<br>import matplotlib.pyplot as plt<br>import seaborn as sns<br>from sklearn.preprocessing import StandardScaler<br>from sklearn.decomposition import PCA<br><br>tejapi.ApiConfig.api_key = \"Your Key\"<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"59a7\"><span class=\"ez-toc-section\" id=\"Database_Usage\"><\/span>Database Usage<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>0050 Index Constituent Data Set: Listed and OTC Index (TWN\/EWISAMPLE)<\/p>\n\n\n\n<p>0050 Stock price return (day): rate of return (TWN\/APRCD2)<\/p>\n\n\n\n<p>0050 Adjusted stock price (day): ex-dividend adjustment (TWN\/APRCD1)<\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"879e\"><span class=\"ez-toc-section\" id=\"Data_Loading\"><\/span>Data Loading<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"2bd1\">The data period is 2013.01.01 to 2022.11.24. We first load the 0050 constituent stocks and filter on the &#8220;end_date&#8221; column to keep only the stocks that are currently constituents.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">mdate = {'gte':'2000-01-01', 
'lte':'2022-11-24'}<br>data = tejapi.get('TWN\/EWISAMPLE',<br>                  idx_id = \"IX0002\",<br>                  start_date = mdate,<br>                  paginate=True)<br><br># Rows whose end_date is before 2022-11-24 have already left the index<br>data1 = data[data[\"end_date\"] &lt; \"2022-11-24\"]<br># Concatenating data1 twice and dropping every duplicated row keeps only the rows of data that are not in data1<br>diff_data = pd.concat([data,data1,data1]).drop_duplicates(keep=False)<br>coid = list(diff_data[\"coid\"])<br>print(len(coid))<br>diff_data<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1iexpA-FYzPaRapG9SzR-LQ.png\" alt=\"\"\/><\/figure>\n\n\n\n<p id=\"8a49\"><strong>Loading 0050 Returns Data<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Download the daily returns (roia) of each constituent and merge them into one wide DataFrame<br>for i in range(len(coid)):<br>    print(i)<br>    df = tejapi.get('TWN\/EWPRCD2',<br>                    coid = coid[i],<br>                    mdate = {'gte':'2013-01-01', 'lte':'2022-11-24'},<br>                    paginate=True)<br>    df.set_index(\"mdate\",inplace=True)<br>    Df1 = pd.DataFrame({coid[i]:df[\"roia\"]})<br>    if i == 0:<br>        Df = Df1<br>    else:<br>        # Left join on the date index so every stock is aligned with the first stock's trading days<br>        Df = pd.merge(Df,Df1,how='left', left_index=True, right_index=True)<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1btUH44re5rNlbRI5XqbsVw.png\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"35d7\"><span class=\"ez-toc-section\" id=\"Data_Cleaning\"><\/span>Data Cleaning<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The return data of ASE Technology Holding (3711) was only available after 
2018\/04\/30, so it was deleted.<br>Shanghai Commercial Savings Bank (5876) was listed after 2014\/09\/25 and only has return data from then on, so it was excluded.<br>Silergy-KY (6415) was listed after 2013\/12\/12, so its return data was also excluded.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">del Df[\"3711\"]<br>del Df[\"5876\"]<br>del Df[\"6415\"]<\/pre>\n\n\n\n<p>Therefore, this article focuses on the 47 constituent stocks of the Taiwan 0050 Index as of 2022\/11\/24, excluding the three stocks mentioned above.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"42fa\"><span class=\"ez-toc-section\" id=\"Data_Visualization\"><\/span>Data Visualization<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>First, we need a basic understanding of the dataset. By observing the correlations between the returns of the constituent stocks, we can see a significant positive correlation among daily returns. Therefore, the data can be represented in fewer than the current 47 dimensions.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">cor = Df.corr()<br>plt.figure(figsize=(30,30))<br>plt.title(\"Correlation Matrix\")<br>sns.heatmap(cor, vmax=1,square=True,annot=True,cmap=\"cubehelix\")<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1elAvDiWlr859CKfoAaunTA.png\" alt=\"\"\/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"e656\"><span class=\"ez-toc-section\" id=\"Data_Standardization\"><\/span>Data Standardization<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"1b82\">Before building the model, we do not know the importance of each feature in the dataset, and PCA is sensitive to feature scale: features with larger variance would dominate the components, which can lead to a significant loss of information. 
Therefore, we standardize each feature to the same scale (zero mean and unit variance) before applying PCA.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">scale = StandardScaler().fit(Df)<br>rescale = pd.DataFrame(scale.transform(Df),columns=Df.columns,index=Df.index)<br># Visualize the standardized data<br>plt.figure(figsize=(20,5))<br>plt.title(\"2330_Return\")<br>rescale[\"2330\"].plot()<br>plt.grid(True)<br>plt.legend()<br>plt.show()<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1sjZ0IqlK_SYR1XmpAJD_AQ.png\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"73a6\"><span class=\"ez-toc-section\" id=\"PCA\"><\/span>PCA<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"0cb3\"><strong>Model Setup<br><\/strong>We aim to reduce the original 47-dimensional data to 10 dimensions, representing the original data using 10 principal components.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Assumption: X_train is not defined in the snippets shown here; it is taken to be the standardized returns<br>X_train = rescale<br>n_components = 10<br>pca = PCA(n_components=n_components)<br>Pc = pca.fit(X_train)<\/pre>\n\n\n\n<p id=\"4e9e\"><strong>Variance Explained by PCA<\/strong><\/p>\n\n\n\n<p id=\"3f66\">The first principal component captures the largest share of the variance in the original data, the second principal component captures the second-largest share, and so on in descending order of variance.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">fig, axes = plt.subplots(ncols=2)<br>Series1 = pd.Series(Pc.explained_variance_ratio_[:n_components]).sort_values()<br>Series2 = pd.Series(Pc.explained_variance_ratio_[:n_components]).cumsum()<br><br>Series1.plot.barh(title=\"Explained Variance\",ax=axes[0])<br>Series2.plot(ylim=(0,1),ax=axes[1],title=\"Cumulative Explained 
Variance\")<br>print(\"Cumulative explained variance ratio:\")<br>print(Series2.iloc[-1])<br>print(\"Explained variance ratio of each component:\")<br>print(Series1.sort_values(ascending=False))<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter caption-align-center\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1FO0XVK4rrM-Tl39pU53xCw.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Cumulative Explained Variance Ratio<\/figcaption><\/figure>\n\n\n\n<p>The left chart shows the share of variance explained by each of the first 10 principal components. The first principal component alone accounts for 35% of the variance in the original data, which means it explains 35% of the daily return variations in the 47 constituent stocks. This dominant principal component is often referred to as the &#8220;market&#8221; factor.<\/p>\n\n\n\n<p>From the right chart, it can be observed that the first 10 principal components collectively explain approximately 60% of the variance in the daily returns of these 47 stocks.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"fd9b\"><span class=\"ez-toc-section\" id=\"Creating_an_Investment_Portfolio_using_PCA\"><\/span><strong>Creating an Investment Portfolio using PCA<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"6c90\"><strong>Setting Portfolio Weights<br><\/strong>In the previous step, we examined the variance explained by the principal components. Next, we will explore the correlation of the original data, which consists of 47 stocks, with these 10 principal components. 
We will use this information to design the portfolio weights.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">n_components = 10<br>weights = pd.DataFrame()<br># Normalize each principal component (eigenvector) so that its entries sum to 1; each one becomes a portfolio<br>for i in range(n_components):<br>    weights[\"weights_{}\".format(i)] = pca.components_[i] \/ sum(pca.components_[i])<br>weights = weights.values.T<br>weight_port = pd.DataFrame(weights,columns=Df.columns)<br>weight_port.index = [f'Portfolio{i}' for i in range(weight_port.shape[0])]<br>weight_port<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/11srkuYE91zk9YuGVJqOXJQ.png\" alt=\"\"\/><\/figure>\n\n\n\n<p id=\"f217\"><strong>Explaining the Portfolio Weighting Method<br><\/strong>The first principal component explains 35% of the variance. Let&#8217;s examine the correlation of each variable (the 47 constituent stocks) with the first principal component.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1QDWSGYYcfUoXOR33qj-QyA.png\" alt=\"\"\/><\/figure>\n\n\n\n<p id=\"d340\">As seen from the array, the correlation of all 47 constituent stocks with the first principal component is in the same direction (all negative), and the differences in magnitude are small. 
This further supports our previous explanation that the first principal component represents the &#8220;market&#8221; factor.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">weight_port.iloc[0].T.sort_values(ascending=False).plot.bar(subplots=True,figsize=(20,5),<br>                          legend=False,sharey=True,ylim=(-0.75,0.75))<\/pre>\n\n\n\n<p id=\"6cd3\">Next, we calculate the portfolio weights by dividing the correlation of each stock by the sum of the correlations of all stocks.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1ktoEdEOTfxR-vdXFLwZuhg.png\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1xJ41g0iR-uU8CAh3D16mlw.png\" alt=\"\"\/><\/figure>\n\n\n\n<p id=\"a0a8\"><strong>Plotting the Weights of the Top Five Principal Components in the Portfolio<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">weight_port[:5].T.plot.bar(subplots=True,layout = (5,1),figsize=(20,25),<br>                      legend=False,sharey=True,ylim=(-2,2))<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1_z7C5kw5l92eekTiIvw7FQ.png\" alt=\"\"\/><\/figure>\n\n\n\n<p id=\"d5b7\"><strong>Examining the Classification Logic of Other Principal Components<\/strong><\/p>\n\n\n\n<p id=\"995c\">Portfolio 1<\/p>\n\n\n\n<p id=\"531f\">The top three are Nanya Technology (2408), Yageo (2327), and Airtac-KY (1590); the bottom three are Far EasTone (4904), Taiwan Mobile (3045), and Chunghwa Telecom (2412). 
In Portfolio 1, electronics stocks carry higher weights, while transmission and telecommunications stocks carry lower weights.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1VsJy8gQHh27CF6x0goNUpA.png\" alt=\"\"\/><\/figure>\n\n\n\n<p id=\"5b91\">Portfolio 2<\/p>\n\n\n\n<p id=\"4d5b\">The top three stocks in Portfolio 2 are the three major telecommunications companies, while the lowest-weighted stocks are mostly from the financial sector. This suggests that Portfolio 2 is a non-financial investment portfolio.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1bc5RRns1gAXEyts_4-tMLA.png\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"a798\"><span class=\"ez-toc-section\" id=\"Searching_for_the_Optimal_PCA_Portfolio\"><\/span>Searching for the Optimal PCA Portfolio<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>We use the Sharpe Ratio as our metric. It is a key indicator for measuring the performance and stability of an investment portfolio in fund investment or asset allocation. 
It answers the question &#8220;how much return can be obtained while enduring 1% of risk?&#8221;<\/p>\n\n\n\n<p>In this article, the Sharpe Ratio is calculated as follows (the risk-free rate is not subtracted):<\/p>\n\n\n\n<p><strong>Sharpe Ratio = Annualized Return \/ Annualized Risk<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">def sharpe_ratio(ts_returns):<br>    # ts_returns: pandas Series of daily returns, in percent<br>    days = ts_returns.shape[0]<br>    n_years = days\/252<br>    # Total return over the period (sum of the daily returns, in percent)<br>    total_return = ts_returns.cumsum().iloc[-1]<br>    # Annualize the magnitude of the total return, then restore its sign<br>    annualized_return = np.power(1+abs(total_return)*0.01,1\/n_years)-1<br>    if total_return &lt; 0:<br>        annualized_return = -annualized_return<br>    annualized_vol = (ts_returns*0.01).std()*np.sqrt(252)<br>    annualized_sharpe = annualized_return \/ annualized_vol<br>    <br>    return annualized_return,annualized_vol,annualized_sharpe<\/pre>\n\n\n\n<p id=\"aab4\"><strong>Selecting the Top 5 Portfolios<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">n_components = 10\nannualized_ret = np.array([0.]*n_components)\nsharpe_metric = np.array([0.]*n_components)\nannualized_vol = np.array([0.]*n_components)\ncoids = X_train.columns.values\nn_coids = len(coids)\n\npca = PCA(n_components=n_components)\nPc = pca.fit(X_train)\npcs = pca.components_\nfor i in range(n_components):\n    pc_w = pcs[i] \/ sum(pcs[i])\n    eigen_port = pd.DataFrame(data={\"weights\":pc_w.squeeze()},index=coids)\n    eigen_port.sort_values(by=[\"weights\"],ascending=False,inplace=True)\n    #The daily portfolio return is obtained by taking the dot product of the portfolio weights and the daily returns of each constituent stock.\n    eigen_port_returns = np.dot(X_train.loc[:,eigen_port.index],eigen_port[\"weights\"])\n    eigen_port_returns = pd.Series(eigen_port_returns.squeeze(),\n                                   index = X_train.index)\n    \n    ar,vol,sharpe = sharpe_ratio(eigen_port_returns)\n    \n    annualized_ret[i] = ar\n    annualized_vol[i] = vol\n    
sharpe_metric[i] = sharpe\n\nsharpe_metric = np.nan_to_num(sharpe_metric)\n\nN=5\nresult = pd.DataFrame({\"Annual Return\":annualized_ret,\"Vol\":annualized_vol,\"Sharpe\":sharpe_metric})\nresult.dropna(inplace=True)\n#Sharpe Ratio of PCA portfolio\nax = result[:N][\"Sharpe\"].plot(linewidth=3,xticks=range(0,N,1))\nax.set_ylabel(\"Sharpe\")\nresult.sort_values(by=[\"Sharpe\"],ascending=False,inplace=True)\nprint(result[:N])<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/12AdUdd-BoQxEOvfxEQPfFA.png\" alt=\"\"\/><\/figure>\n\n\n\n<p id=\"b2ad\">Drawing a Trend Chart of Portfolio Returns Over the Investment Period<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">def Backtest(i,data):<br>    pca = PCA()<br>    Pc = pca.fit(data)<br>    pcs = pca.components_<br>    pc_w = pcs[i] \/ sum(pcs[i])<br>    eigen_port = pd.DataFrame(data={\"weights\":pc_w.squeeze()},index=coids)<br>    eigen_port.sort_values(by=[\"weights\"],ascending=False,inplace=True)<br>    # The dot product of the weights and the daily returns gives the daily portfolio return<br>    eigen_port_returns = np.dot(data.loc[:,eigen_port.index],eigen_port[\"weights\"])<br>    eigen_port_returns = pd.Series(eigen_port_returns.squeeze(),<br>                                   index = data.index)<br>    <br>    ar,vol,sharpe = sharpe_ratio(eigen_port_returns)<br>    return eigen_port_returns,ar,vol,sharpe<\/pre>\n\n\n\n<p id=\"23d9\"><strong>Visualizing the Trend of Portfolio Returns<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">def Weight_plot(i):<br>    top_port = weight_port.iloc[[i]].T<br>    port_name = top_port.columns.values.tolist()<br>    top_port.sort_values(by=port_name,ascending=False,inplace=True)<br>    ax = top_port.plot(title = port_name[0],xticks=range(0,len(coids),1),<br>                  figsize=(15,6),<br>                  rot=45,linewidth=3)<br>    
ax.set_ylabel(\"Portfolio Weight\")<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">portfolio = 0<br>train_returns,train_ar,train_vol,train_sharpe = Backtest(portfolio,X_train)<br>ax = train_returns.cumsum().plot(rot=45)<br>ax.set_ylabel(\"Accumulated Return(%)\")<br>Weight_plot(portfolio)<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1a18i7raBXUAJ0eDnRbQ8EA.png\" alt=\"\"\/><\/figure>\n\n\n\n<p id=\"b7ba\"><strong>Summary<\/strong><\/p>\n\n\n\n<p>The backtest shows that the portfolio constructed using PCA does not perform well. This outcome was somewhat expected, since PCA is primarily used to classify portfolios based on return correlations and does not guarantee good returns on its own.<\/p>\n\n\n\n<p>This article analyzed the Taiwan 50 Index (with three stocks excluded due to data availability) using PCA. It reduced the original 47 stocks to 10 principal components and constructed portfolio weights based on the correlation between the principal components and individual stocks. The article discussed the principal components, highlighting the presence of the dominant &#8220;market&#8221; factor and the logical classification ability of PCA. However, providing a detailed interpretation of the meaning of each principal component can be challenging.<\/p>\n\n\n\n<p>In conclusion, it&#8217;s important to reiterate that <strong>the assets mentioned in this article are for illustrative purposes only and do not constitute any recommendations or advice on financial products. 
<\/strong>Readers interested in strategy development, performance testing, empirical research, and related topics are welcome to explore the solutions available in the <a href=\"https:\/\/eshop.tej.com.tw\/E-Shop\/index\" target=\"_blank\" rel=\"noreferrer noopener\">TEJ E-Shop<\/a>, which offers comprehensive databases for conducting various analyses and tests.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"e4b6\"><span class=\"ez-toc-section\" id=\"Full_Code\"><\/span>Full Code<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/gist.github.com\/tej87681088\/a9a9aeda9ad707d542ec667d4f916ee2#file-tejapi_python_pca-py\" class=\"ek-link\" target=\"_blank\" rel=\"noopener\">Click here to go to GitHub<\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"9241\"><span class=\"ez-toc-section\" id=\"Further_Reading\"><\/span>Further Reading<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/medium.com\/tej-api-%E9%87%91%E8%9E%8D%E8%B3%87%E6%96%99%E5%88%86%E6%9E%90\/%E9%87%8F%E5%8C%96%E5%88%86%E6%9E%90-%E7%B1%8C%E7%A2%BC%E9%9B%86%E4%B8%AD%E5%BA%A6-b01c9d4b0593\" class=\"ek-link\" target=\"_blank\" rel=\"noopener\">Chip Concentration<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.tejwin.com\/insight\/%e3%80%90%e5%af%a6%e6%88%b0%e6%87%89%e7%94%a8%e3%80%91%e7%be%8a%e7%be%a4%e6%8c%87%e6%a8%99%e6%87%89%e7%94%a8\/\" class=\"ek-link\">Herd Indicator Application<\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"69fb\"><span class=\"ez-toc-section\" id=\"Related_Link\"><\/span>Related Links<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/api.tej.com.tw\/index.html\" class=\"ek-link\" target=\"_blank\" 
rel=\"noopener\">TEJ API&nbsp;Database Home Page<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/eshop.tej.com.tw\/E-Shop\/Edata_intro\" class=\"ek-link\" target=\"_blank\" rel=\"noopener\">TEJ E-Shop Complete Database Purchase<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Principal Component Analysis (PCA) is a key technique in unsupervised learning widely used in machine learning and statistics to analyze data and reduce data dimensionality. Its core idea is to break down the original data into representative principal components, achieving dimensionality reduction and providing a new description of the data.<\/p>\n","protected":false},"featured_media":9716,"template":"","tags":[2573,2944,2615,2620,2371,2428],"insight-category":[50,3509],"class_list":["post-17862","insight","type-insight","status-publish","has-post-thumbnail","hentry","tag-data-science","tag-historical-backtesting","tag-pca","tag-portfolio","tag-python","tag-2428","insight-category-fintech","insight-category-fintech-en"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight\/17862","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight"}],"about":[{"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/types\/insight"}],"version-history":[{"count":0,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight\/17862\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/media\/9716"}],"wp:attachment":[{"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/media?parent=17862"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/tags?post=17862"},{"taxonomy":"insight-category","embeddable":true,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight-category?post=17862"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}