{"id":16593,"date":"2021-12-07T02:40:01","date_gmt":"2021-12-06T18:40:01","guid":{"rendered":"https:\/\/www.tejwin.com\/?post_type=insight&#038;p=16593"},"modified":"2024-07-03T17:28:52","modified_gmt":"2024-07-03T09:28:52","slug":"stock-selection-by-random-forest-algorithm","status":"publish","type":"insight","link":"https:\/\/www.tejwin.com\/en\/insight\/stock-selection-by-random-forest-algorithm\/","title":{"rendered":"Stock Selection by Random Forest Algorithm"},"content":{"rendered":"\n<p>Backtesting and stock-picking strategy with machine learning<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter is-resized caption-align-center\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1_0eK829n2tjA9-winl.jpg\" alt=\"\" style=\"width:837px;height:558px\"\/><figcaption class=\"wp-element-caption\">Photo by&nbsp;<a href=\"https:\/\/unsplash.com\/@jaymantri?utm_source=medium&amp;utm_medium=referral\" rel=\"noreferrer noopener\" target=\"_blank\">Jay Mantri<\/a>&nbsp;on&nbsp;<a href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\" rel=\"noreferrer noopener\" target=\"_blank\">Unsplash<\/a><\/figcaption><\/figure>\n\n\n\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_81 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<label for=\"ez-toc-cssicon-toggle-item-69f66910902d7\" class=\"ez-toc-cssicon-toggle-label\"><span class=\"ez-toc-cssicon\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" 
xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/label><input type=\"checkbox\"  id=\"ez-toc-cssicon-toggle-item-69f66910902d7\"  aria-label=\"Toggle\" \/><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.tejwin.com\/en\/insight\/stock-selection-by-random-forest-algorithm\/#Highlights\" >Highlights<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.tejwin.com\/en\/insight\/stock-selection-by-random-forest-algorithm\/#Preface\" >Preface<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.tejwin.com\/en\/insight\/stock-selection-by-random-forest-algorithm\/#The_Editing_Environment_and_Modules_Required\" >The Editing Environment and Modules Required<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.tejwin.com\/en\/insight\/stock-selection-by-random-forest-algorithm\/#Database_Used\" >Database Used<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.tejwin.com\/en\/insight\/stock-selection-by-random-forest-algorithm\/#Data_Processing\" >Data Processing<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.tejwin.com\/en\/insight\/stock-selection-by-random-forest-algorithm\/#Model_Training\" >Model Training<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a 
class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.tejwin.com\/en\/insight\/stock-selection-by-random-forest-algorithm\/#Backtesting_and_Visualization\" >Backtesting and Visualization<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/www.tejwin.com\/en\/insight\/stock-selection-by-random-forest-algorithm\/#Optimize_Stock-Picking_Strategy\" >Optimize Stock-Picking Strategy<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/www.tejwin.com\/en\/insight\/stock-selection-by-random-forest-algorithm\/#Stock_Watchlist_Based_on_2021_Q3_Data\" >Stock Watchlist Based on 2021 Q3 Data<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/www.tejwin.com\/en\/insight\/stock-selection-by-random-forest-algorithm\/#Conclusion\" >Conclusion<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/www.tejwin.com\/en\/insight\/stock-selection-by-random-forest-algorithm\/#Source_Code\" >Source Code<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/www.tejwin.com\/en\/insight\/stock-selection-by-random-forest-algorithm\/#Extended_Reading\" >Extended Reading<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/www.tejwin.com\/en\/insight\/stock-selection-by-random-forest-algorithm\/#Related_Link\" >Related Link<\/a><\/li><\/ul><\/nav><\/div>\n<h2 class=\"wp-block-heading\" id=\"7b12\"><span class=\"ez-toc-section\" id=\"Highlights\"><\/span><strong>Highlights<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Difficulty\uff1a\u2605\u2605\u2605\u2606\u2606<\/li>\n\n\n\n<li>Random forest algorithm<\/li>\n\n\n\n<li>Advice: This article aims to 
predict the direction of stock price movements with the random forest algorithm, explore which features affect stock returns the most, and use those features as our stock-selection criteria. We previously introduced\u00a0<a href=\"https:\/\/medium.com\/tej-api-financial-data-anlaysis\/data-analysis-5-xgboost-algorithm-predicts-returns-part-1-dd5f4c40728d\" class=\"ek-link\" target=\"_blank\" rel=\"noopener\">how to predict stock returns with XGBoost<\/a>. This week we\u2019ll run a backtest, optimize our stock-picking strategy, and show the latest stock watchlist at the end.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"c0c0\"><span class=\"ez-toc-section\" id=\"Preface\"><\/span><strong>Preface<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"eb62\">Put simply, a random forest is an ensemble of many decision trees built with bagging and random feature sampling. Because it is based on the CART algorithm, it can handle both classification and regression problems, that is, both categorical and continuous targets. It also copes well with high-dimensional data, tolerates noise, and usually fits accurately, which is why it is commonly used in data science competitions such as those on Kaggle.<\/p>\n\n\n\n<p id=\"170f\">In this article, we\u2019ll treat financial data as features for predicting whether a stock\u2019s price will rise or fall, which makes this a binary classification problem. 
We then check whether the fitted model yields information that can improve our stock-picking strategy.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"9568\"><span class=\"ez-toc-section\" id=\"The_Editing_Environment_and_Modules_Required\"><\/span><strong>The Editing Environment and Modules Required<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"1a2c\">Windows OS and Jupyter Notebook<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">#Basic modules<br>import pandas as pd<br>import numpy as np<br>import matplotlib.pyplot as plt<br><br>#Machine learning<br>from sklearn.ensemble import RandomForestClassifier<br><br>#TEJ API<br>import tejapi<br>tejapi.ApiConfig.api_key = 'Your Key'<br>tejapi.ApiConfig.ignoretz = True<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"b2d9\"><span class=\"ez-toc-section\" id=\"Database_Used\"><\/span><strong>Database Used<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/api.tej.com.tw\/columndoc.html?subId=107\" rel=\"noreferrer noopener\" target=\"_blank\">Security Document<\/a>: the code of the database is \u2018TWN\/ANPRCSTD\u2019<\/li>\n\n\n\n<li><a href=\"https:\/\/eshop.tej.com.tw\/E-Shop\/Edata_case_detail\/3?item_id=TWN%2FEWIFINQ\" rel=\"noreferrer noopener\" target=\"_blank\">Seasonal Financial Table<\/a>: the code of the database is \u2018TWN\/EWIFINQ\u2019<\/li>\n\n\n\n<li><a href=\"https:\/\/api.tej.com.tw\/columndoc.html?subId=50\" rel=\"noreferrer noopener\" target=\"_blank\">Daily Stock Return<\/a>: the code of the database is \u2018TWN\/APRCD2\u2019<\/li>\n\n\n\n<li><a href=\"https:\/\/eshop.tej.com.tw\/E-Shop\/Edata_case_detail\/4?item_id=TWN%2FEWFINDATE2\" rel=\"noreferrer noopener\" target=\"_blank\">Date of Return and Financial Data<\/a>: the code of the database is \u2018TWN\/EWFINDATE2\u2019<\/li>\n\n\n\n<li><a href=\"https:\/\/api.tej.com.tw\/columndoc.html?subId=14\" rel=\"noreferrer noopener\" 
target=\"_blank\">Company Information<\/a>: the code of the database is \u2018TWN\/AIND\u2019<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"0259\"><span class=\"ez-toc-section\" id=\"Data_Processing\"><\/span><strong>Data Processing<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"86e2\"><strong>Step 1.&nbsp;<\/strong>Obtain industry codes, financial data and return data<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">security = tejapi.get('TWN\/ANPRCSTD',<br>          mkt = 'TSE',<br>          stypenm = '\u666e\u901a\u80a1',   #common stock<br>          paginate = True,<br>          chinese_column_name = True)<br><br>#Store firms' codes<br>security_list = security['\u8b49\u5238\u78bc'].tolist()<br>#Store industries' codes<br>industry_code = security[['\u8b49\u5238\u78bc', 'TSE\u696d\u5225']]<br>industry_code = industry_code.set_index('\u8b49\u5238\u78bc').to_dict()['TSE\u696d\u5225']<\/pre>\n\n\n\n<p id=\"0ce5\">First, obtain the codes of TSE-listed stocks and their corresponding industry codes. We store the latter as a dictionary, as shown below.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter caption-align-center\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/12Fe1g1vNrC-nDBCHDej_Zw.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">security_list &amp; industry_code<\/figcaption><\/figure>\n\n\n\n<pre class=\"wp-block-preformatted\">#Split the codes into groups of at most 50 firms<br>groups = [security_list[i:i + 50] for i in range(0, len(security_list), 50)]<\/pre>\n\n\n\n<p id=\"2997\">Here we split the firms into groups of at most 50 for the next step. 
Requesting too much data at once may cause the request to fail.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">fin_data = pd.DataFrame()   #Financial<br>ret_data = pd.DataFrame()   #Return<br>date_data = pd.DataFrame()  #Date<br><br>#pd.concat replaces DataFrame.append, which was removed in pandas 2.0<br>for group in groups:<br>    fin_data = pd.concat([fin_data, tejapi.get('TWN\/EWIFINQ',<br>                  coid = group,<br>                  chinese_column_name = True,<br>                  paginate = True)]).reset_index(drop=True)<br>    ret_data = pd.concat([ret_data, tejapi.get('TWN\/APRCD2',<br>           coid = group,<br>           opts = {'columns': ['coid', 'mdate', 'roi_q']},<br>           paginate = True,<br>           chinese_column_name = True)]).reset_index(drop=True)<br>    date_data = pd.concat([date_data, tejapi.get('TWN\/EWFINDATE2',<br>           coid = group,<br>           opts = {'columns': ['coid', 'mdate', 'fin_date']},<br>           paginate = True,<br>           chinese_column_name = True)]).reset_index(drop=True)<\/pre>\n\n\n\n<p id=\"d5e9\">Then loop over the groups to fetch each group\u2019s financial data and quarterly stock returns. Note that the quarterly return is reported at daily frequency, because each value is the cumulative quarterly return up to that date. 
It works like a rolling return, so there is a value for every trading day.&nbsp;<code>date_data&nbsp;<\/code>is a table provided by TEJ, and it is very useful for combining return data with financial data.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/0gdosHX2BTPMCSXUB.png\" alt=\"\"\/><\/figure>\n\n\n\n<p id=\"cf6c\"><strong>Step 2.&nbsp;<\/strong>Merge data<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">#Keep the last trading date for each security and announcement date<br>date_data = date_data.groupby(['\u8b49\u5238\u78bc', '\u8ca1\u52d9\u516c\u544a\u65e5']).last().reset_index()<br>#Rename the date columns so they match the merge keys used below<br>date_data = date_data.rename(columns = {'\u4ea4\u6613\u65e5\u671f':'\u5e74\u6708\u65e5', '\u8ca1\u52d9\u516c\u544a\u65e5':'\u8ca1\u5831\u767c\u5e03\u65e5'})<\/pre>\n\n\n\n<p id=\"65e8\">For each announcement, keep the last trading date before the next financial statement announcement, because this is the date we match against the date of the quarterly return. In other words, the quarterly return then represents the cumulative return after the financial statement was announced. Finally, we rename the columns to prepare for merging the data.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter caption-align-center\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/0EqmIJgR_E6pDlI1R.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">date_data<\/figcaption><\/figure>\n\n\n\n<pre class=\"wp-block-preformatted\">merge = date_data.merge(fin_data, on = ['\u8b49\u5238\u78bc', '\u8ca1\u5831\u767c\u5e03\u65e5'])<br>merge = merge.rename(columns = {'\u8b49\u5238\u78bc':'\u8b49\u5238\u4ee3\u78bc'})<br>merge = merge.merge(ret_data, on = ['\u8b49\u5238\u4ee3\u78bc', '\u5e74\u6708\u65e5'])<br>merge = merge.set_index(['\u8b49\u5238\u4ee3\u78bc', '\u8ca1\u52d9\u8cc7\u6599\u65e5']).select_dtypes(include=np.number)<\/pre>\n\n\n\n<p id=\"c38f\">Combine all the data and set the stock code and the financial data date as the new index. 
Then keep only the numeric columns as features for predicting the direction of the return.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter caption-align-center\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/0kNpsmLBWKB7Q3H8J.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">merge<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"db97\"><span class=\"ez-toc-section\" id=\"Model_Training\"><\/span><strong>Model Training<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"b62e\"><strong>Step 1.&nbsp;<\/strong>Split the dataset into training and testing data and train the model<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">#Rows dated before 2020 are training data, the rest are testing data<br>condition = merge.index.get_level_values('\u8ca1\u52d9\u8cc7\u6599\u65e5') &lt; '2020'<br>train_data = merge[condition].fillna(0)<br>test_data = merge[~condition].fillna(0)<br>rf = RandomForestClassifier(n_estimators=100, criterion= 'entropy')<br>#Label is True when the quarterly return is positive<br>rf.fit(train_data.drop(columns = '\u5b63\u5831\u916c\u7387 %'), train_data['\u5b63\u5831\u916c\u7387 %'] &gt; 0)<\/pre>\n\n\n\n<p id=\"2861\">Observations before 2020 form the training set, and the rest form the testing set. 
We fill missing values with zero, then fit the random forest model on the training features with boolean labels that indicate whether the quarterly return is positive.<\/p>\n\n\n\n<p id=\"8c05\"><strong>Step 2.&nbsp;<\/strong>Model performance<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">print(\"Training set score: \" , rf.score(train_data.drop(columns = '\u5b63\u5831\u916c\u7387 %'), train_data['\u5b63\u5831\u916c\u7387 %'] &gt; 0))<br>print(\"Testing set score: \" , rf.score(test_data.drop(columns = '\u5b63\u5831\u916c\u7387 %'), test_data['\u5b63\u5831\u916c\u7387 %'] &gt; 0))<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/0zBL6bZfoLg8W8mpQ.png\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"31ca\"><span class=\"ez-toc-section\" id=\"Backtesting_and_Visualization\"><\/span><strong>Backtesting and Visualization<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<pre class=\"wp-block-preformatted\">selected = rf.predict(test_data.drop(columns = '\u5b63\u5831\u916c\u7387 %'))<br>test_data[selected]<\/pre>\n\n\n\n<p id=\"f1d2\">Use the model\u2019s predictions to filter the stocks. 
This leaves the stocks that are predicted to perform well in the next quarter.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/0hflwesWIy_W97gZT.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">test_data[selected]<\/figcaption><\/figure>\n\n\n\n<pre class=\"wp-block-preformatted\">plt.rcParams['font.sans-serif'] = ['Microsoft JhengHei']  #Display Chinese characters<br>(test_data[selected].groupby('\u8ca1\u52d9\u8cc7\u6599\u65e5').mean()['\u5b63\u5831\u916c\u7387 %']*0.01 + 1).cumprod().plot(color = 'blue')     #random forest picks<br>(test_data[~selected].groupby('\u8ca1\u52d9\u8cc7\u6599\u65e5').mean()['\u5b63\u5831\u916c\u7387 %']*0.01 + 1).cumprod().plot(color = 'orange')  #benchmark 1: predicted to fall<br>(test_data.groupby('\u8ca1\u52d9\u8cc7\u6599\u65e5').mean()['\u5b63\u5831\u916c\u7387 %']*0.01 + 1).cumprod().plot(color = 'red')     #benchmark 2: all stocks<\/pre>\n\n\n\n<p id=\"7b54\">We draw three lines here. The blue line is the cumulative return of the portfolio built from the model\u2019s predictions. The orange line is the cumulative return of a benchmark portfolio made up of the stocks predicted to have negative returns. The red line is a second benchmark that holds all of the stocks without any filtering. 
The blue line performs best of the three.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/0BbEeU6JMaIEySRft.png\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"c332\"><span class=\"ez-toc-section\" id=\"Optimize_Stock-Picking_Strategy\"><\/span><strong>Optimize Stock-Picking Strategy<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"92e9\"><strong>Step 1.&nbsp;<\/strong>Choose important features<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">#Use the same feature columns, in the same order, as in rf.fit<br>feature_name = train_data.drop(columns = '\u5b63\u5831\u916c\u7387 %').columns<br>important = pd.Series(rf.feature_importances_, index = feature_name).sort_values(ascending=False)<br>important.head(20)<\/pre>\n\n\n\n<p id=\"1ecd\">Inspect the 20 most important features, which will serve as our stock-selection criteria.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/033mZNI8uoU_gC3cn.png\" alt=\"\"\/><\/figure>\n\n\n\n<pre class=\"wp-block-preformatted\">positive_features = ['\u71df\u696d\u5229\u76ca\u6210\u9577\u7387', '\u71df\u6536\u6210\u9577\u7387', '\u6295\u8cc7\u6d3b\u52d5\u4e4b\u73fe\u91d1\u6d41\u91cf', '\u71df\u696d\u6bdb\u5229\u6210\u9577\u7387', 'ROE(A)-\u7a05\u5f8c', '\u71df\u696d\u5916\u6536\u5165\u53ca\u652f\u51fa', 'ROA(C) \u7a05\u524d\u606f\u524d\u6298\u820a\u524d', 'CFO\/\u5408\u4f75\u7e3d\u640d\u76ca', '\u7a05\u5f8c\u6de8\u5229\u7387', '\u6bcf\u80a1\u6de8\u503c(F)-TSE\u516c\u544a\u6578', '\u71df\u696d\u5229\u76ca\u7387', '\u71df\u696d\u6bdb\u5229\u7387', '\u4f86\u81ea\u71df\u904b\u4e4b\u73fe\u91d1\u6d41\u91cf']<\/pre>\n\n\n\n<p id=\"94c3\">Next, from those 20 features we subjectively choose the positive ones, meaning features for which a higher value indicates a better company. 
Because we will convert the values into percentile ranks, a higher value must signify better performance.<\/p>\n\n\n\n<p id=\"72ba\"><strong>Step 2.&nbsp;<\/strong>Set up the same-industry comparison<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">merge['\u7522\u696d'] = merge.index.get_level_values('\u8b49\u5238\u4ee3\u78bc').map(industry_code)<br>merge = merge.reset_index().set_index(['\u8b49\u5238\u4ee3\u78bc', '\u7522\u696d', '\u8ca1\u52d9\u8cc7\u6599\u65e5'])<\/pre>\n\n\n\n<p id=\"6ba8\">Here we map each security code to its industry code, using the dictionary&nbsp;<code>industry_code&nbsp;<\/code>built in the data processing step, and store the result as a new column in&nbsp;<code>merge<\/code>. We then set this column, the security code and the date as the new index.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter caption-align-center\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/0H_o1IxSYuWsbjjZY.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">merge<\/figcaption><\/figure>\n\n\n\n<p id=\"efaf\"><strong>Step 3.&nbsp;<\/strong>Calculate scores from the important features to select stocks<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">score = merge[positive_features].groupby(['\u8ca1\u52d9\u8cc7\u6599\u65e5', '\u7522\u696d']).rank(pct=True).sum(axis = 1) <br>rank = score.rank(pct = True)          #Rank the total score again<br>filters = rank &gt; 0.97<br>(merge[filters].groupby('\u8ca1\u52d9\u8cc7\u6599\u65e5').mean()['\u5b63\u5831\u916c\u7387 %']*0.01+1).cumprod().plot(color = 'blue')  <br>(merge[~filters].groupby('\u8ca1\u52d9\u8cc7\u6599\u65e5').mean()['\u5b63\u5831\u916c\u7387 %']*0.01+1).cumprod().plot(color = 'orange') <br>(merge.groupby('\u8ca1\u52d9\u8cc7\u6599\u65e5').mean()['\u5b63\u5831\u916c\u7387 %']*0.01 + 1).cumprod().plot(color = 'red')<\/pre>\n\n\n\n<p id=\"e90a\">Then we group the data by date and industry and, within each group, convert each feature into a percentile rank with&nbsp;<code>rank(pct=True)<\/code>. 
Next, we sum the percentile ranks across the features to get a total score for each row, and rank the scores again. Finally, we keep the rows that rank above the 97th percentile of all data and backtest them. The following chart shows that the cumulative return of this strategy (blue line) beats both benchmarks.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/08GF9VLZlsSGN5kUe.png\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"e2fe\"><span class=\"ez-toc-section\" id=\"Stock_Watchlist_Based_on_2021_Q3_Data\"><\/span><strong>Stock Watchlist Based on 2021 Q3 Data<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<pre class=\"wp-block-preformatted\">#Copy the slice so that adding a column does not raise SettingWithCopyWarning<br>this_season = fin_data[fin_data['\u8ca1\u52d9\u8cc7\u6599\u65e5'] == '2021-09-01'].copy()<br>this_season['\u7522\u696d'] = this_season['\u8b49\u5238\u78bc'].map(industry_code)<br>this_season = this_season.set_index(['\u8b49\u5238\u78bc', '\u8ca1\u52d9\u8cc7\u6599\u65e5', '\u7522\u696d']).loc[:,positive_features]<br>score = this_season.groupby(['\u7522\u696d']).rank(pct=True).sum(axis = 1) <br>rank =  score.rank(pct = True)<br>firm_list = [i[0] for i in rank[rank &gt; 0.97].index]<br>firms = tejapi.get('TWN\/AIND',<br>                  coid = firm_list,<br>                  opts = {'columns':['coid','fnamec']},<br>                  paginate = True,<br>                  chinese_column_name = True)<\/pre>\n\n\n\n<p id=\"362d\">Select the rows dated \u20182021-09-01\u2019 and keep the firms whose industry-relative score ranks in the top 3%. 
Then query the&nbsp;<code>TWN\/AIND<\/code>&nbsp;database to look up the names of the companies.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter caption-align-center\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/0Ar1zXPZ1qHJ2g8uX.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">firms<\/figcaption><\/figure>\n\n\n\n<pre class=\"wp-block-preformatted\">ret_sofar = tejapi.get('TWN\/APRCD2',<br>           coid = firm_list,<br>           mdate = {'gte':'2021-09-30'},  <br>           opts = {'columns': ['coid', 'mdate', 'roia']},<br>           paginate = True,<br>           chinese_column_name = True)<br>#Equal-weighted average daily return of the portfolio, compounded over time<br>ret_sofar.groupby('\u5e74\u6708\u65e5')['\u65e5\u5831\u916c\u7387 %'].mean().apply(lambda x: 0.01*x + 1).cumprod().plot()<\/pre>\n\n\n\n<p id=\"b738\">The cumulative return of the newly formed portfolio<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/0N1sgT4FLRFuOrEL9.png\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"eba4\"><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span><strong>Conclusion<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"f7e6\">Because the TEJ API database provides comprehensive, high-quality data, the data processing step is straightforward. We only need to merge the data and split the dataset before we can start building the model. Even though the accuracy is only 54.88%, we can still extract valuable information from the fitted model. Readers can try tuning the parameters used to train the model, picking a different combination of important features, using different databases, or adding a second filter based on industry trends. 
Finally, build the portfolio with the best backtested performance and check whether that performance persists.<\/p>\n\n\n\n<p id=\"be1d\">The content of this webpage is not investment advice and does not constitute any offer, solicitation of an offer, or recommendation of any investment product. It is for learning purposes only and does not take into account your individual needs, investment objectives and specific financial circumstances. Investment involves risk. Past performance is not indicative of future performance. Readers are requested to exercise their own independent judgment when making investment decisions. The author is not liable for any losses incurred by acting on the content of this article.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Source_Code\"><\/span><strong>Source Code<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/gist.github.com\/tej87681088\/f98981963ae8e688227ae75f3d655d79#file-tejapi_medium-9-ipynb\" class=\"ek-link\" target=\"_blank\" rel=\"noopener\">Click here to go to GitHub<\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"37a4\"><span class=\"ez-toc-section\" id=\"Extended_Reading\"><\/span><strong>Extended Reading<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.tejwin.com\/en\/insight\/xgboost-algorithm-predicts-returns-part-1\/\" class=\"ek-link\">XGBoost Algorithm Predicts Returns (Part 1)<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.tejwin.com\/en\/insight\/warren-e-buffetts-value-investing\/\" class=\"ek-link\">Warren E. 
Buffett\u2019s Value Investing<\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"1cc1\"><span class=\"ez-toc-section\" id=\"Related_Link\"><\/span><strong>Related Link<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/api.tej.com.tw\/index.html\" rel=\"noreferrer noopener\" target=\"_blank\">TEJ API<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/eshop.tej.com.tw\/E-Shop\/Edata_intro\" rel=\"noreferrer noopener\" target=\"_blank\">TEJ E-Shop<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Backtesting and stock-picking strategy with machine learning Highlights Preface To put it simply, random forest is one of algorithms made up of many decision trees with the adoption of bagging and random sampling. Since it\u2019s based on CART algorithm, it can handle both classification and continuous data. 
Other advantages such as its comparability with high [&hellip;]<\/p>\n","protected":false},"featured_media":16594,"template":"","tags":[2583,2604,2371,2626,3000,3007,2428,2537],"insight-category":[690,50],"class_list":["post-16593","insight","type-insight","status-publish","has-post-thumbnail","hentry","tag-finance","tag-machine-learning","tag-python","tag-random-forest","tag-stock-selection","tag-tejapi-data-analysis","tag-2428","tag-2537","insight-category-data-analysis","insight-category-fintech"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight\/16593","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight"}],"about":[{"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/types\/insight"}],"version-history":[{"count":1,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight\/16593\/revisions"}],"predecessor-version":[{"id":24854,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight\/16593\/revisions\/24854"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/media\/16594"}],"wp:attachment":[{"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/media?parent=16593"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/tags?post=16593"},{"taxonomy":"insight-category","embeddable":true,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight-category?post=16593"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}