{"id":16416,"date":"2021-10-19T03:11:58","date_gmt":"2021-10-18T19:11:58","guid":{"rendered":"https:\/\/www.tejwin.com\/?post_type=insight&#038;p=16416"},"modified":"2024-07-11T09:02:34","modified_gmt":"2024-07-11T01:02:34","slug":"xgboost-algorithm-predicts-returns-part-2","status":"publish","type":"insight","link":"https:\/\/www.tejwin.com\/en\/insight\/xgboost-algorithm-predicts-returns-part-2\/","title":{"rendered":"XGBoost Algorithm Predicts Returns (Part 2)"},"content":{"rendered":"\n<p>Use algorithm to learn the investment factors and predict returns.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter caption-align-center\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1_1bOokbEeXpF1Z4gd_BpL93w.jpg\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Photo Creds:&nbsp;<a href=\"https:\/\/unsplash.com\/photos\/NDfqqq_7QWM\" rel=\"noreferrer noopener\" target=\"_blank\">Unsplash<\/a><\/figcaption><\/figure>\n\n\n\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_81 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<label for=\"ez-toc-cssicon-toggle-item-6a0d3c0c27612\" class=\"ez-toc-cssicon-toggle-label\"><span class=\"ez-toc-cssicon\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 
.5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/label><input type=\"checkbox\"  id=\"ez-toc-cssicon-toggle-item-6a0d3c0c27612\"  aria-label=\"Toggle\" \/><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.tejwin.com\/en\/insight\/xgboost-algorithm-predicts-returns-part-2\/#Highlights\" >Highlights<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.tejwin.com\/en\/insight\/xgboost-algorithm-predicts-returns-part-2\/#Preface\" >Preface<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.tejwin.com\/en\/insight\/xgboost-algorithm-predicts-returns-part-2\/#The_Editing_Environment_and_Modules_Required\" >The Editing Environment and Modules Required<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.tejwin.com\/en\/insight\/xgboost-algorithm-predicts-returns-part-2\/#Database\" >Database<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.tejwin.com\/en\/insight\/xgboost-algorithm-predicts-returns-part-2\/#Preprocessing\" >Preprocessing<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.tejwin.com\/en\/insight\/xgboost-algorithm-predicts-returns-part-2\/#Training\" >Training<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.tejwin.com\/en\/insight\/xgboost-algorithm-predicts-returns-part-2\/#Predict_returns_on_2330\" >Predict returns on 2330<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a 
class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/www.tejwin.com\/en\/insight\/xgboost-algorithm-predicts-returns-part-2\/#Visualization\" >Visualization<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/www.tejwin.com\/en\/insight\/xgboost-algorithm-predicts-returns-part-2\/#Conclusion\" >Conclusion<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/www.tejwin.com\/en\/insight\/xgboost-algorithm-predicts-returns-part-2\/#Source_code\" >Source code<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/www.tejwin.com\/en\/insight\/xgboost-algorithm-predicts-returns-part-2\/#Extended_Reading\" >Extended Reading<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/www.tejwin.com\/en\/insight\/xgboost-algorithm-predicts-returns-part-2\/#Related_Link\" >Related Link<\/a><\/li><\/ul><\/nav><\/div>\n<h2 class=\"wp-block-heading\" id=\"7f52\"><span class=\"ez-toc-section\" id=\"Highlights\"><\/span><strong>Highlights<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Difficulty\uff1a\u2605\u2605\u2605\u2605\u2606<\/li>\n\n\n\n<li>Data Preprocessing<\/li>\n\n\n\n<li>XGBoost Modeling<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"faad\"><span class=\"ez-toc-section\" id=\"Preface\"><\/span><strong>Preface<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"2eda\">We talked about how to create a new environment and install XGBoost last time. If you haven\u2019t read it yet, please click this\u00a0<a href=\"https:\/\/medium.com\/tej-api-financial-data-anlaysis\/data-analysis-5-xgboost-algorithm-predicts-returns-part-1-dd5f4c40728d\" class=\"ek-link\" target=\"_blank\" rel=\"noopener\"><strong>link<\/strong><\/a>. 
In this article we will preprocess the data, train a model to predict stock returns, and try to analyze which factors are the most important.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"032d\"><span class=\"ez-toc-section\" id=\"The_Editing_Environment_and_Modules_Required\"><\/span><strong>The Editing Environment and Modules Required<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"7570\">Mac OS and Jupyter Notebook<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># basic<br>import numpy as np<br>import pandas as pd<br><br># graphing<br>import matplotlib.pyplot as plt<br>%matplotlib inline<br><br># machine learning<br>from sklearn.metrics import mean_squared_error<br>from sklearn.model_selection import train_test_split<br>import xgboost as xgb<br><br># TEJ<br>import tejapi<br>tejapi.ApiConfig.api_key = \"Your Key\"<br>tejapi.ApiConfig.ignoretz = True<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"9cce\"><span class=\"ez-toc-section\" id=\"Database\"><\/span><strong>Database<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/api.tej.com.tw\/columndoc.html?subId=119\" rel=\"noreferrer noopener\" target=\"_blank\"><strong>Taiwan Multi-Factor Database<\/strong><\/a>: code&nbsp;<code><strong>TWN\/AFF_RAW<\/strong><\/code>, which covers the indicators scholars have used to measure factors since 2000, at monthly frequency.<\/li>\n\n\n\n<li><a href=\"https:\/\/api.tej.com.tw\/columns.html?idCode=TWN%2FAPRCM\" rel=\"noreferrer noopener\" target=\"_blank\"><strong>Listed (OTC) Unadjusted Stock Price (Monthly)<\/strong><\/a>: code&nbsp;<code><strong>TWN\/APRCM<\/strong><\/code>, which covers listed securities and indexes.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"01f4\"><span class=\"ez-toc-section\" id=\"Preprocessing\"><\/span><strong>Preprocessing<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"e90a\">We 
use the investment factors of all listed companies from 2000 to 2015 to train a model that predicts whether returns in 2016 and 2017 are positive or negative.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">df = tejapi.get('TWN\/AFF_RAW',<br>                mdate={'gte': '2000-01-01', 'lte':'2015-12-31'},<br>                opts={'columns':['coid','mdate','pbr','per',<br>                      'div_yid','mom','str','ltr','profit','invest',<br>                      'dd_merton','dd_kmv','illiq','idiosyncratic',<br>                      'hhi','skew']},<br>                chinese_column_name = True,<br>                paginate = True)<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1_1ZkJ0lk0lBu1QS-WlfHfjBA.png\" alt=\"\"\/><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"d3e6\"><strong>Step 1.&nbsp;Check for missing values<\/strong><\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\">df.isnull().sum(axis=0)<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1nFuG9Qsy9OdUB1OK9GHFhA.png\" alt=\"\"\/><\/figure>\n\n\n\n<p id=\"dd4c\">Many models cannot be fitted if data with missing values is fed in directly. The&nbsp;<mark>XGBoost model<\/mark>, however, can handle sparse matrices and tolerates missing values. Even so, filling in the missing values sensibly can help improve the model. Common choices are to fill with the \u201cmean\u201d, the \u201cmedian\u201d, or simply 0, using the pandas fillna method. 
As for which value to fill in, you can first run some exploratory data analysis (EDA) on the data. That is an important topic in its own right, and we will cover it in a future article!<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"6237\"><strong>Step 2.&nbsp;To avoid look-ahead bias, we use the current month\u2019s factors to predict the next month\u2019s return. We therefore need to match each month\u2019s factors with the following month\u2019s return, so we shift the factor dates forward by one month before merging the data.<\/strong><\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\"># date handling<br>from datetime import date, timedelta<br>import calendar<\/pre>\n\n\n\n<p id=\"b96f\">Shift the dates forward by one month<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">df['\u5e74\u6708'] = df['\u5e74\u6708'].apply(lambda x: x + timedelta(days=calendar.monthrange(x.year, x.month)[1]))<\/pre>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"b1fc\"><strong>Step 3.&nbsp;Standardization<\/strong><\/h4>\n\n\n\n<p id=\"220c\">Generally speaking, standardizing the data often improves a machine learning model\u2019s predictive power, but this step is not required for XGBoost. Roughly: standardization applies to continuous features and rescales them (subtract the mean, then divide by the standard deviation). The purpose of rescaling is to avoid the extra iterations that gradient descent needs when the contours of the loss function are elongated ellipses. However, as the previous article mentioned, XGBoost is a tree model: a tree is a step function of the features, so it is not differentiable in the features and is not fitted by gradient descent over feature weights. Instead, it is optimized by searching for the best split point of each feature. 
Since standardization preserves the ordering of each feature\u2019s values, it does not change which splits are chosen, so XGBoost does not need standardized data!<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"2a5e\"><strong>Step 4.&nbsp;Processing label data<\/strong><\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\">df_label = tejapi.get('TWN\/APRCM',<br>                mdate={'gte': '2000-01-01', 'lte':'2015-12-31'},<br>                opts = {'columns':['coid','mdate','roi']},<br>                chinese_column_name = True,<br>                paginate = True)<\/pre>\n\n\n\n<p id=\"86e4\">Set the label to 1 if the return is positive, and 0 otherwise.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">df_label['\u5831\u916c\u7387\uff05_\u6708'] = df_label['\u5831\u916c\u7387\uff05_\u6708'].apply(lambda x: 1 if x&gt;0 else 0)<br>df_label.rename(columns={'\u8b49\u5238\u4ee3\u78bc':'\u8b49\u5238\u78bc'}, inplace=True)<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1-qrvSJSUC5U8sv2fYqaNeA.png\" alt=\"\"\/><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"0aac\"><strong>Step 5.&nbsp;Merge Data<\/strong><\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\">data = pd.merge(df , df_label, on=['\u8b49\u5238\u78bc', '\u5e74\u6708'])<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"a538\"><span class=\"ez-toc-section\" id=\"Training\"><\/span><strong>Training<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"c7ae\"><strong>Step 1.&nbsp;Separate the features and the labels<\/strong><\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\">X, y = data.iloc[:,2:-1],data.iloc[:,-1]<\/pre>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"e786\"><strong>Step 2.&nbsp;Split the training and test data. In machine learning we usually don\u2019t use all the data for training, because that makes it difficult to judge how well the model has learned, so we 
will hold some of it out. Below we split the data 8:2, and the 20% portion will be used to evaluate the model afterwards<\/strong><\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\">X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)<\/pre>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"b537\"><strong>Step 3.&nbsp;Feed the data into the classifier<\/strong><\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\">from xgboost import XGBClassifier<br>from sklearn.metrics import accuracy_score<\/pre>\n\n\n\n<p id=\"6ade\">Name the classifier \u201cmodel\u201d. The argument use_label_encoder=False indicates that we have already encoded the labels ourselves.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">model = XGBClassifier(use_label_encoder=False)<\/pre>\n\n\n\n<p id=\"100e\">Start training!<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">model.fit(X_train, y_train)<\/pre>\n\n\n\n<p id=\"1ddf\">Let\u2019s check the prediction accuracy of this model on the test set.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">y_pred = model.predict(X_test)<br>predictions = [round(value) for value in y_pred]<br>accuracy = accuracy_score(y_test, predictions)<br>print(\"Accuracy: %.2f%%\" % (accuracy * 100.0))<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1MSloEUuoleDBx06vThyD2A.png\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"54fe\"><span class=\"ez-toc-section\" id=\"Predict_returns_on_2330\"><\/span><strong>Predict returns on 2330<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"4769\"><strong>Step 1.&nbsp;Load the investment factors of 2330 from December 2015 to November 2017.<\/strong><\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\">df_pred = tejapi.get('TWN\/AFF_RAW',<br>                coid = '2330',<br>                mdate={'gte': '2015-12-01', 'lte':'2017-11-30'},<br>                opts={'columns': 
['coid','mdate','pbr','per',<br>                      'div_yid','mom','str','ltr','profit','invest',<br>                      'dd_merton','dd_kmv','illiq','idiosyncratic',<br>                      'hhi','skew']},  # same columns as the first query<br>                chinese_column_name = True,<br>                paginate = True)<\/pre>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"6de8\"><strong>Step 2.&nbsp;Load the monthly returns of 2330 for 2016~2017, one month later than the factor data.<\/strong><\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\">df_pred_label = tejapi.get('TWN\/APRCM',<br>                coid = '2330',<br>                mdate={'gte': '2016-01-01', 'lte':'2017-12-31'},<br>                opts = {'columns':['mdate','roi']},<br>                chinese_column_name = True,<br>                paginate = True)<\/pre>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"0f32\"><strong>Step 3.&nbsp;Calculate the accuracy<\/strong><\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\">pred2 = model.predict(df_pred.iloc[:,2:])<br>df_pred_label['\u5831\u916c\u7387\u9810\u6e2c'] = pred2<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1zlWGOjx2tLPLX9mJhbR52g.png\" alt=\"\"\/><\/figure>\n\n\n\n<pre class=\"wp-block-preformatted\">accuracy = accuracy_score(df_pred_label['\u5831\u916c\u7387\uff05_\u6708'].apply(lambda x: 1 if x&gt;0 else 0), pred2)<br>print(\"Accuracy: %.2f%%\" % (accuracy * 100.0))<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1_kmH4QcWyMfN1cXqNt-Obw.png\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"cc41\"><span class=\"ez-toc-section\" id=\"Visualization\"><\/span><strong>Visualization<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"3404\"><strong>Step 1.&nbsp;Configure Matplotlib to display Chinese characters in charts.<\/strong><\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\">import matplotlib.pyplot as plt<br>import matplotlib.font_manager<br>plt.rcParams['font.sans-serif'] = 'Arial Unicode MS'<br>plt.rcParams['axes.unicode_minus'] = 
False<\/pre>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"3d67\"><strong>Step 2.&nbsp;Visualize a decision tree (partial)<\/strong><\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\">plt.figure(figsize=(30,10))<br>xgb.plot_tree(model,num_trees=0)<br>plt.rcParams['figure.figsize'] = [1300, 1000]<br>plt.show()<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1_1PTLixnJikm3dlENqhpFlLw.png\" alt=\"\"\/><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"4a20\"><strong>Step 3.&nbsp;Visualize feature importance<\/strong><\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\">plt.figure(figsize=(10,40))<br>xgb.plot_importance(model)<br>plt.rcParams['figure.figsize'] = [5, 5]<br>plt.show()<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1_1ELSqni9bh8jAQ3-zQJ8E_A.png\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"e70d\"><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span><strong>Conclusion<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"3655\">The workflow shown today is the common structure used in current machine learning competitions. The mathematics behind the model is quite complicated, but understanding its characteristics and behavior is enough to use it; there is no need to study the underlying calculations in depth unless you are really interested in them. 
As for prediction accuracy, the key often lies in data preprocessing, including the handling of missing values, skewness, and collinearity, or in finding effective features to combine from the related literature.<\/p>\n\n\n\n<p id=\"13b6\">In this article, preprocessing was relatively easy because TEJ has already organized the factor data. Still, it is important to emphasize that finance involves a great deal of uncertainty, so current accuracy does not guarantee future accuracy. These factors are the features that finance research currently uses to explain stock prices, but plenty of other real-world data is available to experiment with, so go ahead and feed your data into XGBoost!<\/p>\n\n\n\n<p id=\"509c\">This article is for reference only and does not constitute an offer, solicitation, invitation, inducement, representation of any kind or form, or any suggestion or recommendation. Readers are advised to exercise their own independent judgment when making investment decisions. The author bears no responsibility for any losses incurred by acting on the content of this article.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Source_code\"><\/span><strong>Source code<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/gist.github.com\/tej87681088\/9069b3b3d7bee363a9121194fe03aa73#file-tejapi_medium-6-ipynb\" class=\"ek-link\" target=\"_blank\" rel=\"noopener\">Click here to go to GitHub<\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"0db9\"><span class=\"ez-toc-section\" id=\"Extended_Reading\"><\/span><strong>Extended Reading<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.tejwin.com\/en\/insight\/xgboost-algorithm-predicts-returns-part-1\/\" class=\"ek-link\">XGBoost Algorithm 
Predicts Returns (Part 1)<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.tejwin.com\/en\/insight\/efficient-frontier\/\" class=\"ek-link\">Efficient Frontier<\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"5358\"><span class=\"ez-toc-section\" id=\"Related_Link\"><\/span><strong>Related Link<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/api.tej.com.tw\/index.html\" rel=\"noreferrer noopener\" target=\"_blank\">TEJ API<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/eshop.tej.com.tw\/E-Shop\/Edata_intro\" rel=\"noreferrer noopener\" target=\"_blank\">TEJ E-Shop<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Use algorithm to learn the investment factors and predict returns. Highlights Preface We talked about how to create a new environment and install XGBoost last time. If you haven\u2019t read it yet, please click this\u00a0link. In this article we will make some preprocessing on data. Then train the model to predict the stock returns and try 
[&hellip;]<\/p>\n","protected":false},"featured_media":16417,"template":"","tags":[2573,2583,2371,3007,2646],"insight-category":[690,50],"class_list":["post-16416","insight","type-insight","status-publish","has-post-thumbnail","hentry","tag-data-science","tag-finance","tag-python","tag-tejapi-data-analysis","tag-xgboost","insight-category-data-analysis","insight-category-fintech"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight\/16416","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight"}],"about":[{"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/types\/insight"}],"version-history":[{"count":1,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight\/16416\/revisions"}],"predecessor-version":[{"id":24855,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight\/16416\/revisions\/24855"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/media\/16417"}],"wp:attachment":[{"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/media?parent=16416"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/tags?post=16416"},{"taxonomy":"insight-category","embeddable":true,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight-category?post=16416"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}