# XGBoost Algorithm Predicts Returns (Part 1)

Use the XGBoost algorithm to learn from investment factors and predict returns.

## Highlights

• Difficulty: ★★★☆☆
• Setting Up a Virtual Environment
• XGBoost Introduction and Installation

## Preface

In recent years, many algorithms and mathematical models have been developed to solve problems. The classic model is regression. As technology has advanced, algorithms that learn and improve on their own (machine learning) have been developed, and today the most popular type is the neural network model (deep learning).

This article introduces the tree-based model XGBoost and is divided into two parts. Part 1 covers setting up the environment and installing the modules. Part 2 covers data preprocessing, training, prediction, and visualization.

## XGBoost Introduction

First, let’s introduce the popular algorithm XGBoost. Boosting is a technique that combines many weak learners into a single, more powerful learner, which yields more accurate final predictions.

XGBoost (Extreme Gradient Boosting) is an implementation of gradient boosted decision trees (GBDT). Each learning step is based on the errors of the previous steps: the existing model is kept, and a new function (a tree) is added to correct the remaining error, so the final model is an ensemble of many weak learners. XGBoost is mainly applied to supervised learning and can handle both classification and regression problems.
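The idea of fitting each new learner to the previous errors can be sketched in a few lines of plain Python. This is a minimal illustration, assuming squared-error loss and one-feature decision stumps as the weak learners; it is not XGBoost's actual implementation, which also uses second-order gradient information and regularization. All function names (`fit_stump`, `boost`, `predict`, `mse`) and the toy data are hypothetical.

```python
# Illustrative gradient boosting for regression (squared error, 1-D feature).
# Assumes the feature has at least two distinct values.

def fit_stump(xs, residuals):
    """Find the threshold split that minimizes squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the current errors."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # errors of the model so far
        stump = fit_stump(xs, residuals)
        stumps.append(stump)  # earlier learners are kept; the new one is a correction
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate

def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

def mse(predictions, ys):
    return sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(ys)

# Toy data (made up for illustration): low values for x <= 4, high values after.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

baseline_mse = mse([sum(ys) / len(ys)] * len(ys), ys)  # predicting the mean only
model = boost(xs, ys)
boosted_mse = mse([predict(model, x) for x in xs], ys)
print("baseline MSE:", baseline_mse)
print("boosted MSE:", boosted_mse)
```

Each round shrinks the training error a little, which is why many weak learners together can outperform any single one of them.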

## The Editing Environment and Modules Required

macOS and Jupyter Notebook

## Virtual Environment

Because XGBoost depends on many modules, inconsistent versions can cause endless errors. We therefore create a new environment in which to install these modules. There are many ways to do this; this tutorial uses a relatively simple and easy-to-understand approach to minimize errors.

#### Step 1. Install Anaconda

Anaconda is a convenient all-in-one distribution for beginners. It addresses the installation difficulties caused by differences between systems. It bundles more than 1,000 installable packages for Windows, Linux, and macOS, and it includes a virtual environment manager, which makes setting up and running a machine learning environment simple and fast.

#### Step 2. Open a terminal

On Windows, open Anaconda Prompt instead.

Enter the following command to create a new environment:

`conda create -n new_env_name python==3.8`

Conda will ask whether to proceed with the installation. Type `y` and press `Enter`. Replace `new_env_name` with any name you like; in this article the new environment is named `test`.

`conda env list`

This command lists all the environments we have created.

#### Step 3. Activate environment

`conda activate new_env_name`

At this point, the `(base)` prefix at the start of the terminal prompt changes to the environment name, `(test)` in our case, which means the environment was activated successfully. If a later installation step fails and you need to start over, remove the environment with the command below.

`conda env remove -n new_env_name`
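To confirm from inside Python (for example, in a Jupyter cell) which environment the interpreter belongs to, a small check like the following can help. It is a sketch, assuming that `conda activate` has set the `CONDA_DEFAULT_ENV` environment variable; the function name `describe_environment` is hypothetical. A Jupyter kernel may have been launched from a different environment than your terminal, so `sys.prefix` is the more reliable indicator.

```python
import os
import sys

def describe_environment():
    """Return the active conda environment name (if any) and interpreter details."""
    return {
        # Set by `conda activate`; None when no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Directory of the running interpreter's environment.
        "prefix": sys.prefix,
        "python_version": "{}.{}".format(*sys.version_info[:2]),
    }

info = describe_environment()
print(info)
```

If the environment was created and activated as described, `prefix` should end with the environment's name.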

## Install XGBoost

#### Step 1. Activate environment

`conda activate new_env_name`

#### Step 2. Enter command

`conda install py-xgboost`

Conda will again ask whether to install these modules. Type `y` and press `Enter` to start the installation; once it finishes, XGBoost is installed. Simple, isn't it?

## Install XGBoost visualization module graphviz

#### Step 1. Install Homebrew (in our new environment)

Homebrew can be thought of as an installer, much like `pip` for Python modules. It is the most widely used package manager on macOS.

`/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"`

Enter this command in the terminal to install Homebrew.

#### Step 2. Install Graphviz

`brew install graphviz`

These are the main modules used in this article. However, the new environment with XGBoost does not include some other modules we need (pandas, matplotlib, tejapi), so we install them separately. Module names in the command are separated by spaces.

`pip install pandas matplotlib tejapi`

## Final Result

Finally, check in Jupyter whether the installation was successful.
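One way to run this check is the short script below. It is a sketch rather than an official verification procedure: it only tests whether each module can be located by Python and whether Graphviz's `dot` executable is on the `PATH`. The function name `check_installation` and the `REQUIRED_MODULES` list are hypothetical names chosen for this example.

```python
import importlib.util
import shutil

# Modules installed in the steps above.
REQUIRED_MODULES = ["xgboost", "pandas", "matplotlib", "tejapi"]

def check_installation(modules=REQUIRED_MODULES):
    """Report which modules are importable and whether Graphviz's `dot` is on PATH."""
    status = {name: importlib.util.find_spec(name) is not None for name in modules}
    # `brew install graphviz` provides the `dot` command-line tool.
    status["dot (Graphviz)"] = shutil.which("dot") is not None
    return status

for name, ok in check_installation().items():
    print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Any entry reported as `MISSING` points to the installation step that should be repeated inside the activated environment.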

## Database

This article uses the TWN/AFF_RAW database, which provides trading factors for the algorithm to learn from. The database is built with reference to Kenneth R. French and the top three finance journals (JF, RFS, JFE). The indicators are calculated from Taiwan market data, and all indicators are compiled at a monthly frequency.

`df = tejapi.get('TWN/AFF_RAW', coid='9921', mdate={'gte': '2015-01-01', 'lte': '2020-12-31'}, chinese_column_name=True, paginate=True)`
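The call above can be wrapped in a small helper so that the stock code and date range are easy to change. This is a sketch under assumptions not stated in the article: the helper names `build_query` and `fetch_factors` are hypothetical, and it assumes that a TEJ API key must be set through `tejapi.ApiConfig.api_key` before querying.

```python
def build_query(coid, start, end):
    """Assemble the keyword arguments used in the article's tejapi.get call."""
    return {
        "coid": coid,                          # company (stock) code
        "mdate": {"gte": start, "lte": end},   # inclusive date range
        "chinese_column_name": True,
        "paginate": True,
    }

def fetch_factors(api_key, coid="9921", start="2015-01-01", end="2020-12-31"):
    """Download factor data from TWN/AFF_RAW (requires tejapi and network access)."""
    import tejapi  # imported here so build_query works without the package installed
    tejapi.ApiConfig.api_key = api_key  # assumption: a valid TEJ API key is needed
    return tejapi.get("TWN/AFF_RAW", **build_query(coid, start, end))

query = build_query("9921", "2015-01-01", "2020-12-31")
print(query)
```

With a valid key, `df = fetch_factors("YOUR_API_KEY")` would issue the same query as the one-line call shown above.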

## Conclusion

Part 1 of this article covers module installation. Most people run into installation problems when they first start programming, and setting up the environment is a programmer's first lesson. Once everything is installed successfully, Part 2 will start using the database: we will process the data, feed it to the model, and predict returns as a reference for our investments.