Predicting prices in financial markets has long been a popular topic in machine learning, and numerous studies have tried to use it to earn excess returns. However, because the financial market is essentially a collection of human behaviors with many irregular and uncertain factors, conventional machine learning models such as logistic regression, random forests, and extreme gradient boosting seem unable to capture its complex dynamics effectively. Consequently, with the rapid development of deep learning, more time series models are being applied to stock price prediction. This article uses an LSTM time series model to predict the next day’s closing price from the opening, high, low, and closing prices of the past five days, quarterly ROE, MOM (momentum, which indicates the magnitude and direction of the price trend), and RSI.
This article uses macOS as the operating system and VS Code as the editor.
Large-cap stocks in Taiwan are heavily influenced by large investors and unpredictable market fluctuations, which makes their price movements challenging to predict. This article therefore selects the seventh-largest component stock (2618, Eva Airways Corp.) of the Taiwan Small and Medium Cap 300 Index (referred to as the “Small and Medium Cap 300 Index”) for Q2 2024 as the target stock. We also selected a higher market cap stock (8215, BenQ Materials Corp.) for backtesting as a reference.
import os
import time
import tejapi
import numpy as np
import pandas as pd
...
ML_stock() is a custom class we created for preprocessing data. It handles loading the API_KEY, price-volume data, fundamental data, and technical indicators. Finally, it sets the start and end dates for the model’s sample period.
*Note: For the code to run, please enter your API_KEY in the config.ini file before using it.
ml_stock = ML_stock()
ml_stock.ini()
start = '2012-07-01'
end = '2022-07-01'
We have retained only the necessary features for the next steps.
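The article does not show how ML_stock computes its technical indicators, so the following is only a minimal sketch of the two indicators named in the introduction. The function names, the 10-day MOM period, the 14-day RSI period, and the use of simple rolling averages for RSI are all assumptions made for illustration.

```python
import pandas as pd


def momentum(close: pd.Series, period: int = 10) -> pd.Series:
    """MOM: today's close minus the close `period` trading days earlier.

    The 10-day default is an assumption, not the article's setting.
    """
    return close.diff(period)


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI from simple rolling averages of gains and losses.

    The 14-day default and the simple-average variant are assumptions.
    """
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(period).mean()      # average up move
    loss = (-delta).clip(lower=0).rolling(period).mean()   # average down move
    rs = gain / loss                                        # inf when there are no losses
    return 100 - 100 / (1 + rs)


# Toy closing prices, not real market data
rising = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
zigzag = pd.Series([10.0, 11.0, 10.0, 11.0, 10.0])

mom_rising = momentum(rising, period=2)
rsi_rising = rsi(rising, period=3)
rsi_zigzag = rsi(zigzag, period=4)
```

A steadily rising series pushes RSI to its upper bound, while a series with equal gains and losses sits in the middle of the 0 to 100 range.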
First, we standardize all data and set window_size to 5. This means each data point consists of five consecutive days of data, and its target is the closing price of the day right after that window. Data is iterated through a sliding window approach that advances one day at a time, so each data point overlaps the previous data point by four days.
From the above diagram, we can see that we built the model inputs as a three-dimensional matrix of independent variables (such as open, high, low, and close prices) and paired each sample with the dependent variable (the next day’s closing price). The dimensions of the matrix, from left to right, represent the number of data points, the number of days, and the number of features. After this, we split the data into training, validation, and test sets using an 8:1:1 ratio.
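The windowing and splitting steps can be sketched as follows. This is an illustration under stated assumptions rather than the article's actual code: make_windows and chronological_split are hypothetical helper names, the data is random toy data, each sample is assumed to span window_size = 5 consecutive days with the next day's close as the target, and the split is assumed to be chronological.

```python
import numpy as np


def make_windows(features: np.ndarray, close: np.ndarray, window_size: int = 5):
    """Return X with shape (samples, days, features) and next-day close targets y."""
    X, y = [], []
    for start in range(len(features) - window_size):
        X.append(features[start:start + window_size])  # window_size consecutive days
        y.append(close[start + window_size])            # the day right after the window
    return np.array(X), np.array(y)


def chronological_split(X: np.ndarray, y: np.ndarray):
    """Split into train/validation/test at 8:1:1 without shuffling (assumed)."""
    n = len(X)
    train_end = n * 8 // 10
    val_end = n * 9 // 10
    return ((X[:train_end], y[:train_end]),
            (X[train_end:val_end], y[train_end:val_end]),
            (X[val_end:], y[val_end:]))


# Toy data: 105 days, 4 features (standing in for open, high, low, close)
rng = np.random.default_rng(0)
raw = rng.normal(size=(105, 4))

# Z-score each column; statistics here come from the full toy sample
scaled = (raw - raw.mean(axis=0)) / raw.std(axis=0)

X, y = make_windows(scaled, scaled[:, 3], window_size=5)
train, val, test = chronological_split(X, y)
print(X.shape, train[0].shape, val[0].shape, test[0].shape)
```

Because the window advances one day at a time, the last four days of one sample are the first four days of the next.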
This study uses one LSTM layer and three Dense layers for modeling, with Dropout layers interspersed to prevent overfitting. The final layer is a Dense layer with a single neuron that outputs the predicted value. We also define an exponentially decaying learning rate that starts at 0.001 and is reduced to 90% of its previous value every 10,000 steps in a stepwise (staircase) fashion.
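To make the stepwise decay concrete, the learning rate at a given training step is 0.001 × 0.9^floor(step / 10,000). The plain-Python function below reproduces that arithmetic; staircase_lr is a hypothetical name used only for this illustration and is not part of Keras.

```python
def staircase_lr(step: int,
                 initial_lr: float = 0.001,
                 decay_steps: int = 10_000,
                 decay_rate: float = 0.9) -> float:
    """Learning rate after `step` optimizer steps under staircase exponential decay."""
    # Integer division keeps the rate constant within each 10,000-step block
    return initial_lr * decay_rate ** (step // decay_steps)


for step in (0, 9_999, 10_000, 25_000):
    print(step, staircase_lr(step))
```

The rate stays at 0.001 through step 9,999, drops once at step 10,000, and has dropped twice by step 25,000.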
We use the Adam Optimizer with the previously defined learning rate settings. The loss function is Mean Squared Error (MSE), and the evaluation metric is Mean Absolute Error (MAE).
Finally, we set up an Early Stopping mechanism that monitors val_loss. If there is no improvement over 10 epochs, the training will stop to prevent overfitting.
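# One LSTM layer followed by three Dense layers, with Dropout in between
model = Sequential([
    layers.Input((X_train.shape[1], X_train.shape[2])),  # (days, features)
    layers.LSTM(64),
    layers.Dense(32, activation='relu'),
    Dropout(0.2),
    layers.Dense(32, activation='relu'),
    Dropout(0.2),
    layers.Dense(1)  # single neuron: predicted next-day closing price
])
# Staircase exponential decay: multiply the rate by 0.9 every 10,000 steps
lr_schedule = ExponentialDecay(
    0.001,
    decay_steps=10000,
    decay_rate=0.9,
    staircase=True)
model.compile(optimizer=Adam(learning_rate=lr_schedule), loss='mse', metrics=['mae'])
# Stop after 10 epochs without val_loss improvement and restore the best weights
early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)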
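The stopping rule configured above can be illustrated with a simplified pure-Python sketch. It is not Keras's implementation; early_stopping_epoch is a hypothetical helper, and the validation losses are made-up numbers chosen only to show how patience works.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once `patience` consecutive epochs fail to beat the best
    validation loss seen so far; the best epoch is the one whose weights
    would be restored.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs before patience was exhausted
    return len(val_losses) - 1, best_epoch


# Made-up losses: improvement for three epochs, then a plateau
losses = [1.0, 0.8, 0.7] + [0.75] * 12
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=10)
print(stop_epoch, best_epoch)
```

In this toy run the best loss occurs at epoch 2, and training halts ten non-improving epochs later.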
After about 25 epochs, the training loss largely plateaus, indicating that the model converges quickly.
The LSTM predictions on the training and validation sets are quite good, as expected within the sample. However, there is some discrepancy between the predicted and actual prices in the latter part of the out-of-sample test set. While the model captures the overall trend direction, further backtesting is needed to verify its accuracy.
We applied the same method to 8215 (BenQ Materials) and plotted the comparison graph. On the out-of-sample data, the model predicts the next day’s closing price for 8215 more accurately. However, the actual effectiveness will be validated in the next article.
Among all features, daily price data remains the model’s primary reference, followed by the MOM and RSI indicators. Interestingly, quarterly ROE is not favored by the model, likely because it is updated far less frequently than the other features. For a model that predicts daily stock prices, quarterly ROE is simply less relevant.
The observation made with 2618 is also evident in 8215: the model does not effectively utilize quarterly ROE.
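The article does not state how feature importance was measured. One common, model-agnostic option is permutation importance, sketched below as an assumption rather than the authors' method: shuffle one feature across samples and record how much the MAE worsens. The function name and the toy predictor standing in for model.predict are hypothetical.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MAE when each feature is shuffled across samples.

    X has shape (samples, days, features); `predict` maps X to one value per sample.
    """
    rng = np.random.default_rng(seed)
    baseline = np.mean(np.abs(np.asarray(predict(X)).reshape(-1) - y))
    importances = np.zeros(X.shape[2])
    for f in range(X.shape[2]):
        scores = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            perm = rng.permutation(X.shape[0])
            X_perm[:, :, f] = X[perm, :, f]  # break the link between feature f and y
            pred = np.asarray(predict(X_perm)).reshape(-1)
            scores.append(np.mean(np.abs(pred - y)))
        importances[f] = np.mean(scores) - baseline
    return importances


def toy_predict(X):
    """Toy stand-in for model.predict: uses only feature 0 on the last day."""
    return X[:, -1, 0]


rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 5, 3))
y_demo = toy_predict(X_demo)
scores = permutation_importance(toy_predict, X_demo, y_demo)
print(scores)
```

A feature the model ignores, as quarterly ROE appears to be here, shows an importance near zero because shuffling it leaves the predictions unchanged.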
model.save(f'lstm_{sample[0]}.keras', include_optimizer=False)
Finally, we save the model as a .keras file to facilitate the next backtesting phase.
Based on the charts, the LSTM model appears to perform well in price forecasting. However, past research on time series models often shows a certain degree of delay between the model’s predictions and the actual data. The charts above suggest that a price movement on one day may only be reflected in the prediction on the following day. Although the differences are small, this delay could potentially cause issues with timely order execution in backtesting. The next article will provide further analysis.
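One simple way to check for such a delay is to compare the predictions against the actual series shifted by a few days and see which shift fits best. The sketch below is an illustration on synthetic data, not a result from the article; mae_by_lag is a hypothetical helper, and the "echo" forecaster is constructed deliberately to repeat the previous day's price.

```python
import numpy as np


def mae_by_lag(actual, predicted, max_lag=3):
    """MAE between predicted[t] and actual[t - lag] for lag = 0..max_lag."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    result = {}
    for lag in range(max_lag + 1):
        past_actual = actual[:len(actual) - lag]  # actual[t - lag]
        aligned_pred = predicted[lag:]            # predicted[t]
        result[lag] = float(np.mean(np.abs(aligned_pred - past_actual)))
    return result


# Synthetic random-walk prices, not market data
rng = np.random.default_rng(42)
actual = 100 + np.cumsum(rng.normal(size=250))

# A forecaster that simply repeats yesterday's price
echo = np.concatenate([[actual[0]], actual[:-1]])

lags = mae_by_lag(actual, echo)
best_lag = min(lags, key=lags.get)
print(lags, best_lag)
```

If the best-fitting lag is 1 rather than 0, the predictions track the previous day's price more closely than the day they are meant to forecast, which is the delay described above.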
Note: This analysis is for reference only and does not constitute any product or investment advice.