What Is Look-ahead Bias?

Highlight

  • Difficulty: ★★★☆☆
  • Automated trading with Bollinger Bands
  • Demonstrate how look-ahead bias affects trading results

Preface

Look-ahead bias is the unintentional use of data that was unavailable or not yet public at the time when analyzing or simulating historical events.
It arises when a decision or evaluation relies on information that was unknown at the moment the decision would have been made.
Look-ahead bias distorts results and makes them misleading because it violates the principle of using only the information available at the time of analysis. It can appear in any field, including finance, economics, and data analysis, and it affects investment strategies, trading-system backtests, and performance evaluation.

Today's practice demonstrates a typical look-ahead bias scenario: backtesting a trading strategy on historical data that contains information about future price movements, even though that information would not have been known during the test period. This artificially inflates the measured performance and creates unrealistic expectations of the strategy's profitability.

Programming Environment and Required Modules

This article uses macOS with Jupyter Notebook as the editor.

import pandas as pd 
import re
import numpy as np 
import tejapi
from functools import reduce
import matplotlib.pyplot as plt
from collections import defaultdict, OrderedDict
from tqdm import trange, tqdm
import plotly.express as px
import plotly.graph_objects as go

tejapi.ApiConfig.api_key = "Your api key"
tejapi.ApiConfig.ignoretz = True

Database

  • Listed (OTC) adjusted stock price (day) — average price
  • unadjusted (day) technical indicator

Import data

We take YangMing Marine Transport Corporation (2609) as an example for the period from 2021-06-01 to 2022-12-31. We use the unadjusted closing price, BB-Upper(20), and BB-Lower(20) to construct the Bollinger Bands, and then compare the strategy's return with the Market Return Index (Y9997).

After acquiring the stock price and technical indicator data, we use plotly.express to visualize the Bollinger Bands, as in the previous article. In the chart, bbu20 is the upper band, bbl20 is the lower band, and close_d is the closing price.
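For readers without access to the TEJ indicator table, the bands can be approximated from closing prices alone. The sketch below is an illustration, not the article's data pipeline: it uses synthetic prices, and it assumes the conventional definition of a 20-day moving average plus or minus 2 sample standard deviations, which may differ from how TEJ computes BB-Upper(20) and BB-Lower(20).

```python
import numpy as np
import pandas as pd

# Synthetic closing prices stand in for the real TEJ data (assumption).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2021-06-01", periods=60)
close = pd.Series(100 + rng.normal(0, 1, len(dates)).cumsum(), index=dates)

# Conventional parameters; assumed, not confirmed by the article.
window, k = 20, 2

data = pd.DataFrame({"close_d": close})
mid = data["close_d"].rolling(window).mean()   # 20-day moving average
std = data["close_d"].rolling(window).std()    # 20-day sample standard deviation
data["bbu20"] = mid + k * std                  # upper band
data["bbl20"] = mid - k * std                  # lower band

# The first window - 1 rows have no band values yet.
print(data.dropna().head())
```

Each band value on a given day depends only on closes up to and including that day, so the bands themselves do not look ahead. The bias discussed in this article enters later, through the price at which a trade is assumed to execute.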

Figure: YangMing Marine Transport Corporation (2609) Bollinger Band

Next, we will implement two Bollinger Band trading strategies and compare their differences.

  1. As in the previous article: when the closing price touches the upper band, we sell our holdings at the next day's opening price; when the closing price touches the lower band, we buy one unit at the next day's opening price. If the buy condition is met while we already hold a position, still have enough principal, and the closing price is lower than at the previous purchase, we buy one more unit.
  2. When the closing price touches the upper band, we sell our holdings at the same day's closing price; when the closing price touches the lower band, we buy one unit at the same day's closing price. If the buy condition is met while we already hold a position, still have enough principal, and the closing price is lower than at the previous purchase, we buy one more unit.

Because the only difference between the two strategies is the transaction price, we adapt the strategy code from the previous article. We define the strategy in def bollingeband_strategy, add an if condition, and introduce a parameter, mode, to select which strategy to execute. When mode is True, strategy 1 runs; when mode is False, strategy 2 runs.
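The effect of the mode switch can be illustrated with a much-reduced sketch. This is not the article's bollingeband_strategy: the helper name execution_prices and the column open_d are assumptions made for illustration, and position sizing, cash checks, and the add-on rule are left out. It only shows which price each signal would be filled at.

```python
import pandas as pd

def execution_prices(data: pd.DataFrame, mode: bool) -> pd.DataFrame:
    """Return one row per signal with the price the order is assumed to fill at.

    Hypothetical helper; columns open_d, close_d, bbu20, bbl20 are assumed.
    """
    buy = data["close_d"] <= data["bbl20"]    # close touches the lower band
    sell = data["close_d"] >= data["bbu20"]   # close touches the upper band
    signal = pd.Series("hold", index=data.index).mask(buy, "buy").mask(sell, "sell")

    if mode:
        # Strategy 1: fill at the next trading day's open.
        price = data["open_d"].shift(-1)
    else:
        # Strategy 2: fill at the same close that produced the signal.
        price = data["close_d"]

    out = pd.DataFrame({"signal": signal, "price": price})
    # Drop non-signals and signals on the last day (no next open available).
    return out[(out["signal"] != "hold") & out["price"].notna()]

# Toy data, invented for illustration only.
data = pd.DataFrame(
    {
        "open_d":  [100.0, 101.0, 95.0, 97.0, 108.0],
        "close_d": [101.0, 94.0, 96.0, 109.0, 107.0],
        "bbu20":   [108.0, 108.0, 108.0, 108.0, 108.0],
        "bbl20":   [95.0, 95.0, 95.0, 95.0, 95.0],
    },
    index=pd.date_range("2022-01-03", periods=5),
)

next_open = execution_prices(data, True)
same_close = execution_prices(data, False)
print(next_open)
print(same_close)
```

On this toy data, strategy 2 buys at 94 and sells at 109, while strategy 1 buys at 95 and sells at 108. The signals are identical; only the assumed fill price changes.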

This gives us an automated trading strategy. def bollingeband_strategy returns a transaction table that records the details of each trade. We also define def simplify to summarize that information for better readability.

The final step is to calculate the performance of both strategies. The code is largely the same as in the previous article. However, recent versions of pandas no longer support the append function, so we made a slight modification to keep the code working and wrapped it in def back_test.
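DataFrame.append was removed in pandas 2.0, and pd.concat is the usual replacement. The snippet below is a generic sketch of that migration, not the article's back_test code; the column names and values are made up for illustration.

```python
import pandas as pd

# Collect plain dicts first; this is cheaper than growing a DataFrame in a loop.
rows = []
for date, value in [("2022-01-03", 500000.0), ("2022-01-04", 501200.0)]:
    rows.append({"mdate": pd.Timestamp(date), "value": value})  # list.append, not DataFrame.append

# Old style, which no longer works in pandas 2.x:
#     history = history.append(row, ignore_index=True)
history = pd.DataFrame(rows)

# Adding rows to an existing frame: pd.concat replaces DataFrame.append.
extra = pd.DataFrame([{"mdate": pd.Timestamp("2022-01-05"), "value": 499800.0}])
history = pd.concat([history, extra], ignore_index=True)

print(history)
```

Building the frame once from a list of rows is usually preferable to concatenating inside a loop, because each concatenation copies the data.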


With the coding complete, we can now compare the performance of both strategies on real data.

Trade at tomorrow's opening price

principal = 500000
cash = principal
position = 0
order_unit = 0

trade_book = bollingeband_strategy(data, principal, cash, position, order_unit, True)
trade_book_ = simplify(trade_book)
back_test(principal, trade_book_, data, market)
Figure: Trade at tomorrow's opening price

Trade at today's closing price

principal = 500000
cash = principal
position = 0
order_unit = 0

trade_book_cu_close = bollingeband_strategy(data, principal, cash, position, order_unit, False)
trade_book_cu_close_ = simplify(trade_book_cu_close)
back_test(principal, trade_book_cu_close_, data, market)
Figure: Trade at today's closing price

Comparing the results of the two strategies, we can see that trading at the same day's closing price yields better overall performance. However, novice market participants may backtest with the closing price as the trading price, overlooking the fact that the closing price cannot be known in advance in the actual market. Backtesting with information that is unknown at the time of the trade constitutes look-ahead bias and distorts the backtest results. Therefore, it is advisable to use the next day's opening price as the trading price, which reflects real trading conditions more closely.

Conclusion

Through this simple trading backtest, we have demonstrated the presence of look-ahead bias in backtesting. The bias is not limited to trading; it is common throughout finance. To avoid it, historical analysis and decision-making processes must be based solely on the information available at that time. This requires using historical data in a manner consistent with what was known in the past, excluding any information that only became available later. Being aware of look-ahead bias and handling data with caution is essential for maintaining the integrity and accuracy of statistical analysis and decision-making.
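One simple way to put this principle into practice is to audit a trade log for fills that happen no later than the signal that triggered them. The sketch below assumes a hypothetical trade table with signal_date and fill_date columns, which the article's transaction table may not have in this form, and it follows the article's premise that a signal computed from a closing price cannot be filled at that same close.

```python
import pandas as pd

def find_lookahead_trades(trades: pd.DataFrame) -> pd.DataFrame:
    """Return trades whose fill date is not strictly after the signal date.

    Hypothetical helper; the columns signal_date and fill_date are assumed.
    """
    return trades[trades["fill_date"] <= trades["signal_date"]]

# Toy trade log, invented for illustration only.
trades = pd.DataFrame(
    {
        "signal_date": pd.to_datetime(["2022-01-04", "2022-01-06"]),
        "fill_date": pd.to_datetime(["2022-01-05", "2022-01-06"]),
    }
)

suspect = find_lookahead_trades(trades)
print(suspect)  # rows that deserve a closer look
```

A check like this does not prove a backtest is free of look-ahead bias, since leaked information can also enter through indicators or adjusted prices, but it catches the same-day fill pattern shown in this article.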

Last but not least, please note that the stocks mentioned in this article are for discussion only and should not be considered recommendations or suggestions for investment or products. If you are interested in topics such as creating trading strategies, performance backtesting, and evidence-based research, you are welcome to purchase the plans offered in TEJ Quantitative Solution and use its comprehensive database to create your optimal trading strategy.

