Application of TEJ Investment Database in Quantitative Investment
Jun 15 2022
Since Markowitz proposed Portfolio Theory, Sharpe developed the Capital Asset Pricing Model (CAPM), and Ross developed the Arbitrage Pricing Theory (APT), scholars have gradually found that stock characteristics have some explanatory power for expected returns. This finding has become a cornerstone of quantitative investment analysis. As computers and algorithms have advanced, applying machine learning (ML) and artificial intelligence (AI) to data mining has also produced good results, making quantitative analysis an important technique in the investment community. At the same time, demand for data in the investment market has grown, because quantitative investment research usually needs a large amount of data to support it.
The Taiwan stock market produces a great deal of trading information every day, such as prices, trading volume, and margin trading and securities lending, and company announcements of revenue, earnings, dividend policies, and so on are also important information. Keeping up with all of this on a daily basis is difficult, and data quality is a further issue. Many websites offer free data that researchers can collect with web crawlers, but such data are often incomplete or contain errors, and cleaning and maintaining them every day can be costly. Solving these problems and meeting the needs of quantitative analysts requires a complete, high-quality database, and the TEJ Investment Database is that database.
The TEJ Investment Database has collected a large amount of Taiwan stock information, and its researchers regularly clean and review it to maintain data quality. Its content falls into three categories: market data, financial accounting data, and corporate action event data. Market data include stock prices, trading volume, and chip data (institutional investor and margin trading positions); financial accounting data include company revenue and earnings; and corporate action event data record major decisions by company management. The database as a whole not only covers the Taiwan stock market extensively but is also point-in-time, meaning each data point is stored as it was known on its date, which is necessary for quantitative analysis.
TEJ Investment Database
The TEJ Investment Database consists of three categories of databases: market data, financial accounting data, and corporate action events. Each contains different types of data, described below:
Market Data
This category covers stock prices and trading volume, margin trading and securities lending, and institutional investor trading. It also includes attribute data showing a stock's current listing status and the industry it belongs to, and whether on a given day it was designated an alert stock, suspended from trading, or placed under full-cash delivery. In addition, it includes stocks that were listed in the past but have since been delisted, as well as the daily constituents of various indices and ETFs. Using these data for quantitative analysis helps avoid survivorship bias.
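The daily attribute flags make it possible to rebuild the tradable universe as it stood on any given day. The sketch below is a minimal illustration under assumptions: `DailyAttribute`, its field names, and `tradable_universe` are hypothetical and do not reflect TEJ's actual schema or API, and the sample rows are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class DailyAttribute:
    """Hypothetical one-row-per-stock-per-day attribute record (not TEJ's schema)."""
    stock_id: str
    trade_date: date
    industry: str
    is_alert: bool       # designated an alert stock on this day
    is_suspended: bool   # trading suspended on this day
    is_full_cash: bool   # full-cash delivery stock on this day


def tradable_universe(rows: Iterable[DailyAttribute], on: date, industry: str = "") -> List[str]:
    """Stocks with no warning flag on `on`, optionally limited to one industry."""
    return sorted(
        r.stock_id
        for r in rows
        if r.trade_date == on
        and (not industry or r.industry == industry)
        and not (r.is_alert or r.is_suspended or r.is_full_cash)
    )


d = date(2022, 6, 1)
rows = [
    DailyAttribute("A", d, "Semiconductors", False, False, False),
    DailyAttribute("B", d, "Semiconductors", True, False, False),   # alert stock
    DailyAttribute("C", d, "Shipping", False, False, True),         # full-cash delivery
    DailyAttribute("D", d, "Shipping", False, False, False),
]
print(tradable_universe(rows, d))                    # ['A', 'D']
print(tradable_universe(rows, d, "Semiconductors"))  # ['A']
```

Filtering on the flags for the specific trading day, rather than on a stock's current status, is what keeps the universe consistent with what an investor could actually have traded then.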
Financial Accounting Data
This category includes monthly revenue, financial statements audited by accountants, and the company's unaudited forward-looking statements. Because monthly revenue and the forward-looking statements are announced earlier than the audited reports, they can help investors adjust their investment decisions sooner. Both the audited financial reports and the forward-looking statements come in three forms: single-quarter, cumulative (year-to-date), and trailing four-quarter figures, so analysts can use whichever form they need without tedious data preparation.
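To see what the three ready-made forms save, here is what deriving them by hand involves. This is a minimal sketch assuming "cumulative" means year-to-date within a single fiscal year; the function names are hypothetical and the figures are made up.

```python
from typing import List


def single_quarter_from_ytd(ytd: List[float]) -> List[float]:
    """Convert one fiscal year's cumulative (year-to-date) figures, Q1 onward,
    into single-quarter figures by differencing consecutive cumulative values."""
    return [ytd[0]] + [ytd[i] - ytd[i - 1] for i in range(1, len(ytd))]


def trailing_four_quarters(quarters: List[float]) -> List[float]:
    """Rolling four-quarter sums over chronological single-quarter figures.
    The first result corresponds to the fourth quarter in the input."""
    return [sum(quarters[i - 3 : i + 1]) for i in range(3, len(quarters))]


q_2021 = single_quarter_from_ytd([100, 250, 390, 560])  # [100, 150, 140, 170]
q_2022 = single_quarter_from_ytd([120, 260])            # [120, 140]
print(trailing_four_quarters(q_2021 + q_2022))          # [560, 580, 570]
```

Note that differencing must restart at Q1 of each fiscal year, which is one of the easy-to-miss details that pre-computed single-quarter data removes.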
Corporate Action Events
This category covers management personnel changes, insider shareholding declarations and transfers, M&As, capital changes (including capital increases or reductions and private placements), changes in fixed assets, dividend and treasury stock policies, and important company news. Each event record includes its announcement date and the relevant details, which makes the data well suited to studying announcement effects or to combining with other information.
Across all three categories, the most important feature of the database is that it is point-in-time, which addresses the four issues below.
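Because each event record carries its announcement date, a basic event study reduces to aligning that date with daily returns. The sketch below assumes a simple market-adjusted model (abnormal return = stock return minus market return); the function name and sample returns are hypothetical, and many event studies instead estimate a market model over a pre-event window.

```python
from typing import Sequence, Tuple


def market_adjusted_car(
    stock_returns: Sequence[float],
    market_returns: Sequence[float],
    event_index: int,
    window: Tuple[int, int] = (-1, 1),
) -> float:
    """Cumulative abnormal return (CAR) around an announcement.

    Abnormal return on day t is the stock return minus the market return.
    `event_index` is the position of the announcement date in both series;
    `window` gives inclusive day offsets relative to that date.
    """
    if len(stock_returns) != len(market_returns):
        raise ValueError("return series must be aligned day by day")
    lo, hi = event_index + window[0], event_index + window[1]
    if lo < 0 or hi >= len(stock_returns):
        raise ValueError("event window extends beyond the available data")
    return sum(stock_returns[t] - market_returns[t] for t in range(lo, hi + 1))


stock = [0.01, 0.02, 0.05, -0.01, 0.00]
market = [0.01, 0.01, 0.01, 0.00, 0.00]
print(round(market_adjusted_car(stock, market, event_index=2), 4))  # 0.04
```

Using the announcement date, not the date the underlying decision took effect, as `event_index` is what makes the measured reaction correspond to when the market learned the news.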
Survivorship Bias
When companies that left the market through bankruptcy, delisting, or mergers and acquisitions (or futures contracts that have expired) are dropped from a historical database, researchers typically pull historical data for today's pool of companies, which captures only the survivors and introduces survivorship bias. TEJ provides complete listing and delisting information, allowing users to avoid this bias when building investment strategies.
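The difference between a point-in-time universe and today's company pool can be shown with listing and delisting dates alone. This is a minimal sketch; `Listing`, `universe_on`, and `naive_universe` are hypothetical names, not TEJ fields or functions, and the records are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Listing:
    """Hypothetical listing record; `delisted` is None for stocks still trading."""
    stock_id: str
    listed: date
    delisted: Optional[date] = None


def universe_on(listings: List[Listing], on: date) -> List[str]:
    """Stocks actually listed on `on`, including ones delisted later."""
    return sorted(
        rec.stock_id
        for rec in listings
        if rec.listed <= on and (rec.delisted is None or on < rec.delisted)
    )


def naive_universe(listings: List[Listing]) -> List[str]:
    """Biased approach: today's survivors, reused for every historical date."""
    return sorted(rec.stock_id for rec in listings if rec.delisted is None)


listings = [
    Listing("A", date(2000, 1, 3)),
    Listing("B", date(2005, 7, 1), delisted=date(2015, 4, 30)),  # later delisted
    Listing("C", date(2018, 9, 10)),                             # listed after 2010
]
print(universe_on(listings, date(2010, 1, 4)))  # ['A', 'B']
print(naive_universe(listings))                 # ['A', 'C']
```

A 2010 backtest built from the naive list would miss B, which later failed, and include C, which did not yet exist.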
Availability Bias
Availability bias arises when a test uses data that only became available later, rather than data that could have been collected at the time, which distorts the results. For example, if the financial statements for the same period last year were later restated or revised, the revised figures are future data; a strategy that selects stocks on them cannot reflect the real trading situation. TEJ exclusively provides the financial report data as originally reported, before restatement, and builds them into a financial database for investment to improve strategy accuracy. The database also records each announcement date. Because the market prices news on the day it is announced, the announcement date is essential information for quantitative strategies.
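Keeping pre-restatement figures amounts to an "as of" lookup: on each backtest date, use only the version of a figure that had already been published. The record layout, stock ID, and values below are hypothetical, and TEJ's tables may store versions differently.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ReportVersion:
    """Hypothetical record: one published version of a reported figure."""
    stock_id: str
    period: str        # fiscal period, e.g. "2021Q4"
    value: float       # e.g. EPS
    published: date    # date this version became public


def as_of_value(versions: List[ReportVersion], stock_id: str, period: str, on: date) -> Optional[float]:
    """Latest version of a figure that had been published by `on`, or None."""
    known = [
        v for v in versions
        if v.stock_id == stock_id and v.period == period and v.published <= on
    ]
    return max(known, key=lambda v: v.published).value if known else None


versions = [
    ReportVersion("X", "2021Q4", 5.0, date(2022, 3, 15)),  # originally reported
    ReportVersion("X", "2021Q4", 4.6, date(2023, 3, 14)),  # later restated
]
print(as_of_value(versions, "X", "2021Q4", date(2022, 1, 10)))  # None
print(as_of_value(versions, "X", "2021Q4", date(2022, 6, 1)))   # 5.0
print(as_of_value(versions, "X", "2021Q4", date(2023, 6, 1)))   # 4.6
```

A database that kept only the restated 4.6 would silently feed that value to a mid-2022 backtest date, before anyone could have known it.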
Look-Ahead Bias
Look-ahead bias occurs when the end date of a financial reporting period is mistakenly treated as the date the information became available. For example, an annual report covers the year ending December 31, but it is not announced until the end of March of the following year; treating December 31 as the availability date introduces look-ahead bias. Besides the reporting period, the TEJ database records the announcement date of each financial report, so researchers can measure the stock price reaction from the date the information actually became public and avoid misjudging a strategy.
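The example above can be made concrete by asking which reporting period a backtest may treat as known on a given date. This is a minimal sketch; `latest_period` is a hypothetical helper and the announcement dates are illustrative.

```python
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical (period_end, announcement_date) pairs for one company
Report = Tuple[date, date]


def latest_period(reports: List[Report], on: date, by_announcement: bool = True) -> Optional[date]:
    """Most recent fiscal period treated as known on `on`.

    by_announcement=True  -> a report counts only once it has been announced.
    by_announcement=False -> the period-end date is (wrongly) treated as the
                             availability date, which introduces look-ahead bias.
    """
    pick = 1 if by_announcement else 0
    eligible = [r for r in reports if r[pick] <= on]
    return max(r[0] for r in eligible) if eligible else None


reports = [
    (date(2021, 9, 30), date(2021, 11, 14)),   # Q3 report
    (date(2021, 12, 31), date(2022, 3, 31)),   # annual report
]
on = date(2022, 2, 1)
print(latest_period(reports, on))                         # 2021-09-30
print(latest_period(reports, on, by_announcement=False))  # 2021-12-31 (look-ahead)
```

On February 1, 2022, only the Q3 figures were public; the period-end rule would wrongly let the strategy use annual figures two months early.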
Historical Stock Price Adjustment
In stock analysis, whether a price series spans dates on which the company distributed dividends or increased or reduced its capital greatly affects the computed returns. To avoid spurious price moves on ex-dividend dates and to compare current and past prices on the same basis, backtests should use TEJ's adjusted stock prices.
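One common way to build an adjusted series is multiplicative back-adjustment. The sketch below handles cash dividends only and illustrates the general technique, not TEJ's own formula; the adjustment described above also covers stock dividends and capital increases or reductions. `back_adjust` is a hypothetical name and the prices are made up.

```python
from typing import Dict, List


def back_adjust(closes: List[float], cash_dividends: Dict[int, float]) -> List[float]:
    """Backward-adjust closing prices for cash dividends.

    `cash_dividends` maps the index of each ex-dividend day to the cash
    dividend per share. Every price before an ex-dividend day is multiplied
    by (prev_close - dividend) / prev_close, so the mechanical drop on the
    ex-date no longer appears as a loss, while the latest prices are unchanged.
    """
    adjusted = list(closes)
    for ex_idx, dividend in cash_dividends.items():
        if not 0 < ex_idx < len(closes):
            raise ValueError("ex-dividend index must have a previous trading day")
        prev_close = closes[ex_idx - 1]
        ratio = (prev_close - dividend) / prev_close
        for i in range(ex_idx):
            adjusted[i] *= ratio
    return adjusted


closes = [100.0, 102.0, 98.0, 99.0]   # ex-dividend on day 2, 3.0 cash dividend
adj = back_adjust(closes, {2: 3.0})
raw_ret = closes[2] / closes[1] - 1   # about -3.9%: looks like a sharp loss
adj_ret = adj[2] / adj[1] - 1         # about -1.0%: dividend drop removed
print(f"{raw_ret:.2%} {adj_ret:.2%}")
```

Scaling past prices rather than recent ones keeps the latest adjusted price equal to the actual market price, which is convenient when a backtest feeds into live trading.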
With these four key features, the TEJ database gives researchers direct access to cleaned data, greatly reducing the time spent on data processing.
Figure 1. Point-in-Time Data Features
As a Chinese saying goes, “if you want to do good work, you must first sharpen your tools.” Quantitative investment analysis requires extensive data support, and researchers must use point-in-time data to handle the key differences between data for quantitative research and general-purpose data, improving the accuracy of their investment strategies.