TFDF, not TF-IDF in Financial Analysis

Document Type: Research Article


Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran.


Textual analysis in business relies on text-processing techniques borrowed mainly from information retrieval. Yet these techniques are not well suited to text-based financial forecasting. In this paper, we argue for text-processing techniques developed specifically for finance, particularly for word scoring, where standard techniques are not appropriate. We pursue two issues. First, we examine the major information retrieval heuristics and find TF-IDF too simplistic, both in predicting trends and in producing accurate results (in terms of error) on large numbers in text-based financial analysis. Second, we develop a new heuristic that addresses financial concerns by considering the relationship between the publication rate of information and its importance. The proposed heuristic outperforms TF-IDF in both trend prediction and precision measures. In an additional analysis, we optimize our scheme with a genetic algorithm and obtain greater precision. Compared with TF-IDF, our proposed heuristic yields a 38.5 percent lower error in closeness measures, which the genetic algorithm reduces by a further 16.46 percent. Our findings suggest that researchers in financial textual analysis should not rely on standard information retrieval heuristics.


