Abstract
We study investor sentiment toward a non-classical asset class, cryptocurrencies, using machine learning methods. We account for context-specific information and word similarity with efficient language-modeling tools: featurized word representations (embeddings) and recurrent neural networks. We apply these tools to sentence-level sentiment classification and sentiment index construction. The analysis is performed on a novel dataset of 1220K messages related to 425 cryptocurrencies posted on the microblogging platform StockTwits between March 2013 and May 2018. Both in- and out-of-sample predictive regressions are run to test the significance of the constructed sentiment index variables. We find that the constructed sentiment indices are informative for predicting returns and volatility of the cryptocurrency market index.
Notes
This list can be found at https://api.stocktwits.com/symbol-sync/symbols.csv.
Reddit is a generic message board rather than one dedicated only to financial markets; it therefore covers a wider range of topics related to cryptocurrencies, including discussions of cryptocurrency technology such as the blockchain.
Appendix
1.1 Backpropagation through time (BPTT) for LSTM
We predict the probability of a bullish message \(y_T\) as \(\widehat{y}_T = \sigma (W_h h_T + b_h)\), where \(h_T = o_T \odot \tanh (c_T)\), according to the above. A suitable loss function is the binary cross-entropy loss over samples n:
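Written out explicitly (a standard form, reconstructed here since the equation itself is not reproduced in this excerpt), the loss over samples n is

\[
E = \sum_{n} E_T^{(n)}, \qquad E_T^{(n)} = -\Big[\, y_T^{(n)} \log \widehat{y}_T^{(n)} + \big(1 - y_T^{(n)}\big)\log\big(1 - \widehat{y}_T^{(n)}\big) \Big],
\]

which, combined with \(\widehat{y}_T = \sigma (W_h h_T + b_h)\), yields the gradient \(\partial E_T^{(n)} / \partial b_h = \widehat{y}_T - y_T\) used in the backward pass below.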
In the current case we deal with the many-to-one LSTM architecture: we predict a single value \(\widehat{y}_T\) for each sequence, so, compared with a many-to-many architecture, the backward pass simplifies to the last time point T only.
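For concreteness, the many-to-one forward pass can be sketched in NumPy (a minimal illustration, not the paper's implementation; the gate names mirror the text, while the parameter shapes and the dictionary container are assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_many_to_one(xs, params):
    """Forward pass of a single-layer, many-to-one LSTM.

    xs: sequence of input vectors x_1..x_T, each of shape (d,)
    params: dict with W_* of shape (h, d), U_* of shape (h, h), b_* of shape (h,)
            for the gates o, i, f and candidate c, plus W_h of shape (1, h)
            and b_h of shape (1,) for the output layer.
    Returns the predicted bullish probability y_hat_T.
    """
    h_dim = params["b_o"].shape[0]
    h = np.zeros(h_dim)  # hidden state h_0
    c = np.zeros(h_dim)  # memory state c_0
    for x in xs:
        i = sigmoid(params["W_i"] @ x + params["U_i"] @ h + params["b_i"])       # input gate
        f = sigmoid(params["W_f"] @ x + params["U_f"] @ h + params["b_f"])       # forget gate
        o = sigmoid(params["W_o"] @ x + params["U_o"] @ h + params["b_o"])       # output gate
        c_tilde = np.tanh(params["W_c"] @ x + params["U_c"] @ h + params["b_c"]) # candidate
        c = f * c + i * c_tilde   # memory update
        h = o * np.tanh(c)        # hidden update h_t = o_t * tanh(c_t)
    # a single prediction from the last hidden state only (many-to-one)
    return sigmoid(params["W_h"] @ h + params["b_h"])[0]
```

Only the final hidden state \(h_T\) feeds the output layer, which is precisely why the backward pass below starts from the last time point T.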
We need to update \(W_o\), \(W_i\), \(W_f\), \(W_c\), \(W_h\), \(U_o\), \(U_i\), \(U_f\), \(U_c\), \(b_o\), \(b_i\), \(b_f\), \(b_c\), \(b_h\) via a gradient-descent approach, accumulating their gradients along the input sequence \(x_1, \ldots , x_T\). We start by deriving the backward-pass equations for a single sample n and then apply stochastic gradient descent to update the parameters over all n.
Using equations (1)–(6), the backward pass equations are as follows:
where \(\delta o_t = \partial E_T^{(n)} / \partial o_t\), \(\delta i_t = \partial E_T^{(n)} / \partial i_t\), \(\delta f_t = \partial E_T^{(n)} / \partial f_t\), \(\delta \widetilde{c}_t = \partial E_T^{(n)} / \partial \widetilde{c}_t\) are found as follows using the chain rule:
In the above, we used the facts that \(\partial \tanh (x) / \partial x = 1 - \tanh ^2(x)\) and \(\partial \sigma (x) / \partial x = \sigma (x) (1 - \sigma (x))\). Furthermore, \(\partial E_T^{(n)} / \partial b_h = \widehat{y}_{T} - y_{T}\); the gradients for \(b_o\), \(b_i\), \(b_f\), \(b_c\) are obtained by removing the outer products with \(x_t\) and \(h_{t-1}\) from Eqs. (27)–(34), respectively.
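The two derivative identities can be verified numerically with a central finite difference (a sanity check added for illustration; `num_grad` is a hypothetical helper, not from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def num_grad(f, x, eps=1e-6):
    # central finite-difference approximation of f'(x)
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

x = 0.7
# d tanh(x)/dx = 1 - tanh^2(x)
assert abs(num_grad(np.tanh, x) - (1.0 - np.tanh(x) ** 2)) < 1e-8
# d sigma(x)/dx = sigma(x) * (1 - sigma(x))
assert abs(num_grad(sigmoid, x) - sigmoid(x) * (1.0 - sigmoid(x))) < 1e-8
```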
Finally, derivatives with respect to the memory state \(c_t\), the hidden state \(h_t\) and the input \(x_t\) are found as follows:
Subtle details of the LSTM backward-pass implementation are contained in (37) and (39). Unlike in the GRU, in the LSTM framework gradients are propagated through both the hidden channel h and the memory channel c; (37) reflects the accumulation of gradients via the memory state, which is responsible for the transfer of "memory" along the sequence x. In the more general many-to-many framework, (37) follows from the chain rule for a function of several variables, applied to the error function \(E^{(n)} = \sum _t E_t^{(n)}\) along all t:
In fact, the first term disappears in the many-to-one architecture and only the second term from the last step \(f_{T} \odot \delta c_{T}\) is propagated backwards from end to start of x.
Furthermore, (39) describes the propagation of gradients via the hidden state. At each point t, inputs from both (38) and the loss function (26) are taken into account. To arrive at \(\delta h_{t}\), the input from the current loss and the next-step \(\delta h_{t-1}\) are added together. Clearly, in the many-to-one architecture, \(\delta h_t = \delta h_{t-1}\) for all t except the last T, and \(\delta h_T = W_h \cdot (\widehat{y}_{T} - y_{T})\).
In a deep LSTM network, the gradient \(\delta x_t\) is used to propagate errors down to lower hidden layers. Thus, the extension to networks with more than one layer, such as the one shown in Fig. 1, is straightforward.
Finally, we can apply a gradient-descent update \(\theta \leftarrow \theta - \alpha\, \delta \theta\) with a learning rate \(\alpha\) and \(\delta \theta = \left\{ \delta W_o, \delta W_i, \delta W_f, \delta W_c, \delta W_h, \delta U_o, \delta U_i, \delta U_f, \delta U_c, \delta b_o, \delta b_i, \delta b_f, \delta b_c, \delta b_h \right\}\).
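This update can be sketched as follows (illustrative only; the dict-of-arrays layout for the parameter collection is an assumption):

```python
import numpy as np

def sgd_step(params, grads, alpha=0.01):
    """Vanilla gradient-descent update theta <- theta - alpha * delta_theta,
    applied to every parameter in the collection (W_o, ..., b_h)."""
    return {name: value - alpha * grads[name] for name, value in params.items()}
```

In practice the same step is repeated over mini-batches of messages until the cross-entropy loss converges.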
About this article
Cite this article
Nasekin, S., & Chen, C. Y.-H. Deep learning-based cryptocurrency sentiment construction. Digit Finance 2, 39–67 (2020). https://doi.org/10.1007/s42521-020-00018-y