
Deep learning-based cryptocurrency sentiment construction

  • Original Article
  • Published in Digital Finance

Abstract

We study investor sentiment on a non-classical asset class, cryptocurrencies, using machine learning methods. We account for context-specific information and word similarity using efficient language-modeling tools, namely featurized word representations (embeddings) and recurrent neural networks. We apply these tools to sentence-level sentiment classification and sentiment index construction. The analysis is performed on a novel dataset of 1,220,000 messages related to 425 cryptocurrencies posted on the microblogging platform StockTwits between March 2013 and May 2018. Both in-sample and out-of-sample predictive regressions are run to test the significance of the constructed sentiment index variables. We find that the constructed sentiment indices are informative for predicting the returns and volatility of the cryptocurrency market index.


Notes

  1. https://stocktwits.com/.

  2. This list can be found at https://api.stocktwits.com/symbol-sync/symbols.csv.

  3. Reddit is a generic message board rather than one dedicated only to financial markets; it covers a wider range of topics related to cryptocurrencies, including discussions about cryptocurrency technology such as the blockchain.


Author information


Corresponding author

Correspondence to Sergey Nasekin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Backpropagation through time (BPTT) for LSTM

We predict the probability of a bullish message \(y_T\) as \(\widehat{y}_T = \sigma (W_h h_T + b_h)\), where \(h_T = o_T \odot \tanh (c_T)\), according to the above. A suitable loss function is the binary cross-entropy loss over samples n:

$$\begin{aligned} E_T = - \sum _n \left[ y^{(n)}_T \log (\widehat{y}^{(n)}_T) + (1 - y^{(n)}_T) \log (1 - \widehat{y}^{(n)}_T) \right] . \end{aligned}$$
(26)

In the current case we deal with a many-to-one LSTM architecture: a single value \(\widehat{y}_T\) is predicted for each sequence, so, compared with a many-to-many architecture, the backward pass is simplified and starts only at the last time point T.
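For concreteness, the following is a minimal NumPy sketch of this many-to-one forward pass and of the loss in (26). The parameter names (\(W_i\), \(U_i\), ..., \(W_h\), \(b_h\)) mirror the notation of Eqs. (1)–(6), but the function names, array shapes and cache layout are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x_seq, params):
    """Many-to-one forward pass over one message x_1, ..., x_T.

    x_seq  : array of shape (T, d_in), one embedded token per row.
    params : dict with gate weights W_*, U_*, biases b_*, and the
             output layer W_h (shape (1, d_h)) and b_h (shape (1,)).
    Returns the bullish probability y_hat_T and the per-step cache for BPTT.
    """
    d_h = params["b_o"].shape[0]
    h, c = np.zeros(d_h), np.zeros(d_h)
    cache = []
    for x_t in x_seq:
        h_prev, c_prev = h, c
        i_t = sigmoid(params["W_i"] @ x_t + params["U_i"] @ h_prev + params["b_i"])      # input gate
        f_t = sigmoid(params["W_f"] @ x_t + params["U_f"] @ h_prev + params["b_f"])      # forget gate
        o_t = sigmoid(params["W_o"] @ x_t + params["U_o"] @ h_prev + params["b_o"])      # output gate
        c_tilde = np.tanh(params["W_c"] @ x_t + params["U_c"] @ h_prev + params["b_c"])  # candidate memory
        c = f_t * c_prev + i_t * c_tilde   # memory state update
        h = o_t * np.tanh(c)               # hidden state update
        cache.append((x_t, h_prev, c_prev, i_t, f_t, o_t, c_tilde, c, h))
    y_hat = sigmoid(params["W_h"] @ h + params["b_h"])   # bullish probability
    return y_hat, cache

def bce_loss(y_hat, y):
    """Binary cross-entropy for one sample, cf. Eq. (26)."""
    return -(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))
```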

We need to update \(W_o\), \(W_i\), \(W_f\), \(W_c\), \(W_h\), \(U_o\), \(U_i\), \(U_f\), \(U_c\), \(b_o\), \(b_i\), \(b_f\), \(b_c\), \(b_h\) via a gradient-descent approach by accumulating their gradients along the input sequence \(x_1, \ldots , x_T\). We start by deriving the backward-pass equations for a single sample n and then apply stochastic gradient descent to update the parameters across all n.

Using equations (1)–(6), the backward pass equations are as follows:

$$\begin{aligned} \frac{\partial E_T^{(n)}}{\partial W_o}&= \sum _t \delta o_t \odot o_t \odot (1 - o_t) \otimes x_t, \end{aligned}$$
(27)
$$\begin{aligned} \frac{\partial E_T^{(n)}}{\partial W_i}&= \sum _t \delta i_t \odot i_t \odot (1 - i_t) \otimes x_t, \end{aligned}$$
(28)
$$\begin{aligned} \frac{\partial E_T^{(n)}}{\partial W_f}&= \sum _t \delta f_t \odot f_t \odot (1 - f_t) \otimes x_t, \end{aligned}$$
(29)
$$\begin{aligned} \frac{\partial E_T^{(n)}}{\partial W_c}&= \sum _t \delta \widetilde{c}_t \odot (1 - \widetilde{c}_t^2) \otimes x_t, \end{aligned}$$
(30)
$$\begin{aligned} \frac{\partial E_T^{(n)}}{\partial U_o}&= \sum _t \delta o_t \odot o_t \odot (1 - o_t) \otimes h_{t-1}, \end{aligned}$$
(31)
$$\begin{aligned} \frac{\partial E_T^{(n)}}{\partial U_i}&= \sum _t \delta i_t \odot i_t \odot (1 - i_t) \otimes h_{t-1}, \end{aligned}$$
(32)
$$\begin{aligned} \frac{\partial E_T^{(n)}}{\partial U_f}&= \sum _t \delta f_t \odot f_t \odot (1 - f_t) \otimes h_{t-1}, \end{aligned}$$
(33)
$$\begin{aligned} \frac{\partial E_T^{(n)}}{\partial U_c}&= \sum _t \delta \widetilde{c}_t \odot (1 - \widetilde{c}_t^2) \otimes h_{t-1}, \end{aligned}$$
(34)
$$\begin{aligned} \frac{\partial E_T^{(n)}}{\partial W_h}&= h_T \cdot (\widehat{y}_{T} - y_{T}), \end{aligned}$$
(35)

where \(\delta o_t = \partial E_T^{(n)} / \partial o_t\), \(\delta i_t = \partial E_T^{(n)} / \partial i_t\), \(\delta f_t = \partial E_T^{(n)} / \partial f_t\), \(\delta \widetilde{c}_t = \partial E_T^{(n)} / \partial \widetilde{c}_t\) are found as follows using the chain rule:

$$\begin{aligned} \delta o_t&= \delta h_t \odot \tanh (c_t), \\ \delta i_t&= \delta c_t \odot \widetilde{c}_t, \\ \delta f_t&= \delta c_t \odot c_{t-1}, \\ \delta \widetilde{c}_t&= \delta c_t \odot i_t. \end{aligned}$$

In the above, we used the facts that \(\partial \tanh (x) / \partial x = 1 - \tanh ^2(x)\) and \(\partial \sigma (x) / \partial x = \sigma (x) (1 - \sigma (x))\). Furthermore, \(\partial E_T^{(n)} / \partial b_h = \widehat{y}_{T} - y_{T}\); the gradients for \(b_o\), \(b_i\), \(b_f\), \(b_c\) are obtained by removing the outer products with \(x_t\) and \(h_{t-1}\) from Eqs. (27)–(34), respectively.
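As an illustration, the sketch below accumulates one time step's contribution to Eqs. (27)–(34) and to the bias gradients, given \(\delta h_t\) and \(\delta c_t\). It assumes the cache layout and parameter dictionary of the forward-pass sketch above; the derivative factors \(\sigma '(\cdot )\) and \(\tanh '(\cdot )\) appearing in (27)–(34) are folded into the returned deltas, which is one common way to reuse them later for the hidden-state and input gradients.

```python
def gate_gradients(grads, step, delta_h, delta_c):
    """One time step's contribution to Eqs. (27)-(34) and the bias gradients.

    grads : dict of gradient accumulators with the same keys/shapes as params.
    step  : one cache entry produced by lstm_forward.
    Returns the gate deltas with the gate-derivative factors folded in.
    """
    x_t, h_prev, c_prev, i_t, f_t, o_t, c_tilde, c_t, h_t = step
    # gate deltas via the chain rule, as defined in the text
    d_o = delta_h * np.tanh(c_t)        # delta o_t
    d_i = delta_c * c_tilde             # delta i_t
    d_f = delta_c * c_prev              # delta f_t
    d_ct = delta_c * i_t                # delta c~_t
    # multiply by the gate derivatives sigma'() and tanh'()
    pre = {"o": d_o * o_t * (1.0 - o_t),
           "i": d_i * i_t * (1.0 - i_t),
           "f": d_f * f_t * (1.0 - f_t),
           "c": d_ct * (1.0 - c_tilde ** 2)}
    for g, p in pre.items():
        grads["W_" + g] += np.outer(p, x_t)      # Eqs. (27)-(30)
        grads["U_" + g] += np.outer(p, h_prev)   # Eqs. (31)-(34)
        grads["b_" + g] += p                     # bias gradients (outer products dropped)
    return pre
```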

Finally, derivatives with respect to the memory state \(c_t\), the hidden state \(h_t\) and the input \(x_t\) are found as follows:

$$\begin{aligned} \delta c_t&= \delta h_t \odot o_t \odot (1 - \tanh ^2 (c_t)), \end{aligned}$$
(36)
$$\begin{aligned} \delta c_t&= \delta c_t + f_{t+1} \odot \delta c_{t+1}, \end{aligned}$$
(37)
$$\begin{aligned} \delta h_{t-1}&= U_f^{\top } \cdot \delta f_t + U_i^{\top } \cdot \delta i_t + U_o^{\top } \cdot \delta o_t + U_c^{\top } \cdot \delta \widetilde{c}_t, \end{aligned}$$
(38)
$$\begin{aligned} \delta h_{t-1}&= \delta h_{t-1} + W_h \cdot (\widehat{y}_{t-1} - y_{t-1}), \end{aligned}$$
(39)
$$\begin{aligned} \delta x_t&= W_f^{\top } \cdot \delta f_t + W_i^{\top } \cdot \delta i_t + W_o^{\top } \cdot \delta o_t + W_c^{\top } \cdot \delta \widetilde{c}_t. \end{aligned}$$
(40)

Subtle details of the LSTM backward-pass implementation are contained in (37) and (39). Unlike in the GRU, in the LSTM framework gradients are propagated via both the hidden channel h and the memory channel c; (37) reflects the accumulation of gradients via the memory state, which is responsible for the transfer of "memory" along the sequence x. In the more general many-to-many framework, (37) follows from the chain rule for a function of several variables; for the error function \(E^{(n)} = \sum _t E_t^{(n)}\) along all t:

$$\begin{aligned} \frac{\partial E^{(n)}}{\partial c_t}&= \frac{\partial E_t^{(n)}}{\partial c_t} + \frac{\partial E_{t+1}^{(n)}}{\partial c_t}, \\&= \frac{\partial E_t^{(n)}}{\partial c_t} + \frac{\partial E_{t+1}^{(n)}}{\partial h_{t+1}} \frac{\partial h_{t+1}}{\partial c_{t+1}} \frac{\partial c_{t+1}}{\partial c_{t}}, \\&= \delta c_t + \delta c_{t+1} \odot f_{t+1}. \end{aligned}$$

In fact, the first term disappears in the many-to-one architecture, and only the second term from the last step, \(f_{T} \odot \delta c_{T}\), is propagated backwards from the end to the start of x.

Furthermore, (39) describes the propagation of gradients via the hidden state. At each time step t, contributions from both (38) and the loss function (26) enter; to arrive at \(\delta h_{t}\), the contribution from the current loss and the term propagated back from the next step are added together. Clearly, in the many-to-one architecture the loss contributes only at the last step, so \(\delta h_{t}\) is given by (38) alone for all \(t < T\), while the recursion is initialised with \(\delta h_T = W_h \cdot (\widehat{y}_{T} - y_{T})\).
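Putting the pieces together, the sketch below runs the backward loop over time under the same assumptions as the previous sketches. The incoming memory gradient is carried through the forget gate as in (37), and the hidden-state gradient is initialised from the loss at T only, in line with the discussion of (39); it is one possible implementation, not the authors' code.

```python
def lstm_backward(y_hat, y, cache, params, grads):
    """Backward pass through time for the many-to-one LSTM, Eqs. (35)-(40)."""
    T = len(cache)
    h_T = cache[-1][-1]
    # output layer, Eq. (35) (written as an outer product so the shape matches W_h)
    grads["W_h"] += np.outer(y_hat - y, h_T)
    grads["b_h"] += y_hat - y
    # the loss enters only at the last step T
    delta_h = params["W_h"].T @ (y_hat - y)
    delta_c = np.zeros_like(h_T)
    for t in reversed(range(T)):
        x_t, h_prev, c_prev, i_t, f_t, o_t, c_tilde, c_t, h_t = cache[t]
        # Eq. (36) plus the memory-channel accumulation of Eq. (37)
        delta_c = delta_c + delta_h * o_t * (1.0 - np.tanh(c_t) ** 2)
        pre = gate_gradients(grads, cache[t], delta_h, delta_c)
        # Eq. (38): gradient flowing back to the previous hidden state
        delta_h = sum(params["U_" + g].T @ pre[g] for g in ("o", "i", "f", "c"))
        # Eq. (37): the forget gate carries the memory gradient one step back
        delta_c = delta_c * f_t
    return grads
```

The input gradient of Eq. (40) can be obtained analogously inside the loop as \(\sum _g W_g^{\top } \, \texttt {pre}[g]\) when errors need to be passed down to an embedding or lower LSTM layer.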

In a deep LSTM network, the gradient \(\delta x_t\) will be used to propagate errors down to lower hidden layers. Thus, the extension to networks with more than one layer, such as the one shown in Fig. 1, is straightforward.

Finally, we can apply a gradient descent algorithm such as \(\theta = \theta - \alpha \delta \theta\) with a learning rate \(\alpha\) and \(\delta \theta = \left\{ \delta W_o, \delta W_i, \delta W_f, \delta W_c, \delta W_h, \delta U_o, \delta U_i, \delta U_f, \delta U_c, \delta b_o, \delta b_i, \delta b_f, \delta b_c, \delta b_h \right\}\).
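As a final illustration, here is a minimal sketch of this update under the assumptions of the previous sketches; the learning-rate value and the zeroing of the accumulators after each step are illustrative choices.

```python
def sgd_step(params, grads, alpha=0.01):
    """Plain gradient-descent update: theta <- theta - alpha * delta_theta."""
    for name in params:
        params[name] -= alpha * grads[name]
        grads[name] = np.zeros_like(grads[name])   # reset accumulators for the next step

# Example usage for one labelled message (hypothetical variables: params and grads are
# dicts of matching NumPy arrays, x_seq one embedded message, y its bullish label).
# y_hat, cache = lstm_forward(x_seq, params)
# lstm_backward(y_hat, y, cache, params, grads)
# sgd_step(params, grads, alpha=0.01)
```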


About this article


Cite this article

Nasekin, S., Chen, C.YH. Deep learning-based cryptocurrency sentiment construction. Digit Finance 2, 39–67 (2020). https://doi.org/10.1007/s42521-020-00018-y

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s42521-020-00018-y

Keywords

JEL Classification
