Abstract
We study investor sentiment toward a non-classical asset class, cryptocurrencies, using machine learning methods. We account for context-specific information and word similarity with efficient language-modeling tools: featurized word representations (embeddings) and recurrent neural networks. We apply these tools to sentence-level sentiment classification and sentiment index construction. The analysis is performed on a novel dataset of 1220K messages related to 425 cryptocurrencies posted on the microblogging platform StockTwits between March 2013 and May 2018. Both in- and out-of-sample predictive regressions are run to test the significance of the constructed sentiment index variables. We find that the constructed sentiment indices are informative for predicting returns and volatility of the cryptocurrency market index.
Notes
This list can be found at https://api.stocktwits.com/symbol-sync/symbols.csv.
Reddit is a generic message board rather than one dedicated only to financial markets; it therefore covers a wider range of topics related to cryptocurrencies, including discussions of cryptocurrency technology such as the blockchain.
Appendix
1.1 Backpropagation through time (BPTT) for LSTM
We predict the probability of a bullish message \(y_T\) as \(\widehat{y}_T = \sigma (W_h h_T + b_h)\), where \(h_T = o_T \odot \tanh (c_T)\), according to the above. A suitable loss function is the binary cross-entropy loss over samples n:
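Written out explicitly (a standard form, reconstructed here since the equation itself is not reproduced in this excerpt), the loss over samples n is

\[
E = \sum_{n} E_T^{(n)}, \qquad E_T^{(n)} = -\Big[\, y_T^{(n)} \log \widehat{y}_T^{(n)} + \big(1 - y_T^{(n)}\big)\log\big(1 - \widehat{y}_T^{(n)}\big) \Big],
\]

which, combined with \(\widehat{y}_T = \sigma (W_h h_T + b_h)\), yields the gradient \(\partial E_T^{(n)} / \partial b_h = \widehat{y}_T - y_T\) used in the backward pass below.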
In the current case we deal with the many-to-one LSTM architecture: we predict a single value \(\widehat{y}_T\) for each sequence, so, compared with a many-to-many architecture, the backward pass simplifies to the last time point T only.
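For concreteness, the many-to-one forward pass can be sketched in NumPy (a minimal illustration, not the paper's implementation; the gate names mirror the text, while the parameter shapes and the dictionary container are assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_many_to_one(xs, params):
    """Forward pass of a single-layer, many-to-one LSTM.

    xs: sequence of input vectors x_1..x_T, each of shape (d,)
    params: dict with W_* of shape (h, d), U_* of shape (h, h), b_* of shape (h,)
            for the gates o, i, f and candidate c, plus W_h of shape (1, h)
            and b_h of shape (1,) for the output layer.
    Returns the predicted bullish probability y_hat_T.
    """
    h_dim = params["b_o"].shape[0]
    h = np.zeros(h_dim)  # hidden state h_0
    c = np.zeros(h_dim)  # memory state c_0
    for x in xs:
        i = sigmoid(params["W_i"] @ x + params["U_i"] @ h + params["b_i"])       # input gate
        f = sigmoid(params["W_f"] @ x + params["U_f"] @ h + params["b_f"])       # forget gate
        o = sigmoid(params["W_o"] @ x + params["U_o"] @ h + params["b_o"])       # output gate
        c_tilde = np.tanh(params["W_c"] @ x + params["U_c"] @ h + params["b_c"]) # candidate
        c = f * c + i * c_tilde   # memory update
        h = o * np.tanh(c)        # hidden update h_t = o_t * tanh(c_t)
    # a single prediction from the last hidden state only (many-to-one)
    return sigmoid(params["W_h"] @ h + params["b_h"])[0]
```

Only the final hidden state \(h_T\) feeds the output layer, which is precisely why the backward pass below starts from the last time point T.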
We need to update \(W_o\), \(W_i\), \(W_f\), \(W_c\), \(W_h\), \(U_o\), \(U_i\), \(U_f\), \(U_c\), \(b_o\), \(b_i\), \(b_f\), \(b_c\), \(b_h\) via a gradient-descent approach, accumulating their gradients along the input sequence \(x_1, \ldots , x_T\). We start by deriving the backward-pass equations for a single sample n and then apply stochastic gradient descent to update the parameters over all n.
Using equations (1)–(6), the backward pass equations are as follows:
where \(\delta o_t = \partial E_T^{(n)} / \partial o_t\), \(\delta i_t = \partial E_T^{(n)} / \partial i_t\), \(\delta f_t = \partial E_T^{(n)} / \partial f_t\), \(\delta \widetilde{c}_t = \partial E_T^{(n)} / \partial \widetilde{c}_t\) are found as follows using the chain rule:
In the above, we used the facts that \(\partial \tanh (x) / \partial x = 1 - \tanh ^2(x)\) and \(\partial \sigma (x) / \partial x = \sigma (x) (1 - \sigma (x))\). Furthermore, \(\partial E_T^{(n)} / \partial b_h = \widehat{y}_{T} - y_{T}\); the gradients for \(b_o\), \(b_i\), \(b_f\), \(b_c\) are obtained by removing the outer products with \(x_t\) and \(h_{t-1}\) from Eqs. (27)–(34), respectively.
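The two derivative identities can be verified numerically with a central finite difference (a sanity check added for illustration; `num_grad` is a hypothetical helper, not from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def num_grad(f, x, eps=1e-6):
    # central finite-difference approximation of f'(x)
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

x = 0.7
# d tanh(x)/dx = 1 - tanh^2(x)
assert abs(num_grad(np.tanh, x) - (1.0 - np.tanh(x) ** 2)) < 1e-8
# d sigma(x)/dx = sigma(x) * (1 - sigma(x))
assert abs(num_grad(sigmoid, x) - sigmoid(x) * (1.0 - sigmoid(x))) < 1e-8
```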
Finally, derivatives with respect to the memory state \(c_t\), the hidden state \(h_t\) and the input \(x_t\) are found as follows:
Subtle details of the LSTM backward-pass implementation are contained in (37) and (39). Unlike in the GRU, in the LSTM framework gradients are propagated through both the hidden channel h and the memory channel c; (37) reflects the accumulation of gradients via the memory state, which is responsible for the transfer of "memory" along the sequence x. In the more general many-to-many framework, (37) follows from the chain rule for a function of several variables, applied to the error function \(E^{(n)} = \sum _t E_t^{(n)}\) along all t:
In fact, the first term disappears in the many-to-one architecture and only the second term from the last step \(f_{T} \odot \delta c_{T}\) is propagated backwards from end to start of x.
Furthermore, (39) describes the propagation of gradients via the hidden state. At each point t, inputs from both (38) and the loss function (26) are taken into account. To arrive at \(\delta h_{t}\), the input from the current loss and the next-step \(\delta h_{t-1}\) are added together. Clearly, in the many-to-one architecture, \(\delta h_t = \delta h_{t-1}\) for all t except the last T, and \(\delta h_T = W_h \cdot (\widehat{y}_{T} - y_{T})\).
In a deep LSTM network, the gradient \(\delta x_t\) is used to propagate errors down to lower hidden layers. Thus, the extension to networks with more than one layer, such as the one shown in Fig. 1, is straightforward.
Finally, we can apply a gradient-descent update \(\theta \leftarrow \theta - \alpha\, \delta \theta\) with a learning rate \(\alpha\) and \(\delta \theta = \left\{ \delta W_o, \delta W_i, \delta W_f, \delta W_c, \delta W_h, \delta U_o, \delta U_i, \delta U_f, \delta U_c, \delta b_o, \delta b_i, \delta b_f, \delta b_c, \delta b_h \right\}\).
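This update can be sketched as follows (illustrative only; the dict-of-arrays layout for the parameter collection is an assumption):

```python
import numpy as np

def sgd_step(params, grads, alpha=0.01):
    """Vanilla gradient-descent update theta <- theta - alpha * delta_theta,
    applied to every parameter in the collection (W_o, ..., b_h)."""
    return {name: value - alpha * grads[name] for name, value in params.items()}
```

In practice the same step is repeated over mini-batches of messages until the cross-entropy loss converges.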
About this article
Cite this article
Nasekin, S., & Chen, C. Y.-H. Deep learning-based cryptocurrency sentiment construction. Digit Finance 2, 39–67 (2020). https://doi.org/10.1007/s42521-020-00018-y