research-article

Fake News Detection on Social Media: A Data Mining Perspective

Published: 01 September 2017
    Abstract

    Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of "fake news", i.e., low-quality news with intentionally false information. The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Therefore, fake news detection on social media has recently become an emerging research area that is attracting tremendous attention. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. First, fake news is intentionally written to mislead readers into believing false information, which makes it difficult to detect based on news content alone; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination. Second, exploiting this auxiliary information is challenging in and of itself, as users' social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. Because the issue of fake news detection on social media is both challenging and relevant, we conducted this survey to further facilitate research on the problem. In this survey, we present a comprehensive review of detecting fake news on social media, including characterizations of fake news based on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics, and representative datasets. We also discuss related research areas, open problems, and future research directions for fake news detection on social media.


    Published In

    ACM SIGKDD Explorations Newsletter  Volume 19, Issue 1
    June 2017
    59 pages
    ISSN:1931-0145
    EISSN:1931-0153
    DOI:10.1145/3137597

    Publisher

    Association for Computing Machinery

    New York, NY, United States
