Skip to main content

Showing 1–1 of 1 results for author: Eranti, V K

  1. arXiv:2406.12800  [pdf, other

    cs.CR

    Supporting Human Raters with the Detection of Harmful Content using Large Language Models

    Authors: Kurt Thomas, Patrick Gage Kelley, David Tao, Sarah Meiklejohn, Owen Vallis, Shunwen Tan, Blaž Bratanič, Felipe Tiengo Ferreira, Vijay Kumar Eranti, Elie Bursztein

    Abstract: In this paper, we explore the feasibility of leveraging large language models (LLMs) to automate or otherwise assist human raters with identifying harmful content including hate speech, harassment, violent extremism, and election misinformation. Using a dataset of 50,000 comments, we demonstrate that LLMs can achieve 90% accuracy when compared to human verdicts. We explore how to best leverage the… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.