Skip to main content

Showing 1–4 of 4 results for author: Vallis, O

  1. arXiv:2406.12800  [pdf, other

    cs.CR

    Supporting Human Raters with the Detection of Harmful Content using Large Language Models

    Authors: Kurt Thomas, Patrick Gage Kelley, David Tao, Sarah Meiklejohn, Owen Vallis, Shunwen Tan, Blaž Bratanič, Felipe Tiengo Ferreira, Vijay Kumar Eranti, Elie Bursztein

    Abstract: In this paper, we explore the feasibility of leveraging large language models (LLMs) to automate or otherwise assist human raters with identifying harmful content including hate speech, harassment, violent extremism, and election misinformation. Using a dataset of 50,000 comments, we demonstrate that LLMs can achieve 90% accuracy when compared to human verdicts. We explore how to best leverage the… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  2. arXiv:2311.17264  [pdf, other

    cs.CL

    RETSim: Resilient and Efficient Text Similarity

    Authors: Marina Zhang, Owen Vallis, Aysegul Bumin, Tanay Vakharia, Elie Bursztein

    Abstract: This paper introduces RETSim (Resilient and Efficient Text Similarity), a lightweight, multilingual deep learning model trained to produce robust metric embeddings for near-duplicate text retrieval, clustering, and dataset deduplication tasks. We demonstrate that RETSim is significantly more robust and accurate than MinHash and neural text embeddings, achieving new state-of-the-art performance on… ▽ More

    Submitted 28 November, 2023; originally announced November 2023.

  3. arXiv:2302.09207  [pdf, other

    cs.CL cs.AI

    RETVec: Resilient and Efficient Text Vectorizer

    Authors: Elie Bursztein, Marina Zhang, Owen Vallis, Xinyu Jia, Alexey Kurakin

    Abstract: This paper describes RETVec, an efficient, resilient, and multilingual text vectorizer designed for neural-based text processing. RETVec combines a novel character encoding with an optional small embedding model to embed words into a 256-dimensional vector space. The RETVec embedding model is pre-trained using pair-wise metric learning to be robust against typos and character-level adversarial att… ▽ More

    Submitted 22 April, 2024; v1 submitted 17 February, 2023; originally announced February 2023.

    Comments: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  4. arXiv:1704.07706  [pdf, other

    cs.LG

    Automatic Anomaly Detection in the Cloud Via Statistical Learning

    Authors: Jordan Hochenbaum, Owen S. Vallis, Arun Kejariwal

    Abstract: Performance and high availability have become increasingly important drivers, amongst other drivers, for user retention in the context of web services such as social networks, and web search. Exogenic and/or endogenic factors often give rise to anomalies, making it very challenging to maintain high availability, while also delivering high performance. Given that service-oriented architectures (SOA… ▽ More

    Submitted 24 April, 2017; originally announced April 2017.

    Comments: 13 pages, 12 figures