Showing 1–8 of 8 results for author: Bursztein, E

  1. arXiv:2406.12800  [pdf, other]

    cs.CR

    Supporting Human Raters with the Detection of Harmful Content using Large Language Models

    Authors: Kurt Thomas, Patrick Gage Kelley, David Tao, Sarah Meiklejohn, Owen Vallis, Shunwen Tan, Blaž Bratanič, Felipe Tiengo Ferreira, Vijay Kumar Eranti, Elie Bursztein

    Abstract: In this paper, we explore the feasibility of leveraging large language models (LLMs) to automate or otherwise assist human raters with identifying harmful content including hate speech, harassment, violent extremism, and election misinformation. Using a dataset of 50,000 comments, we demonstrate that LLMs can achieve 90% accuracy when compared to human verdicts. We explore how to best leverage the…

    Submitted 18 June, 2024; originally announced June 2024.

  2. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  3. arXiv:2311.17264  [pdf, other]

    cs.CL

    RETSim: Resilient and Efficient Text Similarity

    Authors: Marina Zhang, Owen Vallis, Aysegul Bumin, Tanay Vakharia, Elie Bursztein

    Abstract: This paper introduces RETSim (Resilient and Efficient Text Similarity), a lightweight, multilingual deep learning model trained to produce robust metric embeddings for near-duplicate text retrieval, clustering, and dataset deduplication tasks. We demonstrate that RETSim is significantly more robust and accurate than MinHash and neural text embeddings, achieving new state-of-the-art performance on…

    Submitted 28 November, 2023; originally announced November 2023.

  4. arXiv:2306.07249  [pdf, other]

    cs.CR

    Generalized Power Attacks against Crypto Hardware using Long-Range Deep Learning

    Authors: Elie Bursztein, Luca Invernizzi, Karel Král, Daniel Moghimi, Jean-Michel Picod, Marina Zhang

    Abstract: To make cryptographic processors more resilient against side-channel attacks, engineers have developed various countermeasures. However, the effectiveness of these countermeasures is often uncertain, as it depends on the complex interplay between software and hardware. Assessing a countermeasure's effectiveness using profiling techniques or machine learning so far requires significant expertise an…

    Submitted 26 April, 2024; v1 submitted 12 June, 2023; originally announced June 2023.

  5. arXiv:2302.09207  [pdf, other]

    cs.CL cs.AI

    RETVec: Resilient and Efficient Text Vectorizer

    Authors: Elie Bursztein, Marina Zhang, Owen Vallis, Xinyu Jia, Alexey Kurakin

    Abstract: This paper describes RETVec, an efficient, resilient, and multilingual text vectorizer designed for neural-based text processing. RETVec combines a novel character encoding with an optional small embedding model to embed words into a 256-dimensional vector space. The RETVec embedding model is pre-trained using pair-wise metric learning to be robust against typos and character-level adversarial att…

    Submitted 22 April, 2024; v1 submitted 17 February, 2023; originally announced February 2023.

    Comments: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  6. arXiv:2106.04511  [pdf, other]

    cs.SI cs.CR cs.CY cs.HC

    Designing Toxic Content Classification for a Diversity of Perspectives

    Authors: Deepak Kumar, Patrick Gage Kelley, Sunny Consolvo, Joshua Mason, Elie Bursztein, Zakir Durumeric, Kurt Thomas, Michael Bailey

    Abstract: In this work, we demonstrate how existing classifiers for identifying toxic comments online fail to generalize to the diverse concerns of Internet users. We survey 17,280 participants to understand how user expectations for what constitutes toxic content differ across demographics, beliefs, and personal experiences. We find that groups historically at-risk of harassment - such as people who identi…

    Submitted 4 June, 2021; originally announced June 2021.

  7. arXiv:2106.00236  [pdf, other]

    cs.CY

    "Why wouldn't someone think of democracy as a target?": Security practices & challenges of people involved with U.S. political campaigns

    Authors: Sunny Consolvo, Patrick Gage Kelley, Tara Matthews, Kurt Thomas, Lee Dunn, Elie Bursztein

    Abstract: People who are involved with political campaigns face increased digital security threats from well-funded, sophisticated attackers, especially nation-states. Improving political campaign security is a vital part of protecting democracy. To identify campaign security issues, we conducted qualitative research with 28 participants across the U.S. political spectrum to understand the digital security…

    Submitted 1 June, 2021; originally announced June 2021.

    Comments: 18 pages, 2 tables, one ancillary file with 4 appendices

  8. arXiv:2006.10861  [pdf, other]

    cs.CR

    CoinPolice: Detecting Hidden Cryptojacking Attacks with Neural Networks

    Authors: Ivan Petrov, Luca Invernizzi, Elie Bursztein

    Abstract: Traffic monetization is a crucial component of running most for-profit online businesses. One of its latest incarnations is cryptocurrency mining, where a website instructs the visitor's browser to participate in building a cryptocurrency ledger (e.g., Bitcoin, Monero) in exchange for a small reward in the same currency. In its essence, this practice trades the user's electric bill (or battery lev…

    Submitted 23 June, 2020; v1 submitted 18 June, 2020; originally announced June 2020.