subscribe to arXiv mailings

arXiv:2406.17097 [pdf, other]

Lower Quantity, Higher Quality: Auditing News Content and User Perceptions on Twitter/X Algorithmic versus Chronological Timelines

Authors: Stephanie Wang, Shengchun Huang, Alvin Zhou, Danaë Metaxa

Abstract: Social media personalization algorithms increasingly influence the flow of civic information through society, resulting in concerns about "filter bubbles", "echo chambers", and other ways they might exacerbate ideological segregation and fan the spread of polarizing content. To address these concerns, we designed and conducted a sociotechnical audit (STA) to investigate how Twitter/X's timeline al… ▽ More Social media personalization algorithms increasingly influence the flow of civic information through society, resulting in concerns about "filter bubbles", "echo chambers", and other ways they might exacerbate ideological segregation and fan the spread of polarizing content. To address these concerns, we designed and conducted a sociotechnical audit (STA) to investigate how Twitter/X's timeline algorithm affects news curation while also tracking how user perceptions change in response. We deployed a custom-built system that, over the course of three weeks, passively tracked all tweets loaded in users' browsers in the first week, then in the second week enacted an intervention to users' Twitter/X homepage to restrict their view to only the algorithmic or chronological timeline (randomized). We flipped this condition for each user in the third week. We ran our audit in late 2023, collecting user-centered metrics (self-reported survey measures) and platform-centered metrics (views, clicks, likes) for 243 users, along with over 800,000 tweets. Using the STA framework, our results are two-fold: (1) Our algorithm audit finds that Twitter/X's algorithmic timeline resulted in a lower quantity but higher quality of news -- less ideologically congruent, less extreme, and slightly more reliable -- compared to the chronological timeline. (2) Our user audit suggests that although our timeline intervention had significant effects on users' behaviors, it had little impact on their overall perceptions of the platform. Our paper discusses these findings and their broader implications in the context of algorithmic news curation, user-centric audits, and avenues for independent social science research. △ Less

Submitted 24 June, 2024; originally announced June 2024.

Comments: 24 pages, 5 figures, Computer-Supported Cooperative Work

arXiv:2405.04412 [pdf, other]

The Silicon Ceiling: Auditing GPT's Race and Gender Biases in Hiring

Authors: Lena Armstrong, Abbey Liu, Stephen MacNeil, Danaë Metaxa

Abstract: Large language models (LLMs) are increasingly being introduced in workplace settings, with the goals of improving efficiency and fairness. However, concerns have arisen regarding these models' potential to reflect or exacerbate social biases and stereotypes. This study explores the potential impact of LLMs on hiring practices. To do so, we conduct an algorithm audit of race and gender biases in on… ▽ More Large language models (LLMs) are increasingly being introduced in workplace settings, with the goals of improving efficiency and fairness. However, concerns have arisen regarding these models' potential to reflect or exacerbate social biases and stereotypes. This study explores the potential impact of LLMs on hiring practices. To do so, we conduct an algorithm audit of race and gender biases in one commonly-used LLM, OpenAI's GPT-3.5, taking inspiration from the history of traditional offline resume audits. We conduct two studies using names with varied race and gender connotations: resume assessment (Study 1) and resume generation (Study 2). In Study 1, we ask GPT to score resumes with 32 different names (4 names for each combination of the 2 gender and 4 racial groups) and two anonymous options across 10 occupations and 3 evaluation tasks (overall rating, willingness to interview, and hireability). We find that the model reflects some biases based on stereotypes. In Study 2, we prompt GPT to create resumes (10 for each name) for fictitious job candidates. When generating resumes, GPT reveals underlying biases; women's resumes had occupations with less experience, while Asian and Hispanic resumes had immigrant markers, such as non-native English and non-U.S. education and work experiences. Our findings contribute to a growing body of literature on LLM biases, in particular when used in workplace contexts. △ Less

Submitted 9 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

arXiv:2404.05874 [pdf, other]

doi 10.1145/3628516.3655752

Youth as Peer Auditors: Engaging Teenagers with Algorithm Auditing of Machine Learning Applications

Authors: Luis Morales-Navarro, Yasmin B. Kafai, Vedya Konda, Danaë Metaxa

Abstract: As artificial intelligence/machine learning (AI/ML) applications become more pervasive in youth lives, supporting them to interact, design, and evaluate applications is crucial. This paper positions youth as auditors of their peers' ML-powered applications to better understand algorithmic systems' opaque inner workings and external impacts. In a two-week workshop, 13 youth (ages 14-15) designed an… ▽ More As artificial intelligence/machine learning (AI/ML) applications become more pervasive in youth lives, supporting them to interact, design, and evaluate applications is crucial. This paper positions youth as auditors of their peers' ML-powered applications to better understand algorithmic systems' opaque inner workings and external impacts. In a two-week workshop, 13 youth (ages 14-15) designed and audited ML-powered applications. We analyzed pre/post clinical interviews in which youth were presented with auditing tasks. The analyses show that after the workshop all youth identified algorithmic biases and inferred dataset and model design issues. Youth also discussed algorithmic justice issues and ML model improvements. Furthermore, youth reflected that auditing provided them new perspectives on model functionality and ideas to improve their own models. This work contributes (1) a conceptualization of algorithm auditing for youth; and (2) empirical evidence of the potential benefits of auditing. We discuss potential uses of algorithm auditing in learning and child-computer interaction research. △ Less

Submitted 16 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

ACM Class: K.3.0

arXiv:2309.11599 [pdf, other]

The Role of Inclusion, Control, and Ownership in Workplace AI-Mediated Communication

Authors: Kowe Kadoma, Marianne Aubin Le Quere, Jenny Fu, Christin Munsch, Danae Metaxa, Mor Naaman

Abstract: Given large language models' (LLMs) increasing integration into workplace software, it is important to examine how biases in the models may impact workers. For example, stylistic biases in the language suggested by LLMs may cause feelings of alienation and result in increased labor for individuals or groups whose style does not match. We examine how such writer-style bias impacts inclusion, contro… ▽ More Given large language models' (LLMs) increasing integration into workplace software, it is important to examine how biases in the models may impact workers. For example, stylistic biases in the language suggested by LLMs may cause feelings of alienation and result in increased labor for individuals or groups whose style does not match. We examine how such writer-style bias impacts inclusion, control, and ownership over the work when co-writing with LLMs. In an online experiment, participants wrote hypothetical job promotion requests using either hesitant or self-assured autocomplete suggestions from an LLM and reported their subsequent perceptions. We found that the style of the AI model did not impact perceived inclusion. However, individuals with higher perceived inclusion did perceive greater agency and ownership, an effect more strongly impacting participants of minoritized genders. Feelings of inclusion mitigated a loss of control and agency when accepting more AI suggestions. △ Less

Submitted 26 February, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

arXiv:2308.15768 [pdf, other]

doi 10.1145/3610209

Sociotechnical Audits: Broadening the Algorithm Auditing Lens to Investigate Targeted Advertising

Authors: Michelle S. Lam, Ayush Pandit, Colin H. Kalicki, Rachit Gupta, Poonam Sahoo, Danaë Metaxa

Abstract: Algorithm audits are powerful tools for studying black-box systems. While very effective in examining technical components, the method stops short of a sociotechnical frame, which would also consider users as an integral and dynamic part of the system. Addressing this gap, we propose the concept of sociotechnical auditing: auditing methods that evaluate algorithmic systems at the sociotechnical le… ▽ More Algorithm audits are powerful tools for studying black-box systems. While very effective in examining technical components, the method stops short of a sociotechnical frame, which would also consider users as an integral and dynamic part of the system. Addressing this gap, we propose the concept of sociotechnical auditing: auditing methods that evaluate algorithmic systems at the sociotechnical level, focusing on the interplay between algorithms and users as each impacts the other. Just as algorithm audits probe an algorithm with varied inputs and observe outputs, a sociotechnical audit (STA) additionally probes users, exposing them to different algorithmic behavior and measuring resulting attitudes and behaviors. To instantiate this method, we develop Intervenr, a platform for conducting browser-based, longitudinal sociotechnical audits with consenting, compensated participants. Intervenr investigates the algorithmic content users encounter online and coordinates systematic client-side interventions to understand how users change in response. As a case study, we deploy Intervenr in a two-week sociotechnical audit of online advertising (N=244) to investigate the central premise that personalized ad targeting is more effective on users. In the first week, we collect all browser ads delivered to users, and in the second, we deploy an ablation-style intervention that disrupts normal targeting by randomly pairing participants and swapping all their ads. We collect user-oriented metrics (self-reported ad interest and feeling of representation) and advertiser-oriented metrics (ad views, clicks, and recognition) throughout, along with a total of over 500,000 ads. Our STA finds that targeted ads indeed perform better with users, but also that users begin to acclimate to different ads in only a week, casting doubt on the primacy of personalized ad targeting given the impact of repeated exposure. △ Less

Submitted 30 August, 2023; originally announced August 2023.

Comments: To appear at CSCW 2023

Showing 1–5 of 5 results for author: Metaxa, D