Exploring Subjectivity for more Human-Centric Assessment of Social Biases in Large Language Models

Authors: Paula Akemi Aoyagui, Sharon Ferguson, Anastasia Kuzminykh

Abstract: An essential aspect of evaluating Large Language Models (LLMs) is identifying potential biases. This is especially relevant considering the substantial evidence that LLMs can replicate human social biases in their text outputs and further influence stakeholders, potentially amplifying harm to already marginalized individuals and communities. Therefore, recent efforts in bias detection invested in automated benchmarks and objective metrics such as accuracy (i.e., an LLM's output is compared against a predefined ground truth). Nonetheless, social biases can be nuanced, oftentimes subjective and context-dependent, where a situation is open to interpretation and there is no ground truth. While these situations can be difficult for automated evaluation systems to identify, human evaluators could potentially pick up on these nuances. In this paper, we discuss the role of human evaluation and subjective interpretation to augment automated processes when identifying biases in LLMs as part of a human-centred approach to evaluate these models.

Submitted 17 May, 2024; originally announced May 2024.

Comments: HEAL Workshop at CHI Conference on Human Factors in Computing Systems, May 12, 2024, Honolulu, HI, USA

arXiv:2404.12558

Just Like Me: The Role of Opinions and Personal Experiences in The Perception of Explanations in Subjective Decision-Making

Authors: Sharon Ferguson, Paula Akemi Aoyagui, Young-Ho Kim, Anastasia Kuzminykh

Abstract: As large language models (LLMs) advance to produce human-like arguments in some contexts, the number of settings applicable for human-AI collaboration broadens. Specifically, we focus on subjective decision-making, where a decision is contextual, open to interpretation, and based on one's beliefs and values. In such cases, having multiple arguments and perspectives might be particularly useful for the decision-maker. Using subtle sexism online as an understudied application of subjective decision-making, we suggest that LLM output could effectively provide diverse argumentation to enrich subjective human decision-making. To evaluate the applicability of this case, we conducted an interview study (N=20) where participants evaluated the perceived authorship, relevance, convincingness, and trustworthiness of human and AI-generated explanation-text, generated in response to instances of subtle sexism from the internet. In this workshop paper, we focus on one troubling trend in our results related to opinions and experiences displayed in LLM argumentation. We found that participants rated explanations that contained these characteristics as more convincing and trustworthy, particularly so when those opinions and experiences aligned with their own opinions and experiences. We describe our findings, discuss the troubling role that confirmation bias plays, and bring attention to the ethical challenges surrounding the AI generation of human-like experiences.

Submitted 18 April, 2024; originally announced April 2024.

Comments: Presented at the Trust and Reliance in Evolving Human-AI Workflows (TREW) Workshop at CHI 2024

arXiv:2404.05103

doi 10.1145/3613905.3650921

Chart What I Say: Exploring Cross-Modality Prompt Alignment in AI-Assisted Chart Authoring

Authors: Nazar Ponochevnyi, Anastasia Kuzminykh

Abstract: Recent chart-authoring systems, such as Amazon Q in QuickSight and Copilot for Power BI, demonstrate an emergent focus on supporting natural language input to share meaningful insights from data through chart creation. Currently, chart-authoring systems tend to integrate voice input capabilities by relying on speech-to-text transcription, processing spoken and typed input similarly. However, cross-modality input comparisons in other interaction domains suggest that the structure of spoken and typed-in interactions could notably differ, reflecting variations in user expectations based on interface affordances. Thus, in this work, we compare spoken and typed instructions for chart creation. Findings suggest that while both text and voice instructions cover chart elements and element organization, voice descriptions have a variety of command formats, element characteristics, and complex linguistic features. Based on these findings, we developed guidelines for designing voice-based authoring-oriented systems and additional features that can be incorporated into existing text-based systems to support speech modality.

Submitted 7 April, 2024; originally announced April 2024.

Comments: To be published in Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA 2024)

arXiv:2401.13970

doi 10.1145/3613905.3636287

CUI@CHI 2024: Building Trust in CUIs-From Design to Deployment

Authors: Smit Desai, Christina Wei, Jaisie Sin, Mateusz Dubiel, Nima Zargham, Shashank Ahire, Martin Porcheron, Anastasia Kuzminykh, Minha Lee, Heloisa Candello, Joel Fischer, Cosmin Munteanu, Benjamin R Cowan

Abstract: Conversational user interfaces (CUIs) have become an everyday technology for people the world over, as well as a booming area of research. Advances in voice synthesis and the emergence of chatbots powered by large language models (LLMs), notably ChatGPT, have pushed CUIs to the forefront of human-computer interaction (HCI) research and practice. Now that these technologies enable an elemental level of usability and user experience (UX), we must turn our attention to higher-order human factors: trust and reliance. In this workshop, we aim to bring together a multidisciplinary group of researchers and practitioners invested in the next phase of CUI design. Through keynotes, presentations, and breakout sessions, we will share our knowledge, identify cutting-edge resources, and fortify an international network of CUI scholars. In particular, we will engage with the complexity of trust and reliance as attitudes and behaviours that emerge when people interact with conversational agents.

Submitted 25 January, 2024; originally announced January 2024.

arXiv:2312.13581

Understanding the Role of Large Language Models in Personalizing and Scaffolding Strategies to Combat Academic Procrastination

Authors: Ananya Bhattacharjee, Yuchen Zeng, Sarah Yi Xu, Dana Kulzhabayeva, Minyi Ma, Rachel Kornfield, Syed Ishtiaque Ahmed, Alex Mariakakis, Mary P Czerwinski, Anastasia Kuzminykh, Michael Liut, Joseph Jay Williams

Abstract: Traditional interventions for academic procrastination often fail to capture the nuanced, individual-specific factors that underlie them. Large language models (LLMs) hold immense potential for addressing this gap by permitting open-ended inputs, including the ability to customize interventions to individuals' unique needs. However, user expectations and potential limitations of LLMs in this context remain underexplored. To address this, we conducted interviews and focus group discussions with 15 university students and 6 experts, during which a technology probe for generating personalized advice for managing procrastination was presented. Our results highlight the necessity for LLMs to provide structured, deadline-oriented steps and enhanced user support mechanisms. Additionally, our results surface the need for an adaptive approach to questioning based on factors like busyness. These findings offer crucial design implications for the development of LLM-based tools for managing procrastination while cautioning the use of LLMs for therapeutic guidance.

Submitted 21 December, 2023; originally announced December 2023.

arXiv:2310.13712

Impact of Guidance and Interaction Strategies for LLM Use on Learner Performance and Perception

Authors: Harsh Kumar, Ilya Musabirov, Mohi Reza, Jiakai Shi, Xinyuan Wang, Joseph Jay Williams, Anastasia Kuzminykh, Michael Liut

Abstract: Personalized chatbot-based teaching assistants can be crucial in addressing increasing classroom sizes, especially where direct teacher presence is limited. Large language models (LLMs) offer a promising avenue, with increasing research exploring their educational utility. However, the challenge lies not only in establishing the efficacy of LLMs but also in discerning the nuances of interaction between learners and these models, which impact learners' engagement and results. We conducted a formative study in an undergraduate computer science classroom (N=145) and a controlled experiment on Prolific (N=356) to explore the impact of four pedagogically informed guidance strategies on the learners' performance, confidence and trust in LLMs. Direct LLM answers marginally improved performance, while refining student solutions fostered trust. Structured guidance reduced random queries as well as instances of students copy-pasting assignment questions to the LLM. Our work highlights the role that teachers can play in shaping LLM-supported learning environments.

Submitted 23 January, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.00117

doi 10.1145/3613904.3641899

ABScribe: Rapid Exploration & Organization of Multiple Writing Variations in Human-AI Co-Writing Tasks using Large Language Models

Authors: Mohi Reza, Nathan Laundry, Ilya Musabirov, Peter Dushniku, Zhi Yuan "Michael" Yu, Kashish Mittal, Tovi Grossman, Michael Liut, Anastasia Kuzminykh, Joseph Jay Williams

Abstract: Exploring alternative ideas by rewriting text is integral to the writing process. State-of-the-art Large Language Models (LLMs) can simplify writing variation generation. However, current interfaces pose challenges for simultaneous consideration of multiple variations: creating new variations without overwriting text can be difficult, and pasting them sequentially can clutter documents, increasing workload and disrupting writers' flow. To tackle this, we present ABScribe, an interface that supports rapid, yet visually structured, exploration and organization of writing variations in human-AI co-writing tasks. With ABScribe, users can swiftly modify variations using LLM prompts, which are auto-converted into reusable buttons. Variations are stored adjacently within text fields for rapid in-place comparisons using mouse-over interactions on a popup toolbar. Our user study with 12 writers shows that ABScribe significantly reduces task workload (d = 1.20, p < 0.001), enhances user perceptions of the revision process (d = 2.41, p < 0.001) compared to a popular baseline workflow, and provides insights into how writers explore variations using LLMs.

Submitted 27 March, 2024; v1 submitted 29 September, 2023; originally announced October 2023.

Comments: CHI 2024

arXiv:2302.05839

doi 10.1145/3544548.3580658

A Human-Centered Review of Algorithms in Decision-Making in Higher Education

Authors: Kelly McConvey, Shion Guha, Anastasia Kuzminykh

Abstract: The use of algorithms for decision-making in higher education is steadily growing, promising cost-savings to institutions and personalized service for students but also raising ethical challenges around surveillance, fairness, and interpretation of data. To address the lack of systematic understanding of how these algorithms are currently designed, we reviewed an extensive corpus of papers proposing algorithms for decision-making in higher education. We categorized them based on input data, computational method, and target outcome, and then investigated the interrelations of these factors with the application of human-centered lenses: theoretical, participatory, or speculative design. We found that the models are trending towards deep learning, and increased use of student personal data and protected attributes, with the target scope expanding towards automated decisions. However, despite the associated decrease in interpretability and explainability, current development predominantly fails to incorporate human-centered lenses. We discuss the challenges with these trends and advocate for a human-centered approach.

Submitted 11 February, 2023; originally announced February 2023.

Showing 1–8 of 8 results for author: Kuzminykh, A