CRAG -- Comprehensive RAG Benchmark

Authors: Xiao Yang, Kai Sun, Hao Xin, Yushi Sun, Nikita Bhalla, Xiangsen Chen, Sajal Choudhary, Rongze Daniel Gui, Ziran Will Jiang, Ziyu Jiang, Lingkun Kong, Brian Moran, Jiaqi Wang, Yifan Ethan Xu, An Yan, Chenyu Yang, Eting Yuan, Hanwen Zha, Nan Tang, Lei Chen, Nicolas Scheffer, Yue Liu, Nirav Shah, Rakesh Wanga, Anuj Kumar, et al. (2 additional authors not shown)

Abstract: Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate Large Language Model (LLM)'s deficiency in lack of knowledge. Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search. CRAG is designed to encapsulate a diverse array of questions across five domains and eight question categories, reflecting varied entity popularity from popular to long-tail, and temporal dynamisms ranging from years to seconds. Our evaluation on this benchmark highlights the gap to fully trustworthy QA. Whereas most advanced LLMs achieve <=34% accuracy on CRAG, adding RAG in a straightforward manner improves the accuracy only to 44%. State-of-the-art industry RAG solutions only answer 63% questions without any hallucination. CRAG also reveals much lower accuracy in answering questions regarding facts with higher dynamism, lower popularity, or higher complexity, suggesting future research directions. The CRAG benchmark laid the groundwork for a KDD Cup 2024 challenge, attracting thousands of participants and submissions within the first 50 days of the competition. We commit to maintaining CRAG to serve research communities in advancing RAG solutions and general QA solutions.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2403.03308

Know your footprint -- Evaluation of the professional carbon footprint for individual researchers in high energy physics and related fields

Authors: Valerie Lang, Naman Kumar Bhalla, Simran Gurdasani, Pardis Niknejadi

Abstract: Understanding the environmental impact of professional activities is becoming paramount in current times, especially within sectors that historically have had significant resource utilisation, such as High Energy Physics (HEP) and related fields. The young High Energy Physicists (yHEP) association launched the Know your footprint (Kyf) campaign to evaluate the CO$_\text{2}$-equivalent emissions generated by HEP-related research. This study delves into the carbon footprints associated with four distinct categories: Experiments, tied to extensive collaborations with substantial infrastructure; Institutional, representing the resource consumption of research institutes and universities; Computing, focusing on simulations and data analysis; and Travel, covering professional trips such as to conferences, meetings, and workshops. The findings in this assessment are integrated into a tool for self-evaluation, the Know-your-footprint (Kyf) calculator, which allows colleagues to assess their personal and professional footprint, and optionally share their data with the yHEP association. The aim of the Kyf campaign is to heighten awareness, foster sustainability, and inspire the community to adopt more environmentally-responsible research methodologies.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 16 pages, 5 figures

arXiv:2309.02298

Possible Extragalactic Origins of Five LMC Globular Clusters: Proper Motion Deviations in Gaia DR3

Authors: Tamojeet Roychowdhury, Navdha Bhalla

Abstract: We use kinematic data of proper motions from Gaia of forty-two globular and open clusters from Large Magellanic Cloud (LMC) to explore the possibility of them having extragalactic origins. We find the difference between the proper motions of cluster stars and a surrounding patch of young LMC stars in each case. We find five globular clusters towards the north-east showing a high difference (> 0.11 mas/yr, or > 25 km/s). We also examine the statistical significance of this difference taking into account both measurement errors of cluster and surrounding stars as well as inherent dispersion of stellar motions in the local galactic environment. The five globular clusters (NGC 2005, NGC 2210, NGC 1978, Hodge 3 and Hodge 11) have mean proper motions that lie outside the 85% confidence interval of the mean of surrounding young stars, with a clear outlier (NGC 1978 outside 99.96% confidence) whose difference cannot be accounted for by statistical noise. A young cluster (NGC 2100) also fitting the criteria is ruled out owing to contrary evidence from literature. This indicates a possible interaction with a dwarf galaxy resulting in the accretion/disruption in path of the five globular clusters, or possibly one or more past merger(s) of smaller galaxy/galaxies with LMC from its north-eastern region. This direction also coincides with the location of Tarantula Nebula, suggesting the possibility of the interaction event or merger having triggered its star formation activity.

Submitted 5 September, 2023; originally announced September 2023.

Comments: 10 pages, 5 figures, revised version submitted to MNRAS

arXiv:2306.07499

Improving Opinion-based Question Answering Systems Through Label Error Detection and Overwrite

Authors: Xiao Yang, Ahmed K. Mohamed, Shashank Jain, Stanislav Peshterliev, Debojeet Chatterjee, Hanwen Zha, Nikita Bhalla, Gagan Aneja, Pranab Mohanty

Abstract: Label error is a ubiquitous problem in annotated data. Large amounts of label error substantially degrades the quality of deep learning models. Existing methods to tackle the label error problem largely focus on the classification task, and either rely on task specific architecture or require non-trivial additional computations, which is undesirable or even unattainable for industry usage. In this paper, we propose LEDO: a model-agnostic and computationally efficient framework for Label Error Detection and Overwrite. LEDO is based on Monte Carlo Dropout combined with uncertainty metrics, and can be easily generalized to multiple tasks and data sets. Applying LEDO to an industry opinion-based question answering system demonstrates it is effective at improving accuracy in all the core models. Specifically, LEDO brings 1.1% MRR gain for the retrieval model, 1.5% PR AUC improvement for the machine reading comprehension model, and 0.9% rise in the Average Precision for the ranker, on top of the strong baselines with a large-scale social media dataset. Importantly, LEDO is computationally efficient compared to methods that require loss function change, and cost-effective as the resulting data can be used in the same continuous training pipeline for production. Further analysis shows that these gains come from an improved decision boundary after cleaning the label errors existed in the training data.

Submitted 12 June, 2023; originally announced June 2023.

arXiv:2111.14020

doi 10.1145/3539597.3570442

Local Edge Dynamics and Opinion Polarization

Authors: Nikita Bhalla, Adam Lechowicz, Cameron Musco

Abstract: The proliferation of social media platforms, recommender systems, and their joint societal impacts have prompted significant interest in opinion formation and evolution within social networks. We study how local edge dynamics can drive opinion polarization. In particular, we introduce a variant of the classic Friedkin-Johnsen opinion dynamics, augmented with a simple time-evolving network model. Edges are iteratively added or deleted according to simple rules, modeling decisions based on individual preferences and network recommendations. Via simulations on synthetic and real-world graphs, we find that the combined presence of two dynamics gives rise to high polarization: 1) confirmation bias -- i.e., the preference for nodes to connect to other nodes with similar expressed opinions and 2) friend-of-friend link recommendations, which encourage new connections between closely connected nodes. We show that our model is tractable to theoretical analysis, which helps explain how these local dynamics erode connectivity across opinion groups, affecting polarization and a related measure of disagreement across edges. Finally, we validate our model against real-world data, showing that our edge dynamics drive the structure of arbitrary graphs, including random graphs, to more closely resemble real social networks.

Submitted 8 December, 2022; v1 submitted 27 November, 2021; originally announced November 2021.

Comments: Accepted to WSDM 2023. 14 pages, 30 figures

arXiv:2105.08106

Multi-Modal Image Captioning for the Visually Impaired

Authors: Hiba Ahsan, Nikita Bhalla, Daivat Bhatt, Kaivankumar Shah

Abstract: One of the ways blind people understand their surroundings is by clicking images and relying on descriptions generated by image captioning systems. Current work on captioning images for the visually impaired do not use the textual data present in the image when generating captions. This problem is critical as many visual scenes contain text. Moreover, up to 21% of the questions asked by blind people about the images they click pertain to the text present in them. In this work, we propose altering AoANet, a state-of-the-art image captioning model, to leverage the text detected in the image as an input feature. In addition, we use a pointer-generator mechanism to copy the detected text to the caption when tokens need to be reproduced accurately. Our model outperforms AoANet on the benchmark dataset VizWiz, giving a 35% and 16.2% performance improvement on CIDEr and SPICE scores, respectively.

Submitted 17 May, 2021; originally announced May 2021.

Comments: 8 pages, 2 figures, 2 tables, accepted to NAACL-HLT SRW 2021

arXiv:1707.07201

doi 10.1080/10724117.2018.1524093

PRIMES STEP Plays Games

Authors: Pratik Alladi, Neel Bhalla, Tanya Khovanova, Nathan Sheffield, Eddie Song, William Sun, Andrew The, Alan Wang, Naor Wiesel, Kevin Zhang, Kevin Zhao

Abstract: A group of students in 7-9 grades are inventing combinatorial impartial games. The games are played on graphs, piles, and grids. We found winning positions, optimal strategies, and other interesting facts about the games.

Submitted 22 July, 2017; originally announced July 2017.

Comments: 12 pages, 1 figure

MSC Class: 91A46; 97A20

Showing 1–7 of 7 results for author: Bhalla, N