CRAG -- Comprehensive RAG Benchmark
Authors:
Xiao Yang,
Kai Sun,
Hao Xin,
Yushi Sun,
Nikita Bhalla,
Xiangsen Chen,
Sajal Choudhary,
Rongze Daniel Gui,
Ziran Will Jiang,
Ziyu Jiang,
Lingkun Kong,
Brian Moran,
Jiaqi Wang,
Yifan Ethan Xu,
An Yan,
Chenyu Yang,
Eting Yuan,
Hanwen Zha,
Nan Tang,
Lei Chen,
Nicolas Scheffer,
Yue Liu,
Nirav Shah,
Rakesh Wanga,
Anuj Kumar, et al. (2 additional authors not shown)
Abstract:
Retrieval-Augmented Generation (RAG) has recently emerged as a promising way to alleviate the knowledge deficiencies of Large Language Models (LLMs). Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search. CRAG is designed to encapsulate a diverse array of questions across five domains and eight question categories, reflecting varied entity popularity from popular to long-tail, and temporal dynamism ranging from years to seconds. Our evaluation on this benchmark highlights the gap to fully trustworthy QA. Whereas most advanced LLMs achieve <=34% accuracy on CRAG, adding RAG in a straightforward manner improves the accuracy only to 44%. State-of-the-art industry RAG solutions answer only 63% of questions without any hallucination. CRAG also reveals much lower accuracy in answering questions regarding facts with higher dynamism, lower popularity, or higher complexity, suggesting future research directions. The CRAG benchmark laid the groundwork for a KDD Cup 2024 challenge, attracting thousands of participants and submissions within the first 50 days of the competition. We commit to maintaining CRAG to serve research communities in advancing RAG solutions and general QA solutions.
Submitted 7 June, 2024;
originally announced June 2024.
Adaptive Line-Of-Sight guidance law based on vector fields path following for underactuated unmanned surface vehicle
Authors:
Jie Qi,
Ronghua Wanga,
Nailong Wu
Abstract:
This paper develops a methodology that enables an unmanned surface vehicle (USV) to track a planned path efficiently. It introduces a vector field-based adaptive line-of-sight guidance law (VFALOS) that improves the overall line-of-sight (LOS) guidance method by tracking trajectories accurately and minimizing the overshoot response time when the USV follows curved paths. These improvements contribute to faster convergence to the desired path, reduce oscillations, and can mitigate the effects of persistent external disturbances. The proposed guidance law is shown to exhibit k-exponential stability when converging to a desired path consisting of straight and curved lines. The results show that the proposed method effectively improves the accuracy with which the USV tracks the desired path while ensuring the safety of USV operation.
Submitted 5 April, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.