subscribe to arXiv mailings

Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models

Authors: Zhuo Chen, Jiawei Liu, Haotan Liu, Qikai Cheng, Fan Zhang, Wei Lu, Xiaozhong Liu

Abstract: Retrieval-Augmented Generation (RAG) is applied to solve hallucination problems and real-time constraints of large language models, but it also induces vulnerabilities against retrieval corruption attacks. Existing research mainly explores the unreliability of RAG in white-box and closed-domain QA tasks. In this paper, we aim to reveal the vulnerabilities of Retrieval-Enhanced Generative (RAG) mod… ▽ More Retrieval-Augmented Generation (RAG) is applied to solve hallucination problems and real-time constraints of large language models, but it also induces vulnerabilities against retrieval corruption attacks. Existing research mainly explores the unreliability of RAG in white-box and closed-domain QA tasks. In this paper, we aim to reveal the vulnerabilities of Retrieval-Enhanced Generative (RAG) models when faced with black-box attacks for opinion manipulation. We explore the impact of such attacks on user cognition and decision-making, providing new insight to enhance the reliability and security of RAG models. We manipulate the ranking results of the retrieval model in RAG with instruction and use these results as data to train a surrogate model. By employing adversarial retrieval attack methods to the surrogate model, black-box transfer attacks on RAG are further realized. Experiments conducted on opinion datasets across multiple topics show that the proposed attack strategy can significantly alter the opinion polarity of the content generated by RAG. This demonstrates the model's vulnerability and, more importantly, reveals the potential negative impact on user cognition and decision-making, making it easier to mislead users into accepting incorrect or biased information. △ Less

Submitted 18 July, 2024; originally announced July 2024.

Comments: 10 pages, 3 figures, under review

arXiv:2407.12504 [pdf, other]

Case2Code: Learning Inductive Reasoning with Synthetic Data

Authors: Yunfan Shao, Linyang Li, Yichuan Ma, Peiji Li, Demin Song, Qinyuan Cheng, Shimin Li, Xiaonan Li, Pengyu Wang, Qipeng Guo, Hang Yan, Xipeng Qiu, Xuanjing Huang, Dahua Lin

Abstract: Complex reasoning is an impressive ability shown by large language models (LLMs). Most LLMs are skilled in deductive reasoning, such as chain-of-thought prompting or iterative tool-using to solve challenging tasks step-by-step. In this paper, we hope to focus on evaluating and teaching LLMs to conduct inductive reasoning, that is, LLMs are supposed to infer underlying rules by observing examples o… ▽ More Complex reasoning is an impressive ability shown by large language models (LLMs). Most LLMs are skilled in deductive reasoning, such as chain-of-thought prompting or iterative tool-using to solve challenging tasks step-by-step. In this paper, we hope to focus on evaluating and teaching LLMs to conduct inductive reasoning, that is, LLMs are supposed to infer underlying rules by observing examples or sequential transformations. However, collecting large-scale and diverse human-generated inductive data is challenging. We focus on data synthesis in the code domain and propose a \textbf{Case2Code} task by exploiting the expressiveness and correctness of programs. Specifically, we collect a diverse set of executable programs, synthesize input-output transformations for each program, and force LLMs to infer the underlying code implementations based on the synthetic I/O cases. We first evaluate representative LLMs on the synthesized Case2Code task and demonstrate that the Case-to-code induction is challenging for LLMs. Then, we synthesize large-scale Case2Code training samples to train LLMs to perform inductive reasoning. Experimental results show that such induction training benefits not only in distribution Case2Code performance but also enhances various coding abilities of trained LLMs, demonstrating the great potential of learning inductive reasoning via synthetic data. △ Less

Submitted 17 July, 2024; originally announced July 2024.

arXiv:2407.12105 [pdf, other]

AeroHaptix: A Wearable Vibrotactile Feedback System for Enhancing Collision Avoidance in UAV Teleoperation

Authors: Bingjian Huang, Zhecheng Wang, Qilong Cheng, Siyi Ren, Hanfeng Cai, Antonio Alvarez Valdivia, Karthik Mahadevan, Daniel Wigdor

Abstract: Haptic feedback enhances collision avoidance by providing directional obstacle information to operators in unmanned aerial vehicle (UAV) teleoperation. However, such feedback is often rendered via haptic joysticks, which are unfamiliar to UAV operators and limited to single-directional force feedback. Additionally, the direct coupling of the input device and the feedback method diminishes the oper… ▽ More Haptic feedback enhances collision avoidance by providing directional obstacle information to operators in unmanned aerial vehicle (UAV) teleoperation. However, such feedback is often rendered via haptic joysticks, which are unfamiliar to UAV operators and limited to single-directional force feedback. Additionally, the direct coupling of the input device and the feedback method diminishes the operators' control authority and causes oscillatory movements. To overcome these limitations, we propose AeroHaptix, a wearable haptic feedback system that uses high-resolution vibrations to communicate multiple obstacle directions simultaneously. The vibrotactile actuators' layout was optimized based on a perceptual study to eliminate perceptual biases and achieve uniform spatial coverage. A novel rendering algorithm, MultiCBF, was adapted from control barrier functions to support multi-directional feedback. System evaluation showed that AeroHaptix effectively reduced collisions in complex environment, and operators reported significantly lower physical workload, improved situational awareness, and increased control authority. △ Less

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.00600 [pdf, other]

GenderBias-\emph{VL}: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing

Authors: Yisong Xiao, Aishan Liu, QianJia Cheng, Zhenfei Yin, Siyuan Liang, Jiapeng Li, Jing Shao, Xianglong Liu, Dacheng Tao

Abstract: Large Vision-Language Models (LVLMs) have been widely adopted in various applications; however, they exhibit significant gender biases. Existing benchmarks primarily evaluate gender bias at the demographic group level, neglecting individual fairness, which emphasizes equal treatment of similar individuals. This research gap limits the detection of discriminatory behaviors, as individual fairness o… ▽ More Large Vision-Language Models (LVLMs) have been widely adopted in various applications; however, they exhibit significant gender biases. Existing benchmarks primarily evaluate gender bias at the demographic group level, neglecting individual fairness, which emphasizes equal treatment of similar individuals. This research gap limits the detection of discriminatory behaviors, as individual fairness offers a more granular examination of biases that group fairness may overlook. For the first time, this paper introduces the GenderBias-\emph{VL} benchmark to evaluate occupation-related gender bias in LVLMs using counterfactual visual questions under individual fairness criteria. To construct this benchmark, we first utilize text-to-image diffusion models to generate occupation images and their gender counterfactuals. Subsequently, we generate corresponding textual occupation options by identifying stereotyped occupation pairs with high semantic similarity but opposite gender proportions in real-world statistics. This method enables the creation of large-scale visual question counterfactuals to expose biases in LVLMs, applicable in both multimodal and unimodal contexts through modifying gender attributes in specific modalities. Overall, our GenderBias-\emph{VL} benchmark comprises 34,581 visual question counterfactual pairs, covering 177 occupations. Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs (\eg, LLaVA) and state-of-the-art commercial APIs, including GPT-4o and Gemini-Pro. Our findings reveal widespread gender biases in existing LVLMs. Our benchmark offers: (1) a comprehensive dataset for occupation-related gender bias evaluation; (2) an up-to-date leaderboard on LVLM biases; and (3) a nuanced understanding of the biases presented by these models. \footnote{The dataset and code are available at the \href{https://genderbiasvl.github.io/}{website}.} △ Less

Submitted 30 June, 2024; originally announced July 2024.

Comments: 9 pages, 4 figures

arXiv:2406.15720 [pdf, other]

Scaling Laws for Fact Memorization of Large Language Models

Authors: Xingyu Lu, Xiaonan Li, Qinyuan Cheng, Kai Ding, Xuanjing Huang, Xipeng Qiu

Abstract: Fact knowledge memorization is crucial for Large Language Models (LLM) to generate factual and reliable responses. However, the behaviors of LLM fact memorization remain under-explored. In this paper, we analyze the scaling laws for LLM's fact knowledge and LLMs' behaviors of memorizing different types of facts. We find that LLMs' fact knowledge capacity has a linear and negative exponential law r… ▽ More Fact knowledge memorization is crucial for Large Language Models (LLM) to generate factual and reliable responses. However, the behaviors of LLM fact memorization remain under-explored. In this paper, we analyze the scaling laws for LLM's fact knowledge and LLMs' behaviors of memorizing different types of facts. We find that LLMs' fact knowledge capacity has a linear and negative exponential law relationship with model size and training epochs, respectively. Estimated by the built scaling law, memorizing the whole Wikidata's facts requires training an LLM with 1000B non-embed parameters for 100 epochs, suggesting that using LLMs to memorize all public facts is almost implausible for a general pre-training setting. Meanwhile, we find that LLMs can generalize on unseen fact knowledge and its scaling law is similar to general pre-training. Additionally, we analyze the compatibility and preference of LLMs' fact memorization. For compatibility, we find LLMs struggle with memorizing redundant facts in a unified way. Only when correlated facts have the same direction and structure, the LLM can compatibly memorize them. This shows the inefficiency of LLM memorization for redundant facts. For preference, the LLM pays more attention to memorizing more frequent and difficult facts, and the subsequent facts can overwrite prior facts' memorization, which significantly hinders low-frequency facts memorization. Our findings reveal the capacity and characteristics of LLMs' fact knowledge learning, which provide directions for LLMs' fact knowledge augmentation. △ Less

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.15279 [pdf, other]

Cross-Modality Safety Alignment

Authors: Siyin Wang, Xingsong Ye, Qinyuan Cheng, Junwen Duan, Shimin Li, Jinlan Fu, Xipeng Qiu, Xuanjing Huang

Abstract: As Artificial General Intelligence (AGI) becomes increasingly integrated into various facets of human life, ensuring the safety and ethical alignment of such systems is paramount. Previous studies primarily focus on single-modality threats, which may not suffice given the integrated and complex nature of cross-modality interactions. We introduce a novel safety alignment challenge called Safe Input… ▽ More As Artificial General Intelligence (AGI) becomes increasingly integrated into various facets of human life, ensuring the safety and ethical alignment of such systems is paramount. Previous studies primarily focus on single-modality threats, which may not suffice given the integrated and complex nature of cross-modality interactions. We introduce a novel safety alignment challenge called Safe Inputs but Unsafe Output (SIUO) to evaluate cross-modality safety alignment. Specifically, it considers cases where single modalities are safe independently but could potentially lead to unsafe or unethical outputs when combined. To empirically investigate this problem, we developed the SIUO, a cross-modality benchmark encompassing 9 critical safety domains, such as self-harm, illegal activities, and privacy violations. Our findings reveal substantial safety vulnerabilities in both closed- and open-source LVLMs, such as GPT-4V and LLaVA, underscoring the inadequacy of current models to reliably interpret and respond to complex, real-world scenarios. △ Less

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.13990 [pdf, other]

Inference-Time Decontamination: Reusing Leaked Benchmarks for Large Language Model Evaluation

Authors: Qin Zhu, Qingyuan Cheng, Runyu Peng, Xiaonan Li, Tengxiao Liu, Ru Peng, Xipeng Qiu, Xuanjing Huang

Abstract: The training process of large language models (LLMs) often involves varying degrees of test data contamination. Although current LLMs are achieving increasingly better performance on various benchmarks, their performance in practical applications does not always match their benchmark results. Leakage of benchmarks can prevent the accurate assessment of LLMs' true performance. However, constructing… ▽ More The training process of large language models (LLMs) often involves varying degrees of test data contamination. Although current LLMs are achieving increasingly better performance on various benchmarks, their performance in practical applications does not always match their benchmark results. Leakage of benchmarks can prevent the accurate assessment of LLMs' true performance. However, constructing new benchmarks is costly, labor-intensive and still carries the risk of leakage. Therefore, in this paper, we ask the question, Can we reuse these leaked benchmarks for LLM evaluation? We propose Inference-Time Decontamination (ITD) to address this issue by detecting and rewriting leaked samples without altering their difficulties. ITD can mitigate performance inflation caused by memorizing leaked benchmarks. Our proof-of-concept experiments demonstrate that ITD reduces inflated accuracy by 22.9% on GSM8K and 19.0% on MMLU. On MMLU, using Inference-time Decontamination can lead to a decrease in the results of Phi3 and Mistral by 6.7% and 3.6% respectively. We hope that ITD can provide more truthful evaluation results for large language models. △ Less

Submitted 23 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.13007 [pdf, other]

NTIRE 2024 Challenge on Night Photography Rendering

Authors: Egor Ershov, Artyom Panshin, Oleg Karasev, Sergey Korchagin, Shepelev Lev, Alexandr Startsev, Daniil Vladimirov, Ekaterina Zaychenkova, Nikola Banić, Dmitrii Iarchuk, Maria Efimova, Radu Timofte, Arseniy Terekhin, Shuwei Yue, Yuyang Liu, Minchen Wei, Lu Xu, Chao Zhang, Yasi Wang, Furkan Kınlı, Doğa Yılmaz, Barış Özcan, Furkan Kıraç, Shuai Liu, Jingyuan Xiao , et al. (25 additional authors not shown)

Abstract: This paper presents a review of the NTIRE 2024 challenge on night photography rendering. The goal of the challenge was to find solutions that process raw camera images taken in nighttime conditions, and thereby produce a photo-quality output images in the standard RGB (sRGB) space. Unlike the previous year's competition, the challenge images were collected with a mobile phone and the speed of algo… ▽ More This paper presents a review of the NTIRE 2024 challenge on night photography rendering. The goal of the challenge was to find solutions that process raw camera images taken in nighttime conditions, and thereby produce a photo-quality output images in the standard RGB (sRGB) space. Unlike the previous year's competition, the challenge images were collected with a mobile phone and the speed of algorithms was also measured alongside the quality of their output. To evaluate the results, a sufficient number of viewers were asked to assess the visual quality of the proposed solutions, considering the subjective nature of the task. There were 2 nominations: quality and efficiency. Top 5 solutions in terms of output quality were sorted by evaluation time (see Fig. 1). The top ranking participants' solutions effectively represent the state-of-the-art in nighttime photography rendering. More results can be found at https://nightimaging.org. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: 10 pages, 10 figures

arXiv:2406.12847 [pdf, other]

ChangeViT: Unleashing Plain Vision Transformers for Change Detection

Authors: Duowang Zhu, Xiaohu Huang, Haiyan Huang, Zhenfeng Shao, Qimin Cheng

Abstract: Change detection in remote sensing images is essential for tracking environmental changes on the Earth's surface. Despite the success of vision transformers (ViTs) as backbones in numerous computer vision applications, they remain underutilized in change detection, where convolutional neural networks (CNNs) continue to dominate due to their powerful feature extraction capabilities. In this paper,… ▽ More Change detection in remote sensing images is essential for tracking environmental changes on the Earth's surface. Despite the success of vision transformers (ViTs) as backbones in numerous computer vision applications, they remain underutilized in change detection, where convolutional neural networks (CNNs) continue to dominate due to their powerful feature extraction capabilities. In this paper, our study uncovers ViTs' unique advantage in discerning large-scale changes, a capability where CNNs fall short. Capitalizing on this insight, we introduce ChangeViT, a framework that adopts a plain ViT backbone to enhance the performance of large-scale changes. This framework is supplemented by a detail-capture module that generates detailed spatial features and a feature injector that efficiently integrates fine-grained spatial information into high-level semantic learning. The feature integration ensures that ChangeViT excels in both detecting large-scale changes and capturing fine-grained details, providing comprehensive change detection across diverse scales. Without bells and whistles, ChangeViT achieves state-of-the-art performance on three popular high-resolution datasets (i.e., LEVIR-CD, WHU-CD, and CLCD) and one low-resolution dataset (i.e., OSCD), which underscores the unleashed potential of plain ViTs for change detection. Furthermore, thorough quantitative and qualitative analyses validate the efficacy of the introduced modules, solidifying the effectiveness of our approach. The source code is available at https://github.com/zhuduowang/ChangeViT. △ Less

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12534 [pdf, other]

Unified Active Retrieval for Retrieval Augmented Generation

Authors: Qinyuan Cheng, Xiaonan Li, Shimin Li, Qin Zhu, Zhangyue Yin, Yunfan Shao, Linyang Li, Tianxiang Sun, Hang Yan, Xipeng Qiu

Abstract: In Retrieval-Augmented Generation (RAG), retrieval is not always helpful and applying it to every instruction is sub-optimal. Therefore, determining whether to retrieve is crucial for RAG, which is usually referred to as Active Retrieval. However, existing active retrieval methods face two challenges: 1. They usually rely on a single criterion, which struggles with handling various types of instru… ▽ More In Retrieval-Augmented Generation (RAG), retrieval is not always helpful and applying it to every instruction is sub-optimal. Therefore, determining whether to retrieve is crucial for RAG, which is usually referred to as Active Retrieval. However, existing active retrieval methods face two challenges: 1. They usually rely on a single criterion, which struggles with handling various types of instructions. 2. They depend on specialized and highly differentiated procedures, and thus combining them makes the RAG system more complicated and leads to higher response latency. To address these challenges, we propose Unified Active Retrieval (UAR). UAR contains four orthogonal criteria and casts them into plug-and-play classification tasks, which achieves multifaceted retrieval timing judgements with negligible extra inference cost. We further introduce the Unified Active Retrieval Criteria (UAR-Criteria), designed to process diverse active retrieval scenarios through a standardized procedure. Experiments on four representative types of user instructions show that UAR significantly outperforms existing work on the retrieval timing judgement and the performance of downstream tasks, which shows the effectiveness of UAR and its helpfulness to downstream tasks. △ Less

Submitted 27 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.04419 [pdf, other]

TSCMamba: Mamba Meets Multi-View Learning for Time Series Classification

Authors: Md Atik Ahamed, Qiang Cheng

Abstract: Time series classification (TSC) on multivariate time series is a critical problem. We propose a novel multi-view approach integrating frequency-domain and time-domain features to provide complementary contexts for TSC. Our method fuses continuous wavelet transform spectral features with temporal convolutional or multilayer perceptron features. We leverage the Mamba state space model for efficient… ▽ More Time series classification (TSC) on multivariate time series is a critical problem. We propose a novel multi-view approach integrating frequency-domain and time-domain features to provide complementary contexts for TSC. Our method fuses continuous wavelet transform spectral features with temporal convolutional or multilayer perceptron features. We leverage the Mamba state space model for efficient and scalable sequence modeling. We also introduce a novel tango scanning scheme to better model sequence relationships. Experiments on 10 standard benchmark datasets demonstrate our approach achieves an average 6.45% accuracy improvement over state-of-the-art TSC models. △ Less

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.04145 [pdf, other]

Every Answer Matters: Evaluating Commonsense with Probabilistic Measures

Authors: Qi Cheng, Michael Boratko, Pranay Kumar Yelugam, Tim O'Gorman, Nalini Singh, Andrew McCallum, Xiang Lorraine Li

Abstract: Large language models have demonstrated impressive performance on commonsense tasks; however, these tasks are often posed as multiple-choice questions, allowing models to exploit systematic biases. Commonsense is also inherently probabilistic with multiple correct answers. The purpose of "boiling water" could be making tea and cooking, but it also could be killing germs. Existing tasks do not capt… ▽ More Large language models have demonstrated impressive performance on commonsense tasks; however, these tasks are often posed as multiple-choice questions, allowing models to exploit systematic biases. Commonsense is also inherently probabilistic with multiple correct answers. The purpose of "boiling water" could be making tea and cooking, but it also could be killing germs. Existing tasks do not capture the probabilistic nature of common sense. To this end, we present commonsense frame completion (CFC), a new generative task that evaluates common sense via multiple open-ended generations. We also propose a method of probabilistic evaluation that strongly correlates with human judgments. Humans drastically outperform strong language model baselines on our dataset, indicating this approach is both a challenging and useful evaluation of machine common sense. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: ACL 2024 Camera Ready

arXiv:2405.18458 [pdf]

Asymmetrical estimator for training grey-box deep photonic neural networks

Authors: Yizhi Wang, Minjia Chen, Chunhui Yao, Jie Ma, Ting Yan, Richard Penty, Qixiang Cheng

Abstract: Physical neural networks (PNNs) are emerging paradigms for neural network acceleration due to their high-bandwidth, in-propagation analogue processing. Despite the advantages of PNN for inference, training remains a challenge. The imperfect information of the physical transformation means the failure of conventional gradient-based updates from backpropagation (BP). Here, we present the asymmetrica… ▽ More Physical neural networks (PNNs) are emerging paradigms for neural network acceleration due to their high-bandwidth, in-propagation analogue processing. Despite the advantages of PNN for inference, training remains a challenge. The imperfect information of the physical transformation means the failure of conventional gradient-based updates from backpropagation (BP). Here, we present the asymmetrical training (AT) method, which treats the PNN structure as a grey box. AT performs training while only knowing the last layer output and neuron topological connectivity of a deep neural network structure, not requiring information about the physical control-transformation mapping. We experimentally demonstrated the AT method on deep grey-box PNNs implemented by uncalibrated photonic integrated circuits (PICs), improving the classification accuracy of Iris flower and modified MNIST hand-written digits from random guessing to near theoretical maximum. We also showcased the consistently enhanced performance of AT over BP for different datasets, including MNIST, fashion-MNIST, and Kuzushiji-MNIST. The AT method demonstrated successful training with minimal hardware overhead and reduced computational overhead, serving as a robust light-weight training alternative to fully explore the advantages of physical computation. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: 17 pages, 5 figures

MSC Class: 78-05

arXiv:2405.13336 [pdf, other]

SIGGesture: Generalized Co-Speech Gesture Synthesis via Semantic Injection with Large-Scale Pre-Training Diffusion Models

Authors: Qingrong Cheng, Xu Li, Xinghui Fu

Abstract: The automated synthesis of high-quality 3D gestures from speech is of significant value in virtual humans and gaming. Previous methods focus on synthesizing gestures that are synchronized with speech rhythm, yet they frequently overlook the inclusion of semantic gestures. These are sparse and follow a long-tailed distribution across the gesture sequence, making them difficult to learn in an end-to… ▽ More The automated synthesis of high-quality 3D gestures from speech is of significant value in virtual humans and gaming. Previous methods focus on synthesizing gestures that are synchronized with speech rhythm, yet they frequently overlook the inclusion of semantic gestures. These are sparse and follow a long-tailed distribution across the gesture sequence, making them difficult to learn in an end-to-end manner. Moreover, generating gestures, rhythmically aligned with speech, faces a significant issue that cannot be generalized to in-the-wild speeches. To address these issues, we introduce SIGGesture, a novel diffusion-based approach for synthesizing realistic gestures that are of both high quality and semantically pertinent. Specifically, we firstly build a strong diffusion-based foundation model for rhythmical gesture synthesis by pre-training it on a collected large-scale dataset with pseudo labels. Secondly, we leverage the powerful generalization capabilities of Large Language Models (LLMs) to generate proper semantic gestures for the various speech content. Finally, we propose a semantic injection module to infuse semantic information into the synthesized results during diffusion reverse process. Extensive experiments demonstrate that the proposed SIGGesture significantly outperforms existing baselines and shows excellent generalization and controllability. △ Less

Submitted 22 May, 2024; originally announced May 2024.

Comments: 18 pages, 5 figures

ACM Class: I.2.6

arXiv:2405.12939 [pdf, other]

Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models

Authors: Zhangyue Yin, Qiushi Sun, Qipeng Guo, Zhiyuan Zeng, Xiaonan Li, Tianxiang Sun, Cheng Chang, Qinyuan Cheng, Ding Wang, Xiaofeng Mou, Xipeng Qiu, XuanJing Huang

Abstract: Recent advancements in Chain-of-Thought prompting have facilitated significant breakthroughs for Large Language Models (LLMs) in complex reasoning tasks. Current research enhances the reasoning performance of LLMs by sampling multiple reasoning chains and ensembling based on the answer frequency. However, this approach fails in scenarios where the correct answers are in the minority. We identify t… ▽ More Recent advancements in Chain-of-Thought prompting have facilitated significant breakthroughs for Large Language Models (LLMs) in complex reasoning tasks. Current research enhances the reasoning performance of LLMs by sampling multiple reasoning chains and ensembling based on the answer frequency. However, this approach fails in scenarios where the correct answers are in the minority. We identify this as a primary factor constraining the reasoning capabilities of LLMs, a limitation that cannot be resolved solely based on the predicted answers. To address this shortcoming, we introduce a hierarchical reasoning aggregation framework AoR (Aggregation of Reasoning), which selects answers based on the evaluation of reasoning chains. Additionally, AoR incorporates dynamic sampling, adjusting the number of reasoning chains in accordance with the complexity of the task. Experimental results on a series of complex reasoning tasks show that AoR outperforms prominent ensemble methods. Further analysis reveals that AoR not only adapts various LLMs but also achieves a superior performance ceiling when compared to current methods. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: 17 pages, 14 figures, accepted by LREC-COLING 2024

arXiv:2404.19534 [pdf, other]

MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang , et al. (38 additional authors not shown)

Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra… ▽ More The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/. △ Less

Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

arXiv:2404.07108 [pdf, other]

From Model-centered to Human-Centered: Revision Distance as a Metric for Text Evaluation in LLMs-based Applications

Authors: Yongqiang Ma, Lizhi Qing, Jiawei Liu, Yangyang Kang, Yue Zhang, Wei Lu, Xiaozhong Liu, Qikai Cheng

Abstract: Evaluating large language models (LLMs) is fundamental, particularly in the context of practical applications. Conventional evaluation methods, typically designed primarily for LLM development, yield numerical scores that ignore the user experience. Therefore, our study shifts the focus from model-centered to human-centered evaluation in the context of AI-powered writing assistance applications. O… ▽ More Evaluating large language models (LLMs) is fundamental, particularly in the context of practical applications. Conventional evaluation methods, typically designed primarily for LLM development, yield numerical scores that ignore the user experience. Therefore, our study shifts the focus from model-centered to human-centered evaluation in the context of AI-powered writing assistance applications. Our proposed metric, termed ``Revision Distance,'' utilizes LLMs to suggest revision edits that mimic the human writing process. It is determined by counting the revision edits generated by LLMs. Benefiting from the generated revision edit details, our metric can provide a self-explained text evaluation result in a human-understandable manner beyond the context-independent score. Our results show that for the easy-writing task, ``Revision Distance'' is consistent with established metrics (ROUGE, Bert-score, and GPT-score), but offers more insightful, detailed feedback and better distinguishes between texts. Moreover, in the context of challenging academic writing tasks, our metric still delivers reliable evaluations where other metrics tend to struggle. Furthermore, our metric also holds significant potential for scenarios lacking reference texts. △ Less

Submitted 10 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

Comments: 9 pages, 2 figures, under review

arXiv:2404.01624 [pdf]

Intelligent Optimization of Mine Environmental Damage Assessment and Repair Strategies Based on Deep Learning

Authors: Qishuo Cheng

Abstract: In recent decades, financial quantification has emerged and matured rapidly. For financial institutions such as funds, investment institutions are increasingly dissatisfied with the situation of passively constructing investment portfolios with average market returns, and are paying more and more attention to active quantitative strategy investment portfolios. This requires the introduction of act… ▽ More In recent decades, financial quantification has emerged and matured rapidly. For financial institutions such as funds, investment institutions are increasingly dissatisfied with the situation of passively constructing investment portfolios with average market returns, and are paying more and more attention to active quantitative strategy investment portfolios. This requires the introduction of active stock investment fund management models. Currently, in my country's stock fund investment market, there are many active quantitative investment strategies, and the algorithms used vary widely, such as SVM, random forest, RNN recurrent memory network, etc. This article focuses on this trend, using the emerging LSTM-GRU gate-controlled long short-term memory network model in the field of financial stock investment as a basis to build a set of active investment stock strategies, and combining it with SVM, which has been widely used in the field of quantitative stock investment. Comparing models such as RNN, theoretically speaking, compared to SVM that simply relies on kernel functions for high-order mapping and classification of data, neural network algorithms such as RNN and LSTM-GRU have better principles and are more suitable for processing financial stock data. Then, through multiple By comparison, it was finally found that the LSTM- GRU gate-controlled long short-term memory network has a better accuracy. By selecting the LSTM-GRU algorithm to construct a trading strategy based on the Shanghai and Shenzhen 300 Index constituent stocks, the parameters were adjusted and the neural layer connection was adjusted. Finally, It has significantly outperformed the benchmark index CSI 300 over the long term. The conclusion of this article is that the research results can provide certain quantitative strategy references for financial institutions to construct active stock investment portfolios. △ Less

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.09898 [pdf, other]

TimeMachine: A Time Series is Worth 4 Mambas for Long-term Forecasting

Authors: Md Atik Ahamed, Qiang Cheng

Abstract: Long-term time-series forecasting remains challenging due to the difficulty in capturing long-term dependencies, achieving linear scalability, and maintaining computational efficiency. We introduce TimeMachine, an innovative model that leverages Mamba, a state-space model, to capture long-term dependencies in multivariate time series data while maintaining linear scalability and small memory footp… ▽ More Long-term time-series forecasting remains challenging due to the difficulty in capturing long-term dependencies, achieving linear scalability, and maintaining computational efficiency. We introduce TimeMachine, an innovative model that leverages Mamba, a state-space model, to capture long-term dependencies in multivariate time series data while maintaining linear scalability and small memory footprints. TimeMachine exploits the unique properties of time series data to produce salient contextual cues at multi-scales and leverage an innovative integrated quadruple-Mamba architecture to unify the handling of channel-mixing and channel-independence situations, thus enabling effective selection of contents for prediction against global and local contexts at different scales. Experimentally, TimeMachine achieves superior performance in prediction accuracy, scalability, and memory efficiency, as extensively validated using benchmark datasets. Code availability: https://github.com/Atik-Ahamed/TimeMachine △ Less

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.02866 [pdf]

Unlocking Electro-optic Resonant Phase Shifting for Multi-dimensional, Ultra-dynamic Photonic Switches

Authors: Lingzhi Luo, Rui Ma, Richard V. Penty, Qixiang Cheng

Abstract: Optical circuit switching is connection-oriented, being deterministic through the reservation of a complete wavelength channel or spatial path for a certain period. However, this comes at a trade-off against link dynamics, and overall capacity can thus be constrained by the time slot reservations, especially for switches with microsecond- to millisecond-scale reconfiguration times. For data-intens… ▽ More Optical circuit switching is connection-oriented, being deterministic through the reservation of a complete wavelength channel or spatial path for a certain period. However, this comes at a trade-off against link dynamics, and overall capacity can thus be constrained by the time slot reservations, especially for switches with microsecond- to millisecond-scale reconfiguration times. For data-intensive applications, the communication patterns associated with random data sets typically yield short-lived flows. This situation calls for a new multi-dimensional switching paradigm that fully exploits not only the space and wavelength domains but also with nanosecond-scale reconfigurable capability in the time domain to enable ultra-dynamic links. In this work, we focus on the exploitation of micro-ring resonant phase shifters (RPSs) that are wavelength selective for optical switching in a single plane. By proposing an innovative analytical method with transmission circle chart, we fully unlock the power of RPS with nanosecond-scale reconfigurability and the capability to arbitrarily manipulate its phase and amplitude. Such a compact model offers fresh insights into designs with under and critically coupled RPSs beyond the commonly explored over-coupling condition. This creates not only versatile switch elements but also perfect absorbers for robust multi-wavelength operations. The proposed device can bring about a breakthrough in the optical switching capacity that potentially addresses the challenges faced by modern data center networks, as well as other photonic circuits for high-throughput signal processing. △ Less

Submitted 5 March, 2024; originally announced March 2024.

Comments: 10 pages

arXiv:2403.02757 [pdf, other]

In-Memory Learning: A Declarative Learning Framework for Large Language Models

Authors: Bo Wang, Tianxiang Sun, Hang Yan, Siyin Wang, Qingyuan Cheng, Xipeng Qiu

Abstract: The exploration of whether agents can align with their environment without relying on human-labeled data presents an intriguing research topic. Drawing inspiration from the alignment process observed in intelligent organisms, where declarative memory plays a pivotal role in summarizing past experiences, we propose a novel learning framework. The agents adeptly distill insights from past experience… ▽ More The exploration of whether agents can align with their environment without relying on human-labeled data presents an intriguing research topic. Drawing inspiration from the alignment process observed in intelligent organisms, where declarative memory plays a pivotal role in summarizing past experiences, we propose a novel learning framework. The agents adeptly distill insights from past experiences, refining and updating existing notes to enhance their performance in the environment. This entire process transpires within the memory components and is implemented through natural language, so we character this framework as In-memory Learning. We also delve into the key features of benchmarks designed to evaluate the self-improvement process. Through systematic experiments, we demonstrate the effectiveness of our framework and provide insights into this problem. △ Less

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.02232 [pdf]

doi 10.62051/ijcsit.v2n1.01

Comprehensive evaluation of Mal-API-2019 dataset by machine learning in malware detection

Authors: Zhenglin Li, Haibei Zhu, Houze Liu, Jintong Song, Qishuo Cheng

Abstract: This study conducts a thorough examination of malware detection using machine learning techniques, focusing on the evaluation of various classification models using the Mal-API-2019 dataset. The aim is to advance cybersecurity capabilities by identifying and mitigating threats more effectively. Both ensemble and non-ensemble machine learning methods, such as Random Forest, XGBoost, K Nearest Neigh… ▽ More This study conducts a thorough examination of malware detection using machine learning techniques, focusing on the evaluation of various classification models using the Mal-API-2019 dataset. The aim is to advance cybersecurity capabilities by identifying and mitigating threats more effectively. Both ensemble and non-ensemble machine learning methods, such as Random Forest, XGBoost, K Nearest Neighbor (KNN), and Neural Networks, are explored. Special emphasis is placed on the importance of data pre-processing techniques, particularly TF-IDF representation and Principal Component Analysis, in improving model performance. Results indicate that ensemble methods, particularly Random Forest and XGBoost, exhibit superior accuracy, precision, and recall compared to others, highlighting their effectiveness in malware detection. The paper also discusses limitations and potential future directions, emphasizing the need for continuous adaptation to address the evolving nature of malware. This research contributes to ongoing discussions in cybersecurity and provides practical insights for developing more robust malware detection systems in the digital era. △ Less

Submitted 25 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Journal ref: International Journal of Computer Science and Information Technology, 2024, 2(1), 1-9

arXiv:2403.01209 [pdf, other]

Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning

Authors: Shuo Yang, Zirui Shang, Yongqi Wang, Derong Deng, Hongwei Chen, Qiyuan Cheng, Xinxiao Wu

Abstract: This paper proposes a novel framework for multi-label image recognition without any training data, called data-free framework, which uses knowledge of pre-trained Large Language Model (LLM) to learn prompts to adapt pretrained Vision-Language Model (VLM) like CLIP to multilabel classification. Through asking LLM by well-designed questions, we acquire comprehensive knowledge about characteristics a… ▽ More This paper proposes a novel framework for multi-label image recognition without any training data, called data-free framework, which uses knowledge of pre-trained Large Language Model (LLM) to learn prompts to adapt pretrained Vision-Language Model (VLM) like CLIP to multilabel classification. Through asking LLM by well-designed questions, we acquire comprehensive knowledge about characteristics and contexts of objects, which provides valuable text descriptions for learning prompts. Then we propose a hierarchical prompt learning method by taking the multi-label dependency into consideration, wherein a subset of category-specific prompt tokens are shared when the corresponding objects exhibit similar attributes or are more likely to co-occur. Benefiting from the remarkable alignment between visual and linguistic semantics of CLIP, the hierarchical prompts learned from text descriptions are applied to perform classification of images during inference. Our framework presents a new way to explore the synergies between multiple pre-trained models for novel category recognition. Extensive experiments on three public datasets (MS-COCO, VOC2007, and NUS-WIDE) demonstrate that our method achieves better results than the state-of-the-art methods, especially outperforming the zero-shot multi-label recognition methods by 4.7% in mAP on MS-COCO. △ Less

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2402.17194 [pdf]

The Random Forest Model for Analyzing and Forecasting the US Stock Market in the Context of Smart Finance

Authors: Jiajian Zheng, Duan Xin, Qishuo Cheng, Miao Tian, Le Yang

Abstract: The stock market is a crucial component of the financial market, playing a vital role in wealth accumulation for investors, financing costs for listed companies, and the stable development of the national macroeconomy. Significant fluctuations in the stock market can damage the interests of stock investors and cause an imbalance in the industrial structure, which can interfere with the macro level… ▽ More The stock market is a crucial component of the financial market, playing a vital role in wealth accumulation for investors, financing costs for listed companies, and the stable development of the national macroeconomy. Significant fluctuations in the stock market can damage the interests of stock investors and cause an imbalance in the industrial structure, which can interfere with the macro level development of the national economy. The prediction of stock price trends is a popular research topic in academia. Predicting the three trends of stock pricesrising, sideways, and falling can assist investors in making informed decisions about buying, holding, or selling stocks. Establishing an effective forecasting model for predicting these trends is of substantial practical importance. This paper evaluates the predictive performance of random forest models combined with artificial intelligence on a test set of four stocks using optimal parameters. The evaluation considers both predictive accuracy and time efficiency. △ Less

Submitted 26 February, 2024; originally announced February 2024.

Comments: 10 pages, 8 figures

arXiv:2402.17191 [pdf]

AI-Driven Anonymization: Protecting Personal Data Privacy While Leveraging Machine Learning

Authors: Le Yang, Miao Tian, Duan Xin, Qishuo Cheng, Jiajian Zheng

Abstract: The development of artificial intelligence has significantly transformed people's lives. However, it has also posed a significant threat to privacy and security, with numerous instances of personal information being exposed online and reports of criminal attacks and theft. Consequently, the need to achieve intelligent protection of personal information through machine learning algorithms has becom… ▽ More The development of artificial intelligence has significantly transformed people's lives. However, it has also posed a significant threat to privacy and security, with numerous instances of personal information being exposed online and reports of criminal attacks and theft. Consequently, the need to achieve intelligent protection of personal information through machine learning algorithms has become a paramount concern. Artificial intelligence leverages advanced algorithms and technologies to effectively encrypt and anonymize personal data, enabling valuable data analysis and utilization while safeguarding privacy. This paper focuses on personal data privacy protection and the promotion of anonymity as its core research objectives. It achieves personal data privacy protection and detection through the use of machine learning's differential privacy protection algorithm. The paper also addresses existing challenges in machine learning related to privacy and personal data protection, offers improvement suggestions, and analyzes factors impacting datasets to enable timely personal data privacy detection and protection. △ Less

Submitted 26 February, 2024; originally announced February 2024.

Comments: 9 pages, 6 figures

arXiv:2402.15994 [pdf]

Optimizing Portfolio Management and Risk Assessment in Digital Assets Using Deep Learning for Predictive Analysis

Authors: Qishuo Cheng, Le Yang, Jiajian Zheng, Miao Tian, Duan Xin

Abstract: Portfolio management issues have been extensively studied in the field of artificial intelligence in recent years, but existing deep learning-based quantitative trading methods have some areas where they could be improved. First of all, the prediction mode of stocks is singular; often, only one trading expert is trained by a model, and the trading decision is solely based on the prediction results… ▽ More Portfolio management issues have been extensively studied in the field of artificial intelligence in recent years, but existing deep learning-based quantitative trading methods have some areas where they could be improved. First of all, the prediction mode of stocks is singular; often, only one trading expert is trained by a model, and the trading decision is solely based on the prediction results of the model. Secondly, the data source used by the model is relatively simple, and only considers the data of the stock itself, ignoring the impact of the whole market risk on the stock. In this paper, the DQN algorithm is introduced into asset management portfolios in a novel and straightforward way, and the performance greatly exceeds the benchmark, which fully proves the effectiveness of the DRL algorithm in portfolio management. This also inspires us to consider the complexity of financial problems, and the use of algorithms should be fully combined with the problems to adapt. Finally, in this paper, the strategy is implemented by selecting the assets and actions with the largest Q value. Since different assets are trained separately as environments, there may be a phenomenon of Q value drift among different assets (different assets have different Q value distribution areas), which may easily lead to incorrect asset selection. Consider adding constraints so that the Q values of different assets share a Q value distribution to improve results. △ Less

Submitted 25 February, 2024; originally announced February 2024.

Comments: 10 pages, 5 figures

arXiv:2402.13008 [pdf, other]

Efficient Enumeration of Large Maximal k-Plexes

Authors: Qihao Cheng, Da Yan, Tianhao Wu, Lyuheng Yuan, Ji Cheng, Zhongyi Huang, Yang Zhou

Abstract: Finding cohesive subgraphs in a large graph has many important applications, such as community detection and biological network analysis. Clique is often a too strict cohesive structure since communities or biological modules rarely form as cliques for various reasons such as data noise. Therefore, $k$-plex is introduced as a popular clique relaxation, which is a graph where every vertex is adjace… ▽ More Finding cohesive subgraphs in a large graph has many important applications, such as community detection and biological network analysis. Clique is often a too strict cohesive structure since communities or biological modules rarely form as cliques for various reasons such as data noise. Therefore, $k$-plex is introduced as a popular clique relaxation, which is a graph where every vertex is adjacent to all but at most $k$ vertices. In this paper, we propose a fast branch-and-bound algorithm as well as its task-based parallel version to enumerate all maximal $k$-plexes with at least $q$ vertices. Our algorithm adopts an effective search space partitioning approach that provides a lower time complexity, a new pivot vertex selection method that reduces candidate vertex size, an effective upper-bounding technique to prune useless branches, and three novel pruning techniques by vertex pairs. Our parallel algorithm uses a timeout mechanism to eliminate straggler tasks, and maximizes cache locality while ensuring load balancing. Extensive experiments show that compared with the state-of-the-art algorithms, our sequential and parallel algorithms enumerate large maximal $k$-plexes with up to $5 \times$ and $18.9 \times$ speedup, respectively. Ablation results also demonstrate that our pruning techniques bring up to $7 \times$ speedup compared with our basic algorithm. △ Less

Submitted 10 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: Accepted by EDBT2025. Camera-ready version

arXiv:2402.12201 [pdf, other]

Dictionary Learning Improves Patch-Free Circuit Discovery in Mechanistic Interpretability: A Case Study on Othello-GPT

Authors: Zhengfu He, Xuyang Ge, Qiong Tang, Tianxiang Sun, Qinyuan Cheng, Xipeng Qiu

Abstract: Sparse dictionary learning has been a rapidly growing technique in mechanistic interpretability to attack superposition and extract more human-understandable features from model activations. We ask a further question based on the extracted more monosemantic features: How do we recognize circuits connecting the enormous amount of dictionary features? We propose a circuit discovery framework alterna… ▽ More Sparse dictionary learning has been a rapidly growing technique in mechanistic interpretability to attack superposition and extract more human-understandable features from model activations. We ask a further question based on the extracted more monosemantic features: How do we recognize circuits connecting the enormous amount of dictionary features? We propose a circuit discovery framework alternative to activation patching. Our framework suffers less from out-of-distribution and proves to be more efficient in terms of asymptotic complexity. The basic unit in our framework is dictionary features decomposed from all modules writing to the residual stream, including embedding, attention output and MLP output. Starting from any logit, dictionary feature or attention score, we manage to trace down to lower-level dictionary features of all tokens and compute their contribution to these more interpretable and local model behaviors. We dig in a small transformer trained on a synthetic task named Othello and find a number of human-understandable fine-grained circuits inside of it. △ Less

Submitted 19 February, 2024; originally announced February 2024.

Comments: 24 pages, 13 figures. Not final version. Better dictionary training in progress

arXiv:2402.11251 [pdf, other]

LLM can Achieve Self-Regulation via Hyperparameter Aware Generation

Authors: Siyin Wang, Shimin Li, Tianxiang Sun, Jinlan Fu, Qinyuan Cheng, Jiasheng Ye, Junjie Ye, Xipeng Qiu, Xuanjing Huang

Abstract: In the realm of Large Language Models (LLMs), users commonly employ diverse decoding strategies and adjust hyperparameters to control the generated text. However, a critical question emerges: Are LLMs conscious of the existence of these decoding strategies and capable of regulating themselves? The current decoding generation process often relies on empirical and heuristic manual adjustments to hyp… ▽ More In the realm of Large Language Models (LLMs), users commonly employ diverse decoding strategies and adjust hyperparameters to control the generated text. However, a critical question emerges: Are LLMs conscious of the existence of these decoding strategies and capable of regulating themselves? The current decoding generation process often relies on empirical and heuristic manual adjustments to hyperparameters based on types of tasks and demands. However, this process is typically cumbersome, and the decoding hyperparameters may not always be optimal for each sample. To address the aforementioned challenges, we propose a novel text generation paradigm termed Hyperparameter Aware Generation (HAG). By leveraging hyperparameter-aware instruction tuning, the LLM autonomously determines the optimal decoding strategy and configs based on the input samples, enabling self-regulation. Our approach eliminates the need for extensive manual tuning, offering a more autonomous, self-regulate model behavior. Experimental results spanning six datasets across reasoning, creativity, translation, and mathematics tasks demonstrate that hyperparameter-aware instruction tuning empowers the LLMs to self-regulate the decoding strategy and hyperparameter. HAG extends the current paradigm in the text generation process, highlighting the feasibility of endowing the LLMs with self-regulate decoding strategies. △ Less

Submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.10738 [pdf, other]

Let's Learn Step by Step: Enhancing In-Context Learning Ability with Curriculum Learning

Authors: Yinpeng Liu, Jiawei Liu, Xiang Shi, Qikai Cheng, Yong Huang, Wei Lu

Abstract: Demonstration ordering, which is an important strategy for in-context learning (ICL), can significantly affects the performance of large language models (LLMs). However, most of the current approaches of ordering require high computational costs to introduce the priori knowledge. In this paper, inspired by the human learning process, we propose a simple but effective demonstration ordering method… ▽ More Demonstration ordering, which is an important strategy for in-context learning (ICL), can significantly affects the performance of large language models (LLMs). However, most of the current approaches of ordering require high computational costs to introduce the priori knowledge. In this paper, inspired by the human learning process, we propose a simple but effective demonstration ordering method for ICL, named the few-shot In-Context Curriculum Learning (ICCL). The ICCL implies gradually increasing the complexity of prompt demonstrations during the inference process. The difficulty can be assessed by human experts or LLMs-driven metrics, such as perplexity. Then we design extensive experiments to discuss the effectiveness of the ICCL at both corpus-level and instance-level. Moreover, we also investigate the formation mechanism of LLM's ICCL capability. Experimental results demonstrate that ICCL, developed during the instruction-tuning stage, is effective for representative open-source LLMs. To facilitate further research and applications by other scholars, we make the code publicly available. △ Less

Submitted 16 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.07234 [pdf, other]

CPSDBench: A Large Language Model Evaluation Benchmark and Baseline for Chinese Public Security Domain

Authors: Xin Tong, Bo Jin, Zhi Lin, Binjun Wang, Ting Yu, Qiang Cheng

Abstract: Large Language Models (LLMs) have demonstrated significant potential and effectiveness across multiple application domains. To assess the performance of mainstream LLMs in public security tasks, this study aims to construct a specialized evaluation benchmark tailored to the Chinese public security domain--CPSDbench. CPSDbench integrates datasets related to public security collected from real-world… ▽ More Large Language Models (LLMs) have demonstrated significant potential and effectiveness across multiple application domains. To assess the performance of mainstream LLMs in public security tasks, this study aims to construct a specialized evaluation benchmark tailored to the Chinese public security domain--CPSDbench. CPSDbench integrates datasets related to public security collected from real-world scenarios, supporting a comprehensive assessment of LLMs across four key dimensions: text classification, information extraction, question answering, and text generation. Furthermore, this study introduces a set of innovative evaluation metrics designed to more precisely quantify the efficacy of LLMs in executing tasks related to public security. Through the in-depth analysis and evaluation conducted in this research, we not only enhance our understanding of the performance strengths and limitations of existing models in addressing public security issues but also provide references for the future development of more accurate and customized LLM models targeted at applications in this field. △ Less

Submitted 21 March, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

arXiv:2402.01144 [pdf, ps, other]

A Construction of Evolving $k$-threshold Secret Sharing Scheme over A Polynomial Ring

Authors: Qi Cheng, Hongru Cao, Sian-Jheng Lin, Nenghai Yu

Abstract: The threshold secret sharing scheme allows the dealer to distribute the share to every participant such that the secret is correctly recovered from a certain amount of shares. The traditional $(k, n)$-threshold secret sharing scheme requests that the number of participants $n$ is known in advance. In contrast, the evolving secret sharing scheme allows that $n$ can be uncertain and even ever-growin… ▽ More The threshold secret sharing scheme allows the dealer to distribute the share to every participant such that the secret is correctly recovered from a certain amount of shares. The traditional $(k, n)$-threshold secret sharing scheme requests that the number of participants $n$ is known in advance. In contrast, the evolving secret sharing scheme allows that $n$ can be uncertain and even ever-growing. In this paper, we consider the evolving secret sharing scenario. Using the prefix codes and the properties of the polynomial ring, we propose a brand-new construction of evolving $k$-threshold secret sharing scheme for an $\ell$-bit secret over a polynomial ring, with correctness and perfect security. The proposed schemes establish the connection between prefix codes and the evolving schemes for $k\geq2$, and are also first evolving $k$-threshold secret sharing schemes by generalizing Shamir's scheme onto a polynomial ring. Specifically, the proposal also provides an unified mathematical decryption for prior evolving $2$-threshold secret sharing schemes. Besides, the analysis of the proposed schemes show that the size of the $t$-th share is $(k-1)(\ell_t-1)+\ell$ bits, where $\ell_t$ denotes the length of a binary prefix code of encoding integer $t$. In particular, when $δ$ code is chosen as the prefix code, the share size achieves $(k-1)\lfloor\lg t\rfloor+2(k-1)\lfloor\lg ({\lfloor\lg t\rfloor+1}) \rfloor+\ell$, which improves the prior best result $(k-1)\lg t+6k^4\ell\lg{\lg t}\cdot\lg{\lg {\lg t}}+ 7k^4\ell\lg k$, where $\lg$ denotes the binary logarithm. When $k=2$, the proposed scheme also achieves the minimal share size for single-bit secret, which is the same as the best known scheme. △ Less

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.13275 [pdf, other]

Can AI Assistants Know What They Don't Know?

Authors: Qinyuan Cheng, Tianxiang Sun, Xiangyang Liu, Wenwei Zhang, Zhangyue Yin, Shimin Li, Linyang Li, Zhengfu He, Kai Chen, Xipeng Qiu

Abstract: Recently, AI assistants based on large language models (LLMs) show surprising performance in many tasks, such as dialogue, solving math problems, writing code, and using tools. Although LLMs possess intensive world knowledge, they still make factual errors when facing some knowledge intensive tasks, like open-domain question answering. These untruthful responses from the AI assistant may cause sig… ▽ More Recently, AI assistants based on large language models (LLMs) show surprising performance in many tasks, such as dialogue, solving math problems, writing code, and using tools. Although LLMs possess intensive world knowledge, they still make factual errors when facing some knowledge intensive tasks, like open-domain question answering. These untruthful responses from the AI assistant may cause significant risks in practical applications. We believe that an AI assistant's refusal to answer questions it does not know is a crucial method for reducing hallucinations and making the assistant truthful. Therefore, in this paper, we ask the question "Can AI assistants know what they don't know and express them through natural language?" To answer this question, we construct a model-specific "I don't know" (Idk) dataset for an assistant, which contains its known and unknown questions, based on existing open-domain question answering datasets. Then we align the assistant with its corresponding Idk dataset and observe whether it can refuse to answer its unknown questions after alignment. Experimental results show that after alignment with Idk datasets, the assistant can refuse to answer most its unknown questions. For questions they attempt to answer, the accuracy is significantly higher than before the alignment. △ Less

Submitted 28 January, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: Work in progress

arXiv:2401.08867 [pdf, other]

MambaTab: A Plug-and-Play Model for Learning Tabular Data

Authors: Md Atik Ahamed, Qiang Cheng

Abstract: Despite the prevalence of images and texts in machine learning, tabular data remains widely used across various domains. Existing deep learning models, such as convolutional neural networks and transformers, perform well however demand extensive preprocessing and tuning limiting accessibility and scalability. This work introduces an innovative approach based on a structured state-space model (SSM)… ▽ More Despite the prevalence of images and texts in machine learning, tabular data remains widely used across various domains. Existing deep learning models, such as convolutional neural networks and transformers, perform well however demand extensive preprocessing and tuning limiting accessibility and scalability. This work introduces an innovative approach based on a structured state-space model (SSM), MambaTab, for tabular data. SSMs have strong capabilities for efficiently extracting effective representations from data with long-range dependencies. MambaTab leverages Mamba, an emerging SSM variant, for end-to-end supervised learning on tables. Compared to state-of-the-art baselines, MambaTab delivers superior performance while requiring significantly fewer parameters, as empirically validated on diverse benchmark datasets. MambaTab's efficiency, scalability, generalizability, and predictive gains signify it as a lightweight, "plug-and-play" solution for diverse tabular data with promise for enabling wider practical applications. △ Less

Submitted 24 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: Accepted by IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR), 2024

arXiv:2401.04620 [pdf, other]

Agent Alignment in Evolving Social Norms

Authors: Shimin Li, Tianxiang Sun, Qinyuan Cheng, Xipeng Qiu

Abstract: Agents based on Large Language Models (LLMs) are increasingly permeating various domains of human production and life, highlighting the importance of aligning them with human values. The current alignment of AI systems primarily focuses on passively aligning LLMs through human intervention. However, agents possess characteristics like receiving environmental feedback and self-evolution, rendering… ▽ More Agents based on Large Language Models (LLMs) are increasingly permeating various domains of human production and life, highlighting the importance of aligning them with human values. The current alignment of AI systems primarily focuses on passively aligning LLMs through human intervention. However, agents possess characteristics like receiving environmental feedback and self-evolution, rendering the LLM alignment methods inadequate. In response, we propose an evolutionary framework for agent evolution and alignment, named EvolutionaryAgent, which transforms agent alignment into a process of evolution and selection under the principle of survival of the fittest. In an environment where social norms continuously evolve, agents better adapted to the current social norms will have a higher probability of survival and proliferation, while those inadequately aligned dwindle over time. Experimental results assessing the agents from multiple perspectives in aligning with social norms demonstrate that EvolutionaryAgent can align progressively better with the evolving social norms while maintaining its proficiency in general tasks. Effectiveness tests conducted on various open and closed-source LLMs as the foundation for agents also prove the applicability of our approach. △ Less

Submitted 19 February, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

Comments: Work in progress

arXiv:2312.10074 [pdf]

STAGER checklist: Standardized Testing and Assessment Guidelines for Evaluating Generative AI Reliability

Authors: Jinghong Chen, Lingxuan Zhu, Weiming Mou, Zaoqu Liu, Quan Cheng, Anqi Lin, Jian Zhang, Peng Luo

Abstract: Generative Artificial Intelligence (AI) holds immense potential in medical applications. Numerous studies have explored the efficacy of various generative AI models within healthcare contexts, but there is a lack of a comprehensive and systematic evaluation framework. Given that some studies evaluating the ability of generative AI for medical applications have deficiencies in their methodological… ▽ More Generative Artificial Intelligence (AI) holds immense potential in medical applications. Numerous studies have explored the efficacy of various generative AI models within healthcare contexts, but there is a lack of a comprehensive and systematic evaluation framework. Given that some studies evaluating the ability of generative AI for medical applications have deficiencies in their methodological design, standardized guidelines for their evaluation are also currently lacking. In response, our objective is to devise standardized assessment guidelines tailored for evaluating the performance of generative AI systems in medical contexts. To this end, we conducted a thorough literature review using the PubMed and Google Scholar databases, focusing on research that tests generative AI capabilities in medicine. Our multidisciplinary team, comprising experts in life sciences, clinical medicine, medical engineering, and generative AI users, conducted several discussion sessions and developed a checklist of 23 items. The checklist is designed to encompass the critical evaluation aspects of generative AI in medical applications comprehensively. This checklist, and the broader assessment framework it anchors, address several key dimensions, including question collection, querying methodologies, and assessment techniques. We aim to provide a holistic evaluation of AI systems. The checklist delineates a clear pathway from question gathering to result assessment, offering researchers guidance through potential challenges and pitfalls. Our framework furnishes a standardized, systematic approach for research involving the testing of generative AI's applicability in medicine. It enhances the quality of research reporting and aids in the evolution of generative AI in medicine and life sciences. △ Less

Submitted 7 December, 2023; originally announced December 2023.

Comments: 11 pages, 0 figure, 2 tables

arXiv:2312.06723 [pdf, other]

Learning to See Low-Light Images via Feature Domain Adaptation

Authors: Qirui Yang, Qihua Cheng, Huanjing Yue, Le Zhang, Yihao Liu, Jingyu Yang

Abstract: Raw low light image enhancement (LLIE) has achieved much better performance than the sRGB domain enhancement methods due to the merits of raw data. However, the ambiguity between noisy to clean and raw to sRGB mappings may mislead the single-stage enhancement networks. The two-stage networks avoid ambiguity by decoupling the two mappings but usually have large computing complexity. To solve this p… ▽ More Raw low light image enhancement (LLIE) has achieved much better performance than the sRGB domain enhancement methods due to the merits of raw data. However, the ambiguity between noisy to clean and raw to sRGB mappings may mislead the single-stage enhancement networks. The two-stage networks avoid ambiguity by decoupling the two mappings but usually have large computing complexity. To solve this problem, we propose a single-stage network empowered by Feature Domain Adaptation (FDA) to decouple the denoising and color mapping tasks in raw LLIE. The denoising encoder is supervised by the clean raw image, and then the denoised features are adapted for the color mapping task by an FDA module. We propose a Lineformer to serve as the FDA, which can well explore the global and local correlations with fewer line buffers (friendly to the line-based imaging process). During inference, the raw supervision branch is removed. In this way, our network combines the advantage of a two-stage enhancement process with the efficiency of single-stage inference. Experiments on four benchmark datasets demonstrate that our method achieves state-of-the-art performance with fewer computing costs (60% FLOPs of the two-stage method DNF). Our codes will be released after the acceptance of this work. △ Less

Submitted 19 December, 2023; v1 submitted 10 December, 2023; originally announced December 2023.

arXiv:2311.18254 [pdf, other]

Sketch Input Method Editor: A Comprehensive Dataset and Methodology for Systematic Input Recognition

Authors: Guangming Zhu, Siyuan Wang, Qing Cheng, Kelong Wu, Hao Li, Liang Zhang

Abstract: With the recent surge in the use of touchscreen devices, free-hand sketching has emerged as a promising modality for human-computer interaction. While previous research has focused on tasks such as recognition, retrieval, and generation of familiar everyday objects, this study aims to create a Sketch Input Method Editor (SketchIME) specifically designed for a professional C4I system. Within this s… ▽ More With the recent surge in the use of touchscreen devices, free-hand sketching has emerged as a promising modality for human-computer interaction. While previous research has focused on tasks such as recognition, retrieval, and generation of familiar everyday objects, this study aims to create a Sketch Input Method Editor (SketchIME) specifically designed for a professional C4I system. Within this system, sketches are utilized as low-fidelity prototypes for recommending standardized symbols in the creation of comprehensive situation maps. This paper also presents a systematic dataset comprising 374 specialized sketch types, and proposes a simultaneous recognition and segmentation architecture with multilevel supervision between recognition and segmentation to improve performance and enhance interpretability. By incorporating few-shot domain adaptation and class-incremental learning, the network's ability to adapt to new users and extend to new task-specific classes is significantly enhanced. Results from experiments conducted on both the proposed dataset and the SPG dataset illustrate the superior performance of the proposed architecture. Our dataset and code are publicly available at https://github.com/GuangmingZhu/SketchIME. △ Less

Submitted 31 March, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: The paper has been accepted by ACM Multimedia 2023

arXiv:2311.16603 [pdf, other]

l2Match: Optimization Techniques on Subgraph Matching Algorithm using Label Pair, Neighboring Label Index, and Jump-Redo method

Authors: C. Q. Cheng, K. S. Wong, L. K. Soon

Abstract: Graph database is designed to store bidirectional relationships between objects and facilitate the traversal process to extract a subgraph. However, the subgraph matching process is an NP-Complete problem. Existing solutions to this problem usually employ a filter-and-verification framework and a divide-and-conquer method. The filter-and-verification framework minimizes the number of inputs to the… ▽ More Graph database is designed to store bidirectional relationships between objects and facilitate the traversal process to extract a subgraph. However, the subgraph matching process is an NP-Complete problem. Existing solutions to this problem usually employ a filter-and-verification framework and a divide-and-conquer method. The filter-and-verification framework minimizes the number of inputs to the verification stage by filtering and pruning invalid candidates as much as possible. Meanwhile, subgraph matching is performed on the substructure decomposed from the larger graph to yield partial embedding. Subsequently, the recursive traversal or set intersection technique combines the partial embedding into a complete subgraph. In this paper, we first present a comprehensive literature review of the state-of-the-art solutions. l2Match, a subgraph isomorphism algorithm for small queries utilizing a Label-Pair Index and filtering method, is then proposed and presented as a proof of concept. Empirical experimentation shows that l2Match outperforms related state-of-the-art solutions, and the proposed methods optimize the existing algorithms. △ Less

Submitted 28 November, 2023; originally announced November 2023.

Comments: This short version of this article (6 pages) is accepted by ICEIC 2024

MSC Class: 05C60 (Primary); 05C30 (Secondary); 68R10 ACM Class: G.4.1; H.3.3

arXiv:2311.15643 [pdf, other]

A Survey on Monocular Re-Localization: From the Perspective of Scene Map Representation

Authors: Jinyu Miao, Kun Jiang, Tuopu Wen, Yunlong Wang, Peijing Jia, Xuhe Zhao, Qian Cheng, Zhongyang Xiao, Jin Huang, Zhihua Zhong, Diange Yang

Abstract: Monocular Re-Localization (MRL) is a critical component in autonomous applications, estimating 6 degree-of-freedom ego poses w.r.t. the scene map based on monocular images. In recent decades, significant progress has been made in the development of MRL techniques. Numerous algorithms have accomplished extraordinary success in terms of localization accuracy and robustness. In MRL, scene maps are re… ▽ More Monocular Re-Localization (MRL) is a critical component in autonomous applications, estimating 6 degree-of-freedom ego poses w.r.t. the scene map based on monocular images. In recent decades, significant progress has been made in the development of MRL techniques. Numerous algorithms have accomplished extraordinary success in terms of localization accuracy and robustness. In MRL, scene maps are represented in various forms, and they determine how MRL methods work and how MRL methods perform. However, to the best of our knowledge, existing surveys do not provide systematic reviews about the relationship between MRL solutions and their used scene map representation. This survey fills the gap by comprehensively reviewing MRL methods from such a perspective, promoting further research. 1) We commence by delving into the problem definition of MRL, exploring current challenges, and comparing ours with existing surveys. 2) Many well-known MRL methods are categorized and reviewed into five classes according to the representation forms of utilized map, i.e., geo-tagged frames, visual landmarks, point clouds, vectorized semantic map, and neural network-based map. 3) To quantitatively and fairly compare MRL methods with various map, we introduce some public datasets and provide the performances of some state-of-the-art MRL methods. The strengths and weakness of MRL methods with different map are analyzed. 4) We finally introduce some topics of interest in this field and give personal opinions. This survey can serve as a valuable referenced materials for MRL, and a continuously updated summary of this survey is publicly available to the community at: https://github.com/jinyummiao/map-in-mono-reloc. △ Less

Submitted 12 January, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: 33 pages, 10 tables, 16 figures, under review

arXiv:2311.14379 [pdf]

Robot Learning in the Era of Foundation Models: A Survey

Authors: Xuan Xiao, Jiahang Liu, Zhipeng Wang, Yanmin Zhou, Yong Qi, Qian Cheng, Bin He, Shuo Jiang

Abstract: The proliferation of Large Language Models (LLMs) has s fueled a shift in robot learning from automation towards general embodied Artificial Intelligence (AI). Adopting foundation models together with traditional learning methods to robot learning has increasingly gained recent interest research community and showed potential for real-life application. However, there are few literatures comprehens… ▽ More The proliferation of Large Language Models (LLMs) has s fueled a shift in robot learning from automation towards general embodied Artificial Intelligence (AI). Adopting foundation models together with traditional learning methods to robot learning has increasingly gained recent interest research community and showed potential for real-life application. However, there are few literatures comprehensively reviewing the relatively new technologies combined with robotics. The purpose of this review is to systematically assess the state-of-the-art foundation model techniques in the robot learning and to identify future potential areas. Specifically, we first summarized the technical evolution of robot learning and identified the necessary preliminary preparations for foundation models including the simulators, datasets, foundation model framework. In addition, we focused on the following four mainstream areas of robot learning including manipulation, navigation, planning, and reasoning and demonstrated how the foundation model techniques can be adopted in the above scenarios. Furthermore, critical issues which are neglected in the current literatures including robot hardware and software decoupling, dynamic data, generalization performance with the presence of human, etc. were discussed. This review highlights the state-of-the-art progress of foundation models in robot learning and future research should focus on multimodal interaction especially dynamics data, exclusive foundation models for robots, and AI alignment, etc. △ Less

Submitted 24 November, 2023; originally announced November 2023.

arXiv:2310.12443 [pdf, other]

Know Where to Go: Make LLM a Relevant, Responsible, and Trustworthy Searcher

Authors: Xiang Shi, Jiawei Liu, Yinpeng Liu, Qikai Cheng, Wei Lu

Abstract: The advent of Large Language Models (LLMs) has shown the potential to improve relevance and provide direct answers in web searches. However, challenges arise in validating the reliability of generated results and the credibility of contributing sources, due to the limitations of traditional information retrieval algorithms and the LLM hallucination problem. Aiming to create a "PageRank" for the LL… ▽ More The advent of Large Language Models (LLMs) has shown the potential to improve relevance and provide direct answers in web searches. However, challenges arise in validating the reliability of generated results and the credibility of contributing sources, due to the limitations of traditional information retrieval algorithms and the LLM hallucination problem. Aiming to create a "PageRank" for the LLM era, we strive to transform LLM into a relevant, responsible, and trustworthy searcher. We propose a novel generative retrieval framework leveraging the knowledge of LLMs to foster a direct link between queries and online sources. This framework consists of three core modules: Generator, Validator, and Optimizer, each focusing on generating trustworthy online sources, verifying source reliability, and refining unreliable sources, respectively. Extensive experiments and evaluations highlight our method's superior relevance, responsibility, and trustfulness against various SOTA methods. △ Less

Submitted 18 October, 2023; originally announced October 2023.

Comments: 14 pages, 4 figures, under peer review

arXiv:2310.11702 [pdf, other]

DPF-Nutrition: Food Nutrition Estimation via Depth Prediction and Fusion

Authors: Yuzhe Han, Qimin Cheng, Wenjin Wu, Ziyang Huang

Abstract: A reasonable and balanced diet is essential for maintaining good health. With the advancements in deep learning, automated nutrition estimation method based on food images offers a promising solution for monitoring daily nutritional intake and promoting dietary health. While monocular image-based nutrition estimation is convenient, efficient, and economical, the challenge of limited accuracy remai… ▽ More A reasonable and balanced diet is essential for maintaining good health. With the advancements in deep learning, automated nutrition estimation method based on food images offers a promising solution for monitoring daily nutritional intake and promoting dietary health. While monocular image-based nutrition estimation is convenient, efficient, and economical, the challenge of limited accuracy remains a significant concern. To tackle this issue, we proposed DPF-Nutrition, an end-to-end nutrition estimation method using monocular images. In DPF-Nutrition, we introduced a depth prediction module to generate depth maps, thereby improving the accuracy of food portion estimation. Additionally, we designed an RGB-D fusion module that combined monocular images with the predicted depth information, resulting in better performance for nutrition estimation. To the best of our knowledge, this was the pioneering effort that integrated depth prediction and RGB-D fusion techniques in food nutrition estimation. Comprehensive experiments performed on Nutrition5k evaluated the effectiveness and efficiency of DPF-Nutrition. △ Less

Submitted 18 October, 2023; originally announced October 2023.

arXiv:2310.04787 [pdf, other]

HI-SLAM: Monocular Real-time Dense Mapping with Hybrid Implicit Fields

Authors: Wei Zhang, Tiecheng Sun, Sen Wang, Qing Cheng, Norbert Haala

Abstract: In this letter, we present a neural field-based real-time monocular mapping framework for accurate and dense Simultaneous Localization and Mapping (SLAM). Recent neural mapping frameworks show promising results, but rely on RGB-D or pose inputs, or cannot run in real-time. To address these limitations, our approach integrates dense-SLAM with neural implicit fields. Specifically, our dense SLAM app… ▽ More In this letter, we present a neural field-based real-time monocular mapping framework for accurate and dense Simultaneous Localization and Mapping (SLAM). Recent neural mapping frameworks show promising results, but rely on RGB-D or pose inputs, or cannot run in real-time. To address these limitations, our approach integrates dense-SLAM with neural implicit fields. Specifically, our dense SLAM approach runs parallel tracking and global optimization, while a neural field-based map is constructed incrementally based on the latest SLAM estimates. For the efficient construction of neural fields, we employ multi-resolution grid encoding and signed distance function (SDF) representation. This allows us to keep the map always up-to-date and adapt instantly to global updates via loop closing. For global consistency, we propose an efficient Sim(3)-based pose graph bundle adjustment (PGBA) approach to run online loop closing and mitigate the pose and scale drift. To enhance depth accuracy further, we incorporate learned monocular depth priors. We propose a novel joint depth and scale adjustment (JDSA) module to solve the scale ambiguity inherent in depth priors. Extensive evaluations across synthetic and real-world datasets validate that our approach outperforms existing methods in accuracy and map completeness while preserving real-time performance. △ Less

Submitted 15 December, 2023; v1 submitted 7 October, 2023; originally announced October 2023.

Comments: Accepted by IEEE Robotics and Automation Letters

arXiv:2310.03368 [pdf, other]

Evaluating Hallucinations in Chinese Large Language Models

Authors: Qinyuan Cheng, Tianxiang Sun, Wenwei Zhang, Siyin Wang, Xiangyang Liu, Mozhi Zhang, Junliang He, Mianqiu Huang, Zhangyue Yin, Kai Chen, Xipeng Qiu

Abstract: In this paper, we establish a benchmark named HalluQA (Chinese Hallucination Question-Answering) to measure the hallucination phenomenon in Chinese large language models. HalluQA contains 450 meticulously designed adversarial questions, spanning multiple domains, and takes into account Chinese historical culture, customs, and social phenomena. During the construction of HalluQA, we consider two ty… ▽ More In this paper, we establish a benchmark named HalluQA (Chinese Hallucination Question-Answering) to measure the hallucination phenomenon in Chinese large language models. HalluQA contains 450 meticulously designed adversarial questions, spanning multiple domains, and takes into account Chinese historical culture, customs, and social phenomena. During the construction of HalluQA, we consider two types of hallucinations: imitative falsehoods and factual errors, and we construct adversarial samples based on GLM-130B and ChatGPT. For evaluation, we design an automated evaluation method using GPT-4 to judge whether a model output is hallucinated. We conduct extensive experiments on 24 large language models, including ERNIE-Bot, Baichuan2, ChatGLM, Qwen, SparkDesk and etc. Out of the 24 models, 18 achieved non-hallucination rates lower than 50%. This indicates that HalluQA is highly challenging. We analyze the primary types of hallucinations in different types of models and their causes. Additionally, we discuss which types of hallucinations should be prioritized for different types of models. △ Less

Submitted 25 October, 2023; v1 submitted 5 October, 2023; originally announced October 2023.

Comments: Work in progress

arXiv:2309.08799 [pdf, other]

SHAPNN: Shapley Value Regularized Tabular Neural Network

Authors: Qisen Cheng, Shuhui Qu, Janghwan Lee

Abstract: We present SHAPNN, a novel deep tabular data modeling architecture designed for supervised learning. Our approach leverages Shapley values, a well-established technique for explaining black-box models. Our neural network is trained using standard backward propagation optimization methods, and is regularized with realtime estimated Shapley values. Our method offers several advantages, including the… ▽ More We present SHAPNN, a novel deep tabular data modeling architecture designed for supervised learning. Our approach leverages Shapley values, a well-established technique for explaining black-box models. Our neural network is trained using standard backward propagation optimization methods, and is regularized with realtime estimated Shapley values. Our method offers several advantages, including the ability to provide valid explanations with no computational overhead for data instances and datasets. Additionally, prediction with explanation serves as a regularizer, which improves the model's performance. Moreover, the regularized prediction enhances the model's capability for continual learning. We evaluate our method on various publicly available datasets and compare it with state-of-the-art deep neural network models, demonstrating the superior performance of SHAPNN in terms of AUROC, transparency, as well as robustness to streaming data. △ Less

Submitted 15 September, 2023; originally announced September 2023.

Comments: 9 pages, 8 figures

arXiv:2307.06527 [pdf, other]

Free-Form Composition Networks for Egocentric Action Recognition

Authors: Haoran Wang, Qinghua Cheng, Baosheng Yu, Yibing Zhan, Dapeng Tao, Liang Ding, Haibin Ling

Abstract: Egocentric action recognition is gaining significant attention in the field of human action recognition. In this paper, we address data scarcity issue in egocentric action recognition from a compositional generalization perspective. To tackle this problem, we propose a free-form composition network (FFCN) that can simultaneously learn disentangled verb, preposition, and noun representations, and t… ▽ More Egocentric action recognition is gaining significant attention in the field of human action recognition. In this paper, we address data scarcity issue in egocentric action recognition from a compositional generalization perspective. To tackle this problem, we propose a free-form composition network (FFCN) that can simultaneously learn disentangled verb, preposition, and noun representations, and then use them to compose new samples in the feature space for rare classes of action videos. First, we use a graph to capture the spatial-temporal relations among different hand/object instances in each action video. We thus decompose each action into a set of verb and preposition spatial-temporal representations using the edge features in the graph. The temporal decomposition extracts verb and preposition representations from different video frames, while the spatial decomposition adaptively learns verb and preposition representations from action-related instances in each frame. With these spatial-temporal representations of verbs and prepositions, we can compose new samples for those rare classes in a free-form manner, which is not restricted to a rigid form of a verb and a noun. The proposed FFCN can directly generate new training data samples for rare classes, hence significantly improve action recognition performance. We evaluated our method on three popular egocentric action recognition datasets, Something-Something V2, H2O, and EPIC-KITCHENS-100, and the experimental results demonstrate the effectiveness of the proposed method for handling data scarcity problems, including long-tailed and few-shot egocentric action recognition. △ Less

Submitted 14 October, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

arXiv:2306.00412 [pdf, ps, other]

Beamforming Design for IRS-and-UAV-Aided Two-Way Amplify-and-Forward Relay Networks in Maritime IoT

Authors: Xuehui Wang, Feng Shu, Yuanyuan Wu, Weiping Shi, Shihao Yan, Yifan Zhao, Qiankun Cheng, Zhongwen Sun, Jiangzhou Wang

Abstract: In this paper, an intelligent reflecting surface (IRS)-and-unmanned aerial vehicle (UAV)-assisted two-way amplify-and-forward (AF) relay network in maritime Internet of Things (IoT) is proposed, where ship1 ($\text{S}_1$) and ship2 ($\text{S}_2$) can be viewed as data collecting centers. To enhance the message exchange rate between $\text{S}_1$ and $\text{S}_2$, a problem of maximizing minimum rat… ▽ More In this paper, an intelligent reflecting surface (IRS)-and-unmanned aerial vehicle (UAV)-assisted two-way amplify-and-forward (AF) relay network in maritime Internet of Things (IoT) is proposed, where ship1 ($\text{S}_1$) and ship2 ($\text{S}_2$) can be viewed as data collecting centers. To enhance the message exchange rate between $\text{S}_1$ and $\text{S}_2$, a problem of maximizing minimum rate is cast, where the variables, namely AF relay beamforming matrix and IRS phase shifts of two time slots, need to be optimized. To achieve a maximum rate, a low-complexity alternately iterative (AI) scheme based on zero forcing and successive convex approximation (LC-ZF-SCA) algorithm is presented. To obtain a significant rate enhancement, a high-performance AI method based on one step, semidefinite programming and penalty SCA (ONS-SDP-PSCA) is proposed. Simulation results show that by the proposed LC-ZF-SCA and ONS-SDP-PSCA methods, the rate of the IRS-and-UAV-assisted AF relay network surpass those of with random phase and only AF relay networks. Moreover, ONS-SDP-PSCA perform better than LC-ZF-SCA in aspect of rate. △ Less

Submitted 24 June, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

arXiv:2305.18548 [pdf]

I/O-efficient iterative matrix inversion with photonic integrated circuits

Authors: Minjia Chen, Yizhi Wang, Chunhui Yao, Adrian Wonfor, Shuai Yang, Richard Penty, Qixiang Cheng

Abstract: Photonic integrated circuits have been extensively explored for optical processing with the aim of breaking the speed bottleneck of digital electronics. However, the input/output (IO) bottleneck remains one of the key barriers. Here we report a novel photonic iterative processor (PIP) for matrix-inversion-intensive applications. The direct reuse of inputted data in the optical domain unlocks the p… ▽ More Photonic integrated circuits have been extensively explored for optical processing with the aim of breaking the speed bottleneck of digital electronics. However, the input/output (IO) bottleneck remains one of the key barriers. Here we report a novel photonic iterative processor (PIP) for matrix-inversion-intensive applications. The direct reuse of inputted data in the optical domain unlocks the potential to break the IO bottleneck. We demonstrate notable IO advantages with a lossless PIP for real-valued matrix inversion and integral-differential equation solving, as well as a coherent PIP with optical loops integrated on-chip, enabling complex-valued computation and a net inversion time of 1.2 ns. Furthermore, we estimate at least an order of magnitude enhancement in IO efficiency of a PIP over photonic single-pass processors and the state-of-the-art electronic processors for reservoir training tasks and MIMO precoding tasks, indicating the huge potential of PIP technology in practical applications. △ Less

Submitted 22 May, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

arXiv:2305.07835 [pdf, other]

Multi-Scenario Broadband Channel Measurement and Modeling for Sub-6 GHz RIS-Assisted Wireless Communication Systems

Authors: Jian Sang, Mingyong Zhou, Jifeng Lan, Boning Gao, Wankai Tang, Xiao Li, Shi Jin, Ertugrul Basar, Cen Li, Qiang Cheng, Tie Jun Cui

Abstract: Reconfigurable intelligent surface (RIS)-empowered communication, has been considered widely as one of the revolutionary technologies for next generation networks. However, due to the novel propagation characteristics of RISs, underlying RIS channel modeling and measurement research is still in its infancy and not fully investigated. In this paper, we conduct multi-scenario broadband channel measu… ▽ More Reconfigurable intelligent surface (RIS)-empowered communication, has been considered widely as one of the revolutionary technologies for next generation networks. However, due to the novel propagation characteristics of RISs, underlying RIS channel modeling and measurement research is still in its infancy and not fully investigated. In this paper, we conduct multi-scenario broadband channel measurements and modeling for RIS-assisted communications at the sub-6 GHz band. The measurements are carried out in three scenarios covering outdoor, indoor, and outdoor-to-indoor (O2I) environments, which suffer from non-line-of-sight (NLOS) propagation inherently. Three propagation modes including intelligent reflection with RIS, specular reflection with RIS and the mode without RIS, are taken into account in each scenario. In addition, considering the cascaded characteristics of RIS-assisted channel by nature, two modified empirical models including floating-intercept (FI) and close-in (CI) are proposed, which cover distance and angle domains. The measurement results rooted in 2096 channel acquisitions verify the prediction accuracy of these proposed models. Moreover, the propagation characteristics for RIS-assisted channels, including path loss (PL) gain, PL exponent, spatial consistency, time dispersion, frequency stationarity, etc., are compared and analyzed comprehensively. These channel measurement and modeling results may lay the groundwork for future applications of RIS-assisted communication systems in practice. △ Less

Submitted 13 May, 2023; originally announced May 2023.

Showing 1–50 of 148 results for author: Cheng, Q