
Establishing Rigorous and Cost-effective Clinical Trials for Artificial Intelligence Models

Authors: Wanling Gao, Yunyou Huang, Dandan Cui, Zhuoming Yu, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Gangyuan Zhao, Chongrong Jiang, Fan Huang, Tianyi Wei, Suqin Tang, Bingjie Xia, Zhifei Zhang, Jianfeng Zhan

Abstract: A profound gap persists between artificial intelligence (AI) and clinical practice in medicine, primarily due to the lack of rigorous and cost-effective evaluation methodologies. State-of-the-art and state-of-the-practice AI model evaluations are limited to laboratory studies on medical datasets or direct clinical trials with no controls or solely patient-centered controls. Moreover, the crucial role of clinicians in collaborating with AI, pivotal for determining its impact on clinical practice, is often overlooked. For the first time, we emphasize the critical need for rigorous and cost-effective evaluation methodologies for AI models in clinical practice, featuring patient/clinician-centered (dual-centered) AI randomized controlled trials (DC-AI RCTs) and virtual clinician-based in-silico trials (VC-MedAI) as an effective proxy for DC-AI RCTs. Using 7500 diagnosis records from the inaugural two-phase DC-AI RCTs across 14 medical centers with 125 clinicians, our results demonstrate the necessity of DC-AI RCTs and the effectiveness of VC-MedAI. Notably, VC-MedAI performs comparably to human clinicians, replicating insights and conclusions from prospective DC-AI RCTs. We envision DC-AI RCTs and VC-MedAI as pivotal advancements that provide innovative and transformative evaluation methodologies for AI models in clinical practice, offer a preclinical-like setting mirroring conventional medicine, and reshape development paradigms in a cost-effective and fast-iterative manner. Chinese Clinical Trial Registration: ChiCTR2400086816.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 23 pages

arXiv:2407.05682 [pdf, other]

Retrieved In-Context Principles from Previous Mistakes

Authors: Hao Sun, Yong Jiang, Bo Wang, Yingyan Hou, Yan Zhang, Pengjun Xie, Fei Huang

Abstract: In-context learning (ICL) has been instrumental in adapting Large Language Models (LLMs) to downstream tasks using correct input-output examples. Recent advances have attempted to improve model performance through principles derived from mistakes, yet these approaches suffer from a lack of customization and inadequate error coverage. To address these limitations, we propose Retrieved In-Context Principles (RICP), a novel teacher-student framework. In RICP, the teacher model analyzes mistakes from the student model to generate reasons and insights for preventing similar mistakes. These mistakes are clustered by their underlying reasons to develop task-level principles, enhancing the error coverage of principles. During inference, the most relevant mistakes for each question are retrieved to create question-level principles, improving the customization of the provided guidance. RICP is orthogonal to existing prompting methods and does not require intervention from the teacher model during inference. Experimental results across seven reasoning benchmarks reveal that RICP effectively enhances performance when applied to various prompting strategies.

Submitted 8 July, 2024; originally announced July 2024.
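The question-level retrieval step in RICP can be pictured with a small sketch. The abstract does not say how relevance is measured, so this assumes each stored mistake carries an embedding vector and a teacher-written insight, and ranks them by cosine similarity to the question embedding. `Mistake`, `retrieve_insights`, and the toy vectors are hypothetical illustrations, not the authors' code.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Mistake:
    """A stored student mistake plus the teacher's insight (hypothetical schema)."""
    embedding: list[float]
    insight: str


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_insights(query: list[float], memory: list[Mistake], k: int = 2) -> list[str]:
    """Return the insights attached to the k stored mistakes most similar to the query."""
    ranked = sorted(memory, key=lambda m: cosine(query, m.embedding), reverse=True)
    return [m.insight for m in ranked[:k]]


memory = [
    Mistake([1.0, 0.0], "Check units before adding quantities."),
    Mistake([0.0, 1.0], "Re-read the question for negations."),
    Mistake([0.7, 0.7], "Verify intermediate arithmetic."),
]
print(retrieve_insights([0.9, 0.1], memory, k=2))
```

In practice the retrieved insights would be formatted into the prompt alongside the task-level principles; the embedding model and the value of `k` are open design choices here.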

arXiv:2407.00942 [pdf, other]

ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions

Authors: Jingheng Ye, Yong Jiang, Xiaobin Wang, Yinghui Li, Yangning Li, Hai-Tao Zheng, Pengjun Xie, Fei Huang

Abstract: This paper introduces the task of product demand clarification within an e-commerce scenario, where the user begins the conversation with ambiguous queries and a task-oriented agent is designed to achieve more accurate and tailored product search by asking clarification questions. To address this task, we propose ProductAgent, a conversational information-seeking agent equipped with abilities of strategic clarification question generation and dynamic product retrieval. Specifically, we develop the agent with strategies for product feature summarization, query generation, and product retrieval. Furthermore, we propose a benchmark called PROCLARE to evaluate the agent's performance both automatically and qualitatively with the aid of an LLM-driven user simulator. Experiments show that ProductAgent interacts positively with the user and enhances retrieval performance with increasing dialogue turns, as user demands become gradually more explicit and detailed. All source code will be released after the review anonymity period.

Submitted 30 June, 2024; originally announced July 2024.

Comments: 17 pages, 13 tables, 6 figures. Under review

arXiv:2407.00891 [pdf, other]

ZeroDDI: A Zero-Shot Drug-Drug Interaction Event Prediction Method with Semantic Enhanced Learning and Dual-Modal Uniform Alignment

Authors: Ziyan Wang, Zhankun Xiong, Feng Huang, Xuan Liu, Wen Zhang

Abstract: Drug-drug interactions (DDIs) can result in various pharmacological changes, which can be categorized into different classes known as DDI events (DDIEs). In recent years, previously unobserved/unseen DDIEs have been emerging, posing a new classification task in which unseen classes have no labelled instances in the training stage; this is formulated as a zero-shot DDIE prediction (ZS-DDIE) task. However, existing computational methods are not directly applicable to ZS-DDIE, which poses two primary challenges: obtaining suitable DDIE representations and handling the class imbalance issue. To overcome these challenges, we propose a novel method named ZeroDDI for the ZS-DDIE task. Specifically, we design a biological semantic enhanced DDIE representation learning module, which emphasizes the key biological semantics and distills discriminative molecular substructure-related semantics for DDIE representation learning. Furthermore, we propose a dual-modal uniform alignment strategy to distribute drug pair representations and DDIE semantic representations uniformly on a unit sphere and align the matched ones, which can mitigate the issue of class imbalance. Extensive experiments show that ZeroDDI surpasses the baselines and indicate that it is a promising tool for detecting unseen DDIEs. Our code has been released at https://github.com/wzy-Sarah/ZeroDDI.

Submitted 30 June, 2024; originally announced July 2024.

Comments: Accepted by IJCAI2024
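The two goals named for ZeroDDI's dual-modal uniform alignment (spread representations uniformly on a unit sphere, pull matched pairs together) correspond to the alignment and uniformity losses of Wang and Isola (2020). The sketch below implements those standard losses in plain Python as an assumption about the general idea; it is not ZeroDDI's exact objective, and the temperature `t=2.0` and toy points are illustrative.

```python
import math


def normalize(v):
    """Project a nonzero vector onto the unit sphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def align_loss(pairs):
    """Mean squared distance between matched (drug-pair, event) embeddings on the sphere."""
    return sum(sq_dist(normalize(u), normalize(v)) for u, v in pairs) / len(pairs)


def uniform_loss(points, t=2.0):
    """Log of the mean Gaussian potential over distinct pairs; lower means more uniform."""
    zs = [normalize(p) for p in points]
    terms = [math.exp(-t * sq_dist(zs[i], zs[j]))
             for i in range(len(zs)) for j in range(i + 1, len(zs))]
    return math.log(sum(terms) / len(terms))


spread = [[1, 0], [0, 1], [-1, 0], [0, -1]]
clustered = [[1, 0], [1, 0.1], [1, 0.2], [1, -0.1]]
print(uniform_loss(spread) < uniform_loss(clustered))  # True: spread points are more uniform
```

A training objective would typically minimize a weighted sum of the two terms; penalizing clustering keeps rare event classes from being crowded out by frequent ones, which is the intuition behind using uniformity against class imbalance.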

arXiv:2407.00328 [pdf, other]

Constraints on the energy spectrum of the diffuse cosmic neutrino flux from the ANTARES neutrino telescope

Authors: ANTARES Collaboration, A. Albert, S. Alves, M. André, M. Ardid, S. Ardid, J. -J. Aubert, J. Aublin, B. Baret, S. Basa, Y. Becherini, B. Belhorma, M. Bendahman, F. Benfenati, V. Bertin, S. Biagi, J. Boumaaza, M. Bouta, M. C. Bouwhuis, H. Brânzaş, R. Bruijn, J. Brunner, J. Busto, B. Caiffi, D. Calvo, et al. (117 additional authors not shown)

Abstract: High-significance evidence for the existence of a high-energy diffuse flux of cosmic neutrinos has emerged in the last decade from several observations by the IceCube Collaboration. The ANTARES neutrino telescope took data for 15 years in the Mediterranean Sea, from 2007 to 2022, and collected a high-purity all-flavour neutrino sample. The search for a diffuse cosmic neutrino signal using this dataset is presented in this article. This final analysis did not yield a statistically significant observation of the cosmic diffuse flux; this result is converted into limits on the properties of the cosmic neutrino spectrum. In particular, given the sensitivity of the ANTARES neutrino telescope between 1 and 50 TeV, constraints on single-power-law hypotheses are derived for the cosmic diffuse flux below 20 TeV.

Submitted 29 June, 2024; originally announced July 2024.
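A single-power-law hypothesis for the diffuse flux is conventionally written with a normalization Φ_0 at a reference energy E_0 and a spectral index γ. The symbols and the choice of E_0 below are notational assumptions, not values taken from this analysis; a non-detection then excludes regions of the (Φ_0, γ) plane.

```latex
\frac{d\Phi_\nu}{dE} = \Phi_0 \left(\frac{E}{E_0}\right)^{-\gamma},
\qquad
\frac{\left.d\Phi_\nu/dE\right|_{E = E_0/10}}{\left.d\Phi_\nu/dE\right|_{E = E_0}} = 10^{\gamma}
```

For a fixed normalization, a softer spectrum (larger γ) therefore places relatively more flux at low energies.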

arXiv:2406.19156 [pdf, other]

Heterogeneous Causal Metapath Graph Neural Network for Gene-Microbe-Disease Association Prediction

Authors: Kexin Zhang, Feng Huang, Luotao Liu, Zhankun Xiong, Hongyu Zhang, Yuan Quan, Wen Zhang

Abstract: The recent focus on microbes in human medicine highlights their potential role in the genetic framework of diseases. To decode the complex interactions among genes, microbes, and diseases, computational predictions of gene-microbe-disease (GMD) associations are crucial. Existing methods primarily address gene-disease and microbe-disease associations, but the more intricate triple-wise GMD associations remain less explored. In this paper, we propose a Heterogeneous Causal Metapath Graph Neural Network (HCMGNN) to predict GMD associations. HCMGNN constructs a heterogeneous graph linking genes, microbes, and diseases through their pairwise associations, and utilizes six predefined causal metapaths to extract directed causal subgraphs, which facilitate the multi-view analysis of causal relations among the three entity types. Within each subgraph, we employ a causal semantic sharing message passing network for node representation learning, coupled with an attentive fusion method to integrate these representations for predicting GMD associations. Our extensive experiments show that HCMGNN effectively predicts GMD associations and addresses the association sparsity issue by enhancing the graph's semantics and structure.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.17419 [pdf, other]

Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA

Authors: Minzheng Wang, Longze Chen, Cheng Fu, Shengyi Liao, Xinghua Zhang, Bingli Wu, Haiyang Yu, Nan Xu, Lei Zhang, Run Luo, Yunshui Li, Min Yang, Fei Huang, Yongbin Li

Abstract: Long-context modeling capabilities have garnered widespread attention, leading to the emergence of Large Language Models (LLMs) with ultra-long context windows. Meanwhile, benchmarks for evaluating long-context LLMs are gradually catching up. However, existing benchmarks employ irrelevant noise texts to artificially extend the length of test cases, diverging from the real-world scenarios of long-context applications. To bridge this gap, we propose a novel long-context benchmark, Loong, aligned with realistic scenarios through extended multi-document question answering (QA). Unlike typical document QA, every document in a Loong test case is relevant to the final answer, so ignoring any document leads to a wrong answer. Furthermore, Loong introduces four types of tasks with a range of context lengths: Spotlight Locating, Comparison, Clustering, and Chain of Reasoning, to facilitate a more realistic and comprehensive evaluation of long-context understanding. Extensive experiments indicate that existing long-context language models still have considerable room for improvement. Retrieval-augmented generation (RAG) performs poorly, demonstrating that Loong can reliably assess a model's long-context modeling capabilities.

Submitted 25 June, 2024; originally announced June 2024.

Comments: We release our code and data publicly at https://github.com/MozerWang/Loong

arXiv:2406.15575 [pdf, ps, other]

Sketch-GNN: Scalable Graph Neural Networks with Sublinear Training Complexity

Authors: Mucong Ding, Tahseen Rabbani, Bang An, Evan Z Wang, Furong Huang

Abstract: Graph Neural Networks (GNNs) are widely applied to graph learning problems such as node classification. When scaling up the underlying graphs of GNNs to a larger size, we are forced to either train on the complete graph and keep the full graph adjacency and node embeddings in memory (which is often infeasible) or mini-batch sample the graph (which results in computational complexity that grows exponentially with the number of GNN layers). Various sampling-based and historical-embedding-based methods have been proposed to avoid this exponential growth in complexity. However, none of these solutions eliminates the linear dependence on graph size. This paper proposes a sketch-based algorithm whose training time and memory grow sublinearly with respect to graph size by training GNNs atop a few compact sketches of graph adjacency and node embeddings. Based on polynomial tensor-sketch (PTS) theory, our framework provides a novel protocol for sketching non-linear activations and graph convolution matrices in GNNs, as opposed to existing methods that sketch linear weights or gradients in neural networks. In addition, we develop a locality-sensitive hashing (LSH) technique that can be trained to improve the quality of sketches. Experiments on large-graph benchmarks demonstrate the scalability and competitive performance of our Sketch-GNNs versus their full-size GNN counterparts.

Submitted 21 June, 2024; originally announced June 2024.

Comments: NeurIPS 2022
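Polynomial tensor sketches build on the TensorSketch primitive of Pham and Pagh (2013): the sketch of an outer product x ⊗ y equals the circular convolution of two independent count sketches, so the product never has to be materialized. The pure-Python sketch below demonstrates that identity under illustrative assumptions (bucket count `c=5`, seeded random hashes); it is not Sketch-GNN's implementation, and `explicit_outer_sketch` exists only to check the result.

```python
import random


def make_hash(n, c, seed):
    """Random bucket h[i] in [0, c) and sign s[i] in {+1, -1} for each index i < n."""
    rng = random.Random(seed)
    h = [rng.randrange(c) for _ in range(n)]
    s = [rng.choice((1, -1)) for _ in range(n)]
    return h, s


def count_sketch(x, h, s, c):
    """Count sketch: bucket h[i] accumulates s[i] * x[i]."""
    out = [0.0] * c
    for i, xi in enumerate(x):
        out[h[i]] += s[i] * xi
    return out


def tensor_sketch(x, y, hs1, hs2, c):
    """Sketch of the outer product x ⊗ y via circular convolution of two count sketches."""
    a = count_sketch(x, hs1[0], hs1[1], c)
    b = count_sketch(y, hs2[0], hs2[1], c)
    # Naive O(c^2) circular convolution; practical implementations use the FFT.
    return [sum(a[j] * b[(k - j) % c] for j in range(c)) for k in range(c)]


def explicit_outer_sketch(x, y, hs1, hs2, c):
    """Same sketch built from all len(x) * len(y) outer-product entries, for checking."""
    (h1, s1), (h2, s2) = hs1, hs2
    out = [0.0] * c
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[(h1[i] + h2[j]) % c] += s1[i] * s2[j] * xi * yj
    return out


x, y, c = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0], 5
hs1, hs2 = make_hash(len(x), c, seed=0), make_hash(len(y), c, seed=1)
fast = tensor_sketch(x, y, hs1, hs2, c)
slow = explicit_outer_sketch(x, y, hs1, hs2, c)
print(all(abs(p - q) < 1e-9 for p, q in zip(fast, slow)))  # True
```

The sketch occupies `c` numbers regardless of `len(x) * len(y)`; repeating the convolution gives sketches of higher-degree tensor powers, which is what makes polynomial approximations of non-linear activations cheap to sketch.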

arXiv:2406.15567 [pdf, other]

SAIL: Self-Improving Efficient Online Alignment of Large Language Models

Authors: Mucong Ding, Souradip Chakraborty, Vibhu Agrawal, Zora Che, Alec Koppel, Mengdi Wang, Amrit Bedi, Furong Huang

Abstract: Reinforcement Learning from Human Feedback (RLHF) is a key method for aligning large language models (LLMs) with human preferences. However, current offline alignment approaches like DPO, IPO, and SLiC rely heavily on fixed preference datasets, which can lead to sub-optimal performance. On the other hand, recent literature has focused on designing online RLHF methods, but this work still lacks a unified conceptual formulation and suffers from distribution shift issues. To address this, we establish that online LLM alignment is underpinned by bilevel optimization. By reducing this formulation to an efficient single-level first-order method (using the reward-policy equivalence), our approach generates new samples and iteratively refines model alignment by exploring responses and regulating preference labels. In doing so, we permit alignment methods to operate in an online and self-improving manner, as well as generalize prior online RLHF methods as special cases. Compared to state-of-the-art iterative RLHF methods, our approach significantly improves alignment performance on open-source datasets with minimal computational overhead.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 24 pages, 6 figures, 3 tables

arXiv:2406.14884 [pdf, other]

FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents

Authors: Ruixuan Xiao, Wentao Ma, Ke Wang, Yuchuan Wu, Junbo Zhao, Haobo Wang, Fei Huang, Yongbin Li

Abstract: LLM-based agents have emerged as promising tools, crafted to fulfill complex tasks through iterative planning and action. However, these agents are susceptible to undesired planning hallucinations when they lack the specific knowledge required for expertise-intensive tasks. To address this, preliminary attempts have been made to enhance planning reliability by incorporating external workflow-related knowledge. Despite the promise, such infused knowledge is mostly disorganized and diverse in format, lacking rigorous formalization and comprehensive comparison. Motivated by this, we formalize different formats of workflow knowledge and present FlowBench, the first benchmark for workflow-guided planning. FlowBench covers 51 different scenarios from 6 domains, with knowledge presented in diverse formats. To assess different LLMs on FlowBench, we design a multi-tiered evaluation framework. We evaluate the efficacy of workflow knowledge across multiple formats, and the results indicate that current LLM agents need considerable improvement for satisfactory planning. We hope that our challenging benchmark can pave the way for future agent planning research.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.14320 [pdf, ps, other]

Anyon condensation in mixed-state topological order

Authors: Ken Kikuchi, Kah-Sen Kam, Fu-Hsiang Huang

Abstract: We discuss anyon condensation in mixed-state topological order. The phases were recently conjectured to be classified by pre-modular fusion categories. Just like anyon condensation in pure-state topological order, a bootstrap analysis shows condensable anyons are given by connected étale algebras. We explain how to perform generic anyon condensation including non-invertible anyons and successive condensations. Interestingly, some condensations lead to pure-state topological orders. We clarify when this happens. We also compute topological invariants of equivalence classes.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 52 pages, 14 figures

arXiv:2406.13114 [pdf, other]

Multi-Stage Balanced Distillation: Addressing Long-Tail Challenges in Sequence-Level Knowledge Distillation

Authors: Yuhang Zhou, Jing Zhu, Paiheng Xu, Xiaoyu Liu, Xiyao Wang, Danai Koutra, Wei Ai, Furong Huang

Abstract: Large language models (LLMs) have significantly advanced various natural language processing tasks, but deploying them remains computationally expensive. Knowledge distillation (KD) is a promising solution, enabling the transfer of capabilities from larger teacher LLMs to more compact student models. In particular, sequence-level KD, which distills rationale-based reasoning processes instead of merely final outcomes, shows great potential in enhancing students' reasoning capabilities. However, current methods struggle with sequence-level KD under long-tailed data distributions, which adversely affects generalization on sparsely represented domains. We introduce the Multi-Stage Balanced Distillation (BalDistill) framework, which iteratively balances training data within a fixed computational budget. By dynamically selecting representative head domain examples and synthesizing tail domain examples, BalDistill achieves state-of-the-art performance across diverse long-tailed datasets, enhancing both the efficiency and efficacy of the distilled models.

Submitted 18 June, 2024; originally announced June 2024.

Comments: preprint
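To make "balancing within a fixed budget" concrete, here is a one-shot allocation rule: split the budget equally across domains, keep at most the quota of real examples from head domains, and fill the shortfall in tail domains with synthetic examples. This equal-quota rule, the function `plan_balance`, and the domain counts are hypothetical illustrations; BalDistill itself is iterative and multi-stage, and its selection and synthesis steps are not reproduced here.

```python
from __future__ import annotations


def plan_balance(domain_counts: dict[str, int], budget: int) -> dict[str, tuple[int, int]]:
    """Map each domain to (real examples to keep, examples to synthesize)
    under an equal per-domain quota of budget // number_of_domains."""
    quota = budget // len(domain_counts)
    plan = {}
    for domain, count in domain_counts.items():
        keep = min(count, quota)             # head domains: keep a representative subset
        plan[domain] = (keep, quota - keep)  # tail domains: top up with synthetic examples
    return plan


print(plan_balance({"math": 900, "law": 60, "medicine": 15}, budget=300))
# {'math': (100, 0), 'law': (60, 40), 'medicine': (15, 85)}
```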

arXiv:2406.12429 [pdf, other]

Adaptive Selection for Homogeneous Tools: An Instantiation in the RAG Scenario

Authors: Feiteng Mu, Yong Jiang, Liwen Zhang, Chu Liu, Wenjie Li, Pengjun Xie, Fei Huang

Abstract: Current research on tool learning primarily focuses on selecting the most effective tool from a wide array of options, often overlooking cost-effectiveness, a crucial factor in human problem-solving. In this paper, we address the selection of homogeneous tools by predicting both their performance and the associated cost required to accomplish a given task. We then assign queries to the optimal tools in a cost-effective manner. Our experimental results demonstrate that our method achieves higher performance at a lower cost compared to strong baseline approaches.

Submitted 11 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.
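The assignment step (predict performance and cost per tool, then route each query cost-effectively) can be sketched with a simple utility rule: pick the tool that maximizes predicted score minus a weighted predicted cost. The linear trade-off, the `cost_weight` parameter, and the tool names below are assumptions for illustration; the abstract does not specify how the paper combines the two predictions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToolEstimate:
    name: str
    predicted_score: float  # predicted task performance for this query (assumed in [0, 1])
    predicted_cost: float   # predicted cost for this query (arbitrary units)


def select_tool(estimates: list[ToolEstimate], cost_weight: float) -> str:
    """Return the tool maximizing predicted_score - cost_weight * predicted_cost."""
    best = max(estimates, key=lambda t: t.predicted_score - cost_weight * t.predicted_cost)
    return best.name


tools = [
    ToolEstimate("local_index", 0.62, 1.0),
    ToolEstimate("dense_retriever", 0.70, 3.0),
    ToolEstimate("web_search", 0.81, 10.0),
]
for w in (0.0, 0.02, 0.05):
    print(w, select_tool(tools, w))
```

With `cost_weight=0.0` the most accurate tool wins (`web_search`); as the weight grows, the choice shifts to `dense_retriever` and then to the cheapest `local_index`.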

arXiv:2406.12259 [pdf]

Adversarial Attacks on Large Language Models in Medicine

Authors: Yifan Yang, Qiao Jin, Furong Huang, Zhiyong Lu

Abstract: The integration of Large Language Models (LLMs) into healthcare applications offers promising advancements in medical diagnostics, treatment recommendations, and patient care. However, the susceptibility of LLMs to adversarial attacks poses a significant threat, potentially leading to harmful outcomes in delicate medical contexts. This study investigates the vulnerability of LLMs to two types of adversarial attacks in three medical tasks. Utilizing real-world patient data, we demonstrate that both open-source and proprietary LLMs are susceptible to manipulation across multiple tasks. This research further reveals that domain-specific tasks demand more adversarial data in model fine-tuning than general domain tasks for effective attack execution, especially for more capable models. We discover that while integrating adversarial data does not markedly degrade overall model performance on medical benchmarks, it does lead to noticeable shifts in fine-tuned model weights, suggesting a potential pathway for detecting and countering model attacks. This research highlights the urgent need for robust security measures and the development of defensive mechanisms to safeguard LLMs in medical applications, to ensure their safe and effective deployment in healthcare settings.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12091 [pdf, other]

Is poisoning a real threat to LLM alignment? Maybe more so than you think

Authors: Pankayaraj Pathmanathan, Souradip Chakraborty, Xiangyu Liu, Yongyuan Liang, Furong Huang

Abstract: Recent advancements in Reinforcement Learning from Human Feedback (RLHF) have significantly impacted the alignment of Large Language Models (LLMs). The sensitivity of reinforcement learning algorithms such as Proximal Policy Optimization (PPO) has led to a new line of work on Direct Preference Optimization (DPO), which treats RLHF in a supervised learning framework. The increased practical use of these RLHF methods warrants an analysis of their vulnerabilities. In this work, we investigate the vulnerabilities of DPO to poisoning attacks under different scenarios and compare the effectiveness of preference poisoning, the first analysis of its kind. We comprehensively analyze DPO's vulnerabilities under different types of attacks, i.e., backdoor and non-backdoor attacks, and different poisoning methods across a wide array of language models, i.e., Llama 7B, Mistral 7B, and Gemma 7B. We find that, whereas PPO-based methods require at least 4% of the data to be poisoned to elicit harmful behavior under backdoor attacks, DPO's vulnerabilities can be exploited more simply, allowing the model to be poisoned with as little as 0.5% of the data. We further investigate the potential reasons behind the vulnerability and how well this vulnerability translates into backdoor vs. non-backdoor attacks.

Submitted 19 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Journal ref: ICML 2024 Workshop MHFAIA
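For reference, the standard DPO objective (Rafailov et al., 2023) for one preference pair is -log σ(β[(log π(y_w|x) - log π_ref(y_w|x)) - (log π(y_l|x) - log π_ref(y_l|x))]). The sketch below implements that loss from summed sequence log-probabilities, plus a simple label-flipping helper showing what poisoning 0.5% versus 4% of a dataset means in pair counts. `flip_preferences`, the 1,000-pair dataset size, and `beta=0.1` are illustrative assumptions; the specific poisoning methods studied in the paper are not reproduced.

```python
import math
import random


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one pair, from summed sequence log-probabilities:
    -log sigmoid(beta * [(pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)])."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return softplus(-margin)  # -log sigmoid(m) == softplus(-m)


def flip_preferences(pairs, fraction, seed=0):
    """Swap (chosen, rejected) in a random `fraction` of pairs: a simple
    label-flipping poisoning model, not the attacks studied in the paper."""
    rng = random.Random(seed)
    poisoned = set(rng.sample(range(len(pairs)), round(fraction * len(pairs))))
    return [(r, c) if i in poisoned else (c, r) for i, (c, r) in enumerate(pairs)]


pairs = [(f"chosen_{i}", f"rejected_{i}") for i in range(1000)]
for frac in (0.005, 0.04):
    flipped = sum(a != b for a, b in zip(pairs, flip_preferences(pairs, frac)))
    print(frac, flipped)  # 0.005 -> 5 pairs, 0.04 -> 40 pairs
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))  # log 2 ≈ 0.693 when policy == reference
```

Because DPO is trained directly on the preference labels, every flipped pair pushes the policy's log-ratio margin in the wrong direction, with no separately trained reward model in between.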

arXiv:2406.11882 [pdf]

Applications of Explainable artificial intelligence in Earth system science

Authors: Feini Huang, Shijie Jiang, Lu Li, Yongkun Zhang, Ye Zhang, Ruqing Zhang, Qingliang Li, Danxi Li, Wei Shangguan, Yongjiu Dai

Abstract: In recent years, artificial intelligence (AI) has rapidly expanded its influence and is expected to advance Earth system science (ESS) if properly harnessed. In applying AI to ESS, a significant hurdle lies in the interpretability conundrum, an inherent black-box problem arising from the complexity of AI algorithms. To address this, explainable AI (XAI) offers a set of powerful tools that make models more transparent. The purpose of this review is twofold: first, to provide ESS scholars, especially newcomers, with a foundational understanding of XAI, serving as a primer to inspire future research advances; second, to encourage ESS professionals to embrace the benefits of AI, free from preconceived biases due to its lack of interpretability. We begin by elucidating the concept of XAI, along with typical methods. We then review XAI applications in the ESS literature, highlighting the important role that XAI has played in facilitating communication of AI model decisions, improving model diagnosis, and uncovering scientific insights. We identify four significant challenges that XAI faces within ESS and propose solutions. Furthermore, we provide a comprehensive illustration of multifaceted perspectives. Given the unique challenges in ESS, an interpretable hybrid approach that seamlessly integrates AI with domain-specific knowledge appears to be a promising way to enhance the utility of AI in ESS. A visionary outlook for ESS envisions a harmonious blend where process-based models govern the known, AI models explore the unknown, and XAI bridges the gap by providing explanations.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.11371 [pdf, other]

Video Frame Interpolation for Polarization via Swin-Transformer

Authors: Feng Huang, Xin Zhang, Yixuan Xu, Xuesong Wang, Xianyu Wu

Abstract: Video Frame Interpolation (VFI) has been extensively explored and demonstrated, yet its application to polarization remains largely unexplored. Because polarizing filters selectively transmit light, longer exposure times are typically required to ensure sufficient light intensity, which consequently lowers the temporal sampling rate. Furthermore, because the polarization of light reflected by objects varies with the viewing angle, focusing solely on estimating pixel displacement is insufficient to accurately reconstruct the intermediate polarization. To tackle these challenges, this study proposes a multi-stage and multi-scale network called Swin-VFI based on the Swin-Transformer and introduces a tailored loss function to facilitate the network's understanding of polarization changes. To ensure the practicality of the proposed method, this study evaluates its interpolated frames in Shape from Polarization (SfP) and Human Shape Reconstruction tasks, comparing them with other state-of-the-art methods such as CAIN, FLAVR, and VFIT. Experimental results demonstrate our approach's superior reconstruction accuracy across all tasks.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 18 pages, 10 figures, 7 tables, 73 citations

arXiv:2406.10900 [pdf, other]

AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models

Authors: Xiyang Wu, Tianrui Guan, Dianqi Li, Shuaiyi Huang, Xiaoyu Liu, Xijun Wang, Ruiqi Xian, Abhinav Shrivastava, Furong Huang, Jordan Lee Boyd-Graber, Tianyi Zhou, Dinesh Manocha

Abstract: Large vision-language models (LVLMs) hallucinate: certain context cues in an image may trigger the language module's overconfident and incorrect reasoning about abnormal or hypothetical objects. Although a few benchmarks have been developed to investigate LVLM hallucinations, they mainly rely on hand-crafted corner cases whose failure patterns may not generalize, and fine-tuning on them could undermine their validity. This motivates us to develop the first automatic benchmark generation approach, AUTOHALLUSION, which harnesses a few principal strategies to create diverse hallucination examples. It probes the language modules in LVLMs for context cues and uses them to synthesize images by: (1) adding objects abnormal to the context cues; (2) for two co-occurring objects, keeping one and excluding the other; or (3) removing objects closely tied to the context cues. It then generates image-based questions whose ground-truth answers contradict the language module's prior. A model has to overcome contextual biases and distractions to reach correct answers, while incorrect or inconsistent answers indicate hallucinations. AUTOHALLUSION enables us to create new benchmarks at minimal cost and thus overcomes the fragility of hand-crafted benchmarks. It also reveals common failure patterns and reasons, providing key insights to detect, avoid, or control hallucinations. Comprehensive evaluations of top-tier LVLMs, e.g., GPT-4V(ision), Gemini Pro Vision, Claude 3, and LLaVA-1.5, show hallucination-induction success rates of 97.7% and 98.7% on the synthetic and real-world datasets of AUTOHALLUSION, respectively, paving the way for a long battle against hallucinations.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.08426 [pdf, other]

Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL

Authors: Zijin Hong, Zheng Yuan, Qinggang Zhang, Hao Chen, Junnan Dong, Feiran Huang, Xiao Huang

Abstract: Generating accurate SQL from natural language questions (text-to-SQL) is a long-standing challenge due to the complexities involved in understanding user questions, comprehending database schemas, and generating SQL. Conventional text-to-SQL systems, built on human engineering and deep neural networks, have made substantial progress. Subsequently, pre-trained language models (PLMs) have been developed and utilized for text-to-SQL tasks, achieving promising performance. As modern databases become more complex, the corresponding user questions also grow more challenging, leading PLMs with limited comprehension capabilities to produce incorrect SQL. This necessitates more sophisticated and tailored optimization methods for PLMs, which, in turn, restricts the applications of PLM-based systems. Most recently, large language models (LLMs) have demonstrated significant capabilities in natural language understanding as model scale continues to increase. Therefore, LLM-based implementations can bring unique opportunities, improvements, and solutions to text-to-SQL research. In this survey, we present a comprehensive review of LLM-based text-to-SQL. Specifically, we provide a brief overview of the technical challenges and the evolutionary process of text-to-SQL. Then, we provide a detailed introduction to the datasets and metrics designed to evaluate text-to-SQL systems. After that, we present a systematic analysis of recent advances in LLM-based text-to-SQL. Finally, we discuss the remaining challenges in this field and propose expectations for future research directions.
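One metric commonly used to evaluate text-to-SQL systems is execution accuracy: run the predicted and gold queries on the same database and compare results. The sketch below uses Python's built-in `sqlite3` module; the schema, queries, and the order-insensitive multiset comparison are illustrative assumptions, and benchmark-specific evaluators differ in details such as `ORDER BY` handling.

```python
import sqlite3
from collections import Counter
from typing import List, Tuple

def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """True if both queries return the same multiset of rows (row order ignored)."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to execute counts as incorrect
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(predicted_rows) == Counter(gold_rows)

def execution_accuracy(conn: sqlite3.Connection, pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) pairs whose execution results match."""
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singer (name TEXT, country TEXT, age INTEGER);
    INSERT INTO singer VALUES ('Ann', 'France', 31), ('Bo', 'Japan', 25), ('Cy', 'France', 40);
""")

pairs = [
    # Different SQL text, same result: counted as correct.
    ("SELECT name FROM singer WHERE age > 30",
     "SELECT name FROM singer WHERE NOT age <= 30"),
    # Wrong filter: counted as incorrect.
    ("SELECT name FROM singer WHERE country = 'Japan'",
     "SELECT name FROM singer WHERE country = 'France'"),
    # Misspelled column: execution error, counted as incorrect.
    ("SELECT nme FROM singer", "SELECT name FROM singer"),
]
print(execution_accuracy(conn, pairs))  # 1 of 3 pairs match
```

Unlike exact string matching, this credits semantically equivalent queries, though it can also reward queries that match only by coincidence on a small database.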

Submitted 27 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08229 [pdf, other]

doi 10.1145/3626772.3657720

GPT4Rec: Graph Prompt Tuning for Streaming Recommendation

Authors: Peiyan Zhang, Yuchen Yan, Xi Zhang, Liying Kang, Chaozhuo Li, Feiran Huang, Senzhang Wang, Sunghun Kim

Abstract: In the realm of personalized recommender systems, the challenge of adapting to evolving user preferences and the continuous influx of new users and items is paramount. Conventional models, typically reliant on a static training-test approach, struggle to keep pace with these dynamic demands. Streaming recommendation, particularly through continual graph learning, has emerged as a novel solution. However, existing methods in this area either rely on historical data replay, which is increasingly impractical due to stringent data privacy regulations; are unable to effectively address the over-stability issue; or depend on model-isolation and expansion strategies. To tackle these difficulties, we present GPT4Rec, a Graph Prompt Tuning method for streaming Recommendation. Given the evolving user-item interaction graph, GPT4Rec first disentangles the graph patterns into multiple views. After isolating specific interaction patterns and relationships in different views, GPT4Rec uses lightweight graph prompts to efficiently guide the model across varying interaction patterns within the user-item graph. First, node-level prompts instruct the model to adapt to changes in the attributes or properties of individual nodes within the graph. Second, structure-level prompts guide the model in adapting to broader patterns of connectivity and relationships within the graph. Finally, view-level prompts facilitate the aggregation of information from multiple disentangled views. These prompt designs allow GPT4Rec to synthesize a comprehensive understanding of the graph, ensuring that all vital aspects of the user-item interactions are considered and effectively integrated. Experiments on four diverse real-world datasets demonstrate the effectiveness and efficiency of our proposal.

Submitted 11 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted by SIGIR 2024

ACM Class: H.3.3

arXiv:2406.08116 [pdf, other]

Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling

Authors: Zile Qiao, Wei Ye, Yong Jiang, Tong Mo, Pengjun Xie, Weiping Li, Fei Huang, Shikun Zhang

Abstract: Retrieval-augmented language models (RALMs) have recently shown great potential in mitigating the limitations of implicit knowledge in LLMs, such as untimely updating of the latest expertise and unreliable retention of long-tail knowledge. However, since neither the external knowledge base nor the retriever can guarantee reliability, the retrieved knowledge may be unhelpful or even misleading for LLM generation. In this paper, we introduce Supportiveness-based Knowledge Rewriting (SKR), a robust and pluggable knowledge rewriter inherently optimized for LLM generation. Specifically, we introduce the novel concept of "supportiveness", which represents how effectively a knowledge piece facilitates downstream tasks, by considering the perplexity impact of augmented knowledge on the response text of a white-box LLM. Based on knowledge supportiveness, we first design a training data curation strategy for our rewriter model, effectively identifying and filtering out poor or irrelevant rewrites (e.g., those with low supportiveness scores) to improve data efficacy. We then apply the direct preference optimization (DPO) algorithm to align the generated rewrites to optimal supportiveness, guiding the rewriter model to summarize augmented content that better improves the final response. Comprehensive evaluations across six popular knowledge-intensive tasks and four LLMs demonstrate the effectiveness and superiority of SKR. With only 7B parameters, SKR shows better knowledge rewriting capability than GPT-4, the current state-of-the-art general-purpose LLM.
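The abstract defines supportiveness through the perplexity impact of augmented knowledge on a white-box LLM's response. One plausible way to turn that into a number, assumed here rather than taken from the paper, is the ratio of the response perplexity without knowledge to the perplexity with the knowledge in context; a value above 1 means the knowledge made the response more likely. The per-token log-probabilities are made-up placeholders standing in for scores from a real model, and the filtering threshold is hypothetical.

```python
import math
from typing import Sequence

def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from natural-log token probabilities: exp(-mean log p)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def supportiveness(logprobs_without: Sequence[float], logprobs_with: Sequence[float]) -> float:
    """Hypothetical score: PPL(response | no knowledge) / PPL(response | knowledge)."""
    return perplexity(logprobs_without) / perplexity(logprobs_with)

# Placeholder log-probabilities of the same 4-token response under different contexts.
without_knowledge = [-2.0, -1.5, -3.0, -2.5]
candidates = {
    "A": [-0.5, -0.4, -0.8, -0.3],  # helpful rewrite: response becomes more likely
    "B": [-2.2, -1.9, -3.1, -2.8],  # unhelpful rewrite: response becomes less likely
}

for name, lp in candidates.items():
    print(name, round(supportiveness(without_knowledge, lp), 3))

# Data curation in the spirit of the abstract: keep only rewrites that help.
threshold = 1.0  # hypothetical cutoff
kept = [name for name, lp in candidates.items()
        if supportiveness(without_knowledge, lp) > threshold]
print(kept)
```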

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07381 [pdf, other]

World Models with Hints of Large Language Models for Goal Achieving

Authors: Zeyuan Liu, Ziyu Huan, Xiyao Wang, Jiafei Lyu, Jian Tao, Xiu Li, Furong Huang, Huazhe Xu

Abstract: Reinforcement learning struggles with long-horizon tasks and sparse goals due to the difficulty of manual reward specification. While existing methods address this by adding intrinsic rewards, they may fail to provide meaningful guidance in long-horizon decision-making tasks with large state and action spaces, lacking purposeful exploration. Inspired by human cognition, we propose a new multi-modal model-based RL approach named Dreaming with Large Language Models (DLLM). DLLM integrates the hinting subgoals proposed by LLMs into the model rollouts to encourage discovering and reaching goals in challenging tasks. By assigning higher intrinsic rewards to samples that align with the hints outlined by the language model during model rollouts, DLLM guides the agent toward meaningful and efficient exploration. Extensive experiments demonstrate that DLLM outperforms recent methods in various challenging, sparse-reward environments such as HomeGrid, Crafter, and Minecraft by 27.7%, 21.1%, and 9.9%, respectively.
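One simple way to realize "higher intrinsic rewards for samples that align with the hints" is sketched below under explicit assumptions: transitions and hints are compared via cosine similarity of hypothetical embedding vectors, a bonus is paid only above a threshold, and each hint pays out once so the agent is pushed toward new subgoals. These choices are illustrative and are not claimed to match DLLM's actual reward shaping.

```python
import math
from typing import List, Sequence, Set

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def hint_bonus(
    transition_emb: Sequence[float],
    hint_embs: List[Sequence[float]],
    achieved: Set[int],
    threshold: float = 0.8,
    scale: float = 1.0,
) -> float:
    """Intrinsic reward for the best-matching hint not yet achieved; marks that hint as achieved."""
    best_idx, best_sim = -1, threshold
    for i, hint in enumerate(hint_embs):
        if i in achieved:
            continue
        sim = cosine(transition_emb, hint)
        if sim >= best_sim:
            best_idx, best_sim = i, sim
    if best_idx < 0:
        return 0.0
    achieved.add(best_idx)
    return scale * best_sim

# Hypothetical embeddings of two LLM hints, e.g. "collect wood" and "place table".
hints = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
achieved: Set[int] = set()
step = [0.9, 0.1, 0.0]  # embedding of what the agent just did
extrinsic = 0.0         # sparse environment reward
first = extrinsic + hint_bonus(step, hints, achieved)
second = extrinsic + hint_bonus(step, hints, achieved)  # same hint is not rewarded twice
print(round(first, 3), second, achieved)
```

In a model-based agent this bonus would be computed on imagined rollouts, not only on real environment steps, which is where the abstract places it.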

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.07362 [pdf, other]

AI.vs.Clinician: Unveiling Intricate Interactions Between AI and Clinicians through an Open-Access Database

Authors: Wanling Gao, Yuan Liu, Zhuoming Yu, Dandan Cui, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Fan Huang, Gangyuan Zhao, Chongrong Jiang, Tianyi Wei, Zhifei Zhang, Yunyou Huang, Jianfeng Zhan

Abstract: Artificial Intelligence (AI) plays a crucial role in the medical field and has the potential to revolutionize healthcare practices. However, the success of AI models and their impact hinge on the synergy between AI and medical specialists, with clinicians assuming a dominant role. Unfortunately, the intricate dynamics and interactions between AI and clinicians remain unexplored, which hinders the translation of AI into medical practice. To address this gap, we have curated a groundbreaking database called AI.vs.Clinician. This database is the first of its kind for studying the interactions between AI and clinicians. It derives from 7,500 collaborative diagnosis records on a life-threatening medical emergency, sepsis, from 14 medical centers across China. For the patient cohorts, carefully selected from the MIMIC databases, the AI-related information comprises the model property, feature input, diagnosis decision, and inferred probabilities of sepsis onset at present and within the next three hours. The clinician-related information includes the viewed examination data and viewing sequence, viewing time, preliminary and final diagnosis decisions with or without AI assistance, and recommended treatment.
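The fields listed above map naturally onto a per-record data structure. The sketch below is a hypothetical layout (field names are illustrative, not the database's actual schema) plus one example of the kind of AI-clinician interaction question such records support: how often a clinician who initially disagreed with the AI ended up following it. Treating the preliminary decision as made before seeing the AI output is an assumption, and the three records are synthetic.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DiagnosisRecord:
    """Hypothetical layout of one collaborative diagnosis record (field names are illustrative)."""
    model_property: str                  # description of the AI model shown
    ai_decision: bool                    # AI predicts sepsis (True) or not (False)
    p_onset_now: float                   # inferred probability of sepsis onset at present
    p_onset_3h: float                    # inferred probability of onset within three hours
    viewed_exams: List[str] = field(default_factory=list)  # in the order viewed
    viewing_time_s: float = 0.0
    preliminary_decision: Optional[bool] = None  # assumed: recorded before AI output is shown
    final_decision: Optional[bool] = None
    recommended_treatment: str = ""

def switch_to_ai_rate(records: List[DiagnosisRecord]) -> float:
    """Among initial disagreements with the AI, the fraction of final decisions that follow the AI."""
    disagreed = [
        r for r in records
        if r.preliminary_decision is not None and r.preliminary_decision != r.ai_decision
    ]
    if not disagreed:
        return 0.0
    return sum(r.final_decision == r.ai_decision for r in disagreed) / len(disagreed)

# Synthetic records for illustration only.
records = [
    DiagnosisRecord("model-A", True, 0.7, 0.8, ["lactate", "heart rate"], 95.0, False, True),
    DiagnosisRecord("model-A", True, 0.6, 0.7, ["temperature"], 40.0, False, False),
    DiagnosisRecord("model-A", False, 0.1, 0.2, ["WBC"], 30.0, False, False),
]
print(switch_to_ai_rate(records))
```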

Submitted 15 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: 12 pages

arXiv:2406.06830 [pdf, other]

Cosmological Stasis from Dynamical Scalars: Tracking Solutions and the Possibility of a Stasis-Induced Inflation

Authors: Keith R. Dienes, Lucien Heurtier, Fei Huang, Tim M. P. Tait, Brooks Thomas

Abstract: It has recently been realized that many theories of physics beyond the Standard Model give rise to cosmological histories exhibiting extended epochs of cosmological stasis. During such epochs, the abundances of different energy components such as matter, radiation, and vacuum energy each remain fixed despite cosmological expansion. In previous analyses of the stasis phenomenon, these different energy components were modeled as fluids with fixed, unchanging equations of state. In this paper, by contrast, we consider more realistic systems involving dynamical scalars which pass through underdamping transitions as the universe expands. Indeed, such systems might be highly relevant for BSM scenarios involving higher-dimensional bulk moduli and inflatons. Remarkably, we find that stasis emerges even in such situations, despite the appearance of time-varying equations of state. Moreover, this stasis includes several new features which might have important phenomenological implications and applications. For example, in the presence of an additional "background" energy component, we find that the scalars evolve into a "tracking" stasis in which the stasis equation of state automatically tracks that of the background. This phenomenon exists even if the background has only a small initial abundance. We also discuss the intriguing possibility that our results might form the basis of a new "Stasis Inflation" scenario in which no ad-hoc inflaton potential is needed and in which there is no graceful-exit problem. 
Within such a scenario, the number of e-folds of cosmological expansion produced is directly related to the hierarchies between physical BSM mass scales. Moreover, non-zero matter and radiation abundances can be sustained throughout the inflationary epoch.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 25 pages, LaTeX, 11 figures

Report number: KCL-PH-TH/2024-23, UCI-HEP-TR-2024-09

arXiv:2406.05644 [pdf, other]

How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States

Authors: Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Yongbin Li

Abstract: Large language models (LLMs) rely on safety alignment to avoid responding to malicious user inputs. Unfortunately, jailbreaks can circumvent safety guardrails, causing LLMs to generate harmful content and raising concerns about LLM safety. Because language models with massive numbers of parameters are often regarded as black boxes, the mechanisms of alignment and jailbreak are challenging to elucidate. In this paper, we employ weak classifiers to explain LLM safety through the intermediate hidden states. We first confirm that LLMs learn ethical concepts during pre-training rather than alignment and can identify malicious and normal inputs in the early layers. Alignment actually associates the early concepts with emotion guesses in the middle layers and then refines them into specific reject tokens for safe generations. Jailbreak disturbs the transformation of early unethical classification into negative emotions. We conduct experiments on models from 7B to 70B across various model families to support our conclusion. Overall, our paper reveals the intrinsic mechanism of LLM safety and how jailbreaks circumvent safety guardrails, offering a new perspective on LLM safety and reducing concerns. Our code is available at https://github.com/ydyjya/LLM-IHS-Explanation.
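A "weak classifier" in this sense is a simple probe trained on hidden states from one layer. Below is a minimal, dependency-free sketch of one such probe, a nearest-centroid classifier; the paper's actual classifier types may differ. In practice the vectors would be hidden states extracted from a specific LLM layer; here they are tiny synthetic placeholders.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def fit_probe(states: List[Vector], labels: List[int]) -> Tuple[List[float], List[float]]:
    """Nearest-centroid probe: one centroid per class (0 = normal, 1 = malicious)."""
    normal = centroid([s for s, y in zip(states, labels) if y == 0])
    malicious = centroid([s for s, y in zip(states, labels) if y == 1])
    return normal, malicious

def probe_predict(probe: Tuple[List[float], List[float]], state: Vector) -> int:
    """Predict the class whose centroid is closer in squared Euclidean distance."""
    normal, malicious = probe
    d_normal = sum((a - b) ** 2 for a, b in zip(state, normal))
    d_malicious = sum((a - b) ** 2 for a, b in zip(state, malicious))
    return int(d_malicious < d_normal)

def probe_accuracy(probe, states: List[Vector], labels: List[int]) -> float:
    return sum(probe_predict(probe, s) == y for s, y in zip(states, labels)) / len(labels)

# Synthetic 3-D "hidden states" from one layer (placeholders, not real model activations).
train_states = [[0.1, 0.9, 0.0], [0.2, 1.1, 0.1], [1.0, 0.1, 0.2], [0.9, -0.1, 0.0]]
train_labels = [0, 0, 1, 1]
test_states = [[0.15, 1.0, 0.05], [1.1, 0.0, 0.1]]
test_labels = [0, 1]

probe = fit_probe(train_states, train_labels)
print(probe_accuracy(probe, test_states, test_labels))
```

Fitting one such probe per layer and comparing accuracies is how one would check at which depth malicious and normal inputs become separable.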

Submitted 13 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

Comments: 27 pages

arXiv:2406.03884 [pdf, ps, other]

Steady supersonic combustion flows with a contact discontinuity in two-dimensional finitely long nozzles

Authors: Junlei Gao, Feimin Huang, Jie Kuang, Dehua Wang, Wei Xiang

Abstract: In this paper, we are concerned with two-dimensional steady supersonic combustion flows with a contact discontinuity moving through a nozzle of finite length. Mathematically, the problem can be formulated as a free boundary value problem governed by the two-dimensional steady combustion Euler equations, with the contact discontinuity as the free boundary. The main mathematical difficulties are that the contact discontinuity is a characteristic free boundary and that the equations for all states are coupled with each other due to the combustion process. We first employ the Lagrangian coordinate transformation to fix the free boundary. Then, by introducing the flow slope and the Bernoulli function, we further reduce the fixed boundary value problem to an initial boundary value problem for a first-order hyperbolic system coupled with several ordinary differential equations. A new iteration scheme is developed near the background states by employing the intrinsic structure of the equation for the mass fraction of the non-combustion gas. We show that the iteration has a fixed point by deriving some novel $C^{1,\alpha}$-estimates of the solutions and applying the fixed point theorem, and then prove the uniqueness of the fixed point by a contraction argument. On the other hand, a quasi-one-dimensional approximate system is often used to simplify the two-dimensional steady supersonic combustion model; the error between these two systems is estimated. Finally, given a piecewise $C^{1,\alpha}$-solution containing a contact discontinuity with piecewise constant states at the entrance of the nozzle, we show that the solution consists of the piecewise constant states with a straight contact discontinuity.

Submitted 9 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03836 [pdf, other]

Proactive Detection of Physical Inter-rule Vulnerabilities in IoT Services Using a Deep Learning Approach

Authors: Bing Huang, Chen Chen, Kwok-Yan Lam, Fuqun Huang

Abstract: Emerging Internet of Things (IoT) platforms provide sophisticated capabilities to automate IoT services by enabling occupants to create trigger-action rules. Multiple trigger-action rules can physically interact with each other via shared environment channels, such as temperature, humidity, and illumination. We refer to inter-rule interactions via shared environment channels as physical inter-rule vulnerabilities. Attackers can exploit such vulnerabilities to launch attacks against IoT systems. We propose a new framework to proactively discover possible physical inter-rule interactions from user requirement specifications (i.e., descriptions) using a deep learning approach. Specifically, we use the Transformer model to generate trigger-action rules from their associated descriptions. We identify two types of physical inter-rule vulnerabilities and determine the associated environment channels using natural language processing (NLP) tools. Given the extracted trigger-action rules and associated environment channels, we propose an approach to identify hidden physical inter-rule vulnerabilities among them. Our experiment on 27,983 IFTTT-style rules shows that the Transformer can extract trigger-action rules from descriptions with 95.22% accuracy. We also validate the effectiveness of our approach on 60 official SmartThings IoT apps and discover 99 possible physical inter-rule vulnerabilities.
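The core check, one rule's action changing an environment channel that another rule's trigger reads, can be sketched as a pairwise scan. The rule representation, channel names, and "up"/"down" effect labels are simplified assumptions for illustration; the paper's Transformer-based rule extraction, NLP channel detection, and its two vulnerability types are not reproduced here. A flagged pair is only a candidate interaction, not a confirmed exploit.

```python
from dataclasses import dataclass, field
from itertools import permutations
from typing import Dict, List, Tuple

@dataclass
class Rule:
    name: str
    trigger_channel: str  # environment channel the trigger condition reads
    action_effects: Dict[str, str] = field(default_factory=dict)  # channel -> "up" or "down"

def physical_interactions(rules: List[Rule]) -> List[Tuple[str, str, str, str]]:
    """(source, target, channel, effect) where source's action changes the channel target's trigger reads."""
    found = []
    for src, dst in permutations(rules, 2):  # ordered pairs of distinct rules
        effect = src.action_effects.get(dst.trigger_channel)
        if effect is not None:
            found.append((src.name, dst.name, dst.trigger_channel, effect))
    return found

# Hypothetical rules; channels are assumed to be already extracted.
rules = [
    Rule("heater_on_when_cold", "temperature", {"temperature": "up"}),
    Rule("open_window_when_hot", "temperature", {"temperature": "down", "humidity": "down"}),
    Rule("lights_on_at_sunset", "illumination", {"illumination": "up"}),
]
for src, dst, channel, effect in physical_interactions(rules):
    print(f"{src} -> {dst} via {channel} ({effect})")
```

Here the heater and window rules can trigger each other through temperature, a loop an attacker could exploit by nudging one of them.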

Submitted 6 June, 2024; originally announced June 2024.

Comments: Accepted by IEEE ICWS 2024 Workshop

arXiv:2406.01422 [pdf, other]

How to Understand Whole Software Repository?

Authors: Yingwei Ma, Qingping Yang, Rongyu Cao, Binhua Li, Fei Huang, Yongbin Li

Abstract: Recently, Large Language Model (LLM) based agents have significantly advanced Automatic Software Engineering (ASE). Although their effectiveness has been verified, existing methods mainly focus on local code information, e.g., issues, classes, and functions, which limits their ability to capture the global context and interdependencies within a software system. Drawing on the practical experience of human SE developers, we argue that an excellent understanding of the whole repository will be the critical path to ASE. However, understanding the whole repository raises various challenges, e.g., extremely long code input, noisy code information, and complex dependency relationships. To this end, we develop a novel ASE method named RepoUnderstander that guides agents to comprehensively understand whole repositories. Specifically, we first condense the critical information of the whole repository into a repository knowledge graph in a top-down manner to reduce the repository's complexity. Subsequently, we give the agents the ability to understand the whole repository by proposing a Monte Carlo tree search based repository exploration strategy. In addition, to better utilize repository-level knowledge, we guide the agents to summarize, analyze, and plan. They can then use tools to dynamically acquire information and generate patches that solve real-world GitHub issues. Extensive experiments demonstrate the superiority and effectiveness of the proposed RepoUnderstander. It achieved an 18.5% relative improvement on the SWE-bench Lite benchmark compared to SWE-agent.
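The abstract names a Monte Carlo tree search based exploration strategy without detailing it. As a generic illustration (not RepoUnderstander's actual procedure), the standard UCT rule, UCB1 applied to trees, decides which child node, say a file or directory in a repository knowledge graph, to explore next by balancing its average reward against how rarely it has been visited. Node names, statistics, and rewards below are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class Node:
    name: str
    visits: int = 0
    total_reward: float = 0.0

def uct_score(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """UCB1 for trees: mean reward + c * sqrt(ln(parent visits) / child visits)."""
    if child.visits == 0:
        return math.inf  # always try unvisited nodes first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore

def select_child(children: List[Node]) -> Node:
    # Parent visit count approximated by the sum of child visits (at least 1).
    parent_visits = max(1, sum(ch.visits for ch in children))
    return max(children, key=lambda ch: uct_score(ch, parent_visits))

def backpropagate(path: List[Node], reward: float) -> None:
    """Update visit counts and accumulated reward along the explored path."""
    for node in path:
        node.visits += 1
        node.total_reward += reward

# Hypothetical repository nodes with statistics from earlier simulations.
children = [Node("models.py", 10, 6.0), Node("utils.py", 2, 1.5), Node("tests/", 0, 0.0)]
first = select_child(children)      # unvisited node wins
backpropagate([first], reward=0.0)  # suppose exploring it was unhelpful
second = select_child(children)
print(first.name, second.name)
```

After the unhelpful visit, the rarely explored but promising `utils.py` outranks the well-explored `models.py`, which is the exploration/exploitation trade-off MCTS is meant to manage.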

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.01014 [pdf, other]

Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration

Authors: Junyang Wang, Haiyang Xu, Haitao Jia, Xi Zhang, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang

Abstract: Mobile device operation tasks are increasingly becoming a popular multi-modal AI application scenario. Current Multi-modal Large Language Models (MLLMs), constrained by their training data, lack the capability to function effectively as operation assistants. Instead, MLLM-based agents, which enhance capabilities through tool invocation, are gradually being applied to this scenario. However, the two major navigation challenges in mobile device operation tasks, task progress navigation and focus content navigation, are significantly complicated under the single-agent architecture of existing work. This is due to overly long token sequences and the interleaved text-image data format, which limit performance. To address these navigation challenges effectively, we propose Mobile-Agent-v2, a multi-agent architecture for mobile device operation assistance. The architecture comprises three agents: a planning agent, a decision agent, and a reflection agent. The planning agent generates task progress, making the navigation of history operations more efficient. To retain focus content, we design a memory unit that updates with task progress. Additionally, to correct erroneous operations, the reflection agent observes the outcome of each operation and handles any mistakes accordingly. Experimental results indicate that Mobile-Agent-v2 achieves over a 30% improvement in task completion compared to the single-agent architecture of Mobile-Agent. The code is open-sourced at https://github.com/X-PLUG/MobileAgent.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 22 pages, 11 figures, 10 Tables

arXiv:2406.00476 [pdf, other]

Revisiting Energy Distribution and Formation Rate of CHIME Fast Radio Bursts

Authors: K. J. Zhang, X. F. Dong, A. E. Rodin, V. A. Fedorova, Y. F. Huang, D. Li, P. Wang, Q. M. Li, C. Du, F. Xu, Z. B. Zhang

Abstract: Using a large sample of fast radio bursts (FRBs) from the first CHIME/FRB catalog, we apply Lynden-Bell's $C^-$ method to study the evolution of their energy function and formation rate with redshift. Using the non-parametric Kendall's $\tau$ statistic, we find that the FRB energy strongly evolves with cosmological redshift as $E(z)\propto(1 + z)^{5.23}$. After removing the redshift dependence, the local energy distribution can be described by a broken power law of the form $\Psi(E_{0})\propto E_{0}^{-0.38}$ for the low-energy segment and $\Psi(E_{0})\propto E_{0}^{-2.01}$ for the high-energy segment, with a dividing line at $\sim2.1\times10^{40}\,\rm erg$. Interestingly, we find that the formation rate of CHIME FRBs also evolves with redshift as $\rho(z)\propto(1+z)^{-4.73\pm0.08}$. The local formation rate $\rho(0)$ of CHIME FRBs is constrained to be about $1.25\times 10^4\rm{\,Gpc^{-3}\,yr^{-1}}$, which is comparable with some previous estimates. In addition, we notice that the formation rate not only exceeds the star formation rate at lower redshifts but also always declines with increasing redshift, which does not match the star formation history at all. Consequently, we suggest that most FRBs could originate from older stellar populations.
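The reported fits can be evaluated directly as simple functions. This sketch uses only the central values quoted in the abstract and ignores their uncertainties. The normalization of $\Psi$ (set to 1 at the break so the two segments join continuously) is an assumption, since the abstract gives only proportionalities; writing $\rho(z)=\rho(0)(1+z)^{-4.73}$ combines the quoted local rate with the quoted scaling.

```python
E_BREAK_ERG = 2.1e40            # dividing energy between the two segments
ALPHA_LOW, ALPHA_HIGH = 0.38, 2.01
ENERGY_EVOLUTION_INDEX = 5.23   # E(z) proportional to (1 + z)^5.23
RATE_EVOLUTION_INDEX = -4.73    # rho(z) proportional to (1 + z)^-4.73 (central value)
RHO_0 = 1.25e4                  # local formation rate, Gpc^-3 yr^-1

def energy_evolution_factor(z: float) -> float:
    """g(z) = (1 + z)^5.23; a burst's de-evolved (local) energy is E0 = E / g(z)."""
    return (1.0 + z) ** ENERGY_EVOLUTION_INDEX

def local_energy_function(e0_erg: float) -> float:
    """Broken power law Psi(E0), normalized (by assumption) to 1 at the break."""
    alpha = ALPHA_LOW if e0_erg <= E_BREAK_ERG else ALPHA_HIGH
    return (e0_erg / E_BREAK_ERG) ** (-alpha)

def formation_rate(z: float) -> float:
    """rho(z) = rho(0) * (1 + z)^-4.73 in Gpc^-3 yr^-1 (central values only)."""
    return RHO_0 * (1.0 + z) ** RATE_EVOLUTION_INDEX

print(round(energy_evolution_factor(1.0), 1))  # about 37.5
print(local_energy_function(2.1e39), local_energy_function(2.1e41))
print(formation_rate(1.0))
```

For example, a burst observed at $z=1$ has its energy divided by roughly 37.5 before it enters the local energy function, and the fitted rate at $z=1$ is more than an order of magnitude below $\rho(0)$, the decline the abstract contrasts with the star formation history.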

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.20495 [pdf, other]

Transfer Q Star: Principled Decoding for LLM Alignment

Authors: Souradip Chakraborty, Soumya Suvra Ghosal, Ming Yin, Dinesh Manocha, Mengdi Wang, Amrit Singh Bedi, Furong Huang

Abstract: Aligning foundation models is essential for their safe and trustworthy deployment. However, traditional fine-tuning methods are computationally intensive and require updating billions of model parameters. A promising alternative, alignment via decoding, adjusts the response distribution directly without model updates to maximize a target reward $r$, thus providing a lightweight and adaptable framework for alignment. However, principled decoding methods rely on oracle access to an optimal Q-function ($Q^*$), which is often unavailable in practice. Hence, prior SoTA methods either approximate this $Q^*$ using $Q^{\pi_{\texttt{sft}}}$ (derived from the reference $\texttt{SFT}$ model) or rely on short-term rewards, resulting in sub-optimal decoding performance. In this work, we propose Transfer $Q^*$, which implicitly estimates the optimal value function for a target reward $r$ through a baseline model $\rho_{\texttt{BL}}$ aligned with a baseline reward $r_{\texttt{BL}}$ (which can be different from the target reward $r$). Theoretical analyses of Transfer $Q^*$ provide a rigorous characterization of its optimality, deriving an upper bound on the sub-optimality gap and identifying a hyperparameter to control the deviation from the pre-trained reference $\texttt{SFT}$ model based on user needs. Our approach significantly reduces the sub-optimality gap observed in prior SoTA methods and demonstrates superior empirical performance across key metrics such as coherence, diversity, and quality in extensive tests on several synthetic and real datasets.
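Principled decoding of this kind typically targets the KL-regularized optimum, in which the reference policy's next-token probabilities are reweighted by exponentiated Q-values, $\pi^*(a\mid s)\propto\pi_{\texttt{SFT}}(a\mid s)\exp(Q^*(s,a)/\alpha)$, with $\alpha$ controlling how far decoding may drift from the SFT model. The sketch applies that rule to a toy three-token vocabulary. The Q-values are supplied as made-up numbers, whereas Transfer $Q^*$ obtains its estimates implicitly from a baseline model, which is not reproduced here; treat this as a generic illustration of value-guided decoding.

```python
import math
from typing import Dict

def guided_distribution(ref_probs: Dict[str, float], q_values: Dict[str, float], alpha: float) -> Dict[str, float]:
    """pi(a) proportional to pi_ref(a) * exp(Q(a) / alpha), normalized over the candidate tokens."""
    logits = {a: math.log(p) + q_values[a] / alpha for a, p in ref_probs.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    weights = {a: math.exp(v - m) for a, v in logits.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

ref = {"sure": 0.6, "maybe": 0.3, "no": 0.1}  # toy SFT next-token probabilities
q = {"sure": 0.0, "maybe": 1.0, "no": -1.0}   # toy Q estimates for the target reward

for alpha in (100.0, 1.0, 0.1):
    dist = guided_distribution(ref, q, alpha)
    print(alpha, max(dist, key=dist.get), {a: round(p, 3) for a, p in dist.items()})
```

A large $\alpha$ leaves the SFT distribution almost unchanged ("sure" stays most likely), while a small $\alpha$ lets the value estimates dominate ("maybe" wins), which is the deviation-control role the abstract attributes to its hyperparameter.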

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19856 [pdf, other]

DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories

Authors: Jia Li, Ge Li, Yunfei Zhao, Yongmin Li, Huanyu Liu, Hao Zhu, Lecheng Wang, Kaibo Liu, Zheng Fang, Lanshen Wang, Jiazheng Ding, Xuanming Zhang, Yuqi Zhu, Yihong Dong, Zhi Jin, Binhua Li, Fei Huang, Yongbin Li

Abstract: How to evaluate the coding abilities of Large Language Models (LLMs) remains an open question. We find that existing benchmarks are poorly aligned with real-world code repositories and are insufficient to evaluate the coding abilities of LLMs. To address the knowledge gap, we propose a new benchmark named DevEval, which has three advances. (1) DevEval aligns with real-world repositories in multiple dimensions, e.g., code distributions and dependency distributions. (2) DevEval is annotated by 13 developers and contains comprehensive annotations (e.g., requirements, original repositories, reference code, and reference dependencies). (3) DevEval comprises 1,874 testing samples from 117 repositories, covering 10 popular domains (e.g., Internet, Database). Based on DevEval, we propose repository-level code generation and evaluate 8 popular LLMs on DevEval (e.g., gpt-4, gpt-3.5, StarCoder 2, DeepSeek Coder, CodeLLaMa). Our experiments reveal these LLMs' coding abilities in real-world code repositories. For example, in our experiments, the highest Pass@1 of gpt-4-turbo is only 53.04%. We also analyze LLMs' failed cases and summarize their shortcomings. We hope DevEval can facilitate the development of LLMs in real code repositories. DevEval, prompts, and LLMs' predictions have been released.

Submitted 30 May, 2024; originally announced May 2024.

Comments: Accepted by the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). arXiv admin note: substantial text overlap with arXiv:2404.00599, arXiv:2401.06401
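Pass@1 figures such as the 53.04% quoted in the DevEval abstract above are usually computed with the unbiased pass@k estimator introduced alongside the HumanEval benchmark: with $n$ generated samples for a problem, $c$ of which pass the tests, $\text{pass@}k = 1 - \binom{n-c}{k} / \binom{n}{k}$, averaged over problems. The abstract does not state how many samples DevEval draws per problem, so the sketch below assumes this standard convention rather than describing DevEval's exact harness.

```python
from math import comb
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples passes).

    n: samples generated for one problem; c: how many passed the tests.
    For k = 1 this reduces to c / n.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Sequence[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems given (n_samples, n_correct) pairs."""
    if not results:
        raise ValueError("no problems given")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, `mean_pass_at_k([(10, 3), (10, 0), (10, 10)], 1)` averages three hypothetical problems with 3, 0 and 10 passing samples out of 10. Using the combinatorial form instead of simply sampling $k$ outputs keeps the estimate unbiased for any $k \le n$.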

arXiv:2405.17931 [pdf, other]

Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment

Authors: Keming Lu, Bowen Yu, Fei Huang, Yang Fan, Runji Lin, Chang Zhou

Abstract: Effectively aligning Large Language Models (LLMs) with human-centric values while preventing the degradation of abilities acquired through Pre-training and Supervised Fine-tuning (SFT) poses a central challenge in Reinforcement Learning from Human Feedback (RLHF). In this paper, we first discover that interpolating RLHF and SFT model parameters can adjust the trade-off between human preference and basic capabilities, thereby reducing the alignment tax at the cost of alignment reward. Inspired by this, we propose integrating the RL policy and SFT models at each optimization step in RLHF to continuously regulate the training direction, introducing the Online Merging Optimizer. Specifically, we merge gradients with the parameter differences between SFT and pretrained models, effectively steering the gradient towards maximizing rewards in the direction of SFT optimization. We demonstrate that our optimizer works well with different LLM families, such as Qwen and LLaMA, across various model sizes ranging from 1.8B to 8B, various RLHF algorithms like DPO and KTO, and existing model merging methods. It significantly enhances alignment reward while mitigating alignment tax, achieving higher overall performance across 14 benchmarks.

Submitted 28 May, 2024; originally announced May 2024.
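The interpolation observation that motivates the Online Merging Optimizer above can be written as $θ(λ) = (1-λ)\,θ_{\text{SFT}} + λ\,θ_{\text{RLHF}}$, applied tensor by tensor. The sketch below covers only this offline interpolation; the optimizer itself, which merges gradients with the SFT-minus-pretrained parameter difference at every RLHF step, is not reproduced. Parameters are modelled as plain Python lists keyed by hypothetical tensor names so the example stays dependency-free.

```python
from typing import Dict, List

# Hypothetical representation: tensor name -> flattened weights.
Params = Dict[str, List[float]]


def interpolate(theta_sft: Params, theta_rlhf: Params, lam: float) -> Params:
    """Return (1 - lam) * theta_sft + lam * theta_rlhf for every tensor.

    lam = 0 gives the SFT model and lam = 1 the RLHF model; values in
    between trade alignment reward against retained base capabilities.
    """
    if theta_sft.keys() != theta_rlhf.keys():
        raise ValueError("models must have the same parameter names")
    merged: Params = {}
    for name, w_sft in theta_sft.items():
        w_rlhf = theta_rlhf[name]
        if len(w_sft) != len(w_rlhf):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [(1.0 - lam) * a + lam * b for a, b in zip(w_sft, w_rlhf)]
    return merged
```

In practice one would sweep `lam` over a grid such as 0, 0.25, 0.5, 0.75, 1 and evaluate each merged model on both reward and capability benchmarks (evaluation not shown) to trace the trade-off curve.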

arXiv:2405.17535 [pdf, other]

Calibrated Dataset Condensation for Faster Hyperparameter Search

Authors: Mucong Ding, Yuancheng Xu, Tahseen Rabbani, Xiaoyu Liu, Brian Gravelle, Teresa Ranadive, Tai-Ching Tuan, Furong Huang

Abstract: Dataset condensation can be used to reduce the computational cost of training multiple models on a large dataset by condensing the training dataset into a small synthetic set. State-of-the-art approaches rely on matching the model gradients between the real and synthetic data. However, there is no theoretical guarantee of the generalizability of the condensed data: data condensation often generalizes poorly across hyperparameters/architectures in practice. This paper considers a different condensation objective specifically geared toward hyperparameter search. We aim to generate a synthetic validation dataset so that the validation-performance rankings of the models, with different hyperparameters, on the condensed and original datasets are comparable. We propose a novel hyperparameter-calibrated dataset condensation (HCDC) algorithm, which obtains the synthetic validation dataset by matching the hyperparameter gradients computed via implicit differentiation and efficient inverse Hessian approximation. Experiments demonstrate that the proposed framework effectively maintains the validation-performance rankings of models and speeds up hyperparameter/architecture search for tasks on both images and graphs.

Submitted 27 May, 2024; originally announced May 2024.
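HCDC's stated goal is that candidate models ranked by validation performance on the condensed set come out in the same order as on the original set. One simple way to check that property, assumed here and not necessarily the metric used in the paper, is Spearman's rank correlation between the two lists of validation scores; a value of 1 means the rankings agree exactly.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            r[order[t]] = avg
        i = j + 1
    return r


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0.0 or vy == 0.0:
        raise ValueError("rank correlation undefined for constant input")
    return cov / (vx * vy) ** 0.5
```

Usage: pass the (hypothetical) validation accuracies of each hyperparameter configuration measured on the original validation set as `x` and on the condensed set as `y`. Rank correlation is used rather than raw score agreement because hyperparameter search only needs the ordering to be preserved.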

arXiv:2405.17404 [pdf, other]

Spectral Greedy Coresets for Graph Neural Networks

Authors: Mucong Ding, Yinhan He, Jundong Li, Furong Huang

Abstract: The ubiquity of large-scale graphs in node-classification tasks significantly hinders the real-world applications of Graph Neural Networks (GNNs). Node sampling, graph coarsening, and dataset condensation are effective strategies for enhancing data efficiency. However, owing to the interdependence of graph nodes, coreset selection, which selects subsets of the data examples, has not been successfully applied to speed up GNN training on large graphs, warranting special treatment. This paper studies graph coresets for GNNs and avoids the interdependence issue by selecting ego-graphs (i.e., neighborhood subgraphs around a node) based on their spectral embeddings. We decompose the coreset selection problem for GNNs into two phases: a coarse selection of widely spread ego graphs and a refined selection to diversify their topologies. We design a greedy algorithm that approximately optimizes both objectives. Our spectral greedy graph coreset (SGGC) scales to graphs with millions of nodes, obviates the need for model pre-training, and applies to low-homophily graphs. Extensive experiments on ten datasets demonstrate that SGGC outperforms other coreset methods by a wide margin, generalizes well across GNN architectures, and is much faster than graph condensation.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16863 [pdf]

All-voltage control of Giant Magnetoresistance

Authors: Lujun Wei, Yiyang Zhang, Fei Huang, Jiajv Yang, Jincheng Peng, Yanghui Li, Yu Lu, Jiarui Chen, Tianyu Liu, Yong Pu, Jun Du

Abstract: The aim of voltage control of magnetism is to reduce the power consumption of spintronic devices. For a spin valve, the magnetization directions of two ferromagnetic layers determine the giant magnetoresistance magnitude. However, achieving all-voltage manipulation of the magnetization directions between parallel and antiparallel states is a significant challenge. Here, we demonstrate that by utilizing two exchange-biased Co/IrMn bilayers with opposite pinning directions and with ferromagnetic coupling through the Ruderman-Kittel-Kasuya-Yosida interaction between two Co layers, the magnetization directions of the two ferromagnetic layers of a spin valve can be switched between parallel and antiparallel states through all-voltage-induced strain control. The all-voltage controlled giant magnetoresistance is repeatable and nonvolatile. Simulations based on the Landau-Lifshitz-Gilbert equation reveal that, under applied voltages, the magnetizations in the two Co layers rotate in opposite directions when switching from the antiparallel to the parallel state. This work can provide a valuable reference for the development of low-power all-voltage-controlled spintronic devices.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.15973 [pdf, other]

Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement

Authors: Xiyao Wang, Jiuhai Chen, Zhaoyang Wang, Yuhang Zhou, Yiyang Zhou, Huaxiu Yao, Tianyi Zhou, Tom Goldstein, Parminder Bhatia, Furong Huang, Cao Xiao

Abstract: Large vision-language models (LVLMs) have achieved impressive results in various visual question-answering and reasoning tasks through vision instruction tuning on specific datasets. However, there is still significant room for improvement in the alignment between visual and language modalities. Previous methods to enhance this alignment typically require external models or data, heavily depending on their capabilities and quality, which inevitably sets an upper bound on performance. In this paper, we propose SIMA, a framework that enhances visual and language modality alignment through self-improvement, eliminating the need for external models or data. SIMA leverages prompts from existing vision instruction tuning datasets to self-generate responses and employs an in-context self-critic mechanism to select response pairs for preference tuning. The key innovation is the introduction of three vision metrics during the in-context self-critic process, which can guide the LVLM in selecting responses that enhance image comprehension. Through experiments across 14 hallucination and comprehensive benchmarks, we demonstrate that SIMA not only improves model performance across all benchmarks but also achieves superior modality alignment, outperforming previous approaches.

Submitted 7 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: 15 pages, 8 figures

arXiv:2405.14768 [pdf, other]

WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models

Authors: Peng Wang, Zexi Li, Ningyu Zhang, Ziwen Xu, Yunzhi Yao, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen

Abstract: Large language models (LLMs) need knowledge updates to keep pace with ever-growing world facts and to correct hallucinated responses, which motivates methods for lifelong model editing. Where the updated knowledge resides in memory is a fundamental question for model editing. In this paper, we find that editing either long-term memory (direct model parameters) or working memory (non-parametric knowledge of neural network activations/representations by retrieval) results in an impossible triangle: reliability, generalization, and locality cannot be realized together in the lifelong editing setting. For long-term memory, directly editing the parameters causes conflicts with irrelevant pretrained knowledge or previous edits (poor reliability and locality). For working memory, retrieval-based activations can hardly make the model understand the edits and generalize (poor generalization). Therefore, we propose WISE to bridge the gap between memories. In WISE, we design a dual parametric memory scheme, which consists of the main memory for the pretrained knowledge and a side memory for the edited knowledge. We only edit the knowledge in the side memory and train a router to decide which memory to go through when given a query. For continual editing, we devise a knowledge-sharding mechanism where different sets of edits reside in distinct subspaces of parameters, and are subsequently merged into a shared memory without conflicts.
Extensive experiments show that WISE can outperform previous model editing methods and overcome the impossible triangle under lifelong model editing of question answering, hallucination, and out-of-distribution settings across trending LLM architectures, e.g., GPT, LLaMA, and Mistral. Code will be released at https://github.com/zjunlp/EasyEdit.

Submitted 23 May, 2024; originally announced May 2024.

Comments: Work in progress

arXiv:2405.14431 [pdf, other]

RaFe: Ranking Feedback Improves Query Rewriting for RAG

Authors: Shengyu Mao, Yong Jiang, Boli Chen, Xiao Li, Peng Wang, Xinyu Wang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang

Abstract: As Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) techniques have evolved, query rewriting has been widely incorporated into the RAG system for downstream tasks like open-domain QA. Many works have attempted to utilize small models with reinforcement learning rather than costly LLMs to improve query rewriting. However, current methods require annotations (e.g., labeled relevant documents or downstream answers) or predesigned rewards for feedback, which lack generalization and fail to utilize signals tailored for query rewriting. In this paper, we propose RaFe, a framework for training query rewriting models free of annotations. By leveraging a publicly available reranker, RaFe provides feedback that is well aligned with the rewriting objectives. Experimental results demonstrate that RaFe obtains better performance than baselines.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 16 pages

arXiv:2405.14205 [pdf, other]

Agent Planning with World Knowledge Model

Authors: Shuofei Qiao, Runnan Fang, Ningyu Zhang, Yuqi Zhu, Xiang Chen, Shumin Deng, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen

Abstract: Recent endeavors towards directly using large language models (LLMs) as agent models to execute interactive planning tasks have shown commendable results. Despite their achievements, however, they still struggle with brainless trial-and-error in global planning and with hallucinatory actions in local planning due to their poor understanding of the "real" physical world. Imitating humans' mental world knowledge model, which provides global prior knowledge before the task and maintains local dynamic knowledge during the task, in this paper we introduce a parametric World Knowledge Model (WKM) to facilitate agent planning. Concretely, we steer the agent model to self-synthesize knowledge from both expert and sampled trajectories. Then we develop WKM, providing prior task knowledge to guide the global planning and dynamic state knowledge to assist the local planning. Experimental results on three complex real-world simulated datasets with three state-of-the-art open-source LLMs, Mistral-7B, Gemma-7B, and Llama-3-8B, demonstrate that our method can achieve superior performance compared to various strong baselines. Furthermore, our analysis shows that WKM can effectively alleviate the blind trial-and-error and hallucinatory action issues, providing strong support for the agent's understanding of the world. Other interesting findings include: 1) our instance-level task knowledge can generalize better to unseen tasks, 2) weak WKM can guide strong agent model planning, and 3) unified WKM training has promising potential for further development.
Code will be available at https://github.com/zjunlp/WKM.

Submitted 23 May, 2024; originally announced May 2024.

Comments: Work in progress

arXiv:2405.13879 [pdf, other]

FACT or Fiction: Can Truthful Mechanisms Eliminate Federated Free Riding?

Authors: Marco Bornstein, Amrit Singh Bedi, Abdirisak Mohamed, Furong Huang

Abstract: Standard federated learning (FL) approaches are vulnerable to the free-rider dilemma: participating agents can contribute little to nothing yet receive a well-trained aggregated model. While prior mechanisms attempt to solve the free-rider dilemma, none have addressed the issue of truthfulness. In practice, adversarial agents can provide false information to the server in order to cheat their way out of contributing to federated training. In an effort to make free-riding-averse federated mechanisms truthful, and consequently less prone to breaking down in practice, we propose FACT. FACT is the first federated mechanism that: (1) eliminates federated free riding by using a penalty system, (2) ensures agents provide truthful information by creating a competitive environment, and (3) encourages agent participation by offering better performance than training alone. Empirically, FACT avoids free-riding when agents are untruthful, and reduces agent loss by over 4x.

Submitted 22 May, 2024; originally announced May 2024.

Comments: 18 pages, 5 figures

arXiv:2405.13045 [pdf, other]

CoLay: Controllable Layout Generation through Multi-conditional Latent Diffusion

Authors: Chin-Yi Cheng, Ruiqi Gao, Forrest Huang, Yang Li

Abstract: Layout design generation has recently gained significant attention due to its potential applications in various fields, including UI, graphic, and floor plan design. However, existing models face two main challenges that limit their adoption in practice. First, the limited expressiveness of individual condition types used in previous works restricts designers' ability to convey complex design intentions and constraints. Second, most existing models focus on generating labels and coordinates, while real layouts contain a range of style properties. To address these limitations, we propose a novel framework, CoLay, that integrates multiple condition types and generates complex layouts with diverse style properties. Our approach outperforms prior works in terms of generation quality and condition satisfaction while empowering users to express their design intents using a flexible combination of modalities, including natural language prompts, layout guidelines, element types, and partially completed designs.

Submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.13026 [pdf, other]

Leveraging Human Revisions for Improving Text-to-Layout Models

Authors: Amber Xie, Chin-Yi Cheng, Forrest Huang, Yang Li

Abstract: Learning from human feedback has shown success in aligning large, pretrained models with human values. Prior works have mostly focused on learning from high-level labels, such as preferences between pairs of model outputs. On the other hand, many domains could benefit from more involved, detailed feedback, such as revisions, explanations, and reasoning of human users. Our work proposes using nuanced feedback in the form of human revisions for stronger alignment. In this paper, we ask expert designers to fix layouts generated from a generative layout model that is pretrained on a large-scale dataset of mobile screens. Then, we train a reward model based on how human designers revise these generated layouts. With the learned reward model, we optimize our model with reinforcement learning from human feedback (RLHF). Our method, Revision-Aware Reward Models, allows a generative text-to-layout model to produce more modern, designer-aligned layouts, showing the potential for utilizing human revisions and stronger forms of feedback in improving generative models.

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.11431 [pdf, other]

Review of deep learning models for crypto price prediction: implementation and evaluation

Authors: Jingyang Wu, Xinyi Zhang, Fangyixuan Huang, Haochen Zhou, Rohtiash Chandra

Abstract: There has been much interest in accurate cryptocurrency price forecast models by investors and researchers. Deep Learning models are prominent machine learning techniques that have transformed various fields and have shown potential for finance and economics. Although various deep learning models have been explored for cryptocurrency price forecasting, it is not clear which models are suitable due to high market volatility. In this study, we review the literature about deep learning for cryptocurrency price forecasting and evaluate novel deep learning models for cryptocurrency price prediction. Our deep learning models include variants of long short-term memory (LSTM) recurrent neural networks, variants of convolutional neural networks (CNNs), and the Transformer model. We evaluate univariate and multivariate approaches for multi-step-ahead prediction of cryptocurrency close prices. We also carry out volatility analysis on the four cryptocurrencies, which reveals significant fluctuations in their prices throughout the COVID-19 pandemic. Additionally, we investigate the prediction accuracy of two scenarios identified by different training sets for the models. First, we use the pre-COVID-19 datasets to model cryptocurrency close-price forecasting during the early period of COVID-19. Secondly, we utilise data from the COVID-19 period to predict prices for 2023 to 2024. Our results show that the convolutional LSTM with a multivariate approach provides the best prediction accuracy in two major experimental settings.
Our results also indicate that the multivariate deep learning models exhibit better performance in forecasting four different cryptocurrencies when compared to the univariate models.

Submitted 2 June, 2024; v1 submitted 18 May, 2024; originally announced May 2024.
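The univariate versus multivariate, multi-step-ahead setup described in the cryptocurrency abstract above usually starts by slicing a price series into fixed-length input windows paired with the next few close prices. The sketch below is a generic, assumed data-preparation step for direct multi-step forecasting, not the authors' pipeline; the window lengths and the choice of feature columns are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[Sequence[float]],
    n_in: int,
    n_out: int,
    target_col: int = 0,
) -> Tuple[List[List[List[float]]], List[List[float]]]:
    """Build (input, target) pairs for direct multi-step-ahead forecasting.

    series: time-ordered rows of features, e.g. [close] for a univariate
    model or [close, open, high, low, volume] for a multivariate one.
    Each input is n_in consecutive rows; each target holds the next n_out
    values of column target_col (for example, the close price).
    """
    if n_in < 1 or n_out < 1:
        raise ValueError("n_in and n_out must be positive")
    X: List[List[List[float]]] = []
    y: List[List[float]] = []
    for start in range(len(series) - n_in - n_out + 1):
        X.append([list(row) for row in series[start:start + n_in]])
        end = start + n_in
        y.append([series[t][target_col] for t in range(end, end + n_out)])
    return X, y
```

A univariate model would pass rows like `[close]`, a multivariate one rows with several features; only the inputs change, while the target stays the close price. Splitting the series chronologically before windowing (as in the pre-COVID versus COVID-period scenarios) avoids leaking future prices into the training windows.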

arXiv:2405.07426 [pdf]

Multiple Bound States in the Continuum: Towards Intense Terahertz Matter Interaction

Authors: Quanlong Yang, Zhibo Yao, Lei Xu, Yapeng Dou, Lingli Ba, Fan Huang, Quan Xu, Longqing Cong, Jianqiang Gu, Junliang Yang, Mohsen Rahmani, Jiaguang Han, Ilya Shadrivov

Abstract: Bound states in the continuum (BICs) are an excellent platform enabling highly efficient light-matter interaction in applications for lasing, nonlinear generation, and sensing. However, the current focus in implementing BICs has primarily been on single sharp resonances, limiting the extent of electric field enhancement for multiple resonances. In this study, we experimentally demonstrate how metasurfaces can control symmetry-broken and Friedrich-Wintgen BICs by leveraging the asymmetry of split resonant rings. This approach allows for multiple freely controllable BIC resonances and tailored enhancement of light-matter interactions. We further validated the effectiveness and performance of our approach by identifying the distinct fingerprint of α-lactose with high sensitivity using only a single metasurface. These findings present a novel and efficient platform for the development of miniaturized and chip-scale photonics devices with intense light-matter interaction.

Submitted 12 May, 2024; originally announced May 2024.

arXiv:2405.07230 [pdf, other]

Acoustic Positioning for Deep Sea Neutrino Telescopes with a System of Piezo Sensors Integrated into Glass Spheres

Authors: A. Albert, S. Alves, M. André, M. Ardid, S. Ardid, J. -J. Aubert, J. Aublin, B. Baret, S. Basa, Y. Becherini, B. Belhorma, M. Bendahman, F. Benfenati, V. Bertin, S. Biagi, J. Boumaaza, M. Bouta, M. C. Bouwhuis, H. Brânzaş, R. Bruijn, J. Brunner, J. Busto, B. Caiffi, D. Calvo, S. Campion, et al. (115 additional authors not shown)

Abstract: Position calibration in the deep sea is typically done by means of acoustic multilateration using three or more acoustic emitters installed at known positions. Rather than using hydrophones as receivers that are exposed to the ambient pressure, the sound signals can be coupled to piezo ceramics glued to the inside of existing containers for electronics or measuring instruments of a deep sea infrastructure. The ANTARES neutrino telescope operated from 2006 until 2022 in the Mediterranean Sea at a depth exceeding 2000 m. It comprised nearly 900 glass spheres with 432 mm diameter and 15 mm thickness, equipped with photomultiplier tubes to detect Cherenkov light from tracks of charged elementary particles. In an experimental setup within ANTARES, piezo sensors have been glued to the inside of such (otherwise empty) glass spheres. These sensors recorded signals from acoustic emitters with frequencies from 46,545 to 60,235 Hz. Two waves propagating through the glass sphere are found to be excited by the waves in the water. These can be qualitatively associated with symmetric and asymmetric Lamb-like waves of zeroth order: a fast (early) one with $v_e \approx 5$ mm/μs and a slow (late) one with $v_\ell \approx 2$ mm/μs. Taking these findings into account improves the accuracy of the position calibration.
The results can be transferred to the KM3NeT neutrino telescope, currently under construction at multiple sites in the Mediterranean Sea, for which the concept of piezo sensors glued to the inside of glass spheres has been adapted for monitoring the positions of the photomultiplier tubes.

Submitted 12 May, 2024; originally announced May 2024.

Comments: submitted to "Experimental Astronomy"
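The acoustic multilateration mentioned at the start of the ANTARES abstract above can be sketched as follows. Assumptions, none taken from the paper: exactly four non-coplanar emitters (with only three, the sphere intersection generally leaves a mirror ambiguity), one-way travel times converted to distances with a nominal sound speed of 1500 m/s, and no noise handling. Subtracting the first sphere equation $|x - p_0|^2 = d_0^2$ from the others gives the linear system $2\,(p_i - p_0)\cdot x = |p_i|^2 - |p_0|^2 - d_i^2 + d_0^2$, solved here with Cramer's rule.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def multilaterate(emitters: Sequence[Vec3], distances: Sequence[float]) -> Vec3:
    """Receiver position from distances to four non-coplanar emitters."""
    if len(emitters) != 4 or len(distances) != 4:
        raise ValueError("this sketch expects exactly four emitters")
    p0, d0 = emitters[0], distances[0]
    sq0 = sum(c * c for c in p0)
    A: List[List[float]] = []
    b: List[float] = []
    for p, d in zip(emitters[1:], distances[1:]):
        # Linearized row: 2 (p_i - p_0) . x = |p_i|^2 - |p_0|^2 - d_i^2 + d_0^2
        A.append([2.0 * (p[k] - p0[k]) for k in range(3)])
        b.append(sum(c * c for c in p) - sq0 - d * d + d0 * d0)
    det = _det3(A)
    if abs(det) < 1e-12:
        raise ValueError("emitters are (nearly) coplanar")
    sol = []
    for col in range(3):
        # Cramer's rule: replace column `col` with the right-hand side.
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = b[r]
        sol.append(_det3(m) / det)
    return (sol[0], sol[1], sol[2])


def times_to_distances(times_s: Sequence[float], sound_speed: float = 1500.0) -> List[float]:
    """One-way travel times (s) to distances (m); the sound speed is an assumed nominal value."""
    return [t * sound_speed for t in times_s]
```

The paper's point is that arrival times picked up by a piezo sensor inside a glass sphere include propagation through the glass itself, at roughly 5 mm/μs and 2 mm/μs for the two observed waves, so the raw times need correcting before a step like this; that correction is not modelled here. Real systems also use more emitters and a least-squares fit to average out timing noise.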

arXiv:2405.05497 [pdf, other]

Multi-Level Feature Fusion Network for Lightweight Stereo Image Super-Resolution

Authors: Yunxiang Li, Wenbin Zou, Qiaomu Wei, Feng Huang, Jing Wu

Abstract: Stereo image super-resolution utilizes the cross-view complementary information brought by the disparity effect of left and right perspective images to reconstruct higher-quality images. Many methods focus on cascading feature extraction modules and cross-view feature interaction modules to exploit the information in stereo images. However, this adds a great deal of network parameters and structural redundancy. To facilitate the application of stereo image super-resolution in downstream tasks, we propose an efficient Multi-Level Feature Fusion Network for Lightweight Stereo Image Super-Resolution (MFFSSR). Specifically, MFFSSR utilizes the Hybrid Attention Feature Extraction Block (HAFEB) to extract multi-level intra-view features. Using the channel separation strategy, HAFEB can efficiently interact with the embedded cross-view interaction module. This structural configuration can efficiently mine features inside the view while improving the efficiency of cross-view information sharing, so that image details and textures are reconstructed more accurately. Extensive experiments demonstrate the effectiveness of MFFSSR. We achieve superior performance with fewer parameters. The source code is available at https://github.com/KarosLYX/MFFSSR.

Submitted 8 May, 2024; originally announced May 2024.

Comments: 10 pages, 7 figures, CVPR Workshop NTIRE 2024
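The abstract does not spell out how HAFEB's channel separation feeds the cross-view interaction module. As a rough illustration only, the NumPy sketch below assumes a common stereo super-resolution pattern: rectified left/right feature maps, attention restricted to the same image row (where horizontal disparity lives), and only a fraction of the channels taking part in the cross-view exchange. The function names and the 50% split ratio are hypothetical and are not taken from MFFSSR or its repository.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along one axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def row_cross_attention(feat_q, feat_kv):
    """Each pixel of feat_q attends to every pixel in the same row of feat_kv.

    Both inputs have shape (C, H, W). For a rectified stereo pair the
    disparity is purely horizontal, so attention is limited to rows.
    """
    c = feat_q.shape[0]
    q = feat_q.transpose(1, 2, 0)                      # (H, W, C)
    kv = feat_kv.transpose(1, 2, 0)                    # (H, W, C)
    scores = q @ kv.transpose(0, 2, 1) / np.sqrt(c)    # (H, W, W)
    out = softmax(scores, axis=-1) @ kv                # (H, W, C); keys double as values
    return out.transpose(2, 0, 1)                      # back to (C, H, W)


def split_and_interact(left, right, ratio=0.5):
    """Hypothetical channel separation: the first `ratio` of the channels
    exchange information across views; the rest stay intra-view."""
    k = int(left.shape[0] * ratio)
    l_out = left[:k] + row_cross_attention(left[:k], right[:k])   # residual fusion
    r_out = right[:k] + row_cross_attention(right[:k], left[:k])
    return (np.concatenate([l_out, left[k:]], axis=0),
            np.concatenate([r_out, right[k:]], axis=0))


rng = np.random.default_rng(0)
left = rng.standard_normal((8, 4, 6))    # (channels, height, width)
right = rng.standard_normal((8, 4, 6))
new_left, new_right = split_and_interact(left, right)
print(new_left.shape)  # (8, 4, 6)
```

Routing only part of the channels through the cross-view branch is one plausible way a channel split reduces cost, since the row attention scales with the number of channels involved; whether MFFSSR does exactly this should be checked against the paper.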

arXiv:2405.05094 [pdf, other]

Mass function of stellar black holes as revealed by the LIGO-Virgo-KAGRA observations

Authors: Xiao-Fei Dong, Yong-Feng Huang, Zhi-Bin Zhang, Xiu-Juan Li, Ze-Cheng Zou, Chen-Ran Hu, Chen Deng, Yang Liu

Abstract: Ninety gravitational wave events have been detected by the LIGO-Virgo-KAGRA network and released in the Gravitational-Wave Transient Catalog. Among these events, 83 are definitely binary black hole mergers, since the masses of all the objects involved significantly exceed the upper mass limit of neutron stars. The black holes in these merger events naturally form two interesting samples: a pre-merger sample that includes all the black holes before the mergers, and a post-merger sample that consists of the black holes generated during the merging processes. The former represents black holes that once existed in the Universe, while the latter represents newly born black holes. Here we present a statistical analysis of these two samples. The non-parametric $\tau$ statistic method is adopted to correct for the observational selection effect. Lynden-Bell's $C^{-}$ method is further applied to derive the mass distribution and density function of black holes. It is found that the mass distribution can be expressed as a broken power-law function. More interestingly, the power-law index in the high-mass region is comparable for the two samples. The number density of black holes is found to depend on redshift as $\rho(z) \propto z^{-2.06}$ to $\rho(z) \propto z^{-2.12}$, based on the two samples. Implications of these findings for the origin of black holes are discussed.

Submitted 8 May, 2024; originally announced May 2024.

Comments: 14 pages, 5 figures, 1 table

MSC Class: 85-08; 62L10; 62G09; 62E10

ACM Class: F.2.1
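To make the two fitted forms concrete, the sketch below evaluates a continuous broken power law for the mass distribution and the redshift scaling $\rho(z) \propto z^{-\gamma}$. The break mass and slopes in the usage lines are placeholder values, not the fits reported in the paper; only the exponent range $\gamma \approx 2.06$ to $2.12$ comes from the abstract. The sketch does not implement the $\tau$ statistic or the $C^{-}$ method themselves.

```python
def broken_power_law(m, m_break, alpha1, alpha2, norm=1.0):
    """Continuous broken power law: slope -alpha1 below m_break, -alpha2 above.

    The upper segment is rescaled by m_break**(alpha2 - alpha1) so that the
    two pieces meet at m_break.
    """
    if m <= 0:
        raise ValueError("mass must be positive")
    if m < m_break:
        return norm * m ** (-alpha1)
    return norm * m_break ** (alpha2 - alpha1) * m ** (-alpha2)


def density_ratio(z1, z2, gamma):
    """rho(z1) / rho(z2) for rho(z) proportional to z**(-gamma)."""
    return (z1 / z2) ** (-gamma)


# Placeholder parameters (illustrative only, not the paper's fit).
for m in (10.0, 35.0, 70.0):
    print(m, broken_power_law(m, m_break=35.0, alpha1=1.0, alpha2=3.0))

# How much the density drops from z = 1 to z = 2 for the quoted exponents.
for gamma in (2.06, 2.12):
    print(gamma, round(density_ratio(2.0, 1.0, gamma), 3))
```

Normalizing the upper segment at the break keeps the distribution continuous, so only the two slopes and the break mass need to be fitted.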

arXiv:2405.03296 [pdf, ps, other]

Coefficient Decomposition for Spectral Graph Convolution

Authors: Feng Huang, Wen Zhang

Abstract: A spectral graph convolutional network (SGCN) is a kind of graph neural network (GNN) based on graph signal filters, and has shown compelling expressivity for modeling graph-structured data. Most SGCNs adopt polynomial filters and learn the coefficients from the training data. Many of them focus on which polynomial basis leads to optimal expressive power, while model architecture receives little discussion. In this paper, we propose a general form of spectral graph convolution in which the coefficients of the polynomial basis are stored in a third-order tensor. We then show that the convolution block in existing SGCNs can be derived by performing a particular coefficient decomposition on this coefficient tensor. Based on this generalized view, we develop the novel spectral graph convolutions CoDeSGC-CP and CoDeSGC-Tucker by applying CP and Tucker tensor decompositions to the coefficient tensor. Extensive experimental results demonstrate that the proposed convolutions achieve favorable performance improvements.

Submitted 6 May, 2024; originally announced May 2024.
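The central idea above, a third-order coefficient tensor indexed by polynomial order, input channel, and output channel that can be factorized, can be illustrated with a small NumPy sketch. It assumes a monomial basis $L^k$ of a symmetric normalized Laplacian and a tensor layout of shape (K+1, d_in, d_out); both are assumptions, and the function names are hypothetical rather than taken from CoDeSGC. The final line checks that the CP-factored computation matches the dense one without ever materializing the full tensor.

```python
import numpy as np


def normalized_laplacian(adj):
    # L = I - D^{-1/2} A D^{-1/2}; isolated nodes get degree 1 to avoid division by zero.
    deg = np.maximum(adj.sum(axis=1), 1.0)
    d = 1.0 / np.sqrt(deg)
    return np.eye(adj.shape[0]) - d[:, None] * adj * d[None, :]


def poly_basis(L, X, K):
    """Stack [X, LX, L^2 X, ..., L^K X] (monomial basis, assumed here)."""
    terms = [X]
    for _ in range(K):
        terms.append(L @ terms[-1])
    return np.stack(terms)                      # (K+1, n, d_in)


def dense_conv(L, X, W):
    """Z = sum_k L^k X W[k] with a full coefficient tensor W of shape (K+1, d_in, d_out)."""
    P = poly_basis(L, X, W.shape[0] - 1)
    return np.einsum('kni,kio->no', P, W)


def cp_conv(L, X, A, B, C):
    """Same convolution with W[k,i,o] = sum_r A[k,r] B[i,r] C[o,r] (CP form)."""
    P = poly_basis(L, X, A.shape[0] - 1)        # (K+1, n, d_in)
    H = np.einsum('kni,ir->knr', P, B)          # project input channels onto rank R
    H = np.einsum('knr,kr->nr', H, A)           # mix polynomial orders per rank component
    return H @ C.T                              # (n, d_out)


rng = np.random.default_rng(0)
n, d_in, d_out, K, R = 6, 4, 3, 3, 2
adj = np.triu((rng.random((n, n)) < 0.5).astype(float), 1)
adj = adj + adj.T                               # undirected graph
L = normalized_laplacian(adj)
X = rng.standard_normal((n, d_in))
A = rng.standard_normal((K + 1, R))
B = rng.standard_normal((d_in, R))
C = rng.standard_normal((d_out, R))
W = np.einsum('kr,ir,or->kio', A, B, C)         # dense tensor, built only for comparison
print(np.allclose(dense_conv(L, X, W), cp_conv(L, X, A, B, C)))  # True
```

The dense tensor holds (K+1)·d_in·d_out coefficients, whereas a rank-R CP factorization stores R·(K+1+d_in+d_out); a Tucker variant would add a small core tensor between the three factor matrices.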

arXiv:2405.03126 [pdf]

Infrared Polarization Imaging-based Non-destructive Thermography Inspection

Authors: Xianyu Wu, Bin Zhou, Peng Lin, Rongjin Cao, Feng Huang

Abstract: The infrared pulse thermography non-destructive testing (NDT) method is based on the difference in the infrared radiation intensity emitted by defective and non-defective areas of an object. However, when the radiation intensity of the defective target is similar to that of the non-defective area, the detection results are poor. To address this issue, this study investigated the polarization characteristics of the infrared radiation of different materials. Simulation results showed that the degree of infrared polarization of the object surface changed regularly with changes in the thermal environment radiation. An infrared polarization imaging-based NDT method was proposed and demonstrated using specimens with four different simulated defective areas, which were designed and fabricated from four different materials. The experimental results were consistent with the simulation results, confirming the effectiveness of the proposed method. Compared with the infrared radiation intensity-based NDT method, the proposed method improved image detail presentation and detection accuracy.

Submitted 5 May, 2024; originally announced May 2024.
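As background for the "degree of infrared polarization" above, the usual Stokes-parameter estimate from intensities measured behind a linear polarizer at four orientations (0°, 45°, 90°, 135°) is sketched below. The four-angle acquisition scheme and the pixel values are assumptions for illustration; the paper's imaging setup may differ. The usage lines show two pixels with the same total intensity $S_0$ but different degrees of linear polarization, which is exactly the case in which intensity-based thermography struggles.

```python
import math


def stokes_from_four_angles(i0, i45, i90, i135):
    """Linear Stokes parameters from intensities at 0, 45, 90 and 135 degrees."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity (average of two estimates)
    s1 = i0 - i90                        # horizontal vs. vertical preference
    s2 = i45 - i135                      # +45 vs. -45 degree preference
    return s0, s1, s2


def degree_of_linear_polarization(i0, i45, i90, i135):
    s0, s1, s2 = stokes_from_four_angles(i0, i45, i90, i135)
    if s0 <= 0:
        raise ValueError("total intensity must be positive")
    return math.hypot(s1, s2) / s0


def angle_of_polarization(i0, i45, i90, i135):
    _, s1, s2 = stokes_from_four_angles(i0, i45, i90, i135)
    return 0.5 * math.atan2(s2, s1)      # radians


# Illustrative pixels: equal total intensity, different polarization.
background = (0.5, 0.5, 0.5, 0.5)
defect = (0.6, 0.5, 0.4, 0.5)
for name, px in (("background", background), ("defect", defect)):
    s0, _, _ = stokes_from_four_angles(*px)
    print(name, round(s0, 3), round(degree_of_linear_polarization(*px), 3))
```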

Showing 1–50 of 1,312 results for author: Huang, F