Accelerating Eigenvalue Computation for Nuclear Structure Calculations via Perturbative Corrections

Authors: Dong Min Roh, Esmond Ng, Chao Yang, Dean Lee, Pieter Maris, James P. Vary

Abstract: We present a new method for computing the lowest few eigenvalues and the corresponding eigenvectors of a nuclear many-body Hamiltonian represented in a truncated configuration interaction subspace, i.e., the no-core shell model (NCSM). The method uses the hierarchical structure of the NCSM Hamiltonian to partition the Hamiltonian as the sum of two matrices. The first matrix corresponds to the Hamiltonian represented in a small configuration space, whereas the second is viewed as the perturbation to the first matrix. Eigenvalues and eigenvectors of the first matrix can be computed efficiently. Perturbative corrections to the eigenvectors of the first matrix can be obtained from the solutions of a sequence of linear systems of equations defined in the small configuration space. These correction vectors can be combined with the approximate eigenvectors of the first matrix to construct a subspace from which more accurate approximations of the desired eigenpairs can be obtained. We call this method a Subspace Projection with Perturbative Corrections (SPPC) method. We show by numerical examples that the SPPC method can be more efficient than conventional iterative methods for solving large-scale eigenvalue problems such as the Lanczos, block Lanczos and the locally optimal block preconditioned conjugate gradient (LOBPCG) method. The method can also be combined with other methods to avoid convergence stagnation.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.07924 [pdf, other]

Solving General Natural-Language-Description Optimization Problems with Large Language Models

Authors: Jihai Zhang, Wei Wang, Siyan Guo, Li Wang, Fangquan Lin, Cheng Yang, Wotao Yin

Abstract: Optimization problems seek to find the best solution to an objective under a set of constraints, and have been widely investigated in real-world applications. Modeling and solving optimization problems in a specific domain typically require a combination of domain knowledge, mathematical skills, and programming ability, making it difficult for general users and even domain professionals. In this paper, we propose a novel framework called OptLLM that augments LLMs with external solvers. Specifically, OptLLM accepts user queries in natural language, converts them into mathematical formulations and programming code, and calls the solvers to calculate the results for decision-making. In addition, OptLLM supports multi-round dialogues to gradually refine the modeling and solving of optimization problems. To illustrate the effectiveness of OptLLM, we provide tutorials on three typical optimization applications and conduct experiments on both prompt-based GPT models and a fine-tuned Qwen model using a large-scale self-developed optimization dataset. Experimental results show that OptLLM works with various LLMs, and the fine-tuned model achieves an accuracy boost compared to the prompt-based models. Some features of the OptLLM framework have been available for trial since June 2023 (https://opt.alibabacloud.com/chat or https://opt.aliyun.com/chat).

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.07690 [pdf]

High power GaSb-based distributed feedback laser with laterally coupled dielectric gratings at 1.95μm

Authors: Zhengqing Ding, Juntian Cao, Kun Zhan, Yihang Chen, Lidan Zhou, Hao Tan, Chenao Yang, Ying Yu, Zhichuan Niu, Siyuan Yu

Abstract: Traditional Distributed Feedback (DFB) or Distributed Bragg Reflector (DBR) lasers typically utilize buried gratings as frequency-selective optical feedback mechanisms. However, the fabrication of such gratings often necessitates regrowth processes, which can pose technical challenges for materials platforms such as GaAs and GaSb. Metal gratings have also been used for GaSb lasers, but they introduce additional absorption loss that limits device efficiency and output power. In this paper, we introduce a novel laterally coupled dielectric Bragg grating structure, which enables highly controllable, deterministic, and stable coupling between the grating and the optical mode. Our device demonstrates a continuous-wave output power of 47.02 mW at room temperature, exhibiting stable single-mode operation from 300 to 1000 mA and achieving a maximum side mode suppression ratio of 46.7 dB. These results underscore the innovative laterally coupled dielectric grating as a feasible and technologically superior approach for fabricating DFB and DBR lasers, which hold universal applicability across different material platforms and wavelength bands.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 9 pages, 7 figures, 1 table

MSC Class: 78A60; ACM Class: J.2.6

arXiv:2407.07061 [pdf, other]

Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence

Authors: Weize Chen, Ziming You, Ran Li, Yitong Guan, Chen Qian, Chenyang Zhao, Cheng Yang, Ruobing Xie, Zhiyuan Liu, Maosong Sun

Abstract: The rapid advancement of large language models (LLMs) has paved the way for the development of highly capable autonomous agents. However, existing multi-agent frameworks often struggle with integrating diverse capable third-party agents due to reliance on agents defined within their own ecosystems. They also face challenges in simulating distributed environments, as most frameworks are limited to single-device setups. Furthermore, these frameworks often rely on hard-coded communication pipelines, limiting their adaptability to dynamic task requirements. Inspired by the concept of the Internet, we propose the Internet of Agents (IoA), a novel framework that addresses these limitations by providing a flexible and scalable platform for LLM-based multi-agent collaboration. IoA introduces an agent integration protocol, an instant-messaging-like architecture design, and dynamic mechanisms for agent teaming and conversation flow control. Through extensive experiments on general assistant tasks, embodied AI tasks, and retrieval-augmented generation benchmarks, we demonstrate that IoA consistently outperforms state-of-the-art baselines, showcasing its ability to facilitate effective collaboration among heterogeneous agents. IoA represents a step towards linking diverse agents in an Internet-like environment, where agents can seamlessly collaborate to achieve greater intelligence and capabilities. Our codebase has been released at https://github.com/OpenBMB/IoA.

Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

Comments: work in progress

arXiv:2407.06957 [pdf, other]

Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models

Authors: Yi-Cheng Lin, Tzu-Quan Lin, Chih-Kai Yang, Ke-Han Lu, Wei-Chih Chen, Chun-Yi Kuan, Hung-yi Lee

Abstract: Speech Integrated Large Language Models (SILLMs) combine large language models with speech perception to perform diverse tasks, ranging from emotion recognition to speaker verification, demonstrating universal audio understanding capability. However, these models may amplify biases present in training data, potentially leading to biased access to information for marginalized groups. This work introduces a curated spoken bias evaluation toolkit and corresponding dataset. We evaluate gender bias in SILLMs across four semantic-related tasks: speech-to-text translation (STT), spoken coreference resolution (SCR), spoken sentence continuation (SSC), and spoken question answering (SQA). Our analysis reveals that bias levels are language-dependent and vary with different evaluation methods. Our findings emphasize the necessity of employing multiple approaches to comprehensively assess biases in SILLMs, providing insights for developing fairer SILLM systems.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06103 [pdf, other]

QTRL: Toward Practical Quantum Reinforcement Learning via Quantum-Train

Authors: Chen-Yu Liu, Chu-Hsuan Abraham Lin, Chao-Han Huck Yang, Kuan-Cheng Chen, Min-Hsiu Hsieh

Abstract: Quantum reinforcement learning utilizes quantum layers to process information within a machine learning model. However, both pure and hybrid quantum reinforcement learning face challenges such as data encoding and the use of quantum computers during the inference stage. We apply the Quantum-Train method to reinforcement learning tasks, called QTRL, training the classical policy network model using a quantum machine learning model with polylogarithmic parameter reduction. This QTRL approach eliminates the data encoding issues of conventional quantum machine learning and reduces the training parameters of the corresponding classical policy network. Most importantly, the training result of QTRL is a classical model, meaning the inference stage requires only a classical computer. This is extremely practical and cost-efficient for reinforcement learning tasks, where low-latency feedback from the policy model is essential.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 6 pages, 1 figure

arXiv:2407.05934 [pdf, other]

Graph Anomaly Detection with Noisy Labels by Reinforcement Learning

Authors: Zhu Wang, Shuang Zhou, Junnan Dong, Chang Yang, Xiao Huang, Shengjie Zhao

Abstract: Graph anomaly detection (GAD) has been widely applied in many areas, e.g., fraud detection in finance and robot accounts in social networks. Existing methods are dedicated to identifying the outlier nodes that deviate from normal ones. However, they rely heavily on high-quality annotation, which is hard to obtain in real-world scenarios, so noisy labels could lead to severely degraded performance. Thus, we are motivated to cut the edges of suspicious nodes to alleviate the impact of noise. However, it remains difficult to precisely identify the nodes with noisy labels. Moreover, it is hard to quantitatively evaluate the regret of cutting the edges, which may have either positive or negative influences. To this end, we propose a novel framework REGAD, i.e., REinforced Graph Anomaly Detector. Specifically, we aim to maximize the performance improvement (AUC) of a base detector by cutting noisy edges approximated through the nodes with high-confidence labels. (i) We design a tailored action and search space to train a policy network to carefully prune edges step by step, where only a few suspicious edges are prioritized in each step. (ii) We design a policy-in-the-loop mechanism to iteratively optimize the policy based on the feedback from the base detector. The overall performance is evaluated by the cumulative rewards. Extensive experiments are conducted on three datasets under different anomaly ratios. The results indicate the superior performance of our proposed REGAD.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05813 [pdf, other]

DarkSide-20k sensitivity to light dark matter particles

Authors: DarkSide-20k Collaboration: F. Acerbi, P. Adhikari, P. Agnes, I. Ahmad, S. Albergo, I. F. M. Albuquerque, T. Alexander, A. K. Alton, P. Amaudruz, M. Angiolilli, E. Aprile, R. Ardito, M. Atzori Corona, D. J. Auty, M. Ave, I. C. Avetisov, O. Azzolini, H. O. Back, Z. Balmforth, A. Barrado Olmedo, P. Barrillon, G. Batignani, P. Bhowmick, et al. (289 additional authors not shown)

Abstract: The dual-phase liquid argon time projection chamber is presently one of the leading technologies to search for dark matter particles with masses below 10 GeV/c$^2$. This was demonstrated by the DarkSide-50 experiment with approximately 50 kg of low-radioactivity liquid argon as target material. The next generation experiment DarkSide-20k, currently under construction, will use 1,000 times more argon and is expected to start operation in 2027. Based on the DarkSide-50 experience, here we assess the DarkSide-20k sensitivity to models predicting light dark matter particles, including Weakly Interacting Massive Particles (WIMPs) and sub-GeV/c$^2$ particles interacting with electrons in argon atoms. With one year of data, a sensitivity improvement to dark matter interaction cross-sections by at least one order of magnitude with respect to DarkSide-50 is expected for all these models. A sensitivity to WIMP--nucleon interaction cross-sections below $1\times10^{-42}$ cm$^2$ is achievable for WIMP masses above 800 MeV/c$^2$. With 10 years exposure, the neutrino fog can be reached for WIMP masses around 5 GeV/c$^2$.

Submitted 8 July, 2024; originally announced July 2024.

Comments: submitted to Nature Communications

arXiv:2407.05718 [pdf, other]

A Factuality and Diversity Reconciled Decoding Method for Knowledge-Grounded Dialogue Generation

Authors: Chenxu Yang, Zheng Lin, Chong Tian, Liang Pang, Lanrui Wang, Zhengyang Tong, Qirong Ho, Yanan Cao, Weiping Wang

Abstract: Grounding external knowledge can enhance the factuality of responses in dialogue generation. However, excessive emphasis on it might result in the lack of engaging and diverse expressions. Through the introduction of randomness in sampling, current approaches can increase the diversity. Nevertheless, such sampling methods could undermine the factuality in dialogue generation. In this study, to discover a solution for advancing creativity without relying on questionable randomness and to subtly reconcile the factuality and diversity within the source-grounded paradigm, a novel method named DoGe is proposed. DoGe can dynamically alternate between the utilization of internal parameter knowledge and external source knowledge based on the model's factual confidence. Extensive experiments on three widely-used datasets show that DoGe can not only enhance response diversity but also maintain factuality, and it significantly surpasses various other decoding strategy baselines.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05536 [pdf, other]

Effective Many-body Interactions in Reduced-Dimensionality Spaces Through Neural Network Models

Authors: Senwei Liang, Karol Kowalski, Chao Yang, Nicholas P. Bauman

Abstract: Accurately describing properties of challenging problems in physical sciences often requires complex mathematical models that are unmanageable to tackle head-on. Therefore, developing reduced dimensionality representations that encapsulate complex correlation effects in many-body systems is crucial to advance the understanding of these complicated problems. However, a numerical evaluation of these predictive models can still be associated with a significant computational overhead. To address this challenge, in this paper, we discuss a combined framework that integrates recent advances in the development of active-space representations of coupled cluster (CC) downfolded Hamiltonians with neural network approaches. The primary objective of this effort is to train neural networks to eliminate the computationally expensive steps required for evaluating hundreds or thousands of Hugenholtz diagrams, which correspond to multidimensional tensor contractions necessary for evaluating a many-body form of downfolded/effective Hamiltonians. Using small molecular systems (the H2O and HF molecules) as examples, we demonstrate that training neural networks employing effective Hamiltonians for a few nuclear geometries of molecules can accurately interpolate/extrapolate their forms to other geometrical configurations characterized by different intensities of correlation effects. We also discuss differences between effective interactions that define CC downfolded Hamiltonians with those of bare Hamiltonians defined by Coulomb interactions in the active spaces.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.05361 [pdf, other]

Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation

Authors: Haorui He, Zengqiang Shang, Chaoren Wang, Xuyuan Li, Yicheng Gu, Hua Hua, Liwei Liu, Chen Yang, Jiaqi Li, Peiyang Shi, Yuancheng Wang, Kai Chen, Pengyuan Zhang, Zhizheng Wu

Abstract: Recently, speech generation models have made significant progress by using large-scale training data. However, the research community struggles to produce highly spontaneous and human-like speech due to the lack of large-scale, diverse, and spontaneous speech data. This paper presents Emilia, the first multilingual speech generation dataset from in-the-wild speech data, and Emilia-Pipe, the first open-source preprocessing pipeline designed to transform in-the-wild speech data into high-quality training data with annotations for speech generation. Emilia starts with over 101k hours of speech in six languages and features diverse speech with varied speaking styles. To facilitate the scale-up of Emilia, the open-source pipeline Emilia-Pipe can process one hour of raw speech data ready for model training in a few minutes, which enables the research community to collaborate on large-scale speech generation research. Experimental results validate the effectiveness of Emilia. Demos are available at: https://emilia-dataset.github.io/Emilia-Demo-Page/.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.05216 [pdf, other]

Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course

Authors: Cheng-Han Chiang, Wei-Chih Chen, Chun-Yi Kuan, Chienchou Yang, Hung-yi Lee

Abstract: Using large language models (LLMs) for automatic evaluation has become an important evaluation method in NLP research. However, it is unclear whether these LLM-based evaluators can be applied in real-world classrooms to assess student assignments. This empirical report shares how we use GPT-4 as an automatic assignment evaluator in a university course with 1,028 students. Based on student responses, we find that LLM-based assignment evaluators are generally acceptable to students when students have free access to these LLM-based evaluators. However, students also noted that the LLM sometimes fails to adhere to the evaluation instructions. Additionally, we observe that students can easily manipulate the LLM-based evaluator to output specific strings, allowing them to achieve high scores without meeting the assignment rubric. Based on student feedback and our experience, we provide several recommendations for integrating LLM-based evaluators into future classrooms.

Submitted 6 July, 2024; originally announced July 2024.

Comments: An empirical report of our course: Introduction to Generative AI 2024 Spring (https://speech.ee.ntu.edu.tw/~hylee/genai/2024-spring.php)

arXiv:2407.05149 [pdf]

Quantized Acoustic Phonons Map the Dynamics of a Single Virus

Authors: Yaqing Zhang, Rihan Wu, Md Shahjahan, Canchai Yang, Dohun Pyeon, Elad Harel

Abstract: The natural vibrational frequencies of biological particles such as viruses and bacteria encode critical information about their mechanical and biological states as they interact with their local environment and undergo structural evolution. However, detecting and tracking these vibrations within a biological context at the single particle level has remained elusive. In this study, we track the vibrational motions of single, unlabeled virus particles under ambient conditions using ultrafast spectroscopy. The ultrasonic spectrum of an 80-100 nm lentiviral pseudovirus reveals vibrational modes in the 19-22 GHz range sensitive to virus morphology and 2-10 GHz modes with nanosecond dephasing times reflecting viral envelope protein interactions. By tracking virus trajectories over minutes, we observe acoustic mode coupling mediated by the local environment. Single particle tracking allows capture of viral disassembly through correlated mode softening and dephasing. The sensitivity, high resolution, and speed of this approach promise deeper insights into biological dynamics and early-stage diagnostics at the single microorganism level.

Submitted 6 July, 2024; originally announced July 2024.

Comments: Main Manuscript: 19 pages, 4 figures; Supplementary Information: 29 pages, 17 figures

arXiv:2407.04738 [pdf]

A Contrastive Learning Based Convolutional Neural Network for ERP Brain-Computer Interfaces

Authors: Yuntian Cui, Xinke Shen, Dan Zhang, Chen Yang

Abstract: ERP-based EEG detection is gaining increasing attention in the field of brain-computer interfaces. However, due to the complexity of ERP signal components, their low signal-to-noise ratio, and significant inter-subject variability, cross-subject ERP signal detection has been challenging. The continuous advancement in deep learning has greatly contributed to addressing this issue. This brief proposes a contrastive learning training framework and an Inception module to extract multi-scale temporal and spatial features, representing the subject-invariant components of ERP signals. Specifically, a base encoder integrated with a linear Inception module and a nonlinear projector is used to project the raw data into latent space. By maximizing signal similarity under different targets, the inter-subject EEG signal differences in latent space are minimized. The extracted spatiotemporal features are then used for ERP target detection. The proposed algorithm achieved the best AUC performance in single-trial binary classification tasks on the P300 dataset and showed significant optimization in speller decoding tasks compared to existing algorithms.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 5 pages, 2 figures, 2 tables

arXiv:2407.03053 [pdf]

Visible, Near-, and Mid-infrared Computational Spectrometer Enabled by Single-Spinning Film Encoder

Authors: Junren Wen, Weiming Shi, Cheng Gao, Yujie Liu, Shuaibo Feng, Yu Shao, Haiqi Gao, Yuchuan Shao, Yueguang Zhang, Weidong Shen, Chenying Yang

Abstract: Computational spectrometers are pivotal in enabling low-cost, in-situ and rapid spectral analysis, with potential applications in chemistry, biology, and environmental science. However, filter-based spectral encoding approaches typically use filter arrays, complicating the manufacturing process and hindering device consistency. By capitalizing on the polarization separation effect under oblique incidence (PSEOI), we pioneer the use of a single filter for highly efficient spectral encoding, and propose a novel computational spectrometer spanning visible to mid-infrared wavelengths by combining the Single-Spinning Film Encoder (SSFE) with a deep learning-based reconstruction algorithm. The particle swarm optimization (PSO) method is employed to optimize the film configuration of SSFE, achieving low-correlation and high-complexity spectral responses under different polarizations and spinning angles, thereby enhancing both spectral resolution and accuracy of reconstruction across diverse spectral ranges. Spectral resolutions up to 0.5 nm, 2 nm, and 10 nm can be realized for single-peak narrowband spectra, and 3 nm, 6 nm, and 20 nm for dual-peak narrowband spectra, over the visible, near-, and mid-infrared wavelength ranges, respectively. Moreover, the proposed spectrometer demonstrates an overall 81.38% precision for the classification of 220 chemical compounds, confirming its robustness and precision in practical scenarios, along with the capability for compact, cost-effective spectroscopic solutions.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02935 [pdf, other]

Properties of the QCD Matter -- An Experimental Review of Selected Results from RHIC BES Program

Authors: Jinhui Chen, Xin Dong, Xionghong He, Huanzhong Huang, Feng Liu, Xiaofeng Luo, Yu-Gang Ma, Lijuan Ruan, Ming Shao, Shusu Shi, Xu Sun, Aihong Tang, Zebo Tang, Fuqiang Wang, Hai Wang, Yi Wang, Zhigang Xiao, Guannan Xie, Nu Xu, Qinghua Xu, Zhangbu Xu, Chi Yang, Shuai Yang, Wangmei Zha, Yapeng Zhang, et al. (3 additional authors not shown)

Abstract: In the paper, we discuss the development of the multi-gap resistive plate chamber Time-of-Flight (TOF) technology and the production of the STAR TOF detector in China at the beginning of the 21st century. Then we review recent experimental results from the first beam energy scan program (BES-I) at the Relativistic Heavy Ion Collider (RHIC). Topics cover measurements of collectivity, chirality, criticality, global polarization, strangeness, heavy-flavor, di-lepton and light nuclei productions.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 31 pages, 33 figures. This review is dedicated to Professor Wenqing Shen on the occasion to celebrate his leadership of the Chinese STAR Collaboration, the development and production of the STAR MRPC TOF detector in China and many physics analyses

arXiv:2407.02835 [pdf, other]

doi 10.1109/LSP.2023.3324581

A Pairwise DomMix Attentive Adversarial Network for Unsupervised Domain Adaptive Object Detection

Authors: Jie Shao, Jiacheng Wu, Wenzhong Shen, Cheng Yang

Abstract: Unsupervised Domain Adaptive Object Detection (DAOD) could adapt a model trained on a source domain to an unlabeled target domain for object detection. Existing unsupervised DAOD methods usually perform feature alignments from the target to the source. Unidirectional domain transfer would omit information about the target samples and result in suboptimal adaptation when there are large domain shifts. Therefore, we propose a pairwise attentive adversarial network with a Domain Mixup (DomMix) module to mitigate the aforementioned challenges. Specifically, a deep-level mixup is employed to construct an intermediate domain that allows features from both domains to share their differences. Then a pairwise attentive adversarial network is applied with attentive encoding on both image-level and instance-level features at different scales and optimizes domain alignment by adversarial learning. This allows the network to focus on regions with disparate contextual information and learn their similarities between different domains. Extensive experiments are conducted on several benchmark datasets, demonstrating the superiority of our proposed method.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Published in IEEE Signal Processing Letters, 2023

arXiv:2407.02511 [pdf, other]

LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning

Authors: Silin Meng, Yiwei Wang, Cheng-Fu Yang, Nanyun Peng, Kai-Wei Chang

Abstract: Path planning is a fundamental scientific problem in robotics and autonomous navigation, requiring the derivation of efficient routes from starting to destination points while avoiding obstacles. Traditional algorithms like A* and its variants are capable of ensuring path validity but suffer from significant computational and memory inefficiencies as the state space grows. Conversely, large language models (LLMs) excel in broader environmental analysis through contextual understanding, providing global insights into environments. However, they fall short in detailed spatial and temporal reasoning, often leading to invalid or inefficient routes. In this work, we propose LLM-A*, a new LLM-based route planning method that synergistically combines the precise pathfinding capabilities of A* with the global reasoning capability of LLMs. This hybrid approach aims to enhance pathfinding efficiency in terms of time and space complexity while maintaining the integrity of path validity, especially in large-scale scenarios. By integrating the strengths of both methodologies, LLM-A* addresses the computational and memory limitations of conventional algorithms without compromising on the validity required for effective pathfinding.

Submitted 19 June, 2024; originally announced July 2024.

Comments: Submitted to The 2024 Conference on Empirical Methods in Natural Language Processing

arXiv:2407.02235 [pdf]

Towards a Holistic Framework for Multimodal Large Language Models in Three-dimensional Brain CT Report Generation

Authors: Cheng-Yi Li, Kao-Jung Chang, Cheng-Fu Yang, Hsin-Yu Wu, Wenting Chen, Hritik Bansal, Ling Chen, Yi-Ping Yang, Yu-Chun Chen, Shih-Pin Chen, Jiing-Feng Lirng, Kai-Wei Chang, Shih-Hwa Chiou

Abstract: Multi-modal large language models (MLLMs) have been given free rein to explore exciting medical applications with a primary focus on radiology report generation. Nevertheless, the preliminary success in 2D radiology captioning is incompetent to reflect the real-world diagnostic challenge in the volumetric 3D anatomy. To mitigate three crucial limitation aspects in the existing literature, including (1) data complexity, (2) model capacity, and (3) evaluation metric fidelity, we collected an 18,885 text-scan pairs 3D-BrainCT dataset and applied clinical visual instruction tuning (CVIT) to train BrainGPT models to generate radiology-adherent 3D brain CT reports. Statistically, our BrainGPT scored BLEU-1 = 44.35, BLEU-4 = 20.38, METEOR = 30.13, ROUGE-L = 47.6, and CIDEr-R = 211.77 during internal testing and demonstrated an accuracy of 0.91 in captioning midline shifts on the external validation CQ500 dataset. By further inspecting the captioned report, we reported that the traditional metrics appeared to measure only the surface text similarity and failed to gauge the information density of the diagnostic purpose. To close this gap, we proposed a novel Feature-Oriented Radiology Task Evaluation (FORTE) to estimate the report's clinical relevance (lesion feature and landmarks). Notably, the BrainGPT model scored an average FORTE F1-score of 0.71 (degree=0.661; landmark=0.706; feature=0.693; impression=0.779).
To demonstrate that BrainGPT models possess objective readiness to generate human-like radiology reports, we conducted a Turing test that enrolled 11 physician evaluators, and around 74% of the BrainGPT-generated captions were indistinguishable from those written by humans. Our work embodies a holistic framework that showcased the first-hand experience of curating a 3D brain CT dataset, fine-tuning anatomy-sensible language models, and proposing robust radiology evaluation metrics.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 6 figures, 5 supplementary figures, 8 supplementary tables

arXiv:2407.02047 [pdf, other]

CountFormer: Multi-View Crowd Counting Transformer

Authors: Hong Mo, Xiong Zhang, Jianchao Tan, Cheng Yang, Qiong Gu, Bo Hang, Wenqi Ren

Abstract: Multi-view counting (MVC) methods have shown their superiority over single-view counterparts, particularly in situations characterized by heavy occlusion and severe perspective distortions. However, hand-crafted heuristic features and identical camera layout requirements in conventional MVC methods limit their applicability and scalability in real-world scenarios. In this work, we propose a concise 3D MVC framework called CountFormer to elevate multi-view image-level features to a scene-level volume representation and estimate the 3D density map based on the volume features. By incorporating a camera encoding strategy, CountFormer successfully embeds camera parameters into the volume query and image-level features, enabling it to handle various camera layouts with significant differences. Furthermore, we introduce a feature lifting module capitalized on the attention mechanism to transform image-level features into a 3D volume representation for each camera view. Subsequently, the multi-view volume aggregation module attentively aggregates various multi-view volumes to create a comprehensive scene-level volume representation, allowing CountFormer to handle images captured by arbitrary dynamic camera layouts. The proposed method performs favorably against the state-of-the-art approaches across various widely used datasets, demonstrating its greater suitability for real-world deployment compared to conventional MVC frameworks.

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted By ECCV2024

arXiv:2407.01885 [pdf, other]

Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application

Authors: Chuanpeng Yang, Wang Lu, Yao Zhu, Yidong Wang, Qian Chen, Chenlong Gao, Bingjie Yan, Yiqiang Chen

Abstract: Large Language Models (LLMs) have showcased exceptional capabilities in various domains, attracting significant interest from both academia and industry. Despite their impressive performance, the substantial size and computational demands of LLMs pose considerable challenges for practical deployment, particularly in environments with limited resources. The endeavor to compress language models while maintaining their accuracy has become a focal point of research. Among the various methods, knowledge distillation has emerged as an effective technique to enhance inference speed without greatly compromising performance. This paper presents a thorough survey from three aspects: method, evaluation, and application, exploring knowledge distillation techniques tailored specifically for LLMs. Specifically, we divide the methods into white-box KD and black-box KD to better illustrate their differences. Furthermore, we also explored the evaluation tasks and distillation effects between different distillation methods, and proposed directions for future research. Through in-depth understanding of the latest advancements and practical applications, this survey provides valuable resources for researchers, paving the way for sustained progress in this field.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 28 pages

arXiv:2407.00903 [pdf, other]

Observation of topological transitions associated with a Weyl exceptional ring

Authors: Hao-Long Zhang, Pei-Rong Han, Xue-Jia Yu, Shou-Bang Yang, Jia-Hao Lü, Wen Ning, Fan Wu, Qi-Ping Su, Chui-Ping Yang, Zhen-Biao Yang, Shi-Biao Zheng

Abstract: The environment-induced dissipation of an open system, once thought of as a nuisance, can actually lead to the emergence of many intriguing phenomena that are absent in an isolated system. Among these, Weyl exceptional rings (WER), extended from point-like singularities, are particularly interesting. Theoretically, a WER was predicted to carry a topological charge with a nonzero Chern number, but it has not been measured so far. We here investigate this topology in a circuit, where the WER is synthesized with a superconducting qubit controllably coupled to a decaying resonator. The high flexibility of the system enables us to characterize its eigenvectors on different manifolds of parameter space. We extract both the quantized Berry phase and Chern number from these eigenvectors. Furthermore, we demonstrate a topological transition triggered by shrinking the size of the manifold, a unique feature of the WER.

Submitted 30 June, 2024; originally announced July 2024.

Comments: 16 pages, 10 figures

arXiv:2407.00427 [pdf, ps, other]

On the boundedness of degenerate hypergraphs

Authors: Jianfeng Hou, Caiyun Hu, Heng Li, Xizhi Liu, Caihong Yang, Yixiao Zhang

Abstract: We investigate the impact of a high-degree vertex in Turán problems for degenerate hypergraphs (including graphs). We say an $r$-graph $F$ is bounded if there exist constants $α, β>0$ such that for large $n$, every $n$-vertex $F$-free $r$-graph with a vertex of degree at least $α\binom{n-1}{r-1}$ has fewer than $(1-β) \cdot \mathrm{ex}(n,F)$ edges. The boundedness property is crucial for recent works~\cite{HHLLYZ23a,DHLY24} that aim to extend the classical Hajnal--Szemerédi Theorem and the anti-Ramsey theorems of Erdős--Simonovits--Sós. We show that many well-studied degenerate hypergraphs, such as all even cycles, most complete bipartite graphs, and the expansion of most complete bipartite graphs, are bounded. In addition, to prove the boundedness of the expansion of complete bipartite graphs, we introduce and solve a Zarankiewicz-type problem for $3$-graphs, strengthening a theorem by Kostochka--Mubayi--Verstraëte~\cite{KMV15}.

Submitted 29 June, 2024; originally announced July 2024.

Comments: comments are welcome

arXiv:2407.00365 [pdf, other]

Financial Knowledge Large Language Model

Authors: Cehao Yang, Chengjin Xu, Yiyan Qi

Abstract: Artificial intelligence is making significant strides in the finance industry, revolutionizing how data is processed and interpreted. Among these technologies, large language models (LLMs) have demonstrated substantial potential to transform financial services by automating complex tasks, enhancing customer service, and providing detailed financial analysis. Firstly, we introduce IDEA-FinBench, an evaluation benchmark specifically tailored for assessing financial knowledge in LLMs. This benchmark utilizes questions from two globally respected and authoritative financial professional exams, aiming to comprehensively evaluate the capability of LLMs to directly address exam questions pertinent to the finance sector. Secondly, we propose IDEA-FinKER, a Financial Knowledge Enhancement framework designed to facilitate the rapid adaptation of general LLMs to the financial domain, introducing a retrieval-based few-shot learning method for real-time context-level knowledge injection, and a set of high-quality financial knowledge instructions for fine-tuning any general LLM. Finally, we present IDEA-FinQA, a financial question-answering system powered by LLMs. This system is structured around a scheme of real-time knowledge injection and factual enhancement using external knowledge. IDEA-FinQA is comprised of three main modules: the data collector, the data querying module, and LLM-based agents tasked with specific functions.

Submitted 29 June, 2024; originally announced July 2024.

Comments: 66 pages

arXiv:2407.00072 [pdf, other]

Pistis-RAG: A Scalable Cascading Framework Towards Trustworthy Retrieval-Augmented Generation

Authors: Yu Bai, Yukai Miao, Li Chen, Dan Li, Yanyu Ren, Hongtao Xie, Ce Yang, Xuhui Cai

Abstract: In Greek mythology, Pistis symbolized good faith, trust, and reliability. Drawing inspiration from these principles, Pistis-RAG is a scalable multi-stage framework designed to address the challenges of large-scale retrieval-augmented generation (RAG) systems. This framework consists of distinct stages: matching, pre-ranking, ranking, reasoning, and aggregating. Each stage contributes to narrowing the search space, prioritizing semantically relevant documents, aligning with the large language model's (LLM) preferences, supporting complex chain-of-thought (CoT) methods, and combining information from multiple sources. Our ranking stage introduces a significant innovation by recognizing that semantic relevance alone may not lead to improved generation quality, due to the sensitivity of the few-shot prompt order, as noted in previous research. This critical aspect is often overlooked in current RAG frameworks. We argue that the alignment issue between LLMs and external knowledge ranking methods is tied to the model-centric paradigm dominant in RAG systems. We propose a content-centric approach, emphasizing seamless integration between LLMs and external information sources to optimize content transformation for specific tasks. Our novel ranking stage is designed specifically for RAG systems, incorporating principles of information retrieval while considering the unique business scenarios reflected in LLM preferences and user feedback. We simulated feedback signals on the MMLU benchmark, resulting in a 9.3% performance improvement.
Our model and code will be open-sourced on GitHub. Additionally, experiments on real-world, large-scale data validate the scalability of our framework.

Submitted 11 July, 2024; v1 submitted 21 June, 2024; originally announced July 2024.

arXiv:2406.20015 [pdf, other]

ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models

Authors: Yuxiang Zhang, Jing Chen, Junjie Wang, Yaxin Liu, Cheng Yang, Chufan Shi, Xinyu Zhu, Zihao Lin, Hanwen Wan, Yujiu Yang, Tetsuya Sakai, Tian Feng, Hayato Yamana

Abstract: Tool-augmented large language models (LLMs) are rapidly being integrated into real-world applications. Due to the lack of benchmarks, the community still needs to fully understand the hallucination issues within these models. To address this challenge, we introduce a comprehensive diagnostic benchmark, ToolBH. Specifically, we assess the LLM's hallucinations through two perspectives: depth and breadth. In terms of depth, we propose a multi-level diagnostic process, including (1) solvability detection, (2) solution planning, and (3) missing-tool analysis. For breadth, we consider three scenarios based on the characteristics of the toolset: missing necessary tools, potential tools, and limited functionality tools. Furthermore, we developed seven tasks and collected 700 evaluation samples through multiple rounds of manual annotation. The results show the significant challenges presented by the ToolBH benchmark. The current advanced models Gemini-1.5-Pro and GPT-4o only achieve a total score of 45.3 and 37.0, respectively, on a scale of 100. In this benchmark, larger model parameters do not guarantee better performance; the training data and response strategies also play a crucial role in tool-enhanced LLM scenarios. Our diagnostic analysis indicates that the primary reason for model errors lies in assessing task solvability. Additionally, open-weight models suffer from performance drops with verbose replies, whereas proprietary models excel with longer reasoning.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19167 [pdf, other]

doi 10.1088/0256-307X/41/7/070302

Quantum voting machine encoded with microwave photons

Authors: Yu Zhang, Chuiping Yang, Qiping Su, Yihao Kang, Wen Zheng, Shaoxiong Li, Yang Yu

Abstract: We propose a simple quantum voting machine using microwave photon qubit encoding, based on a setup comprising multiple microwave cavities and a coupled superconducting flux qutrit. This approach primarily relies on a multi-control single-target quantum phase gate. The scheme offers operational simplicity, requiring only a single step, while ensuring verifiability through the measurement of a single qubit's phase information to obtain the voting results. It also provides voter anonymity, as the voting outcome is solely tied to the total number of affirmative votes. Our quantum voting machine is also scalable in terms of the number of voters. Additionally, the physical realization of the quantum voting machine is general and not limited to circuit QED. The quantum voting machine can be implemented as long as the multi-control single-phase quantum phase gate is realized in other physical systems. Numerical simulations indicate the feasibility of this quantum voting machine within the current quantum technology.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 14 pages, 4 figures. arXiv admin note: text overlap with arXiv:2306.02227

MSC Class: 81V99

Journal ref: Chin. Phys. Lett. 41 070302 (2024)

arXiv:2406.18181 [pdf, ps, other]

An Empirical Study of Unit Test Generation with Large Language Models

Authors: Lin Yang, Chen Yang, Shutao Gao, Weijing Wang, Bo Wang, Qihao Zhu, Xiao Chu, Jianyi Zhou, Guangtai Liang, Qianxiang Wang, Junjie Chen

Abstract: Unit testing is an essential activity in software development for verifying the correctness of software components. However, manually writing unit tests is challenging and time-consuming. The emergence of Large Language Models (LLMs) offers a new direction for automating unit test generation. Existing research primarily focuses on closed-source LLMs (e.g., ChatGPT and CodeX) with fixed prompting strategies, leaving the capabilities of advanced open-source LLMs with various prompting settings unexplored. Particularly, open-source LLMs offer advantages in data privacy protection and have demonstrated superior performance in some tasks. Moreover, effective prompting is crucial for maximizing LLMs' capabilities. In this paper, we conduct the first empirical study to fill this gap, based on 17 Java projects, five widely-used open-source LLMs with different structures and parameter sizes, and comprehensive evaluation metrics. Our findings highlight the significant influence of various prompt factors, show the performance of open-source LLMs compared to the commercial GPT-4 and the traditional Evosuite, and identify limitations in LLM-based unit test generation. We then derive a series of implications from our study to guide future research and practical use of LLM-based unit test generation.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.17934 [pdf, other]

Rapid protoplanet formation in vortices: three-dimensional local simulations with selfgravity

Authors: Wladimir Lyra, Chao-Chin Yang, Jacob B. Simon, Orkan M. Umurhan, Andrew N. Youdin

Abstract: Disk vortices, seen in numerical simulations of protoplanetary disks and found observationally in ALMA and VLA images of these objects, are promising sites for planet formation given their pebble trapping abilities. Previous works have shown strong concentration of pebbles in vortices, but gravitational collapse has only been shown in low-resolution, two-dimensional, global models. In this letter, we aim to study the pebble concentration and gravitational collapse of pebble clouds in vortices via high-resolution, three-dimensional, local models. We performed simulations of the dynamics of gas and solids in a local shearing box where the gas is subject to convective overstability, generating a persistent giant vortex. We find that the vortex produces objects of Moon and Mars mass, with mass function of power law $d\ln N/d\ln M=-1.6\pm 0.3$. The protoplanets grow rapidly, doubling in mass in about 5 orbits, following pebble accretion rates. The mass range and mass doubling rate are in broad agreement with previous low resolution global models. We conclude that Mars-mass planetary embryos are the natural outcome of planet formation inside the disk vortices seen in millimeter and radio images of protoplanetary disks.

Submitted 25 June, 2024; originally announced June 2024.

Comments: 9 pages, 5 figures, accepted for publication in ApJ letters

arXiv:2406.17720 [pdf, other]

Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity

Authors: Chih-Hsuan Yang, Benjamin Feuer, Zaki Jubery, Zi K. Deng, Andre Nakkab, Md Zahid Hasan, Shivani Chiranjeevi, Kelly Marshall, Nirmal Baishnab, Asheesh K Singh, Arti Singh, Soumik Sarkar, Nirav Merchant, Chinmay Hegde, Baskar Ganapathysubramanian

Abstract: We introduce Arboretum, the largest publicly accessible dataset designed to advance AI for biodiversity applications. This dataset, curated from the iNaturalist community science platform and vetted by domain experts to ensure accuracy, includes 134.6 million images, surpassing existing datasets in scale by an order of magnitude. The dataset encompasses image-language paired data for a diverse set of species from birds (Aves), spiders/ticks/mites (Arachnida), insects (Insecta), plants (Plantae), fungus/mushrooms (Fungi), snails (Mollusca), and snakes/lizards (Reptilia), making it a valuable resource for multimodal vision-language AI models for biodiversity assessment and agriculture research. Each image is annotated with scientific names, taxonomic details, and common names, enhancing the robustness of AI model training. We showcase the value of Arboretum by releasing a suite of CLIP models trained using a subset of 40 million captioned images. We introduce several new benchmarks for rigorous assessment, report accuracy for zero-shot learning, and evaluations across life stages, rare species, confounding species, and various levels of the taxonomic hierarchy. We anticipate that Arboretum will spur the development of AI models that can enable a variety of digital tools ranging from pest control strategies, crop monitoring, and worldwide biodiversity assessment and environmental conservation. These advancements are critical for ensuring food security, preserving ecosystems, and mitigating the impacts of climate change.
Arboretum is publicly available, easily accessible, and ready for immediate use. Please see the project website (https://baskargroup.github.io/Arboretum/) for links to our data, models, and code.

Submitted 25 June, 2024; originally announced June 2024.

Comments: Preprint under review

arXiv:2406.17488 [pdf, other]

Environmental Variation or Instrumental Drift? A Probabilistic Approach to Gas Sensor Drift Modeling and Evaluation

Authors: Cheng Yang, Gustav Bohlin, Tobias Oechtering

Abstract: Drift is a significant issue that undermines the reliability of gas sensors. This paper introduces a probabilistic model to distinguish between environmental variation and instrumental drift, using low-cost non-dispersive infrared (NDIR) CO2 sensors as a case study. Data from a long-term field experiment is analyzed to evaluate both sensor performance and environmental changes over time. Our approach employs importance sampling to isolate instrumental drift from environmental variation, providing a more accurate assessment of sensor performance. The results show that failing to account for environmental variation can significantly affect the evaluation of sensor drift, leading to improper calibration processes.

Submitted 25 June, 2024; originally announced June 2024.

Comments: This conference paper has been submitted to IEEE SENSORS 2024

arXiv:2406.16744 [pdf]

Lone Pair Induced 1D Character and Weak Cation-anion Interactions: Two Ingredients for Low Thermal Conductivity in Mixed-anion Metal Chalcohalides

Authors: Xingchen Shen, Koushik Pal, Paribesh Acharyya, Bernard Raveau, Philippe Boullay, Carmelo Prestipino, Susumu Fujii, Chun-Chuen Yang, I-Yu Tsao, Adele Renaud, Pierric Lemoine, Christophe Candolfi, Emmanuel Guilmeau

Abstract: Mixed-anion compounds, which incorporate multiple types of anions into materials, display tailored crystal structures and physical/chemical properties, garnering immense interest in various applications such as batteries, catalysis, photovoltaics, and thermoelectrics. However, detailed studies regarding correlations between crystal structure, chemical bonding, and thermal/vibrational properties are rare for these compounds, which limits the exploration of mixed-anion compounds for associated thermal applications. In this work, we investigate the lattice dynamics and thermal transport properties of the metal chalcohalide CuBiSCl2. A high-purity polycrystalline CuBiSCl2 sample, successfully synthesized via a modified solid-state synthetic method, exhibits a low lattice thermal conductivity of 0.9-0.6 W m-1 K-1 from 300 to 573 K. By combining various experimental techniques including 3D electron diffraction with theoretical calculations, we elucidate the origin of low lattice thermal conductivity in CuBiSCl2. The stereo-chemical activity of the 6s2 lone pair of Bi3+ favors an asymmetric environment with neighboring anions involving both short and long bond lengths. This particularity often implies weak bonding, low structure dimensionality, and strong anharmonicity, leading to low lattice thermal conductivity. In addition, the strong two-fold linear S-Cu-S coordination with weak Cu-Cl interactions induces large anisotropic vibration of Cu or structural disorder, which enables strong phonon-phonon scattering and decreases lattice thermal conductivity.
The investigations into lattice dynamics and thermal transport properties of CuBiSCl2 broaden the scope of the existing mixed-anion compounds suitable for the associated thermal applications, offering a new avenue for the search for low thermal conductivity materials in low-cost mixed-anion compounds.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16715 [pdf, other]

GC-Bench: A Benchmark Framework for Graph Condensation with New Insights

Authors: Shengbo Gong, Juntong Ni, Noveen Sachdeva, Carl Yang, Wei Jin

Abstract: Graph condensation (GC) is an emerging technique designed to learn a significantly smaller graph that retains the essential information of the original graph. This condensed graph has shown promise in accelerating graph neural networks while preserving performance comparable to that achieved with the original, larger graphs. Additionally, this technique facilitates downstream applications such as neural architecture search and enhances our understanding of redundancy in large graphs. Despite the rapid development of GC methods, a systematic evaluation framework remains absent, which is necessary to clarify the critical designs for particular evaluative aspects. Furthermore, several meaningful questions have not been investigated, such as whether GC inherently preserves certain graph properties and offers robustness even without targeted design efforts. In this paper, we introduce GC-Bench, a comprehensive framework to evaluate recent GC methods across multiple dimensions and to generate new insights. Our experimental findings provide deeper insights into the GC process and the characteristics of condensed graphs, guiding future efforts in enhancing performance and exploring new applications. Our code is available at https://github.com/Emory-Melody/GraphSlim/tree/main/benchmark.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 9 pages

arXiv:2406.16529 [pdf, other]

Towards Better Graph-based Cross-document Relation Extraction via Non-bridge Entity Enhancement and Prediction Debiasing

Authors: Hao Yue, Shaopeng Lai, Chengyi Yang, Liang Zhang, Junfeng Yao, Jinsong Su

Abstract: Cross-document Relation Extraction aims to predict the relation between target entities located in different documents. In this regard, the dominant models commonly retain useful information for relation prediction via bridge entities, which allows the model to elaborately capture the intrinsic interdependence between target entities. However, these studies ignore the non-bridge entities, each of which co-occurs with only one target entity and offers the semantic association between target entities for relation prediction. Besides, the commonly used dataset, CodRED, contains substantial NA instances, leading to prediction bias during inference. To address these issues, in this paper, we propose a novel graph-based cross-document RE model with non-bridge entity enhancement and prediction debiasing. Specifically, we use a unified entity graph to integrate numerous non-bridge entities with target entities and bridge entities, modeling various associations between them, and then use a graph recurrent network to encode this graph. Finally, we introduce a novel debiasing strategy to calibrate the original prediction distribution. Experimental results on the closed and open settings show that our model significantly outperforms all baselines, including GPT-3.5-turbo and InstructUIE, achieving state-of-the-art performance. Particularly, our model obtains 66.23% and 55.87% AUC points on the official leaderboard (https://codalab.lisn.upsaclay.fr/competitions/3770#results) under the two settings, respectively, ranking first among all submissions since December 2023. Our code is available at https://github.com/DeepLearnXMU/CoRE-NEPD.

Submitted 24 June, 2024; originally announced June 2024.

Comments: Accepted to ACL 2024 Findings

arXiv:2406.16303 [pdf, other]

Hybrid Precoding With Low-Resolution PSs for Wideband Terahertz Communication Systems in The Face of Beam Squint

Authors: Yang Wang, Chuang Yang, Mugen Peng

Abstract: Terahertz (THz) communication is considered one of the most critical technologies for 6G because of its abundant bandwidth. To compensate for the high propagation loss of THz, analog/digital hybrid precoding for THz massive multiple input multiple output (MIMO) is proposed to focus signals and extend communication range. Notably, considering hardware cost and power consumption, infinite and high-resolution phase shifters (PSs) are difficult to implement in THz massive MIMO and low-resolution PSs are typically adopted in practice. However, low-resolution PSs cause severe performance degradation. Moreover, the beam squint in wideband THz massive MIMO increases the performance degradation because of the frequency independence of the analog PSs. Motivated by the above factors, in this paper, we first propose a heuristic algorithm under the fully connected (FC) structure, which optimizes the digital precoder and the analog precoder alternately. Then we migrate the proposed heuristic algorithm to the partially connected (PC) architecture. To further improve the performance, we extend our design to dynamic subarrays in which each RF chain is connected to any antenna that does not duplicate. The numerical results demonstrate that our proposed wideband hybrid precoding with low-resolution PSs achieves better performance than the compared schemes for both the FC structure and the PC structure.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.15784 [pdf]

Data Issues in Industrial AI System: A Meta-Review and Research Strategy

Authors: Xuejiao Li, Cheng Yang, Charles Møller, Jay Lee

Abstract: In the era of Industry 4.0, artificial intelligence (AI) is assuming an increasingly pivotal role within industrial systems. Despite the recent trend within various industries to adopt AI, the actual adoption of AI is not as developed as perceived. A significant factor contributing to this lag is the data issues in AI implementation. How to address these data issues stands as a significant concern confronting both industry and academia. To address data issues, the first step involves mapping out these issues. Therefore, this study conducts a meta-review to explore data issues and methods within the implementation of industrial AI. Seventy-two data issues are identified and categorized into various stages of the data lifecycle, including data source and collection, data access and storage, data integration and interoperation, data pre-processing, data processing, data security and privacy, and AI technology adoption. Subsequently, the study analyzes the data requirements of various AI algorithms. Building on the aforementioned analyses, it proposes a data management framework, addressing how data issues can be systematically resolved at every stage of the data lifecycle. Finally, the study highlights future research directions. In doing so, this study enriches the existing body of knowledge and provides guidelines for professionals navigating the complex landscape of achieving data usability and usefulness in industrial AI.

Submitted 22 June, 2024; originally announced June 2024.

arXiv:2406.15486 [pdf, other]

SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention

Authors: Qianchao Zhu, Jiangfei Duan, Chang Chen, Siran Liu, Xiuhong Li, Guanyu Feng, Xin Lv, Huanqi Cao, Xiao Chuanfu, Xingcheng Zhang, Dahua Lin, Chao Yang

Abstract: Large language models (LLMs) now support extremely long context windows, but the quadratic complexity of vanilla attention results in significantly long Time-to-First-Token (TTFT) latency. Existing approaches to address this complexity require additional pretraining or finetuning, and often sacrifice model accuracy. In this paper, we first provide both theoretical and empirical foundations for near-lossless sparse attention. We find that dynamically capturing head-specific sparse patterns at runtime with low overhead is crucial. To address this, we propose SampleAttention, an adaptive structured and near-lossless sparse attention. Leveraging observed significant sparse patterns, SampleAttention attends to a fixed percentage of adjacent tokens to capture local window patterns, and employs a two-stage query-guided key-value filtering approach, which adaptively selects a minimum set of key-values with low overhead, to capture column stripe patterns. Comprehensive evaluations show that SampleAttention can seamlessly replace vanilla attention in off-the-shelf LLMs with nearly no accuracy loss, and reduces TTFT by up to $2.42\times$ compared with FlashAttention.

Submitted 28 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.14928 [pdf, other]

Autonomous Agents for Collaborative Task under Information Asymmetry

Authors: Wei Liu, Chenxi Wang, Yifei Wang, Zihao Xie, Rennai Qiu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Chen Qian

Abstract: Large Language Model Multi-Agent Systems (LLM-MAS) have achieved great progress in solving complex tasks. These systems perform communication among agents within the system to collaboratively solve tasks, under the premise of shared information. However, when agents' communication is leveraged to enhance human cooperation, a new challenge arises due to information asymmetry, since each agent can only access the information of its human user. Previous MAS struggle to complete tasks under this condition. To address this, we propose a new MAS paradigm termed iAgents, which denotes Informative Multi-Agent Systems. In iAgents, the human social network is mirrored in the agent network, where agents proactively exchange human information necessary for task resolution, thereby overcoming information asymmetry. iAgents employs a novel agent reasoning mechanism, InfoNav, to navigate agents' communication towards effective information exchange. Together with InfoNav, iAgents organizes human information in a mixed memory to provide agents with accurate and comprehensive information for exchange. Additionally, we introduce InformativeBench, the first benchmark tailored for evaluating LLM agents' task-solving ability under information asymmetry. Experimental results show that iAgents can collaborate within a social network of 140 individuals and 588 relationships, autonomously communicate over 30 turns, and retrieve information from nearly 70,000 messages to complete tasks within 3 minutes.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 16 pages, 8 figures, 5 tables, Work in progress

arXiv:2406.14083 [pdf, ps, other]

Tight bounds for rainbow partial $F$-tiling in edge-colored complete hypergraphs

Authors: Jinghua Deng, Jianfeng Hou, Xizhi Liu, Caihong Yang

Abstract: For an $r$-graph $F$ and integers $n,t$ satisfying $t \le n/v(F)$, let $\mathrm{ar}(n,tF)$ denote the minimum integer $N$ such that every edge-coloring of $K_{n}^{r}$ using $N$ colors contains a rainbow copy of $tF$, where $tF$ is the $r$-graph consisting of $t$ vertex-disjoint copies of $F$. The case $t=1$ is the classical anti-Ramsey problem proposed by Erdős--Simonovits--Sós~\cite{ESS75}. When $F$ is a single edge, this becomes the rainbow matching problem introduced by Schiermeyer~\cite{Sch04} and Özkahya--Young~\cite{OY13}. We conduct a systematic study of $\mathrm{ar}(n,tF)$ for the case where $t$ is much smaller than $\mathrm{ex}(n,F)/n^{r-1}$. Our first main result provides a reduction of $\mathrm{ar}(n,tF)$ to $\mathrm{ar}(n,2F)$ when $F$ is bounded and smooth, two properties satisfied by most previously studied hypergraphs. Complementing the first result, the second main result, which utilizes gaps between Turán numbers, determines $\mathrm{ar}(n,tF)$ for relatively smaller $t$. Together, these two results determine $\mathrm{ar}(n,tF)$ for a large class of hypergraphs. Additionally, the latter result has the advantage of being applicable to hypergraphs with unknown Turán densities, such as the famous tetrahedron $K_{4}^{3}$.

Submitted 21 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

Comments: 19 pages, 1 figure, comments are welcome

arXiv:2406.14036 [pdf, other]

Toward Infinite-Long Prefix in Transformer

Authors: Jiuxiang Gu, Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang

Abstract: Prompting and contextual-based fine-tuning methods, which we call Prefix Learning, have been proposed to enhance the performance of language models on various downstream tasks that can match full parameter fine-tuning. There remains a limited theoretical understanding of how these methods work. In this paper, we aim to relieve this limitation by studying the learning ability of Prefix Learning from the perspective of prefix length. In particular, we approximate the infinite-long Prefix Learning optimization process by the Neural Tangent Kernel (NTK) technique. We formulate and solve it as a learning problem of the infinite-long prefix in a one-layer attention network. Our results confirm the over-parameterization property and arbitrarily small loss convergence guarantee of the infinite-long Prefix Learning in attention. On the implementation side, we propose our NTK-Attention method, which efficiently performs a computation "equivalent" to attention with arbitrary prefix length. Its time complexity is mainly sub-quadratic in the input length (without prefix), and our method only requires $d^2 + d$ extra parameters for representation, where $d$ is the feature dimension. In addition, we conducted experiments that compare our NTK-Attention with full-parameter fine-tuning, LoRA, and P-Tuning V2 methods across vision or natural language datasets. The results indicate our approach may be a promising parameter-efficient fine-tuning method since it has demonstrated superior performance in numerous scenarios. Our code can be found at https://github.com/ChristianYang37/chiwun/tree/main/src/NTK-Attention.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.13988 [pdf, other]

LGmap: Local-to-Global Mapping Network for Online Long-Range Vectorized HD Map Construction

Authors: Kuang Wu, Sulei Nian, Can Shen, Chuan Yang, Zhanbin Li

Abstract: This report introduces the first-place winning solution for the Autonomous Grand Challenge 2024 - Mapless Driving. In this report, we introduce a novel online mapping pipeline, LGmap, which is adept at long-range temporal modeling. Firstly, we propose symmetric view transformation (SVT), a hybrid view transformation module. Our approach overcomes the limitations of forward sparse feature representation and utilizes depth perception and SD prior information. Secondly, we propose a hierarchical temporal fusion (HTF) module. It employs temporal information from local to global, which empowers the construction of long-range HD maps with high stability. Lastly, we propose a novel ped-crossing resampling. The simplified ped-crossing representation accelerates the convergence of the instance-attention-based decoder. Our method achieves 0.66 UniScore on the Mapless Driving OpenLaneV2 test set.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.13912 [pdf, other]

From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment

Authors: Yusuke Hirota, Ryo Hachiuma, Chao-Han Huck Yang, Yuta Nakashima

Abstract: Large language models (LLMs) have enhanced the capacity of vision-language models to caption visual text. This generative approach to image caption enrichment further makes textual captions more descriptive, improving alignment with the visual context. However, while many studies focus on benefits of generative caption enrichment (GCE), are there any negative side effects? We compare standard-format captions and recent GCE processes from the perspectives of "gender bias" and "hallucination", showing that enriched captions suffer from increased gender bias and hallucination. Furthermore, models trained on these enriched captions amplify gender bias by an average of 30.9% and increase hallucination by 59.5%. This study serves as a caution against the trend of making captions more descriptive.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13873 [pdf, other]

A Pure Transformer Pretraining Framework on Text-attributed Graphs

Authors: Yu Song, Haitao Mao, Jiachen Xiao, Jingzhe Liu, Zhikai Chen, Wei Jin, Carl Yang, Jiliang Tang, Hui Liu

Abstract: Pretraining plays a pivotal role in acquiring generalized knowledge from large-scale data, achieving remarkable successes as evidenced by large models in CV and NLP. However, progress in the graph domain remains limited due to fundamental challenges such as feature heterogeneity and structural heterogeneity. Recently, increasing efforts have been made to enhance node feature quality with Large Language Models (LLMs) on text-attributed graphs (TAGs), demonstrating superiority to traditional bag-of-words or word2vec techniques. These high-quality node features reduce the previously critical role of graph structure, resulting in a modest performance gap between Graph Neural Networks (GNNs) and structure-agnostic Multi-Layer Perceptrons (MLPs). Motivated by this, we introduce a feature-centric pretraining perspective by treating graph structure as a prior and leveraging the rich, unified feature space to learn refined interaction patterns that generalize across graphs. Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks and employs masked feature reconstruction to capture pairwise proximity in the LLM-unified feature space using a standard Transformer. By utilizing unified text representations rather than varying structures, our framework achieves significantly better transferability among graphs within the same domain. GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13173 [pdf, other]

Biomedical Visual Instruction Tuning with Clinician Preference Alignment

Authors: Hejie Cui, Lingjun Mao, Xin Liang, Jieyu Zhang, Hui Ren, Quanzheng Li, Xiang Li, Carl Yang

Abstract: Recent advancements in multimodal foundation models have showcased impressive capabilities in understanding and reasoning with visual and textual information. Adapting these foundation models trained for general usage to specialized domains like biomedicine requires large-scale domain-specific instruction datasets. While existing works have explored curating such datasets automatically, the resultant datasets are not explicitly aligned with domain expertise. In this work, we propose a data-centric framework, Biomedical Visual Instruction Tuning with Clinician Preference Alignment (BioMed-VITAL), that incorporates clinician preferences into both stages of generating and selecting instruction data for tuning biomedical multimodal foundation models. First, during the generation stage, we prompt the GPT-4V generator with a diverse set of clinician-selected demonstrations for preference-aligned data candidate generation. Then, during the selection stage, we train a separate selection model, which explicitly distills clinician and policy-guided model preferences into a rating function to select high-quality data for medical instruction tuning. Results show that the model tuned with the instruction-following data from our method demonstrates a significant improvement in open visual chat (18.5% relative) and medical VQA (win rate up to 81.73%). Our instruction-following data and models are available at BioMed-VITAL.github.io.

Submitted 29 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

MSC Class: 68T50; 68T45; 68T37; 68T05; 68T07; 68T09; ACM Class: I.2.7; I.2.6; I.2.10

arXiv:2406.13055 [pdf, other]

Self-consistent strong screening applied to thermonuclear reactions

Authors: Christopher Grayson, Cheng Tao Yang, Martin Formanek, Johann Rafelski

Abstract: Self-consistent strong plasma screening around light nuclei is implemented in the Big Bang nucleosynthesis (BBN) epoch to determine the short-range screening potential, $eφ(r)/T \geq 1$, relevant for thermonuclear reactions. We numerically solve the non-linear Poisson-Boltzmann equation incorporating Fermi-Dirac statistics adopting a generalized screening mass to find the electric potential in the cosmic BBN electron-positron plasma for finite-sized $^4$He nuclei as an example. Although the plasma follows Boltzmann statistics at large distances, Fermi-Dirac statistics is necessary when work performed by ions on electrons is comparable to their rest mass energy. While strong screening effects are generally minor due to the high BBN temperatures, they can enhance the fusion rates of high-$Z>2$ elements while leaving fusion rates of lower-$Z\le 2$ elements relatively unaffected. Our results also reveal a pronounced spatial dependence of the strong screening potential near the nuclear surface. These findings about the electron-positron plasma's role refine BBN theory predictions and offer broader applications for studying weakly coupled plasmas in diverse cosmic and laboratory settings.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 16 pages, 5 figures, typeset using LATEX default style in AASTeX631

arXiv:2406.11817 [pdf, other]

Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level

Authors: Jie Liu, Zhanhui Zhou, Jiaheng Liu, Xingyuan Bu, Chao Yang, Han-Sen Zhong, Wanli Ouyang

Abstract: Direct Preference Optimization (DPO), a standard method for aligning language models with human preferences, is traditionally applied to offline preferences. Recent studies show that DPO benefits from iterative training with online preferences labeled by a trained reward model. In this work, we identify a pitfall of vanilla iterative DPO - improved response quality can lead to increased verbosity. To address this, we introduce iterative length-regularized DPO (iLR-DPO) to penalize response length. Our empirical results show that iLR-DPO can enhance a 7B model to perform on par with GPT-4 without increasing verbosity. Specifically, our 7B model achieves a $50.5\%$ length-controlled win rate against $\texttt{GPT-4 Preview}$ on AlpacaEval 2.0, and excels across standard benchmarks including MT-Bench, Arena-Hard and OpenLLM Leaderboard. These results demonstrate the effectiveness of iterative DPO in aligning language models with human feedback.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11683 [pdf, other]

HoLLMwood: Unleashing the Creativity of Large Language Models in Screenwriting via Role Playing

Authors: Jing Chen, Xinyu Zhu, Cheng Yang, Chufan Shi, Yadong Xi, Yuxiang Zhang, Junjie Wang, Jiashu Pu, Rongsheng Zhang, Yujiu Yang, Tian Feng

Abstract: Generative AI has demonstrated unprecedented creativity in the field of computer vision, yet such phenomena have not been observed in natural language processing. In particular, large language models (LLMs) can hardly produce written works at the level of human experts due to the extremely high complexity of literature writing. In this paper, we present HoLLMwood, an automated framework for unleashing the creativity of LLMs and exploring their potential in screenwriting, which is a highly demanding task. Mimicking the human creative process, we assign LLMs to different roles involved in the real-world scenario. In addition to the common practice of treating LLMs as ${Writer}$, we also apply LLMs as ${Editor}$, who is responsible for providing feedback and revision advice to ${Writer}$. Besides, to enrich the characters and deepen the plots, we introduce a role-playing mechanism and adopt LLMs as ${Actors}$ that can communicate and interact with each other. Evaluations on automatically generated screenplays show that HoLLMwood substantially outperforms strong baselines in terms of coherence, relevance, interestingness and overall quality.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11160 [pdf, other]

Context Graph

Authors: Chengjin Xu, Muzhi Li, Cehao Yang, Xuhui Jiang, Lumingyuan Tang, Yiyan Qi, Jian Guo

Abstract: Knowledge Graphs (KGs) are foundational structures in many AI applications, representing entities and their interrelations through triples. However, triple-based KGs lack the contextual information of relational knowledge, like temporal dynamics and provenance details, which are crucial for comprehensive knowledge representation and effective reasoning. In contrast, Context Graphs (CGs) expand upon the conventional structure by incorporating additional information such as time validity, geographic location, and source provenance. This integration provides a more nuanced and accurate understanding of knowledge, enabling KGs to offer richer insights and support more sophisticated reasoning processes. In this work, we first discuss the inherent limitations of triple-based KGs and introduce the concept of CGs, highlighting their advantages in knowledge representation and reasoning. We then present a context graph reasoning paradigm, CGR$^3$, that leverages large language models (LLMs) to retrieve candidate entities and related contexts, rank them based on the retrieved information, and reason whether sufficient information has been obtained to answer a query. Our experimental results demonstrate that CGR$^3$ significantly improves performance on KG completion (KGC) and KG question answering (KGQA) tasks, validating the effectiveness of incorporating contextual information on KG representation and reasoning.

Submitted 27 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.11085

Multiple Sources are Better Than One: Incorporating External Knowledge in Low-Resource Glossing

Authors: Changbing Yang, Garrett Nicolai, Miikka Silfverberg

Abstract: In this paper, we address the data scarcity problem in automatic data-driven glossing for low-resource languages by coordinating multiple sources of linguistic expertise. We supplement models with translations at both the token and sentence level as well as leverage the extensive linguistic capability of modern LLMs. Our enhancements lead to an average absolute improvement of 5%-points in word-level accuracy over the previous state of the art on a typologically diverse dataset spanning six low-resource languages. The improvements are particularly noticeable for the lowest-resourced language Gitksan, where we achieve a 10%-point improvement. Furthermore, in a simulated ultra-low resource setting for the same six languages, training on fewer than 100 glossed sentences, we establish an average 10%-point improvement in word-level accuracy over the previous state-of-the-art system.

Submitted 16 June, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2403.08189

arXiv:2406.10869

Geometric Distortion Guided Transformer for Omnidirectional Image Super-Resolution

Authors: Cuixin Yang, Rongkang Dong, Jun Xiao, Cong Zhang, Kin-Man Lam, Fei Zhou, Guoping Qiu

Abstract: As virtual and augmented reality applications gain popularity, omnidirectional image (ODI) super-resolution has become increasingly important. Unlike 2D plain images that are formed on a plane, ODIs are projected onto spherical surfaces. Applying established image super-resolution methods to ODIs, therefore, requires performing equirectangular projection (ERP) to map the ODIs onto a plane. ODI super-resolution needs to take into account geometric distortion resulting from ERP. However, without considering such geometric distortion of ERP images, previous deep-learning-based methods only utilize a limited range of pixels and may easily miss self-similar textures for reconstruction. In this paper, we introduce a novel Geometric Distortion Guided Transformer for Omnidirectional image Super-Resolution (GDGT-OSR). Specifically, a distortion modulated rectangle-window self-attention mechanism, integrated with deformable self-attention, is proposed to better perceive the distortion and thus involve more self-similar textures. Distortion modulation is achieved through a newly devised distortion guidance generator that produces guidance by exploiting the variability of distortion across latitudes. Furthermore, we propose a dynamic feature aggregation scheme to adaptively fuse the features from different self-attention modules. We present extensive experimental results on public datasets and show that the new GDGT-OSR outperforms methods in existing literature.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 13 pages, 12 figures, journal

Showing 1–50 of 3,144 results for author: Yang, C