Electronic Correlation and Pseudogap-like Behavior of High-Temperature Superconductor La3Ni2O7

Authors: Yidian Li, Xian Du, Yantao Cao, Cuiying Pei, Mingxin Zhang, Wenxuan Zhao, Kaiyi Zhai, Runzhe Xu, Zhongkai Liu, Zhiwei Li, Jinkui Zhao, Gang Li, Yanpeng Qi, Hanjie Guo, Yulin Chen, Lexian Yang

Abstract: High-temperature superconductivity (HTSC) remains one of the most challenging and fascinating mysteries in condensed matter physics. Recently, superconductivity with a transition temperature exceeding liquid-nitrogen temperature was discovered in La3Ni2O7 at high pressure, which provides a new platform to explore the unconventional HTSC. In this work, using high-resolution angle-resolved photoemission spectroscopy and ab-initio calculation, we systematically investigate the electronic structures of La3Ni2O7 at ambient pressure. Our experiments are in nice agreement with ab-initio calculations after considering an orbital-dependent band renormalization effect. The strong electron correlation effect pushes a flat band of $d_{z^2}$ orbital component below the Fermi level ($E_F$), which is predicted to lie right at $E_F$ under high pressure. Moreover, the $d_{x^2-y^2}$ band shows a pseudogap-like behavior with suppressed spectral weight and diminished quasiparticle peak near $E_F$. Our findings provide important insights into the electronic structure of La3Ni2O7, which will shed light on the understanding of the unconventional superconductivity in nickelates.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.06881 [pdf, other]

Efficient Stochastic Routing in Path-Centric Uncertain Road Networks -- Extended Version

Authors: Chenjuan Guo, Ronghui Xu, Bin Yang, Ye Yuan, Tung Kieu, Yan Zhao, Christian S. Jensen

Abstract: The availability of massive vehicle trajectory data enables the modeling of road-network constrained movement as travel-cost distributions rather than just single-valued costs, thereby capturing the inherent uncertainty of movement and enabling improved routing quality. Thus, stochastic routing has been studied extensively in the edge-centric model, where such costs are assigned to the edges in a graph representation of a road network. However, as this model still disregards important information in trajectories and fails to capture dependencies among cost distributions, a path-centric model, where costs are assigned to paths, has been proposed that captures dependencies better and provides an improved foundation for routing. Unfortunately, when applied in this model, existing routing algorithms are inefficient due to two shortcomings that we eliminate. First, when exploring candidate paths, existing algorithms only consider the costs of candidate paths from the source to intermediate vertices, while disregarding the costs of travel from the intermediate vertices to the destination, causing many non-competitive paths to be explored. We propose two heuristics for estimating the cost from an intermediate vertex to the destination, thus improving routing efficiency. Second, the edge-centric model relies on stochastic dominance-based pruning to improve efficiency. This pruning assumes that costs are independent and is therefore inapplicable in the path-centric model that takes dependencies into account. We introduce a notion of virtual path that effectively enables stochastic dominance-based pruning in the path-based model, thus further improving efficiency. Empirical studies using two real-world trajectory sets offer insight into the properties of the proposed solution, indicating that it enables efficient stochastic routing in the path-centric model.

Submitted 9 July, 2024; originally announced July 2024.
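
Note: the abstract above refers to stochastic dominance-based pruning of candidate paths. As a rough illustration only, the sketch below checks classic first-order stochastic dominance between discrete travel-cost distributions and drops dominated candidates. It is not the paper's virtual-path method and ignores the dependency issues the abstract raises. The function names (`cdf_at`, `dominates`, `prune`), the dictionary representation, and the toy numbers are all assumptions made for this example.

```python
def cdf_at(dist, t):
    """P(cost <= t) for a discrete distribution given as {cost: probability}."""
    return sum(p for c, p in dist.items() if c <= t)


def dominates(a, b, eps=1e-12):
    """True if `a` first-order stochastically dominates `b` (lower cost is better).

    `a` dominates `b` when its CDF is at least as large at every cost value
    and strictly larger at one or more of them.
    """
    points = sorted(set(a) | set(b))
    at_least = all(cdf_at(a, t) >= cdf_at(b, t) - eps for t in points)
    strictly = any(cdf_at(a, t) > cdf_at(b, t) + eps for t in points)
    return at_least and strictly


def prune(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return {
        name: dist
        for name, dist in candidates.items()
        if not any(
            dominates(other, dist)
            for other_name, other in candidates.items()
            if other_name != name
        )
    }


# Hypothetical travel-time distributions (minutes -> probability).
paths = {
    "p1": {10: 0.5, 20: 0.5},
    "p2": {15: 0.5, 25: 0.5},  # dominated by p1
    "p3": {5: 0.2, 30: 0.8},   # incomparable with p1 and p2
}
survivors = prune(paths)
```

In this toy case `p2` is removed because `p1` is at least as likely to arrive by every deadline, while `p3` survives because neither distribution is uniformly better than the other.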

arXiv:2407.06410 [pdf, other]

Rotational Properties of Inverted Hybrid Stars

Authors: Rodrigo Negreiros, Chen Zhang, Renxin Xu

Abstract: We study the rotational properties of inverted hybrid stars (also termed cross stars), which have been recently proposed as a possible new class of compact stars characterized by an outer layer of quark matter and a core of hadrons, in an inverted structure compared to traditional hybrid stars. We analyze distinct models representing varying depths of quark-hadron phase transitions. Our findings reveal that, while cross stars rotating at their Kepler frequencies typically exhibit a significantly higher mass and larger circumferential radius as anticipated, interestingly, there is a significant increase in potential twin configurations in the case of rapid rotations. We further study sequences of constant baryonic mass, representing potential paths of rotational evolution. Our results indicate that not all stars in these sequences are viable due to the onset of phase transitions during spin-down, leading to possible mini-collapses. We also investigate the phenomenon of "back-bending" during spin-down sequences, which is manifested in a rather different shape for cross stars due to their inverted structure and the large density discontinuity caused by the strong phase transition, in contrast to traditional hybrid stars. Our research enriches existing studies by introducing the significant aspect of rotation, unveiling intr…

Submitted 8 July, 2024; originally announced July 2024.

Comments: 8 pages, 9 figures

arXiv:2407.05305 [pdf, other]

MINDECHO: Role-Playing Language Agents for Key Opinion Leaders

Authors: Rui Xu, Dakuan Lu, Xiaoyu Tan, Xintao Wang, Siyu Yuan, Jiangjie Chen, Wei Chu, Xu Yinghui

Abstract: Large language models (LLMs) have demonstrated impressive performance in various applications, among which role-playing language agents (RPLAs) have engaged a broad user base. Now, there is a growing demand for RPLAs that represent Key Opinion Leaders (KOLs), i.e., Internet celebrities who shape the trends and opinions in their domains. However, research in this line remains underexplored. In this paper, we hence introduce MINDECHO, a comprehensive framework for the development and evaluation of KOL RPLAs. MINDECHO collects KOL data from Internet video transcripts in various professional fields, and synthesizes their conversations leveraging GPT-4. Then, the conversations and the transcripts are used for individualized model training and inference-time retrieval, respectively. Our evaluation covers both general dimensions (i.e., knowledge and tones) and fan-centric dimensions for KOLs. Extensive experiments validate the effectiveness of MINDECHO in developing and evaluating KOL RPLAs.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.04923 [pdf, other]

OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding

Authors: Tiancheng Zhao, Qianqian Zhang, Kyusong Lee, Peng Liu, Lu Zhang, Chunxin Fang, Jiajia Liao, Kelei Jiang, Yibo Ma, Ruochen Xu

Abstract: We introduce OmChat, a model designed to excel in handling long contexts and video understanding tasks. OmChat's new architecture standardizes how different visual inputs are processed, making it more efficient and adaptable. It uses a dynamic vision encoding process to effectively handle images of various resolutions, capturing fine details across a range of image qualities. OmChat utilizes an active progressive multimodal pretraining strategy, which gradually increases the model's capacity for long contexts and enhances its overall abilities. By selecting high-quality data during training, OmChat learns from the most relevant and informative data points. With support for a context length of up to 512K, OmChat demonstrates promising performance in tasks involving multiple images and videos, outperforming most open-source models in these benchmarks. Additionally, OmChat proposes a prompting strategy for unifying complex multimodal inputs including single image text, multi-image text and videos, and achieving competitive performance on single-image benchmarks. To further evaluate the model's capabilities, we proposed a benchmark dataset named Temporal Visual Needle in a Haystack. This dataset assesses OmChat's ability to comprehend temporal visual details within long videos. Our analysis highlights several key factors contributing to OmChat's success: support for any-aspect high image resolution, the active progressive pretraining strategy, and high-quality supervised fine-tuning datasets. This report provides a detailed overview of OmChat's capabilities and the strategies that enhance its performance in visual understanding.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 14 pages

arXiv:2407.04514 [pdf, other]

Giant Second Harmonic Generation from Wafer-Scale Aligned Chiral Carbon Nanotubes

Authors: Rui Xu, Jacques Doumani, Viktor Labuntsov, Nina Hong, Anna-Christina Samaha, Weiran Tu, Fuyang Tay, Elizabeth Blackert, Jiaming Luo, Mario El Tahchi, Weilu Gao, Jun Lou, Yohei Yomogida, Kazuhiro Yanagi, Riichiro Saito, Vasili Perebeinos, Andrey Baydin, Junichiro Kono, Hanyu Zhu

Abstract: Chiral carbon nanotubes (CNTs) are direct-gap semiconductors with optical properties governed by one-dimensional excitons with enormous oscillator strengths. Each species of chiral CNTs has an enantiomeric pair of left- and right-handed CNTs with nearly identical properties, but enantiomer-dependent phenomena can emerge, especially in nonlinear optical processes. Theoretical studies have predicted strong second-order nonlinearities for chiral CNTs, but there has been no experimental verification due to the lack of macroscopically ordered assemblies of single-enantiomer chiral CNTs. Here for the first time, we report the synthesis of centimeter-scale films of densely packed and aligned single-enantiomer chiral CNTs that exhibit micro-fabrication compatibility. We observe giant second harmonic generation (SHG) emission from the chiral CNT film, which originates from the intrinsic chirality and inversion symmetry breaking of the atomic structure of chiral CNTs. The observed value of the dominant element of the second-order nonlinear optical susceptibility tensor reaches $1.5\times 10^{3}$ pm/V at a pump wavelength of 1030 nm, corresponding to the lowest-energy excitonic resonance. Our calculations based on many-body theory correctly estimate the spectrum and magnitude of such excitonically enhanced optical nonlinearity. These results are promising for developing scalable chiral-CNT electronics, nonlinear photonics and photonic quantum computing.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.04195 [pdf, ps, other]

Hadronuclear interactions in AGN jets as the origin of the diffuse high-energy neutrino background

Authors: Rui Xue, Ze-Rui Wang, Jagdish C. Joshi, Wei-Jian Li

Abstract: The origin of diffuse high-energy neutrinos from TeV to PeV energies detected by IceCube Observatory remains a mystery. In our previous work, we have shown that hadronuclear (p-p) interactions in AGN jets could be important and generate detectable very-high-energy emissions. Here, we further explore these interactions in the AGN jets based on their luminosity function. The diffuse neutrino flux and corresponding $γ$-ray flux have been calculated and compared with observational data. In our modeling, two beaming patterns are considered separately. To make sure that the corresponding $γ$-ray flux does not overshoot the diffuse $γ$-ray background, we find that if the neutrino production region in jet is opaque to $γ$ rays, p-p interactions in AGN jets with a small viewing angle (the blazar case) are able to interpret the PeV neutrino background. Similarly, AGN jets with a large viewing angle (the radio galaxy case) may interpret the TeV neutrino background. While, if the neutrino production region is transparent to $γ$ rays, only blazars have the potential to interpret the DNB around PeV band. Some caveats are also discussed.

Submitted 4 July, 2024; originally announced July 2024.

Comments: 12 pages, 6 figures, accepted for publication in ApJ

arXiv:2407.03857 [pdf, other]

PFGS: High Fidelity Point Cloud Rendering via Feature Splatting

Authors: Jiaxu Wang, Ziyi Zhang, Junhao He, Renjing Xu

Abstract: Rendering high-fidelity images from sparse point clouds is still challenging. Existing learning-based approaches suffer from either hole artifacts, missing details, or expensive computations. In this paper, we propose a novel framework to render high-quality images from sparse points. This method first attempts to bridge the 3D Gaussian Splatting and point cloud rendering, which includes several cascaded modules. We first use a regressor to estimate Gaussian properties in a point-wise manner, the estimated properties are used to rasterize neural feature descriptors into 2D planes which are extracted from a multiscale extractor. The projected feature volume is gradually decoded toward the final prediction via a multiscale and progressive decoder. The whole pipeline experiences a two-stage training and is driven by our well-designed progressive and multiscale reconstruction loss. Experiments on different benchmarks show the superiority of our method in terms of rendering qualities and the necessities of our main components.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.01906 [pdf, other]

Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models

Authors: Zihan Wang, Deli Chen, Damai Dai, Runxin Xu, Zhuoshu Li, Y. Wu

Abstract: Parameter-efficient fine-tuning (PEFT) is crucial for customizing Large Language Models (LLMs) with constrained resources. Although there have been various PEFT methods for dense-architecture LLMs, PEFT for sparse-architecture LLMs is still underexplored. In this work, we study the PEFT method for LLMs with the Mixture-of-Experts (MoE) architecture and the contents of this work are mainly threefold: (1) We investigate the dispersion degree of the activated experts in customized tasks, and found that the routing distribution for a specific task tends to be highly concentrated, while the distribution of activated experts varies significantly across different tasks. (2) We propose Expert-Specialized Fine-Tuning, or ESFT, which tunes the experts most relevant to downstream tasks while freezing the other experts and modules; experimental results demonstrate that our method not only improves the tuning efficiency, but also matches or even surpasses the performance of full-parameter fine-tuning. (3) We further analyze the impact of the MoE architecture on expert-specialized fine-tuning. We find that MoE models with finer-grained experts are more advantageous in selecting the combination of experts that are most relevant to downstream tasks, thereby enhancing both the training efficiency and effectiveness. Our code is available at https://github.com/deepseek-ai/ESFT.

Submitted 4 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.19939 [pdf, other]

Data-driven methods for flow and transport in porous media: a review

Authors: Guang Yang, Ran Xu, Yusong Tian, Songyuan Guo, Jingyi Wu, Xu Chu

Abstract: This review examined the current advancements in data-driven methods for analyzing flow and transport in porous media, which has various applications in energy, chemical engineering, environmental science, and beyond. Although there has been progress in recent years, the challenges of current experimental and high-fidelity numerical simulations, such as high computational costs and difficulties in accurately representing complex, heterogeneous structures, can still potentially be addressed by state-of-the-art data-driven methods. We analyzed the synergistic potential of these methods, addressed their limitations, and suggested how they can be effectively integrated to improve both the fidelity and efficiency of current research. A discussion on future research directions in this field was conducted, emphasizing the need for collaborative efforts that combine domain expertise in physics and advanced computational and data-driven methodologies.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.18921 [pdf, other]

Capturing Minds, Not Just Words: Enhancing Role-Playing Language Models with Personality-Indicative Data

Authors: Yiting Ran, Xintao Wang, Rui Xu, Xinfeng Yuan, Jiaqing Liang, Yanghua Xiao, Deqing Yang

Abstract: Role-playing agents (RPA) have been a popular application area for large language models (LLMs), attracting significant interest from both industry and academia. While existing RPAs well portray the characters' knowledge and tones, they face challenges in capturing their minds, especially for small role-playing language models (RPLMs). In this paper, we propose to enhance RPLMs via personality-indicative data. Specifically, we leverage questions from psychological scales and distill advanced RPAs to generate dialogues that grasp the minds of characters. Experimental results validate that RPLMs trained with our dataset exhibit advanced role-playing capabilities for both general and personality-related evaluations. Code and data are available at https://github.com/alienet1109/RolePersonality.

Submitted 29 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

Comments: 10 pages

arXiv:2406.18078 [pdf, other]

Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction

Authors: Yice Zhang, Jie Zeng, Weiming Hu, Ziyi Wang, Shiwei Chen, Ruifeng Xu

Abstract: Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review, which is the most representative and challenging task in aspect-based sentiment analysis. A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods. To tackle this issue, we propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels, aiming to filter out mismatches and thereby enhance the effectiveness of self-training. We highlight two critical aspects to ensure the scorer's effectiveness and reliability: the quality of the training dataset and its model architecture. To this end, we create a human-annotated comparison dataset and train a generative model on it using ranking-based objectives. Extensive experiments on public ASQP datasets reveal that using our scorer can greatly and consistently improve the effectiveness of self-training. Moreover, we explore the possibility of replacing humans with large language models for comparison dataset annotation, and experiments demonstrate its feasibility. We release our code and data at https://github.com/HITSZ-HLT/ST-w-Scorer-ABSA.

Submitted 26 June, 2024; originally announced June 2024.

Comments: Accepted to ACL 2024 Main Conference
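
Note: the abstract above describes a scorer that rates how well a pseudo-label matches its review so that mismatches can be filtered out before self-training. The sketch below shows only that filtering step in its simplest form. The scorer here is a toy string-matching stand-in, not the paper's trained generative model, and the function names, the threshold of 0.5, and the example reviews are all assumptions made for illustration.

```python
def toy_scorer(review, quad):
    """Hypothetical stand-in scorer: fraction of the aspect and opinion terms
    of a quad (aspect, category, opinion, polarity) that occur in the review."""
    aspect, _category, opinion, _polarity = quad
    text = review.lower()
    hits = (aspect.lower() in text) + (opinion.lower() in text)
    return hits / 2


def filter_pseudo_labels(samples, scorer, threshold):
    """Keep (review, quad) pairs whose score reaches the threshold."""
    return [(review, quad) for review, quad in samples
            if scorer(review, quad) >= threshold]


# Hypothetical pseudo-labelled reviews.
pseudo_labelled = [
    ("The pasta was delicious", ("pasta", "food quality", "delicious", "positive")),
    ("Service was slow", ("pizza", "food quality", "tasty", "positive")),  # mismatch
]
kept = filter_pseudo_labels(pseudo_labelled, toy_scorer, threshold=0.5)
```

Only the first pair survives here; in a self-training loop the retained pairs would then be added to the labeled training data.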

arXiv:2406.17248 [pdf, other]

MindSpore Quantum: A User-Friendly, High-Performance, and AI-Compatible Quantum Computing Framework

Authors: Xusheng Xu, Jiangyu Cui, Zidong Cui, Runhong He, Qingyu Li, Xiaowei Li, Yanling Lin, Jiale Liu, Wuxin Liu, Jiale Lu, Maolin Luo, Chufan Lyu, Shijie Pan, Mosharev Pavel, Runqiu Shu, Jialiang Tang, Ruoqian Xu, Shu Xu, Kang Yang, Fan Yu, Qingguo Zeng, Haiying Zhao, Qiang Zheng, Junyuan Zhou, Xu Zhou, et al. (14 additional authors not shown)

Abstract: We introduce MindSpore Quantum, a pioneering hybrid quantum-classical framework with a primary focus on the design and implementation of noisy intermediate-scale quantum (NISQ) algorithms. Leveraging the robust support of MindSpore, an advanced open-source deep learning training/inference framework, MindSpore Quantum exhibits exceptional efficiency in the design and training of variational quantum algorithms on both CPU and GPU platforms, delivering remarkable performance. Furthermore, this framework places a strong emphasis on enhancing the operational efficiency of quantum algorithms when executed on real quantum hardware. This encompasses the development of algorithms for quantum circuit compilation and qubit mapping, crucial components for achieving optimal performance on quantum processors. In addition to the core framework, we introduce QuPack, a meticulously crafted quantum computing acceleration engine. QuPack significantly accelerates the simulation speed of MindSpore Quantum, particularly in variational quantum eigensolver (VQE), quantum approximate optimization algorithm (QAOA), and tensor network simulations, providing astonishing speed. This combination of cutting-edge technologies empowers researchers and practitioners to explore the frontiers of quantum computing with unprecedented efficiency and performance.

Submitted 10 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.17202 [pdf, other]

Constraining the Physical Parameters of Blazars Using the Seed Factor Approach

Authors: Chang-Bin Deng, Yong-You Shi, Yu-Jie Song, Rui Xue, Lei-Ming Du, Ze-Rui Wang, Zhao-Hua Xie

Abstract: The discovery that blazars dominate the extra-galactic γ-ray sky is a triumph in the Fermi era. However, the exact location of γ-ray emission region still remains in debate. Low-synchrotron-peaked blazars (LSPs) are estimated to produce high-energy radiation through the external Compton process, thus their emission regions are closely related to the external photon fields. We employed the seed factor approach proposed by Georganopoulos et al. It directly matches the observed seed factor of each LSP with the characteristic seed factors of external photon fields to locate the γ-ray emission region. A sample of 1138 LSPs with peak frequencies and peak luminosities was adopted to plot a histogram distribution of observed seed factors. We also collected some spectral energy distributions (SEDs) of historical flare states to investigate the variation of γ-ray emission region. Those SEDs were fitted by both quadratic and cubic functions using the Markov-chain Monte Carlo method. Furthermore, we derived some physical parameters of blazars and compared them with the constraint of internal γγ-absorption. We find that dusty torus dominates the soft photon fields of LSPs and most γ-ray emission regions of LSPs are located at 1-10 pc. The soft photon fields could also transition from dusty torus to broad line region and cosmic microwave background in different flare states. Our results suggest that the cubic function is better than the quadratic function to fit the SEDs.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 21 pages, 12 figures, Accepted for publication in PASA

arXiv:2406.16139 [pdf, ps, other]

Brownian friction dynamics: fluctuations in sliding distance

Authors: Ruibin Xu, Feng Zhou, B. N. J. Persson

Abstract: We have studied the fluctuation (noise) in the position of sliding blocks under constant driving forces on different substrate surfaces. The experimental data are complemented by simulations using a simple spring-block model where the asperity contact regions are modeled by miniblocks connected to the big block by viscoelastic springs. The miniblocks experience forces that fluctuate randomly with the lateral position, simulating the interaction between asperities on the block and the substrate. The theoretical model provides displacement power spectra that agree well with the experimental results.

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.15000 [pdf, other]

Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations

Authors: Lichao Zhang, Jia Yu, Shuai Zhang, Long Li, Yangyang Zhong, Guanbao Liang, Yuming Yan, Qing Ma, Fangsheng Weng, Fayu Pan, Jing Li, Renjun Xu, Zhenzhong Lan

Abstract: Large Language Models (LLMs) have significantly advanced user-bot interactions, enabling more complex and coherent dialogues. However, the prevalent text-only modality might not fully exploit the potential for effective user engagement. This paper explores the impact of multi-modal interactions, which incorporate images and audio alongside text, on user engagement in chatbot conversations. We conduct a comprehensive analysis using a diverse set of chatbots and real-user interaction data, employing metrics such as retention rate and conversation length to evaluate user engagement. Our findings reveal a significant enhancement in user engagement with multi-modal interactions compared to text-only dialogues. Notably, the incorporation of a third modality significantly amplifies engagement beyond the benefits observed with just two modalities. These results suggest that multi-modal interactions optimize cognitive processing and facilitate richer information comprehension. This study underscores the importance of multi-modality in chatbot design, offering valuable insights for creating more engaging and immersive AI communication experiences and informing the broader AI community about the benefits of multi-modal interactions in enhancing user engagement.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.14024 [pdf, other]

LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback

Authors: Bofei Gao, Zefan Cai, Runxin Xu, Peiyi Wang, Ce Zheng, Runji Lin, Keming Lu, Dayiheng Liu, Chang Zhou, Wen Xiao, Junjie Hu, Tianyu Liu, Baobao Chang

Abstract: Mathematical verifiers achieve success in mathematical reasoning tasks by validating the correctness of solutions. However, existing verifiers are trained with binary classification labels, which are not informative enough for the model to accurately assess the solutions. To mitigate this insufficiency of binary labels, we introduce step-wise natural language feedback as rationale labels (i.e., the correctness of the current step and the explanations). In this paper, we propose Math-Minos, a natural language feedback enhanced verifier, by constructing automatically generated training data and a two-stage training paradigm for effective training and efficient inference. Our experiments reveal that a small amount (30k) of natural language feedback can significantly boost the accuracy of the verifier by 1.6% (86.6% → 88.2%) on GSM8K and 0.8% (37.8% → 38.6%) on MATH. We have released our code and data for further exploration.

Submitted 8 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

Comments: 9 pages

arXiv:2406.13975 [pdf, other]

MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models

Authors: Zhongshen Zeng, Yinhong Liu, Yingjia Wan, Jingyao Li, Pengguang Chen, Jianbo Dai, Yuxuan Yao, Rongwu Xu, Zehan Qi, Wanru Zhao, Linling Shen, Jianqiao Lu, Haochen Tan, Yukang Chen, Hao Zhang, Zhan Shi, Bailin Wang, Zhijiang Guo, Jiaya Jia

Abstract: Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on step-by-step chain-of-thought reasoning processes. However, it has become increasingly challenging to evaluate the reasoning capability of LLMs. Concretely, existing outcome-based benchmarks are beginning to saturate and are becoming insufficient for monitoring progress. To this end, we present a process-based benchmark, MR-BEN, that demands a meta-reasoning skill, where LLMs are asked to locate and analyse potential errors in automatically generated reasoning steps. MR-BEN is a comprehensive benchmark comprising 5,975 questions collected from human experts, covering various subjects such as physics, chemistry, logic, coding, and more. Through our designed metrics for assessing meta-reasoning on this benchmark, we identify interesting limitations and weaknesses of current LLMs (open-source and closed-source models). For example, open-source models are seemingly comparable to GPT-4 on outcome-based benchmarks, but they lag far behind on our benchmark, revealing the underlying reasoning capability gap between them. Our dataset and code are available at https://randolph-zeng.github.io/Mr-Ben.github.io/.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13843 [pdf, other]

Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data

Authors: Nahema Marchal, Rachel Xu, Rasmi Elasmar, Iason Gabriel, Beth Goldberg, William Isaac

Abstract: Generative, multimodal artificial intelligence (GenAI) offers transformative potential across industries, but its misuse poses significant risks. Prior research has shed light on the potential of advanced AI systems to be exploited for malicious purposes. However, we still lack a concrete understanding of how GenAI models are specifically exploited or abused in practice, including the tactics employed to inflict harm. In this paper, we present a taxonomy of GenAI misuse tactics, informed by existing academic literature and a qualitative analysis of approximately 200 observed incidents of misuse reported between January 2023 and March 2024. Through this analysis, we illuminate key and novel patterns in misuse during this time period, including potential motivations, strategies, and how attackers leverage and abuse system capabilities across modalities (e.g. image, text, audio, video) in the wild.

Submitted 21 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.12753 [pdf, other]

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

Authors: Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, Yikai Zhang, Yuqing Yang, Ting Wu, Binjie Wang, Shichao Sun, Yang Xiao, Yiyuan Li, Fan Zhou, Steffi Chern, Yiwei Qin, Yan Ma, Jiadi Su, Yixiu Liu, Yuxiang Zheng, Shaoting Zhang , et al. (3 additional authors not shown)

Abstract: The evolution of Artificial Intelligence (AI) has been significantly accelerated by advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), gradually showcasing potential cognitive reasoning abilities in problem-solving and scientific discovery (i.e., AI4Science) once exclusive to human intellect. To comprehensively evaluate current models' performance in cognitive reasoning abilities, we introduce OlympicArena, which includes 11,163 bilingual problems across both text-only and interleaved text-image modalities. These challenges encompass a wide range of disciplines spanning seven fields and 62 international Olympic competitions, rigorously examined for data leakage. We argue that the challenges in Olympic competition problems are ideal for evaluating AI's cognitive reasoning due to their complexity and interdisciplinary nature, which are essential for tackling complex scientific challenges and facilitating discoveries. Beyond evaluating performance across various disciplines using answer-only criteria, we conduct detailed experiments and analyses from multiple perspectives. We delve into the models' cognitive reasoning abilities, their performance across different modalities, and their outcomes in process-level evaluations, which are vital for tasks requiring complex reasoning with lengthy solutions. Our extensive evaluations reveal that even advanced models like GPT-4o only achieve a 39.97% overall accuracy, illustrating current AI limitations in complex reasoning and multimodal integration.
Through the OlympicArena, we aim to advance AI towards superintelligence, equipping it to address more complex challenges in science and beyond. We also provide a comprehensive set of resources to support AI research, including a benchmark dataset, an open-source annotation platform, a detailed evaluation tool, and a leaderboard with automatic submission features.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 44 pages

arXiv:2406.11931 [pdf, other]

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen , et al. (15 additional authors not shown)

Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K. In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11354 [pdf, other]

Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression

Authors: Zilun Zhang, Yutao Sun, Tiancheng Zhao, Leigang Sha, Ruochen Xu, Kyusong Lee, Jianwei Yin

Abstract: Humans can retain old knowledge while learning new information, but Large Language Models (LLMs) often suffer from catastrophic forgetting when post-pretrained or supervised fine-tuned (SFT) on domain-specific data. Moreover, for Multimodal Large Language Models (MLLMs) which are composed of the LLM base and visual projector (e.g. LLaVA), a significant decline in performance on language benchmarks was observed compared to their single-modality counterparts. To address these challenges, we introduce a novel model-agnostic self-decompression method, Tree Generation (TG), that decompresses knowledge within LLMs into the training corpus. This paper focuses on TG-SFT, which can synthetically generate SFT data for the instruction tuning steps. By incorporating the dumped corpus during SFT for MLLMs, we significantly reduce the forgetting problem.

Submitted 19 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11271 [pdf, other]

MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens

Authors: Anas Awadalla, Le Xue, Oscar Lo, Manli Shu, Hannah Lee, Etash Kumar Guha, Matt Jordan, Sheng Shen, Mohamed Awadalla, Silvio Savarese, Caiming Xiong, Ran Xu, Yejin Choi, Ludwig Schmidt

Abstract: Multimodal interleaved datasets featuring free-form interleaved sequences of images and text are crucial for training frontier large multimodal models (LMMs). Despite the rapid progression of open-source LMMs, there remains a pronounced scarcity of large-scale, diverse open-source multimodal interleaved datasets. In response, we introduce MINT-1T, the most extensive and diverse open-source Multimodal INTerleaved dataset to date. MINT-1T comprises one trillion text tokens and three billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. As scaling multimodal interleaved datasets requires substantial engineering effort, sharing the data curation process and releasing the dataset greatly benefits the community. Our experiments show that LMMs trained on MINT-1T rival the performance of models trained on the previous leading dataset, OBELICS. Our data and code will be released at https://github.com/mlfoundations/MINT-1T.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.10290 [pdf, other]

MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases

Authors: Rithesh Murthy, Liangwei Yang, Juntao Tan, Tulika Manoj Awalgaonkar, Yilun Zhou, Shelby Heinecke, Sachin Desai, Jason Wu, Ran Xu, Sarah Tan, Jianguo Zhang, Zhiwei Liu, Shirley Kokane, Zuxin Liu, Ming Zhu, Huan Wang, Caiming Xiong, Silvio Savarese

Abstract: The deployment of Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices has gained significant attention due to the benefits of enhanced privacy, stability, and personalization. However, the hardware constraints of mobile devices necessitate the use of models with fewer parameters and model compression techniques like quantization. Currently, there is limited understanding of quantization's impact on various task performances, including LLM tasks, LMM tasks, and, critically, trust and safety. There is a lack of adequate tools for systematically testing these models on mobile devices. To address these gaps, we introduce MobileAIBench, a comprehensive benchmarking framework for evaluating mobile-optimized LLMs and LMMs. MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices. Our two-part open-source framework includes a library for running evaluations on desktops and an iOS app for on-device latency and hardware utilization measurements. Our thorough analysis aims to accelerate mobile AI research and deployment by providing insights into the performance and feasibility of deploying LLMs and LMMs on mobile platforms.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.10061 [pdf, other]

doi 10.1145/3637528.3671594

TACCO: Task-guided Co-clustering of Clinical Concepts and Patient Visits for Disease Subtyping based on EHR Data

Authors: Ziyang Zhang, Hejie Cui, Ran Xu, Yuzhang Xie, Joyce C. Ho, Carl Yang

Abstract: The growing availability of well-organized Electronic Health Records (EHR) data has enabled the development of various machine learning models towards disease risk prediction. However, existing risk prediction methods overlook the heterogeneity of complex diseases, failing to model the potential disease subtypes regarding their corresponding patient visits and clinical concept subgroups. In this work, we introduce TACCO, a novel framework that jointly discovers clusters of clinical concepts and patient visits based on a hypergraph modeling of EHR data. Specifically, we develop a novel self-supervised co-clustering framework that can be guided by the risk prediction task of specific diseases. Furthermore, we enhance the hypergraph model of EHR data with textual embeddings and enforce the alignment between the clusters of clinical concepts and patient visits through a contrastive objective. Comprehensive experiments conducted on the public MIMIC-III dataset and Emory internal CRADLE dataset over the downstream clinical tasks of phenotype classification and cardiovascular risk prediction demonstrate an average 31.25% performance improvement compared to traditional ML baselines and a 5.26% improvement on top of the vanilla hypergraph model without our co-clustering mechanism. In-depth model analysis, clustering results analysis, and clinical case studies further validate the improved utilities and insightful interpretations delivered by TACCO. Code is available at https://github.com/PericlesHat/TACCO.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 11 pages, 5 figures, to be published in Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

arXiv:2406.09401 [pdf, other]

MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations

Authors: Ruiyuan Lyu, Tai Wang, Jingli Lin, Shuai Yang, Xiaohan Mao, Yilun Chen, Runsen Xu, Haifeng Huang, Chenming Zhu, Dahua Lin, Jiangmiao Pang

Abstract: With the emergence of LLMs and their integration with other data modalities, multi-modal 3D perception is attracting more attention due to its connectivity to the physical world and is making rapid progress. However, limited by existing datasets, previous works mainly focus on understanding object properties or inter-object spatial relationships in a 3D scene. To tackle this problem, this paper builds the largest multi-modal 3D scene dataset and benchmark to date with hierarchical grounded language annotations, MMScan. It is constructed based on a top-down logic, from region to object level, from a single target to inter-target relationships, covering holistic aspects of spatial and attribute understanding. The overall pipeline incorporates powerful VLMs via carefully designed prompts to initialize the annotations efficiently and further involves human correction in the loop to ensure the annotations are natural, correct, and comprehensive. Built upon existing 3D scanning data, the resulting multi-modal 3D dataset encompasses 1.4M meta-annotated captions on 109k objects and 7.7k regions as well as over 3.04M diverse samples for 3D visual grounding and question-answering benchmarks. We evaluate representative baselines on our benchmarks, analyze their capabilities in different aspects, and showcase the key problems to be addressed in the future. Furthermore, we use this high-quality dataset to train state-of-the-art 3D visual grounding models and LLMs and obtain remarkable performance improvement both on existing benchmarks and in-the-wild evaluation.
Code, datasets, and benchmarks will be available at https://github.com/OpenRobotLab/EmbodiedScan.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Follow-up of EmbodiedScan. A multi-modal 3D dataset with the most-ever comprehensive language annotations for 3D-LLMs. Project page: https://tai-wang.github.io/mmscan/

arXiv:2406.08698 [pdf, other]

Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen , et al. (255 additional authors not shown)

Abstract: In this work we search for signals generated by ultra-heavy dark matter in data from the Large High Altitude Air Shower Observatory (LHAASO). We look for possible gamma rays from dark matter annihilation or decay in 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter because they have low fluxes of astrophysical γ-ray background yet contain large amounts of dark matter. By analyzing more than 700 days of observational data at LHAASO, we detect no significant dark matter signal from 1 TeV to 1 EeV. Accordingly, we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. The constraints on the lifetime of dark matter in decay mode are also derived.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 17 pages, 12 figures, accepted by PRL

arXiv:2406.08204 [pdf, other]

Diffusion-Promoted HDR Video Reconstruction

Authors: Yuanshen Guan, Ruikang Xu, Mingde Yao, Ruisheng Gao, Lizhi Wang, Zhiwei Xiong

Abstract: High dynamic range (HDR) video reconstruction aims to generate HDR videos from low dynamic range (LDR) frames captured with alternating exposures. Most existing works rely solely on the regression-based paradigm, leading to adverse effects such as ghosting artifacts and missing details in saturated regions. In this paper, we propose a diffusion-promoted method for HDR video reconstruction, termed HDR-V-Diff, which incorporates a diffusion model to capture the HDR distribution. As such, HDR-V-Diff can reconstruct HDR videos with realistic details while alleviating ghosting artifacts. However, the direct introduction of video diffusion models would impose a massive computational burden. Instead, to alleviate this burden, we first propose an HDR Latent Diffusion Model (HDR-LDM) to learn the distribution prior of single HDR frames. Specifically, HDR-LDM incorporates a tonemapping strategy to compress HDR frames into the latent space and a novel exposure embedding to aggregate the exposure information into the diffusion process. We then propose a Temporal-Consistent Alignment Module (TCAM) to learn the temporal information as a complement for HDR-LDM, which conducts coarse-to-fine feature alignment at different scales among video frames. Finally, we design a Zero-Init Cross-Attention (ZiCA) mechanism to effectively integrate the learned distribution prior and temporal information for generating HDR frames. Extensive experiments validate that HDR-V-Diff achieves state-of-the-art results on several representative datasets.

Submitted 12 June, 2024; originally announced June 2024.

Comments: Arxiv Preprint

arXiv:2406.06253 [pdf, other]

PretVM: Predictable, Efficient Virtual Machine for Real-Time Concurrency

Authors: Shaokai Lin, Erling Jellum, Mirco Theile, Tassilo Tanneberger, Binqi Sun, Chadlia Jerad, Ruomu Xu, Guangyu Feng, Christian Menard, Marten Lohstroh, Jeronimo Castrillon, Sanjit Seshia, Edward Lee

Abstract: This paper introduces the Precision-Timed Virtual Machine (PretVM), an intermediate platform facilitating the execution of quasi-static schedules compiled from a subset of programs written in the Lingua Franca (LF) coordination language. The subset consists of those programs that in principle should have statically verifiable and predictable timing behavior. The PretVM provides a schedule with well-defined worst-case timing bounds. The PretVM provides a clean separation between application logic and coordination logic, yielding more analyzable program executions. Experiments compare the PretVM against the default (more dynamic) LF scheduler and show that it delivers time-accurate deterministic execution.

Submitted 25 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.06104 [pdf]

Correlated electrons of the flat band in charge density wave state of 4Hb-TaSexS2-x

Authors: Yanyan Geng, Jianfeng Guo, Fanyu Meng, Manyu Wang, Shuo Mi, Li Huang, Rui Xu, Fei Pang, Kai Liu, Shancai Wang, Hong-Jun Gao, Weichang Zhou, Wei Ji, Hechang Lei, Zhihai Cheng

Abstract: Many intriguing quantum states of matter, such as unconventional superconductivity, magnetic phases and fractional quantum Hall physics, emerge from the spatially correlated localized electrons in the flat band of solid materials. By using scanning tunneling microscopy and spectroscopy (STM/STS), we report the real-space investigation of correlated electrons in the flat band of superlattice 4Hb-TaSexS2-x. In contrast with the pristine 4Hb-TaS2, the selenium (Se) substitutions significantly affect the interfacial transfer of correlated electrons between the CDW states of 1T- and 1H-TaS2 layers, and give rise to real-space fractional electron-filling configurations with distributed electron-filled and electron-void SoD clusters in the 1T-layer. The site-specific STS spectra directly reveal their respective prominent spectral weight above EF and symmetric Mott-like spectra. In addition, the spatial distributions of these electron-filled SoDs in the 1T-layer of 4Hb-TaSe0.7S1.3 demonstrate different local short-range patterning, clearly indicating the complex neighboring interactions among the localized electrons in the flat band of the 1T-layer. Our results not only provide in-depth insight into correlated electrons in the flat CDW band, but also provide a simple platform to manipulate the electron-correlation-related quantum states.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 18 pages, 4 figures

arXiv:2406.06023 [pdf, other]

The Limits of Interval-Regulated Price Discrimination

Authors: Kamesh Munagala, Yiheng Shen, Renzhe Xu

Abstract: In this paper, we study third-degree price discrimination in a model first presented in Bergemann, Brooks, and Morris [2015]. Since such price discrimination might create market segments with vastly different posted prices, we consider regulating these prices, specifically, via restricting them to lie within an interval. Given a price interval, we consider segmentations of the market where a seller, who is oblivious to the existence of such regulation, still posts prices within the price interval. We show the following surprising result: For any market and price interval where such segmentation is feasible, there is always a different segmentation that optimally transfers all excess surplus to the consumers. In addition, we characterize the entire space of buyer and seller surplus that are achievable by such segmentation, including maximizing seller surplus, and simultaneously minimizing buyer and seller surplus.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.05898 [pdf, other]

Async Learned User Embeddings for Ads Delivery Optimization

Authors: Mingwei Tang, Meng Liu, Hong Li, Junjie Yang, Chenglin Wei, Boyang Li, Dai Li, Rengan Xu, Yifan Xu, Zehua Zhang, Xiangyu Wang, Linfeng Liu, Yuelei Xie, Chengye Liu, Labib Fawaz, Li Li, Hongnan Wang, Bill Zhu, Sri Reddy

Abstract: In recommendation systems, high-quality user embeddings can capture subtle preferences, enable precise similarity calculations, and adapt to changing preferences over time to maintain relevance. The effectiveness of recommendation systems depends on the quality of user embeddings. We propose to asynchronously learn high-fidelity user embeddings for billions of users each day from sequence-based multimodal user activities through a Transformer-like large-scale feature learning module. The async learned user representation embeddings (ALURE) are further converted to user similarity graphs through graph learning and then combined with users' real-time activities to retrieve highly related ad candidates for the ads delivery system. Our method shows significant gains in both offline and online experiments.

Submitted 23 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

Comments: Accepted by workshop on Multimodal Representation and Retrieval at SIGIR 2024, Washington DC

arXiv:2406.05862 [pdf, other]

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

Authors: Ziqiang Liu, Feiteng Fang, Xi Feng, Xinrun Du, Chenhao Zhang, Zekun Wang, Yuelin Bai, Qixuan Zhao, Liyang Fan, Chengguang Gan, Hongquan Lin, Jiaming Li, Yuansheng Ni, Haihong Wu, Yaswanth Narsupalli, Zhigang Zheng, Chengming Li, Xiping Hu, Ruifeng Xu, Xiaojun Chen, Min Yang, Jiaheng Liu, Ruibo Liu, Wenhao Huang, Ge Zhang, et al. (1 additional author not shown)

Abstract: The rapid advancements in the development of multimodal large language models (MLLMs) have consistently led to new breakthroughs on various benchmarks. In response, numerous challenging and comprehensive benchmarks have been proposed to more accurately assess the capabilities of MLLMs. However, there is a dearth of exploration of the higher-order perceptual capabilities of MLLMs. To fill this gap, we propose the Image Implication understanding Benchmark, II-Bench, which aims to evaluate the model's higher-order perception of images. Through extensive experiments on II-Bench across multiple MLLMs, we have made significant findings. Initially, a substantial gap is observed between the performance of MLLMs and humans on II-Bench. The pinnacle accuracy of MLLMs attains 74.8%, whereas human accuracy averages 90%, peaking at an impressive 98%. Subsequently, MLLMs perform worse on abstract and complex images, suggesting limitations in their ability to understand high-level semantics and capture image details. Finally, it is observed that most models exhibit enhanced accuracy when image sentiment polarity hints are incorporated into the prompts. This observation underscores a notable deficiency in their inherent understanding of image sentiment. We believe that II-Bench will inspire the community to develop the next generation of MLLMs, advancing the journey towards expert artificial general intelligence (AGI). II-Bench is publicly available at https://huggingface.co/datasets/m-a-p/II-Bench.

Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

Comments: 100 pages, 82 figures, add citations

arXiv:2406.05682 [pdf, other]

From Basic to Extra Features: Hypergraph Transformer Pretrain-then-Finetuning for Balanced Clinical Predictions on EHR

Authors: Ran Xu, Yiwen Lu, Chang Liu, Yong Chen, Yan Sun, Xiao Hu, Joyce C Ho, Carl Yang

Abstract: Electronic Health Records (EHRs) contain rich patient information and are crucial for clinical research and practice. In recent years, deep learning models have been applied to EHRs, but they often rely on massive features, which may not be readily available for all patients. We propose HTP-Star, which leverages hypergraph structures with a pretrain-then-finetune framework for modeling EHR data, enabling seamless integration of additional features. Additionally, we design two techniques, namely (1) Smoothness-inducing Regularization and (2) Group-balanced Reweighting, to enhance the model's robustness during fine-tuning. Through experiments conducted on two real EHR datasets, we demonstrate that HTP-Star consistently outperforms various baselines while striking a balance between patients with basic and extra features.

Submitted 9 June, 2024; originally announced June 2024.

Comments: CHIL 2024

arXiv:2406.05644 [pdf, other]

How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States

Authors: Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Yongbin Li

Abstract: Large language models (LLMs) rely on safety alignment to avoid responding to malicious user inputs. Unfortunately, jailbreak can circumvent safety guardrails, resulting in LLMs generating harmful content and raising concerns about LLM safety. Because language models with intensive parameters are often regarded as black boxes, the mechanisms of alignment and jailbreak are challenging to elucidate. In this paper, we employ weak classifiers to explain LLM safety through the intermediate hidden states. We first confirm that LLMs learn ethical concepts during pre-training rather than alignment and can identify malicious and normal inputs in the early layers. Alignment actually associates the early concepts with emotion guesses in the middle layers and then refines them to the specific reject tokens for safe generations. Jailbreak disturbs the transformation of early unethical classification into negative emotions. We conduct experiments on models from 7B to 70B across various model families to prove our conclusion. Overall, our paper indicates the intrinsic mechanism of LLM safety and how jailbreaks circumvent safety guardrails, offering a new perspective on LLM safety and reducing concerns. Our code is available at https://github.com/ydyjya/LLM-IHS-Explanation.

Submitted 13 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

Comments: 27 pages

arXiv:2406.03798 [pdf]

Optical biomarker of metabolism for breast tumor diagnosis: Insights from subcellular dynamics

Authors: Zichen Yin, Shuwei Zhang, Bin He, Houpu Yang, Zhengyu Chen, Zhangwei Hu, Yejiong Shi, Ruizhi Xue, Panqi Yang, Yuzhe Ying, Chengming Wang, Shu Wang, Ping Xue

Abstract: Label-free metabolic dynamics contrast is highly appealing but difficult to achieve in biomedical imaging. Interference offers a highly sensitive mechanism for capturing the metabolic dynamics of the subcellular scatterers. However, traditional interference detection methods fail to isolate pure metabolic dynamics, as the dynamic signals are coupled with scatterer reflectivity and other uncontrollable imaging factors. Here, we demonstrate active phase modulation-assisted dynamic full-field optical coherence tomography (APMD-FFOCT) that decouples and quantifies the metabolic dynamics by adding a reference movement for all interferential scatterers. This novel technique enables imaging and dynamic analysis of subcellular structures along with their changes during the apoptotic process in tumor tissues. Furthermore, the nucleus-to-cytoplasm dynamic intensity ratio could serve as an optical biomarker for breast tumor grading, enhancing intraoperative diagnosis.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03498 [pdf, other]

GWnext 2024: Meeting Summary

Authors: Alejandro Torres-Orjuela, Veronica Vazquez-Aceves, Rui Xu, Jin-Hong Chen, Andrea Derdzinski, Matthias U. Kruckow, Stefano Rinaldi, Lorenzo Speri, Ziming Wang, Garvin Yim, Xue-Ting Zhang, Qian Hu, Miaoxin Liu, Xiangyu Lyu, Zheng Wu, Cong Zhou, Manuel Arca Sedda, Yan-Chen Bi, Hong-Yu Chen, Xian Chen, Jiageng Jiao, Yu-Mei Wu

Abstract: GWnext 2024 was a meeting held in the Kavli Institute for Astronomy and Astrophysics at Peking University in March $4^\text{th} - 8^\text{th}$, 2024. In the meeting researchers at different career stages -- with a particular focus on early career scientists -- working on the different aspects of gravitational wave (GW) astronomy gathered to discuss the current status as well as prospects of the field. The meeting was divided into three core sessions: Astrophysics, GW Theory, and Detection. Each session consisted of introductory talks and extended discussion sessions. Moreover, there was a poster session where students could present their results. In this paper, we summarize the results presented during the meeting and present the most important outcomes.

Submitted 27 May, 2024; originally announced June 2024.

arXiv:2406.02911 [pdf, other]

Improving In-Context Learning with Prediction Feedback for Sentiment Analysis

Authors: Hongling Xu, Qianlong Wang, Yice Zhang, Min Yang, Xi Zeng, Bing Qin, Ruifeng Xu

Abstract: Large language models (LLMs) have achieved promising results in sentiment analysis through the in-context learning (ICL) paradigm. However, their ability to distinguish subtle sentiments still remains a challenge. Inspired by the human ability to adjust understanding via feedback, this paper enhances ICL by incorporating prior predictions and feedback, aiming to rectify sentiment misinterpretation of LLMs. Specifically, the proposed framework consists of three steps: (1) acquiring prior predictions of LLMs, (2) devising predictive feedback based on correctness, and (3) leveraging a feedback-driven prompt to refine sentiment understanding. Experimental results across nine sentiment analysis datasets demonstrate the superiority of our framework over conventional ICL methods, with an average F1 improvement of 5.95%.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted by ACL 2024 (Findings)

arXiv:2406.02864 [pdf, other]

NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models

Authors: Ancheng Xu, Minghuan Tan, Lei Wang, Min Yang, Ruifeng Xu

Abstract: Numeral systems and units of measurement are two conjoined topics in activities of human beings and have mutual effects with the languages expressing them. Currently, the evaluation of Large Language Models (LLMs) often involves mathematical reasoning, yet little attention is given to how minor changes in numbers or units can drastically alter the complexity of problems and the performance of LLMs. In this paper, we scrutinize existing LLMs on processing of numerals and units of measurement by constructing datasets with perturbations. We first anatomize the reasoning of math word problems to different sub-procedures like numeral conversions from language to numbers and measurement conversions based on units. Then we further annotate math word problems from ancient Chinese arithmetic works which are challenging in numerals and units of measurement. Experiments on perturbed datasets demonstrate that LLMs still encounter difficulties in handling numeral and measurement conversions.

Submitted 4 June, 2024; originally announced June 2024.

Comments: Findings of ACL 2024

arXiv:2406.02370 [pdf, other]

Query-based Semantic Gaussian Field for Scene Representation in Reinforcement Learning

Authors: Jiaxu Wang, Ziyi Zhang, Qiang Zhang, Jia Li, Jingkai Sun, Mingyuan Sun, Junhao He, Renjing Xu

Abstract: Latent scene representation plays a significant role in training reinforcement learning (RL) agents. To obtain good latent vectors describing the scenes, recent works incorporate the 3D-aware latent-conditioned NeRF pipeline into scene representation learning. However, these NeRF-related methods struggle to perceive 3D structural information due to the inefficient dense sampling in volumetric rendering. Moreover, they lack fine-grained semantic information included in their scene representation vectors because they evenly consider free and occupied spaces. Both of them can destroy the performance of downstream RL tasks. To address the above challenges, we propose a novel framework that adopts the efficient 3D Gaussian Splatting (3DGS) to learn 3D scene representation for the first time. In brief, we present the Query-based Generalizable 3DGS to bridge the 3DGS technique and scene representations with more geometrical awareness than those in NeRFs. Moreover, we present the Hierarchical Semantics Encoding to ground the fine-grained semantic features to 3D Gaussians and further distilled to the scene representation vectors. We conduct extensive experiments on two RL platforms including Maniskill2 and Robomimic across 10 different tasks. The results show that our method outperforms the other 5 baselines by a large margin. We achieve the best success rates on 8 tasks and the second-best on the other two tasks.

Submitted 9 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.02318 [pdf, other]

PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection

Authors: Ronghui Xu, Hao Miao, Senzhang Wang, Philip S. Yu, Jianxin Wang

Abstract: With the proliferation of mobile sensing techniques, huge amounts of time series data are generated and accumulated in various domains, fueling plenty of real-world applications. In this setting, time series anomaly detection is practically important. It endeavors to identify deviant samples from the normal sample distribution in time series. Existing approaches generally assume that all the time series is available at a central location. However, we are witnessing the decentralized collection of time series due to the deployment of various edge devices. To bridge the gap between the decentralized time series data and the centralized anomaly detection algorithms, we propose a Parameter-efficient Federated Anomaly Detection framework named PeFAD with the increasing privacy concerns. PeFAD for the first time employs the pre-trained language model (PLM) as the body of the client's local model, which can benefit from its cross-modality knowledge transfer capability. To reduce the communication overhead and local model adaptation cost, we propose a parameter-efficient federated training module such that clients only need to fine-tune small-scale parameters and transmit them to the server for update. PeFAD utilizes a novel anomaly-driven mask selection strategy to mitigate the impact of neglected anomalies during training. A knowledge distillation operation on a synthetic privacy-preserving dataset that is shared by all the clients is also proposed to address the data heterogeneity issue across clients. We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.

Submitted 4 July, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted by SIGKDD 2024 (Research Track)

arXiv:2406.02013 [pdf, other]

Mamba as Decision Maker: Exploring Multi-scale Sequence Modeling in Offline Reinforcement Learning

Authors: Jiahang Cao, Qiang Zhang, Ziqing Wang, Jiaxu Wang, Hao Cheng, Yecheng Shao, Wen Zhao, Gang Han, Yijie Guo, Renjing Xu

Abstract: Sequential modeling has demonstrated remarkable capabilities in offline reinforcement learning (RL), with Decision Transformer (DT) being one of the most notable representatives, achieving significant success. However, RL trajectories possess unique properties to be distinguished from the conventional sequence (e.g., text or audio): (1) local correlation, where the next states in RL are theoretically determined solely by current states and actions based on the Markov Decision Process (MDP), and (2) global correlation, where each step's features are related to long-term historical information due to the time-continuous nature of trajectories. In this paper, we propose a novel action sequence predictor, named Mamba Decision Maker (MambaDM), where Mamba is expected to be a promising alternative for sequence modeling paradigms, owing to its efficient modeling of multi-scale dependencies. In particular, we introduce a novel mixer module that proficiently extracts and integrates both global and local features of the input sequence, effectively capturing interrelationships in RL datasets. Extensive experiments demonstrate that MambaDM achieves state-of-the-art performance in Atari and OpenAI Gym datasets. Furthermore, we empirically investigate the scaling laws of MambaDM, finding that increasing model size does not bring performance improvement, but scaling the dataset amount by 2x for MambaDM can obtain up to 33.7% score improvement on Atari dataset. This paper delves into the sequence modeling capabilities of MambaDM in the RL domain, paving the way for future advancements in robust and efficient decision-making systems. Our code will be available at https://github.com/AndyCao1125/MambaDM.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 16 pages, 5 figures

arXiv:2406.01885 [pdf, other]

Nonlinear Eigen-approach ADMM for Sparse Optimization on Stiefel Manifold

Authors: Jiawei Wang, Rencang Li, Richard Yi Da Xu

Abstract: With the growing interest and applications in machine learning and data science, finding an efficient method to sparsely analyze high-dimensional data and optimizing a dimension reduction model to extract lower dimensional features have become more and more important. The orthogonality constraint (Stiefel manifold) is commonly met in these applications, and the sparsity is usually enforced through the element-wise L1 norm. Many applications of optimization over the Stiefel manifold can be found within the areas of physics and machine learning. In this paper, we propose a novel idea by tackling the Stiefel manifold through a nonlinear eigen-approach by first using ADMM to split the problem into smooth optimization over the manifold and convex non-smooth optimization, and then transforming the former into the form of a nonlinear eigenvalue problem with eigenvector dependency (NEPv), which is solved by self-consistent field (SCF) iteration, while the latter is found to have a closed-form solution through proximal gradient. Compared with existing methods, our proposed algorithm takes advantage of the specific structure of the objective function, and has efficient convergence results under mild assumptions.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.01512 [pdf, other]

MAD: Multi-Alignment MEG-to-Text Decoding

Authors: Yiqian Yang, Hyejeong Jo, Yiqun Duan, Qiang Zhang, Jinni Zhou, Won Hee Lee, Renjing Xu, Hui Xiong

Abstract: Deciphering language from brain activity is a crucial task in brain-computer interface (BCI) research. Non-invasive cerebral signaling techniques including electroencephalography (EEG) and magnetoencephalography (MEG) are becoming increasingly popular due to their safety and practicality, avoiding invasive electrode implantation. However, current works have under-investigated three points: 1) a predominant focus on EEG with limited exploration of MEG, which provides superior signal quality; 2) poor performance on unseen text, indicating the need for models that can better generalize to diverse linguistic contexts; 3) insufficient integration of information from other modalities, which could potentially constrain our capacity to comprehensively understand the intricate dynamics of brain activity. This study presents a novel approach for translating MEG signals into text using a speech-decoding framework with multiple alignments. Our method is the first to introduce an end-to-end multi-alignment framework for totally unseen text generation directly from MEG signals. We achieve an impressive BLEU-1 score on the GWilliams dataset, significantly outperforming the baseline from 5.49 to 10.44 on the BLEU-1 metric. This improvement demonstrates the advancement of our model towards real-world applications and underscores its potential in advancing BCI research. Code is available at https://github.com/NeuSpeech/MAD-MEG2text.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.00988 [pdf, other]

ADE-HGNN: Accelerating HGNNs through Attention Disparity Exploitation

Authors: Dengke Han, Meng Wu, Runzhen Xue, Mingyu Yan, Xiaochun Ye, Dongrui Fan

Abstract: Heterogeneous Graph Neural Networks (HGNNs) have recently demonstrated great power in handling heterogeneous graph data, rendering them widely applied in many critical real-world domains. Most HGNN models leverage attention mechanisms to significantly improve model accuracy, albeit at the cost of increased computational complexity and memory bandwidth requirements. Fortunately, the attention disparity from source vertices towards a common target vertex unveils an opportunity to boost the model execution performance by pruning unimportant source vertices during neighbor aggregation. In this study, we commence with a quantitative analysis of the attention disparity in HGNN models, where the importance of different source vertices varies for the same target vertex. To fully exploit this finding for inference acceleration, we propose a runtime pruning method based on min-heap and map it to a dedicated hardware pruner to discard unimportant vertices. Given that the pruning overhead itself is non-negligible and cannot be amortized by conventional staged execution paradigm, an operation-fusion execution flow of HGNNs is introduced to overlap the pruning overhead while harnessing inter-stage parallelism. Finally, we present the design of a novel HGNN accelerator, ADE-HGNN, tailored to support the proposed execution framework. Our experimental results demonstrate that ADE-HGNN achieves an average performance improvement of 28.21x over the NVIDIA GPU T4 platform and 7.98x over the advanced GPU A100, with the inference accuracy loss kept within a negligible range of 0.11%~1.47%. Furthermore, ADE-HGNN significantly reduces energy consumption to 1.97% and 5.37% of the two platforms, respectively.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 15 pages, 9 figures, accepted by Euro-PAR 2024

arXiv:2406.00613 [pdf, other]

Compact dwarfs made of light-quark nuggets

Authors: Hao-Song You, Hao Sun, Hong-Bo Li, Cheng-Jun Xia, Ren-Xin Xu

Abstract: Utilizing an equivparticle model with both linear confinement and leading-order perturbative interactions, we obtain systematically the properties of strangelets and nonstrange quark matter ($ud$QM) nuggets at various baryon ($A$) and charge ($Z$) numbers, where the detailed single-quark-energy levels are fixed by solving Dirac equations in mean-field approximation (MFA). We then examine the structures of compact dwarfs made of light strangelets or $ud$QM nuggets forming body-centered cubic lattices in a uniform electron background. Although the strangelets and $ud$QM nuggets generally become more stable at larger $A$, the compact dwarfs are still stable since the fusion reactions between those objects do not take place in the presence of a Coulomb barrier, which is similar to the cases of light nuclei in normal white dwarfs. If $ud$QM dwarfs or strangelet dwarfs are covered with normal matter, their masses and radii become larger but do not exceed those of ordinary white dwarfs. Finally, we investigate the radial oscillation frequencies of $ud$QM dwarfs and strangelet dwarfs, and find that their frequencies are typically higher than those of traditional white dwarfs. The stability of compact dwarfs is then analyzed by examining radial oscillation frequencies of the fundamental mode, where compact dwarfs covered by normal matter are still stable.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2406.00283 [pdf, other]

High priority targets for transient gravitational waves from glitching pulsars

Authors: Garvin Yim, Lijing Shao, Renxin Xu

Abstract: Glitching pulsars are expected to be important sources of gravitational waves. In this paper, we explore six different models that propose the emission of transient continuous waves, lasting days to months, coincident with glitches. The maximal gravitational wave energy is calculated for each model, which is then used to determine whether associated gravitational waves could be detectable with LIGO-Virgo-KAGRA's O4 detectors. We provide an analytical approximation to calculate the signal-to-noise ratio which includes information about the source's sky position, improving on previous estimates that assume isotropic or sky and orientation averaged sensitivities. Applying the calculation to the entire glitching population, we find that certain models predict detectable signals in O4, whereas others do not. We also rank glitching pulsars in order of how significant a signal would be, based on archival data, and we find that for all models, the Vela pulsar (PSR J0835$-$4510) would provide the strongest signal. Moreover, PSR J0537$-$6910 is not expected to yield a detectable signal in O4, but will start becoming relevant for next generation detectors. Our analysis also extends to the entire pulsar population, regardless of whether they have glitched or not, and we provide a list of pulsars that would present a significant signal, if they were to glitch. Finally, we apply our analysis to the latest April 2024 Vela glitch and find that a signal should be detectable under certain models. The non-detection of a supposedly detectable signal would provide an efficiency factor that quantifies how much a model can contribute to gravitational wave emission, eventually leading to a differentiation of models and independent constraints on physical parameters.

Submitted 31 May, 2024; originally announced June 2024.

Comments: 20 pages, 8 figures, 7 tables, 3 appendices

arXiv:2405.20978 [pdf, other]

Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training

Authors: Feiteng Fang, Yuelin Bai, Shiwen Ni, Min Yang, Xiaojun Chen, Ruifeng Xu

Abstract: Large Language Models (LLMs) exhibit substantial capabilities yet encounter challenges, including hallucination, outdated knowledge, and untraceable reasoning processes. Retrieval-augmented generation (RAG) has emerged as a promising solution, integrating knowledge from external databases to mitigate these challenges. However, inappropriate retrieved passages can potentially hinder the LLMs' capacity to generate comprehensive and high-quality responses. Prior RAG studies on the robustness of retrieval noises often confine themselves to a limited set of noise types, deviating from real-world retrieval environments and limiting practical applicability. In this study, we initially investigate retrieval noises and categorize them into three distinct types, reflecting real-world environments. We analyze the impact of these various retrieval noises on the robustness of LLMs. Subsequently, we propose a novel RAG approach known as Retrieval-augmented Adaptive Adversarial Training (RAAT). RAAT leverages adaptive adversarial training to dynamically adjust the model's training process in response to retrieval noises. Concurrently, it employs multi-task learning to ensure the model's capacity to internally recognize noisy contexts. Extensive experiments demonstrate that the LLaMA-2 7B model trained using RAAT exhibits significant improvements in F1 and EM scores under diverse noise conditions. For reproducibility, we release our code and data at: https://github.com/calubkk/RAAT.

Submitted 31 May, 2024; originally announced May 2024.

Journal ref: ACL 2024, Main Conference

arXiv:2405.20902

Preemptive Answer "Attacks" on Chain-of-Thought Reasoning

Authors: Rongwu Xu, Zehan Qi, Wei Xu

Abstract: Large language models (LLMs) showcase impressive reasoning capabilities when coupled with Chain-of-Thought (CoT) prompting. However, the robustness of this approach warrants further investigation. In this paper, we introduce a novel scenario termed preemptive answers, where the LLM obtains an answer before engaging in reasoning. This situation can arise inadvertently or be induced by malicious users through prompt injection attacks. Experiments reveal that preemptive answers significantly impair the model's reasoning capability across various CoT methods and a broad spectrum of datasets. To bolster the robustness of reasoning, we propose two measures aimed at mitigating this issue to some extent.

Submitted 31 May, 2024; originally announced May 2024.

Comments: Accepted to ACL'24 (Findings). Camera-ready version

arXiv:2405.20090

Typography Leads Semantic Diversifying: Amplifying Adversarial Transferability across Multimodal Large Language Models

Authors: Hao Cheng, Erjia Xiao, Jiahang Cao, Le Yang, Kaidi Xu, Jindong Gu, Renjing Xu

Abstract: Following the advent of the Artificial Intelligence (AI) era of large models, Multimodal Large Language Models (MLLMs) with the ability to understand cross-modal interactions between vision and text have attracted wide attention. Adversarial examples with human-imperceptible perturbation are shown to possess a characteristic known as transferability, which means that a perturbation generated by one model could also mislead another different model. Augmenting the diversity in input data is one of the most significant methods for enhancing adversarial transferability. This method has been certified as a way to significantly enlarge the threat impact under black-box conditions. Research works also demonstrate that MLLMs can be exploited to generate adversarial examples in the white-box scenario. However, the adversarial transferability of such perturbations is quite limited, failing to achieve effective black-box attacks across different models. In this paper, we propose the Typographic-based Semantic Transfer Attack (TSTA), which is inspired by: (1) MLLMs tend to process semantic-level information; (2) Typographic Attack could effectively distract the visual information captured by MLLMs. In the scenarios of Harmful Word Insertion and Important Information Protection, our TSTA demonstrates superior performance.

Submitted 30 May, 2024; originally announced May 2024.
