Showing 1–50 of 489 results for author: Yan, R

  1. arXiv:2407.06677  [pdf, other]

    cs.CL

    Mixture-of-Modules: Reinventing Transformers as Dynamic Assemblies of Modules

    Authors: Zhuocheng Gong, Ang Lv, Jian Guan, Junxi Yan, Wei Wu, Huishuai Zhang, Minlie Huang, Dongyan Zhao, Rui Yan

    Abstract: Is it always necessary to compute tokens from shallow to deep layers in Transformers? The continued success of vanilla Transformers and their variants suggests an undoubted "yes". In this work, however, we attempt to break the depth-ordered convention by proposing a novel architecture dubbed mixture-of-modules (MoM), which is motivated by an intuition that any layer, regardless of its position, ca…

    Submitted 9 July, 2024; originally announced July 2024.

  2. arXiv:2407.05246  [pdf, other]

    cs.LG cs.CV

    Deep Probability Aggregation Clustering

    Authors: Yuxuan Yan, Na Lu, Ruofan Yan

    Abstract: Combining machine clustering with deep models has shown remarkable superiority in deep clustering. It modifies the data processing pipeline into two alternating phases: feature clustering and model training. However, such an alternating schedule may lead to instability and computational burden issues. We propose a centerless clustering algorithm called Probability Aggregation Clustering (PAC) to proa…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: 19 pages, 2 figures, conference

  3. arXiv:2407.01601  [pdf, other]

    cs.LG cs.AI

    Unveiling and Controlling Anomalous Attention Distribution in Transformers

    Authors: Ruiqing Yan, Xingbo Du, Haoyu Deng, Linghan Zheng, Qiuzhuang Sun, Jifang Hu, Yuhang Shao, Penghao Jiang, Jinrong Jiang, Lian Zhao

    Abstract: With the advent of large models based on the Transformer architecture, researchers have observed an anomalous phenomenon in the Attention mechanism--there is a very high attention on the first element, which is prevalent across Transformer-based models. It is crucial to understand it for the development of techniques focusing on attention distribution, such as Key-Value (KV) Cache compression and…

    Submitted 3 July, 2024; v1 submitted 26 June, 2024; originally announced July 2024.

  4. arXiv:2407.01564  [pdf]

    econ.GN

    Decarbonization analysis on residential end uses in the emerging economies

    Authors: Ran Yan, Minda Ma

    Abstract: This study explores the historical emission patterns and decarbonization efforts of China and India, the largest emerging emitters in residential building operations. Using a novel carbon intensity model and structural decomposition approach, it assesses the operational decarbonization progress over the past two decades. Results show significant decarbonization, with China and India collectively r…

    Submitted 17 May, 2024; originally announced July 2024.

    Comments: 7 pages, 5 figures. arXiv admin note: substantial text overlap with arXiv:2306.13858

  5. arXiv:2407.00993  [pdf, other]

    cs.AI cs.CL

    Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents

    Authors: Shihan Deng, Weikai Xu, Hongda Sun, Wei Liu, Tao Tan, Jianfeng Liu, Ang Li, Jian Luan, Bin Wang, Rui Yan, Shuo Shang

    Abstract: With the remarkable advancements of large language models (LLMs), LLM-based agents have become a research hotspot in human-computer interaction. However, there is a scarcity of benchmarks available for LLM-based mobile agents. Benchmarking these agents generally faces three main challenges: (1) The inefficiency of UI-only operations imposes limitations to task evaluation. (2) Specific instructions…

    Submitted 1 July, 2024; originally announced July 2024.

  6. arXiv:2406.19934  [pdf, other]

    cs.CL cs.AI

    From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis

    Authors: Chuanqi Cheng, Jian Guan, Wei Wu, Rui Yan

    Abstract: We explore multi-step reasoning in vision-language models (VLMs). The problem is challenging, as reasoning data consisting of multiple steps of visual and language processing are barely available. To overcome the challenge, we first introduce a least-to-most visual reasoning paradigm, which interleaves steps of decomposing a question into sub-questions and invoking external tools for resolving sub…

    Submitted 28 June, 2024; originally announced June 2024.

  7. arXiv:2406.19853  [pdf, other]

    cs.CL cs.AI

    YuLan: An Open-source Large Language Model

    Authors: Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang, Lei Zhang, Junyi Li, Xiaolei Wang, Lei Wang, Beichen Zhang, Zican Dong, Xiaoxue Cheng, Yuhan Chen, Xinyu Tang, Yupeng Hou, Qiangqiang Ren, Xincheng Pang, Shufang Xie, Wayne Xin Zhao, Zhicheng Dou, et al. (13 additional authors not shown)

    Abstract: Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with 12 billi…

    Submitted 28 June, 2024; originally announced June 2024.

  8. arXiv:2406.19598  [pdf, other]

    cs.CL

    Mixture of In-Context Experts Enhance LLMs' Long Context Awareness

    Authors: Hongzhan Lin, Ang Lv, Yuhan Chen, Chen Zhu, Yang Song, Hengshu Zhu, Rui Yan

    Abstract: Many studies have revealed that large language models (LLMs) exhibit uneven awareness of different contextual positions. Their limited context awareness can lead to overlooking critical information and subsequent task failures. While several approaches have been proposed to enhance LLMs' context awareness, achieving both effectiveness and efficiency remains challenging. In this paper, for LLMs utili…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 14 pages, 5 figures

  9. arXiv:2406.10629  [pdf, ps, other]

    quant-ph

    m-QMDS codes over mixed alphabets via orthogonal arrays

    Authors: Shanqi Pang, Mengqian Chen, Rong Yan, Yan Zhu

    Abstract: The construction of quantum error-correcting codes (QECCs) with good parameters is a hot topic in the area of quantum information and quantum computing. Quantum maximum distance separable (QMDS) codes are optimal because the minimum distance cannot be improved for a given length and code size. The QMDS codes over mixed alphabets are rarely known even if the existence and construction of QECCs over…

    Submitted 15 June, 2024; originally announced June 2024.

  10. arXiv:2406.06517  [pdf, other]

    cs.CV

    Genomics-guided Representation Learning for Pathologic Pan-cancer Tumor Microenvironment Subtype Prediction

    Authors: Fangliangzi Meng, Hongrun Zhang, Ruodan Yan, Guohui Chuai, Chao Li, Qi Liu

    Abstract: The characterization of Tumor MicroEnvironment (TME) is challenging due to its complexity and heterogeneity. Relatively consistent TME characteristics are embedded within highly specific tissue features, which renders them difficult to predict. The capability to accurately classify TME subtypes is of critical significance for clinical tumor diagnosis and precision medicine. Based on the observation that tumo…

    Submitted 8 July, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: MICCAI2024

  11. arXiv:2406.06434  [pdf, ps, other]

    eess.IV cs.CV

    Spatiotemporal Graph Neural Network Modelling Perfusion MRI

    Authors: Ruodan Yan, Carola-Bibiane Schönlieb, Chao Li

    Abstract: Perfusion MRI (pMRI) offers valuable insights into tumor vascularity and promises to predict tumor genotypes, thus benefiting prognosis for glioma patients, yet effective models tailored to 4D pMRI are still lacking. This study presents the first attempt to model 4D pMRI using a GNN-based spatiotemporal model PerfGAT, integrating spatial information and temporal kinetics to predict Isocitrate DeHy…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 11 pages, 2 figures

  12. arXiv:2406.05797  [pdf, other]

    q-bio.BM cs.AI cs.CE cs.CL cs.LG

    3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization

    Authors: Qizhi Pei, Lijun Wu, Kaiyuan Gao, Jinhua Zhu, Rui Yan

    Abstract: The integration of molecule and language has garnered increasing attention in molecular science. Recent advancements in Language Models (LMs) have demonstrated potential for the comprehensive modeling of molecule and language. However, existing works exhibit notable limitations. Most existing works overlook the modeling of 3D information, which is crucial for understanding molecular structures and…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: 18 pages

  13. arXiv:2406.05360  [pdf, other]

    cs.CL

    Flexible and Adaptable Summarization via Expertise Separation

    Authors: Xiuying Chen, Mingzhe Li, Shen Gao, Xin Cheng, Qingqing Zhu, Rui Yan, Xin Gao, Xiangliang Zhang

    Abstract: A proficient summarization model should exhibit both flexibility -- the capacity to handle a range of in-domain summarization tasks, and adaptability -- the competence to acquire new knowledge and adjust to unseen out-of-domain tasks. Unlike large language models (LLMs) that achieve this through parameter scaling, we propose a more parameter-efficient approach in this study. Our motivation rests o…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 10 pages, 7 figures, published in SIGIR 2024

  14. arXiv:2406.04074  [pdf]

    econ.GN

    Estimation of Global Building Stocks by 2070: Unlocking Renovation Potential

    Authors: Shufan Zhang, Minda Ma, Nan Zhou, Jinyue Yan, Wei Feng, Ran Yan, Kairui You, Jingjing Zhang, Jing Ke

    Abstract: Buildings produce one-third of carbon emissions globally; however, the absence of data on global floorspace poses challenges in advancing building carbon neutrality. We compile the measured building stocks for 14 major economies and apply our global building stock model, GLOBUS, to evaluate future trends in stock turnover. Based on a scenario not considering renovation, by 2070 the building stock…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 25 pages, 4 figures

  15. arXiv:2406.03894  [pdf, other]

    cs.LG

    Transductive Off-policy Proximal Policy Optimization

    Authors: Yaozhong Gan, Renye Yan, Xiaoyang Tan, Zhe Wu, Junliang Xing

    Abstract: Proximal Policy Optimization (PPO) is a popular model-free reinforcement learning algorithm, esteemed for its simplicity and efficacy. However, due to its inherent on-policy nature, its proficiency in harnessing data from disparate policies is constrained. This paper introduces a novel off-policy extension to the original PPO method, christened Transductive Off-policy PPO (ToPPO). Herein, we provi…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 18

  16. arXiv:2406.03678  [pdf, other]

    cs.LG cs.AI stat.ML

    Reflective Policy Optimization

    Authors: Yaozhong Gan, Renye Yan, Zhe Wu, Junliang Xing

    Abstract: On-policy reinforcement learning methods, like Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), often demand extensive data per update, leading to sample inefficiency. This paper introduces Reflective Policy Optimization (RPO), a novel on-policy extension that amalgamates past and future state-action information for policy optimization. This approach empowers the age…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 20 pages

  17. arXiv:2406.03075  [pdf, other]

    cs.CL

    Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework

    Authors: Xiaoxi Sun, Jinpeng Li, Yan Zhong, Dongyan Zhao, Rui Yan

    Abstract: The advent of large language models (LLMs) has facilitated the development of natural language text generation. It also poses unprecedented challenges, with content hallucination emerging as a significant concern. Existing solutions often involve expensive and complex interventions during the training process. Moreover, some approaches emphasize problem disassembly while neglecting the crucial val…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 18 pages, 3 figures

  18. arXiv:2406.03002  [pdf, other]

    eess.IV cs.CV

    Phy-Diff: Physics-guided Hourglass Diffusion Model for Diffusion MRI Synthesis

    Authors: Juanhua Zhang, Ruodan Yan, Alessandro Perelli, Xi Chen, Chao Li

    Abstract: Diffusion MRI (dMRI) is an important neuroimaging technique with high acquisition costs. Deep learning approaches have been used to enhance dMRI and predict diffusion biomarkers through undersampled dMRI. To generate more comprehensive raw dMRI, generative adversarial network based methods are proposed to include b-values and b-vectors as conditions, but they are limited by unstable training and l…

    Submitted 10 July, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by MICCAI 2024

  19. arXiv:2406.00672  [pdf, other]

    cs.CV

    Task-oriented Embedding Counts: Heuristic Clustering-driven Feature Fine-tuning for Whole Slide Image Classification

    Authors: Xuenian Wang, Shanshan Shi, Renao Yan, Qiehe Sun, Lianghui Zhu, Tian Guan, Yonghong He

    Abstract: In the field of whole slide image (WSI) classification, multiple instance learning (MIL) serves as a promising approach, commonly decoupled into feature extraction and aggregation. In this paradigm, our observation reveals that discriminative embeddings are crucial for aggregation to the final prediction. Among all feature updating strategies, task-oriented ones can capture characteristics specifi…

    Submitted 2 June, 2024; originally announced June 2024.

  20. arXiv:2405.20343  [pdf, other]

    cs.CV cs.GR cs.LG

    Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image

    Authors: Kailu Wu, Fangfu Liu, Zhihan Cai, Runjie Yan, Hanyang Wang, Yating Hu, Yueqi Duan, Kaisheng Ma

    Abstract: In this work, we introduce Unique3D, a novel image-to-3D framework for efficiently generating high-quality 3D meshes from single-view images, featuring state-of-the-art generation fidelity and strong generalizability. Previous methods based on Score Distillation Sampling (SDS) can produce diversified 3D results by distilling 3D knowledge from large 2D diffusion models, but they usually suffer from…

    Submitted 13 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: Project page: https://wukailu.github.io/Unique3D

    ACM Class: I.2.10

  21. arXiv:2405.18113  [pdf, other]

    cs.CL cs.AI

    Facilitating Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting

    Authors: Hongda Sun, Hongzhan Lin, Haiyu Yan, Chen Zhu, Yang Song, Xin Gao, Shuo Shang, Rui Yan

    Abstract: The emergence of online recruitment services has revolutionized the traditional landscape of job seeking and recruitment, necessitating the development of high-quality industrial applications to improve person-job fitting. Existing methods generally rely on modeling the latent semantics of resumes and job descriptions and learning a matching function between them. Inspired by the powerful role-pla…

    Submitted 28 May, 2024; originally announced May 2024.

  22. arXiv:2405.16914  [pdf, other]

    cs.SE

    Rigorous Simulation-based Testing for Autonomous Driving Systems -- Targeting the Achilles' Heel of Four Open Autopilots

    Authors: Changwen Li, Joseph Sifakis, Rongjie Yan, Jian Zhang

    Abstract: Simulation-based testing remains the main approach for validating Autonomous Driving Systems. We propose a rigorous test method based on breaking down scenarios into simple ones, taking into account the fact that autopilots make decisions according to traffic rules whose application depends on local knowledge and context. This leads us to consider the autopilot as a dynamic system receiving three…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 57 pages, 19 figures, 29 tables

  23. arXiv:2405.15182  [pdf, other]

    cs.CR cs.AI

    RFLPA: A Robust Federated Learning Framework against Poisoning Attacks with Secure Aggregation

    Authors: Peihua Mai, Ran Yan, Yan Pang

    Abstract: Federated learning (FL) allows multiple devices to train a model collaboratively without sharing their data. Despite its benefits, FL is vulnerable to privacy leakage and poisoning attacks. To address the privacy concern, secure aggregation (SecAgg) is often used to obtain the aggregation of gradients on the server without inspecting individual user updates. Unfortunately, existing defense strategies a…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 22 pages

    ACM Class: E.4

  24. arXiv:2405.13810  [pdf, other]

    cs.LG cs.AI

    Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers

    Authors: Xin Cheng, Xiuying Chen, Shuqi Li, Di Luo, Xun Wang, Dongyan Zhao, Rui Yan

    Abstract: Time series prediction is crucial for understanding and forecasting complex dynamics in various domains, ranging from finance and economics to climate and healthcare. Based on Transformer architecture, one approach involves encoding multiple variables from the same timestamp into a single temporal token to model global dependencies. In contrast, another approach embeds the time points of individua…

    Submitted 22 May, 2024; originally announced May 2024.

  25. arXiv:2405.13432  [pdf, other]

    cs.CL cs.AI

    Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction

    Authors: Tingchen Fu, Deng Cai, Lemao Liu, Shuming Shi, Rui Yan

    Abstract: Supervised fine-tuning (SFT) on an instruction-following corpus is a crucial approach toward the alignment of large language models (LLMs). However, the performance of LLMs on standard knowledge and reasoning benchmarks tends to deteriorate in the later stage of the SFT process, echoing the phenomenon of alignment tax. Through our pilot study, we put forward a hypothesis that the data biases a…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Accepted to the findings of ACL2024

  26. arXiv:2405.11315  [pdf, other]

    cs.CV

    MediCLIP: Adapting CLIP for Few-shot Medical Image Anomaly Detection

    Authors: Ximiao Zhang, Min Xu, Dehui Qiu, Ruixin Yan, Ning Lang, Xiuzhuang Zhou

    Abstract: In the field of medical decision-making, precise anomaly detection in medical imaging plays a pivotal role in aiding clinicians. However, previous work is reliant on large-scale datasets for training anomaly detection models, which increases the development cost. This paper first focuses on the task of medical image anomaly detection in the few-shot setting, which is critically significant for the…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 12 pages, 3 figures, 5 tables, early accepted at MICCAI 2024

  27. arXiv:2405.10988  [pdf, other]

    cs.LG cs.AI

    Flow Score Distillation for Diverse Text-to-3D Generation

    Authors: Runjie Yan, Kailu Wu, Kaisheng Ma

    Abstract: Recent advancements in Text-to-3D generation have yielded remarkable progress, particularly through methods that rely on Score Distillation Sampling (SDS). While SDS exhibits the capability to create impressive 3D assets, it is hindered by its inherent maximum-likelihood-seeking essence, resulting in limited diversity in generation outcomes. In this paper, we discover that the Denoise Diffusion Im…

    Submitted 16 May, 2024; originally announced May 2024.

  28. arXiv:2405.07187  [pdf, ps, other]

    physics.plasm-ph

    Two-Plasmon-Decay Instability Stimulated by a Normal- and Large-Angle-Incidence Laser Pair

    Authors: C. -W. Lian, Y. Ji, R. Yan, J. Li, S. -H. Cao, C. Ren, L. -F. Wang, Y. -K. Ding, J. Zheng

    Abstract: The two-plasmon-decay instability (TPD) is a critical target preheating risk in direct-drive inertial confinement fusion. In this paper, TPD collectively driven by a normal-incidence laser beam (Beam-N) and a large-angle-incidence laser beam (Beam-L) is investigated via particle-in-cell simulations. Significant TPD growth is found able to develop in this regime at previously unexpected low laser i…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: 16 pages, 5 figures, submitted

  29. arXiv:2405.02538  [pdf, other]

    cs.CV

    AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition

    Authors: Meiqi Cao, Rui Yan, Xiangbo Shu, Guangzhao Dai, Yazhou Yao, Guo-Sen Xie

    Abstract: Panoramic Activity Recognition (PAR) aims to identify multi-granularity behaviors performed by multiple persons in panoramic scenes, including individual activities, group activities, and global activities. Previous methods 1) heavily rely on manually annotated detection boxes in training and inference, hindering further practical deployment; or 2) directly employ normal detectors to detect multip…

    Submitted 3 May, 2024; originally announced May 2024.

  30. arXiv:2405.02460  [pdf, other]

    astro-ph.GA

    Asymmetric drift in MaNGA: Mass and radially-dependent stratification rates in galaxy disks

    Authors: Matthew A. Bershady, Kyle B. Westfall, Shravan Shetty, David R. Law, Michele Cappellari, Niv Drory, Kevin Bundy, Renbin Yan

    Abstract: We measure the age-velocity relationship from the lag between ionized gas and stellar tangential speeds in ~500 nearby disk galaxies from MaNGA in SDSS-IV. Selected galaxies are kinematically axisymmetric. Velocity lags are asymmetric drift, seen in the Milky Way's (MW) solar neighborhood and other Local Group galaxies; their amplitude correlates with stellar population age. The trend is qualitati…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: 21 pages, 18 figures. Accepted for publication in MNRAS

  31. arXiv:2405.02077  [pdf, other]

    cs.CV

    MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition

    Authors: Hongyu Qu, Rui Yan, Xiangbo Shu, Hailiang Gao, Peng Huang, Guo-Sen Xie

    Abstract: Recent few-shot action recognition (FSAR) methods typically perform semantic matching on learned discriminative features to achieve promising performance. However, most FSAR methods focus on single-scale (e.g., frame-level, segment-level, etc.) feature alignment, which ignores that human actions with the same semantics may appear at different velocities. To this end, we develop a novel Multi-Velocit…

    Submitted 23 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  32. arXiv:2404.16852  [pdf, other]

    cs.LG cs.AI cs.CL eess.IV

    A Disease Labeler for Chinese Chest X-Ray Report Generation

    Authors: Mengwei Wang, Ruixin Yan, Zeyi Hou, Ning Lang, Xiuzhuang Zhou

    Abstract: In the field of medical image analysis, the scarcity of Chinese chest X-ray report datasets has hindered the development of technology for generating Chinese chest X-ray reports. On one hand, the construction of a Chinese chest X-ray report dataset is limited by the time-consuming and costly process of accurate expert disease annotation. On the other hand, a single natural language generation metr…

    Submitted 18 March, 2024; originally announced April 2024.

  33. arXiv:2404.15771  [pdf, other]

    cs.CV cs.MM

    DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines

    Authors: Xin Jiang, Hao Tang, Rui Yan, Jinhui Tang, Zechao Li

    Abstract: Fine-grained image retrieval (FGIR) is to learn visual representations that distinguish visually similar objects while maintaining generalization. Existing methods propose to generate discriminative features, but rarely consider the particularity of the FGIR task itself. This paper presents a meticulous analysis leading to the proposal of practical guidelines to identify subcategory-specific discr…

    Submitted 24 April, 2024; originally announced April 2024.

  34. arXiv:2404.15642  [pdf, other]

    physics.plasm-ph

    Self-generated magnetic field in three-dimensional ablative Rayleigh-Taylor instability

    Authors: Dehua Zhang, Xian Jiang, Tao Tao, Jun Li, Rui Yan, De-Jun Sun, Jian Zheng

    Abstract: The self-generated magnetic field in three-dimensional (3D) single-mode ablative Rayleigh-Taylor instabilities (ARTI) relevant to the acceleration phase of a direct-drive inertial confinement fusion (ICF) implosion is investigated. It is found that stronger magnetic fields up to a few thousands of T can be generated by 3D ARTI than by its two-dimensional (2D) counterpart. The Nernst effects signif…

    Submitted 24 April, 2024; originally announced April 2024.

  35. arXiv:2404.15597  [pdf, other]

    cs.NE cs.AI cs.LG cs.MA

    GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL

    Authors: Lang Qin, Ziming Wang, Runhao Jiang, Rui Yan, Huajin Tang

    Abstract: Spiking neural networks (SNNs) are widely applied in various fields due to their energy-efficient and fast-inference capabilities. Applying SNNs to reinforcement learning (RL) can significantly reduce the computational resource requirements for agents and improve the algorithm's performance under resource-constrained conditions. However, in current spiking reinforcement learning (SRL) algorithms,…

    Submitted 23 April, 2024; originally announced April 2024.

  36. arXiv:2404.15353  [pdf, other]

    eess.SP cs.AI cs.LG

    SQUWA: Signal Quality Aware DNN Architecture for Enhanced Accuracy in Atrial Fibrillation Detection from Noisy PPG Signals

    Authors: Runze Yan, Cheng Ding, Ran Xiao, Aleksandr Fedorov, Randall J Lee, Fadi Nahab, Xiao Hu

    Abstract: Atrial fibrillation (AF), a common cardiac arrhythmia, significantly increases the risk of stroke, heart disease, and mortality. Photoplethysmography (PPG) offers a promising solution for continuous AF monitoring, due to its cost efficiency and integration into wearable devices. Nonetheless, PPG signals are susceptible to corruption from motion artifacts and other factors often encountered in ambu…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 15 pages; 9 figures; 2024 Conference on Health, Inference, and Learning (CHIL)

  37. arXiv:2404.11541  [pdf, other]

    astro-ph.GA astro-ph.SR

    Carbon- and Oxygen-rich stars in MaStar: identification and classification

    Authors: Lewis Hill, Claudia Maraston, Daniel Thomas, Renbin Yan, Yanping Chen, Guy S. Stringfellow, Richard R. Lane, José G. Fernández-Trincado

    Abstract: Carbon- and Oxygen-rich stars populating the Thermally-Pulsing Asymptotic Giant Branch (TP-AGB) phase of stellar evolution are relevant contributors to the spectra of ~1 Gyr old populations. Atmosphere models for these types are uncertain, due to complex molecules and mass-loss effects. Empirical spectra are then crucial, but samples are small due to the short (~3 Myr) TP-AGB lifetime. Here we exp…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 16 pages, 13 figures, MNRAS in press

  38. arXiv:2404.10679  [pdf, other]

    cs.GT cs.AI

    HSVI-based Online Minimax Strategies for Partially Observable Stochastic Games with Neural Perception Mechanisms

    Authors: Rui Yan, Gabriel Santos, Gethin Norman, David Parker, Marta Kwiatkowska

    Abstract: We consider a variant of continuous-state partially-observable stochastic games with neural perception mechanisms and an asymmetric information structure. One agent has partial information, with the observation function implemented as a neural network, while the other agent is assumed to have full knowledge of the state. We present, for the first time, an efficient online method to compute an…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 12 pages, 2 figures

  39. arXiv:2404.08216  [pdf, other]

    physics.plasm-ph

    Role of nonlocal heat transport on the laser ablative Rayleigh-Taylor instability

    Authors: Z. H. Chen, X. H. Yang, G. B. Zhang, Y. Y. Ma, R. Yan, H. Xu, Z. M. Sheng, F. Q. Shao, J. Zhang

    Abstract: Ablative Rayleigh-Taylor instability (ARTI) and nonlocal heat transport are the critical problems in laser-driven inertial confinement fusion, while their coupling with each other is not completely understood yet. Here the ARTI in the presence of nonlocal heat transport is studied self-consistently for the first time theoretically and by using radiation hydrodynamic simulations. It is found that t…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 8 pages, 5 figures

  40. arXiv:2404.04272  [pdf, other]

    cs.IR cs.CL

    Selecting Query-bag as Pseudo Relevance Feedback for Information-seeking Conversations

    Authors: Xiaoqing Zhang, Xiuying Chen, Shen Gao, Shuqi Li, Xin Gao, Ji-Rong Wen, Rui Yan

    Abstract: Information-seeking dialogue systems are widely used in e-commerce systems, with answers that must be tailored to fit the specific settings of the online system. Given the user query, the information-seeking dialogue systems first retrieve a subset of response candidates, then further select the best response from the candidate set through re-ranking. Current methods mainly retrieve response candi…

    Submitted 22 March, 2024; originally announced April 2024.

  41. arXiv:2403.19521  [pdf, other]

    cs.CL cs.AI cs.LG

    Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models

    Authors: Ang Lv, Yuhan Chen, Kaiyi Zhang, Yulong Wang, Lifeng Liu, Ji-Rong Wen, Jian Xie, Rui Yan

    Abstract: In this paper, we delve into several mechanisms employed by Transformer-based language models (LLMs) for factual recall tasks. We outline a pipeline consisting of three major steps: (1) Given a prompt "The capital of France is," task-specific attention heads extract the topic token, such as "France," from the context and pass it to subsequent MLPs. (2) As attention heads' outputs are aggregate…

    Submitted 24 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  42. arXiv:2403.15105  [pdf, other]

    cs.SI

    LLM-Driven Agents for Influencer Selection in Digital Advertising Campaigns

    Authors: Xiaoqing Zhang, Xiuying Chen, Yuhan Liu, Jianzhou Wang, Zhenxing Hu, Rui Yan

    Abstract: In the digital world, influencers are pivotal as opinion leaders, shaping the views and choices of their influencees. Modern advertising often follows this trend, where marketers choose appropriate influencers for product endorsements, based on thorough market analysis. Previous studies on influencer selection have typically relied on numerical representations of individual opinions and interactio…

    Submitted 22 March, 2024; originally announced March 2024.

  43. arXiv:2403.11439  [pdf, other]

    cs.CL

    StyleChat: Learning Recitation-Augmented Memory in LLMs for Stylized Dialogue Generation

    Authors: Jinpeng Li, Zekai Zhang, Quan Tu, Xin Cheng, Dongyan Zhao, Rui Yan

    Abstract: Large Language Models (LLMs) demonstrate superior performance in generative scenarios and have attracted widespread attention. Among them, stylized dialogue generation is essential in the context of LLMs for building intelligent and engaging dialogue agents. However, the ability of LLMs is data-driven and limited by data bias, leading to poor performance on specific tasks. In particular, stylized di…

    Submitted 17 March, 2024; originally announced March 2024.

  44. arXiv:2403.09498  [pdf, other]

    cs.SI cs.AI cs.CL

    From Skepticism to Acceptance: Simulating the Attitude Dynamics Toward Fake News

    Authors: Yuhan Liu, Xiuying Chen, Xiaoqing Zhang, Xing Gao, Ji Zhang, Rui Yan

    Abstract: In the digital era, the rapid propagation of fake news and rumors via social networks brings notable societal challenges and impacts public opinion regulation. Traditional fake news modeling typically forecasts the general popularity trends of different groups or numerically represents opinions shift. However, these methods often oversimplify real-world complexities and overlook the rich semantic…

    Submitted 14 March, 2024; originally announced March 2024.

  45. arXiv:2403.08312  [pdf, other]

    cs.CL cs.AI

    StreamingDialogue: Prolonged Dialogue Learning via Long Context Compression with Minimal Losses

    Authors: Jia-Nan Li, Quan Tu, Cunli Mao, Zhengtao Yu, Ji-Rong Wen, Rui Yan

    Abstract: Standard Large Language Models (LLMs) struggle with handling dialogues with long contexts due to efficiency and consistency issues. According to our observation, dialogue contexts are highly structured, and the special token of End-of-Utterance (EoU) in dialogues has the potential to aggregate information. We refer to the EoU tokens as "conversational attention sinks" (conv-attn sinks).…

    Submitted 13 March, 2024; originally announced March 2024.

  46. arXiv:2403.06408  [pdf, other]

    cs.LG cs.AI

    What Makes Quantization for Large Language Models Hard? An Empirical Study from the Lens of Perturbation

    Authors: Zhuocheng Gong, Jiahao Liu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

    Abstract: Quantization has emerged as a promising technique for improving the memory and computational efficiency of large language models (LLMs). Though the trade-off between performance and efficiency is well-known, there is still much to be learned about the relationship between quantization and LLM performance. To shed light on this relationship, we propose a new perspective on quantization, viewing it…

    Submitted 10 March, 2024; originally announced March 2024.

  47. arXiv:2403.06202  [pdf, other]

    eess.SY cs.GT

    Pursuit Winning Strategies for Reach-Avoid Games with Polygonal Obstacles

    Authors: Rui Yan, Shuai Mi, Xiaoming Duan, Jintao Chen, Xiangyang Ji

    Abstract: This paper studies a multiplayer reach-avoid differential game in the presence of general polygonal obstacles that block the players' motions. The pursuers cooperate to protect a convex region from the evaders who try to reach the region. We propose a multiplayer onsite and close-to-goal (MOCG) pursuit strategy that can tell and achieve an increasing lower bound on the number of guaranteed defeate…

    Submitted 22 May, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

    Comments: 16 pages, 10 figures

  48. arXiv:2403.05217  [pdf, other]

    cs.CL cs.AI cs.IR

    Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering

    Authors: Hongda Sun, Yuxuan Liu, Chengwei Wu, Haiyu Yan, Cheng Tai, Xin Gao, Shuo Shang, Rui Yan

    Abstract: Open-domain question answering (ODQA) has emerged as a pivotal research spotlight in information systems. Existing methods follow two main paradigms to collect evidence: (1) The retrieve-then-read paradigm retrieves pertinent documents from an external corpus; and (2) the generate-then-read paradigm employs large language models (LLMs) to generate relevant documents. However, nei…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: TheWebConf 2024 (WWW 2024) oral, code repo: https://github.com/EthanLeo-LYX/LLMQA

  49. arXiv:2403.03102  [pdf, other]

    cs.CL cs.AI

    "In Dialogues We Learn": Towards Personalized Dialogue Without Pre-defined Profiles through In-Dialogue Learning

    Authors: Chuanqi Cheng, Quan Tu, Wei Wu, Shuo Shang, Cunli Mao, Zhengtao Yu, Rui Yan

    Abstract: Personalized dialogue systems have gained significant attention in recent years for their ability to generate responses in alignment with different personas. However, most existing approaches rely on pre-defined personal profiles, which are not only time-consuming and labor-intensive to create but also lack flexibility. We propose In-Dialogue Learning (IDL), a fine-tuning framework that enhances t…

    Submitted 12 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  50. arXiv:2403.02178  [pdf, other]

    cs.CL cs.AI cs.LG

    Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models

    Authors: Changyu Chen, Xiting Wang, Ting-En Lin, Ang Lv, Yuchuan Wu, Xin Gao, Ji-Rong Wen, Rui Yan, Yongbin Li

    Abstract: In reasoning tasks, even a minor error can cascade into inaccurate results, leading to suboptimal performance of large language models in such domains. Earlier fine-tuning approaches sought to mitigate this by leveraging more precise supervisory signals from human labeling, larger models, or self-sampling, although at a high cost. Conversely, we develop a method that avoids external resources, rel…

    Submitted 10 July, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted by ACL 2024