Showing 1–50 of 673 results for author: Du, Y

  1. arXiv:2407.13622  [pdf, other]

    cs.LG cs.AI

    Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error

    Authors: Ally Yalei Du, Lin F. Yang, Ruosong Wang

    Abstract: The recent work by Dong & Yang (2023) showed for misspecified sparse linear bandits, one can obtain an $O\left(ε\right)$-optimal policy using a polynomial number of samples when the sparsity is a constant, where $ε$ is the misspecification error. This result is in sharp contrast to misspecified linear bandits without sparsity, which require an exponential number of samples to get the same guarante…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 21 pages

  2. arXiv:2407.13168  [pdf, other]

    cs.AI cs.CL

    SciCode: A Research Coding Benchmark Curated by Scientists

    Authors: Minyang Tian, Luyu Gao, Shizhuo Dylan Zhang, Xinan Chen, Cunwei Fan, Xuefei Guo, Roland Haas, Pan Ji, Kittithat Krongchon, Yao Li, Shengyan Liu, Di Luo, Yutao Ma, Hao Tong, Kha Trinh, Chenyu Tian, Zihan Wang, Bohao Wu, Yanyu Xiong, Shengzhu Yin, Minhui Zhu, Kilian Lieret, Yanxin Lu, Genglin Liu, Yufeng Du, et al. (5 additional authors not shown)

    Abstract: Since language models (LMs) now outperform average humans on many challenging tasks, it has become increasingly difficult to develop challenging, high-quality, and realistic evaluations. We address this issue by examining LMs' capabilities to generate code for solving real scientific research problems. Incorporating input from scientists and AI researchers in 16 diverse natural science sub-fields,…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 25 pages, 9 figures, 7 tables

  3. arXiv:2407.12505  [pdf, other]

    cs.LG cs.AI cs.RO

    Subequivariant Reinforcement Learning in 3D Multi-Entity Physical Environments

    Authors: Runfa Chen, Ling Wang, Yu Du, Tianrui Xue, Fuchun Sun, Jianwei Zhang, Wenbing Huang

    Abstract: Learning policies for multi-entity systems in 3D environments is far more complicated than in single-entity scenarios, due to the exponential expansion of the global state space as the number of entities increases. One potential solution for alleviating the exponential complexity is dividing the global space into independent local views that are invariant to transformations including translations a…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ICML 2024

  4. arXiv:2407.12317  [pdf, other]

    cs.CV

    Out of Length Text Recognition with Sub-String Matching

    Authors: Yongkun Du, Zhineng Chen, Caiyan Jia, Xieping Gao, Yu-Gang Jiang

    Abstract: Scene Text Recognition (STR) methods have demonstrated robust performance in word-level text recognition. However, in applications the text image is sometimes long due to being detected with multiple horizontal words. This creates the need to build long text recognition models from readily available short word-level text datasets, which has been less studied previously. In this paper, we term this…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: preprint, 15 pages

  5. arXiv:2407.11333  [pdf, other]

    cs.RO cs.SD eess.AS

    Disentangled Acoustic Fields For Multimodal Physical Scene Understanding

    Authors: Jie Yin, Andrew Luo, Yilun Du, Anoop Cherian, Tim K. Marks, Jonathan Le Roux, Chuang Gan

    Abstract: We study the problem of multimodal physical scene understanding, where an embodied agent needs to find fallen objects by inferring object properties, direction, and distance of an impact sound source. Previous works adopt feed-forward neural networks to directly regress the variables from sound, leading to poor generalization and domain adaptation issues. In this paper, we illustrate that learning…

    Submitted 15 July, 2024; originally announced July 2024.

  6. arXiv:2407.06494  [pdf, other]

    cs.LG cs.AI

    A Generative Approach to Control Complex Physical Systems

    Authors: Long Wei, Peiyan Hu, Ruiqi Feng, Haodong Feng, Yixuan Du, Tao Zhang, Rui Wang, Yue Wang, Zhi-Ming Ma, Tailin Wu

    Abstract: Controlling the evolution of complex physical systems is a fundamental task across science and engineering. Classical techniques suffer from limited applicability or huge computational costs. On the other hand, recent deep learning and reinforcement learning-based approaches often struggle to optimize long-term control sequences under the constraints of system dynamics. In this work, we introduce…

    Submitted 8 July, 2024; originally announced July 2024.

  7. arXiv:2407.06169  [pdf, other]

    cs.RO cs.CV cs.LG

    Potential Based Diffusion Motion Planning

    Authors: Yunhao Luo, Chen Sun, Joshua B. Tenenbaum, Yilun Du

    Abstract: Effective motion planning in high dimensional spaces is a long-standing open problem in robotics. One class of traditional motion planning algorithms corresponds to potential-based motion planning. An advantage of potential based motion planning is composability -- different motion constraints can be easily combined by adding corresponding potentials. However, constructing motion paths from potent…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: ICML 2024. Project page and code at https://energy-based-model.github.io/potential-motion-plan/

  8. arXiv:2407.04842  [pdf, other]

    cs.CV cs.CL cs.LG

    MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

    Authors: Zhaorun Chen, Yichao Du, Zichen Wen, Yiyang Zhou, Chenhang Cui, Zhenzhen Weng, Haoqin Tu, Chaoqi Wang, Zhengwei Tong, Qinglan Huang, Canyu Chen, Qinghao Ye, Zhihong Zhu, Yuqing Zhang, Jiawei Zhou, Zhuokai Zhao, Rafael Rafailov, Chelsea Finn, Huaxiu Yao

    Abstract: While text-to-image models like DALLE-3 and Stable Diffusion are rapidly proliferating, they often encounter challenges such as hallucination, bias, and the production of unsafe, low-quality output. To effectively address these issues, it is crucial to align these models with desired behaviors based on feedback from a multimodal judge. Despite their significance, current multimodal judges frequent…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 42 pages, 13 figures, 33 tables

  9. arXiv:2407.04675  [pdf, other]

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) models are required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor…

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  10. arXiv:2407.03719  [pdf, other]

    cs.CV

    Relative Difficulty Distillation for Semantic Segmentation

    Authors: Dong Liang, Yue Sun, Yun Du, Songcan Chen, Sheng-Jun Huang

    Abstract: Current knowledge distillation (KD) methods primarily focus on transferring various structured knowledge and designing corresponding optimization goals to encourage the student network to imitate the output of the teacher network. However, introducing too many additional optimization objectives may lead to unstable training, such as gradient conflicts. Moreover, these methods ignored the guideline…

    Submitted 4 July, 2024; originally announced July 2024.

  11. arXiv:2407.03442  [pdf, other]

    cs.CV

    Fisher-aware Quantization for DETR Detectors with Critical-category Objectives

    Authors: Huanrui Yang, Yafeng Huang, Zhen Dong, Denis A Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Yuan Du, Kurt Keutzer, Shanghang Zhang

    Abstract: The impact of quantization on the overall performance of deep learning models is a well-studied problem. However, understanding and mitigating its effects on a more fine-grained level is still lacking, especially for harder tasks such as object detection with both classification and regression objectives. This work defines the performance for a subset of task-critical categories, i.e. the critical…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Poster presentation at the 2nd Workshop on Advancing Neural Network Training: Computational Efficiency, Scalability, and Resource Optimization (WANT@ICML 2024)

  12. arXiv:2407.02913  [pdf, other]

    cs.LG cs.AI eess.IV eess.SP math.NA

    SFC: Achieve Accurate Fast Convolution under Low-precision Arithmetic

    Authors: Liulu He, Yufei Zhao, Rui Gao, Yuan Du, Li Du

    Abstract: Fast convolution algorithms, including Winograd and FFT, can efficiently accelerate convolution operations in deep models. However, these algorithms depend on high-precision arithmetic to maintain inference accuracy, which conflicts with the model quantization. To resolve this conflict and further improve the efficiency of quantized convolution, we propose SFC, a new algebra transform for fast co…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: ICML 2024

  13. arXiv:2407.01392  [pdf, other]

    cs.LG cs.CV cs.RO

    Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

    Authors: Boyuan Chen, Diego Marti Monso, Yilun Du, Max Simchowitz, Russ Tedrake, Vincent Sitzmann

    Abstract: This paper presents Diffusion Forcing, a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels. We apply Diffusion Forcing to sequence generative modeling by training a causal next-token prediction model to generate one or several future tokens without fully diffusing past ones. Our approach is shown to combine the strengths of…

    Submitted 4 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Project website: https://boyuan.space/diffusion-forcing Code: https://github.com/buoyancy99/diffusion-forcing

  14. arXiv:2406.19298  [pdf, other]

    cs.CV cs.LG

    Compositional Image Decomposition with Diffusion Models

    Authors: Jocelin Su, Nan Liu, Yanbo Wang, Joshua B. Tenenbaum, Yilun Du

    Abstract: Given an image of a natural scene, we are able to quickly decompose it into a set of components such as objects, lighting, shadows, and foreground. We can then envision a scene where we combine certain components with those from other images, for instance a set of objects from our bedroom and animals from a zoo under the lighting conditions of a forest, even if we have never encountered such a sce…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: ICML 2024, Webpage: https://energy-based-model.github.io/decomp-diffusion

  15. arXiv:2406.18020  [pdf, other]

    cs.LG cs.AI physics.chem-ph

    MolFusion: Multimodal Fusion Learning for Molecular Representations via Multi-granularity Views

    Authors: Muzhen Cai, Sendong Zhao, Haochun Wang, Yanrui Du, Zewen Qiang, Bing Qin, Ting Liu

    Abstract: Artificial Intelligence predicts drug properties by encoding drug molecules, aiding in the rapid screening of candidates. Different molecular representations, such as SMILES and molecule graphs, contain complementary information for molecular encoding. Thus exploiting complementary information from different molecular representations is one of the research priorities in molecular encoding. Most ex…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures

  16. arXiv:2406.16976  [pdf, other]

    cs.NE cs.AI cs.LG physics.chem-ph

    Efficient Evolutionary Search Over Chemical Space with Large Language Models

    Authors: Haorui Wang, Marta Skreta, Cher-Tian Ser, Wenhao Gao, Lingkai Kong, Felix Strieth-Kalthoff, Chenru Duan, Yuchen Zhuang, Yue Yu, Yanqiao Zhu, Yuanqi Du, Alán Aspuru-Guzik, Kirill Neklyudov, Chao Zhang

    Abstract: Molecular discovery, when formulated as an optimization problem, presents significant computational challenges because optimization objectives can be non-differentiable. Evolutionary Algorithms (EAs), often used to optimize black-box objectives in molecular discovery, traverse chemical space by performing random mutations and crossovers, leading to a large number of expensive objective evaluations…

    Submitted 2 July, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  17. arXiv:2406.16754  [pdf, other]

    cs.LG cs.CV eess.IV

    The MRI Scanner as a Diagnostic: Image-less Active Sampling

    Authors: Yuning Du, Rohan Dharmakumar, Sotirios A. Tsaftaris

    Abstract: Despite the high diagnostic accuracy of Magnetic Resonance Imaging (MRI), using MRI as a Point-of-Care (POC) disease identification tool poses significant accessibility challenges due to the use of high magnetic field strength and lengthy acquisition times. We ask a simple question: Can we dynamically optimise acquired samples, at the patient level, according to an (automated) downstream decision…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted in MICCAI 2024

  18. arXiv:2406.16087  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Imperative Learning: A Self-supervised Neural-Symbolic Learning Framework for Robot Autonomy

    Authors: Chen Wang, Kaiyi Ji, Junyi Geng, Zhongqiang Ren, Taimeng Fu, Fan Yang, Yifan Guo, Haonan He, Xiangyu Chen, Zitong Zhan, Qiwei Du, Shaoshu Su, Bowen Li, Yuheng Qiu, Yi Du, Qihang Li, Yifan Yang, Xiao Lin, Zhipeng Zhao

    Abstract: Data-driven methods such as reinforcement and imitation learning have achieved remarkable success in robot autonomy. However, their data-centric nature still hinders them from generalizing well to ever-changing environments. Moreover, collecting large datasets for robotic tasks is often impractical and expensive. To overcome these challenges, we introduce a new self-supervised neural-symbolic (NeS…

    Submitted 6 July, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  19. arXiv:2406.16030  [pdf, other]

    cs.CL cs.AI

    Zero-Shot Cross-Lingual NER Using Phonemic Representations for Low-Resource Languages

    Authors: Jimin Sohn, Haeji Jung, Alex Cheng, Jooeon Kang, Yilin Du, David R. Mortensen

    Abstract: Existing zero-shot cross-lingual NER approaches require substantial prior knowledge of the target language, which is impractical for low-resource languages. In this paper, we propose a novel approach to NER using phonemic representation based on the International Phonetic Alphabet (IPA) to bridge the gap between representations of different languages. Our experiments show that our method significa…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures, 5 tables

  20. arXiv:2406.14129  [pdf, other]

    cs.CV cs.CL cs.MM

    Towards Event-oriented Long Video Understanding

    Authors: Yifan Du, Kun Zhou, Yuqi Huo, Yifan Li, Wayne Xin Zhao, Haoyu Lu, Zijia Zhao, Bingning Wang, Weipeng Chen, Ji-Rong Wen

    Abstract: With the rapid development of video Multimodal Large Language Models (MLLMs), numerous benchmarks have been proposed to assess their video understanding capability. However, due to the lack of rich events in the videos, these datasets may suffer from the short-cut bias that the answers can be deduced from a few frames, without the need to watch the entire video. To address this issue, we introduce…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Work on progress

  21. arXiv:2406.13948  [pdf, other]

    cs.AI cs.CL cs.LG

    CityGPT: Empowering Urban Spatial Cognition of Large Language Models

    Authors: Jie Feng, Yuwei Du, Tianhui Liu, Siqi Guo, Yuming Lin, Yong Li

    Abstract: Large language models (LLMs) with powerful language generation and reasoning capabilities have already achieved success in many domains, e.g., math and code generation. However, due to the lack of physical-world corpora and knowledge during training, they usually fail to solve many real-life tasks in the urban space. In this paper, we propose CityGPT, a systematic framework for enhancing the ca…

    Submitted 19 June, 2024; originally announced June 2024.

  22. arXiv:2406.13945  [pdf, other]

    cs.AI cs.CL cs.LG

    CityBench: Evaluating the Capabilities of Large Language Model as World Model

    Authors: Jie Feng, Jun Zhang, Junbo Yan, Xin Zhang, Tianjian Ouyang, Tianhui Liu, Yuwei Du, Siqi Guo, Yong Li

    Abstract: Large language models (LLMs) with powerful generalization ability have been widely used in many domains. A systematic and reliable evaluation of LLMs is a crucial step in their development and applications, especially for specific professional fields. In the urban domain, there have been some early explorations about the usability of LLMs, but a systematic and scalable evaluation benchmark is still…

    Submitted 19 June, 2024; originally announced June 2024.

  23. arXiv:2406.13271  [pdf, other]

    cs.CV

    Hierarchical IoU Tracking based on Interval

    Authors: Yunhao Du, Zhicheng Zhao, Fei Su

    Abstract: Multi-Object Tracking (MOT) aims to detect and associate all targets of given classes across frames. Current dominant solutions, e.g. ByteTrack and StrongSORT++, follow the hybrid pipeline, which first accomplishes most of the associations in an online manner, and then refines the results using offline tricks such as interpolation and global link. While this paradigm offers flexibility in application…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 7 pages, 3 figures

  24. arXiv:2406.12195  [pdf, other]

    quant-ph cs.LG

    Quantum Compiling with Reinforcement Learning on a Superconducting Processor

    Authors: Z. T. Wang, Qiuhao Chen, Yuxuan Du, Z. H. Yang, Xiaoxia Cai, Kaixuan Huang, Jingning Zhang, Kai Xu, Jun Du, Yinan Li, Yuling Jiao, Xingyao Wu, Wu Liu, Xiliang Lu, Huikai Xu, Yirong Jin, Ruixia Wang, Haifeng Yu, S. P. Zhao

    Abstract: To effectively implement quantum algorithms on noisy intermediate-scale quantum (NISQ) processors is a central task in modern quantum technology. NISQ processors feature tens to a few hundreds of noisy qubits with limited coherence times and gate operations with errors, so NISQ algorithms naturally require employing circuits of short lengths via quantum compilation. Here, we develop a reinforcemen…

    Submitted 17 June, 2024; originally announced June 2024.

  25. arXiv:2406.11776  [pdf, other]

    cs.CL

    Improving Multi-Agent Debate with Sparse Communication Topology

    Authors: Yunxuan Li, Yibing Du, Jiageng Zhang, Le Hou, Peter Grabowski, Yeqing Li, Eugene Ie

    Abstract: Multi-agent debate has proven effective in improving large language model quality for reasoning and factuality tasks. While various role-playing strategies in multi-agent debates have been explored, in terms of the communication among agents, existing approaches adopt a brute force algorithm -- each agent can communicate with all other agents. In this paper, we systematically investigate the effe…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 13 pages, 9 figures

  26. arXiv:2406.11546  [pdf, other]

    eess.AS cs.CL cs.SD

    GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement

    Authors: Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen

    Abstract: The evolution of speech technology has been spurred by the rapid increase in dataset sizes. Traditional speech models generally depend on a large amount of labeled training data, which is scarce for low-resource languages. This paper presents GigaSpeech 2, a large-scale, multi-domain, multilingual speech recognition corpus. It is designed for low-resource languages and does not rely on paired spee…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Under review

  27. arXiv:2406.11179  [pdf, other]

    cs.LG cs.AI

    Learning Iterative Reasoning through Energy Diffusion

    Authors: Yilun Du, Jiayuan Mao, Joshua B. Tenenbaum

    Abstract: We introduce iterative reasoning through energy diffusion (IRED), a novel framework for learning to reason for a variety of tasks by formulating reasoning and decision-making problems with energy-based optimization. IRED learns energy functions to represent the constraints between input conditions and desired outputs. After training, IRED adapts the number of optimization steps during inference ba…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: ICML 2024, website: https://energy-based-model.github.io/ired/

  28. arXiv:2406.09367  [pdf, other]

    cs.CV

    Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs

    Authors: Zijia Zhao, Haoyu Lu, Yuqi Huo, Yifan Du, Tongtian Yue, Longteng Guo, Bingning Wang, Weipeng Chen, Jing Liu

    Abstract: Video understanding is a crucial next step for multimodal large language models (MLLMs). To probe specific aspects of video understanding ability, existing video benchmarks typically require careful video selection based on the target capability, along with laborious annotation of query-response pairs to match the specific video content. This process is both challenging and resource-intensive. In…

    Submitted 13 June, 2024; originally announced June 2024.

  29. arXiv:2406.07098  [pdf, other]

    cs.IR cs.AI cs.DB

    Guiding Catalogue Enrichment with User Queries

    Authors: Yupei Du, Jacek Golebiowski, Philipp Schmidt, Ziawasch Abedjan

    Abstract: Techniques for knowledge graph (KG) enrichment have become increasingly crucial for commercial applications that rely on evolving product catalogues. However, because of the huge search space of potential enrichment, predictions from KG completion (KGC) methods suffer from low precision, making them unreliable for real-world catalogues. Moreover, candidate facts for enrichment have varied relevance…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: ECML PKDD 2024

  30. arXiv:2406.07006  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results

    Authors: Xin Jin, Chunle Guo, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Ruoqi Li, Chang Liu, Ziyi Wang, Yao Du, Jingjing Yang, Long Bao, Heng Sun, Xiangyu Kong, Xiaoxia Xing, Jinlong Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, Jingfan Tan, et al. (17 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Few-shot RAWImage Denoising Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  31. arXiv:2406.05954  [pdf, other]

    cs.AI cs.LG eess.SY

    Aligning Large Language Models with Representation Editing: A Control Perspective

    Authors: Lingkai Kong, Haorui Wang, Wenhao Mu, Yuanqi Du, Yuchen Zhuang, Yifei Zhou, Yue Song, Rongzhi Zhang, Kai Wang, Chao Zhang

    Abstract: Aligning large language models (LLMs) with human objectives is crucial for real-world applications. However, fine-tuning LLMs for alignment often suffers from unstable training and requires substantial computing resources. Test-time alignment techniques, such as prompting and guided decoding, do not modify the underlying model, and their performance remains dependent on the original model's capabi…

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: fix typos

  32. arXiv:2406.05343  [pdf, other]

    cs.AI cs.CL

    M3GIA: A Cognition Inspired Multilingual and Multimodal General Intelligence Ability Benchmark

    Authors: Wei Song, Yadong Li, Jianhua Xu, Guowei Wu, Lingfeng Ming, Kexin Yi, Weihua Luo, Houyi Li, Yi Du, Fangda Guo, Kaicheng Yu

    Abstract: As recent multi-modality large language models (MLLMs) have shown formidable proficiency on various complex tasks, there has been increasing attention on debating whether these models could eventually mirror human intelligence. However, existing benchmarks mainly focus on evaluating solely on task performance, such as the accuracy of identifying the attribute of an object. Combining well-developed…

    Submitted 14 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  33. arXiv:2406.04845  [pdf, other]

    cs.CL cs.AI cs.DC cs.LG cs.MA

    FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models

    Authors: Rui Ye, Rui Ge, Xinyu Zhu, Jingyi Chai, Yaxin Du, Yang Liu, Yanfeng Wang, Siheng Chen

    Abstract: Federated learning has enabled multiple parties to collaboratively train large language models without directly sharing their data (FedLLM). Following this training paradigm, the community has put massive efforts from diverse aspects including framework, performance, and privacy. However, an unpleasant fact is that there are currently no realistic datasets and benchmarks for FedLLM and previous wo…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 22 pages

  34. arXiv:2406.00497  [pdf, ps, other]

    cs.SD cs.AI cs.CL eess.AS

    Recent Advances in End-to-End Simultaneous Speech Translation

    Authors: Xiaoqian Liu, Guoqiang Hu, Yangfan Du, Erfeng He, YingFeng Luo, Chen Xu, Tong Xiao, Jingbo Zhu

    Abstract: Simultaneous speech translation (SimulST) is a demanding task that involves generating translations in real-time while continuously processing speech input. This paper offers a comprehensive overview of the recent developments in SimulST research, focusing on four major challenges. Firstly, the complexities associated with processing lengthy and continuous speech streams pose significant hurdles.…

    Submitted 1 June, 2024; originally announced June 2024.

  35. arXiv:2405.20018  [pdf, other]

    cs.MA cs.CL cs.LG

    Safe Multi-agent Reinforcement Learning with Natural Language Constraints

    Authors: Ziyan Wang, Meng Fang, Tristan Tomilin, Fei Fang, Yali Du

    Abstract: The role of natural language constraints in Safe Multi-agent Reinforcement Learning (MARL) is crucial, yet often overlooked. While Safe MARL has vast potential, especially in fields like robotics and autonomous vehicles, its full potential is limited by the need to define constraints in pre-designed mathematical terms, which requires extensive domain expertise and reinforcement learning knowledge,…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 23 pages, 6 figures

  36. arXiv:2405.19946  [pdf, other]

    cs.AI

    Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf

    Authors: Xuanfa Jin, Ziyan Wang, Yali Du, Meng Fang, Haifeng Zhang, Jun Wang

    Abstract: Communication is a fundamental aspect of human society, facilitating the exchange of information and beliefs among people. Despite the advancements in large language models (LLMs), recent agents built with these often neglect the control over discussion tactics, which are essential in communication scenarios and games. As a variant of the famous communication game Werewolf, One Night Ultimate Were…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 27 pages, 5 figures

  37. arXiv:2405.19667  [pdf, other]

    cs.LG cs.AI

    Reconciling Model Multiplicity for Downstream Decision Making

    Authors: Ally Yalei Du, Dung Daniel Ngo, Zhiwei Steven Wu

    Abstract: We consider the problem of model multiplicity in downstream decision-making, a setting where two predictive models of equivalent accuracy cannot agree on the best-response action for a downstream loss function. We show that even when the two predictive models approximately agree on their individual predictions almost everywhere, it is still possible for their induced best-response actions to diffe…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 16 pages main body, 6 figures

  38. arXiv:2405.17950  [pdf, other]

    cs.AI

    Self-Guiding Exploration for Combinatorial Problems

    Authors: Zangir Iklassov, Yali Du, Farkhad Akimov, Martin Takac

    Abstract: Large Language Models (LLMs) have become pivotal in addressing reasoning tasks across diverse domains, including arithmetic, commonsense, and symbolic reasoning. They utilize prompting techniques such as Exploration-of-Thought, Decomposition, and Refinement to effectively navigate and solve intricate tasks. Despite these advancements, the application of LLMs to Combinatorial Problems (CPs), known…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 22 pages

  39. arXiv:2405.17719  [pdf, other]

    cs.CV

    EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?

    Authors: Boshen Xu, Ziheng Wang, Yang Du, Zhinan Song, Sipeng Zheng, Qin Jin

    Abstract: Egocentric video-language pretraining is a crucial paradigm to advance the learning of egocentric hand-object interactions (EgoHOI). Despite the great success on existing testbeds, these benchmarks focus more on closed-set visual concepts or limited scenarios. Due to the occurrence of diverse EgoHOIs in the real world, we propose an open-vocabulary benchmark named EgoHOIBench to reveal the diminis…

    Submitted 3 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Code: https://github.com/xuboshen/EgoNCEpp

  40. arXiv:2405.17440  [pdf, other]

    cs.LG cs.AI cs.CL

    CataLM: Empowering Catalyst Design Through Large Language Models

    Authors: Ludi Wang, Xueqing Chen, Yi Du, Yuanchun Zhou, Yang Gao, Wenjuan Cui

    Abstract: The field of catalysis holds paramount importance in shaping the trajectory of sustainable development, prompting intensive research efforts to leverage artificial intelligence (AI) in catalyst design. Presently, the fine-tuning of open-source large language models (LLMs) has yielded significant breakthroughs across various domains such as biology and healthcare. Drawing inspiration from these adv…

    Submitted 12 May, 2024; originally announced May 2024.

  41. arXiv:2405.16486  [pdf, other]

    cs.CV cs.AI

    Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation

    Authors: Rongyu Zhang, Aosong Cheng, Yulin Luo, Gaole Dai, Huanrui Yang, Jiaming Liu, Ran Xu, Li Du, Yuan Du, Yanbing Jiang, Shanghang Zhang

    Abstract: Continual Test-Time Adaptation (CTTA), which aims to adapt the pre-trained model to ever-evolving target domains, emerges as an important task for vision models. As current vision models appear to be heavily biased towards texture, continuously adapting the model from one domain distribution to another can result in serious catastrophic forgetting. Drawing inspiration from the human visual system'…

    Submitted 26 May, 2024; originally announced May 2024.

  42. arXiv:2405.16133  [pdf, other]

    cs.SE cs.AI

    Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting

    Authors: Tong Ye, Yangkai Du, Tengfei Ma, Lingfei Wu, Xuhong Zhang, Shouling Ji, Wenhai Wang

    Abstract: Large Language Models (LLMs) have exhibited remarkable proficiency in generating code. However, the misuse of LLM-generated (Synthetic) code has prompted concerns within both educational and industrial domains, highlighting the imperative need for the development of synthetic code detectors. Existing methods for detecting LLM-generated content are primarily tailored for general text and often stru…

    Submitted 29 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

    Comments: Previously submitted to EMNLP2023

  43. arXiv:2405.14702  [pdf, other]

    cs.CV cs.AI

    G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models

    Authors: Pengyue Jia, Yiding Liu, Xiaopeng Li, Xiangyu Zhao, Yuhao Wang, Yantong Du, Xiao Han, Xuetao Wei, Shuaiqiang Wang, Dawei Yin

    Abstract: Worldwide geolocalization aims to locate the precise location at the coordinate level of photos taken anywhere on the Earth. It is very challenging due to 1) the difficulty of capturing subtle location-aware visual semantics, and 2) the heterogeneous geographical distribution of image data. As a result, existing studies have clear limitations when scaled to a worldwide context. They may easily con…

    Submitted 23 May, 2024; originally announced May 2024.

  44. arXiv:2405.14488  [pdf, other]

    cs.CL

    MoGU: A Framework for Enhancing Safety of Open-Sourced LLMs While Preserving Their Usability

    Authors: Yanrui Du, Sendong Zhao, Danyang Zhao, Ming Ma, Yuhan Chen, Liangyu Huo, Qing Yang, Dongliang Xu, Bing Qin

    Abstract: Large Language Models (LLMs) are increasingly deployed in various applications. As their usage grows, concerns regarding their safety are rising, especially in maintaining harmless responses when faced with malicious instructions. Many defense strategies have been developed to enhance the safety of LLMs. However, our research finds that existing defense strategies lead LLMs to predominantly adopt…

    Submitted 23 May, 2024; originally announced May 2024.

  45. arXiv:2405.14075  [pdf, other]

    cs.CL cs.AI cs.LG

    $T^2$ of Thoughts: Temperature Tree Elicits Reasoning in Large Language Models

    Authors: Chengkun Cai, Xu Zhao, Yucheng Du, Haoliang Liu, Lei Li

    Abstract: Large Language Models (LLMs) have emerged as powerful tools in artificial intelligence, especially in complex decision-making scenarios, but their static problem-solving strategies often limit their adaptability to dynamic environments. We explore the enhancement of reasoning capabilities in LLMs through Temperature Tree ($T^2$) prompting via Particle Swarm Optimization, termed as $T^2$ of Thought…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 10 pages, 5 figures

  46. arXiv:2405.12754  [pdf, other]

    astro-ph.SR cs.AI cs.LG physics.space-ph

    Neural Operator for Accelerating Coronal Magnetic Field Model

    Authors: Yutao Du, Qin Li, Raghav Gnanasambandam, Mengnan Du, Haimin Wang, Bo Shen

    Abstract: Studying the sun's outer atmosphere is challenging due to its complex magnetic fields impacting solar activities. Magnetohydrodynamics (MHD) simulations help model these interactions but are extremely time-consuming (usually on a scale of days). Our research applies the Fourier Neural Operator (FNO) to accelerate the coronal magnetic field modeling, specifically, the Bifrost MHD model. We apply Te…

    Submitted 26 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  47. arXiv:2405.11928  [pdf, other]

    cs.RO cs.AI

    "Set It Up!": Functional Object Arrangement with Compositional Generative Models

    Authors: Yiqing Xu, Jiayuan Mao, Yilun Du, Tomas Lozáno-Pérez, Leslie Pack Kaebling, David Hsu

    Abstract: This paper studies the challenge of developing robots capable of understanding under-specified instructions for creating functional object arrangements, such as "set up a dining table for two"; previous arrangement approaches have focused on much more explicit instructions, such as "put object A on the table." We introduce a framework, SetItUp, for learning to interpret under-specified instruction…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 10 pages main paper, 21 pages appendix, RSS 2024

  48. arXiv:2405.07518  [pdf, other]

    cs.AR cs.AI

    SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts

    Authors: Raghu Prabhakar, Ram Sivaramakrishnan, Darshan Gandhi, Yun Du, Mingran Wang, Xiangyu Song, Kejie Zhang, Tianren Gao, Angela Wang, Karen Li, Yongning Sheng, Joshua Brot, Denis Sokolov, Apurv Vivek, Calvin Leung, Arjun Sabnis, Jiayu Bai, Tuowen Zhao, Mark Gottscho, David Jackson, Mark Luttrell, Manish K. Shah, Edison Chen, Kaizhao Liang, Swayambhoo Jain , et al. (5 additional authors not shown)

    Abstract: Monolithic large language models (LLMs) like GPT-4 have paved the way for modern generative AI applications. Training, serving, and maintaining monolithic LLMs at scale, however, remains prohibitively expensive and challenging. The disproportionate increase in compute-to-memory ratio of modern AI accelerators has created a memory wall, necessitating new methods to deploy AI. Composition of Expert…

    Submitted 13 May, 2024; originally announced May 2024.

  49. arXiv:2405.07226  [pdf, other]

    quant-ph cs.AI cs.LG

    Separable Power of Classical and Quantum Learning Protocols Through the Lens of No-Free-Lunch Theorem

    Authors: Xinbiao Wang, Yuxuan Du, Kecheng Liu, Yong Luo, Bo Du, Dacheng Tao

    Abstract: The No-Free-Lunch (NFL) theorem, which quantifies problem- and data-independent generalization errors regardless of the optimization process, provides a foundational framework for comprehending diverse learning protocols' potential. Despite its significance, the establishment of the NFL theorem for quantum machine learning models remains largely unexplored, thereby overlooking broader insights int…

    Submitted 12 May, 2024; originally announced May 2024.

  50. arXiv:2405.06916  [pdf, other]

    cs.CV

    High-order Neighborhoods Know More: HyperGraph Learning Meets Source-free Unsupervised Domain Adaptation

    Authors: Jinkun Jiang, Qingxuan Lv, Yuezun Li, Yong Du, Sheng Chen, Hui Yu, Junyu Dong

    Abstract: Source-free Unsupervised Domain Adaptation (SFDA) aims to classify target samples by only accessing a pre-trained source model and unlabelled target samples. Since no source data is available, transferring the knowledge from the source domain to the target domain is challenging. Existing methods normally exploit the pair-wise relation among target samples and attempt to discover their correlations…

    Submitted 11 May, 2024; originally announced May 2024.