Showing 1–50 of 1,345 results for author: Wu, T

  1. arXiv:2407.08141  [pdf, ps, other]

    eess.SP

    A Framework of FAS-RIS Systems: Performance Analysis and Throughput Optimization

    Authors: Junteng Yao, Xiazhi Lai, Kangda Zhi, Tuo Wu, Ming Jin, Cunhua Pan, Maged Elkashlan, Chau Yuen, Kai-Kit Wong

    Abstract: In this paper, we investigate reconfigurable intelligent surface (RIS)-assisted communication systems which involve a fixed-antenna base station (BS) and a mobile user (MU) that is equipped with fluid antenna system (FAS). Specifically, the RIS is utilized to enable communication for the user whose direct link from the base station is blocked by obstacles. We propose a comprehensive framework that…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: submitted to IEEE journal for possible publication

  2. HAFormer: Unleashing the Power of Hierarchy-Aware Features for Lightweight Semantic Segmentation

    Authors: Guoan Xu, Wenjing Jia, Tao Wu, Ligeng Chen, Guangwei Gao

    Abstract: Both Convolutional Neural Networks (CNNs) and Transformers have shown great success in semantic segmentation tasks. Efforts have been made to integrate CNNs with Transformer models to capture both local and global context interactions. However, there is still room for enhancement, particularly when considering constraints on computational resources. In this paper, we introduce HAFormer, a model th…

    Submitted 10 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: 13 pages, 10 figures, 8 tables, IEEE Transactions on Image Processing

  3. arXiv:2407.07209  [pdf]

    cond-mat.mes-hall

    Electrical switching of spin-polarized light-emitting diodes based on a 2D CrI3/hBN/WSe2 heterostructure

    Authors: Jianchen Dang, Tongyao Wu, Shuohua Yan, Kenji Watanabe, Takashi Taniguchi, Hechang Lei, Xiao-Xiao Zhang

    Abstract: Spin-polarized light-emitting diodes (spin-LEDs) convert the electronic spin information to photon circular polarization, offering potential applications including spin amplification, optical communications, and advanced imaging. The conventional control of the emitted light's circular polarization requires a change in the external magnetic field, limiting the operation conditions of spin-LEDs. He…

    Submitted 9 July, 2024; originally announced July 2024.

  4. arXiv:2407.06494  [pdf, other]

    cs.LG cs.AI

    A Generative Approach to Control Complex Physical Systems

    Authors: Long Wei, Peiyan Hu, Ruiqi Feng, Haodong Feng, Yixuan Du, Tao Zhang, Rui Wang, Yue Wang, Zhi-Ming Ma, Tailin Wu

    Abstract: Controlling the evolution of complex physical systems is a fundamental task across science and engineering. Classical techniques suffer from limited applicability or huge computational costs. On the other hand, recent deep learning and reinforcement learning-based approaches often struggle to optimize long-term control sequences under the constraints of system dynamics. In this work, we introduce…

    Submitted 8 July, 2024; originally announced July 2024.

  5. arXiv:2407.06191  [pdf, other]

    cs.CV

    Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images

    Authors: Zhangyang Qi, Yunhan Yang, Mengchen Zhang, Long Xing, Xiaoyang Wu, Tong Wu, Dahua Lin, Xihui Liu, Jiaqi Wang, Hengshuang Zhao

    Abstract: Recent advances in 3D AIGC have shown promise in directly creating 3D objects from text and images, offering significant cost savings in animation and product design. However, detailed editing and customization of 3D assets remain a long-standing challenge. Specifically, 3D generation methods lack the ability to follow finely detailed instructions as precisely as their 2D image creation counterparts…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Project Page: https://tailor3d-2024.github.io/

  6. arXiv:2407.05643  [pdf, other]

    cs.IT eess.SP

    Spatial Non-Stationary Dual-Wideband Channel Estimation for XL-MIMO Systems

    Authors: Anzheng Tang, Jun-Bo Wang, Yijin Pan, Tuo Wu, Chuanwen Chang, Yijian Chen, Hongkang Yu, Maged Elkashlan

    Abstract: In this paper, we investigate the channel estimation problem for extremely large-scale multi-input and multi-output (XL-MIMO) systems, considering the spherical wavefront effect, spatially non-stationary (SnS) property, and dual-wideband effects. To accurately characterize the XL-MIMO channel, we first derive a novel spatial-and-frequency-domain channel model for XL-MIMO systems and carefully exam…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: This paper has been submitted to IEEE journal for possible publication

  7. arXiv:2407.05289  [pdf, other]

    cs.IT eess.SP

    DM-MIMO: Diffusion Models for Robust Semantic Communications over MIMO Channels

    Authors: Yiheng Duan, Tong Wu, Zhiyong Chen, Meixia Tao

    Abstract: This paper investigates robust semantic communications over multiple-input multiple-output (MIMO) fading channels. Current semantic communications over MIMO channels mainly focus on channel adaptive encoding and decoding, which lacks exploration of signal distribution. To leverage the potential of signal distribution in signal space denoising, we develop a diffusion model over MIMO channels (DM-MI…

    Submitted 7 July, 2024; originally announced July 2024.

  8. arXiv:2407.05013  [pdf, other]

    cs.CL

    Progress or Regress? Self-Improvement Reversal in Post-training

    Authors: Ting Wu, Xuefeng Li, Pengfei Liu

    Abstract: Self-improvement through post-training methods such as iterative preference learning has been acclaimed for enhancing the problem-solving capabilities (e.g., mathematical reasoning) of Large Language Models (LLMs) without human intervention. However, as exploration deepens, it becomes crucial to assess whether these improvements genuinely signify progress in solving more challenging problems or if…

    Submitted 6 July, 2024; originally announced July 2024.

  9. arXiv:2407.03204  [pdf, other]

    cs.CV

    Expressive Gaussian Human Avatars from Monocular RGB Video

    Authors: Hezhen Hu, Zhiwen Fan, Tianhao Wu, Yihan Xi, Seoyoung Lee, Georgios Pavlakos, Zhangyang Wang

    Abstract: Nuanced expressiveness, particularly through fine-grained hand and facial expressions, is pivotal for enhancing the realism and vitality of digital human representations. In this work, we focus on investigating the expressiveness of human avatars when learned from monocular RGB video, a setting that introduces new challenges in capturing and animating fine-grained details. To this end, we introduc…

    Submitted 3 July, 2024; originally announced July 2024.

  10. arXiv:2407.02233  [pdf, other]

    cs.CL cs.AI cs.LG

    Synthetic Multimodal Question Generation

    Authors: Ian Wu, Sravan Jayanthi, Vijay Viswanathan, Simon Rosenberg, Sina Pakazad, Tongshuang Wu, Graham Neubig

    Abstract: Multimodal Retrieval Augmented Generation (MMRAG) is a powerful approach to question-answering over multimodal documents. A key challenge with evaluating MMRAG is the paucity of high-quality datasets matching the question styles and modalities of interest. In light of this, we propose SMMQG, a synthetic data generation framework. SMMQG leverages interplay between a retriever, large language model…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Submitted to ARR June 2024

  11. arXiv:2407.01449  [pdf, other]

    cs.IR cs.CL cs.CV

    ColPali: Efficient Document Retrieval with Vision Language Models

    Authors: Manuel Faysse, Hugues Sibille, Tony Wu, Bilel Omrani, Gautier Viaud, Céline Hudelot, Pierre Colombo

    Abstract: Documents are visually rich structures that convey information through text, as well as tables, figures, page layouts, or fonts. While modern document retrieval systems exhibit strong performance on query-to-text matching, they struggle to exploit visual cues efficiently, hindering their performance on practical document retrieval applications such as Retrieval Augmented Generation. To benchmark c…

    Submitted 2 July, 2024; v1 submitted 27 June, 2024; originally announced July 2024.

    Comments: Under Review

  12. arXiv:2407.00304  [pdf, other]

    eess.SY

    A Review of Safe Reinforcement Learning Methods for Modern Power Systems

    Authors: Tong Su, Tong Wu, Junbo Zhao, Anna Scaglione, Le Xie

    Abstract: Due to the availability of more comprehensive measurement data in modern power systems, there has been significant interest in developing and applying reinforcement learning (RL) methods for operation and control. Conventional RL training is based on trial-and-error and reward feedback interaction with either a model-based simulated environment or a data-driven and model-free simulation environmen…

    Submitted 28 June, 2024; originally announced July 2024.

  13. arXiv:2407.00281  [pdf]

    cond-mat.str-el cond-mat.mes-hall

    Distinguishing Surface and Bulk Electromagnetism via Their Dynamics in an Intrinsic Magnetic Topological Insulator

    Authors: Khanh Duy Nguyen, Woojoo Lee, Jianchen Dang, Tongyao Wu, Gabriele Berruto, Chenhui Yan, Chi Ian Jess Ip, Haoran Lin, Qiang Gao, Seng Huat Lee, Binghai Yan, Chaoxing Liu, Zhiqiang Mao, Xiao-Xiao Zhang, Shuolong Yang

    Abstract: The indirect exchange interaction between local magnetic moments via surface electrons has been long predicted to bolster the surface ferromagnetism in magnetic topological insulators (MTIs), which facilitates the quantum anomalous Hall effect. This unconventional effect is critical to determining the operating temperatures of future topotronic devices. However, the experimental confirmation of th…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: 19 pages, 4 figures

  14. arXiv:2406.18665  [pdf, other]

    cs.LG cs.AI cs.CL

    RouteLLM: Learning to Route LLMs with Preference Data

    Authors: Isaac Ong, Amjad Almahairi, Vincent Wu, Wei-Lin Chiang, Tianhao Wu, Joseph E. Gonzalez, M Waleed Kadous, Ion Stoica

    Abstract: Large language models (LLMs) exhibit impressive capabilities across a wide range of tasks, yet the choice of which model to use often involves a trade-off between performance and cost. More powerful models, though effective, come with higher expenses, while less capable models are more cost-effective. To address this dilemma, we propose several efficient router models that dynamically select betwe…

    Submitted 1 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  15. arXiv:2406.17836  [pdf]

    cs.AI cs.CY math.HO physics.soc-ph

    A Moonshot for AI Oracles in the Sciences

    Authors: Bryan Kaiser, Tailin Wu, Maike Sonnewald, Colin Thackray, Skylar Callis

    Abstract: Nobel laureate Philip Anderson and Elihu Abrahams once stated that, "even if machines did contribute to normal science, we see no mechanism by which they could create a Kuhnian revolution and thereby establish a new physical law." In this Perspective, we draw upon insights from the philosophies of science and artificial intelligence (AI) to propose necessary conditions of precisely such a mechanis…

    Submitted 25 June, 2024; originally announced June 2024.

    Report number: LA-UR-23-31369

  16. arXiv:2406.17642  [pdf, other]

    cs.CL cs.AI

    Banishing LLM Hallucinations Requires Rethinking Generalization

    Authors: Johnny Li, Saksham Consul, Eda Zhou, James Wong, Naila Farooqui, Yuxin Ye, Nithyashree Manohar, Zhuxiaona Wei, Tian Wu, Ben Echols, Sharon Zhou, Gregory Diamos

    Abstract: Despite their powerful chat, coding, and reasoning abilities, Large Language Models (LLMs) frequently hallucinate. Conventional wisdom suggests that hallucinations are a consequence of a balance between creativity and factuality, which can be mitigated, but not eliminated, by grounding the LLM in external knowledge sources. Through extensive systematic experiments, we show that these traditional a…

    Submitted 25 June, 2024; originally announced June 2024.

  17. arXiv:2406.16990  [pdf, other]

    cs.SD cs.AI eess.AS

    AND: Audio Network Dissection for Interpreting Deep Acoustic Models

    Authors: Tung-Yu Wu, Yu-Xiang Lin, Tsui-Wei Weng

    Abstract: Neuron-level interpretations aim to explain network behaviors and properties by investigating neurons responsive to specific perceptual or structural input patterns. Although there is emerging work in the vision and language domains, none is explored for acoustic models. To bridge the gap, we introduce $\textit{AND}$, the first $\textbf{A}$udio $\textbf{N}$etwork $\textbf{D}$issection framework th…

    Submitted 26 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML'24

    Journal ref: Forty-first International Conference on Machine Learning (2024)

  18. arXiv:2406.16876  [pdf, other]

    eess.SP

    Near-Field Mobile Tracking: A Framework of Using XL-RIS Information

    Authors: Tuo Wu, Cunhua Pan, Kangda Zhi, Hong Ren, Maged Elkashlan, Chau Yuen

    Abstract: This paper introduces a novel mobile tracking framework leveraging the high-dimensional signal received from extremely large-scale (XL) reconfigurable intelligent surfaces (RIS). This received signal, named XL-RIS information, has a much larger data dimension and therefore offers a richer feature set compared to the traditional base station (BS) received signal, i.e., BS information, enabling more…

    Submitted 3 April, 2024; originally announced June 2024.

  19. arXiv:2406.16599  [pdf, ps, other]

    math.AC

    Further results on equivalence of multivariate polynomial matrices

    Authors: Jiancheng Guan, Jinwang Liu, Dongmei Li, Tao Wu

    Abstract: This paper investigates equivalence of square multivariate polynomial matrices with the determinant being some power of a univariate irreducible polynomial. We first generalized a global-local theorem of Vaserstein. Then we proved these matrices are equivalent to their Smith forms by the generalized global-local theorem.

    Submitted 24 June, 2024; originally announced June 2024.

  20. arXiv:2406.14909  [pdf, other]

    cs.LG cs.AI cs.CL

    MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression

    Authors: Tianyu Fu, Haofeng Huang, Xuefei Ning, Genghan Zhang, Boju Chen, Tianqi Wu, Hongyi Wang, Zixiao Huang, Shiyao Li, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: Sparse attention can effectively mitigate the significant memory and throughput demands of Large Language Models (LLMs) in long contexts. Existing methods typically employ a uniform sparse attention mask, applying the same sparse pattern across different attention heads and input lengths. However, this uniform approach fails to capture the diverse attention patterns inherent in LLMs, ignoring thei…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 10 pages

    ACM Class: I.2.7

  21. arXiv:2406.14300  [pdf, ps, other]

    math.DG

    The Connes-Chamseddine Hochschild cocycle and the noncommutative integral

    Authors: Tong Wu, Yong Wang

    Abstract: In [5], Connes and Chamseddine defined a Hochschild cocycle in the general framework of noncommutative geometry. They computed this Hochschild cocycle for the Dirac operator on 4-dimensional manifolds. We propose a way to study the Connes-Chamseddine Hochschild cocycle from the viewpoint of the noncommutative integral on 6-dimensional manifolds in this paper. We compute several interesting noncomm…

    Submitted 26 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  22. arXiv:2406.13943  [pdf, ps, other]

    cs.IT

    New QEC codes and EAQEC codes from repeated-root cyclic codes of length $2^rp^s$

    Authors: Lanqiang Li, Ziwen Cao, Tingting Wu, Li Liu

    Abstract: Let $p$ be an odd prime and $r,s,m$ be positive integers. In this study, we initiate our exploration by delving into the intricate structure of all repeated-root cyclic codes and their duals with a length of $2^rp^s$ over the finite field $\mathbb{F}_{p^m}$. Through the utilization of CSS and Steane's constructions, a series of new quantum error-correcting (QEC) codes are constructed with paramete…

    Submitted 19 June, 2024; originally announced June 2024.

    MSC Class: 94B15 (Primary) 94B05; 11T71 (Secondary)

  23. arXiv:2406.13920  [pdf, other]

    cs.LG cs.SI

    Explainable AI Security: Exploring Robustness of Graph Neural Networks to Adversarial Attacks

    Authors: Tao Wu, Canyixing Cui, Xingping Xian, Shaojie Qiao, Chao Wang, Lin Yuan, Shui Yu

    Abstract: Graph neural networks (GNNs) have achieved tremendous success, but recent studies have shown that GNNs are vulnerable to adversarial attacks, which significantly hinders their use in safety-critical scenarios. Therefore, the design of robust GNNs has attracted increasing attention. However, existing research has mainly been conducted via experimental trial and error, and thus far, there remains a…

    Submitted 19 June, 2024; originally announced June 2024.

  24. arXiv:2406.13890  [pdf, other]

    cs.CL cs.AI

    ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World

    Authors: Weixiang Yan, Haitian Liu, Tengxiao Wu, Qian Chen, Wen Wang, Haoyuan Chai, Jiayi Wang, Weishan Zhao, Yixin Zhang, Renjun Zhang, Li Zhu

    Abstract: LLMs have achieved significant performance progress in various NLP applications. However, LLMs still struggle to meet the strict requirements for accuracy and reliability in the medical field and face many challenges in clinical applications. Existing clinical diagnostic evaluation benchmarks for evaluating medical agents powered by LLMs have severe limitations. Firstly, most existing medical eval…

    Submitted 19 June, 2024; originally announced June 2024.

  25. arXiv:2406.13499  [pdf, other]

    cs.SI cs.LG

    GraphMU: Repairing Robustness of Graph Neural Networks via Machine Unlearning

    Authors: Tao Wu, Xinwen Cao, Chao Wang, Shaojie Qiao, Xingping Xian, Lin Yuan, Canyixing Cui, Yanbing Liu

    Abstract: Graph Neural Networks (GNNs) have demonstrated significant application potential in various fields. However, GNNs are still vulnerable to adversarial attacks. Numerous adversarial defense methods on GNNs are proposed to address the problem of adversarial attacks. However, these methods can only serve as a defense before poisoning, but cannot repair a poisoned GNN. Therefore, there is an urgent need…

    Submitted 19 June, 2024; originally announced June 2024.

  26. arXiv:2406.13444  [pdf, other]

    cs.CL cs.CV

    VDebugger: Harnessing Execution Feedback for Debugging Visual Programs

    Authors: Xueqing Wu, Zongyu Lin, Songyan Zhao, Te-Lin Wu, Pan Lu, Nanyun Peng, Kai-Wei Chang

    Abstract: Visual programs are executable code generated by large language models to address visual reasoning problems. They decompose complex questions into multiple reasoning steps and invoke specialized models for each step to solve the problems. However, these programs are prone to logic errors, with our preliminary evaluation showing that 58% of the total errors are caused by program logic errors. Debug…

    Submitted 27 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: update reference

  27. arXiv:2406.12871  [pdf, other]

    math.RA

    Weighted differential ($q$-tri)dendriform algebras

    Authors: Yuanyuan Zhang, Huhu Zhang, Tingzeng Wu, Xing Gao

    Abstract: In this paper, we first introduce a weighted derivation on algebras over an operad $\cal P$, and prove that for the free $\cal P$-algebra, its weighted derivation is determined by the restriction on the generators. As applications, we propose the concept of weighted differential ($q$-tri)dendriform algebras and study some basic properties of them. Then Novikov-(tri)dendriform algebras are initiate…

    Submitted 28 April, 2024; originally announced June 2024.

    Comments: 26 pages. arXiv admin note: substantial text overlap with arXiv:2305.19609

    MSC Class: 16W99; 16S10; 13P10; 08B20;

  28. arXiv:2406.12753  [pdf, other]

    cs.CL cs.AI

    OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

    Authors: Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, Yikai Zhang, Yuqing Yang, Ting Wu, Binjie Wang, Shichao Sun, Yang Xiao, Yiyuan Li, Fan Zhou, Steffi Chern, Yiwei Qin, Yan Ma, Jiadi Su, Yixiu Liu, Yuxiang Zheng, Shaoting Zhang, et al. (3 additional authors not shown)

    Abstract: The evolution of Artificial Intelligence (AI) has been significantly accelerated by advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), gradually showcasing potential cognitive reasoning abilities in problem-solving and scientific discovery (i.e., AI4Science) once exclusive to human intellect. To comprehensively evaluate current models' performance in cognitive reasoni…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 44 pages

  29. arXiv:2406.12373  [pdf, other]

    cs.CL cs.AI cs.LG

    WebCanvas: Benchmarking Web Agents in Online Environments

    Authors: Yichen Pan, Dehan Kong, Sida Zhou, Cheng Cui, Yifei Leng, Bing Jiang, Hangyu Liu, Yanyi Shang, Shuyan Zhou, Tongshuang Wu, Zhengyang Wu

    Abstract: For web agents to be practically useful, they must adapt to the continuously evolving web environment characterized by frequent updates to user interfaces and content. However, most existing benchmarks only capture the static aspects of the web. To bridge this gap, we introduce WebCanvas, an innovative online evaluation framework for web agents that effectively addresses the dynamic nature of web…

    Submitted 27 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: Our platform, tool and dataset are publicly available at https://www.imean.ai/web-canvas/ and https://huggingface.co/datasets/iMeanAI/Mind2Web-Live/

    MSC Class: 68T50; ACM Class: I.2.7

  30. arXiv:2406.11939  [pdf, other]

    cs.LG cs.AI cs.CL

    From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline

    Authors: Tianle Li, Wei-Lin Chiang, Evan Frick, Lisa Dunlap, Tianhao Wu, Banghua Zhu, Joseph E. Gonzalez, Ion Stoica

    Abstract: The rapid evolution of language models has necessitated the development of more challenging benchmarks. Current static benchmarks often struggle to consistently distinguish between the capabilities of different models and fail to align with real-world user preferences. On the other hand, live crowd-sourced platforms like the Chatbot Arena collect a wide range of natural prompts and user feedback.…

    Submitted 17 June, 2024; originally announced June 2024.

  31. arXiv:2406.11909  [pdf, other]

    cs.LG cs.AI

    Mixture-of-Subspaces in Low-Rank Adaptation

    Authors: Taiqiang Wu, Jiahao Wang, Zhe Zhao, Ngai Wong

    Abstract: In this paper, we introduce a subspace-inspired Low-Rank Adaptation (LoRA) method, which is computationally efficient, easy to implement, and readily applicable to large language, multimodal, and diffusion models. Initially, we equivalently decompose the weights of LoRA into two subspaces, and find that simply mixing them can enhance performance. To study such a phenomenon, we revisit it through a…

    Submitted 5 July, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: work in progress

  32. arXiv:2406.11739  [pdf, other]

    cs.CV

    V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

    Authors: Jiaqi Wang, Yuhang Zang, Pan Zhang, Tao Chu, Yuhang Cao, Zeyi Sun, Ziyu Liu, Xiaoyi Dong, Tong Wu, Dahua Lin, Zeming Chen, Zhi Wang, Lingchen Meng, Wenhao Yao, Jianwei Yang, Sihong Wu, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Peixi Wu, Bosong Chai, Xuan Nie, Longquan Yan, Zeyu Wang, Qifan Zhou, et al. (9 additional authors not shown)

    Abstract: Detecting objects in real-world scenes is a complex task due to various challenges, including the vast range of object categories, and potential encounters with previously unknown or unseen objects. The challenges necessitate the development of public benchmarks and challenges to advance the field of object detection. Inspired by the success of previous COCO and LVIS Challenges, we organize the V3…

    Submitted 17 June, 2024; originally announced June 2024.

  33. arXiv:2406.11451  [pdf, other]

    cs.CV

    MedThink: Inducing Medical Large-scale Visual Language Models to Hallucinate Less by Thinking More

    Authors: Yue Jiang, Jiawei Chen, Dingkang Yang, Mingcheng Li, Shunli Wang, Tong Wu, Ke Li, Lihua Zhang

    Abstract: When Large Vision Language Models (LVLMs) are applied to multimodal medical generative tasks, they suffer from significant model hallucination issues. This severely impairs the model's generative accuracy, making it challenging for LVLMs to be implemented in real-world medical scenarios to assist doctors in diagnosis. Enhancing the training data for downstream medical generative tasks is an effect…

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  34. arXiv:2406.10558  [pdf, other]

    cs.RO

    A Hybrid Controller Design for Human-Assistive Piloting of an Underactuated Blimp

    Authors: Wugang Meng, Tianfu Wu, Qiuyang Tao, Fumin Zhang

    Abstract: This paper introduces a novel solution to the manual control challenge for indoor blimps. The problem's complexity arises from the conflicting demands of executing human commands while maintaining stability through automatic control for underactuated robots. To tackle this challenge, we introduce an assisted piloting hybrid controller with a preemptive mechanism that seamlessly switches between…

    Submitted 15 June, 2024; originally announced June 2024.

  35. arXiv:2406.10556  [pdf, other]

    cs.IT cs.AI

    Multi-User Semantic Fusion for Semantic Communications over Degraded Broadcast Channels

    Authors: Tong Wu, Zhiyong Chen, Meixia Tao, Bin Xia, Wenjun Zhang

    Abstract: Degraded broadcast channels (DBC) are a typical multiuser communication scenario, yet semantic communications over DBC still lack in-depth research. In this paper, we design a semantic communications approach based on multi-user semantic fusion for wireless image transmission over DBC. In the proposed method, the transmitter extracts semantic features for two users separately. It then effectively fuse…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: accepted by China Communications

  36. arXiv:2406.10185  [pdf, other]

    cs.CV

    Detecting and Evaluating Medical Hallucinations in Large Vision Language Models

    Authors: Jiawei Chen, Dingkang Yang, Tong Wu, Yue Jiang, Xiaolu Hou, Mingcheng Li, Shunli Wang, Dongling Xiao, Ke Li, Lihua Zhang

    Abstract: Large Vision Language Models (LVLMs) are increasingly integral to healthcare applications, including medical visual question answering and imaging report generation. While these models inherit the robust capabilities of foundational Large Language Models (LLMs), they also inherit susceptibility to hallucinations, a significant concern in high-stakes medical contexts where the margin for error is mi…

    Submitted 14 June, 2024; originally announced June 2024.

  37. arXiv:2406.08160  [pdf, other]

    cs.RO

    Chemistry3D: Robotic Interaction Benchmark for Chemistry Experiments

    Authors: Shoujie Li, Yan Huang, Changqing Guo, Tong Wu, Jiawei Zhang, Linrui Zhang, Wenbo Ding

    Abstract: The advent of simulation engines has revolutionized learning and operational efficiency for robots, offering cost-effective and swift pipelines. However, the lack of a universal simulation platform tailored for chemical scenarios impedes progress in robotic manipulation and visualization of reaction processes. Addressing this void, we present Chemistry3D, an innovative toolkit that integrates exte…

    Submitted 12 June, 2024; originally announced June 2024.

  38. arXiv:2406.07411  [pdf, other]

    cs.SE cs.CL

    VersiCode: Towards Version-controllable Code Generation

    Authors: Tongtong Wu, Weigang Wu, Xingyu Wang, Kang Xu, Suyu Ma, Bo Jiang, Ping Yang, Zhenchang Xing, Yuan-Fang Li, Gholamreza Haffari

    Abstract: Significant research has focused on improving the performance of large language models on code-related tasks due to their practical importance. Although performance is typically evaluated using public benchmark datasets, the existing datasets do not account for the concept of \emph{version}, which is crucial in professional software development. In this paper, we introduce VersiCode, the first comp…

    Submitted 11 June, 2024; originally announced June 2024.

  39. arXiv:2406.07138  [pdf, other]

    cs.CL

    Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement

    Authors: Tong Wu, Yanpeng Zhao, Zilong Zheng

    Abstract: Recently, many methods have been developed to extend the context length of pre-trained large language models (LLMs), but they often require fine-tuning at the target length ($\gg4K$) and struggle to effectively utilize information from the middle part of the context. To address these issues, we propose $\textbf{C}$ontinuity-$\textbf{R}$elativity ind$\textbf{E}$xing with g$\textbf{A}$ussian…

    Submitted 11 June, 2024; originally announced June 2024.

  40. arXiv:2406.06230  [pdf, other]

    cs.CV

    UEMM-Air: A Synthetic Multi-modal Dataset for Unmanned Aerial Vehicle Object Detection

    Authors: Fan Liu, Liang Yao, Shengxiang Xu, Chuanyi Zhang, Xinlei Zhang, Ting Wu

    Abstract: The development of multi-modal object detection for Unmanned Aerial Vehicles (UAVs) typically relies on a large amount of pixel-aligned multi-modal image data. However, existing datasets face challenges such as limited modalities, high construction costs, and imprecise annotations. To this end, we propose a synthetic multi-modal UAV-based object detection dataset, UEMM-Air. Specifically, we simulate…

    Submitted 10 June, 2024; originally announced June 2024.

  41. arXiv:2406.05338  [pdf, other]

    cs.CV

    MotionClone: Training-Free Motion Cloning for Controllable Video Generation

    Authors: Pengyang Ling, Jiazi Bu, Pan Zhang, Xiaoyi Dong, Yuhang Zang, Tong Wu, Huaian Chen, Jiaqi Wang, Yi Jin

    Abstract: Motion-based controllable text-to-video generation involves motions to control the video generation. Previous methods typically require the training of models to encode motion cues or the fine-tuning of video diffusion models. However, these approaches often result in suboptimal motion generation when applied outside the trained domain. In this work, we propose MotionClone, a training-free framewo…

    Submitted 28 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: 17 pages, 12 figures, https://bujiazi.github.io/motionclone.github.io/

  42. arXiv:2406.04837  [pdf, other]

    cond-mat.supr-con cond-mat.str-el

    Normal and superconducting properties of La$_3$Ni$_2$O$_7$

    Authors: Meng Wang, Hai-Hu Wen, Tao Wu, Dao-Xin Yao, Tao Xiang

    Abstract: This review provides a comprehensive overview of current research on the structural, electronic, and magnetic characteristics of the recently discovered high-temperature superconductor La$_3$Ni$_2$O$_7$ under high pressures. We present the experimental results for synthesizing and characterizing this material, derived from measurements of transport, thermodynamics, and various spectroscopic techni…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 15 pages, 11 figures

  43. arXiv:2406.04375  [pdf, other]

    cs.SE

    Verifying components of Arm(R) Confidential Computing Architecture with ESBMC

    Authors: Tong Wu, Shale Xiong, Edoardo Manino, Gareth Stockwell, Lucas C. Cordeiro

    Abstract: Realm Management Monitor (RMM) is an essential firmware component within the recent Arm Confidential Computing Architecture (Arm CCA). Previous work applies formal techniques to verify the specification and prototype reference implementation of RMM. However, relying solely on a single verification tool may lead to the oversight of certain bugs or vulnerabilities. This paper discusses the applicati…

    Submitted 5 June, 2024; originally announced June 2024.

  44. arXiv:2406.03090  [pdf, other]

    math.OC

    The Optimal Production Transport: Model and Algorithm

    Authors: Jie Fan, Tianhao Wu, Hao Wu

    Abstract: In this paper, we propose the optimal production transport model, which is an extension of the classical optimal transport model. We observe in economics, the production of the factories can always be adjusted within a certain range, while the classical optimal transport does not take this situation into account. Therefore, differing from the classical optimal transport, one of the marginals is al…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 28 pages, 4 figures

  45. arXiv:2406.01425  [pdf, other]

    cs.CV

    Sensitivity-Informed Augmentation for Robust Segmentation

    Authors: Laura Zheng, Wenjie Wei, Tony Wu, Jacob Clements, Shreelekha Revankar, Andre Harrison, Yu Shen, Ming C. Lin

    Abstract: Segmentation is an integral module in many visual computing applications such as virtual try-on, medical imaging, autonomous driving, and agricultural automation. These applications often involve either widespread consumer use or highly variable environments, both of which can degrade the quality of visual sensor data, whether from a common mobile phone or an expensive satellite imaging camera. In…

    Submitted 16 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: 10 pages

  46. arXiv:2406.00312  [pdf, other]

    cs.RO

    NuRF: Nudging the Particle Filter in Radiance Fields for Robot Visual Localization

    Authors: Wugang Meng, Tianfu Wu, Huan Yin, Fumin Zhang

    Abstract: Can we localize a robot in radiance fields only using monocular vision? This study presents NuRF, a nudged particle filter framework for 6-DoF robot visual localization in radiance fields. NuRF sets anchors in SE(3) to leverage visual place recognition, which provides image comparisons to guide the sampling process. This guidance could improve the convergence and robustness of particle filters for…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 11 pages, 14 figures

  47. arXiv:2406.00093  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG cs.MM

    Bootstrap3D: Improving 3D Content Creation with Synthetic Data

    Authors: Zeyi Sun, Tong Wu, Pan Zhang, Yuhang Zang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

    Abstract: Recent years have witnessed remarkable progress in multi-view diffusion models for 3D content creation. However, there remains a significant gap in image quality and prompt-following ability compared to 2D diffusion models. A critical bottleneck is the scarcity of high-quality 3D assets with detailed captions. To address this challenge, we propose Bootstrap3D, a novel framework that automatically…

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: Project Page: https://sunzey.github.io/Bootstrap3D/

  48. arXiv:2405.20883  [pdf, other]

    cs.RO

    Scalable Distance-based Multi-Agent Relative State Estimation via Block Multiconvex Optimization

    Authors: Tianyue Wu, Gongye Zaitian, Qianhao Wang, Fei Gao

    Abstract: This paper explores the distance-based relative state estimation problem in large-scale systems, which is hard to solve effectively due to its high-dimensionality and non-convexity. In this paper, we alleviate this inherent hardness to simultaneously achieve scalability and robustness of inference on this problem. Our idea is launched from a universal geometric formulation, called \emph{generalize…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: To appear in Robotics: Science and Systems 2024

  49. arXiv:2405.20685  [pdf, other]

    cs.LG cs.CV

    Enhancing Counterfactual Image Generation Using Mahalanobis Distance with Distribution Preferences in Feature Space

    Authors: Yukai Zhang, Ao Xu, Zihao Li, Tieru Wu

    Abstract: In the realm of Artificial Intelligence (AI), the importance of Explainable Artificial Intelligence (XAI) is increasingly recognized, particularly as AI models become more integral to our lives. One notable single-instance XAI approach is counterfactual explanation, which aids users in comprehending a model's decisions and offers guidance on altering these decisions. Specifically in the context of…

    Submitted 31 May, 2024; originally announced May 2024.

  50. arXiv:2405.20664  [pdf, other]

    cs.LG

    Weak Robust Compatibility Between Learning Algorithms and Counterfactual Explanation Generation Algorithms

    Authors: Ao Xu, Tieru Wu

    Abstract: Counterfactual explanation generation is a powerful method for Explainable Artificial Intelligence. It can help users understand why machine learning models make specific decisions, and how to change those decisions. Evaluating the robustness of counterfactual explanation algorithms is therefore crucial. Previous literature has widely studied the robustness based on the perturbation of input insta…

    Submitted 31 May, 2024; originally announced May 2024.