Showing 1–50 of 1,634 results for author: Xue, D

  1. arXiv:2407.09068  [pdf, other]

    cs.RO

    Fast and Accurate Multi-Agent Trajectory Prediction For Crowded Unknown Scenes

    Authors: Xiuye Tao, Huiping Li, Bin Liang, Yang Shi, Demin Xu

    Abstract: This paper studies the problem of multi-agent trajectory prediction in crowded unknown environments. A novel energy function optimization-based framework is proposed to generate prediction trajectories. Firstly, a new energy function is designed for easier optimization. Secondly, an online optimization pipeline for calculating parameters and agents' velocities is developed. In this pipeline, we fi…

    Submitted 12 July, 2024; originally announced July 2024.

  2. arXiv:2407.07615  [pdf, other]

    math.OC eess.SY

    Finite Control Set Model Predictive Control with Limit Cycle Stability Guarantees

    Authors: Duo Xu, Mircea Lazar

    Abstract: This paper considers the design of finite control set model predictive control (FCS-MPC) for discrete-time switched affine systems. Existing FCS-MPC methods typically pursue practical stability guarantees, which ensure convergence to a bounded invariant set that contains a desired steady state. As such, current FCS-MPC methods result in unpredictable steady-state behavior due to arbitrary switchin…

    Submitted 10 July, 2024; originally announced July 2024.

  3. arXiv:2407.06942  [pdf, other]

    cs.CR

    An Improved Two-Step Attack on CRYSTALS-Kyber

    Authors: Kai Wang, Dejun Xu, Jing Tian

    Abstract: After three rounds of strict post-quantum cryptography (PQC) evaluations conducted by the National Institute of Standards and Technology (NIST), CRYSTALS-Kyber has been selected and drafted for standardization as of mid-2022. It has become urgent to further evaluate Kyber's physical security for the upcoming deployment phase. In this paper, we present an improved two-step attack on…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Submitted to ICCAD in May 2024

  4. arXiv:2407.05858  [pdf, other]

    cs.AI

    Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU

    Authors: Daliang Xu, Hao Zhang, Liming Yang, Ruiqi Liu, Gang Huang, Mengwei Xu, Xuanzhe Liu

    Abstract: On-device large language models (LLMs) are catalyzing novel mobile applications such as UI task automation and personalized email auto-reply, without giving away users' private data. However, on-device LLMs still suffer from unacceptably long inference latency, especially the time to first token (prefill stage) due to the need of long context for accurate, personalized content generation, as well…

    Submitted 8 July, 2024; originally announced July 2024.

  5. arXiv:2407.03307  [pdf, other]

    eess.IV cs.CV

    HoloHisto: End-to-end Gigapixel WSI Segmentation with 4K Resolution Sequential Tokenization

    Authors: Yucheng Tang, Yufan He, Vishwesh Nath, Pengfei Guo, Ruining Deng, Tianyuan Yao, Quan Liu, Can Cui, Mengmeng Yin, Ziyue Xu, Holger Roth, Daguang Xu, Haichun Yang, Yuankai Huo

    Abstract: In digital pathology, the traditional method for deep learning-based image segmentation typically involves a two-stage process: initially segmenting high-resolution whole slide images (WSI) into smaller patches (e.g., 256x256, 512x512, 1024x1024) and subsequently reconstructing them to their original scale. This method often struggles to capture the complex details and vast scope of WSIs. In this…

    Submitted 3 July, 2024; originally announced July 2024.

  6. arXiv:2407.02877  [pdf, other]

    cs.IT eess.SP

    Resource Allocation Design for Next-Generation Multiple Access: A Tutorial Overview

    Authors: Zhiqiang Wei, Dongfang Xu, Shuangyang Li, Shenghui Song, Derrick Wing Kwan Ng, Giuseppe Caire

    Abstract: Multiple access is the cornerstone technology for each generation of wireless cellular networks and resource allocation design plays a crucial role in multiple access. In this paper, we present a comprehensive tutorial overview for junior researchers in this field, aiming to offer a foundational guide for resource allocation design in the context of next-generation multiple access (NGMA). Initiall…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 69 pages, 10 figures, 5 tables

  7. arXiv:2407.02508  [pdf, other]

    cs.RO cs.AI cs.LG

    Sample-efficient Imitative Multi-token Decision Transformer for Generalizable Real World Driving

    Authors: Hang Zhou, Dan Xu, Yiding Ji

    Abstract: Reinforcement learning via sequence modeling has shown remarkable promise in autonomous systems, harnessing the power of offline datasets to make informed decisions in simulated environments. However, the full potential of such methods in complex dynamic environments remains to be discovered. In the autonomous driving domain, learning-based agents face significant challenges when transferring knowledge…

    Submitted 18 June, 2024; originally announced July 2024.

  8. arXiv:2407.02452  [pdf, other]

    cs.CR

    A Hardware-Friendly Shuffling Countermeasure Against Side-Channel Attacks for Kyber

    Authors: Dejun Xu, Kai Wang, Jing Tian

    Abstract: CRYSTALS-Kyber (a.k.a. Kyber) has been drafted to be standardized as the only key encapsulation mechanism (KEM) scheme by the National Institute of Standards and Technology (NIST) to withstand attacks by large-scale quantum computers. However, side-channel attacks (SCAs) on its implementation still need to be carefully considered for the upcoming migration. In this brief, we propose a secure and…

    Submitted 2 July, 2024; originally announced July 2024.

  9. arXiv:2407.01548  [pdf, ps, other]

    q-bio.OT cs.AI cs.LG

    From Cognition to Computation: A Comparative Review of Human Attention and Transformer Architectures

    Authors: Minglu Zhao, Dehong Xu, Tao Gao

    Abstract: Attention is a cornerstone of human cognition that facilitates the efficient extraction of information in everyday life. Recent developments in artificial intelligence like the Transformer architecture also incorporate the idea of attention in model designs. However, despite the shared fundamental principle of selectively attending to information, human attention and the Transformer model display…

    Submitted 25 April, 2024; originally announced July 2024.

  10. arXiv:2407.00286  [pdf, other]

    cs.NI cs.LG

    Digital Twin-Assisted Data-Driven Optimization for Reliable Edge Caching in Wireless Networks

    Authors: Zifan Zhang, Yuchen Liu, Zhiyuan Peng, Mingzhe Chen, Dongkuan Xu, Shuguang Cui

    Abstract: Optimizing edge caching is crucial for the advancement of next-generation (nextG) wireless networks, ensuring high-speed and low-latency services for mobile users. Existing data-driven optimization approaches often lack awareness of the distribution of random data variables and focus solely on optimizing cache hit rates, neglecting potential reliability concerns, such as base station overload and…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Accepted by IEEE Journal on Selected Areas in Communications (JSAC)

  11. arXiv:2407.00031  [pdf, other]

    cs.DC cs.SE

    Supercharging Federated Learning with Flower and NVIDIA FLARE

    Authors: Holger R. Roth, Daniel J. Beutel, Yan Cheng, Javier Fernandez Marques, Heng Pan, Chester Chen, Zhihong Zhang, Yuhong Wen, Sean Yang, Isaac Yang, Yuan-Ting Hsieh, Ziyue Xu, Daguang Xu, Nicholas D. Lane, Andrew Feng

    Abstract: Several open-source systems, such as Flower and NVIDIA FLARE, have been developed in recent years while focusing on different aspects of federated learning (FL). Flower is dedicated to implementing a cohesive approach to FL, analytics, and evaluation. Over time, Flower has cultivated extensive strategies and algorithms tailored for FL application development, fostering a vibrant FL community in re…

    Submitted 21 May, 2024; originally announced July 2024.

  12. arXiv:2406.19699  [pdf, other]

    cond-mat.mes-hall

    Tunable corner-like modes in generalized quadrupole topological insulator

    Authors: Rui Chen, Bin Zhou, Dong-Hui Xu

    Abstract: Higher-order topological insulators harbor unique corner modes that hold immense potential for applications in information storage. However, the practical manipulation of these states has been constrained by the fixed positions and energies of conventional corner modes. In this work, we present a theoretical framework for generating topologically protected corner-like modes in higher-order topolog…

    Submitted 28 June, 2024; originally announced June 2024.

  13. arXiv:2406.19421  [pdf, other]

    hep-ex physics.ins-det

    The Belle II Detector Upgrades Framework Conceptual Design Report

    Authors: H. Aihara, A. Aloisio, D. P. Auguste, M. Aversano, M. Babeluk, S. Bahinipati, Sw. Banerjee, M. Barbero, J. Baudot, A. Beaubien, F. Becherer, T. Bergauer, F. U. Bernlochner, V. Bertacchi, G. Bertolone, C. Bespin, M. Bessner, S. Bettarini, A. J. Bevan, B. Bhuyan, M. Bona, J. F. Bonis, J. Borah, F. Bosi, R. Boudagga, et al. (186 additional authors not shown)

    Abstract: We describe the planned near-term and potential longer-term upgrades of the Belle II detector at the SuperKEKB electron-positron collider operating at the KEK laboratory in Tsukuba, Japan. These upgrades will allow increasingly sensitive searches for possible new physics beyond the Standard Model in flavor, tau, electroweak and dark sector physics that are both complementary to and competitive wit…

    Submitted 4 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: Editor: F. Forti; 170 pages

    Report number: KEK-REPORT-2024-1, BELLE2-REPORT-2024-042

  14. arXiv:2406.18754  [pdf, other]

    astro-ph.HE astro-ph.GA

    Rapid Response Mode observations of GRB 160203A: Looking for fine-structure line variability at z=3.52

    Authors: G. Pugliese, A. Saccardi, V. D'Elia, S. D. Vergani, K. E. Heintz, S. Savaglio, L. Kaper, A. de Ugarte Postigo, D. H. Hartmann, A. De Cia, S. Vejlgaard, J. P. U. Fynbo, L. Christensen, S. Campana, D. van Rest, J. Selsing, K. Wiersema, D. B. Malesani, S. Covino, D. Burgarella, M. De Pasquale, P. Jakobsson, J. Japelj, D. A. Kann, C. Kouveliotou, et al. (4 additional authors not shown)

    Abstract: Gamma-ray bursts are the most energetic known explosions. Despite fading rapidly, they allow the measurement of redshift and important properties of their host galaxies. We report the photometric and spectroscopic study of GRB 160203A and its host galaxy. Fine-structure absorption lines, detected in the afterglow at different epochs, allow us to investigate variability due to the strong fading background…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 16 pages, 9 figures, 2 appendices, A&A accepted

  15. arXiv:2406.18037  [pdf, other]

    cs.CV

    Towards Synchronous Memorizability and Generalizability with Site-Modulated Diffusion Replay for Cross-Site Continual Segmentation

    Authors: Dunyuan Xu, Xi Wang, Jingyang Zhang, Pheng-Ann Heng

    Abstract: The ability to learn sequentially from different data sites is crucial for a deep network in solving practical medical image diagnosis problems due to privacy restrictions and storage limitations. However, adapting to an incoming site leads to catastrophic forgetting on past sites and decreases generalizability on unseen sites. Existing Continual Learning (CL) and Domain Generalization (DG) methods ha…

    Submitted 25 June, 2024; originally announced June 2024.

  16. arXiv:2406.17404  [pdf, other]

    cs.CL cs.LG

    Make Some Noise: Unlocking Language Model Parallel Inference Capability through Noisy Training

    Authors: Yixuan Wang, Xianzhen Luo, Fuxuan Wei, Yijun Liu, Qingfu Zhu, Xuanyu Zhang, Qing Yang, Dongliang Xu, Wanxiang Che

    Abstract: Existing speculative decoding methods typically require additional model structure and training processes to assist the model for draft token generation. This makes the migration of acceleration methods to the new model more costly and more demanding on device memory. To address this problem, we propose the Make Some Noise (MSN) training framework as a replacement for the supervised fine-tuning st…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 11 pages, 6 figures

  17. arXiv:2406.16888  [pdf, other]

    eess.SP

    Efficient UAV Hovering, Resource Allocation, and Trajectory Design for ISAC with Limited Backhaul Capacity

    Authors: Ata Khalili, Atefeh Rezaei, Dongfang Xu, Falko Dressler, Robert Schober

    Abstract: In this paper, we investigate the joint resource allocation and trajectory design for a multi-user, multi-target unmanned aerial vehicle (UAV)-enabled integrated sensing and communication (ISAC) system, where the link capacity between a ground base station (BS) and the UAV is limited. The UAV conducts target sensing and information transmission in orthogonal time slots to prevent interference. As…

    Submitted 30 April, 2024; originally announced June 2024.

    Comments: Submitted to IEEE for possible publications. arXiv admin note: text overlap with arXiv:2302.10124

  18. Formation of super-thin galaxies in Illustris-TNG

    Authors: Jianhong Hu, Dandan Xu, Cheng Li

    Abstract: Superthin galaxies are observed to have stellar disks with extremely small minor-to-major axis ratios. In this work, we investigate the formation of superthin galaxies in the TNG100 simulation. We trace the merger history and investigate the evolution of galaxy properties of a selected sample of superthin galaxies and a control sample of galaxies that share the same joint probability distribution…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 16 pages, 15 figures, accepted for publication in Research in Astronomy and Astrophysics

  19. arXiv:2406.13638  [pdf, other]

    physics.data-an astro-ph.IM hep-ex physics.ins-det

    XENONnT WIMP Search: Signal & Background Modeling and Statistical Inference

    Authors: XENON Collaboration, E. Aprile, J. Aalbers, K. Abe, S. Ahmed Maouloud, L. Althueser, B. Andrieu, E. Angelino, D. Antón Martin, F. Arneodo, L. Baudis, M. Bazyk, L. Bellagamba, R. Biondi, A. Bismark, K. Boese, A. Brown, G. Bruno, R. Budnik, J. M. R. Cardoso, A. P. Cimental Chávez, A. P. Colijn, J. Conrad, J. J. Cuenca-García, V. D'Andrea, et al. (139 additional authors not shown)

    Abstract: The XENONnT experiment searches for weakly-interacting massive particle (WIMP) dark matter scattering off a xenon nucleus. In particular, XENONnT uses a dual-phase time projection chamber with a 5.9-tonne liquid xenon target, detecting both scintillation and ionization signals to reconstruct the energy, position, and type of recoil. A blind search for nuclear recoil WIMPs with an exposure of 1.1 t…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 20 pages, 10 figures

  20. arXiv:2406.13527  [pdf, other]

    cs.CV

    4K4DGen: Panoramic 4D Generation at 4K Resolution

    Authors: Renjie Li, Panwang Pan, Bangbang Yang, Dejia Xu, Shijie Zhou, Xuanyang Zhang, Zeming Li, Achuta Kadambi, Zhangyang Wang, Zhiwen Fan

    Abstract: The blooming of virtual reality and augmented reality (VR/AR) technologies has driven an increasing demand for the creation of high-quality, immersive, and dynamic environments. However, existing generative techniques either focus solely on dynamic objects or perform outpainting from a single perspective image, failing to meet the needs of VR/AR applications. In this work, we tackle the challengin…

    Submitted 4 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  21. arXiv:2406.11253  [pdf, other]

    cs.CV

    Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space

    Authors: Yuan Wang, Zhao Wang, Junhao Gong, Di Huang, Tong He, Wanli Ouyang, Jile Jiao, Xuetao Feng, Qi Dou, Shixiang Tang, Dan Xu

    Abstract: In this paper, we introduce a novel path to $\textit{general}$ human motion generation by focusing on 2D space. Traditional methods have primarily generated human motions in 3D, which, while detailed and realistic, are often limited by the scope of available 3D motion data in terms of both the size and the diversity. To address these limitations, we exploit the extensive availability of 2D motion data…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 22 pages, 11 figures, 17 tables

  22. arXiv:2406.10897  [pdf, ps, other]

    eess.SP

    When NOMA Meets AIGC: Enhanced Wireless Federated Learning

    Authors: Ding Xu, Lingjie Duan, Hongbo Zhu

    Abstract: Wireless federated learning (WFL) enables devices to collaboratively train a global model via local model training, uploading and aggregating. However, WFL faces the data scarcity/heterogeneity problem (i.e., data are limited and unevenly distributed among devices) that degrades the learning performance. In this regard, artificial intelligence generated content (AIGC) can synthesize various types…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 13 pages, submitted to IEEE TWC for possible publication

  23. arXiv:2406.10895  [pdf, ps, other]

    eess.SP

    Fair Computation Offloading for RSMA-Assisted Mobile Edge Computing Networks

    Authors: Ding Xu, Lingjie Duan, Haitao Zhao, Hongbo Zhu

    Abstract: Rate splitting multiple access (RSMA) provides a flexible transmission framework that can be applied in mobile edge computing (MEC) systems. However, the research work on RSMA-assisted MEC systems is still in its infancy and many design issues remain unsolved, such as the MEC server and channel allocation problem in general multi-server and multi-channel scenarios as well as the user fairness issu…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 13 pages, submitted to IEEE TWC for possible publication

  24. arXiv:2406.08698  [pdf, other]

    astro-ph.HE hep-ph

    Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations

    Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

    Abstract: In this work we search for signals generated by ultra-heavy dark matter in Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma rays from dark matter annihilation or decay from 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter which have low fluxes…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 17 pages, 12 figures, accepted by PRL

  25. arXiv:2406.07929  [pdf, other]

    cs.LG cs.AI

    A Generic Layer Pruning Method for Signal Modulation Recognition Deep Learning Models

    Authors: Yao Lu, Yutao Zhu, Yuqi Li, Dongwei Xu, Yun Lin, Qi Xuan, Xiaoniu Yang

    Abstract: With the successful application of deep learning in communications systems, deep neural networks are becoming the preferred method for signal classification. Although these models yield impressive results, they often come with high computational complexity and large model sizes, which hinders their practical deployment in communication systems. To address this challenge, we propose a novel layer p…

    Submitted 12 June, 2024; originally announced June 2024.

  26. arXiv:2406.06948  [pdf, other]

    cs.CV cs.RO

    Neural Visibility Field for Uncertainty-Driven Active Mapping

    Authors: Shangjie Xue, Jesse Dill, Pranay Mathur, Frank Dellaert, Panagiotis Tsiotras, Danfei Xu

    Abstract: This paper presents Neural Visibility Field (NVF), a novel uncertainty quantification method for Neural Radiance Fields (NeRF) applied to active mapping. Our key insight is that regions not visible in the training views lead to inherently unreliable color predictions by NeRF in these regions, resulting in increased uncertainty in the synthesized views. To address this, we propose to use Bayesian Net…

    Submitted 15 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024. More details can be found at https://sites.google.com/view/nvf-cvpr24/

  27. arXiv:2406.06807  [pdf]

    cond-mat.mtrl-sci

    Additive engineering for Sb$_2$S$_3$ indoor photovoltaics with efficiency exceeding 17%

    Authors: Xiao Chen, Xiaoxuan Shu, Jiangcheng Zhou, Lei Wan, Peng Xiao, Yuchen Fu, Junzhi Ye, Yi-Teng Huang, Bin Yan, Dingjiang Xue, Tao Chen, Jiejie Chen, Robert L. Z. Hoye, Ru Zhou

    Abstract: Indoor photovoltaics (IPVs) have attracted increasing attention for sustainably powering Internet of Things (IoT) electronics. Sb$_2$S$_3$ is a promising IPV candidate material with a bandgap of ~1.75 eV, which is near the optimal value for indoor energy harvesting. However, the performance of Sb$_2$S$_3$ solar cells is limited by nonradiative recombination, closely associated with the poor-qualit…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 28 pages, 6 figures

  28. arXiv:2406.06475  [pdf, other]

    cs.IR cs.AI

    Survey for Landing Generative AI in Social and E-commerce Recsys -- the Industry Perspectives

    Authors: Da Xu, Danqing Zhang, Guangyu Yang, Bo Yang, Shuyuan Xu, Lingling Zheng, Cindy Liang

    Abstract: Recently, generative AI (GAI), with its emerging capabilities, has presented unique opportunities for augmenting and revolutionizing industrial recommender systems (Recsys). Despite growing research efforts at the intersection of these fields, the integration of GAI into industrial Recsys remains in its infancy, largely due to the intricate nature of modern industrial Recsys infrastructure, ope…

    Submitted 10 June, 2024; originally announced June 2024.

  29. arXiv:2406.05285  [pdf, other]

    cs.CV

    VISTA3D: Versatile Imaging SegmenTation and Annotation model for 3D Computed Tomography

    Authors: Yufan He, Pengfei Guo, Yucheng Tang, Andriy Myronenko, Vishwesh Nath, Ziyue Xu, Dong Yang, Can Zhao, Benjamin Simon, Mason Belue, Stephanie Harmon, Baris Turkbey, Daguang Xu, Wenqi Li

    Abstract: Segmentation foundation models have attracted great interest; however, none of them are adequate for the use cases in 3D computed tomography (CT) images. Existing works finetune on medical images with 2D foundation models trained on natural images, but interactive segmentation, especially in 2D, is too time-consuming for 3D scans and less useful for large cohort analysis. Models that…

    Submitted 7 June, 2024; originally announced June 2024.

  30. arXiv:2406.05232  [pdf, other]

    cs.CL cs.LG

    Improving Logits-based Detector without Logits from Black-box LLMs

    Authors: Cong Zeng, Shengkun Tang, Xianjun Yang, Yuanzhou Chen, Yiyou Sun, Zhiqiang Xu, Yao Li, Haifeng Chen, Wei Cheng, Dongkuan Xu

    Abstract: The advent of Large Language Models (LLMs) has revolutionized text generation, producing outputs that closely mimic human writing. This blurring of lines between machine- and human-written text presents new challenges in distinguishing one from the other, a task further complicated by the frequent updates and closed nature of leading proprietary LLMs. Traditional logits-based detection methods leve…

    Submitted 11 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

  31. arXiv:2406.04609  [pdf, other]

    cs.LG cs.AI

    Diverse Intra- and Inter-Domain Activity Style Fusion for Cross-Person Generalization in Activity Recognition

    Authors: Junru Zhang, Lang Feng, Zhidan Liu, Yuhan Wu, Yang He, Yabo Dong, Duanqing Xu

    Abstract: Existing domain generalization (DG) methods for cross-person generalization tasks often face challenges in capturing intra- and inter-domain style diversity, resulting in domain gaps with the target domain. In this study, we explore a novel perspective to tackle this problem, a process conceptualized as domain padding. This proposal aims to enrich the domain diversity by synthesizing intra- and in…

    Submitted 28 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2024)

  32. arXiv:2406.03007  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents

    Authors: Yifei Wang, Dizhan Xue, Shengjie Zhang, Shengsheng Qian

    Abstract: With the prosperity of large language models (LLMs), powerful LLM-based intelligent agents have been developed to provide customized services with a set of user-defined tools. State-of-the-art methods for constructing LLM agents adopt trained LLMs and further fine-tune them on data for the agent task. However, we show that such methods are vulnerable to our proposed backdoor attacks named BadAgent…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024

  33. arXiv:2406.02756  [pdf, other]

    cs.CL cs.AI cs.LG

    Aligning Large Language Models via Fine-grained Supervision

    Authors: Dehong Xu, Liang Qiu, Minseok Kim, Faisal Ladhak, Jaeyoung Do

    Abstract: Pre-trained large-scale language models (LLMs) excel at producing coherent articles, yet their outputs may be untruthful, toxic, or fail to align with user expectations. Current approaches focus on using reinforcement learning with human feedback (RLHF) to improve model alignment, which works by transforming coarse human preferences of LLM outputs into a feedback signal that guides the model learn…

    Submitted 4 June, 2024; originally announced June 2024.

  34. arXiv:2406.02509  [pdf, other]

    cs.CV

    CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation

    Authors: Dejia Xu, Weili Nie, Chao Liu, Sifei Liu, Jan Kautz, Zhangyang Wang, Arash Vahdat

    Abstract: Recently, video diffusion models have emerged as expressive generative tools for high-quality video content creation readily available to general users. However, these models often do not offer precise control over camera poses for video generation, limiting the expression of cinematic language and user control. To address this issue, we introduce CamCo, which allows fine-grained Camera pose Contro…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Project page: https://ir1d.github.io/CamCo/

  35. arXiv:2406.02461  [pdf, other]

    cs.CV

    RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting

    Authors: Qi Wang, Ruijie Lu, Xudong Xu, Jingbo Wang, Michael Yu Wang, Bo Dai, Gang Zeng, Dan Xu

    Abstract: The advancement of diffusion models has pushed the boundary of text-to-3D object generation. While it is straightforward to composite objects into a scene with reasonable geometry, it is nontrivial to texture such a scene perfectly due to style inconsistency and occlusions between objects. To tackle these problems, we propose a coarse-to-fine 3D scene texturing framework, referred to as RoomTex, t…

    Submitted 4 June, 2024; originally announced June 2024.

  36. arXiv:2406.00830  [pdf, other]

    cs.CV

    Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection

    Authors: Yang Cao, Yihan Zeng, Hang Xu, Dan Xu

    Abstract: Open-vocabulary 3D Object Detection (OV-3DDet) addresses the detection of objects from an arbitrary list of novel categories in 3D scenes, which remains a very challenging problem. In this work, we propose CoDAv2, a unified framework designed to innovatively tackle both the localization and classification of novel 3D objects, under the condition of limited base categories. For localization, the pr…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Code Page: https://github.com/yangcaoai/CoDA_NeurIPS2023 This paper has been submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) for possible publication

  37. arXiv:2406.00440  [pdf, other]

    cs.CV

    Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture

    Authors: Xuanchen Li, Yuhao Cheng, Xingyu Ren, Haozhe Jia, Di Xu, Wenhan Zhu, Yichao Yan

    Abstract: 4D head capture aims to generate dynamic topological meshes and corresponding texture maps from videos, which is widely utilized in movies and games for its ability to simulate facial muscle movements and recover dynamic textures in pore-squeezing. The industry often adopts the method involving multi-view stereo and non-rigid alignment. However, this approach is prone to errors and heavily reliant…

    Submitted 1 July, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

  38. arXiv:2405.20363  [pdf, other]

    cs.CV

    LLMGeo: Benchmarking Large Language Models on Image Geolocation In-the-wild

    Authors: Zhiqiang Wang, Dejia Xu, Rana Muhammad Shahroz Khan, Yanbin Lin, Zhiwen Fan, Xingquan Zhu

    Abstract: Image geolocation is a critical task in various image-understanding applications. However, existing methods often fail when analyzing challenging, in-the-wild images. Inspired by the exceptional background knowledge of multimodal language models, we systematically evaluate their geolocation capabilities using a novel image dataset and a comprehensive evaluation framework. We first collect images f…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 7 pages, 3 figures, 5 tables, CVPR 2024 Workshop on Computer Vision in the Wild

  39. arXiv:2405.19335  [pdf, other]

    cs.CV cs.CL cs.LG

    X-VILA: Cross-Modality Alignment for Large Language Model

    Authors: Hanrong Ye, De-An Huang, Yao Lu, Zhiding Yu, Wei Ping, Andrew Tao, Jan Kautz, Song Han, Dan Xu, Pavlo Molchanov, Hongxu Yin

    Abstract: We introduce X-VILA, an omni-modality model designed to extend the capabilities of large language models (LLMs) by incorporating image, video, and audio modalities. By aligning modality-specific encoders with LLM inputs and diffusion decoders with LLM outputs, X-VILA achieves cross-modality understanding, reasoning, and generation. To facilitate this cross-modality alignment, we curate an effectiv…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Technical Report

  40. arXiv:2405.19139  [pdf, other]

    cs.CL cs.AI

    DGRC: An Effective Fine-tuning Framework for Distractor Generation in Chinese Multi-choice Reading Comprehension

    Authors: Runfeng Lin, Dacheng Xu, Huijiang Wang, Zebiao Chen, Yating Wang, Shouqiang Liu

    Abstract: When evaluating a learner's knowledge proficiency, the multiple-choice question is an efficient and widely used format in standardized tests. Nevertheless, generating these questions, particularly plausible distractors (incorrect options), poses a considerable challenge. Generally, the distractor generation can be classified into cloze-style distractor generation (CDG) and natural questions distra…

    Submitted 29 May, 2024; originally announced May 2024.

  41. arXiv:2405.18777  [pdf, other]

    math.OC cs.LG

    SPABA: A Single-Loop and Probabilistic Stochastic Bilevel Algorithm Achieving Optimal Sample Complexity

    Authors: Tianshu Chu, Dachuan Xu, Wei Yao, Jin Zhang

    Abstract: While stochastic bilevel optimization methods have been extensively studied for addressing large-scale nested optimization problems in machine learning, it remains an open question whether the optimal complexity bounds for solving bilevel optimization are the same as those in single-level optimization. Our main result resolves this question: SPABA, an adaptation of the PAGE method for nonconvex op…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  42. arXiv:2405.18676  [pdf]

    physics.med-ph

    Exploring Automated Contouring Across Institutional Boundaries: A Deep Learning Approach with Mouse Micro-CT Datasets

    Authors: Lu Jiang, Di Xu, Qifan Xu, Arion Chatziioannou, Keisuke S. Iwamoto, Susanta Hui, Ke Sheng

    Abstract: Image-guided mouse irradiation is essential to understand interventions involving radiation prior to human studies. Our objective is to employ Swin UNEt Transformers (Swin UNETR) to segment native micro-CT and contrast-enhanced micro-CT scans and benchmark the results against 3D no-new-Net (nnU-Net). Swin UNETR reformulates mouse organ segmentation as a sequence-to-sequence prediction task, using…

    Submitted 28 May, 2024; originally announced May 2024.

  43. arXiv:2405.17792  [pdf, other]

    hep-ex hep-ph

    JUNO Sensitivity to Invisible Decay Modes of Neutrons

    Authors: JUNO Collaboration, Angel Abusleme, Thomas Adam, Kai Adamowicz, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, Marco Beretta, Antonio Bergnoli, Daniel Bick, et al. (635 additional authors not shown)

    Abstract: We explore the decay of bound neutrons into invisible particles (e.g., $n\rightarrow 3 ν$ or $nn \rightarrow 2 ν$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $ n \rightarrow { inv} $ and $ nn \rightarrow { inv} $. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation mode…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 28 pages, 7 figures, 4 tables

  44. arXiv:2405.16865  [pdf, other]

    q-bio.NC cs.LG stat.ML

    An Investigation of Conformal Isometry Hypothesis for Grid Cells

    Authors: Dehong Xu, Ruiqi Gao, Wen-Hao Zhang, Xue-Xin Wei, Ying Nian Wu

    Abstract: This paper investigates the conformal isometry hypothesis as a potential explanation for the emergence of hexagonal periodic patterns in the response maps of grid cells. The hypothesis posits that the activities of the population of grid cells form a high-dimensional vector in the neural space, representing the agent's self-position in 2D physical space. As the agent moves in the 2D physical space…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2310.19192

  45. arXiv:2405.16829  [pdf, other]

    cs.CV

    PyGS: Large-scale Scene Representation with Pyramidal 3D Gaussian Splatting

    Authors: Zipeng Wang, Dan Xu

    Abstract: Neural Radiance Fields (NeRFs) have demonstrated remarkable proficiency in synthesizing photorealistic images of large-scale scenes. However, they are often plagued by a loss of fine details and long rendering durations. 3D Gaussian Splatting has recently been introduced as a potent alternative, achieving both high-fidelity visual results and accelerated rendering performance. Nonetheless, scaling…

    Submitted 29 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  46. arXiv:2405.16645  [pdf, other]

    cs.CV

    Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models

    Authors: Hanwen Liang, Yuyang Yin, Dejia Xu, Hanxue Liang, Zhangyang Wang, Konstantinos N. Plataniotis, Yao Zhao, Yunchao Wei

    Abstract: The availability of large-scale multimodal datasets and advancements in diffusion models have significantly accelerated progress in 4D content generation. Most prior approaches rely on multiple image or video diffusion models, utilizing score distillation sampling for optimization or generating pseudo novel views for direct supervision. However, these methods are hindered by slow optimization spee…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Project page: https://vita-group.github.io/Diffusion4D

  47. arXiv:2405.16156  [pdf, other]

    cs.LG

    Mixture of In-Context Prompters for Tabular PFNs

    Authors: Derek Xu, Olcay Cirit, Reza Asadi, Yizhou Sun, Wei Wang

    Abstract: Recent benchmarks found In-Context Learning (ICL) outperforms both deep learning and tree-based algorithms on small tabular datasets. However, on larger datasets, ICL for tabular learning cannot run without severely compromising performance, due to its quadratic space and time complexity w.r.t. dataset size. We propose MIXTUREPFN, which both extends nearest-neighbor sampling to the state-of-the-ar…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 32 pages, 16 figures

  48. arXiv:2405.15769  [pdf, other]

    cs.CV

    FastDrag: Manipulate Anything in One Step

    Authors: Xuanjia Zhao, Jian Guan, Congyi Fan, Dongli Xu, Youtian Lin, Haiwei Pan, Pengming Feng

    Abstract: Drag-based image editing using generative models provides precise control over image contents, enabling users to manipulate anything in an image with a few clicks. However, prevailing methods typically adopt $n$-step iterations for latent semantic optimization to achieve drag-based image editing, which is time-consuming and limits practical applications. In this paper, we introduce a novel one-ste…

    Submitted 6 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 13 pages, 13 figures, Project page: https://fastdrag-site.github.io/

  49. arXiv:2405.15463  [pdf, other]

    cs.CV

    PoinTramba: A Hybrid Transformer-Mamba Framework for Point Cloud Analysis

    Authors: Zicheng Wang, Zhenghao Chen, Yiming Wu, Zhen Zhao, Luping Zhou, Dong Xu

    Abstract: Point cloud analysis has seen substantial advancements due to deep learning. Although previous Transformer-based methods excel at modeling long-range dependencies on this task, their computational demands are substantial. Conversely, the Mamba offers greater efficiency but shows limited potential compared with Transformer-based methods. In this study, we introduce PoinTramba, a pioneering hybrid f…

    Submitted 16 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 14 pages, 4 figures, 6 tables

  50. arXiv:2405.14488  [pdf, other]

    cs.CL

    MoGU: A Framework for Enhancing Safety of Open-Sourced LLMs While Preserving Their Usability

    Authors: Yanrui Du, Sendong Zhao, Danyang Zhao, Ming Ma, Yuhan Chen, Liangyu Huo, Qing Yang, Dongliang Xu, Bing Qin

    Abstract: Large Language Models (LLMs) are increasingly deployed in various applications. As their usage grows, concerns regarding their safety are rising, especially in maintaining harmless responses when faced with malicious instructions. Many defense strategies have been developed to enhance the safety of LLMs. However, our research finds that existing defense strategies lead LLMs to predominantly adopt…

    Submitted 23 May, 2024; originally announced May 2024.