Showing 101–150 of 3,101 results for author: Huang, S

  1. arXiv:2405.11338  [pdf]

    cs.CV cs.AI

    EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging

    Authors: Danli Shi, Weiyi Zhang, Xiaolan Chen, Yexin Liu, Jiancheng Yang, Siyu Huang, Yih Chung Tham, Yingfeng Zheng, Mingguang He

    Abstract: Artificial intelligence (AI) is vital in ophthalmology, tackling tasks like diagnosis, classification, and visual question answering (VQA). However, existing AI models in this domain often require extensive annotation and are task-specific, limiting their clinical utility. While recent developments have brought about foundation models for ophthalmology, they are limited by the need to train separa…

    Submitted 21 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

    Comments: 21 pages, 2 figures, 4 tables

  2. arXiv:2405.11286  [pdf, other]

    cs.CV

    Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion

    Authors: Zeyu Zhang, Yiran Wang, Biao Wu, Shuo Chen, Zhiyuan Zhang, Shiya Huang, Wenbo Zhang, Meng Fang, Ling Chen, Yang Zhao

    Abstract: In recent years, there has been significant interest in creating 3D avatars and motions, driven by their diverse applications in areas like film-making, video games, AR/VR, and human-robot interaction. However, current efforts primarily concentrate on either generating the 3D avatar mesh alone or producing motion sequences, with integrating these two aspects proving to be a persistent challenge. A…

    Submitted 18 May, 2024; originally announced May 2024.

  3. arXiv:2405.11106  [pdf, other]

    cs.MA cs.AI cs.CL cs.LG cs.RO

    LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions

    Authors: Chuanneng Sun, Songjun Huang, Dario Pompili

    Abstract: In recent years, Large Language Models (LLMs) have shown great abilities in various tasks, including question answering, arithmetic problem solving, and poem writing, among others. Although research on LLM-as-an-agent has shown that LLM can be applied to Reinforcement Learning (RL) and achieve decent results, the extension of LLM-based RL to Multi-Agent System (MAS) is not trivial, as many aspects…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 8 pages, 1 figure, 1 table, submitted to IEEE RA-L

  4. arXiv:2405.10895  [pdf, other]

    astro-ph.HE astro-ph.GA

    The unluckiest star: A spectroscopically confirmed repeated partial tidal disruption event AT 2022dbl

    Authors: Zheyu Lin, Ning Jiang, Tinggui Wang, Xu Kong, Dongyue Li, Han He, Yibo Wang, Jiazheng Zhu, Wentao Li, Ji-an Jiang, Avinash Singh, Rishabh Singh Teja, D. K. Sahu, Chichuan Jin, Keiichi Maeda, Shifeng Huang

    Abstract: The unluckiest star orbits a supermassive black hole elliptically. Every time it reaches the pericenter, it shallowly enters the tidal radius and gets partially tidally disrupted, producing a series of flares. Confirmation of a repeated partial tidal disruption event (pTDE) requires not only evidence to rule out other types of transients, but also proof that only one star is involved, as TDEs from m…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 15 pages, 8 figures, submitted to ApJ Letters on 2024 Apr 27

  5. arXiv:2405.10879  [pdf, other]

    cs.CV

    One registration is worth two segmentations

    Authors: Shiqi Huang, Tingfa Xu, Ziyi Shen, Shaheer Ullah Saeed, Wen Yan, Dean Barratt, Yipeng Hu

    Abstract: The goal of image registration is to establish spatial correspondence between two or more images, traditionally through dense displacement fields (DDFs) or parametric transformations (e.g., rigid, affine, and splines). Rethinking the existing paradigms of achieving alignment via spatial transformations, we uncover an alternative but more intuitive correspondence representation: a set of correspond…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Early Accepted by MICCAI2024

  6. Occupancy-SLAM: Simultaneously Optimizing Robot Poses and Continuous Occupancy Map

    Authors: Liang Zhao, Yingyu Wang, Shoudong Huang

    Abstract: In this paper, we propose an optimization based SLAM approach to simultaneously optimize the robot trajectory and the occupancy map using 2D laser scans (and odometry) information. The key novelty is that the robot poses and the occupancy map are optimized together, which is significantly different from existing occupancy mapping strategies where the robot poses need to be obtained first before th…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted by Robotics: Science and Systems 2022

    Journal ref: Robotics: Science and Systems 2022

  7. arXiv:2405.10632  [pdf, other]

    cs.CY cs.AI cs.HC

    Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks

    Authors: Lujain Ibrahim, Saffron Huang, Lama Ahmad, Markus Anderljung

    Abstract: Model evaluations are central to understanding the safety, risks, and societal impacts of AI systems. While most real-world AI applications involve human-AI interaction, most current evaluations (e.g., common benchmarks) of AI models do not. Instead, they incorporate human factors in limited ways, assessing the safety of models in isolation, thereby falling short of capturing the complexity of hum…

    Submitted 12 July, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: revised figure

  8. arXiv:2405.10098  [pdf, other]

    cs.NE

    When Large Language Model Meets Optimization

    Authors: Sen Huang, Kaixiang Yang, Sheng Qi, Rui Wang

    Abstract: Optimization algorithms and large language models (LLMs) enhance decision-making in dynamic environments by integrating artificial intelligence with traditional techniques. LLMs, with extensive domain knowledge, facilitate intelligent modeling and strategic decision-making in optimization, while optimization algorithms refine LLM architectures and output quality. This synergy offers novel approach…

    Submitted 16 May, 2024; originally announced May 2024.

  9. arXiv:2405.09839  [pdf, other]

    cs.LG

    Advances in Robust Federated Learning: Heterogeneity Considerations

    Authors: Chuan Chen, Tianchi Liao, Xiaojun Deng, Zihou Wu, Sheng Huang, Zibin Zheng

    Abstract: In the field of heterogeneous federated learning (FL), the key challenge is to efficiently and collaboratively train models across multiple clients with different data distributions, model structures, task objectives, computational capabilities, and communication resources. This diversity leads to significant heterogeneity, which increases the complexity of model training. In this paper, we first…

    Submitted 16 May, 2024; originally announced May 2024.

  10. arXiv:2405.09611  [pdf, other]

    cond-mat.str-el hep-th math-ph

    Fermionic quantum criticality through the lens of topological holography

    Authors: Sheng-Jie Huang

    Abstract: We utilize the topological holographic framework to characterize and gain insights into the nature of quantum critical points and gapless phases in fermionic quantum systems. Topological holography is a general framework that describes the generalized global symmetry and the symmetry charges of a local quantum system in terms of a slab of a topological order, termed as the symmetry topological fie…

    Submitted 18 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: 33 pages, 7 figures, 6 tables; v2: minor changes, references added

  11. arXiv:2405.09066  [pdf, other]

    hep-ex

    Search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, V. Batozskaya, D. Becker, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, J. Bloms, A. Bortone, I. Boyko, et al. (559 additional authors not shown)

    Abstract: We present the first search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$ by analyzing a data sample of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.178 and 4.226 GeV, corresponding to an integrated luminosity of 6.32~fb$^{-1}$. No significant signal is observed. The upper limits on the branching fractions for…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 14 pages, 7 figures

  12. arXiv:2405.08749  [pdf, other]

    nucl-th hep-ph nucl-ex

    Longitudinal Structure of Quark-Gluon Plasma Unveiled Through Nuclear Deformations

    Authors: Chunjian Zhang, Shengli Huang, Jiangyong Jia

    Abstract: The study of quark-gluon plasma (QGP) is hindered by our limited understanding of its initial conditions, particularly its longitudinal structure. We propose a novel approach that entails analyzing collisions involving nuclei of similar masses but different deformations. This strategy allows us to vary the initial conditions and collective expansion of the QGP, while minimizing the influence of no…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 6 pages, 6 figures

  13. arXiv:2405.08541  [pdf, other]

    physics.ins-det

    A Determination of the Local Gravitational Acceleration for the Tsinghua Tabletop Kibble Balance

    Authors: Weibo Liu, Nanjia Li, Yongchao Ma, Ruo Hu, Shuqing Wu, Wei Zhao, Songling Huang, Shisong Li

    Abstract: The Kibble balance requires a measurement of the local gravitational acceleration, $g$, with a typical relative measurement uncertainty of $10^{-9}$. In this paper, the determination of $g$ for the Tsinghua tabletop Kibble balance is presented. A polynomial fitting method is proposed for blind transfers of the absolute gravitational acceleration using relative gravimeters, showing agreement with t…

    Submitted 20 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: 11 figures, submitted to IEEE Trans. Instrum. Meas

  14. arXiv:2405.07741  [pdf, other]

    hep-ex

    Search for the radiative transition $χ_{c1}(3872)\toγψ_2(3823)$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, et al. (635 additional authors not shown)

    Abstract: Using 9.0 $\rm fb^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies from 4.178 to 4.278 GeV with the BESIII detector at the BEPCII collider, we perform the first search for the radiative transition $χ_{c1}(3872)\toγψ_2(3823)$. No $χ_{c1}(3872)\toγψ_2(3823)$ signal is observed. The upper limit on the ratio of branching fractions…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 8 pages, 2 figures

  15. arXiv:2405.07469  [pdf, other]

    quant-ph physics.optics

    Phase coding semi-quantum key distribution system based on the Single-state protocol

    Authors: Qincheng Hou, Siying Huang, Naida Mo, Jindong Wang, Zhengjun Wei, Yafei Yu, Tianming Zhao, Zhiming Zhang

    Abstract: Semi-quantum key distribution (SQKD) allows sharing random keys between a quantum user and a classical user. However, implementing classical user operations is challenging, posing a hurdle to achieving the Single-state protocol. By using the "selective modulation" method, the feasibility of SQKD is verified in principle. The proposal of the selective modulation method enables the realization of ot…

    Submitted 13 May, 2024; originally announced May 2024.

  16. arXiv:2405.06690  [pdf, other]

    q-bio.BM cs.CL cs.LG

    DrugLLM: Open Large Language Model for Few-shot Molecule Generation

    Authors: Xianggen Liu, Yan Guo, Haoran Li, Jin Liu, Shudong Huang, Bowen Ke, Jiancheng Lv

    Abstract: Large Language Models (LLMs) have made great strides in areas such as language processing and computer vision. Despite the emergence of diverse techniques to improve few-shot learning capacity, current LLMs fall short in handling the languages in biology and chemistry. For example, they are struggling to capture the relationship between molecule structure and pharmacochemical properties. Consequen…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 17 pages, 3 figures

  17. arXiv:2405.06393  [pdf, other]

    hep-ex

    Measurement of the ${e}^{+}{e}^{-}\to p \bar{p}π^{0}$ cross section at $\sqrt{s}=2.1000-3.0800$ GeV

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (639 additional authors not shown)

    Abstract: The process $e^{+}e^{-}\to p\bar{p}π^{0}$ is studied at 20 center-of-mass energies ranging from 2.1000 to 3.0800 GeV using 636.8 pb$^{-1}$ of data collected with the BESIII detector operating at the BEPCII collider. The Born cross sections for $e^{+}e^{-}\to p\bar{p}π^{0}$ are measured with high precision. Since the lowest center-of-mass energy, 2.1000 GeV, is less than 90 MeV above the…

    Submitted 10 May, 2024; originally announced May 2024.

  18. arXiv:2405.06217  [pdf, other]

    cs.CV cs.MM

    DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding

    Authors: Ting Liu, Xuyang Liu, Siteng Huang, Honggang Chen, Quanjun Yin, Long Qin, Donglin Wang, Yue Hu

    Abstract: Visual grounding (VG) is a challenging task to localize an object in an image based on a textual description. Recent surge in the scale of VG models has substantially improved performance, but also introduced a significant burden on computational costs during fine-tuning. In this paper, we explore applying parameter-efficient transfer learning (PETL) to efficiently transfer the pre-trained vision-…

    Submitted 8 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: Accepted by ICME 2024 (Oral)

  19. arXiv:2405.05254  [pdf, other]

    cs.CL

    You Only Cache Once: Decoder-Decoder Architectures for Language Models

    Authors: Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei

    Abstract: We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once. It consists of two components, i.e., a cross-decoder stacked upon a self-decoder. The self-decoder efficiently encodes global key-value (KV) caches that are reused by the cross-decoder via cross-attention. The overall model behaves like a decoder-only Transformer, although YOCO onl…

    Submitted 9 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  20. arXiv:2405.04768  [pdf, other]

    cond-mat.mtrl-sci

    Circularly polarized light irradiated ferromagnetic MnBi$_2$Te$_4$: the long-sought ideal Weyl semimetal

    Authors: Shuai Fan, Shengpu Huang, Zhuo Chen, Fangyang Zhan, Xian-Yong Ding, Da-Shuai Ma, Rui Wang

    Abstract: The interaction between light and non-trivial energy band topology allows for the precise manipulation of topological quantum states, which has attracted intensive interest in condensed matter physics. In this work, using first-principles calculations, we studied the topological transition of ferromagnetic (FM) MnBi$_2$Te$_4$ upon irradiation with circularly polarized light (CPL). We revealed that…

    Submitted 7 May, 2024; originally announced May 2024.

  21. arXiv:2405.04101  [pdf, other]

    cs.LG cs.AI

    Continual Learning in the Presence of Repetition

    Authors: Hamed Hemati, Lorenzo Pellegrini, Xiaotian Duan, Zixuan Zhao, Fangfang Xia, Marc Masana, Benedikt Tscheschner, Eduardo Veas, Yuxiang Zheng, Shiji Zhao, Shao-Yuan Li, Sheng-Jun Huang, Vincenzo Lomonaco, Gido M. van de Ven

    Abstract: Continual learning (CL) provides a framework for training models in ever-evolving environments. Although re-occurrence of previously seen objects or tasks is common in real-world problems, the concept of repetition in the data stream is not often considered in standard benchmarks for CL. Unlike with the rehearsal mechanism in buffer-based strategies, where sample repetition is controlled by the st…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Preprint; Challenge Report of the 4th Workshop on Continual Learning in Computer Vision at CVPR

  22. arXiv:2405.03908  [pdf, other]

    cs.DC cs.DS

    Deterministic Expander Routing: Faster and More Versatile

    Authors: Yi-Jun Chang, Shang-En Huang, Hsin-Hao Su

    Abstract: We consider the expander routing problem formulated by Ghaffari, Kuhn, and Su (PODC 2017), where the goal is to route all the tokens to their destinations given that each vertex is the source and the destination of at most $\deg(v)$ tokens. They developed $\textit{randomized algorithms}$ that solve this problem in $\text{poly}(φ^{-1}) \cdot 2^{O(\sqrt{\log n \log \log n})}$ rounds in the…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted to PODC 2024

  23. arXiv:2405.02425  [pdf, other]

    cs.RO cs.AI

    Learning Robot Soccer from Egocentric Vision with Deep Reinforcement Learning

    Authors: Dhruva Tirumala, Markus Wulfmeier, Ben Moran, Sandy Huang, Jan Humplik, Guy Lever, Tuomas Haarnoja, Leonard Hasenclever, Arunkumar Byravan, Nathan Batchelor, Neil Sreendra, Kushal Patel, Marlon Gwira, Francesco Nori, Martin Riedmiller, Nicolas Heess

    Abstract: We apply multi-agent deep reinforcement learning (RL) to train end-to-end robot soccer policies with fully onboard computation and sensing via egocentric RGB vision. This setting reflects many challenges of real-world robotics, including active perception, agile full-body control, and long-horizon planning in a dynamic, partially-observable, multi-agent domain. We rely on large-scale, simulation-b…

    Submitted 3 May, 2024; originally announced May 2024.

  24. arXiv:2405.01345  [pdf, other]

    cs.CL

    The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights

    Authors: Wenhao Zhu, Shujian Huang, Fei Yuan, Cheng Chen, Jiajun Chen, Alexandra Birch

    Abstract: Bridging the significant gap between large language models' English and non-English performance presents a great challenge. While some previous studies attempt to mitigate this gap with translated training data, the recently proposed question alignment approach leverages the model's English expertise to improve multilingual performance with minimum usage of expensive, error-prone translation. In t…

    Submitted 29 June, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  25. arXiv:2404.18491  [pdf, other]

    cond-mat.dis-nn

    Emergent Non-Abelian Thouless Pumping Induced by the Quasiperiodic Disorder

    Authors: Sen Huang, Yan-Qing Zhu, Zhi Li

    Abstract: We investigate the non-Abelian Thouless pumping in a disorder tunable Lieb chain with degenerate flat bands. The results reveal that quasiperiodic disorder will cause a topological phase transition from the trivial (without non-Abelian Thouless pumping) to the non-trivial (with non-Abelian Thouless pumping) phase. The mechanism behind is that the monopole originally outside the topological region…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 9 pages, 5 figures

  26. arXiv:2404.17521  [pdf, other]

    cs.RO cs.CV

    Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations

    Authors: Puhao Li, Tengyu Liu, Yuyang Li, Muzhi Han, Haoran Geng, Shu Wang, Yixin Zhu, Song-Chun Zhu, Siyuan Huang

    Abstract: Autonomous robotic systems capable of learning novel manipulation tasks are poised to transform industries from manufacturing to service automation. However, modern methods (e.g., VIP and R3M) still face significant hurdles, notably the domain gap among robotic embodiments and the sparsity of successful task executions within specific action spaces, resulting in misaligned and ambiguous task repre…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Project website and open-source code: https://xiaoyao-li.github.io/research/ag2manip

  27. arXiv:2404.17025  [pdf, other]

    cs.HC

    How Does Conversation Length Impact User's Satisfaction? A Case Study of Length-Controlled Conversations with LLM-Powered Chatbots

    Authors: Shih-Hong Huang, Ya-Fang Lin, Zeyu He, Chieh-Yang Huang, Ting-Hao 'Kenneth' Huang

    Abstract: Users can discuss a wide range of topics with large language models (LLMs), but they do not always prefer solving problems or getting information through lengthy conversations. This raises an intriguing HCI question: How does instructing LLMs to engage in longer or shorter conversations affect conversation quality? In this paper, we developed two Slack chatbots using GPT-4 with the ability to vary…

    Submitted 25 April, 2024; originally announced April 2024.

  28. arXiv:2404.16666  [pdf, other]

    cs.CV

    PhyRecon: Physically Plausible Neural Scene Reconstruction

    Authors: Junfeng Ni, Yixin Chen, Bohan Jing, Nan Jiang, Bin Wang, Bo Dai, Puhao Li, Yixin Zhu, Song-Chun Zhu, Siyuan Huang

    Abstract: Neural implicit representations have gained popularity in multi-view 3D reconstruction. However, most previous work struggles to yield physically plausible results, limiting their utility in domains requiring rigorous physical accuracy, such as embodied AI and robotics. This lack of plausibility stems from the absence of physics modeling in existing methods and their inability to recover intricate…

    Submitted 2 June, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: project page: https://phyrecon.github.io/. arXiv admin note: text overlap with arXiv:2303.08605 by other authors

  29. arXiv:2404.16306  [pdf, other]

    cs.CV

    TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models

    Authors: Haomiao Ni, Bernhard Egger, Suhas Lohit, Anoop Cherian, Ye Wang, Toshiaki Koike-Akino, Sharon X. Huang, Tim K. Marks

    Abstract: Text-conditioned image-to-video generation (TI2V) aims to synthesize a realistic video starting from a given image (e.g., a woman's photo) and a text description (e.g., "a woman is drinking water."). Existing TI2V frameworks often require costly training on video-text datasets and specific model designs for text and image conditioning. In this paper, we propose TI2V-Zero, a zero-shot, tuning-free…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  30. arXiv:2404.15584  [pdf]

    eess.SY

    Research on OPF control of three-phase four-wire low-voltage distribution network considering uncertainty

    Authors: Rui Wang, Xiaoqing Bai, Shengquan Huang, Shoupu Wei

    Abstract: As power systems become more complex and uncertain, low-voltage distribution networks face numerous challenges, including three-phase imbalances caused by asymmetrical loads and distributed energy resources. We propose a robust stochastic optimization (RSO) based optimal power flow (OPF) control method for three-phase, four-wire low-voltage distribution networks that consider uncertainty to addres…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: systems optimization, robust optimization, local control

  31. arXiv:2404.15121  [pdf, other]

    cs.GR cs.AI cs.CV

    Taming Diffusion Probabilistic Models for Character Control

    Authors: Rui Chen, Mingyi Shi, Shaoli Huang, Ping Tan, Taku Komura, Xuelin Chen

    Abstract: We present a novel character control framework that effectively utilizes motion diffusion probabilistic models to generate high-quality and diverse character animations, responding in real-time to a variety of dynamic user-supplied control signals. At the heart of our method lies a transformer-based Conditional Autoregressive Motion Diffusion Model (CAMDM), which takes as input the character's his…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGGRAPH 2024 (Conference Track). Project page and source codes: https://aiganimation.github.io/CAMDM/

  32. arXiv:2404.15100  [pdf, other]

    cs.CV cs.MM

    Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation

    Authors: Xun Wu, Shaohan Huang, Furu Wei

    Abstract: Recent studies have demonstrated the exceptional potentials of leveraging human preference datasets to refine text-to-image generative models, enhancing the alignment between generated images and textual prompts. Despite these advances, current human preference datasets are either prohibitively expensive to construct or suffer from a lack of diversity in preference dimensions, resulting in limited…

    Submitted 23 April, 2024; originally announced April 2024.

  33. arXiv:2404.15045  [pdf, other]

    cs.CL cs.AI cs.LG

    Multi-Head Mixture-of-Experts

    Authors: Xun Wu, Shaohan Huang, Wenhui Wang, Furu Wei

    Abstract: Sparse Mixtures of Experts (SMoE) scales model capacity without significant increases in training and inference costs, but exhibits the following two issues: (1) Low expert activation, where only a small subset of experts are activated for optimization. (2) Lacking fine-grained analytical capabilities for multiple semantic concepts within individual tokens. We propose Multi-Head Mixture-of-Experts…

    Submitted 23 April, 2024; originally announced April 2024.

  34. arXiv:2404.14986  [pdf, other]

    cs.LG cs.AI

    $\texttt{MiniMol}$: A Parameter-Efficient Foundation Model for Molecular Learning

    Authors: Kerstin Kläser, Błażej Banaszewski, Samuel Maddrell-Mander, Callum McLean, Luis Müller, Ali Parviz, Shenyang Huang, Andrew Fitzgibbon

    Abstract: In biological tasks, data is rarely plentiful as it is generated from hard-to-gather measurements. Therefore, pre-training foundation models on large quantities of available data and then transferring to low-data downstream tasks is a promising direction. However, how to design effective foundation models for molecular learning remains an open question, with existing approaches typically focusing on m…

    Submitted 23 April, 2024; originally announced April 2024.

  35. arXiv:2404.13840  [pdf, other]

    hep-ex

    Study of $e^+e^-\toωX(3872)$ and $γX(3872)$ from 4.66 to 4.95 GeV

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (634 additional authors not shown)

    Abstract: Using data samples with an integrated luminosity of $4.5~\text{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies ranging from 4.66 to 4.95 GeV, we study the processes of $e^+e^-\toωX(3872)$ and $e^+e^-\toγX(3872)$. With the $e^+e^-\toωX(3872)$ process, the branching fraction ratio $R\equiv\frac{\mathcal{B}(X(3872)\toγJ/ψ)}{\mathcal{B}(X(3872)\toπ^+π^- J/ψ)}$ is measured to be…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 19 pages, 10 figures

  36. arXiv:2404.13677  [pdf, other]

    cs.CV eess.IV

    A Dataset and Model for Realistic License Plate Deblurring

    Authors: Haoyan Gong, Yuzheng Feng, Zhenrong Zhang, Xianxu Hou, Jingxin Liu, Siqi Huang, Hongbin Liu

    Abstract: Vehicle license plate recognition is a crucial task in intelligent traffic management systems. However, the challenge of achieving accurate recognition persists due to motion blur from fast-moving vehicles. Despite the widespread use of image synthesis approaches in existing deblurring and recognition algorithms, their effectiveness in real-world scenarios remains unproven. To address this, we int…

    Submitted 22 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  37. arXiv:2404.13628  [pdf, other]

    cs.CL cs.LG cs.MM

    Mixture of LoRA Experts

    Authors: Xun Wu, Shaohan Huang, Furu Wei

    Abstract: LoRA has gained widespread acceptance in the fine-tuning of large pre-trained models to cater to a diverse array of downstream tasks, showcasing notable effectiveness and efficiency, thereby solidifying its position as one of the most prevalent fine-tuning techniques. Due to the modular nature of LoRA's plug-and-play plugins, researchers have delved into the amalgamation of multiple LoRAs to empow…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 17 pages, 11 figures

  38. arXiv:2404.13086  [pdf]

    cond-mat.mtrl-sci

    Band Structure Engineering in Highly Crystalline Organic Semiconductors

    Authors: Shu-Jen Wang, Sebastian Hutsch, Felix Talnack, Marielle Deconinck, Shiyu Huang, Zongbao Zhang, Hans Kleemann, Yana Vaynzof, Stefan C. B. Mannsfeld, Frank Ortmann, Karl Leo

    Abstract: Blending of semiconductors for controlling the energy levels (band structure engineering) is an important technique, in particular, for optoelectronic applications. The underlying physics is the delocalized Bloch states, which average over the potential landscape of the blend. For organic semiconductors, it has been shown that two quite different effects, the dielectric constant and electrostatic…

    Submitted 17 April, 2024; originally announced April 2024.

  39. arXiv:2404.12768  [pdf, other]

    cs.CV cs.AI cs.GR

    MixLight: Borrowing the Best of both Spherical Harmonics and Gaussian Models

    Authors: Xinlong Ji, Fangneng Zhan, Shijian Lu, Shi-Sheng Huang, Hua Huang

    Abstract: Accurately estimating scene lighting is critical for applications such as mixed reality. Existing works estimate illumination by generating illumination maps or regressing illumination parameters. However, the method of generating illumination maps has poor generalization performance and parametric models such as Spherical Harmonic (SH) and Spherical Gaussian (SG) fall short in capturing high-freq…

    Submitted 19 April, 2024; originally announced April 2024.

  40. arXiv:2404.11996  [pdf, other]

    cs.AI

    DST-GTN: Dynamic Spatio-Temporal Graph Transformer Network for Traffic Forecasting

    Authors: Songtao Huang, Hongjin Song, Tianqi Jiang, Akbar Telikani, Jun Shen, Qingguo Zhou, Binbin Yong, Qiang Wu

    Abstract: Accurate traffic forecasting is essential for effective urban planning and congestion management. Deep learning (DL) approaches have gained colossal success in traffic forecasting but still face challenges in capturing the intricacies of traffic dynamics. In this paper, we identify and address these challenges by emphasizing that spatial features are inherently dynamic and change over time. A novel…

    Submitted 18 April, 2024; originally announced April 2024.

  41. arXiv:2404.11994  [pdf, other]

    quant-ph

    Image Compression and Reconstruction Based on Quantum Network

    Authors: Xun Ji, Qin Liu, Shan Huang, Andi Chen, Shengjun Wu

    Abstract: Quantum network is an emerging type of network structure that leverages the principles of quantum mechanics to transmit and process information. Compared with classical data reconstruction algorithms, quantum networks make image reconstruction more efficient and accurate. They can also process more complex image information using fewer bits and faster parallel computing capabilities. Therefore, th…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 8 pages, 5 figures

    ACM Class: I.4

  42. arXiv:2404.11865  [pdf, other]

    cs.CV

    From Image to Video, what do we need in multimodal LLMs?

    Authors: Suyuan Huang, Haoxin Zhang, Yan Gao, Yao Hu, Zengchang Qin

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated profound capabilities in understanding multimodal information, covering from Image LLMs to the more complex Video LLMs. Numerous studies have illustrated their exceptional cross-modal comprehension. Recently, integrating video foundation models with large language models to build a comprehensive video understanding system has been proposed…

    Submitted 17 April, 2024; originally announced April 2024.

  43. arXiv:2404.11474  [pdf, other]

    cs.CV

    Towards Highly Realistic Artistic Style Transfer via Stable Diffusion with Step-aware and Layer-aware Prompt

    Authors: Zhanjie Zhang, Quanwei Zhang, Huaizhong Lin, Wei Xing, Juncheng Mo, Shuaicheng Huang, Jinheng Xie, Guangyuan Li, Junsheng Luan, Lei Zhao, Dalong Zhang, Lixia Chen

    Abstract: Artistic style transfer aims to transfer the learned artistic style onto an arbitrary content image, generating artistic stylized images. Existing generative adversarial network-based methods fail to generate highly realistic stylized images and always introduce obvious artifacts and disharmonious patterns. Recently, large-scale pre-trained diffusion models opened up a new way for generating highl…

    Submitted 29 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI2024

  44. arXiv:2404.11384  [pdf, other]

    cs.CL cs.LG

    Exploring Key Point Analysis with Pairwise Generation and Graph Partitioning

    Authors: Xiao Li, Yong Jiang, Shen Huang, Pengjun Xie, Gong Cheng, Fei Huang

    Abstract: Key Point Analysis (KPA), the summarization of multiple arguments into a concise collection of key points, continues to be a significant and unresolved issue within the field of argument mining. Existing models adopt a two-stage pipeline of clustering arguments or generating key points for argument clusters. This approach relies on semantic similarity instead of measuring the existence of shared key…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 11 pages, 4 figures, 4 tables. Accepted to NAACL 2024

  45. arXiv:2404.10498  [pdf, other]

    cs.AI cs.CV cs.DC

    LAECIPS: Large Vision Model Assisted Adaptive Edge-Cloud Collaboration for IoT-based Perception System

    Authors: Shijing Hu, Ruijun Deng, Xin Du, Zhihui Lu, Qiang Duan, Yi He, Shih-Chia Huang, Jie Wu

    Abstract: Recent large vision models (e.g., SAM) enjoy great potential to facilitate intelligent perception with high accuracy. Yet, the resource constraints in the IoT environment tend to limit such large vision models to be locally deployed, incurring considerable inference latency thereby making it difficult to support real-time applications, such as autonomous driving and robotics. Edge-cloud collaborat…

    Submitted 16 April, 2024; originally announced April 2024.

  46. arXiv:2404.10220  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V

    Authors: Peiyuan Zhi, Zhiyuan Zhang, Muzhi Han, Zeyu Zhang, Zhitian Li, Ziyuan Jiao, Baoxiong Jia, Siyuan Huang

    Abstract: Autonomous robot navigation and manipulation in open environments require reasoning and replanning with closed-loop feedback. We present COME-robot, the first closed-loop framework utilizing the GPT-4V vision-language foundation model for open-ended reasoning and adaptive planning in real-world scenarios. We meticulously construct a library of action primitives for robot exploration, navigation, a…

    Submitted 15 April, 2024; originally announced April 2024.

  47. arXiv:2404.10141  [pdf, other]

    cs.CV cs.CL cs.MM

    ANCHOR: LLM-driven News Subject Conditioning for Text-to-Image Synthesis

    Authors: Aashish Anantha Ramakrishnan, Sharon X. Huang, Dongwon Lee

    Abstract: Text-to-Image (T2I) Synthesis has made tremendous strides in enhancing synthesized image quality, but current datasets evaluate model performance only on descriptive, instruction-based prompts. Real-world news image captions take a more pragmatic approach, providing high-level situational and Named-Entity (NE) information and limited physical object descriptions, making them abstractive. To evalua…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 23 pages, 9 figures

    MSC Class: 65D19

  48. arXiv:2404.09465  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI

    Authors: Yandan Yang, Baoxiong Jia, Peiyuan Zhi, Siyuan Huang

    Abstract: With recent developments in Embodied Artificial Intelligence (EAI) research, there has been a growing demand for high-quality, large-scale interactive scene generation. While prior methods in scene synthesis have prioritized the naturalness and realism of the generated scenes, the physical plausibility and interactivity of scenes have been largely left unexplored. To address this disparity, we int…

    Submitted 9 July, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024 (Highlight), 18 pages

  49. arXiv:2404.09460  [pdf, other]

    math.OC

    Optimal Real-time Bidding Strategy For EV Aggregators in Wholesale Electricity Markets

    Authors: Shihan Huang, Dongkun Han, John Zhen Fu Pang, Yue Chen

    Abstract: With the rapid growth of electric vehicles (EVs), EV aggregators have been playing an increasingly vital role in power systems by not merely providing charging management but also participating in wholesale electricity markets. This work studies the optimal real-time bidding strategy for an EV aggregator. Since the charging process of EVs is time-coupled, it is necessary for EV aggregators to consi…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 13 pages, 6 figures

  50. arXiv:2404.09219  [pdf, ps, other]

    hep-ex

    Observation of $D \to a_{0}(980)π$ in the decays $D^{0} \rightarrow π^{+}π^{-}η$ and $D^{+} \rightarrow π^{+}π^{0}η$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (634 additional authors not shown)

    Abstract: We report the first amplitude analysis of the decays $D^{0} \to π^{+} π^{-} η$ and $D^{+} \rightarrow π^{+}π^{0}η$ using a data sample taken with the BESIII detector at the center-of-mass energy of 3.773 GeV, corresponding to an integrated luminosity of 7.9 ${\rm fb}^{-1}$. The contribution from the process $D^{0(+)} \to a_{0}(980)^{+} π^{-(0)}$ is significantly larger than the…

    Submitted 14 April, 2024; originally announced April 2024.