Showing 1–50 of 897 results for author: Tan, X

  1. arXiv:2407.09194  [pdf, other]

    astro-ph.SR astro-ph.EP

    The JWST Weather Report from the Nearest Brown Dwarfs I: multi-period JWST NIRSpec + MIRI monitoring of the benchmark binary brown dwarf WISE 1049AB

    Authors: Beth A. Biller, Johanna M. Vos, Yifan Zhou, Allison M. McCarthy, Xianyu Tan, Ian J. M. Crossfield, Niall Whiteford, Genaro Suarez, Jacqueline Faherty, Elena Manjavacas, Xueqing Chen, Pengyu Liu, Ben J. Sutlieff, Mary Anne Limbach, Paul Molliere, Trent J. Dupuy, Natalia Oliveros-Gomez, Philip S. Muirhead, Thomas Henning, Gregory Mace, Nicolas Crouzet, Theodora Karalidi, Caroline V. Morley, Pascal Tremblin, Tiffany Kataria

    Abstract: We report results from 8 hours of JWST/MIRI LRS spectroscopic monitoring directly followed by 7 hours of JWST/NIRSpec prism spectroscopic monitoring of the benchmark binary brown dwarf WISE 1049AB, the closest, brightest brown dwarfs known. We find water, methane, and CO absorption features in both components, including the 3.3 $μ$m methane absorption feature and a tentative detection of small gra…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 28 pages, 27 figures, accepted to MNRAS

  2. arXiv:2407.08975  [pdf, other]

    cs.AR cs.ET

    Hybrid Temporal Computing for Lower Power Hardware Accelerators

    Authors: Maliha Tasnim, Sachin Sachdeva, Yibo Liu, Sheldon X. -D. Tan

    Abstract: In this paper, we propose a new hybrid temporal computing (HTC) framework that leverages both pulse rate and temporal data encoding to design ultra-low energy hardware accelerators. Our approach is inspired by the recently proposed temporal computing, or race logic, which encodes data values as single delays, leading to significantly lower energy consumption due to minimized signal switching. Howe…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 7 pages, 8 figures and 3 tables

  3. arXiv:2407.07465  [pdf, other]

    cs.CV

    Exploring the Untouched Sweeps for Conflict-Aware 3D Segmentation Pretraining

    Authors: Tianfang Sun, Zhizhong Zhang, Xin Tan, Yanyun Qu, Yuan Xie

    Abstract: LiDAR-camera 3D representation pretraining has shown significant promise for 3D perception tasks and related applications. However, two issues widely exist in this framework: 1) Solely keyframes are used for training. For example, in nuScenes, a substantial quantity of unpaired LiDAR and camera frames remain unutilized, limiting the representation capabilities of the pretrained network. 2) The con…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: preprint, version 1

  4. arXiv:2407.05679  [pdf, other]

    cs.CV cs.AI

    BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space

    Authors: Yumeng Zhang, Shi Gong, Kaixin Xiong, Xiaoqing Ye, Xiao Tan, Fan Wang, Jizhou Huang, Hua Wu, Haifeng Wang

    Abstract: World models are receiving increasing attention in autonomous driving for their ability to predict potential future scenarios. In this paper, we present BEVWorld, a novel approach that tokenizes multimodal sensor inputs into a unified and compact Bird's Eye View (BEV) latent space for environment modeling. The world model consists of two parts: the multi-modal tokenizer and the latent BEV sequence…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 10 pages

  5. arXiv:2407.05305  [pdf, other]

    cs.AI

    MINDECHO: Role-Playing Language Agents for Key Opinion Leaders

    Authors: Rui Xu, Dakuan Lu, Xiaoyu Tan, Xintao Wang, Siyu Yuan, Jiangjie Chen, Wei Chu, Xu Yinghui

    Abstract: Large language models (LLMs) have demonstrated impressive performance in various applications, among which role-playing language agents (RPLAs) have engaged a broad user base. Now, there is a growing demand for RPLAs that represent Key Opinion Leaders (KOLs), i.e., Internet celebrities who shape the trends and opinions in their domains. However, research in this line remains underexplored. In this…

    Submitted 7 July, 2024; originally announced July 2024.

  6. arXiv:2407.05239  [pdf, other]

    cs.DS cs.NI

    Competitive Analysis of Online Path Selection: Impacts of Path Length, Topology, and System-Level Costs

    Authors: Ying Cao, Siyuan Yu, Xiaoqi Tan, Danny H. K. Tsang

    Abstract: Consider a communication network to which a sequence of self-interested users come and send requests for data transmission between nodes. This work studies the question of how to guide the path selection choices made by those online-arriving users and maximize the social welfare. Competitive analysis is the main technical tool. Specifically, the impacts of path length bounds and topology on the co…

    Submitted 6 July, 2024; originally announced July 2024.

  7. arXiv:2407.04946  [pdf, ps, other]

    math.DG math.AP

    Boundary determination of the Riemannian metric by the elastic Dirichlet-to-Neumann map

    Authors: Xiaoming Tan

    Abstract: For a compact connected Riemannian manifold with smooth boundary, by computing the full symbol of the elastic Dirichlet-to-Neumann map, we prove that the elastic Dirichlet-to-Neumann map can uniquely determine the partial derivatives of all orders of the Riemannian metric on the boundary of the manifold.

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 11 pages

  8. arXiv:2407.00486  [pdf, other]

    cs.CL

    Towards Massive Multilingual Holistic Bias

    Authors: Xiaoqing Ellen Tan, Prangthip Hansanti, Carleigh Wood, Bokai Yu, Christophe Ropers, Marta R. Costa-jussà

    Abstract: In the current landscape of automatic language generation, there is a need to understand, evaluate, and mitigate demographic biases as existing models are becoming increasingly multilingual. To address this, we present the initial eight languages from the MASSIVE MULTILINGUAL HOLISTICBIAS (MMHB) dataset and benchmark consisting of approximately 6 million sentences representing 13 demographic axes.…

    Submitted 29 June, 2024; originally announced July 2024.

    ACM Class: I.2.7

  9. arXiv:2407.00326  [pdf, other]

    cs.DC cs.AI cs.NI

    Teola: Towards End-to-End Optimization of LLM-based Applications

    Authors: Xin Tan, Yimin Jiang, Yitao Yang, Hong Xu

    Abstract: Large language model (LLM)-based applications consist of both LLM and non-LLM components, each contributing to the end-to-end latency. Despite great efforts to optimize LLM inference, end-to-end workflow optimization has been overlooked. Existing frameworks employ coarse-grained orchestration with task modules, which confines optimizations to within each module and yields suboptimal scheduling dec…

    Submitted 29 June, 2024; originally announced July 2024.

  10. arXiv:2407.00136  [pdf, other]

    hep-ex

    Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, S. Ahmed, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, X. H. Bai, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, J. Bloms, A. Bortone, I. Boyko, R. A. Briere, et al. (495 additional authors not shown)

    Abstract: Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780 GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions…

    Submitted 2 July, 2024; v1 submitted 28 June, 2024; originally announced July 2024.

  11. arXiv:2406.19969  [pdf, other]

    q-bio.QM

    Enhancing Terrestrial Net Primary Productivity Estimation with EXP-CASA: A Novel Light Use Efficiency Model Approach

    Authors: Guanzhou Chen, Kaiqi Zhang, Xiaodong Zhang, Hong Xie, Haobo Yang, Xiaoliang Tan, Tong Wang, Yule Ma, Qing Wang, Jinzhou Cao, Weihong Cui

    Abstract: The Light Use Efficiency model, epitomized by the CASA model, is extensively applied in the quantitative estimation of vegetation Net Primary Productivity. However, the classic CASA model is marked by significant complexity: the estimation of environmental stress parameters, in particular, necessitates multi-source observation data, adding to the complexity and uncertainty of the model's operation…

    Submitted 28 June, 2024; originally announced June 2024.

  12. arXiv:2406.18449  [pdf, other]

    cs.CL cs.AI

    Cascading Large Language Models for Salient Event Graph Generation

    Authors: Xingwei Tan, Yuxiang Zhou, Gabriele Pergola, Yulan He

    Abstract: Generating event graphs from long documents is challenging due to the inherent complexity of multiple tasks involved such as detecting events, identifying their relationships, and reconciling unstructured input with structured graphs. Recent studies typically consider all events with equal importance, failing to distinguish salient events crucial for understanding narratives. This paper presents C…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 9 + 12 pages

  13. arXiv:2406.18009  [pdf, other]

    eess.AS cs.SD

    E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

    Authors: Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Xu Tan, Yanqing Liu, Sheng Zhao, Naoyuki Kanda

    Abstract: This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and state-of-the-art speaker similarity and intelligibility. In the E2 TTS framework, the text input is converted into a character sequence with filler tokens. The flow-matching-based mel spectrogram generator is then trained based on the…

    Submitted 25 June, 2024; originally announced June 2024.

  14. arXiv:2406.14228  [pdf, other]

    cs.AI

    EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms

    Authors: Siyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan, Dongsheng Li, Deqing Yang

    Abstract: The rise of powerful large language models (LLMs) has spurred a new trend in building LLM-based autonomous agents for solving complex tasks, especially multi-agent systems. Despite the remarkable progress, we notice that existing works are heavily dependent on human-designed frameworks, which greatly limits the functional scope and scalability of agent systems. How to automatically extend the spec…

    Submitted 11 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Work in progress

  15. PIG: Prompt Images Guidance for Night-Time Scene Parsing

    Authors: Zhifeng Xie, Rui Qiu, Sen Wang, Xin Tan, Yuan Xie, Lizhuang Ma

    Abstract: Night-time scene parsing aims to extract pixel-level semantic information in night images, aiding downstream tasks in understanding scene object distribution. Due to limited labeled night image datasets, unsupervised domain adaptation (UDA) has become the predominant method for studying night scenes. UDA typically relies on paired day-night image pairs to guide adaptation, but this approach hamper…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: This paper is accepted by IEEE TIP. Code: https://github.com/qiurui4shu/PIG

  16. arXiv:2406.10056  [pdf, other]

    cs.SD eess.AS

    UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner

    Authors: Dongchao Yang, Haohan Guo, Yuanyuan Wang, Rongjie Huang, Xiang Li, Xu Tan, Xixin Wu, Helen Meng

    Abstract: Large language models (LLMs) have demonstrated supreme capabilities in text understanding and generation, but cannot be directly applied to cross-modal tasks without fine-tuning. This paper proposes a cross-modal in-context learning approach, empowering the frozen LLMs to achieve multiple audio tasks in a few-shot style without any parameter update. Specifically, we propose a novel and LLMs-dr…

    Submitted 14 June, 2024; originally announced June 2024.

  17. arXiv:2406.09641  [pdf, other]

    astro-ph.EP

    Phase-resolving the absorption signatures of water and carbon monoxide in the atmosphere of the ultra-hot Jupiter WASP-121b with GEMINI-S/IGRINS

    Authors: Joost P. Wardenier, Vivien Parmentier, Michael R. Line, Megan Weiner Mansfield, Xianyu Tan, Shang-Min Tsai, Jacob L. Bean, Jayne L. Birkby, Matteo Brogi, Jean-Michel Désert, Siddharth Gandhi, Elspeth K. H. Lee, Colette I. Levens, Lorenzo Pino, Peter C. B. Smith

    Abstract: Ultra-hot Jupiters are among the best targets for atmospheric characterization at high spectral resolution. Resolving their transmission spectra as a function of orbital phase offers a unique window into the 3D nature of these objects. In this work, we present three transits of the ultra-hot Jupiter WASP-121b observed with Gemini-S/IGRINS. For the first time, we measure the phase-dependent absorpt…

    Submitted 19 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 24 pages, 16 figures, resubmitted to PASP (made a few minor changes to the text w.r.t. v1)

  18. arXiv:2406.09147  [pdf, other]

    cs.LG

    Weakly-supervised anomaly detection for multimodal data distributions

    Authors: Xu Tan, Junqi Chen, Sylwan Rahardja, Jiawei Yang, Susanto Rahardja

    Abstract: Weakly-supervised anomaly detection can outperform existing unsupervised methods with the assistance of a very small number of labeled anomalies, which attracts increasing attention from researchers. However, existing weakly-supervised anomaly detection methods are limited as these methods do not factor in the multimodal nature of the real-world data distribution. To mitigate this, we propose the…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures. Accepted by 2024 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC)

  19. arXiv:2406.08096  [pdf, other]

    cs.CV

    Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement

    Authors: Runyi Yu, Tianyu He, Ailing Zhang, Yuchi Wang, Junliang Guo, Xu Tan, Chang Liu, Jie Chen, Jiang Bian

    Abstract: We aim to edit the lip movements in talking video according to the given speech while preserving the personal identity and visual details. The task can be decomposed into two sub-problems: (1) speech-driven lip motion generation and (2) visual appearance synthesis. Current solutions handle the two sub-problems within a single generative model, resulting in a challenging trade-off between lip-sync…

    Submitted 16 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 14 pages of main text, 23 pages in total, 9 figures

  20. arXiv:2406.07478  [pdf, other]

    quant-ph cs.CC

    Incompressibility and spectral gaps of random circuits

    Authors: Chi-Fang Chen, Jeongwan Haah, Jonas Haferkamp, Yunchao Liu, Tony Metger, Xinyu Tan

    Abstract: Random reversible and quantum circuits form random walks on the alternating group $\mathrm{Alt}(2^n)$ and unitary group $\mathrm{SU}(2^n)$, respectively. Known bounds on the spectral gap for the $t$-th moment of these random walks have inverse-polynomial dependence in both $n$ and $t$. We prove that the gap for random reversible circuits is $Ω(n^{-3})$ for all $t\geq 1$, and the gap for random qua…

    Submitted 8 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: 80 pages, 5 figures, v2: added references and minor changes in the presentation

  21. arXiv:2406.06904  [pdf, other]

    cs.RO cs.HC

    Person Transfer in the Field: Examining Real World Sequential Human-Robot Interaction Between Two Robots

    Authors: Xiang Zhi Tan, Elizabeth J. Carter, Aaron Steinfeld

    Abstract: With more robots being deployed in the world, users will likely interact with multiple robots sequentially when receiving services. In this paper, we describe an exploratory field study in which unsuspecting participants experienced a "person transfer", a scenario in which they first interacted with one stationary robot before another mobile robot joined to complete the interaction. In our 7-h…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to RO-MAN 2024

  22. arXiv:2406.05370  [pdf, other]

    cs.CL cs.SD eess.AS

    VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

    Authors: Sanyuan Chen, Shujie Liu, Long Zhou, Yanqing Liu, Xu Tan, Jinyu Li, Sheng Zhao, Yao Qian, Furu Wei

    Abstract: This paper introduces VALL-E 2, the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity for the first time. Based on its predecessor, VALL-E, the new iteration introduces two significant enhancements: Repetition Aware Sampling refines the original nucleus sampling process by accounting for token repetition in…

    Submitted 17 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: Demo posted

  23. arXiv:2406.04321  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD

    VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

    Authors: Zeyue Tian, Zhaoyang Liu, Ruibin Yuan, Jiahao Pan, Xiaoqiang Huang, Qifeng Liu, Xu Tan, Qifeng Chen, Wei Xue, Yike Guo

    Abstract: In this work, we systematically study music generation conditioned solely on the video. First, we present a large-scale dataset comprising 190K video-music pairs, including various genres such as movie trailers, advertisements, and documentaries. Furthermore, we propose VidMuse, a simple framework for generating music aligned with video inputs. VidMuse stands out by producing high-fidelity music t…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: The code and datasets will be available at https://github.com/ZeyueT/VidMuse/

  24. arXiv:2406.03894  [pdf, other]

    cs.LG

    Transductive Off-policy Proximal Policy Optimization

    Authors: Yaozhong Gan, Renye Yan, Xiaoyang Tan, Zhe Wu, Junliang Xing

    Abstract: Proximal Policy Optimization (PPO) is a popular model-free reinforcement learning algorithm, esteemed for its simplicity and efficacy. However, due to its inherent on-policy nature, its proficiency in harnessing data from disparate policies is constrained. This paper introduces a novel off-policy extension to the original PPO method, christened Transductive Off-policy PPO (ToPPO). Herein, we provi…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 18

  25. arXiv:2406.01916  [pdf, other]

    cs.CV

    FastLGS: Speeding up Language Embedded Gaussians with Feature Grid Mapping

    Authors: Yuzhou Ji, He Zhu, Junshu Tang, Wuyi Liu, Zhizhong Zhang, Yuan Xie, Lizhuang Ma, Xin Tan

    Abstract: The semantically interactive radiance field has always been an appealing task for its potential to facilitate user-friendly and automated real-world 3D scene understanding applications. However, it is a challenging task to achieve high quality, efficiency and zero-shot ability at the same time with semantics in radiance fields. In this work, we present FastLGS, an approach that supports real-time…

    Submitted 3 June, 2024; originally announced June 2024.

  26. arXiv:2406.01375  [pdf, other]

    cs.CL

    D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models

    Authors: Haoran Que, Jiaheng Liu, Ge Zhang, Chenchen Zhang, Xingwei Qu, Yinghao Ma, Feiyu Duan, Zhiqi Bai, Jiakai Wang, Yuanxing Zhang, Xu Tan, Jie Fu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng

    Abstract: Continual Pre-Training (CPT) on Large Language Models (LLMs) has been widely used to expand the model's fundamental understanding of specific downstream domains (e.g., math and code). For the CPT on domain-specific LLMs, one important question is how to choose the optimal mixture ratio between the general-corpus (e.g., Dolma, Slim-pajama) and the downstream domain-corpus. Existing methods usually…

    Submitted 3 June, 2024; originally announced June 2024.

  27. arXiv:2405.19823  [pdf, other]

    cs.LG cs.AI

    Joint Selective State Space Model and Detrending for Robust Time Series Anomaly Detection

    Authors: Junqi Chen, Xu Tan, Sylwan Rahardja, Jiawei Yang, Susanto Rahardja

    Abstract: Deep learning-based sequence models are extensively employed in Time Series Anomaly Detection (TSAD) tasks due to their effective sequential modeling capabilities. However, the ability of TSAD is limited by two key challenges: (i) the ability to model long-range dependency and (ii) the generalization issue in the presence of non-stationary data. To tackle these challenges, an anomaly detector that…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Submitted to IEEE Signal Processing Letters

  28. arXiv:2405.19291  [pdf, other]

    cs.RO

    Grasp as You Say: Language-guided Dexterous Grasp Generation

    Authors: Yi-Lin Wei, Jian-Jian Jiang, Chengyi Xing, Xiantuo Tan, Xiao-Ming Wu, Hao Li, Mark Cutkosky, Wei-Shi Zheng

    Abstract: This paper explores a novel task "Dexterous Grasp as You Say" (DexGYS), enabling robots to perform dexterous grasping based on human commands expressed in natural language. However, the development of this field is hindered by the lack of datasets with natural human guidance; thus, we propose a language-guided dexterous grasp dataset, named DexGYSNet, offering high-quality dexterous grasp annota…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 9 pages, 7 figures

  29. arXiv:2405.18289  [pdf, other]

    cs.LG cs.AI

    Highway Reinforcement Learning

    Authors: Yuhui Wang, Miroslav Strupl, Francesco Faccio, Qingyuan Wu, Haozhe Liu, Michał Grudzień, Xiaoyang Tan, Jürgen Schmidhuber

    Abstract: Learning from multi-step off-policy data collected by a set of policies is a core problem of reinforcement learning (RL). Approaches based on importance sampling (IS) often suffer from large variances due to products of IS ratios. Typical IS-free methods, such as $n$-step Q-learning, look ahead for $n$ time steps along the trajectory of actions (where $n$ is called the lookahead depth) and utilize…

    Submitted 28 May, 2024; originally announced May 2024.

  30. arXiv:2405.17792  [pdf, other]

    hep-ex hep-ph

    JUNO Sensitivity to Invisible Decay Modes of Neutrons

    Authors: JUNO Collaboration, Angel Abusleme, Thomas Adam, Kai Adamowicz, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, Marco Beretta, Antonio Bergnoli, Daniel Bick, et al. (635 additional authors not shown)

    Abstract: We explore the decay of bound neutrons into invisible particles (e.g., $n\rightarrow 3 ν$ or $nn \rightarrow 2 ν$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $ n \rightarrow { inv} $ and $ nn \rightarrow { inv} $. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation mode…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 28 pages, 7 figures, 4 tables

  31. arXiv:2405.17247  [pdf, other]

    cs.LG

    An Introduction to Vision-Language Modeling

    Authors: Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li, Adrien Bardes, Suzanne Petryk, Oscar Mañas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, Mark Ibrahim, Melissa Hall, Yunyang Xiong, Jonathan Lebensold, Candace Ross, Srihari Jayakumar, Chuan Guo, Diane Bouchacourt, Haider Al-Tahan, Karthik Padthe, Vasu Sharma, Hu Xu, Xiaoqing Ellen Tan, Megan Richards, Samuel Lavoie, et al. (16 additional authors not shown)

    Abstract: Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models that produce images using only a high-level text description, the vision-language model (VLM) applications will significantly impact our relationship with technol…

    Submitted 27 May, 2024; originally announced May 2024.

  32. arXiv:2405.15758  [pdf, other]

    cs.CV cs.AI

    InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

    Authors: Yuchi Wang, Junliang Guo, Jianhong Bai, Runyi Yu, Tianyu He, Xu Tan, Xu Sun, Jiang Bian

    Abstract: Recent talking avatar generation models have made strides in achieving realistic and accurate lip synchronization with the audio, but often fall short in controlling and conveying detailed expressions and emotions of the avatar, making the generated video less vivid and controllable. In this paper, we propose a novel text-guided approach for generating emotionally expressive 2D avatars, offering f…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Project page: https://wangyuchi369.github.io/InstructAvatar/

  33. arXiv:2405.15259  [pdf, other]

    eess.SY

    Robust Economic Dispatch with Flexible Demand and Adjustable Uncertainty Set

    Authors: Tian Liu, Xiaoqi Tan, Su Wang, Danny H. K. Tsang

    Abstract: With more renewable energy sources (RES) integrated into the power system, the intermittency of RES places a heavy burden on the system. The uncertainty of RES is traditionally handled by controllable generators to balance the real time wind power deviation. As the demand side management develops, the flexibility of aggregate loads can be leveraged to mitigate the negative impact of the wind power…

    Submitted 4 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  34. arXiv:2405.15187  [pdf, other]

    eess.SY

    Chance-Constrained Economic Dispatch with Flexible Loads and RES

    Authors: Tian Liu, Bo Sun, Xiaoqi Tan, Danny H. K. Tsang

    Abstract: With the increasing penetration of intermittent renewable energy sources (RESs), it becomes increasingly challenging to maintain the supply-demand balance of power systems by solely relying on the generation side. To combat the volatility led by the uncertain RESs, demand-side management by leveraging the multi-dimensional flexibility (MDF) has been recognized as an economic and efficient approach…

    Submitted 4 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  35. arXiv:2405.13383  [pdf, other]

    cs.LG

    Gradient Projection For Continual Parameter-Efficient Tuning

    Authors: Jingyang Qiao, Zhizhong Zhang, Xin Tan, Yanyun Qu, Wensheng Zhang, Zhi Han, Yuan Xie

    Abstract: Parameter-efficient tunings (PETs) have demonstrated impressive performance and promising perspectives in training large models, while they are still confronted with a common problem: the trade-off between learning new content and protecting old knowledge, e.g., zero-shot generalization ability, and cross-modal hallucination. In this paper, we reformulate Adapter, LoRA, Prefix-tuning, and Prompt-t…

    Submitted 3 July, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  36. arXiv:2405.12809  [pdf, other]

    hep-ex

    Precision measurement of the branching fraction of $J/ψ\rightarrow K^+K^-$ via $ψ(2S)\rightarrow π^+π^-J/ψ$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (604 additional authors not shown)

    Abstract: Using a sample of $448.1 \times 10^6$ $ψ(2S)$ events collected with the BESIII detector, we perform a study of the decay $J/ψ\rightarrow K^+K^-$ via $ψ(2S)\rightarrow π^+π^-J/ψ$. The branching fraction of $J/ψ\rightarrow K^+K^-$ is determined to be $\mathcal{B}_{K^+K^-}=(3.072\pm 0.023({\rm stat.})\pm 0.050({\rm syst.}))\times 10^{-4}$, which is consistent with previous measurements but with sig…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: to be submitted to PRD

  37. arXiv:2405.10739  [pdf, other]

    cs.CV cs.AI

    Efficient Multimodal Large Language Models: A Survey

    Authors: Yizhang Jin, Jian Li, Yexin Liu, Tianjun Gu, Kai Wu, Zhengkai Jiang, Muyang He, Bo Zhao, Xin Tan, Zhenye Gan, Yabiao Wang, Chengjie Wang, Lizhuang Ma

    Abstract: In the past year, Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding and reasoning. However, the extensive model size and high training and inference costs have hindered the widespread application of MLLMs in academia and industry. Thus, studying efficient and lightweight MLLMs has enormous potential, e…

    Submitted 17 May, 2024; originally announced May 2024.

  38. arXiv:2405.10591  [pdf, other]

    cs.CV

    GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision

    Authors: Xin Tan, Wenbin Wu, Zhiwei Zhang, Chaojie Fan, Yong Peng, Zhizhong Zhang, Yuan Xie, Lizhuang Ma

    Abstract: 3D occupancy perception holds a pivotal role in recent vision-centric autonomous driving systems by converting surround-view images into integrated geometric and semantic representations within dense 3D grids. Nevertheless, current models still encounter two main challenges: modeling depth accurately in the 2D-3D view transformation stage, and overcoming the lack of generalizability issues due to…

    Submitted 17 May, 2024; originally announced May 2024.

  39. arXiv:2405.09066  [pdf, other]

    hep-ex

    Search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, V. Batozskaya, D. Becker, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, J. Bloms, A. Bortone, I. Boyko, et al. (559 additional authors not shown)

    Abstract: We present the first search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$ by analyzing a data sample of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.178 and 4.226 GeV, corresponding to an integrated luminosity of 6.32 fb$^{-1}$. No significant signal is observed. The upper limits on the branching fractions for…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 14 pages, 7 figures

  40. arXiv:2405.07682  [pdf, other]

    cs.SD cs.AI cs.CL cs.MM eess.AS

    FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation

    Authors: Jianyi Chen, Wei Xue, Xu Tan, Zhen Ye, Qifeng Liu, Yike Guo

    Abstract: Singing Accompaniment Generation (SAG), which generates instrumental music to accompany input vocals, is crucial to developing human-AI symbiotic art creation systems. The state-of-the-art method, SingSong, utilizes a multi-stage autoregressive (AR) model for SAG; however, this method is extremely slow as it generates semantic and acoustic tokens recursively, and this makes it impossible for real-…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: IJCAI 2024

  41. arXiv:2405.07201  [pdf, other

    cs.CV

    Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception

    Authors: Haoming Chen, Zhizhong Zhang, Yanyun Qu, Ruixin Zhang, Xin Tan, Yuan Xie

    Abstract: An effective pre-training framework with universal 3D representations is extremely desired in perceiving large-scale dynamic scenes. However, establishing such an ideal framework that is both task-generic and label-efficient poses a challenge in unifying the representation of the same primitive across diverse scenes. The current contrastive 3D pre-training methods typically follow a frame-level co…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR 2024

  42. arXiv:2405.01970  [pdf, other

    astro-ph.EP astro-ph.IM

    Direct detectability of tidally heated exomoons by photometric orbital modulation

    Authors: E. Kleisioti, D. Dirkx, X. Tan, M. A. Kenworthy

    Abstract: (Aims) We investigate whether volcanic exomoons can be detected in thermal wavelength light curves due to their phase variability along their orbit. The method we use is based on the photometric signal variability that volcanic features or hotspots would cause in infrared (IR) wavelengths, when they are inhomogeneously distributed on the surface of a tidally heated exomoon (THEM). (Methods) We sim…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: Accepted for publication in A&A

    Journal ref: A&A 687, A125 (2024)

  43. arXiv:2405.01881  [pdf

    q-fin.RM cs.LG

    Explainable Risk Classification in Financial Reports

    Authors: Xue Wen Tan, Stanley Kok

    Abstract: Every publicly traded company in the US is required to file an annual 10-K financial report, which contains a wealth of information about the company. In this paper, we propose an explainable deep-learning model, called FinBERT-XRC, that takes a 10-K report as input, and automatically assesses the post-event return volatility risk of its associated company. In contrast to previous systems, our pro…

    Submitted 6 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: ICIS 2023 Proceedings. 3. https://aisel.aisnet.org/icis2023/blockchain/blockchain/3

  44. arXiv:2405.01807  [pdf, other

    cs.GT cs.AI

    Algorithmic Decision-Making under Agents with Persistent Improvement

    Authors: Tian Xie, Xuwei Tan, Xueru Zhang

    Abstract: This paper studies algorithmic decision-making under human's strategic behavior, where a decision maker uses an algorithm to make decisions about human agents, and the latter with information about the algorithm may exert effort strategically and improve to receive favorable decisions. Unlike prior works that assume agents benefit from their efforts immediately, we consider realistic scenarios whe…

    Submitted 2 May, 2024; originally announced May 2024.

  45. How to Gain Commit Rights in Modern Top Open Source Communities?

    Authors: Xin Tan, Yan Gong, Geyu Huang, Haohua Wu, Li Zhang

    Abstract: The success of open source software (OSS) projects relies on voluntary contributions from various community roles. Being a committer signifies gaining trust and higher privileges. Substantial studies have focused on the requirements of becoming a committer, but most of them are based on interviews or several hypotheses, lacking a comprehensive understanding of committers' qualifications. We explore…

    Submitted 16 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: 23 pages, 5 figures, FSE 2024

    Journal ref: Proceedings of the ACM on Software Engineering (PACMSE) Issue FSE 2024

  46. arXiv:2405.00728  [pdf

    cs.CL cs.AI cs.HC

    Evaluating the Application of ChatGPT in Outpatient Triage Guidance: A Comparative Study

    Authors: Dou Liu, Ying Han, Xiandi Wang, Xiaomei Tan, Di Liu, Guangwu Qian, Kang Li, Dan Pu, Rong Yin

    Abstract: The integration of Artificial Intelligence (AI) in healthcare presents a transformative potential for enhancing operational efficiency and health outcomes. Large Language Models (LLMs), such as ChatGPT, have shown their capabilities in supporting medical decision-making. Embedding LLMs in medical systems is becoming a promising trend in healthcare development. The potential of ChatGPT to address t…

    Submitted 27 April, 2024; originally announced May 2024.

    Comments: 8 pages, 1 figure, conference (International Ergonomics Association)

  47. arXiv:2404.19221  [pdf, other

    cs.CV cs.CL

    Transcrib3D: 3D Referring Expression Resolution through Large Language Models

    Authors: Jiading Fang, Xiangshan Tan, Shengjie Lin, Igor Vasiljevic, Vitor Guizilini, Hongyuan Mei, Rares Ambrus, Gregory Shakhnarovich, Matthew R Walter

    Abstract: If robots are to work effectively alongside people, they must be able to interpret natural language references to objects in their 3D environment. Understanding 3D referring expressions is challenging -- it requires the ability to both parse the 3D structure of the scene and correctly ground free-form language in the presence of distraction and clutter. We introduce Transcrib3D, an approach that b…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: CORLW 2023

  48. arXiv:2404.17119  [pdf, other

    astro-ph.CO

    Revisiting Bounds on Primordial Black Hole as Dark Matter with X-ray Background

    Authors: Xiu-hui Tan, Jun-qing Xia

    Abstract: Within the mass range of $10^{16}-5\times 10^{18}$ g, primordial black holes (PBHs) persist as plausible candidates for dark matter. Our study involves a reassessment of the constraints on PBHs through a comparative analysis of the cosmic X-ray background (CXB) and the emissions arising from their Hawking evaporation. We identify previously overlooked radiation processes across the relevant energy…

    Submitted 28 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: 15 pages, 6 figures

  49. arXiv:2404.16813  [pdf, other

    astro-ph.SR astro-ph.EP

    Atmospheric Retrievals of the Phase-resolved Spectra of Irradiated Brown Dwarfs WD-0137B and EPIC-2122B

    Authors: Joshua D. Lothringer, Yifan Zhou, Daniel Apai, Xianyu Tan, Vivien Parmentier, Sarah L. Casewell

    Abstract: We present an atmospheric retrieval analysis of HST/WFC3/G141 spectroscopic phase curve observations of two brown dwarfs, WD-0137B and EPIC-2122B, in ultra-short period orbits around white dwarf hosts. These systems are analogous to hot and ultra-hot Jupiter systems, enabling a unique and high-precision comparison to exoplanet systems. We use the PETRA retrieval suite to test various analysis setu…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 19 pages, 15 figures, 3 tables. Accepted for publication in ApJ

  50. arXiv:2404.16488  [pdf, other

    astro-ph.EP

    Two-Dimensional Eclipse Mapping of the Hot Jupiter WASP-43b with JWST MIRI/LRS

    Authors: Mark Hammond, Taylor J. Bell, Ryan C. Challener, Neil T. Lewis, Megan Weiner Mansfield, Isaac Malsky, Emily Rauscher, Jacob L. Bean, Ludmila Carone, João M. Mendonça, Lucas Teinturier, Xianyu Tan, Nicolas Crouzet, Laura Kreidberg, Giuseppe Morello, Vivien Parmentier, Jasmina Blecic, Jean-Michel Désert, Christiane Helling, Pierre-Olivier Lagage, Karan Molaverdikhani, Matthew C. Nixon, Benjamin V. Rackham, Jingxuan Yang

    Abstract: We present eclipse maps of the two-dimensional thermal emission from the dayside of the hot Jupiter WASP-43b, derived from an observation of a phase curve with the JWST MIRI/LRS instrument. The observed eclipse shapes deviate significantly from those expected for a planet emitting uniformly over its surface. We fit a map to this deviation, constructed from spherical harmonics up to order…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted for publication in The Astronomical Journal