Showing 1–50 of 2,507 results for author: Ma, Z

  1. arXiv:2407.09367  [pdf, other]

    cs.CV

    Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation

    Authors: Zhilin Zhu, Xiaopeng Hong, Zhiheng Ma, Weijun Zhuang, Yaohui Ma, Dai Yong, Yaowei Wang

    Abstract: Continual Test-Time Adaptation (CTTA) involves adapting a pre-trained source model to continually changing unsupervised target domains. In this paper, we systematically analyze the challenges of this task: online environment, unsupervised nature, and the risks of error accumulation and catastrophic forgetting under continual domain shifts. To address these challenges, we reshape the online data bu…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: This is the preprint version of our paper and supplemental material to appear in ECCV 2024

  2. arXiv:2407.08713  [pdf, other]

    cs.CL cs.AI

    GTA: A Benchmark for General Tool Agents

    Authors: Jize Wang, Zerun Ma, Yining Li, Songyang Zhang, Cailian Chen, Kai Chen, Xinyi Le

    Abstract: Significant focus has been placed on integrating large language models (LLMs) with various tools in developing general-purpose agents. This poses a challenge to LLMs' tool-use capabilities. However, there are evident gaps between existing tool-use evaluations and real-world scenarios. Current evaluations often use AI-generated queries, single-step tasks, dummy tools, and text-only interactions, fa…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Github repo: https://github.com/open-compass/GTA

  3. arXiv:2407.08457  [pdf, other]

    cs.CV

    Neural Poisson Solver: A Universal and Continuous Framework for Natural Signal Blending

    Authors: Delong Wu, Hao Zhu, Qi Zhang, You Li, Zhan Ma, Xun Cao

    Abstract: Implicit Neural Representation (INR) has become a popular method for representing visual signals (e.g., 2D images and 3D scenes), demonstrating promising results in various downstream applications. Given its potential as a medium for visual signals, exploring the development of a neural blending method that utilizes INRs is a natural progression. Neural blending involves merging two INRs to create…

    Submitted 11 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: accepted by ECCV 2024

  4. arXiv:2407.07895  [pdf, other]

    cs.CV cs.CL cs.LG

    LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

    Authors: Feng Li, Renrui Zhang, Hao Zhang, Yuanhan Zhang, Bo Li, Wei Li, Zejun Ma, Chunyuan Li

    Abstract: Visual instruction tuning has made considerable strides in enhancing the capabilities of Large Multimodal Models (LMMs). However, existing open LMMs largely focus on single-image tasks, their applications to multi-image scenarios remains less explored. Additionally, prior LMM research separately tackles different scenarios, leaving it impossible to generalize cross scenarios with new emerging capa…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Project Page: https://llava-vl.github.io/blog/2024-06-16-llava-next-interleave/

  5. arXiv:2407.07651  [pdf, other]

    hep-ex physics.data-an

    Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$

    Authors: M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, et al. (645 additional authors not shown)

    Abstract: The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be…

    Submitted 10 July, 2024; originally announced July 2024.

  6. arXiv:2407.07479  [pdf, other]

    cs.CV

    How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?

    Authors: Yuxin Chen, Zongyang Ma, Ziqi Zhang, Zhongang Qi, Chunfeng Yuan, Bing Li, Junfu Pu, Ying Shan, Xiaojuan Qi, Weiming Hu

    Abstract: Dominant dual-encoder models enable efficient image-text retrieval but suffer from limited accuracy while the cross-encoder models offer higher accuracy at the expense of efficiency. Distilling cross-modality matching knowledge from cross-encoder to dual-encoder provides a natural approach to harness their strengths. Thus we investigate the following valuable question: how to make cross-encoder a…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by CVPR 2024

  7. arXiv:2407.07478  [pdf, other]

    cs.CV

    EA-VTR: Event-Aware Video-Text Retrieval

    Authors: Zongyang Ma, Ziqi Zhang, Yuxin Chen, Zhongang Qi, Chunfeng Yuan, Bing Li, Yingmin Luo, Xu Li, Xiaojuan Qi, Ying Shan, Weiming Hu

    Abstract: Understanding the content of events occurring in the video and their inherent temporal logic is crucial for video-text retrieval. However, web-crawled pre-training datasets often lack sufficient event information, and the widely adopted video-level cross-modal contrastive learning also struggles to capture detailed and complex video-text event alignment. To address these challenges, we make improv…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  8. arXiv:2407.07035  [pdf, other]

    cs.CL cs.CV

    Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models

    Authors: Yue Zhang, Ziqiao Ma, Jialu Li, Yanyuan Qiao, Zun Wang, Joyce Chai, Qi Wu, Mohit Bansal, Parisa Kordjamshidi

    Abstract: Vision-and-Language Navigation (VLN) has gained increasing attention over recent years and many approaches have emerged to advance their development. The remarkable achievements of foundation models have shaped the challenges and proposed methods for VLN research. In this survey, we provide a top-down review that adopts a principled framework for embodied planning and reasoning, and emphasizes the…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Authors contributed equally to this work, and supervisors contributed equal advising to this work

  9. arXiv:2407.06494  [pdf, other]

    cs.LG cs.AI

    A Generative Approach to Control Complex Physical Systems

    Authors: Long Wei, Peiyan Hu, Ruiqi Feng, Haodong Feng, Yixuan Du, Tao Zhang, Rui Wang, Yue Wang, Zhi-Ming Ma, Tailin Wu

    Abstract: Controlling the evolution of complex physical systems is a fundamental task across science and engineering. Classical techniques suffer from limited applicability or huge computational costs. On the other hand, recent deep learning and reinforcement learning-based approaches often struggle to optimize long-term control sequences under the constraints of system dynamics. In this work, we introduce…

    Submitted 8 July, 2024; originally announced July 2024.

  10. arXiv:2407.06192  [pdf, other]

    cs.CV cs.AI cs.CL

    Multi-Object Hallucination in Vision-Language Models

    Authors: Xuweiyi Chen, Ziqiao Ma, Xuejun Zhang, Sihan Xu, Shengyi Qian, Jianing Yang, David F. Fouhey, Joyce Chai

    Abstract: Large vision language models (LVLMs) often suffer from object hallucination, producing objects not present in the given images. While current benchmarks for object hallucination primarily concentrate on the presence of a single object class rather than individual entities, this work systematically investigates multi-object hallucination, examining how models misperceive (e.g., invent nonexistent o…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted to ALVR @ ACL 2024 | Project page: https://multi-object-hallucination.github.io/

  11. arXiv:2407.05407  [pdf, other]

    cs.SD cs.AI eess.AS

    CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens

    Authors: Zhihao Du, Qian Chen, Shiliang Zhang, Kai Hu, Heng Lu, Yexin Yang, Hangrui Hu, Siqi Zheng, Yue Gu, Ziyang Ma, Zhifu Gao, Zhijie Yan

    Abstract: Recent years have witnessed a trend that large language model (LLM) based text-to-speech (TTS) emerges into the mainstream due to their high naturalness and zero-shot capacity. In this paradigm, speech signals are discretized into token sequences, which are modeled by an LLM with text as prompts and reconstructed by a token-based vocoder to waveforms. Obviously, speech tokens play a critical role…

    Submitted 9 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: work in progress. arXiv admin note: substantial text overlap with arXiv:2407.04051

  12. arXiv:2407.04193  [pdf, ps, other]

    cs.IT

    Combinatorial Constructions of Optimal Quaternary Additive Codes

    Authors: Chaofeng Guan, Jingjie Lv, Gaojun Luo, Zhi Ma

    Abstract: This paper aims to construct optimal quaternary additive codes with non-integer dimensions. Firstly, we propose combinatorial constructions of quaternary additive constant-weight codes, alongside additive anticode construction. Subsequently, we propose generalized Construction X, which facilitates the construction of non-integer dimensional optimal additive codes from linear codes. Then, we constr…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: This work was mainly completed in the summer of 2023, and here we add some new developments. Everyone is welcome to discuss issues related to additional code with the first author

  13. arXiv:2407.04051  [pdf, other]

    cs.SD cs.AI eess.AS

    FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

    Authors: Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Haoneng Luo, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang, et al. (8 additional authors not shown)

    Abstract: This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, sp…

    Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Work in progress. Authors are listed in alphabetical order by family name

  14. arXiv:2407.02899  [pdf, other]

    hep-ex

    Measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (639 additional authors not shown)

    Abstract: A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the {BESIII} detector at the {BEPCII} storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be…

    Submitted 3 July, 2024; originally announced July 2024.

  15. arXiv:2407.02726  [pdf, other]

    quant-ph

    General Communication Enhancement via the Quantum Switch

    Authors: Zhen Wu, James Fullwood, Zhihao Ma, Siqi Zhou, Qi Zhao, Giulio Chiribella

    Abstract: Recent studies have shown that quantum information may be effectively transmitted by a finite collection of completely depolarizing channels in a coherent superposition of different orders, via an operation known as the quantum $\tt SWITCH$. Such results are quite remarkable, as completely depolarizing channels taken in isolation and in a definite order can only output white noise. For general cha…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 5 figures

  16. arXiv:2407.02247  [pdf]

    physics.optics physics.app-ph

    Hypermultiplexed off-chip hologram by on-chip integrated metasurface

    Authors: Xianjin Liu, Zhanying Ma, Dasen Zhang, Qiwen Bao, Zhenzhen Liu, Jun-Jun Xiao

    Abstract: The waveguide-integrated metasurface introduces a novel photonic chip capable of converting guided modes into free-space light. This enables functions such as off-chip beam focusing, steering, and imaging. The challenge lies in achieving hypermultiplexing across diverse parameters, including guided-wave mode type, direction, polarization, and notably, multiple wavelengths. Here, we introduce a com…

    Submitted 2 July, 2024; originally announced July 2024.

  17. arXiv:2407.02214  [pdf]

    physics.optics

    Enhanced Second-Harmonic Generation in Thin-Film Lithium Niobate Circular Bragg Nanocavity

    Authors: Zengya Li, Zhuoran Hu, Xiaona Ye, Zhengyang Mao, Juan Feng, Hao Li, Shijie Liu, Bo Wang, Yuanlin Zheng, Xianfeng Chen

    Abstract: Second-order nonlinearity gives rise to many distinctive physical phenomena, e.g., second-harmonic generation, which plays an important role in fundamental science and various applications. Lithium niobate, one of the most widely used nonlinear crystals, exhibits strong second-order nonlinear effects and electro-optic properties. However, its moderate refractive index and etching sidewall angle li…

    Submitted 11 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: 19 pages, 5 figures

  18. arXiv:2407.02040  [pdf, other]

    cs.CV cs.AI cs.MM

    ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation

    Authors: Zhiyuan Ma, Yuxiang Wei, Yabin Zhang, Xiangyu Zhu, Zhen Lei, Lei Zhang

    Abstract: By leveraging the text-to-image diffusion priors, score distillation can synthesize 3D contents without paired text-3D training data. Instead of spending hours of online optimization per text prompt, recent studies have been focused on learning a text-to-3D generative network for amortizing multiple text-3D relations, which can synthesize 3D contents in seconds. However, existing score distillatio…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024. Code available at https://github.com/theEricMa/ScaleDreamer

  19. arXiv:2407.00281  [pdf]

    cond-mat.str-el cond-mat.mes-hall

    Distinguishing Surface and Bulk Electromagnetism via Their Dynamics in an Intrinsic Magnetic Topological Insulator

    Authors: Khanh Duy Nguyen, Woojoo Lee, Jianchen Dang, Tongyao Wu, Gabriele Berruto, Chenhui Yan, Chi Ian Jess Ip, Haoran Lin, Qiang Gao, Seng Huat Lee, Binghai Yan, Chaoxing Liu, Zhiqiang Mao, Xiao-Xiao Zhang, Shuolong Yang

    Abstract: The indirect exchange interaction between local magnetic moments via surface electrons has been long predicted to bolster the surface ferromagnetism in magnetic topological insulators (MTIs), which facilitates the quantum anomalous Hall effect. This unconventional effect is critical to determining the operating temperatures of future topotronic devices. However, the experimental confirmation of th…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: 19 pages, 4 figures

  20. arXiv:2407.00136  [pdf, other]

    hep-ex

    Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, S. Ahmed, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, X. H. Bai, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, J. Bloms, A. Bortone, I. Boyko, R. A. Briere, et al. (495 additional authors not shown)

    Abstract: Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions…

    Submitted 2 July, 2024; v1 submitted 28 June, 2024; originally announced July 2024.

  21. arXiv:2406.19661  [pdf, ps, other]

    hep-ph

    Heavy quarkonium spectral function in the spinning black hole background

    Authors: Zhou-Run Zhu, Manman Sun, Rui Zhou, Zhuang Ma, Jinzhong Han

    Abstract: In this paper, we study the dissociation of heavy quarkonium in the spinning black hole background. Specifically, we analyze the spectral function of charmonium and bottomonium in the spinning black hole background and examine how the angular momentum affects the dissociation of $J/Ψ$ and $Υ(1S)$. From the results, we find that the angular momentum decreases the peak height and expands the peak wi…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 12 pages, 2 figures

  22. arXiv:2406.19190  [pdf, ps, other]

    hep-ex

    Improved measurement of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (643 additional authors not shown)

    Abstract: Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential dec…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 13 pages, 6 figures

  23. arXiv:2406.18583  [pdf, other]

    cs.CV cs.LG

    Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT

    Authors: Le Zhuo, Ruoyi Du, Han Xiao, Yangguang Li, Dongyang Liu, Rongjie Huang, Wenze Liu, Lirui Zhao, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang, Xiangyang Zhu, Si Liu, Xiangyu Yue, Dingning Liu, Wanli Ouyang, Ziwei Liu, Yu Qiao, Hongsheng Li, Peng Gao

    Abstract: Lumina-T2X is a nascent family of Flow-based Large Diffusion Transformers that establishes a unified framework for transforming noise into various modalities, such as images and videos, conditioned on text instructions. Despite its promising capabilities, Lumina-T2X still encounters challenges including training instability, slow inference, and extrapolation artifacts. In this paper, we present Lu…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Code at: https://github.com/Alpha-VLLM/Lumina-T2X

  24. arXiv:2406.18183  [pdf, other]

    hep-ex

    Measurement of the cross sections of $e^+e^-\to K^{-}\barΞ^{+}Λ/Σ^{0}$ at center-of-mass energies between 3.510 and 4.914 GeV

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (638 additional authors not shown)

    Abstract: Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914 GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 26 pages, 5 tables, 4 figures

  25. arXiv:2406.18083  [pdf, other]

    hep-ex

    Measurements of $K_S^0$-$K_L^0$ asymmetries in the decays $Λ_c^+ \to pK_{L,S}^0$, $pK_{L,S}^0π^+π^-$ and $pK_{L,S}^0π^0$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (643 additional authors not shown)

    Abstract: Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0.04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, an…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 19 pages, 2 figures

  26. arXiv:2406.17452  [pdf, ps, other]

    hep-ex

    Study of the $f_{0}(980)$ through the decay $D_{s}^{+}\rightarrow π^{+}π^{+}π^{-}π^{0}$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (649 additional authors not shown)

    Abstract: We perform the first amplitude analysis of $D^+_s \to π^+π^+π^-π^0$ decays, based on data samples of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of 7.33~fb$^{-1}$. We report the observation of $D_{s}^{+} \to f_0(980)ρ(770)^{+}$ with a statistical significance greater than 10$σ$ and…

    Submitted 25 June, 2024; originally announced June 2024.

  27. arXiv:2406.16698  [pdf, other]

    cs.LG cs.CY

    Learning Interpretable Fair Representations

    Authors: Tianhao Wang, Zana Buçinca, Zilin Ma

    Abstract: Numerous approaches have been recently proposed for learning fair representations that mitigate unfair outcomes in prediction tasks. A key motivation for these methods is that the representations can be used by third parties with unknown objectives. However, because current fair representations are generally not interpretable, the third party cannot use these fair representations for exploration,…

    Submitted 24 June, 2024; originally announced June 2024.

  28. arXiv:2406.15752  [pdf, other]

    eess.AS cs.AI cs.CL

    TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers

    Authors: Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Guanrou Yang, Xie Chen

    Abstract: Neural codec language model (LM) has demonstrated strong capability in zero-shot text-to-speech (TTS) synthesis. However, the codec LM often suffers from limitations in inference speed and stability, due to its auto-regressive nature and implicit alignment between text and audio. In this work, to handle these challenges, we introduce a new variant of neural codec LM, namely TacoLM. Specifically, T…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024

  29. arXiv:2406.15704  [pdf, other]

    cs.CV

    video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

    Authors: Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang

    Abstract: Speech understanding as an element of the more generic video understanding using audio-visual large language models (av-LLMs) is a crucial yet understudied aspect. This paper proposes video-SALMONN, a single end-to-end av-LLM for video processing, which can understand not only visual frame sequences, audio events and music, but speech as well. To obtain fine-grained temporal information required b…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted at ICML 2024. arXiv admin note: substantial text overlap with arXiv:2310.05863

  30. arXiv:2406.15034  [pdf, other]

    cs.CV

    SVFormer: A Direct Training Spiking Transformer for Efficient Video Action Recognition

    Authors: Liutao Yu, Liwei Huang, Chenlin Zhou, Han Zhang, Zhengyu Ma, Huihui Zhou, Yonghong Tian

    Abstract: Video action recognition (VAR) plays crucial roles in various domains such as surveillance, healthcare, and industrial automation, making it highly significant for the society. Consequently, it has long been a research spot in the computer vision field. As artificial neural networks (ANNs) are flourishing, convolution neural networks (CNNs), including 2D-CNNs and 3D-CNNs, as well as variants of th…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted by IJCAI 2024 workshop - Human Brain and Artificial Intelligence

  31. arXiv:2406.15030  [pdf, ps, other]

    hep-ex

    Search for the $e^+e^- \to φχ_{c1}(3872)$ process at BESIII

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (639 additional authors not shown)

    Abstract: Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies 4.914 and 4.946 GeV by the BESIII detector, the $e^+e^- \to φχ_{c1}(3872)$ process is searched for the first time. No significant signal is observed and the upper limits at the 90\% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 11 pages, 3 figures

  32. arXiv:2406.14991  [pdf, other]

    cs.CL cs.SE

    SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation

    Authors: Zeyao Ma, Bohan Zhang, Jing Zhang, Jifan Yu, Xiaokang Zhang, Xiaohan Zhang, Sijia Luo, Xi Wang, Jie Tang

    Abstract: We introduce SpreadsheetBench, a challenging spreadsheet manipulation benchmark exclusively derived from real-world scenarios, designed to immerse current large language models (LLMs) in the actual workflow of spreadsheet users. Unlike existing benchmarks that rely on synthesized queries and simplified spreadsheet files, SpreadsheetBench is built from 912 real questions gathered from online Excel…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Homepage: https://spreadsheetbench.github.io/

  33. arXiv:2406.14433  [pdf]

    physics.app-ph cond-mat.mtrl-sci

    Structural and Electrical Properties of Grafted Si/GaAsSb Heterojunction

    Authors: Haris Naeem Abbasi, Seunghyun Lee, Hyemin Jung, Nathan Gajowski, Yi Lu, Linus Wang, Donghyeok Kim, Jie Zhou, Jiarui Gong, Chris Chae, Jinwoo Hwang, Manisha Muduli, Subramanya Nookala, Zhenqiang Ma, Sanjay Krishna

    Abstract: The short-wave infrared (SWIR) wavelength, especially 1.55 um, has attracted significant attention in various areas such as high-speed optical communication and LiDAR systems. Avalanche photodiodes (APDs) are a critical component as a receiver in these systems due to their internal gain which enhances the system performance. Silicon-based APDs are promising since they are CMOS compatible, but they…

    Submitted 24 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 14 pages, 6 figures

  34. arXiv:2406.14001  [pdf, other]

    astro-ph.HE hep-ph

    High-energy Neutrino Emission from NGC 1068 by Outflow-cloud Interactions

    Authors: Yong-Han Huang, Kai Wang, Zhi-Peng Ma

    Abstract: As the hottest high-energy neutrino spot, NGC 1068 has received much attention in recent years. Here we focus on the central region of the active galactic nuclei (AGN) and propose an outflow-cloud interaction model that could probably explain the observed neutrino data. Considering the accretion process adjacent to the central supermassive black hole (SMBH) of NGC 1068, strong outflows will be gen…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 7 Pages, 4 figures, 1 table; submitted

  35. arXiv:2406.13326  [pdf]

    cond-mat.soft cond-mat.mes-hall cond-mat.mtrl-sci

    Chiral π Domain Walls Composed of Twin Half-Integer Surface Disclinations in Ferroelectric Nematic Liquid Crystals

    Authors: Shengzhu Yi, Zening Hong, Zhongjie Ma, Chao Zhou, Miao Jiang, Xiang Huang, Mingjun Huang, Satoshi Aya, Rui Zhang, Qi-Huo Wei

    Abstract: Ferroelectric nematic liquid crystals are polar fluids characterized by microscopic orientational ordering and macroscopic spontaneous polarizations. Within these fluids, walls that separate domains of different polarizations are ubiquitous. We demonstrate that the π walls in films of polar fluids consist of twin half-integer surface disclinations spaced horizontally, enclosing a subdomain where t…

    Submitted 19 June, 2024; originally announced June 2024.

  36. arXiv:2406.13317  [pdf, other]

    cs.CV

    M4Fog: A Global Multi-Regional, Multi-Modal, and Multi-Stage Dataset for Marine Fog Detection and Forecasting to Bridge Ocean and Atmosphere

    Authors: Mengqiu Xu, Ming Wu, Kaixin Chen, Yixiang Huang, Mingrui Xu, Yujia Yang, Yiqing Feng, Yiying Guo, Bin Huang, Dongliang Chang, Zhenwei Shi, Chuang Zhang, Zhanyu Ma, Jun Guo

    Abstract: Marine fog poses a significant hazard to global shipping, necessitating effective detection and forecasting to reduce economic losses. In recent years, several machine learning (ML) methods have demonstrated superior detection accuracy compared to traditional meteorological methods. However, most of these works are developed on proprietary datasets, and the few publicly accessible datasets are oft…

    Submitted 19 June, 2024; originally announced June 2024.

  37. arXiv:2406.13265  [pdf, other]

    cs.LG cond-mat.mtrl-sci

    Molecule Graph Networks with Many-body Equivariant Interactions

    Authors: Zetian Mao, Jiawen Li, Chen Liang, Diptesh Das, Masato Sumita, Koji Tsuda

    Abstract: Message passing neural networks have demonstrated significant efficacy in predicting molecular interactions. Introducing equivariant vectorial representations augments expressivity by capturing geometric data symmetries, thereby improving model accuracy. However, two-body bond vectors in opposition may cancel each other out during message passing, leading to the loss of directional information on…

    Submitted 19 June, 2024; originally announced June 2024.

  38. arXiv:2406.13215  [pdf, other]

    cs.CV cs.AI

    Neural Residual Diffusion Models for Deep Scalable Vision Generation

    Authors: Zhiyuan Ma, Liangliang Zhao, Biqing Qi, Bowen Zhou

    Abstract: The most advanced diffusion models have recently adopted increasingly deep stacked networks (e.g., U-Net or Transformer) to promote the generative emergence capabilities of vision generation models similar to large language models (LLMs). However, progressively deeper stacked networks will intuitively cause numerical propagation errors and reduce noisy prediction capabilities on generative data, w…

    Submitted 19 June, 2024; originally announced June 2024.

  39. arXiv:2406.12910  [pdf]

    cs.LG cs.AI cs.NE physics.chem-ph q-bio.BM

    Human-level molecular optimization driven by mol-gene evolution

    Authors: Jiebin Fang, Churu Mao, Yuchen Zhu, Xiaoming Chen, Chang-Yu Hsieh, Zhongjun Ma

    Abstract: De novo molecule generation allows the search for more drug-like hits across a vast chemical space. However, lead optimization is still required, and the process of optimizing molecular structures faces the challenge of balancing structural novelty with pharmacological properties. This study introduces the Deep Genetic Molecular Modification Algorithm (DGMM), which brings structure modification to…

    Submitted 12 June, 2024; originally announced June 2024.

  40. arXiv:2406.12846  [pdf, other]

    cs.CV

    DrVideo: Document Retrieval Based Long Video Understanding

    Authors: Ziyu Ma, Chenhui Gou, Hengcan Shi, Bin Sun, Shutao Li, Hamid Rezatofighi, Jianfei Cai

    Abstract: Existing methods for long video understanding primarily focus on videos only lasting tens of seconds, with limited exploration of techniques for handling longer videos. The increased number of frames in longer videos presents two main challenges: difficulty in locating key information and performing long-range reasoning. Thus, we propose DrVideo, a document-retrieval-based system designed for long…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 11 pages

  41. arXiv:2406.11824  [pdf, other]

    cs.CV

    Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation

    Authors: Alexander Raistrick, Lingjie Mei, Karhan Kayan, David Yan, Yiming Zuo, Beining Han, Hongyu Wen, Meenal Parakh, Stamatis Alexandropoulos, Lahav Lipson, Zeyu Ma, Jia Deng

    Abstract: We introduce Infinigen Indoors, a Blender-based procedural generator of photorealistic indoor scenes. It builds upon the existing Infinigen system, which focuses on natural scenes, but expands its coverage to indoor scenes by introducing a diverse library of procedural indoor assets, including furniture, architecture elements, appliances, and other day-to-day objects. It also introduces a constrai…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024

  42. arXiv:2406.11775  [pdf, other]

    cs.CV cs.AI

    Task Me Anything

    Authors: Jieyu Zhang, Weikai Huang, Zixian Ma, Oscar Michel, Dong He, Tanmay Gupta, Wei-Chiu Ma, Ali Farhadi, Aniruddha Kembhavi, Ranjay Krishna

    Abstract: Benchmarks for large multimodal language models (MLMs) now serve to simultaneously assess the general capabilities of models instead of evaluating for a specific capability. As a result, when a developer wants to identify which models to use for their application, they are overwhelmed by the number of benchmarks and remain uncertain about which benchmark's results are most reflective of their spec…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: website: https://www.task-me-anything.org

  43. arXiv:2406.11546  [pdf, other]

    eess.AS cs.CL cs.SD

    GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement

    Authors: Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen

    Abstract: The evolution of speech technology has been spurred by the rapid increase in dataset sizes. Traditional speech models generally depend on a large amount of labeled training data, which is scarce for low-resource languages. This paper presents GigaSpeech 2, a large-scale, multi-domain, multilingual speech recognition corpus. It is designed for low-resource languages and does not rely on paired spee…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Under review

  44. arXiv:2406.10264  [pdf]

    cs.RO cs.CL

    Large Language Model-empowered multimodal strain sensory system for shape recognition, monitoring, and human interaction of tensegrity

    Authors: Zebing Mao, Ryota Kobayashi, Hiroyuki Nabae, Koichi Suzumori

    Abstract: A tensegrity-based system is a promising approach for dynamic exploration of uneven and unpredictable environments, particularly, space exploration. However, implementing such systems presents challenges in terms of intelligent aspects: state recognition, wireless monitoring, human interaction, and smart analyzing and advising function. Here, we introduce a 6-strut tensegrity integrate with 24 mul…

    Submitted 11 June, 2024; originally announced June 2024.

  45. arXiv:2406.09475  [pdf, other]

    hep-ex

    Search for $X(1870)$ via the decay $J/ψ\to ωK^+ K^-η$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (644 additional authors not shown)

    Abstract: Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the…

    Submitted 13 June, 2024; originally announced June 2024.

  46. arXiv:2406.09435  [pdf, ps, other]

    math.AP

    Dynamics of the combined nonlinear Schrödinger equation with inverse-square potential

    Authors: Zuyu Ma, Yilin Song, Jiqiang Zheng

    Abstract: We consider the long-time dynamics of focusing energy-critical Schrödinger equation perturbed by the $\dot{H}^\frac{1}{2}$-critical nonlinearity and with inverse-square potential(CNLS$_a$) in dimensions $d\in\{3,4,5\}$ \begin{equation}\label{NLS-ab} \begin{cases} i\partial_tu-\mathcal{L}_au=-|u|^{\frac{4}{d-2}}u+|u|^{\frac{4}{d-1}}u, \quad (t,x)\in\mathbb{R}\times\mathbb{R}^d,\tag{CNLS$_a$},\\ u(0…

    Submitted 17 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: 62 pages

  47. arXiv:2406.09264  [pdf, other]

    cs.HC cs.AI cs.CL

    Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

    Authors: Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, David Jurgens

    Abstract: Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve th…

    Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 56 pages

  48. arXiv:2406.08225  [pdf, ps, other]

    hep-ex

    Observation of $η_{c}$(1S, 2S) and $χ_{cJ}$ decays to 2$(π^{+}π^{-})η$ via $ψ$(3686) radiative transitions

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (636 additional authors not shown)

    Abstract: Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measur…

    Submitted 12 June, 2024; originally announced June 2024.

  49. arXiv:2406.08177  [pdf, other]

    eess.IV cs.CV

    One-Step Effective Diffusion Network for Real-World Image Super-Resolution

    Authors: Rongyuan Wu, Lingchen Sun, Zhiyuan Ma, Lei Zhang

    Abstract: The pre-trained text-to-image diffusion models have been increasingly employed to tackle the real-world image super-resolution (Real-ISR) problem due to their powerful generative image priors. Most of the existing methods start from random noise to reconstruct the high-quality (HQ) image under the guidance of the given low-quality (LQ) image. While promising results have been achieved, such Real-…

    Submitted 14 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  50. arXiv:2406.08135  [pdf]

    cs.RO

    Design, modeling, and characteristics of ringshaped robot actuated by functional fluid

    Authors: Zebing Mao, Xuehang Bai, Yanhong Peng, Yayi Shen

    Abstract: The controlled actuation of hydraulic and pneumatic actuators has unveiled fresh and thrilling opportunities for designing mobile robots with adaptable structures. Previously reported rolling robots, which were powered by fluidic systems, often relied on complex principles, cumbersome pump and valve systems, and intricate control strategies, limiting their applicability in other fields. In this in…

    Submitted 12 June, 2024; originally announced June 2024.