Showing 1–50 of 1,356 results for author: Gao, Z

  1. arXiv:2407.06152 [pdf, other]

    physics.chem-ph cs.AI

    Uni-ELF: A Multi-Level Representation Learning Framework for Electrolyte Formulation Design

    Authors: Boshen Zeng, Sian Chen, Xinxin Liu, Changhong Chen, Bin Deng, Xiaoxu Wang, Zhifeng Gao, Yuzhi Zhang, Weinan E, Linfeng Zhang

    Abstract: Advancements in lithium battery technology heavily rely on the design and engineering of electrolytes. However, current schemes for molecular design and recipe optimization of electrolytes lack an effective computational-experimental closed loop and often fall short in accurately predicting diverse electrolyte formulation properties. In this work, we introduce Uni-ELF, a novel multi-level represen…

    Submitted 8 July, 2024; originally announced July 2024.

  2. arXiv:2407.05407 [pdf, other]

    cs.SD cs.AI eess.AS

    CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens

    Authors: Zhihao Du, Qian Chen, Shiliang Zhang, Kai Hu, Heng Lu, Yexin Yang, Hangrui Hu, Siqi Zheng, Yue Gu, Ziyang Ma, Zhifu Gao, Zhijie Yan

    Abstract: Recent years have witnessed a trend that large language model (LLM) based text-to-speech (TTS) emerges into the mainstream due to its high naturalness and zero-shot capacity. In this paradigm, speech signals are discretized into token sequences, which are modeled by an LLM with text as prompts and reconstructed by a token-based vocoder to waveforms. Obviously, speech tokens play a critical role…

    Submitted 9 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: work in progress. arXiv admin note: substantial text overlap with arXiv:2407.04051

  3. arXiv:2407.05395 [pdf, other]

    nucl-th

    Quantifying angular distributions in multinucleon transfer reactions with a semi-classical method

    Authors: Zehong Liao, Zepeng Gao, Yu Yang, Yueping Fang, Jun Su, Long Zhu

    Abstract: The multinucleon transfer (MNT) process in low-energy heavy ion collisions can be utilized to produce unknown nuclei far beyond the stability line. However, the reaction products exhibit broad angular and energy distributions, which could lower the experimental detection efficiency. We present a classical approach that employs a parameterized angular distribution to describe the complex issue. By…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 6 pages, 6 figures

  4. arXiv:2407.04405 [pdf, other]

    cs.LG cs.AI

    Discovering symbolic expressions with parallelized tree search

    Authors: Kai Ruan, Ze-Feng Gao, Yike Guo, Hao Sun, Ji-Rong Wen, Yang Liu

    Abstract: Symbolic regression plays a crucial role in modern scientific research thanks to its capability of discovering concise and interpretable mathematical expressions from data. A grand challenge lies in the arduous search for parsimonious and generalizable mathematical formulas, in an infinite search space, while intending to fit the training data. Existing algorithms have faced a critical bottleneck…

    Submitted 5 July, 2024; originally announced July 2024.

    ACM Class: I.2

  5. arXiv:2407.04100 [pdf, other]

    cs.CV

    C$^3$DG: Conditional Domain Generalization for Hyperspectral Imagery Classification with Convergence and Constrained-risk Theories

    Authors: Zhe Gao, Bin Pan, Zhenwei Shi

    Abstract: Hyperspectral imagery (HSI) classification may suffer the challenge of hyperspectral-monospectra, where different classes present similar spectra. Joint spatial-spectral feature extraction is a popular solution for the problem, but this strategy tends to inflate accuracy since test pixels may exist in training patches. Domain generalization methods show promising potential, but they still fail to…

    Submitted 4 July, 2024; originally announced July 2024.

  6. arXiv:2407.04051 [pdf, other]

    cs.SD cs.AI eess.AS

    FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

    Authors: Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Haoneng Luo, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang, et al. (8 additional authors not shown)

    Abstract: This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, sp…

    Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Work in progress. Authors are listed in alphabetical order by family name

  7. arXiv:2407.02893 [pdf, other]

    cs.CV

    An Uncertainty-guided Tiered Self-training Framework for Active Source-free Domain Adaptation in Prostate Segmentation

    Authors: Zihao Luo, Xiangde Luo, Zijun Gao, Guotai Wang

    Abstract: Deep learning models have exhibited remarkable efficacy in accurately delineating the prostate for diagnosis and treatment of prostate diseases, but challenges persist in achieving robust generalization across different medical centers. Source-free Domain Adaptation (SFDA) is a promising technique to adapt deep segmentation models to address privacy and security concerns while reducing domain shif…

    Submitted 4 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures, 2 tables, accepted to MICCAI 2024

  8. arXiv:2407.02814 [pdf, other]

    cs.AI cs.CL cs.CV

    Images Speak Louder than Words: Understanding and Mitigating Bias in Vision-Language Model from a Causal Mediation Perspective

    Authors: Zhaotian Weng, Zijun Gao, Jerone Andrews, Jieyu Zhao

    Abstract: Vision-language models (VLMs) pre-trained on extensive datasets can inadvertently learn biases by correlating gender information with specific objects or scenarios. Current methods, which focus on modifying inputs and monitoring changes in the model's output probability scores, often struggle to comprehensively understand bias from the perspective of model components. We propose a framework that i…

    Submitted 3 July, 2024; originally announced July 2024.

    ACM Class: I.2.7

  9. arXiv:2407.01926 [pdf]

    physics.med-ph cs.CV

    Chemical Shift Encoding based Double Bonds Quantification in Triglycerides using Deep Image Prior

    Authors: Chaoxing Huang, Ziqiang Yu, Zijian Gao, Qiuyi Shen, Queenie Chan, Vincent Wai-Sun Wong, Winnie Chiu-Wing Chu, Weitian Chen

    Abstract: This study evaluated a deep learning-based method using Deep Image Prior (DIP) to quantify triglyceride double bonds from chemical-shift encoded multi-echo gradient echo images without network training. We employed a cost function based on signal constraints to iteratively update the neural network on a single dataset. The method was validated using phantom experiments and in vivo scans. Results s…

    Submitted 3 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  10. arXiv:2407.01517 [pdf, other]

    eess.IV cs.CV cs.LG

    Centerline Boundary Dice Loss for Vascular Segmentation

    Authors: Pengcheng Shi, Jiesi Hu, Yanwu Yang, Zilve Gao, Wei Liu, Ting Ma

    Abstract: Vascular segmentation in medical imaging plays a crucial role in analysing morphological and functional assessments. Traditional methods, like the centerline Dice (clDice) loss, ensure topology preservation but falter in capturing geometric details, especially under translation and deformation. The combination of clDice with traditional Dice loss can lead to diameter imbalance, favoring larger ves…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: accepted by MICCAI 2024

  11. arXiv:2407.01304 [pdf, ps, other]

    math.AG math.NT

    Heights and periods of algebraic cycles in families

    Authors: Ziyang Gao, Shou-Wu Zhang

    Abstract: We consider the Beilinson--Bloch heights and Abel--Jacobian periods of homologically trivial Chow cycles in families. For the Beilinson--Bloch heights, we show that for any $g\ge 2$, there is a Zariski open dense subset $U$ of $\mathcal{M}_g$, the coarse moduli of curves of genus $g$ over rationals, such that the heights of Ceresa cycles and Gross--Schoen cycles over $U$ satisfy the Northcott prop…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Comments are welcome

  12. arXiv:2407.01220 [pdf, other]

    cs.CV

    Fast and Efficient: Mask Neural Fields for 3D Scene Segmentation

    Authors: Zihan Gao, Lingling Li, Licheng Jiao, Fang Liu, Xu Liu, Wenping Ma, Yuwei Guo, Shuyuan Yang

    Abstract: Understanding 3D scenes is a crucial challenge in computer vision research with applications spanning multiple domains. Recent advancements in distilling 2D vision-language foundation models into neural fields, like NeRF and 3DGS, enable open-vocabulary segmentation of 3D scenes from 2D multi-view images without the need for precise 3D annotations. While effective, however, the per-pixel distilla…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 16 pages, 7 figures

  13. arXiv:2407.00050 [pdf, other]

    q-bio.BM cs.AI cs.LG

    FoldToken2: Learning compact, invariant and generative protein structure language

    Authors: Zhangyang Gao, Cheng Tan, Stan Z. Li

    Abstract: The equivalent nature of 3D coordinates has posed long term challenges in protein structure representation learning, alignment, and generation. Can we create a compact and invariant language that equivalently represents protein structures? Towards this goal, we propose FoldToken2 to transfer equivariant structures into discrete tokens, while maintaining the recoverability of the original structure…

    Submitted 11 June, 2024; originally announced July 2024.

  14. arXiv:2406.19853 [pdf, other]

    cs.CL cs.AI

    YuLan: An Open-source Large Language Model

    Authors: Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang, Lei Zhang, Junyi Li, Xiaolei Wang, Lei Wang, Beichen Zhang, Zican Dong, Xiaoxue Cheng, Yuhan Chen, Xinyu Tang, Yupeng Hou, Qiangqiang Ren, Xincheng Pang, Shufang Xie, Wayne Xin Zhao, Zhicheng Dou, et al. (13 additional authors not shown)

    Abstract: Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with $12$ billi…

    Submitted 28 June, 2024; originally announced June 2024.

  15. arXiv:2406.19130 [pdf, other]

    cs.CV

    Evidential Concept Embedding Models: Towards Reliable Concept Explanations for Skin Disease Diagnosis

    Authors: Yibo Gao, Zheyao Gao, Xin Gao, Yuanye Liu, Bomin Wang, Xiahai Zhuang

    Abstract: Due to the high stakes in medical decision-making, there is a compelling demand for interpretable deep learning methods in medical image analysis. Concept Bottleneck Models (CBM) have emerged as an active interpretable framework incorporating human-interpretable concepts into decision-making. However, their concept predictions may lack reliability when applied to clinical diagnosis, impeding conce…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: accepted by MICCAI 2024

  16. arXiv:2406.17810 [pdf, other]

    physics.comp-ph cs.AI physics.optics

    PIC2O-Sim: A Physics-Inspired Causality-Aware Dynamic Convolutional Neural Operator for Ultra-Fast Photonic Device FDTD Simulation

    Authors: Pingchuan Ma, Haoyu Yang, Zhengqi Gao, Duane S. Boning, Jiaqi Gu

    Abstract: The finite-difference time-domain (FDTD) method, which is important in photonic hardware design flow, is widely adopted to solve time-domain Maxwell equations. However, FDTD is known for its prohibitive runtime cost, taking minutes to hours to simulate a single device. Recently, AI has been applied to realize orders-of-magnitude speedup in partial differential equation (PDE) solving. However, AI-b…

    Submitted 24 June, 2024; originally announced June 2024.

  17. arXiv:2406.17626 [pdf, other]

    cs.CL cs.AI

    CoSafe: Evaluating Large Language Model Safety in Multi-Turn Dialogue Coreference

    Authors: Erxin Yu, Jing Li, Ming Liao, Siqi Wang, Zuchen Gao, Fei Mi, Lanqing Hong

    Abstract: As large language models (LLMs) constantly evolve, ensuring their safety remains a critical research problem. Previous red-teaming approaches for LLM safety have primarily focused on single prompt attacks or goal hijacking. To the best of our knowledge, we are the first to study LLM safety in multi-turn dialogue coreference. We created a dataset of 1,400 questions across 14 categories, each featur…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Submitted to EMNLP 2024

  18. arXiv:2406.17255 [pdf, other]

    cs.CL

    MPCODER: Multi-user Personalized Code Generator with Explicit and Implicit Style Representation Learning

    Authors: Zhenlong Dai, Chang Yao, WenKang Han, Ying Yuan, Zhipeng Gao, Jingyuan Chen

    Abstract: Large Language Models (LLMs) have demonstrated great potential for assisting developers in their daily development. However, most research focuses on generating correct code; how to use LLMs to generate personalized code has seldom been investigated. To bridge this gap, we proposed MPCoder (Multi-user Personalized Code Generator) to generate personalized code for multiple users. To better learn co…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024, Main Conference

  19. arXiv:2406.16603 [pdf, other]

    cond-mat.mtrl-sci

    Bipolarized Weyl semimetals and quantum crystal valley Hall effect in two-dimensional altermagnetic materials

    Authors: Chao-Yang Tan, Ze-Feng Gao, Huan-Cheng Yang, Kai Liu, Peng-Jie Guo, Zhong-Yi Lu

    Abstract: Magnetism and topology are two major areas of condensed matter physics. The combination of magnetism and topology gives rise to more novel physical effects, which have attracted strong theoretical and experimental attention. Recently, the concept of altermagnetism has been introduced, characterized by a dual nature: real-space antiferromagnetism and reciprocal-space anisotropic spin polarization…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures

  20. arXiv:2406.16189 [pdf, other]

    eess.IV cs.CV

    Fuzzy Attention-based Border Rendering Network for Lung Organ Segmentation

    Authors: Sheng Zhang, Yang Nan, Yingying Fang, Shiyi Wang, Xiaodan Xing, Zhifan Gao, Guang Yang

    Abstract: Automatic lung organ segmentation on CT images is crucial for lung disease diagnosis. However, the unlimited voxel values and class imbalance of lung organs can lead to false-negative/positive and leakage issues in advanced methods. Additionally, some slender lung organs are easily lost during the recycled down/up-sample procedure, e.g., bronchioles & arterioles, causing severe discontinuity issue…

    Submitted 1 July, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024

  21. arXiv:2406.14969 [pdf, other]

    cs.LG cs.AI

    Uni-Mol2: Exploring Molecular Pretraining Model at Scale

    Authors: Xiaohong Ji, Zhen Wang, Zhifeng Gao, Hang Zheng, Linfeng Zhang, Guolin Ke, Weinan E

    Abstract: In recent years, pretraining models have made significant advancements in the fields of natural language processing (NLP), computer vision (CV), and life sciences. The significant advancements in NLP and CV are predominantly driven by the expansion of model parameters and data size, a phenomenon now recognized as the scaling laws. However, research exploring scaling law in molecular pretraining mo…

    Submitted 1 July, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  22. arXiv:2406.13436 [pdf, other]

    cs.HC cs.AI

    What's Next? Exploring Utilization, Challenges, and Future Directions of AI-Generated Image Tools in Graphic Design

    Authors: Yuying Tang, Mariana Ciancia, Zhigang Wang, Ze Gao

    Abstract: Recent advancements in artificial intelligence, such as computer vision and deep learning, have led to the emergence of numerous generative AI platforms, particularly for image generation. However, the application of AI-generated image tools in graphic design has not been extensively explored. This study conducted semi-structured interviews with seven designers of varying experience levels to unde…

    Submitted 19 June, 2024; originally announced June 2024.

  23. arXiv:2406.13170 [pdf, other]

    cs.AI cs.CL

    Amphista: Accelerate LLM Inference with Bi-directional Multiple Drafting Heads in a Non-autoregressive Style

    Authors: Zeping Li, Xinlong Yang, Ziheng Gao, Ji Liu, Zhuang Liu, Dong Li, Jinzhang Peng, Lu Tian, Emad Barsoum

    Abstract: Large Language Models (LLMs) inherently use autoregressive decoding, which lacks parallelism in inference and results in significantly slow inference speeds, especially when hardware parallel accelerators and memory bandwidth are not fully utilized. In this work, we propose Amphista, a speculative decoding algorithm that adheres to a non-autoregressive decoding paradigm. Owing to the increased par…

    Submitted 18 June, 2024; originally announced June 2024.

  24. arXiv:2406.12460 [pdf, other]

    math.NA

    An extrapolation-driven network architecture for physics-informed deep learning

    Authors: Yong Wang, Yanzhong Yao, Zhiming Gao

    Abstract: Deep learning with physics-informed neural networks (PINNs) has emerged as a highly popular and effective approach for solving partial differential equations (PDEs). In this paper, we first investigate the extrapolation capability of the PINN method for time-dependent PDEs. Taking advantage of this extrapolation property, we can generalize the training result obtained in the time subinterval to the…

    Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  25. arXiv:2406.12414 [pdf, other]

    quant-ph

    Harnessing spontaneous emission of correlated photon pairs from ladder-type giant atoms

    Authors: Zhao-Min Gao, Jia-Qi Li, Ying-Huan Wu, Wen-Xiao Liu, Xin Wang

    Abstract: The realization of correlated multi-photon processes usually depends on the interaction between nonlinear media and atoms. However, the nonlinearity of optical materials is generally weak, making it still very challenging to achieve correlated multi-photon dynamics at the few-photon level. Meanwhile, giant atoms, with their capability for multi-point coupling, which is a novel paradigm in quantum…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 12 pages; 10 figures

  26. arXiv:2406.11816 [pdf, other]

    cs.CV

    VideoLLM-online: Online Video Large Language Model for Streaming Video

    Authors: Joya Chen, Zhaoyang Lv, Shiwei Wu, Kevin Qinghong Lin, Chenan Song, Difei Gao, Jia-Wei Liu, Ziteng Gao, Dongxing Mao, Mike Zheng Shou

    Abstract: Recent Large Language Models have been enhanced with vision capabilities, enabling them to comprehend images, videos, and interleaved vision-language content. However, the learning methods of these large multimodal models typically treat videos as predetermined clips, making them less effective and efficient at handling streaming video inputs. In this paper, we propose a novel Learning-In-Video-St…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: CVPR 2024. This arxiv version is upgraded with Llama-3

  27. arXiv:2406.11204 [pdf]

    physics.optics

    Magnetically tunable optical bound states in the continuum with arbitrary polarization and intrinsic chirality

    Authors: Qing-an Tu, Hongxin Zhou, Yan Meng, Maohua Gong, Zhen Gao

    Abstract: Optical bound states in the continuum (BICs), which are exotic localized eigenstates embedded in the continuum spectrum and topological polarization singularity in momentum space, have attracted great attention in both fundamental and applied physics. Here, based on a magneto-optical photonic crystal slab placed in external magnetic fields to break the time-reversal symmetry, we theoretically demon…

    Submitted 1 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 13 pages, 4 figures

  28. arXiv:2406.10840 [pdf, other]

    cs.LG cs.AI q-bio.BM

    CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph

    Authors: Haitao Lin, Guojiang Zhao, Odin Zhang, Yufei Huang, Lirong Wu, Zicheng Liu, Siyuan Li, Cheng Tan, Zhifeng Gao, Stan Z. Li

    Abstract: Structure-based drug design (SBDD) aims to generate potential drugs that can bind to a target protein and is greatly expedited by the aid of AI techniques in generative models. However, a lack of systematic understanding persists due to the diverse settings, complex implementation, difficult reproducibility, and task singularity. Firstly, the absence of standardization can lead to unfair compariso…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 9 pages main context

  29. arXiv:2406.10188 [pdf, ps, other]

    math.CV

    $L^{\vec{p}}-L^{\vec{q}}$ Boundedness of Multiparameter Forelli-Rudin Type Operators on the Siegel Upper Half-space

    Authors: Hongheng Yin, Guan-Tie Deng, Zhi-Qiang Gao

    Abstract: In this article, we present exactly when two classes of multiparameter Forelli-Rudin type integral operators are bounded from one weighted mixed-norm Lebesgue space $L^{\vec{p}}$ to another space $L^{\vec{q}}$ over the Siegel upper half-space.

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 22 pages

  30. arXiv:2406.09953 [pdf, other]

    cs.RO cs.AI

    DAG-Plan: Generating Directed Acyclic Dependency Graphs for Dual-Arm Cooperative Planning

    Authors: Zeyu Gao, Yao Mu, Jinye Qu, Mengkang Hu, Lingyue Guo, Ping Luo, Yanfeng Lu

    Abstract: Dual-arm robots offer enhanced versatility and efficiency over single-arm counterparts by enabling concurrent manipulation of multiple objects or cooperative execution of tasks using both arms. However, effectively coordinating the two arms for complex long-horizon tasks remains a significant challenge. Existing task planning methods predominantly focus on single-arm robots or rely on predefined b…

    Submitted 30 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 46 pages, 13 figures

  31. arXiv:2406.09890 [pdf, other]

    astro-ph.GA

    ALMA Lensing Cluster Survey: Physical characterization of near-infrared-dark intrinsically faint ALMA sources at z=2-4

    Authors: Akiyoshi Tsujita, Kotaro Kohno, Shuo Huang, Masamune Oguri, Ken-ichi Tadaki, Ian Smail, Hideki Umehata, Zhen-Kai Gao, Wei-Hao Wang, Fengwu Sun, Seiji Fujimoto, Tao Wang, Ryosuke Uematsu, Daniel Espada, Francesco Valentino, Yiping Ao, Franz E. Bauer, Bunyo Hatsukade, Fumi Egusa, Yuri Nishimura, Anton M. Koekemoer, Daniel Schaerer, Claudia Lagos, Miroslava Dessauges-Zavadsky, Gabriel Brammer, et al. (11 additional authors not shown)

    Abstract: We present results from Atacama Large Millimeter/submillimeter Array (ALMA) spectral line-scan observations at 3-mm and 2-mm bands of three near-infrared-dark (NIR-dark) galaxies behind two massive lensing clusters MACS J0417.5-1154 and RXC J0032.1+1808. Each of these three sources is a faint (de-lensed $S_{\text{1.2 mm}}$ $<$ 1 mJy) triply lensed system originally discovered in the ALMA Lensing C…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 23 pages, 10 figures, Submitted to ApJ

  32. arXiv:2406.08418 [pdf, other]

    cs.CV cs.AI

    OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

    Authors: Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, Zhenjiang Jin, Guanzhou Chen, Yinan He, Zhangwei Gao, Erfei Cui, Jiashuo Yu, Hao Tian, Jiasheng Zhou, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Zhenxiang Li, Pei Chu, Yi Wang, et al. (15 additional authors not shown)

    Abstract: Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data aids multimodal in-context learning and maintains the capabilities of large language models during multimodal fine-tuning. However, the limited scale an…

    Submitted 12 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  33. arXiv:2406.07868 [pdf, other]

    stat.ME

    Bridging multiple worlds: multi-marginal optimal transport for causal partial-identification problem

    Authors: Zijun Gao, Shu Ge, Jian Qian

    Abstract: Under the prevalent potential outcome model in causal inference, each unit is associated with multiple potential outcomes but at most one of which is observed, leading to many causal quantities being only partially identified. The inherent missing data issue echoes the multi-marginal optimal transport (MOT) problem, where marginal distributions are known, but how the marginals couple to form the j…

    Submitted 12 June, 2024; originally announced June 2024.

  34. arXiv:2406.07068 [pdf]

    cond-mat.mtrl-sci

    Emergent Moiré fringes in direct-grown quasicrystal

    Authors: Jingwei Li, Kejie Bao, Honglin Sun, Xingxu Yan, Ting Huang, Qicheng Zhang, Yaoqiang Zhou, Zhenjing Liu, Paul Masih Das, Jiawen You, Jiong Zhao, Jianbin Xu, Xiaoqing Pan, Yongli Mi, Junyi Zhu, Zhaoli Gao

    Abstract: Quasicrystals represent a category of rarely structured solids that challenge traditional periodicity in crystal materials. Recent advancements in the synthesis of two-dimensional (2D) van der Waals materials have paved the way for exploring the unique physical properties of these systems. Here, we report on the synthesis of 2D quasicrystals featuring 30° alternating twist angles between multiple…

    Submitted 11 June, 2024; originally announced June 2024.

  35. arXiv:2406.06986 [pdf, other]

    cs.LG

    DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach

    Authors: Zhang Liu, Hongyang Du, Junzhe Lin, Zhibin Gao, Lianfen Huang, Seyyedali Hosseinalipour, Dusit Niyato

    Abstract: The rapid advancement of Artificial Intelligence (AI) has introduced Deep Neural Network (DNN)-based tasks to the ecosystem of vehicular networks. These tasks are often computation-intensive, requiring substantial computation resources, which are beyond the capability of a single vehicle. To address this challenge, Vehicular Edge Computing (VEC) has emerged as a solution, offering computing servic…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 16 pages, 9 figures, and with extra appendix

  36. arXiv:2406.06867 [pdf]

    cond-mat.mes-hall cond-mat.mtrl-sci

    Electrically Tunable Magnetoconductance of Close-Packed CVD Bilayer Graphene Layer Stacking Walls

    Authors: Qicheng Zhang, Sheng Wang, Zhaoli Gao, Sebastian Hurtado-Parra, Joel Berry, Zachariah Addison, Paul Masih Das, William M. Parkin, Marija Drndic, James M. Kikkawa, Feng Wang, Eugene J. Mele, A. T. Charlie Johnson, Zhengtang Luo

    Abstract: Quantum valley Hall (QVH) domain wall states are a new class of one-dimensional (1D) one-way conductors that are topologically protected in the absence of valley mixing. Development beyond a single QVH channel raises important new questions as to how QVH channels in close spatial proximity interact with each other, and how that interaction may be controlled. Scalable epitaxial bilayer graphene syn…

    Submitted 10 June, 2024; originally announced June 2024.

  37. arXiv:2406.05839 [pdf, other]

    eess.AS cs.AI

    MaLa-ASR: Multimedia-Assisted LLM-Based ASR

    Authors: Guanrou Yang, Ziyang Ma, Fan Yu, Zhifu Gao, Shiliang Zhang, Xie Chen

    Abstract: As more and more information-rich data like video become available, utilizing multi-modal auxiliary information to enhance audio tasks has sparked widespread research interest. The recent surge in research on LLM-based audio models provides fresh perspectives for tackling audio tasks. Given that LLM can flexibly ingest multiple inputs, we propose MaLa-ASR, an LLM-based ASR model that can integrate…

    Submitted 13 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  38. arXiv:2406.05688 [pdf, other]

    cs.CL cs.AI cs.LG

    Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions

    Authors: Cheng Tan, Dongxin Lyu, Siyuan Li, Zhangyang Gao, Jingxuan Wei, Siqi Ma, Zicheng Liu, Stan Z. Li

    Abstract: Large Language Models (LLMs) have demonstrated wide-ranging applications across various fields and have shown significant potential in the academic peer-review process. However, existing applications are primarily limited to static review generation based on submitted papers, which fail to capture the dynamic and iterative nature of real-world peer reviews. In this paper, we reformulate the peer-r…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Under review

  39. arXiv:2406.05676 [pdf]

    cond-mat.mtrl-sci cond-mat.mes-hall

    Chern insulator phase realized in dual-gate-tuned MnBi2Te4 thin films grown by molecular beam epitaxy

    Authors: Yunhe Bai, Yuanzhao Li, Ruixuan Liu, Jianli Luan, Yang Chen, Wenyu Song, Peng-Fei Ji, Cui Ding, Zongwei Gao, Qinghua Zhang, Fanqi Meng, Bingbing Tong, Lin Li, Tianchen Zhu, Lin Gu, Lili Wang, Jinsong Zhang, Yayu Wang, Qi-Kun Xue, Ke He, Yang Feng, Xiao Feng

    Abstract: The intrinsic magnetic order, large topological-magnetic gap and rich topological phases make MnBi2Te4 a wonderful platform to study exotic topological quantum states such as axion insulator and Chern insulator. To realize and manipulate these topological phases in a MnBi2Te4 thin film, precise manipulation of the electric field across the film is essential, which requires a dual-gate structure. I…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: 24 pages, 4 figures

  40. arXiv:2406.04961 [pdf, other]

    cs.CV

    Multiplane Prior Guided Few-Shot Aerial Scene Rendering

    Authors: Zihan Gao, Licheng Jiao, Lingling Li, Xu Liu, Fang Liu, Puhua Chen, Yuwei Guo

    Abstract: Neural Radiance Fields (NeRF) have been successfully applied in various aerial scenes, yet they face challenges with sparse views due to limited supervision. The acquisition of dense aerial views is often prohibitive, as unmanned aerial vehicles (UAVs) may encounter constraints in perspective range and energy. In this work, we introduce Multiplane Prior guided NeRF (MPNeRF), a novel ap…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 17 pages, 8 figures, accepted at CVPR 2024

    Journal ref: CVPR 2024

  41. arXiv:2406.04821 [pdf, other]

    cs.RO

    Deep Learning Powered Estimate of The Extrinsic Parameters on Unmanned Surface Vehicles

    Authors: Yi Shen, Hao Liu, Chang Zhou, Wentao Wang, Zijun Gao, Qi Wang

    Abstract: Unmanned Surface Vehicles (USVs) are pivotal in marine exploration, but their sensors' accuracy is compromised by the dynamic marine environment. Traditional calibration methods fall short in these conditions. This paper introduces a deep learning architecture that predicts changes in the USV's dynamic metacenter and refines sensors' extrinsic parameters in real time using a Time-Sequence General…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted by The 9th Asia-Pacific Conference on Intelligent Robot Systems (ACIRS 2024)

  42. arXiv:2406.04809  [pdf, other]

    cs.CR cs.AI

    A Survey of Fragile Model Watermarking

    Authors: Zhenzhe Gao, Yu Cheng, Zhaoxia Yin

    Abstract: Model fragile watermarking, inspired by both the field of adversarial attacks on neural networks and traditional multimedia fragile watermarking, has gradually emerged as a potent tool for detecting tampering, and has witnessed rapid development in recent years. Unlike robust watermarks, which are widely used for identifying model copyrights, fragile watermarks for models are designed to identify…

    Submitted 8 July, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Submitted to Signal Processing

  43. arXiv:2406.04727  [pdf, other]

    cs.LG cond-mat.soft cs.AI

    Predicting Polymer Properties Based on Multimodal Multitask Pretraining

    Authors: Fanmeng Wang, Wentao Guo, Minjie Cheng, Shen Yuan, Hongteng Xu, Zhifeng Gao

    Abstract: In the past few decades, polymers, high-molecular-weight compounds formed by bonding numerous identical or similar monomers covalently, have played an essential role in various scientific fields. In this context, accurate prediction of their properties is becoming increasingly crucial. Typically, the properties of a polymer, such as plasticity, conductivity, bio-compatibility, and so on, are highl…

    Submitted 7 June, 2024; originally announced June 2024.

  44. arXiv:2406.03659  [pdf]

    physics.flu-dyn physics.ao-ph

    Applications of Deep Learning parameterization of Ocean Momentum Forcing

    Authors: Guosong Wang, Min Hou, Xinrong Wu, Xidong Wang, Zhigang Gao, Hongli Fu, Bo Dan, Chunjian Sun, Xiaoshuang Zhang

    Abstract: Mesoscale eddies are of utmost importance in understanding ocean dynamics and the transport of heat, salt, and nutrients. Accurate representation of these eddies in ocean models is essential for improving model predictions. However, accurately representing these mesoscale features in numerical models is challenging due to their relatively small size. In this study, we propose a convolutional neura…

    Submitted 5 June, 2024; originally announced June 2024.

  45. arXiv:2406.03438  [pdf, other]

    cs.IT eess.SP

    CSI-GPT: Integrating Generative Pre-Trained Transformer with Federated-Tuning to Acquire Downlink Massive MIMO Channels

    Authors: Ye Zeng, Li Qiao, Zhen Gao, Tong Qin, Zhonghuai Wu, Sheng Chen, Mohsen Guizani

    Abstract: In massive multiple-input multiple-output (MIMO) systems, how to reliably acquire downlink channel state information (CSI) with low overhead is challenging. In this work, by integrating the generative pre-trained Transformer (GPT) with federated-tuning, we propose a CSI-GPT approach to realize efficient downlink CSI acquisition. Specifically, we first propose a Swin Transformer-based channel acqui…

    Submitted 5 June, 2024; originally announced June 2024.

  46. arXiv:2406.03412  [pdf, other]

    hep-ph nucl-th

    Nonlocal chiral contributions to generalized parton distributions of the proton at nonzero skewness

    Authors: Zhengyang Gao, Fangcheng He, Chueng-Ryong Ji, W. Melnitchouk, Y. Salamu, P. Wang

    Abstract: We compute the one-loop contributions to spin-averaged generalized parton distributions (GPDs) in the proton from pseudoscalar mesons with intermediate octet and decuplet baryon states at nonzero skewness. Our framework is based on nonlocal covariant chiral effective theory, with ultraviolet divergences regularized by introducing a relativistic regulator derived consistently from the nonlocal Lagr…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 38 pages, 9 figures

  47. arXiv:2406.02518  [pdf, other]

    cs.CV eess.IV

    DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering

    Authors: Zhongpai Gao, Benjamin Planche, Meng Zheng, Xiao Chen, Terrence Chen, Ziyan Wu

    Abstract: Digitally reconstructed radiographs (DRRs) are simulated 2D X-ray images generated from 3D CT volumes, widely used in preoperative settings but limited in intraoperative applications due to computational bottlenecks, especially for accurate but heavy physics-based Monte Carlo methods. While analytical DRR renderers offer greater efficiency, they overlook anisotropic X-ray image formation phenomena…

    Submitted 4 June, 2024; originally announced June 2024.

  48. arXiv:2406.01605  [pdf, other]

    eess.IV cs.CV

    An Enhanced Encoder-Decoder Network Architecture for Reducing Information Loss in Image Semantic Segmentation

    Authors: Zijun Gao, Qi Wang, Taiyuan Mei, Xiaohan Cheng, Yun Zi, Haowei Yang

    Abstract: The traditional SegNet architecture commonly encounters significant information loss during the sampling process, which detrimentally affects its accuracy in image semantic segmentation tasks. To counter this challenge, we introduce an innovative encoder-decoder network structure enhanced with residual connections. Our approach employs a multi-residual connection strategy designed to preserve the…

    Submitted 26 May, 2024; originally announced June 2024.

  49. arXiv:2406.01586  [pdf, other]

    cs.RO cs.AI

    ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation

    Authors: Guanxing Lu, Zifeng Gao, Tianxing Chen, Wenxun Dai, Ziwei Wang, Yansong Tang

    Abstract: Diffusion models have been verified to be effective in generating complex distributions from natural images to motion trajectories. Recent diffusion-based methods show impressive performance in 3D robotic manipulation tasks, whereas they suffer from severe runtime inefficiency due to multiple denoising steps, especially with high-dimensional observations. To this end, we propose a real-time roboti…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: https://manicm-fast.github.io/

  50. arXiv:2406.01154  [pdf, other]

    cs.CV

    UniUSNet: A Promptable Framework for Universal Ultrasound Disease Prediction and Tissue Segmentation

    Authors: Zehui Lin, Zhuoneng Zhang, Xindi Hu, Zhifan Gao, Xin Yang, Yue Sun, Dong Ni, Tao Tan

    Abstract: Ultrasound is a widely used imaging modality in clinical practice due to its low cost, portability, and safety. Current research in general AI for healthcare focuses on large language models and general segmentation models, with insufficient attention to solutions addressing both disease prediction and tissue segmentation. In this study, we propose a novel universal framework for ultrasound, namel…

    Submitted 20 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.