
Showing 1–50 of 871 results for author: Cheng, Z

  1. arXiv:2407.08506  [pdf, other]

    cs.RO

    Imitation Learning for Robotic Assisted Ultrasound Examination of Deep Venous Thrombosis using Kernelized Movement Primitives

    Authors: Diego Dall'Alba, Lorenzo Busellato, Thiusius Rajeeth Savarimuthu, Zhuoqi Cheng, Iñigo Iturrate

    Abstract: Deep Vein Thrombosis (DVT) is a common yet potentially fatal condition, often leading to critical complications like pulmonary embolism. DVT is commonly diagnosed using Ultrasound (US) imaging, which can be inconsistent due to its high dependence on the operator's skill. Robotic US Systems (RUSs) aim to improve diagnostic test consistency but face challenges with the complex scanning pattern neede…

    Submitted 11 July, 2024; originally announced July 2024.

  2. arXiv:2407.07053  [pdf, other]

    cs.CV

    Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

    Authors: Wenqi Zhang, Zhenglin Cheng, Yuanyu He, Mengna Wang, Yongliang Shen, Zeqi Tan, Guiyang Hou, Mingqian He, Yanna Ma, Weiming Lu, Yueting Zhuang

    Abstract: Although most current large multimodal models (LMMs) can already understand photos of natural scenes and portraits, their understanding of abstract images, e.g., charts, maps, or layouts, and visual reasoning capabilities remains quite rudimentary. They often struggle with simple daily tasks, such as reading time from a clock, understanding a flowchart, or planning a route using a road map. In lig…

    Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: code: https://github.com/zwq2018/Multi-modal-Self-instruct dataset: https://huggingface.co/datasets/zwq2018/Multi-modal-Self-instruct Leaderboard: https://multi-modal-self-instruct.github.io/

  3. arXiv:2407.05118  [pdf, other]

    cs.CV

    SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding

    Authors: Zixu Cheng, Yujiang Pu, Shaogang Gong, Parisa Kordjamshidi, Yu Kong

    Abstract: Temporal grounding, a.k.a. video moment retrieval, aims at locating video segments corresponding to a given query sentence. The compositional nature of natural language enables the localization beyond predefined events, posing a certain challenge to the compositional generalizability of existing methods. Recent studies establish the correspondence between videos and queries through a decompose-reco…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  4. arXiv:2407.03636  [pdf, other]

    cs.CV

    Diff-Restorer: Unleashing Visual Prompts for Diffusion-based Universal Image Restoration

    Authors: Yuhong Zhang, Hengsheng Zhang, Xinning Chai, Zhengxue Cheng, Rong Xie, Li Song, Wenjun Zhang

    Abstract: Image restoration is a classic low-level problem aimed at recovering high-quality images from low-quality images with various degradations such as blur, noise, rain, haze, etc. However, due to the inherent complexity and non-uniqueness of degradation in real-world images, it is challenging for a model trained for single tasks to handle real-world restoration problems effectively. Moreover, existin…

    Submitted 4 July, 2024; originally announced July 2024.

  5. arXiv:2406.19859  [pdf, other]

    cs.AI cs.HC cs.MM

    MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis

    Authors: Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Qi He, Wangmeng Xiang, Hanyuan Chen, Jin-Peng Lan, Xianhui Lin, Kang Zhu, Bin Luo, Yifeng Geng, Xuansong Xie, Alexander G. Hauptmann

    Abstract: MetaDesigner revolutionizes artistic typography synthesis by leveraging the strengths of Large Language Models (LLMs) to drive a design paradigm centered around user engagement. At the core of this framework lies a multi-agent system comprising the Pipeline, Glyph, and Texture agents, which collectively enable the creation of customized WordArt, ranging from semantic enhancements to the imposition…

    Submitted 4 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: 18 pages, 16 figures, Project: https://modelscope.cn/studios/WordArt/WordArt

  6. arXiv:2406.19236  [pdf, other]

    cs.AI cs.CV cs.RO

    Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions

    Authors: Minghan Li, Heng Li, Zhi-Qi Cheng, Yifei Dong, Yuxuan Zhou, Jun-Yan He, Qi Dai, Teruko Mitamura, Alexander G. Hauptmann

    Abstract: Vision-and-Language Navigation (VLN) aims to develop embodied agents that navigate based on human instructions. However, current VLN frameworks often rely on static environments and optimal expert supervision, limiting their real-world applicability. To address this, we introduce Human-Aware Vision-and-Language Navigation (HA-VLN), extending traditional VLN by incorporating dynamic human activitie…

    Submitted 4 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: 30 pages, 18 figures, Project Page: https://lpercc.github.io/HA3D_simulator/

  7. arXiv:2406.15877  [pdf, other]

    cs.SE cs.AI cs.CL

    BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

    Authors: Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, et al. (8 additional authors not shown)

    Abstract: Automated software engineering has been greatly empowered by the recent advances in Large Language Models (LLMs) for programming. While current benchmarks have shown that LLMs can perform various software engineering tasks like human developers, the majority of their evaluations are limited to short and self-contained algorithmic tasks. Solving challenging and practical programming tasks requires…

    Submitted 26 June, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

    Comments: 44 pages, 14 figures, 7 tables, built with love by the BigCode community :)

  8. arXiv:2406.15835  [pdf]

    cond-mat.mtrl-sci

    Alternating-Chiral Charge Density Waves and Hybrid Ferrimagnetism in Monolayered NbTe2

    Authors: Yusong Bai, Guohua Cao, Jinghao Deng, Haomin Fei, Xiaoyu Lin, Leiqiang Li, Chao Zhu, Zemin Pan, Tao Jian, Da Huo, Zhengbo Cheng, Chih-Kang Shih, Ping Cui, Chendong Zhang, Zhenyu Zhang

    Abstract: Intertwining of different quantum degrees of freedom manifests exotic quantum phenomena in many-body systems, especially in reduced dimensionality. Here we show that monolayered NbTe2 serves as an ideal platform where lattice, charge, and spin degrees of freedom manifest cooperatively, leading to a new and threading order of chirality. By using spin-polarized scanning tunneling microscopy/spectros…

    Submitted 22 June, 2024; originally announced June 2024.

  9. arXiv:2406.15060  [pdf, other]

    nucl-th

    Evidence for Three-$α$ Breathing Modes Uncovered by Control Neural Network

    Authors: Zheng Cheng, Mengjiao Lyu, Takayuki Myo, Hisashi Horiuchi, Hiroshi Toki, Zhongzhou Ren, Masahiro Isaka, Mengyun Mao, Hiroki Takemoto, Niu Wan, Wenlong You, Qing Zhao

    Abstract: This work introduces a new Control Neural Network (Ctrl.NN) method to uncover evidence of an exotic quantum state, i.e., the breathing modes in 3-$α$ resonant states of the $^{12}$C nucleus. We provide the most precise microscopic description to date for the $^{12}$C energy spectrum, identify two new exotic breathing states, and uncover strong evidence that directly connects the recent experimen…

    Submitted 21 June, 2024; originally announced June 2024.

  10. arXiv:2406.14025  [pdf]

    cond-mat.mtrl-sci

    Direct Observation of Dendrites Nucleation in Li Metal Battery by Machine Learning Accelerated Molecular Simulations under Realistic Electrochemical Conditions

    Authors: Taiping Hu, Haichao Huang, Guobing Zhou, Xinyan Wang, Zheng Cheng, Fangjia Fu, Xiaoxu Wang, Fuzhi Dai, Kuang Yu, Shenzhen Xu

    Abstract: Uncontrollable dendrite growth during electrochemical cycles leads to low Coulombic efficiency and critical safety issues in Li metal batteries. Hence, a comprehensive understanding of the dendrite formation mechanism is essential for further enhancing the performance of Li metal batteries. Machine learning accelerated molecular dynamics (MD) simulations can provide atomic-scale resolution for va…

    Submitted 3 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  11. arXiv:2406.13702  [pdf]

    cond-mat.str-el cond-mat.mtrl-sci

    Van-Hove annihilation and nematic instability on a Kagome lattice

    Authors: Yu-Xiao Jiang, Sen Shao, Wei Xia, M. Michael Denner, Julian Ingham, Md Shafayat Hossain, Qingzheng Qiu, Xiquan Zheng, Hongyu Chen, Zi-Jia Cheng, Xian P. Yang, Byunghoon Kim, Jia-Xin Yin, Songbo Zhang, Maksim Litskevich, Qi Zhang, Tyler A. Cochran, Yingying Peng, Guoqing Chang, Yanfeng Guo, Ronny Thomale, Titus Neupert, M. Zahid Hasan

    Abstract: Novel states of matter arise in quantum materials due to strong interactions among electrons. A nematic phase breaks the point group symmetry of the crystal lattice and is known to emerge in correlated materials. Here we report the observation of an intra-unit-cell nematic order and signatures of Pomeranchuk instability in the Kagome metal ScV6Sn6. Using scanning tunneling microscopy and spectrosc…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 19 pages, 5 figures, accepted for publication in Nature Materials

  12. arXiv:2406.11161  [pdf, other]

    cs.AI cs.MM

    Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

    Authors: Zebang Cheng, Zhi-Qi Cheng, Jun-Yan He, Jingdong Sun, Kai Wang, Yuxiang Lin, Zheng Lian, Xiaojiang Peng, Alexander Hauptmann

    Abstract: Accurate emotion perception is crucial for various applications, including human-computer interaction, education, and counseling. However, traditional single-modality approaches often fail to capture the complexity of real-world emotional expressions, which are inherently multimodal. Moreover, existing Multimodal Large Language Models (MLLMs) face challenges in integrating audio and recognizing su…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 37 pages, 12 figures, Project: https://github.com/ZebangCheng/Emotion-LLaMA, Demo: https://huggingface.co/spaces/ZebangCheng/Emotion-LLaMA

  13. arXiv:2406.10575  [pdf, ps, other]

    math.GT math.RA

    The necessity of (co)unit in nearly Frobenius algebra

    Authors: Zhiyun Cheng, Ziyi Lei

    Abstract: In this article, we consider the concept of nearly Frobenius algebra, which corresponds to most 2D-TQFT of which each cobordism admits no critical points of index 0 or 2. We prove that any nearly Frobenius algebra over a principal ideal domain with surjective multiplication and injective comultiplication is indeed a Frobenius algebra. The motivation of this study mainly emanates from the investigat…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 23 pages, 6 figures

    MSC Class: 16T10; 16S10; 57K18

  14. arXiv:2406.09375  [pdf, other]

    stat.ML cs.LG math.ST

    Learning conditional distributions on continuous spaces

    Authors: Cyril Bénézet, Ziteng Cheng, Sebastian Jaimungal

    Abstract: We investigate sample-based learning of conditional distributions on multi-dimensional unit boxes, allowing for different dimensions of the feature and target spaces. Our approach involves clustering data near varying query points in the feature space to create empirical measures in the target space. We employ two distinct clustering schemes: one based on a fixed-radius ball and the other on neare…

    Submitted 13 June, 2024; originally announced June 2024.

  15. arXiv:2406.09180  [pdf, other]

    cs.LG

    Detection-Rate-Emphasized Multi-objective Evolutionary Feature Selection for Network Intrusion Detection

    Authors: Zi-Hang Cheng, Haopu Shang, Chao Qian

    Abstract: Network intrusion detection is one of the most important issues in the field of cyber security, and various machine learning techniques have been applied to build intrusion detection systems. However, since the number of features to describe the network connections is often large, where some features are redundant or noisy, feature selection is necessary in such scenarios, which can both improve t…

    Submitted 13 June, 2024; originally announced June 2024.

  16. arXiv:2406.08689  [pdf, other]

    cs.CR cs.AI

    Security of AI Agents

    Authors: Yifeng He, Ethan Wang, Yuyang Rong, Zifei Cheng, Hao Chen

    Abstract: The study and development of AI agents have been boosted by large language models. AI agents can function as intelligent assistants and complete tasks on behalf of their users with access to tools and the ability to execute commands in their environments. Through studying and experiencing the workflow of typical AI agents, we have raised several concerns regarding their security. These potential v…

    Submitted 20 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  17. arXiv:2406.07476  [pdf, other]

    cs.CV cs.CL

    VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

    Authors: Zesen Cheng, Sicong Leng, Hang Zhang, Yifei Xin, Xin Li, Guanzheng Chen, Yongxin Zhu, Wenqi Zhang, Ziyang Luo, Deli Zhao, Lidong Bing

    Abstract: In this paper, we present VideoLLaMA 2, a set of Video Large Language Models (Video-LLMs) designed to enhance spatial-temporal modeling and audio understanding in video and audio-oriented tasks. Building upon its predecessor, VideoLLaMA 2 incorporates a tailor-made Spatial-Temporal Convolution (STC) connector, which effectively captures the intricate spatial and temporal dynamics of video data…

    Submitted 17 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: ZC, SL, HZ, YX, and XL contributed equally to this project

  18. arXiv:2406.06396  [pdf, other]

    physics.plasm-ph physics.optics

    Lightwave-controlled relativistic plasma mirrors

    Authors: Marie Ouillé, Jaismeen Kaur, Zhao Cheng, Stefan Haessler, Rodrigo Lopez-Martens

    Abstract: We report on attosecond-scale control of high-harmonic and electron emission from plasma mirrors driven by relativistic-intensity near-single-cycle lightwaves at kHz repetition rate. By controlling the waveform of the intense light transient, we reproducibly form a sub-cycle temporal intensity gate at the plasma mirror surface, leading to the observation of extreme ultraviolet spectral continua, c…

    Submitted 11 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  19. arXiv:2406.06279  [pdf, other]

    cs.CL

    Multi-Prompting Decoder Helps Better Language Understanding

    Authors: Zifeng Cheng, Zhaoling Chen, Zhiwei Jiang, Yafeng Yin, Shiping Ge, Yuliang Liu, Qing Gu

    Abstract: Recent Pre-trained Language Models (PLMs) usually only provide users with the inference APIs, namely the emerging Model-as-a-Service (MaaS) setting. To adapt MaaS PLMs to downstream tasks without accessing their parameters and gradients, some existing methods focus on the output-side adaptation of PLMs, viewing the PLM as an encoder and then optimizing a task-specific decoder for decoding the outp…

    Submitted 10 June, 2024; originally announced June 2024.

  20. arXiv:2406.06104  [pdf]

    cond-mat.mtrl-sci

    Correlated electrons of the flat band in charge density wave state of 4Hb-TaSexS2-x

    Authors: Yanyan Geng, Jianfeng Guo, Fanyu Meng, Manyu Wang, Shuo Mi, Li Huang, Rui Xu, Fei Pang, Kai Liu, Shancai Wang, Hong-Jun Gao, Weichang Zhou, Wei Ji, Hechang Lei, Zhihai Cheng

    Abstract: Many intriguing quantum states of matter, such as unconventional superconductivity, magnetic phases and fractional quantum Hall physics, emerge from the spatially-correlated localized electrons in the flat band of solid materials. By using scanning tunneling microscopy and spectroscopy (STM/STS), we report the real-space investigation of correlated electrons in the flat band of superlattice 4Hb-…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 18 pages, 4 figures

  21. arXiv:2406.06031  [pdf, other]

    cs.IR

    A WT-ResNet based fault diagnosis model for the urban rail train transmission system

    Authors: Zuyu Cheng, Zhengcai Zhao, Yixiao Wang, Wentao Guo, Yufei Wang, Xiang Gao

    Abstract: This study presents a novel fault diagnosis model for urban rail transit systems based on Wavelet Transform Residual Neural Network (WT-ResNet). The model integrates the advantages of wavelet transform for feature extraction and ResNet for pattern recognition, offering enhanced diagnostic accuracy and robustness. Experimental results demonstrate the effectiveness of the proposed model in identifyi…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 12 pages, 10 figures

  22. Self-supervised Adversarial Training of Monocular Depth Estimation against Physical-World Attacks

    Authors: Zhiyuan Cheng, Cheng Han, James Liang, Qifan Wang, Xiangyu Zhang, Dongfang Liu

    Abstract: Monocular Depth Estimation (MDE) plays a vital role in applications such as autonomous driving. However, various attacks target MDE models, with physical attacks posing significant threats to system security. Traditional adversarial training methods, which require ground-truth labels, are not directly applicable to MDE models that lack ground-truth depth. Some self-supervised model hardening techn…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted in TPAMI'24. Extended from our ICLR'23 publication (arXiv:2301.13487). arXiv admin note: substantial text overlap with arXiv:2301.13487

  23. arXiv:2406.05731  [pdf, ps, other]

    physics.plasm-ph

    Nonlinear saturation of reversed shear Alfven eigenmode via high-frequency quasi-mode generation

    Authors: Zhiwen Cheng, Guangyu Wei, Lei Ye, Zhiyong Qiu

    Abstract: A nonlinear saturation mechanism for the reversed shear Alfven eigenmode (RSAE) is proposed and analysed, and is shown to be relevant to the typical reactor parameter region. The saturation is achieved through the generation of a high-frequency quasi-mode due to nonlinear coupling of two RSAEs, which is then damped due to coupling with the shear Alfven continuum, and leads to the nonlinear saturation of…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: submitted to Plasma Physics and Technology

  24. arXiv:2406.04538  [pdf, other]

    physics.flu-dyn

    A unified framework for prediction of vortex-induced vibration based on the nonlinear identification of general wake oscillator modeling

    Authors: Zhi Cheng, Fue-Sang Lien, Earl H. Dowell

    Abstract: In this paper, we present novel identification strategies to develop a unified framework for vortex-induced vibration (VIV) prediction based on the general semi-empirical wake oscillator. A grey-box nonlinear system identification method, accompanied by high-fidelity computational fluid dynamics (CFD) and/or experimental data, could be applied for the identification process. The proposed template of gene…

    Submitted 13 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: This version is not intended to be sent for peer review and is merely a summary of some preliminary work

  25. arXiv:2406.02607  [pdf, other]

    physics.flu-dyn

    Flow-Induced Vibration of Flexible Hydrofoil Within Cavitating Turbulent Flow

    Authors: Zhi Cheng, Rajeev Jaiman

    Abstract: The flow-induced vibration and cavitation dynamics of three-dimensional flow past a cantilever flexible hydrofoil are investigated using a large eddy simulation (LES) model, a homogeneous mixture cavitation model and the structural modes superposition method. The present work aims to explore a potential mechanism responsible for a propeller singing behavior, and thus focuses on the synchronized hy…

    Submitted 2 June, 2024; originally announced June 2024.

    Report number: OMAE 2024-125985

  26. arXiv:2406.01850  [pdf, other]

    eess.SP

    Cell-free massive MIMO Channels in an Urban Environment -- Measurements and Channel Statistics

    Authors: Yuning Zhang, Thomas Choi, Zihang Cheng, Jorge Gomez-Ponce, Issei Kanno, Masaaki Ito, Andreas F. Molisch

    Abstract: Cell-free massive MIMO (CF-mMIMO), where each user equipment (UE) is connected to multiple access points (APs), is emerging as an important component for 5G and 6G cellular systems. Accurate channel models based on measurements are required to optimize their design and deployment. This paper presents an extensive measurement campaign for CF-mMIMO in an urban environment. A new "virtual AP" techniq…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: submitted to IEEE TWC

  27. arXiv:2406.01007  [pdf, other]

    hep-ex

    Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay

    Authors: Daya Bay collaboration, F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, J. Cheng, Y. -C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng, et al. (177 additional authors not shown)

    Abstract: This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive…

    Submitted 3 June, 2024; originally announced June 2024.

  28. arXiv:2405.20617  [pdf, other]

    eess.SP

    Large-scale Outdoor Cell-free mMIMO Channel Measurement in an Urban Scenario at 3.5 GHz

    Authors: Yuning Zhang, Thomas Choi, Zihang Cheng, Issei Kanno, Masaaki Ito, Jorge Gomez-Ponce, Hussein Hammoud, Bowei Wu, Ashwani Pradhan, Kelvin Arana, Pramod Krishna, Tianyi Yang, Tyler Chen, Ishita Vasishtha, Haoyu Xie, Linyu Sun, Andreas F. Molisch

    Abstract: The design of cell-free massive MIMO (CF-mMIMO) systems requires accurate, measurement-based channel models. This paper provides the first results from by far the most extensive outdoor measurement campaign for CF-mMIMO channels in an urban environment. We measured impulse responses between over 20,000 potential access point (AP) locations and 80 user equipments (UEs) at 3.5 GHz with 350 MHz bandw…

    Submitted 6 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: Submitted to: VTC 2024-Fall

  29. arXiv:2405.20325  [pdf, other]

    cs.CV

    MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion

    Authors: Shuyuan Tu, Qi Dai, Zihao Zhang, Sicheng Xie, Zhi-Qi Cheng, Chong Luo, Xintong Han, Zuxuan Wu, Yu-Gang Jiang

    Abstract: Despite impressive advancements in diffusion-based video editing models in altering video attributes, there has been limited exploration into modifying motion information while preserving the original protagonist's appearance and background. In this paper, we propose MotionFollower, a lightweight score-guided diffusion model for video motion editing. To introduce conditional controls to the denois…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 23 pages, 18 figures. Project page at https://francis-rings.github.io/MotionFollower/

    MSC Class: 68T45; 68T10

  30. arXiv:2405.18997  [pdf, other]

    stat.ML cs.LG

    Kernel Semi-Implicit Variational Inference

    Authors: Ziheng Cheng, Longlin Yu, Tianyu Xie, Shiyue Zhang, Cheng Zhang

    Abstract: Semi-implicit variational inference (SIVI) extends traditional variational families with semi-implicit distributions defined in a hierarchical manner. Due to the intractable densities of semi-implicit distributions, classical SIVI often resorts to surrogates of evidence lower bound (ELBO) that would introduce biases for training. A recent advancement in SIVI, named SIVI-SM, utilizes an alternative…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: ICML 2024 camera ready

  31. arXiv:2405.18347  [pdf, other]

    cs.LG

    Dataset Growth

    Authors: Ziheng Qin, Zhaopan Xu, Yukun Zhou, Zangwei Zheng, Zebang Cheng, Hao Tang, Lei Shang, Baigui Sun, Xiaojiang Peng, Radu Timofte, Hongxun Yao, Kai Wang, Yang You

    Abstract: Deep learning benefits from the growing abundance of available data. Meanwhile, efficiently dealing with the growing data scale has become a challenge. Data publicly available are from different sources with various qualities, and it is impractical to do manual cleaning against noise and redundancy given today's data scale. There are existing techniques for cleaning/selecting the collected data. H…

    Submitted 28 May, 2024; originally announced May 2024.

  32. arXiv:2405.17509  [pdf, other]

    cs.LG

    Reference Neural Operators: Learning the Smooth Dependence of Solutions of PDEs on Geometric Deformations

    Authors: Ze Cheng, Zhongkai Hao, Xiaoqiang Wang, Jianing Huang, Youjia Wu, Xudan Liu, Yiru Zhao, Songming Liu, Hang Su

    Abstract: For partial differential equations on domains of arbitrary shapes, existing works of neural operators attempt to learn a mapping from geometries to solutions. It often requires a large dataset of geometry-solution pairs in order to obtain a sufficiently accurate neural operator. However, for many industrial applications, e.g., engineering design optimization, it can be prohibitive to satisfy the r…

    Submitted 27 May, 2024; originally announced May 2024.

  33. arXiv:2405.16577  [pdf, other]

    stat.ML cs.LG

    Reflected Flow Matching

    Authors: Tianyu Xie, Yu Zhu, Longlin Yu, Tong Yang, Ziheng Cheng, Shiyue Zhang, Xiangyu Zhang, Cheng Zhang

    Abstract: Continuous normalizing flows (CNFs) learn an ordinary differential equation to transform prior samples into data. Flow matching (FM) has recently emerged as a simulation-free approach for training CNFs by regressing a velocity model towards the conditional velocity field. However, on constrained domains, the learned velocity model may lead to undesirable flows that result in highly unnatural sampl…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: ICML 2024 camera-ready

  34. arXiv:2405.15553  [pdf, other]

    eess.SP

    Massive MIMO-ISAC System With 1-Bit ADCs/DACs

    Authors: Bowen Wang, Hongyu Li, Bin Liao, Ziyang Cheng

    Abstract: This paper investigates a hardware-efficient massive multiple-input multiple-output integrated sensing and communication (MIMO-ISAC) system with 1-bit analog-to-digital converters (ADCs)/digital-to-analog converters (DACs). The proposed system, referred to as 1BitISAC, employs 1-bit DACs at the ISAC transmitter and 1-bit ADCs at the sensing receiver, achieving significant reductions in power consu…

    Submitted 24 May, 2024; originally announced May 2024.

  35. arXiv:2405.15350  [pdf, ps, other]

    math.GT

    Coloring invariants for links in $Σ_g\times S^1$

    Authors: Zhiyun Cheng, Hongzhu Gao

    Abstract: Let $Σ_g$ be a closed oriented surface of genus $g$. In this paper we discuss how to define coloring invariants and their generalizations for links in $Σ_g\times S^1$.

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 11 pages, 8 figures

    MSC Class: 57K10; 57K12

  36. arXiv:2405.15133  [pdf, other]

    physics.flu-dyn

    Modeling of Hydroacoustic Noise from Marine Propellers with Tip Vortex Cavitation

    Authors: Zhi Cheng, Suraj Kashyap, Brendan Smoker, Giorgio Burella, Rajeev Jaiman

    Abstract: The present work aims to study the cavitating turbulent flow of a full-scale marine propeller and explore the physical mechanism underpinning the underwater radiated noise. We employ the standard dynamic large eddy simulation for the turbulent wake flow and the Schnerr-Sauer cavitation model, while the Ffowcs-Williams-Hawkings acoustic analogy is considered for the hydroacoustic modeling. For the…

    Submitted 23 May, 2024; originally announced May 2024.

    Report number: OMAE2024-125991

  37. arXiv:2405.14297  [pdf, other]

    cs.LG cs.AI

    Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models

    Authors: Yongxin Guo, Zhenglin Cheng, Xiaoying Tang, Tao Lin

    Abstract: The Sparse Mixture of Experts (SMoE) has been widely employed to enhance the efficiency of training and inference for Transformer-based foundational models, yielding promising results. However, the performance of SMoE heavily depends on the choice of hyper-parameters, such as the number of experts and the number of experts to be activated (referred to as top-k), resulting in significant computatio…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 9 pages, 21 figures

  38. arXiv:2405.12072  [pdf, other]

    cond-mat.mtrl-sci

    Real topological phonons in 3D carbon allotropes

    Authors: Xiaotian Wang, Jingbo Bai, Jianhua Wang, Zhenxiang Cheng, Shifeng Qian, Wenhong Wang, Gang Zhang, Zhi-Ming Yu, Yugui Yao

    Abstract: There has been a significant focus on real topological systems that enjoy space-time inversion symmetry (PT) and lack spin-orbit coupling. While the theoretical classification of the real topology has been established, more progress has yet to be made in the materials realization of such real topological systems in three dimensions (3D). To address this crucial issue, by selecting the carbon-base…

    Submitted 20 May, 2024; originally announced May 2024.

  39. arXiv:2405.11667  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication

    Authors: Kumar Kshitij Patel, Margalit Glasgow, Ali Zindari, Lingxiao Wang, Sebastian U. Stich, Ziheng Cheng, Nirmit Joshi, Nathan Srebro

    Abstract: Local SGD is a popular optimization method in distributed learning, often outperforming other algorithms in practice, including mini-batch SGD. Despite this success, theoretically proving the dominance of local SGD in settings with reasonable data heterogeneity has been difficult, creating a significant gap between theory and practice. In this paper, we provide new lower bounds for local SGD under…

    Submitted 19 May, 2024; originally announced May 2024.

  40. arXiv:2405.10313  [pdf, other]

    cs.AI cs.CL cs.CY cs.LG

    How Far Are We From AGI

    Authors: Tao Feng, Chuanyang Jin, Jingyu Liu, Kunlun Zhu, Haoqin Tu, Zirui Cheng, Guanyu Lin, Jiaxuan You

    Abstract: The evolution of artificial intelligence (AI) has profoundly impacted human society, driving significant advancements in multiple sectors. Yet, the escalating demands on AI have highlighted the limitations of AI's current offerings, catalyzing a movement towards Artificial General Intelligence (AGI). AGI, distinguished by its ability to execute diverse real-world tasks with efficiency and effectiv…

    Submitted 16 May, 2024; originally announced May 2024.

  41. arXiv:2405.08463  [pdf, other]

    cs.CV

    A Timely Survey on Vision Transformer for Deepfake Detection

    Authors: Zhikan Wang, Zhongyao Cheng, Jiajie Xiong, Xun Xu, Tianrui Li, Bharadwaj Veeravalli, Xulei Yang

    Abstract: In recent years, the rapid advancement of deepfake technology has revolutionized content creation, lowering forgery costs while elevating quality. However, this progress brings forth pressing concerns such as infringements on individual rights, national security threats, and risks to public safety. To counter these challenges, various detection methodologies have emerged, with Vision Transformer (…

    Submitted 14 May, 2024; originally announced May 2024.

  42. arXiv:2405.07281  [pdf, ps, other]

    eess.SP

    Movable Antennas Aided Multicast MISO Communication Systems

    Authors: Zhenqiao Cheng, Nanxi Li, Ruizhe Long, Jianchi Zhu, Chongjun Ouyang, Peng Chen

    Abstract: A novel multicast communication system with movable antennas (MAs) is proposed, where the antenna position optimization is exploited to enhance the transmission rate. Specifically, an MA-assisted two-user multicast multiple-input single-output system is considered. The joint optimization of the transmit beamforming vector and transmit MA positions is studied by modeling the motion of the MA element…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: 5 pages

  43. arXiv:2405.03064  [pdf, other]

    cs.LG cs.AI cs.CR

    RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation

    Authors: Zelei Cheng, Xian Wu, Jiahao Yu, Sabrina Yang, Gang Wang, Xinyu Xing

    Abstract: Deep reinforcement learning (DRL) is playing an increasingly important role in real-world applications. However, obtaining an optimally performing DRL agent for complex tasks, especially with sparse rewards, remains a significant challenge. The training of a DRL agent can often be trapped in a bottleneck without further progress. In this paper, we propose RICE, an innovative refining scheme for re…

    Submitted 5 June, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  44. arXiv:2405.00587  [pdf, other]

    cs.CV

    GraCo: Granularity-Controllable Interactive Segmentation

    Authors: Yian Zhao, Kehan Li, Zesen Cheng, Pengchong Qiao, Xiawu Zheng, Rongrong Ji, Chang Liu, Li Yuan, Jie Chen

    Abstract: Interactive Segmentation (IS) segments specific objects or parts in the image according to user input. Current IS pipelines fall into two categories: single-granularity output and multi-granularity output. The latter aims to alleviate the spatial ambiguity present in the former. However, the multi-granularity output pipeline suffers from limited interaction flexibility and produces redundant resul…

    Submitted 16 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: CVPR2024 Highlight, Project: https://zhao-yian.github.io/GraCo

  45. arXiv:2404.18398  [pdf, other]

    cs.CL cs.MM

    MM-TTS: A Unified Framework for Multimodal, Prompt-Induced Emotional Text-to-Speech Synthesis

    Authors: Xiang Li, Zhi-Qi Cheng, Jun-Yan He, Xiaojiang Peng, Alexander G. Hauptmann

    Abstract: Emotional Text-to-Speech (E-TTS) synthesis has gained significant attention in recent years due to its potential to enhance human-computer interaction. However, current E-TTS approaches often struggle to capture the complexity of human emotions, primarily relying on oversimplified emotional labels or single-modality inputs. To address these limitations, we propose the Multimodal Emotional Text-to-…

    Submitted 28 April, 2024; originally announced April 2024.

  46. arXiv:2404.18243  [pdf, other]

    cs.CL

    LEGENT: Open Platform for Embodied Agents

    Authors: Zhili Cheng, Zhitong Wang, Jinyi Hu, Shengding Hu, An Liu, Yuge Tu, Pengkai Li, Lei Shi, Zhiyuan Liu, Maosong Sun

    Abstract: Despite advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), their integration into language-grounded, human-like embodied agents remains incomplete, hindering complex real-life task performance in physical environments. Existing integrations often feature limited open sourcing, challenging collective progress in this field. We introduce LEGENT, an open, scalable platfo…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: Demo Paper

  47. arXiv:2404.18166  [pdf, other]

    cs.IR

    Behavior-Contextualized Item Preference Modeling for Multi-Behavior Recommendation

    Authors: Mingshi Yan, Fan Liu, Jing Sun, Fuming Sun, Zhiyong Cheng, Yahong Han

    Abstract: In recommender systems, multi-behavior methods have demonstrated their effectiveness in mitigating issues like data sparsity, a common challenge in traditional single-behavior recommendation approaches. These methods typically infer user preferences from various auxiliary behaviors and apply them to the target behavior for recommendations. However, this direct transfer can introduce noise to the t…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by SIGIR 2024

  48. arXiv:2404.17936  [pdf, other]

    cs.CV

    FDCE-Net: Underwater Image Enhancement with Embedding Frequency and Dual Color Encoder

    Authors: Zheng Cheng, Guodong Fan, Jingchun Zhou, Min Gan, C. L. Philip Chen

    Abstract: Underwater images often suffer from various issues such as low brightness, color shift, blurred details, and noise due to light absorption and scattering caused by water and suspended particles. Previous underwater image enhancement (UIE) methods have primarily focused on spatial domain enhancement, neglecting the frequency domain information inherent in the images. However, the degradation factor…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 16 pages, 13 figures

  49. arXiv:2404.17297  [pdf, ps, other]

    cs.PL

    Denotation-based Compositional Compiler Verification

    Authors: Zhang Cheng, Jiyang Wu, Di Wang, Qinxiang Cao

    Abstract: A desired but challenging property of compiler verification is compositionality in the sense that the compilation correctness of a program can be deduced from that of its substructures ranging from statements, functions, and modules incrementally. Previously proposed approaches have devoted extensive effort to module-level compositionality based on small-step semantics and simulation theories. Thi…

    Submitted 15 May, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: 38 pages, 8 figures

  50. arXiv:2404.16580  [pdf, other]

    math.OC

    A New Two-Sided Sketching Algorithm for Large-Scale Tensor Decomposition Based on Discrete Cosine Transformation

    Authors: Zhiguang Cheng, Gaohang Yu, Xiaohao Cai, Liqun Qi

    Abstract: Large tensors are frequently encountered in various fields such as computer vision, scientific simulations, sensor networks, and data mining. However, these tensors are often too large for convenient processing, transfer, or storage. Fortunately, they typically exhibit a low-rank structure that can be leveraged through tensor decomposition. Despite this, performing large-scale tensor decomposition…

    Submitted 28 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.