Showing 1–50 of 889 results for author: Zhong, Y

  1. arXiv:2407.08496  [pdf, ps, other]

    math.DG math.GT

    Convergences of Combinatorial Ricci Flows to Degenerated Circle Packings in Hyperbolic Background Geometry

    Authors: Guangming Hu, Sicheng Lu, Dong Tan, Youliang Zhong, Puchun Zhou

    Abstract: This paper investigates a kind of degenerated circle packings in hyperbolic background geometry. A main problem is whether a prescribed total geodesic curvature data can be realized by a degenerated circle packing or not. We fully characterize the sufficient and necessary conditions and show the uniqueness. Furthermore, we introduce the combinatorial Ricci flow to find the desired degenerated circl…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 36 pages, 9 figures

    MSC Class: 52C26; 57M50

  2. arXiv:2407.08126  [pdf, other]

    cs.AI cs.CV cs.MM

    Label-anticipated Event Disentanglement for Audio-Visual Video Parsing

    Authors: Jinxing Zhou, Dan Guo, Yuxin Mao, Yiran Zhong, Xiaojun Chang, Meng Wang

    Abstract: The Audio-Visual Video Parsing (AVVP) task aims to detect and temporally locate events within audio and visual modalities. Multiple events can overlap in the timeline, making identification challenging. While traditional methods usually focus on improving the early audio-visual encoders to embed more effective features, the decoding phase -- crucial for final event classification, often receives less…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  3. arXiv:2407.07406  [pdf, other]

    cs.CV cs.AI

    Weakly-supervised Medical Image Segmentation with Gaze Annotations

    Authors: Yuan Zhong, Chenhui Tang, Yumeng Yang, Ruoxi Qi, Kang Zhou, Yuqi Gong, Pheng Ann Heng, Janet H. Hsiao, Qi Dou

    Abstract: Eye gaze that reveals human observational patterns has increasingly been incorporated into solutions for vision tasks. Despite recent explorations on leveraging gaze to aid deep networks, few studies exploit gaze as an efficient annotation approach for medical image segmentation which typically entails heavy annotating costs. In this paper, we propose to collect dense weak supervision for medical…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: MICCAI 2024

  4. arXiv:2407.07140  [pdf, other]

    cs.LG stat.ML

    Cardinality-Aware Set Prediction and Top-$k$ Classification

    Authors: Corinna Cortes, Anqi Mao, Christopher Mohri, Mehryar Mohri, Yutao Zhong

    Abstract: We present a detailed study of cardinality-aware top-$k$ classification, a novel approach that aims to learn an accurate top-$k$ set predictor while maintaining a low cardinality. We introduce a new target loss function tailored to this setting that accounts for both the classification error and the cardinality of the set predicted. To optimize this loss function, we propose two families of surrog…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2403.19625

  5. arXiv:2407.06794  [pdf, other]

    cs.CV

    ERQ: Error Reduction for Post-Training Quantization of Vision Transformers

    Authors: Yunshan Zhong, Jiawei Hu, You Huang, Yuxin Zhang, Rongrong Ji

    Abstract: Post-training quantization (PTQ) for vision transformers (ViTs) has garnered significant attention due to its efficiency in compressing models. However, existing methods typically overlook the intricate interdependence between quantized weight and activation, leading to considerable quantization error. In this paper, we propose ERQ, a two-step PTQ approach meticulously crafted to sequentially redu…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: ICML2024 (Spotlight)

  6. arXiv:2407.05220  [pdf, other]

    cond-mat.str-el cond-mat.mes-hall cond-mat.mtrl-sci cond-mat.quant-gas

    Altermagnetism in Heavy Fermion Systems

    Authors: Miaomiao Zhao, Wei-Wei Yang, Xueming Guo, Hong-Gang Luo, Yin Zhong

    Abstract: A novel collinear magnet, the altermagnet (AM), with spin-splitting energy bands and zero net magnetization, has attracted great interest due to its potential spintronic applications. Here, we demonstrate AM-like phases in a microscopic Kondo lattice model, widely used for heavy fermion compounds. Within the framework of fermionic parton mean-field theory, we find the $d$-wave AM state can coexist with…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: 14 pages, 12 figures

  7. arXiv:2407.04941  [pdf, other]

    astro-ph.HE

    Spindown of Pulsars Interacting with Companion Winds: Impact of Magnetospheric Compression

    Authors: Yici Zhong, Anatoly Spitkovsky, Jens F. Mahlmann, Hayk Hakobyan

    Abstract: The presence of a companion wind in neutron star binary systems can form a contact discontinuity well within the pulsar's light cylinder, effectively creating a waveguide that confines the pulsar's electromagnetic fields and significantly alters its spindown. We parametrize this confinement as the ratio between the equatorial position of the contact discontinuity (or standoff distance)…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 13 pages, 7 figures, comments are welcome

  8. arXiv:2407.03992  [pdf, other]

    eess.IV

    Performance of Medical Image Fusion in High-level Analysis Tasks: A Mutual Enhancement Framework for Unaligned PAT and MRI Image Fusion

    Authors: Yutian Zhong, Jinchuan He, Zhichao Liang, Shuangyang Zhang, Qianjin Feng, Wufan Chen, Li Qi

    Abstract: Photoacoustic tomography (PAT) offers optical contrast, whereas magnetic resonance imaging (MRI) excels in imaging soft tissue and organ anatomy. The fusion of PAT with MRI holds promising application prospects due to their complementary advantages. Existing image fusion methods have made considerable progress in pre-registered images, yet spatial deformations are difficult to avoid in medical imaging sce…

    Submitted 4 July, 2024; originally announced July 2024.

  9. arXiv:2407.02842  [pdf, other]

    cs.CV cs.AI cs.CL

    MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis

    Authors: Lei Chen, Feng Yan, Yujie Zhong, Shaoxiang Chen, Zequn Jie, Lin Ma

    Abstract: Multimodal Large Language Models (MLLM) have made significant progress in the field of document analysis. Despite this, existing benchmarks typically focus only on extracting text and simple layout information, neglecting the complex interactions between elements in structured documents such as mind maps and flowcharts. To address this issue, we introduce the new benchmark named MindBench, which n…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: technical report

  10. arXiv:2407.02785  [pdf]

    cond-mat.mtrl-sci physics.comp-ph

    Identifying Direct Bandgap Silicon Structures with High-throughput Search and Machine Learning Methods

    Authors: Rui Wang, Hongyu Yu, Yang Zhong, Hongjun Xiang

    Abstract: Utilizations of silicon-based luminescent devices are restricted by the indirect-gap nature of diamond silicon. In this study, the high-throughput method is employed to expedite discoveries of direct-gap silicon crystals. The machine learning (ML) potential is utilized to construct a dataset comprising 2637 silicon allotropes, which is subsequently screened using an ML Hamiltonian model and densit…

    Submitted 2 July, 2024; originally announced July 2024.

  11. arXiv:2407.00983  [pdf, other]

    cs.CV

    FairMedFM: Fairness Benchmarking for Medical Imaging Foundation Models

    Authors: Ruinan Jin, Zikang Xu, Yuan Zhong, Qiongsong Yao, Qi Dou, S. Kevin Zhou, Xiaoxiao Li

    Abstract: The advent of foundation models (FMs) in healthcare offers unprecedented opportunities to enhance medical diagnostics through automated classification and segmentation tasks. However, these models also raise significant concerns about their fairness, especially when applied to diverse and underrepresented populations in healthcare applications. Currently, there is a lack of comprehensive benchmark…

    Submitted 3 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: 29 pages, 17 figures

  12. arXiv:2407.00765  [pdf, other]

    cs.LG cs.NE math.NA stat.ML

    Structured and Balanced Multi-component and Multi-layer Neural Networks

    Authors: Shijun Zhang, Hongkai Zhao, Yimin Zhong, Haomin Zhou

    Abstract: In this work, we propose a balanced multi-component and multi-layer neural network (MMNN) structure to approximate functions with complex features with both accuracy and efficiency in terms of degrees of freedom and computation cost. The main idea is motivated by a multi-component, each of which can be approximated effectively by a single-layer network, and multi-layer decomposition in a "divide-a…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Our codes and implementation details are available at https://github.com/ShijunZhangMath/MMNN

  13. arXiv:2406.18173  [pdf, other]

    cs.CL

    UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs

    Authors: Wenhao Li, Mingbao Lin, Yunshan Zhong, Shuicheng Yan, Rongrong Ji

    Abstract: Managing long texts is challenging for large language models (LLMs) due to limited context window sizes. This study introduces UIO-LLMs, an unbiased incremental optimization approach for memory-enhanced transformers under long-context settings. We initially conceptualize the process as a streamlined encoder-decoder framework where the weights-shared encoder and decoder respectively encapsulate a c…

    Submitted 26 June, 2024; originally announced June 2024.

  14. arXiv:2406.18018  [pdf, other]

    eess.IV

    A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset

    Authors: Muwei Jian, Haoran Zhang, Mingju Shao, Hongyu Chen, Huihui Huang, Yanjie Zhong, Changlei Zhang, Bin Wang, Penghui Gao

    Abstract: Recently, intelligent analysis of lung nodules with the assistance of computer aided detection (CAD) techniques can improve the accuracy rate of lung cancer diagnosis. However, existing CAD systems and pulmonary datasets mainly focus on Computed Tomography (CT) images from one single period, while ignoring the cross spatio-temporal features associated with the progression of nodules contained in im…

    Submitted 25 June, 2024; originally announced June 2024.

  15. arXiv:2406.17998  [pdf, other]

    cs.CV

    Changen2: Multi-Temporal Remote Sensing Generative Change Foundation Model

    Authors: Zhuo Zheng, Stefano Ermon, Dongjun Kim, Liangpei Zhang, Yanfei Zhong

    Abstract: Our understanding of the temporal dynamics of the Earth's surface has been advanced by deep vision models, which often require lots of labeled multi-temporal images for training. However, collecting, preprocessing, and annotating multi-temporal remote sensing images at scale is non-trivial since it is expensive and knowledge-intensive. In this paper, we present change data generators based on gene…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: The enhanced extension of our ICCV 2023 (Changen)

  16. arXiv:2406.17894  [pdf, other]

    cs.LG

    Efficient and Effective Implicit Dynamic Graph Neural Network

    Authors: Yongjian Zhong, Hieu Vu, Tianbao Yang, Bijaya Adhikari

    Abstract: Implicit graph neural networks have gained popularity in recent years as they capture long-range dependencies while improving predictive performance in static graphs. Despite the tussle between performance degradation due to the oversmoothing of learned embeddings and long-range dependency being more pronounced in dynamic graphs, as features are aggregated both across neighborhood and time, no pri…

    Submitted 25 June, 2024; originally announced June 2024.

  17. arXiv:2406.16690  [pdf, other]

    cs.CL

    Scaling Laws for Linear Complexity Language Models

    Authors: Xuyang Shen, Dong Li, Ruitao Leng, Zhen Qin, Weigao Sun, Yiran Zhong

    Abstract: The interest in linear complexity models for large language models is on the rise, although their scaling capacity remains uncertain. In this study, we present the scaling laws for linear complexity language models to establish a foundation for their scalability. Specifically, we examine the scaling behaviors of three efficient linear architectures. These include TNL, a linear attention model with…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Technical report. Yiran Zhong is the corresponding author

  18. Single-Temporal Supervised Learning for Universal Remote Sensing Change Detection

    Authors: Zhuo Zheng, Yanfei Zhong, Ailong Ma, Liangpei Zhang

    Abstract: Bitemporal supervised learning paradigm always dominates remote sensing change detection using numerous labeled bitemporal image pairs, especially for high spatial resolution (HSR) remote sensing imagery. However, it is very expensive and labor-intensive to label change regions in large-scale bitemporal HSR remote sensing image pairs. In this paper, we propose single-temporal supervised learning (…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: IJCV 2024. arXiv admin note: text overlap with arXiv:2108.07002

  19. arXiv:2406.15306  [pdf]

    cs.LG cs.CL cs.CV

    Advanced Multimodal Deep Learning Architecture for Image-Text Matching

    Authors: Jinyin Wang, Haijing Zhang, Yihao Zhong, Yingbin Liang, Rongwei Ji, Yiru Cang

    Abstract: Image-text matching is a key multimodal task that aims to model the semantic association between images and text as a matching relationship. With the advent of the multimedia information age, image and text data show explosive growth, and how to realize efficient and accurate semantic correspondence between them has become the core issue of common concern in academia and industry.…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2405.17460 by other authors

  20. arXiv:2406.15000  [pdf, other]

    cs.CL cs.AI

    Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations

    Authors: Lichao Zhang, Jia Yu, Shuai Zhang, Long Li, Yangyang Zhong, Guanbao Liang, Yuming Yan, Qing Ma, Fangsheng Weng, Fayu Pan, Jing Li, Renjun Xu, Zhenzhong Lan

    Abstract: Large Language Models (LLMs) have significantly advanced user-bot interactions, enabling more complex and coherent dialogues. However, the prevalent text-only modality might not fully exploit the potential for effective user engagement. This paper explores the impact of multi-modal interactions, which incorporate images and audio alongside text, on user engagement in chatbot conversations. We cond…

    Submitted 21 June, 2024; originally announced June 2024.

  21. arXiv:2406.14069  [pdf, other]

    eess.IV cs.CV

    Towards Multi-modality Fusion and Prototype-based Feature Refinement for Clinically Significant Prostate Cancer Classification in Transrectal Ultrasound

    Authors: Hong Wu, Juan Fu, Hongsheng Ye, Yuming Zhong, Xuebin Zou, Jianhua Zhou, Yi Wang

    Abstract: Prostate cancer is a highly prevalent cancer and ranks as the second leading cause of cancer-related deaths in men globally. Recently, the utilization of multi-modality transrectal ultrasound (TRUS) has gained significant traction as a valuable technique for guiding prostate biopsies. In this study, we propose a novel learning framework for clinically significant prostate cancer (csPCa) classifica…

    Submitted 20 June, 2024; originally announced June 2024.

  22. arXiv:2406.13942  [pdf, other]

    cs.LG

    Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models

    Authors: Yuan Zhong, Xiaochen Wang, Jiaqi Wang, Xiaokun Zhang, Yaqing Wang, Mengdi Huai, Cao Xiao, Fenglong Ma

    Abstract: Synthesizing electronic health records (EHR) data has become a preferred strategy to address data scarcity, improve data quality, and model fairness in healthcare. However, existing approaches for EHR data generation predominantly rely on state-of-the-art generative techniques like generative adversarial networks, variational autoencoders, and language models. These methods typically replicate inp…

    Submitted 19 June, 2024; originally announced June 2024.

  23. arXiv:2406.13524  [pdf, ps, other]

    math.CV math.DS

    Koebe uniformization for infinitely connected attracting Fatou domains

    Authors: Xiaoguang Wang, Yi Zhong

    Abstract: This paper studies the structure of infinitely connected Fatou domains of rational maps in terms of Koebe uniformization. Due to the complicated boundary behavior, the existing uniformization results fail to apply in general. We prove that if the rational map is geometrically finite, then its infinitely connected attracting Fatou domain is conformally homeomorphic to a circle domain.

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 13 pages

    MSC Class: 30C20(Primary); 30C35(Secondary)

  24. arXiv:2406.13078  [pdf]

    physics.med-ph

    A universal bioluminescence tomography system for pre-clinical image-guided radiotherapy research

    Authors: Zhishen Tong, Zijian Deng, Xiangkun Xu, Ciara Newman, Xun Jia, Yuncheng Zhong, Merle Reinhart, Paul Tsouchlos, Tim Devling, Hamid Dehghani, Iulian Iordachita, Debabrata Saha, John W. Wong, Ken Kang-Hsin Wang

    Abstract: CBCT-guided small animal irradiators encounter challenges in localizing soft-tissue targets due to low imaging contrast. Bioluminescence tomography (BLT) offers a promising solution, but BLT systems have largely remained in laboratory development, limiting accessibility for researchers. In this work, we develop a universal, commercial-grade BLT-guided system (MuriGlo) designed to seamlessly integrate…

    Submitted 27 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  25. arXiv:2406.11194  [pdf, other]

    cs.CL

    In-Context Editing: Learning Knowledge from Self-Induced Distributions

    Authors: Siyuan Qi, Bangcheng Yang, Kailin Jiang, Xiaobo Wang, Jiaqi Li, Yifan Zhong, Yaodong Yang, Zilong Zheng

    Abstract: The existing fine-tuning paradigm for language models is brittle in knowledge editing scenarios, where the model must incorporate new information without extensive retraining. This brittleness often results in overfitting, reduced performance, and unnatural language generation. To address this, we propose Consistent In-Context Editing (ICE), a novel approach that leverages the model's in-context l…

    Submitted 17 June, 2024; originally announced June 2024.

  26. arXiv:2406.10514  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis

    Authors: Zehua Kcriss Li, Meiying Melissa Chen, Yi Zhong, Pinxin Liu, Zhiyao Duan

    Abstract: Expressive speech synthesis aims to generate speech that captures a wide range of para-linguistic features, including emotion and articulation, though current research primarily emphasizes emotional aspects over the nuanced articulatory features mastered by professional voice actors. Inspired by this, we explore expressive speech synthesis through the lens of articulatory phonetics. Specifically,…

    Submitted 15 June, 2024; originally announced June 2024.

  27. arXiv:2406.10457  [pdf, other]

    quant-ph

    Noise-induced quantum synchronization and maximally entangled mixed states in superconducting circuits

    Authors: Ziyu Tao, Finn Schmolke, Chang-Kang Hu, Wenhui Huang, Yuxuan Zhou, Jiawei Zhang, Ji Chu, Libo Zhang, Xuandong Sun, Zecheng Guo, Jingjing Niu, Wenle Weng, Song Liu, Youpeng Zhong, Dian Tan, Dapeng Yu, Eric Lutz

    Abstract: Random fluctuations can lead to cooperative effects in complex systems. We here report the experimental observation of noise-induced quantum synchronization in a chain of superconducting transmon qubits with nearest-neighbor interactions. The application of Gaussian white noise to a single site leads to synchronous oscillations in the entire chain. We show that the two synchronized end qubits are…

    Submitted 14 June, 2024; originally announced June 2024.

  28. arXiv:2406.07490  [pdf, other]

    astro-ph.SR physics.plasm-ph

    Transition from decaying to decayless kink oscillations of solar coronal loops

    Authors: Valery M. Nakariakov, Yu Zhong, Dmitrii Y. Kolotkov

    Abstract: The transition of an impulsively excited kink oscillation of a solar coronal loop to an oscillation with a stationary amplitude, i.e., the damping pattern, is determined using the low-dimensional self-oscillation model. In the model, the decayless kink oscillations are sustained by the interaction of the oscillating loop with an external quasi-steady flow. The analytical solution is based on the a…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted for publication in MNRAS, 8 pages, 5 figures

  29. arXiv:2406.06858  [pdf, other]

    cs.LG cs.DC

    FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion

    Authors: Li-Wen Chang, Wenlei Bao, Qi Hou, Chengquan Jiang, Ningxin Zheng, Yinmin Zhong, Xuanrun Zhang, Zuquan Song, Ziheng Jiang, Haibin Lin, Xin Jin, Xin Liu

    Abstract: Large deep learning models have demonstrated strong ability to solve many tasks across a wide range of applications. Those large models typically require training and inference to be distributed. Tensor parallelism is a common technique partitioning computation of an operation or layer across devices to overcome the memory capacity limitation of a single processor, and/or to accelerate computation…

    Submitted 18 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  30. arXiv:2406.03075  [pdf, other]

    cs.CL

    Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework

    Authors: Xiaoxi Sun, Jinpeng Li, Yan Zhong, Dongyan Zhao, Rui Yan

    Abstract: The advent of large language models (LLMs) has facilitated the development of natural language text generation. It also poses unprecedented challenges, with content hallucination emerging as a significant concern. Existing solutions often involve expensive and complex interventions during the training process. Moreover, some approaches emphasize problem disassembly while neglecting the crucial val…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 18 pages, 3 figures

  31. arXiv:2406.00919  [pdf, other]

    cs.CV cs.MM

    Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-wise Pseudo Labeling

    Authors: Jinxing Zhou, Dan Guo, Yiran Zhong, Meng Wang

    Abstract: The Audio-Visual Video Parsing task aims to identify and temporally localize the events that occur in either or both the audio and visual streams of audible videos. It is often performed in a weakly-supervised manner, where only video event labels are provided, i.e., the modalities and the timestamps of the labels are unknown. Due to the lack of densely annotated labels, recent work attempts to leverag…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: IJCV 2024 Accepted. arXiv admin note: substantial text overlap with arXiv:2303.02344

  32. arXiv:2406.00491  [pdf, other]

    cs.NI

    Optimizing Age of Information in Random Access Networks: A Second-Order Approach for Active/Passive Users

    Authors: Siqi Fan, Yuxin Zhong, I-Hong Hou, Clement K Kam

    Abstract: In this paper, we study the moments of the Age of Information (AoI) for both active and passive users in a random access network. In this network, active users broadcast sensing data, while passive users detect in-band radio activities from out-of-network devices, such as jammers. Collisions occur when multiple active users transmit simultaneously. Passive users can detect radio activities only wh…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Transactions on Communications. arXiv admin note: text overlap with arXiv:2305.05137

  33. arXiv:2405.21022  [pdf, other]

    cs.CL cs.CV

    You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet

    Authors: Zhen Qin, Yuxin Mao, Xuyang Shen, Dong Li, Jing Zhang, Yuchao Dai, Yiran Zhong

    Abstract: Linear attention mechanisms have gained prominence in causal language models due to their linear computational complexity and enhanced speed. However, the inherent decay mechanism in linear attention presents challenges when applied to multi-dimensional sequence modeling tasks, such as image processing and multi-modal learning. In these scenarios, the utilization of sequential scanning to establis…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Technical report. Yiran Zhong is the corresponding author. The code is available at https://github.com/OpenNLPLab/LightNet

  34. arXiv:2405.17383  [pdf, other]

    cs.CL

    Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective

    Authors: Zhen Qin, Xuyang Shen, Dong Li, Weigao Sun, Stan Birchfield, Richard Hartley, Yiran Zhong

    Abstract: We present the Linear Complexity Sequence Model (LCSM), a comprehensive solution that unites various sequence modeling techniques with linear complexity, including linear attention, state space model, long convolution, and linear RNN, within a single framework. The goal is to enhance comprehension of these models by analyzing the impact of each component from a cohesive and streamlined viewpoint.…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Technical report. Yiran Zhong is the corresponding author

  35. arXiv:2405.17381  [pdf, other]

    cs.CL

    Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention

    Authors: Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong

    Abstract: We present Lightning Attention, the first linear attention implementation that maintains a constant training speed for various sequence lengths under fixed memory consumption. Due to the issue with cumulative summation operations (cumsum), previous linear attention implementations cannot achieve their theoretical advantage in a causal setting. However, this issue can be effectively solved by utili…

    Submitted 20 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024. Yiran Zhong is the corresponding author. Code is released at github.com/OpenNLPLab/TransnormerLLM

  36. arXiv:2405.15542  [pdf, other]

    cs.NI cs.DC cs.LG eess.SP

    SATSense: Multi-Satellite Collaborative Framework for Spectrum Sensing

    Authors: Haoxuan Yuan, Zhe Chen, Zheng Lin, Jinbo Peng, Zihan Fang, Yuhang Zhong, Zihang Song, Yue Gao

    Abstract: Low Earth Orbit satellite Internet has recently been deployed, providing worldwide service with non-terrestrial networks. With the large-scale deployment of both non-terrestrial and terrestrial networks, the limited spectrum resources will not be sufficient for both. Consequently, dynamic spectrum sharing is crucial for their coexistence in the same spectrum, where accurate spectrum sensing is essential.…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 13 pages, 16 figures

  37. arXiv:2405.14582  [pdf, other]

    cs.CV cs.AI

    PoseCrafter: One-Shot Personalized Video Synthesis Following Flexible Pose Control

    Authors: Yong Zhong, Min Zhao, Zebin You, Xiaofeng Yu, Changwang Zhang, Chongxuan Li

    Abstract: In this paper, we introduce PoseCrafter, a one-shot method for personalized video generation following the control of flexible poses. Built upon Stable Diffusion and ControlNet, we carefully design an inference process to produce high-quality videos without the corresponding ground-truth frames. First, we select an appropriate reference frame from the training video and invert it to initialize all…

    Submitted 24 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  38. arXiv:2405.13967  [pdf, other]

    cs.CL

    DeTox: Toxic Subspace Projection for Model Editing

    Authors: Rheeya Uppaal, Apratim Dey, Yiting He, Yiqiao Zhong, Junjie Hu

    Abstract: Recent alignment algorithms such as direct preference optimization (DPO) have been developed to improve the safety of large language models (LLMs) by training these models to match human behaviors exemplified by preference data. However, these methods are both computationally intensive and lacking in controllability and transparency, making them prone to jailbreaking and inhibiting their widesprea…

    Submitted 28 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: Preprint

  39. arXiv:2405.11349  [pdf, other]

    cs.LG

    Unlock the Power of Algorithm Features: A Generalization Analysis for Algorithm Selection

    Authors: Xingyu Wu, Yan Zhong, Jibin Wu, Yuxiao Huang, Sheng-hao Wu, Kay Chen Tan

    Abstract: In algorithm selection research, the discussion surrounding algorithm features has been significantly overshadowed by the emphasis on problem features. Although a few empirical studies have yielded evidence regarding the effectiveness of algorithm features, the potential benefits of incorporating algorithm features into algorithm selection models and their suitability for different scenarios r…

    Submitted 3 June, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

  40. arXiv:2405.05968  [pdf, other]

    cs.LG stat.ML

    A Universal Growth Rate for Learning with Smooth Surrogate Losses

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: This paper presents a comprehensive analysis of the growth rate of $H$-consistency bounds (and excess error bounds) for various surrogate losses used in classification. We prove a square-root growth rate near zero for smooth margin-based surrogate losses in binary classification, providing both upper and lower bounds under mild assumptions. This result also translates to excess error bounds. Our l…

    Submitted 8 July, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  41. arXiv:2405.03091  [pdf]

    cs.CV cs.LG

    Research on Image Recognition Technology Based on Multimodal Deep Learning

    Authors: Jinyin Wang, Xingchen Li, Yixuan Jin, Yihao Zhong, Keke Zhang, Chang Zhou

    Abstract: This project investigates the human multi-modal behavior identification algorithm utilizing deep neural networks. According to the characteristics of different modal information, different deep neural networks are used to adapt to different modal video information. Through the integration of various deep neural networks, the algorithm successfully identifies behaviors across multiple modalities. I…

    Submitted 5 May, 2024; originally announced May 2024.

  42. arXiv:2405.03025  [pdf, other]

    cs.CV

    Matten: Video Generation with Mamba-Attention

    Authors: Yu Gao, Jiancheng Huang, Xiaopeng Sun, Zequn Jie, Yujie Zhong, Lin Ma

    Abstract: In this paper, we introduce Matten, a cutting-edge latent diffusion model with Mamba-Attention architecture for video generation. With minimal computational cost, Matten employs spatial-temporal attention for local video content modeling and bidirectional Mamba for global video content modeling. Our comprehensive experimental evaluation demonstrates that Matten has competitive performance with the…

    Submitted 10 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

  43. arXiv:2405.02660  [pdf, other]

    cs.IT eess.SP

    AFDM Channel Estimation in Multi-Scale Multi-Lag Channels

    Authors: Rongyou Cao, Yuheng Zhong, Jiangbin Lyu, Deqing Wang, Liqun Fu

    Abstract: Affine Frequency Division Multiplexing (AFDM) is a brand new chirp-based multi-carrier (MC) waveform for high mobility communications, with promising advantages over Orthogonal Frequency Division Multiplexing (OFDM) and other MC waveforms. Existing AFDM research focuses on wireless communication at high carrier frequency (CF), which typically considers only Doppler frequency shift (DFS) as a resul…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 6 pages, 6 figures. Investigate AFDM under underwater multi-scale multi-lag channels. Derive the new input-output formula with the impact of Doppler time scaling. Propose two new channel estimation methods to tackle different levels of Doppler factors. Perform diversity analysis based on CFR overlap probability (COP) and mutual incoherent property (MIP)

  44. arXiv:2405.01312  [pdf, other]

    cs.DB cs.CR

    Privacy-Enhanced Database Synthesis for Benchmark Publishing

    Authors: Yongrui Zhong, Yunqing Ge, Jianbin Qin, Shuyuan Zheng, Bo Tang, Yu-Xuan Qiu, Rui Mao, Ye Yuan, Makoto Onizuka, Chuan Xiao

    Abstract: Benchmarking is crucial for evaluating a DBMS, yet existing benchmarks often fail to reflect the varied nature of user workloads. As a result, there is increasing momentum toward creating databases that incorporate real-world user data to more accurately mirror business environments. However, privacy concerns deter users from directly sharing their data, underscoring the importance of creating syn…

    Submitted 2 May, 2024; originally announced May 2024.

  45. arXiv:2404.18472  [pdf]

    cond-mat.supr-con cond-mat.mtrl-sci

    Direct observation of anisotropic Cooper pairing in kagome superconductor CsV3Sb5

    Authors: Akifumi Mine, Yigui Zhong, Jinjin Liu, Takeshi Suzuki, Sahand Najafzadeh, Takumi Uchiyama, Jia-Xin Yin, Xianxin Wu, Xun Shi, Zhiwei Wang, Yugui Yao, Kozo Okazaki

    Abstract: In the recently discovered kagome superconductor AV3Sb5 (A = K, Rb, and Cs), the superconductivity is intertwined with an unconventional charge density wave order. Its pairing symmetry remains elusive owing to the lack of direct measurement of the superconducting gap in the momentum space. In this letter, utilizing laser-based ultra-high-resolution and low-temperature angle-resolved photoemission…

    Submitted 29 April, 2024; originally announced April 2024.

  46. arXiv:2404.18255  [pdf, other]

    cs.CL cs.AI

    PatentGPT: A Large Language Model for Intellectual Property

    Authors: Zilong Bai, Ruiji Zhang, Linqing Chen, Qijun Cai, Yuan Zhong, Cong Wang, Yan Fang, Jie Fang, Jing Sun, Weikuan Wang, Lizhi Zhou, Haoran Hua, Tian Qiu, Chaochao Wang, Cheng Sun, Jianping Lu, Yixin Wang, Yubin Xia, Meng Hu, Haowen Liu, Peng Xu, Licong Xu, Fu Bian, Xiaolong Gu, Lisha Zhang, et al. (2 additional authors not shown)

    Abstract: In recent years, large language models (LLMs) have attracted significant attention due to their exceptional performance across a multitude of natural language processing tasks, and have been widely applied in various fields. However, the application of large language models in the Intellectual Property (IP) domain is challenging due to the strong need for specialized knowledge, privacy protection, pro…

    Submitted 4 June, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: 19 pages, 9 figures

    ACM Class: I.2.7

  47. arXiv:2404.17013  [pdf, ps, other]

    cs.CC math.CO

    Two-Source and Affine Non-Malleable Extractors for Small Entropy

    Authors: Xin Li, Yan Zhong

    Abstract: Non-malleable extractors are generalizations and strengthenings of standard randomness extractors that are resilient to adversarial tampering. Such extractors have wide applications in cryptography and explicit construction of extractors. In the well-studied models of two-source and affine non-malleable extractors, the previous best constructions only work for entropy rate $>2/3$ and $1-γ$ respect…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in ICALP 24. Abstract shortened due to arXiv requirement

  48. arXiv:2404.14681  [pdf]

    physics.app-ph

    Physical Vapor Deposition of High Mobility P-type Tellurium and its Applications for Gate-tunable van der Waals PN Photodiodes

    Authors: Tianyi Huang, Sen Lin, Jingyi Zou, Zexiao Wang, Yibai Zhong, Jingwei Li, Ruixuan Wang, Han Wang, Qing Li, Min Xu, Sheng Shen, Xu Zhang

    Abstract: Recently, tellurium (Te) has attracted resurgent interest due to its p-type characteristics and outstanding ambient environmental stability. Here we present a substrate-engineering-based physical vapor deposition method to synthesize high-quality Te nanoflakes and achieve a field-effect hole mobility of 1500 cm²/Vs, which is, to the best of our knowledge, the highest among the existing synthesize…

    Submitted 22 April, 2024; originally announced April 2024.

  49. arXiv:2404.14381  [pdf, other]

    cs.CV cs.MM

    TAVGBench: Benchmarking Text to Audible-Video Generation

    Authors: Yuxin Mao, Xuyang Shen, Jing Zhang, Zhen Qin, Jinxing Zhou, Mochu Xiang, Yiran Zhong, Yuchao Dai

    Abstract: The Text to Audible-Video Generation (TAVG) task involves generating videos with accompanying audio based on text descriptions. Achieving this requires skillful alignment of both audio and video elements. To support research in this field, we have developed a comprehensive Text to Audible-Video Generation Benchmark (TAVGBench), which contains over 1.7 million clips with a total duration of 11.8 th…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Technical Report. Project page: https://github.com/OpenNLPLab/TAVGBench

  50. arXiv:2404.13929  [pdf, other]

    eess.IV cs.CV

    Exploring Kinetic Curves Features for the Classification of Benign and Malignant Breast Lesions in DCE-MRI

    Authors: Zixian Li, Yuming Zhong, Yi Wang

    Abstract: Breast cancer is the most common malignant tumor among women and the second leading cause of cancer-related death. Early diagnosis in clinical practice is crucial for timely treatment and prognosis. Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) has shown great utility in preoperative diagnosis and in assessing therapy effects, thanks to its capability to reflect the morphology and dy…

    Submitted 10 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 6 pages, 8 figures, conference