Showing 1–50 of 482 results for author: Chang, X

  1. arXiv:2407.08126  [pdf, other]

    cs.AI cs.CV cs.MM

    Label-anticipated Event Disentanglement for Audio-Visual Video Parsing

    Authors: Jinxing Zhou, Dan Guo, Yuxin Mao, Yiran Zhong, Xiaojun Chang, Meng Wang

    Abstract: The Audio-Visual Video Parsing (AVVP) task aims to detect and temporally locate events within audio and visual modalities. Multiple events can overlap in the timeline, making identification challenging. While traditional methods usually focus on improving the early audio-visual encoders to embed more effective features, the decoding phase, crucial for final event classification, often receives less…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  2. arXiv:2407.00837  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Towards Robust Speech Representation Learning for Thousands of Languages

    Authors: William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, Shinji Watanabe

    Abstract: Self-supervised learning (SSL) has helped extend speech technologies to more languages by reducing the need for labeled data. However, models are still far from supporting the world's 7000+ languages. We propose XEUS, a Cross-lingual Encoder for Universal Speech, trained on over 1 million hours of data across 4057 languages, extending the language coverage of SSL models 4-fold. We combine 1 millio…

    Submitted 2 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

    Comments: Updated affiliations; 20 pages

  3. arXiv:2406.12723  [pdf, other]

    cs.LG

    BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity

    Authors: Zahra Gharaee, Scott C. Lowe, ZeMing Gong, Pablo Millan Arias, Nicholas Pellegrino, Austin T. Wang, Joakim Bruslund Haurum, Iuliia Zarubiieva, Lila Kari, Dirk Steinke, Graham W. Taylor, Paul Fieguth, Angel X. Chang

    Abstract: As part of an ongoing worldwide effort to comprehend and monitor insect biodiversity, this paper presents the BIOSCAN-5M Insect dataset to the machine learning community and establishes several benchmark tasks. BIOSCAN-5M is a comprehensive dataset containing multi-modal information for over 5 million insect specimens, and it significantly expands existing image-based biological datasets by includin…

    Submitted 24 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  4. arXiv:2406.11579  [pdf, other]

    cs.CV

    Duoduo CLIP: Efficient 3D Understanding with Multi-View Images

    Authors: Han-Hung Lee, Yiming Zhang, Angel X. Chang

    Abstract: We introduce Duoduo CLIP, a model for 3D representation learning that learns shape encodings from multi-view images instead of point clouds. The choice of multi-view images allows us to leverage 2D priors from off-the-shelf CLIP models to facilitate fine-tuning with 3D data. Our approach not only shows better generalization compared to existing point cloud methods, but also reduces GPU requirement…

    Submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2406.09687  [pdf]

    cond-mat.mes-hall cond-mat.str-el

    Interplay between topology and correlations in the second moiré band of twisted bilayer MoTe2

    Authors: Fan Xu, Xumin Chang, Jiayong Xiao, Yixin Zhang, Feng Liu, Zheng Sun, Ning Mao, Nikolai Peshcherenko, Jiayi Li, Kenji Watanabe, Takashi Taniguchi, Bingbing Tong, Li Lu, Jinfeng Jia, Dong Qian, Zhiwen Shi, Yang Zhang, Xiaoxue Liu, Shengwei Jiang, Tingxin Li

    Abstract: Topological flat bands formed in two-dimensional lattice systems offer a unique opportunity to study the fractional phases of matter in the absence of an external magnetic field. Celebrated examples include fractional quantum anomalous Hall (FQAH) effects and fractional topological insulators. Recently, FQAH effects have been experimentally realized in both the twisted bilayer MoTe2 (tMoTe2) system…

    Submitted 13 June, 2024; originally announced June 2024.

  6. arXiv:2406.08641  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets

    Authors: Jiatong Shi, Shih-Heng Wang, William Chen, Martijn Bartelds, Vanya Bannihatti Kumar, Jinchuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-yi Lee, Shinji Watanabe

    Abstract: ML-SUPERB evaluates self-supervised learning (SSL) models on the tasks of language identification and automatic speech recognition (ASR). This benchmark treats the models as feature extractors and uses a single shallow downstream model, which can be fine-tuned for a downstream task. However, real-world use cases may require different configurations. This paper presents ML-SUPERB 2.0, which is a ne…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  7. arXiv:2406.07725  [pdf, ps, other]

    cs.SD eess.AS

    The Interspeech 2024 Challenge on Speech Processing Using Discrete Units

    Authors: Xuankai Chang, Jiatong Shi, Jinchuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, Qin Jin

    Abstract: Representing speech and audio signals in discrete units has become a compelling alternative to traditional high-dimensional feature vectors. Numerous studies have highlighted the efficacy of discrete units in various applications such as speech compression and restoration, speech recognition, and speech generation. To foster exploration in this domain, we introduce the Interspeech 2024 Challenge,…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: This manuscript has been accepted by Interspeech2024

  8. arXiv:2406.06999  [pdf, other]

    cs.CV

    Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection

    Authors: Junfei Yi, Jianxu Mao, Tengfei Liu, Mingjie Li, Hanyu Gu, Hui Zhang, Xiaojun Chang, Yaonan Wang

    Abstract: Knowledge distillation (KD) is a widely adopted and effective method for compressing models in object detection tasks. Particularly, feature-based distillation methods have shown remarkable performance. Existing approaches often ignore the uncertainty in the teacher model's knowledge, which stems from data noise and imperfect training. This limits the student model's ability to learn latent knowle…

    Submitted 11 June, 2024; originally announced June 2024.

  9. arXiv:2406.02990  [pdf, other]

    cs.CV

    Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification

    Authors: Gexin Huang, Chenfei Wu, Mingjie Li, Xiaojun Chang, Ling Chen, Ying Sun, Shen Zhao, Xiaodan Liang, Liang Lin

    Abstract: Predicting genetic mutations from whole slide images is indispensable for cancer diagnosis. However, existing work training multiple binary classification models faces two challenges: (a) Training multiple binary classifiers is inefficient and would inevitably lead to a class imbalance problem. (b) The biological relationships among genes are overlooked, which limits the prediction performance. To…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 16 pages, 8 figures, and 3 tables

  10. arXiv:2405.17761  [pdf, other]

    cs.LG math.OC

    Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems without First-Order Gradient

    Authors: Hao Di, Haishan Ye, Yueling Zhang, Xiangyu Chang, Guang Dai, Ivor W. Tsang

    Abstract: Variance reduction techniques are designed to decrease the sampling variance, thereby accelerating convergence rates of first-order (FO) and zeroth-order (ZO) optimization methods. However, in composite optimization problems, ZO methods encounter an additional variance called the coordinate-wise variance, which stems from the random gradient estimation. To reduce this variance, prior works require…

    Submitted 27 May, 2024; originally announced May 2024.

  11. arXiv:2405.17537  [pdf, other]

    cs.AI cs.CL cs.CV

    BIOSCAN-CLIP: Bridging Vision and Genomics for Biodiversity Monitoring at Scale

    Authors: ZeMing Gong, Austin T. Wang, Joakim Bruslund Haurum, Scott C. Lowe, Graham W. Taylor, Angel X. Chang

    Abstract: Measuring biodiversity is crucial for understanding ecosystem health. While prior works have developed machine learning models for the taxonomic classification of photographic images and DNA separately, in this work, we introduce a multimodal approach combining both, using CLIP-style contrastive learning to align images, DNA barcodes, and textual data in a unified embedding space. This allows for…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 16 pages with 9 figures

  12. arXiv:2405.13369  [pdf, other]

    quant-ph

    Realization of a crosstalk-free multi-ion node for long-distance quantum networking

    Authors: P. -C. Lai, Y. Wang, J. -X. Shi, Z. -B. Cui, Z. -Q. Wang, S. Zhang, P. -Y. Liu, Z. -C. Tian, Y. -D. Sun, X. -Y. Chang, B. -X. Qi, Y. -Y. Huang, Z. -C. Zhou, Y. -K. Wu, Y. Xu, Y. -F. Pu, L. -M. Duan

    Abstract: Trapped atomic ions constitute one of the leading physical platforms for building the quantum repeater nodes to realize large-scale quantum networks. In a long-distance trapped-ion quantum network, it is essential to have crosstalk-free dual-type qubits: one type, called the communication qubit, to establish an entangling interface with telecom photons; and the other type, called the memory qubit, to…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 12 pages, 12 figures

  13. arXiv:2405.11429  [pdf, ps, other]

    math.AG

    Severi Varieties on Ruled Surfaces over Elliptic Curves

    Authors: Xiaotian Chang, Xi Chen, Adrian Zahariuc

    Abstract: We proved that the general members of Severi varieties on an Atiyah ruled surface over a general elliptic curve have nodes and ordinary triple points as singularities.

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 25 pages

    MSC Class: 14C25; 14C30; 14C35

  14. arXiv:2405.10255  [pdf, other]

    cs.CV cs.RO

    When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

    Authors: Xianzheng Ma, Yash Bhalgat, Brandon Smart, Shuai Chen, Xinghui Li, Jian Ding, Jindong Gu, Dave Zhenyu Chen, Songyou Peng, Jia-Wang Bian, Philip H Torr, Marc Pollefeys, Matthias Nießner, Ian D Reid, Angel X. Chang, Iro Laina, Victor Adrian Prisacariu

    Abstract: As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and interacting with physical spaces. This survey provides a comprehensive overview of the methodologies enabling LLMs to process, understand, and generate 3D data. Highlighting the unique advantages of LLMs, such as in-context lear…

    Submitted 16 May, 2024; originally announced May 2024.

  15. arXiv:2405.06747  [pdf, other]

    cs.SD cs.LG eess.AS

    Music Emotion Prediction Using Recurrent Neural Networks

    Authors: Xinyu Chang, Xiangyu Zhang, Haoruo Zhang, Yulu Ran

    Abstract: This study explores the application of recurrent neural networks to recognize emotions conveyed in music, aiming to enhance music recommendation systems and support therapeutic interventions by tailoring music to fit listeners' emotional states. We utilize Russell's Emotion Quadrant to categorize music into four distinct emotional regions and develop models capable of accurately predicting these c…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: 15 pages, 13 figures

  16. arXiv:2405.05481  [pdf, other]

    quant-ph

    Achieving millisecond coherence fluxonium through overlap Josephson junctions

    Authors: Fei Wang, Kannan Lu, Huijuan Zhan, Lu Ma, Feng Wu, Hantao Sun, Hao Deng, Yang Bai, Feng Bao, Xu Chang, Ran Gao, Xun Gao, Guicheng Gong, Lijuan Hu, Ruizi Hu, Honghong Ji, Xizheng Ma, Liyong Mao, Zhijun Song, Chengchun Tang, Hongcheng Wang, Tenghui Wang, Ziang Wang, Tian Xia, Hongxin Xu, et al. (10 additional authors not shown)

    Abstract: Fluxonium qubits are recognized for their high coherence times and high operation fidelities, attributed to their unique design incorporating over 100 Josephson junctions per superconducting loop. However, this complexity poses significant fabrication challenges, particularly in achieving high yield and junction uniformity with traditional methods. Here, we introduce an overlap process for Josephs…

    Submitted 8 May, 2024; originally announced May 2024.

  17. arXiv:2405.05010  [pdf, other]

    cs.CV

    ${M^2D}$NeRF: Multi-Modal Decomposition NeRF with 3D Feature Fields

    Authors: Ning Wang, Lefei Zhang, Angel X Chang

    Abstract: Neural fields (NeRF) have emerged as a promising approach for representing continuous 3D scenes. Nevertheless, the lack of semantic encoding in NeRFs poses a significant challenge for scene decomposition. To address this challenge, we present a single model, Multi-Modal Decomposition NeRF (${M^2D}$NeRF), that is capable of both text-based and visual patch-based edits. Specifically, we use multi-mo…

    Submitted 8 May, 2024; originally announced May 2024.

  18. arXiv:2405.04466  [pdf, other]

    physics.flu-dyn math-ph physics.comp-ph

    A fully differentiable GNN-based PDE Solver: With Applications to Poisson and Navier-Stokes Equations

    Authors: Tianyu Li, Yiye Zou, Shufan Zou, Xinghua Chang, Laiping Zhang, Xiaogang Deng

    Abstract: In this study, we present a novel computational framework that integrates the finite volume method with graph neural networks to address the challenges in Physics-Informed Neural Networks (PINNs). Our approach leverages the flexibility of graph neural networks to adapt to various types of two-dimensional unstructured grids, enhancing the model's applicability across different physical equations and…

    Submitted 7 May, 2024; originally announced May 2024.

  19. arXiv:2405.04046  [pdf]

    cs.CR

    MBCT: A Monero-Based Covert Transmission Approach with On-chain Dynamic Session Key Negotiation

    Authors: Zhenshuai Yue, Haoran Zhu, Xiaolin Chang, Jelena Mišić, Vojislav B. Mišić, Junchao Fan

    Abstract: Traditional covert transmission (CT) approaches have been hindering CT application, while blockchain technology offers a new avenue. Current blockchain-based CT approaches require off-chain negotiation of critical information and often overlook dynamic session key updating, which increases the risk of message and key leakage. Additionally, in some approaches the covert transactions exhibit obvio…

    Submitted 7 May, 2024; originally announced May 2024.

  20. arXiv:2405.01030  [pdf]

    cs.CR cs.SE

    Towards Trust Proof for Secure Confidential Virtual Machines

    Authors: Jingkai Mao, Haoran Zhu, Junchao Fan, Lin Li, Xiaolin Chang

    Abstract: The Virtual Machine (VM)-based Trusted-Execution-Environment (TEE) technology, like AMD Secure-Encrypted-Virtualization (SEV), enables the establishment of Confidential VMs (CVMs) to protect data privacy. However, a CVM lacks a way to provide trust proof of its running state, which degrades user confidence in using CVMs. The technology of virtual Trusted Platform Module (vTPM) can be used to generate tr…

    Submitted 2 May, 2024; originally announced May 2024.

  21. arXiv:2404.15189  [pdf, other]

    cs.AI

    Text2Grasp: Grasp synthesis by text prompts of object grasping parts

    Authors: Xiaoyun Chang, Yi Sun

    Abstract: The hand plays a pivotal role in the human ability to grasp and manipulate objects, and controllable grasp synthesis is key to successfully performing downstream tasks. Existing methods that use human intention or task-level language as control signals for grasping inherently face ambiguity. To address this challenge, we propose a grasp synthesis method guided by text prompts of object grasping pa…

    Submitted 9 April, 2024; originally announced April 2024.

  22. arXiv:2404.14886  [pdf, other]

    cs.LG

    GCEPNet: Graph Convolution-Enhanced Expectation Propagation for Massive MIMO Detection

    Authors: Qincheng Lu, Sitao Luan, Xiao-Wen Chang

    Abstract: Massive MIMO (multiple-input multiple-output) detection is an important topic in wireless communication and various machine learning based methods have been developed recently for this task. Expectation propagation (EP) and its variants are widely used for MIMO detection and have achieved the best performance. However, EP-based solvers fail to capture the correlation between unknown variables, lea…

    Submitted 23 April, 2024; originally announced April 2024.

  23. arXiv:2404.14125  [pdf, ps, other]

    math.GR math.RT

    Weights for $π$-partial characters of $π$-separable groups

    Authors: Xuewu Chang, Ping Jin

    Abstract: The aim of this paper is to confirm an inequality predicted by Isaacs and Navarro in 1995, which asserts that for any $π'$-subgroup $Q$ of a $π$-separable group $G$, the number of $π'$-weights of $G$ with $Q$ as the first component always exceeds that of irreducible $π$-partial characters of $G$ with $Q$ as their vertex. We also give some sufficient conditions to guarantee that these two numbers ar…

    Submitted 22 April, 2024; originally announced April 2024.

    MSC Class: 20C15; 20C20

  24. arXiv:2404.09385  [pdf, other]

    eess.AS cs.CL eess.SP

    A Large-Scale Evaluation of Speech Foundation Models

    Authors: Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee

    Abstract: The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. In this work,…

    Submitted 29 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: The extended journal version for SUPERB and SUPERB-SG. Published in IEEE/ACM TASLP. The arXiv version is preferred

  25. arXiv:2404.07847  [pdf, other]

    cs.CV

    The Effectiveness of a Simplified Model Structure for Crowd Counting

    Authors: Lei Chen, Xinghang Gao, Fei Chao, Xiang Chang, Chih Min Lin, Xingen Gao, Shaopeng Lin, Hongyi Zhang, Juqiang Lin

    Abstract: In the field of crowd counting research, many recent deep learning based methods have demonstrated robust capabilities for accurately estimating crowd sizes. However, the enhancement in their performance often arises from an increase in the complexity of the model structure. This paper discusses how to construct high-performance crowd counting models using only simple structures. We propose the F…

    Submitted 18 June, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  26. arXiv:2404.05657  [pdf, other]

    cs.CV

    MLP Can Be A Good Transformer Learner

    Authors: Sihao Lin, Pumeng Lyu, Dongrui Liu, Tao Tang, Xiaodan Liang, Andy Song, Xiaojun Chang

    Abstract: The self-attention mechanism is key to the Transformer but is often criticized for its computational demands. Previous token pruning works motivate their methods from the view of computation redundancy but still need to load the full network and incur the same memory costs. This paper introduces a novel strategy that simplifies vision transformers and reduces computational load through the selective remo…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: efficient transformer

  27. arXiv:2404.03384  [pdf, other]

    cs.CV

    LongVLM: Efficient Long Video Understanding via Large Language Models

    Authors: Yuetian Weng, Mingfei Han, Haoyu He, Xiaojun Chang, Bohan Zhuang

    Abstract: Empowered by Large Language Models (LLMs), recent advancements in VideoLLMs have driven progress in various video understanding tasks. These models encode video representations through pooling or query aggregation over a vast number of visual tokens, making computational and memory costs affordable. Despite successfully providing an overall comprehension of video content, existing VideoLLMs still…

    Submitted 10 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  28. arXiv:2403.19207  [pdf, other]

    eess.AS

    LV-CTC: Non-autoregressive ASR with CTC and latent variable models

    Authors: Yuya Fujita, Shinji Watanabe, Xuankai Chang, Takashi Maekaku

    Abstract: Non-autoregressive (NAR) models for automatic speech recognition (ASR) aim to achieve high accuracy and fast inference by simplifying the autoregressive (AR) generation process of conventional models. Connectionist temporal classification (CTC) is one of the key techniques used in NAR ASR models. In this paper, we propose a new model combining CTC and a latent variable model, which is one of the s…

    Submitted 28 March, 2024; originally announced March 2024.

  29. arXiv:2403.16734  [pdf, other]

    math.OC

    Anderson Acceleration Without Restart: A Novel Method with $n$-Step Super Quadratic Convergence Rate

    Authors: Haishan Ye, Dachao Lin, Xiangyu Chang, Zhihua Zhang

    Abstract: In this paper, we propose a novel Anderson's acceleration method to solve nonlinear equations, which does not require a restart strategy to achieve numerical stability. We propose the greedy and random versions of our algorithm. Specifically, the greedy version selects the direction to maximize a certain measure of progress for approximating the current Jacobian matrix. In contrast, the ran…

    Submitted 25 March, 2024; originally announced March 2024.

  30. arXiv:2403.16116  [pdf, other]

    cs.CV cs.AI cs.LG

    Self-Supervised Multi-Frame Neural Scene Flow

    Authors: Dongrui Liu, Daqi Liu, Xueqian Li, Sihao Lin, Hongwei xie, Bing Wang, Xiaojun Chang, Lei Chu

    Abstract: Neural Scene Flow Prior (NSFP) and Fast Neural Scene Flow (FNSF) have shown remarkable adaptability in the context of large out-of-distribution autonomous driving. Despite their success, the underlying reasons for their astonishing generalization capabilities remain unclear. Our research addresses this gap by examining the generalization capabilities of NSFP through the lens of uniform stability,…

    Submitted 24 March, 2024; originally announced March 2024.

  31. arXiv:2403.16097  [pdf, other]

    cs.AI cs.LO cs.SE

    Can Language Models Pretend Solvers? Logic Code Simulation with LLMs

    Authors: Minyu Chen, Guoqiang Li, Ling-I Wu, Ruibang Liu, Yuxin Su, Xi Chang, Jianxin Xue

    Abstract: Transformer-based large language models (LLMs) have demonstrated significant potential in addressing logic problems. Capitalizing on the great capabilities of LLMs for code-related activities, several frameworks leveraging logical solvers for logic reasoning have been proposed recently. While existing research predominantly focuses on viewing LLMs as natural language logic solvers or translators,…

    Submitted 28 March, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

    Comments: 12 pages, 8 figures

  32. arXiv:2403.14174  [pdf, other]

    cs.CV

    Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding

    Authors: Jingjing Hu, Dan Guo, Kun Li, Zhan Si, Xun Yang, Xiaojun Chang, Meng Wang

    Abstract: Inspired by the activity-silent and persistent activity mechanisms in human visual perception biology, we design a Unified Static and Dynamic Network (UniSDNet), to learn the semantic association between the video and text/audio queries in a cross-modal environment for efficient video grounding. For static modeling, we devise a novel residual structure (ResMLP) to boost the global comprehensive in…

    Submitted 21 March, 2024; originally announced March 2024.

  33. arXiv:2403.13289  [pdf, other]

    cs.CV

    Text-to-3D Shape Generation

    Authors: Han-Hung Lee, Manolis Savva, Angel X. Chang

    Abstract: Recent years have seen an explosion of work and interest in text-to-3D shape generation. Much of the progress is driven by advances in 3D representations, large-scale pretraining and representation learning for text and image data enabling generative AI models, and differentiable rendering. Computational systems that can perform text-to-3D shape generation have captivated the popular imagination a…

    Submitted 20 March, 2024; originally announced March 2024.

  34. arXiv:2403.12347  [pdf]

    physics.optics

    Octave-wide broadening of ultraviolet dispersive wave driven by soliton-splitting dynamics

    Authors: Tiandao Chen, Jinyu Pan, Zhiyuan Huang, Yue Yu, Donghan Liu, Xinshuo Chang, Zhengzheng Liu, Wenbin He, Xin Jiang, Meng Pang, Yuxin Leng, Ruxin Li

    Abstract: Coherent dispersive wave emission, as an important phenomenon of soliton dynamics, manifests itself in multiple platforms of nonlinear optics from fibre waveguides to integrated photonics. Limited by its resonance nature, efficient generation of coherent dispersive wave with ultra-broad bandwidth has, however, proved difficult to realize. Here, we unveil a new regime of soliton dynamics in which t…

    Submitted 18 March, 2024; originally announced March 2024.

  35. arXiv:2403.12301  [pdf, other]

    cs.CV

    R3DS: Reality-linked 3D Scenes for Panoramic Scene Understanding

    Authors: Qirui Wu, Sonia Raychaudhuri, Daniel Ritchie, Manolis Savva, Angel X Chang

    Abstract: We introduce the Reality-linked 3D Scenes (R3DS) dataset of synthetic 3D scenes mirroring the real-world scene arrangements from Matterport3D panoramas. Compared to prior work, R3DS has more complete and densely populated scenes with objects linked to real-world observations in panoramas. R3DS also provides an object support hierarchy, and matching object sets (e.g., same chairs around a dining ta…

    Submitted 18 March, 2024; originally announced March 2024.

  36. arXiv:2403.11519  [pdf, other]

    cs.CR

    Efficient and Privacy-Preserving Federated Learning based on Full Homomorphic Encryption

    Authors: Yuqi Guo, Lin Li, Zhongxiang Zheng, Hanrui Yun, Ruoyan Zhang, Xiaolin Chang, Zhixuan Gao

    Abstract: Since the first theoretically feasible full homomorphic encryption (FHE) scheme was proposed in 2009, great progress has been achieved. These improvements have made FHE schemes come off the paper and become quite useful in solving some practical problems. In this paper, we propose a set of novel Federated Learning Schemes by utilizing the latest homomorphic encryption technologies, so as to improv…

    Submitted 18 March, 2024; originally announced March 2024.

  37. arXiv:2403.08568  [pdf, other]

    cs.CV cs.LG

    Consistent Prompting for Rehearsal-Free Continual Learning

    Authors: Zhanxin Gao, Jun Cen, Xiaobin Chang

    Abstract: Continual learning empowers models to adapt autonomously to the ever-changing environment or data streams without forgetting old knowledge. Prompt-based approaches are built on frozen pre-trained models to learn the task-specific prompts and classifiers efficiently. Existing prompt-based methods are inconsistent between training and testing, limiting their effectiveness. Two types of inconsistency…

    Submitted 14 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  38. arXiv:2403.07376  [pdf, other]

    cs.CV cs.AI cs.CL cs.RO

    NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning

    Authors: Bingqian Lin, Yunshuang Nie, Ziming Wei, Jiaqi Chen, Shikui Ma, Jianhua Han, Hang Xu, Xiaojun Chang, Xiaodan Liang

    Abstract: Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions. Recent research has highlighted the promising capacity of large language models (LLMs) in VLN by improving navigational reasoning accuracy and interpretability. However, their predominant use in an offlin…

    Submitted 12 March, 2024; originally announced March 2024.

  39. arXiv:2403.06384  [pdf, other]

    physics.atom-ph

    Precision Spectroscopy and Nuclear Structure Parameters in 7Li+ ion

    Authors: Hua Guan, Xiao-Qiu Qi, Peng-Peng Zhou, Wei Sun, Shao-Long Chen, Xu-Rui Chang, Yao Huang, Pei-Pei Zhang, Zong-Chao Yan, G. W. F. Drake, Ai-Xi Chen, Zhen-Xiang Zhong, Ting-Yun Shi, Ke-Lin Gao

    Abstract: The optical Ramsey technique is used to obtain precise measurements of the hyperfine splittings in the $2\,^3\!S_1$ and $2\,^3\!P_J$ states of $^7$Li$^+$. Together with bound-state quantum electrodynamic theory, the Zemach radius and quadrupole moment of the $^7$Li nucleus are determined to be $3.35(1)$ fm and $-3.86(5)$ fm$^2$ respectively, with the quadrupole moment deviating from the recommende…

    Submitted 10 March, 2024; originally announced March 2024.

  40. arXiv:2403.04161  [pdf, other]

    cs.LG cs.CV cs.NE

    SWAP-NAS: Sample-Wise Activation Patterns for Ultra-fast NAS

    Authors: Yameng Peng, Andy Song, Haytham M. Fayek, Vic Ciesielski, Xiaojun Chang

    Abstract: Training-free metrics (a.k.a. zero-cost proxies) are widely used to avoid resource-intensive neural network training, especially in Neural Architecture Search (NAS). Recent studies show that existing training-free metrics have several limitations, such as limited correlation and poor generalisation across different search spaces and tasks. Hence, we propose Sample-Wise Activation Patterns and its…

    Submitted 24 June, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

    Comments: ICLR2024 Spotlight

  41. arXiv:2403.01475  [pdf, other]

    cs.LG cs.AI cs.SI

    Representation Learning on Heterophilic Graph with Directional Neighborhood Attention

    Authors: Qincheng Lu, Jiaqi Zhu, Sitao Luan, Xiao-Wen Chang

    Abstract: Graph Attention Network (GAT) is one of the most popular Graph Neural Network (GNN) architectures, which employs the attention mechanism to learn edge weights and has demonstrated promising performance in various applications. However, since it only incorporates information from the immediate neighborhood, it lacks the ability to capture long-range and global graph information, leading to unsatisfactor…

    Submitted 3 March, 2024; originally announced March 2024.

  42. arXiv:2403.01326  [pdf, other]

    cs.CV

    DNA Family: Boosting Weight-Sharing NAS with Block-Wise Supervisions

    Authors: Guangrun Wang, Changlin Li, Liuchun Yuan, Jiefeng Peng, Xiaoyu Xian, Xiaodan Liang, Xiaojun Chang, Liang Lin

    Abstract: Neural Architecture Search (NAS), aiming at automatically designing neural architectures by machines, has been considered a key step toward automatic machine learning. One notable NAS branch is the weight-sharing NAS, which significantly improves search efficiency and allows NAS algorithms to run on ordinary computers. Despite receiving high expectations, this category of methods suffers from low…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: T-PAMI

  43. arXiv:2402.18086  [pdf, other]

    cs.CV

    Generalizable Two-Branch Framework for Image Class-Incremental Learning

    Authors: Chao Wu, Xiaobin Chang, Ruixuan Wang

    Abstract: Deep neural networks often severely forget previously learned knowledge when learning new knowledge. Various continual learning (CL) methods have been proposed to handle such a catastrophic forgetting issue from different perspectives and achieved substantial improvements. In this paper, a novel two-branch continual learning framework is proposed to further enhance most existing CL methods. Specif…

    Submitted 13 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: 5 pages, 3 figures, accepted by ICASSP 2024

  44. arXiv:2402.16021  [pdf, other]

    cs.CL cs.AI cs.CV eess.AS

    TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages

    Authors: Minsu Kim, Jee-weon Jung, Hyeongseop Rha, Soumi Maiti, Siddhant Arora, Xuankai Chang, Shinji Watanabe, Yong Man Ro

    Abstract: The capability to jointly process multi-modal information is becoming an essential task. However, the limited number of paired multi-modal data and the large computational requirements in multi-modal learning hinder the development. We propose a novel Tri-Modal Translation (TMT) model that translates between arbitrary modalities spanning speech, image, and text. We introduce a novel viewpoint, whe…

    Submitted 25 February, 2024; originally announced February 2024.

  45. "It Must Be Gesturing Towards Me": Gesture-Based Interaction between Autonomous Vehicles and Pedestrians

    Authors: Xiang Chang, Zihe Chen, Xiaoyan Dong, Yuxin Cai, Tingmin Yan, Haolin Cai, Zherui Zhou, Guyue Zhou, Jiangtao Gong

    Abstract: Interacting with pedestrians understandably and efficiently is one of the toughest challenges faced by autonomous vehicles (AVs) due to the limitations of current algorithms and external human-machine interfaces (eHMIs). In this paper, we design eHMIs based on gestures inspired by the most popular method of interaction between pedestrians and human drivers. Eight common gestures were selected to c…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 26 pages, 22 figures

    MSC Class: H.5.2

    Journal ref: CHI2024

  46. MatchNAS: Optimizing Edge AI in Sparse-Label Data Contexts via Automating Deep Neural Network Porting for Mobile Deployment

    Authors: Hongtao Huang, Xiaojun Chang, Wen Hu, Lina Yao

    Abstract: Recent years have seen the explosion of edge intelligence with powerful Deep Neural Networks (DNNs). One popular scheme is training DNNs on powerful cloud servers and subsequently porting them to mobile devices after being lightweight. Conventional approaches manually specialized DNNs for various edge platforms and retrain them with real-world data. However, as the number of platforms increases, t…

    Submitted 20 February, 2024; originally announced February 2024.

  47. arXiv:2402.08769  [pdf, other]

    cs.LG cs.DC

    FLASH: Federated Learning Across Simultaneous Heterogeneities

    Authors: Xiangyu Chang, Sk Miraj Ahmed, Srikanth V. Krishnamurthy, Basak Guler, Ananthram Swami, Samet Oymak, Amit K. Roy-Chowdhury

    Abstract: The key premise of federated learning (FL) is to train ML models across a diverse set of data-owners (clients), without exchanging local data. An overarching challenge to this date is client heterogeneity, which may arise not only from variations in data distribution, but also in data quality, as well as compute/communication latency. An integrated view of these diverse and concurrent sources of h…

    Submitted 13 February, 2024; originally announced February 2024.

  48. arXiv:2401.16658  [pdf, ps, other]

    cs.CL eess.AS

    OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

    Authors: Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, Shinji Watanabe

    Abstract: Recent studies have highlighted the importance of fully open foundation models. The Open Whisper-style Speech Model (OWSM) is an initial step towards reproducing OpenAI Whisper using public data and open-source toolkits. However, previous versions of OWSM (v1 to v3) are still based on standard Transformer, which might lead to inferior performance compared to state-of-the-art speech encoder archite…

    Submitted 16 June, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted at INTERSPEECH 2024. Webpage: https://www.wavlab.org/activities/2024/owsm/

  49. The universality of physical images at relative timescales on multiplex networks

    Authors: Xin Chang, Chao-Ran Cai, Ji-Qiang Zhang, Wen-Li Yang

    Abstract: The duration of the accumulation rate (physical image) is a key factor in analysis of counterintuitive phenomena involving relative timescales on multiplex networks. Typically, the relative timescales are represented by multiplying any layer by the same factor. However, researchers often overlook the changes in the relative timescales caused by local parameters, resulting in incomplete analysis of…

    Submitted 28 January, 2024; originally announced January 2024.

    Journal ref: Chaos Solitons Fractals 182, 114780 (2024)

  50. arXiv:2401.15815  [pdf, ps, other]

    eess.SP cs.CE math.OC

    Success probability of the $L_0$-regularized box-constrained Babai point and column permutation strategies

    Authors: Xiao-Wen Chang, Yingzi XU

    Abstract: We consider the success probability of the $L_0$-regularized box-constrained Babai point, which is a suboptimal solution to the $L_0$-regularized box-constrained integer least squares problem and can be used for MIMO detection. First, we derive formulas for the success probability of both $L_0$-regularized and unregularized box-constrained Babai points. Then we investigate the properties of the…

    Submitted 28 January, 2024; originally announced January 2024.

    Comments: 37 pages, 1 figure including 2 subfigures