Narrow Linewidth Laser Based on Extended Topological Interface States in One-Dimensional Photonic Crystals

Authors: Xiao Sun, Zhibo Li, Yiming Sun, Yupei Wang, Jue Wang, Huihua Cheng, Cong Fu, John H. Marsh, Anthony E. Kelly, Lianping Hou

Abstract: Recent advances in topological one-dimensional photonic crystal concepts have enabled the development of robust light-emitting devices by incorporating a topological interface state (TIS) at the cavity center. In this study, we theoretically and experimentally demonstrate a one-dimensional TIS-extended photonic crystal (1D-TISE-PC) structure. By integrating a linearly dispersive zero-index one-dimensional photonic crystal structure with a four-phase shift (4PS) sampled grating, photons propagate along the cavity without phase differences, enhancing the robustness to material variations and extending the TIS. Our findings indicate that extending the TIS promotes a more uniform photon distribution along the laser cavity and mitigates the spatial hole burning (SHB) effect. We fabricated and characterized a 1550 nm sidewall 1D-TISE-PC semiconductor laser, achieving stable single-mode operation across a wide current range from 60 to 420 mA, with a side-mode suppression ratio of 50 dB. The 1D-TISE-PC structure exhibited a linewidth-narrowing effect, reaching a Lorentzian linewidth of approximately 150 kHz. Utilizing reconstruction equivalent-chirp technology for the 4PS sampled grating enabled precise wavelength control in 1D-TISE-PC laser arrays, achieving a wavelength spacing of 0.796 nm ± 0.003 nm. We show that the TIS still exists in the TISE cavity and topological protection is preserved. Its mode extension characteristics mitigate the SHB and thereby narrow the linewidth. We argue that the design simplicity and improved fabrication tolerance make this architecture suitable for the development of high-power, narrow-linewidth semiconductor lasers.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.04927 [pdf, other]

doi 10.1088/1572-9494/ad4f6f

Collectively induced transparency and absorption in waveguide QED with Bragg atom arrays

Authors: Haolei Cheng, Wei Nie

Abstract: Collective quantum states, such as subradiant and superradiant states, are useful for controlling optical responses in many-body quantum systems. In this work, we study novel collective quantum phenomena in waveguide-coupled Bragg atom arrays with inhomogeneous frequencies. For atoms without free-space dissipation, collectively induced transparency is produced by destructive quantum interference between subradiant and superradiant states. In a large Bragg atom array, multi-frequency photon transparency can be obtained by considering atoms with different frequencies. Interestingly, we find collectively induced absorption (CIA) by studying the influence of free-space dissipation on photon transport. Tunable atomic frequencies nontrivially modify decay rates of subradiant states. When the decay rate of a subradiant state equals the free-space dissipation, photon absorption can reach a limit at a certain frequency. In other words, photon absorption is enhanced with low free-space dissipation, distinct from previous photon detection schemes. We also show multi-frequency CIA by properly adjusting atomic frequencies. Our work presents a way to manipulate collective quantum states and exotic optical properties in waveguide QED systems.

Submitted 5 July, 2024; originally announced July 2024.

Journal ref: Commun. Theor. Phys. 76, 085101 (2024)

arXiv:2407.03568 [pdf, other]

When LLM Meets Hypergraph: A Sociological Analysis on Personality via Online Social Networks

Authors: Zhiyao Shu, Xiangguo Sun, Hong Cheng

Abstract: Individual personalities significantly influence our perceptions, decisions, and social interactions, which is particularly crucial for gaining insights into human behavior patterns in online social network analysis. Many psychological studies have observed that personalities are strongly reflected in their social behaviors and social environments. In light of these problems, this paper proposes a sociological analysis framework for one's personality in an environment-based view instead of individual-level data mining. Specifically, to comprehensively understand an individual's behavior from low-quality records, we leverage the powerful associative ability of LLMs by designing an effective prompt. In this way, LLMs can integrate various scattered information with their external knowledge to generate higher-quality profiles, which can significantly improve the personality analysis performance. To explore the interactive mechanism behind the users and their online environments, we design an effective hypergraph neural network where the hypergraph nodes are users and the hyperedges in the hypergraph are social environments. We offer a useful dataset with user profile data, personality traits, and several detected environments from the real-world social platform. To the best of our knowledge, this is the first network-based dataset containing both hypergraph structure and social information, which could further advance future research in this area. By employing the framework on this dataset, we can effectively capture the nuances of individual personalities and their online behaviors, leading to a deeper understanding of human interactions in the digital world.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03197 [pdf, other]

DyFADet: Dynamic Feature Aggregation for Temporal Action Detection

Authors: Le Yang, Ziwei Zheng, Yizeng Han, Hao Cheng, Shiji Song, Gao Huang, Fan Li

Abstract: Recently proposed neural network-based Temporal Action Detection (TAD) models are inherently limited in extracting discriminative representations and modeling action instances with various lengths from complex scenes by shared-weights detection heads. Inspired by the successes in dynamic neural networks, in this paper, we build a novel dynamic feature aggregation (DFA) module that can simultaneously adapt kernel weights and receptive fields at different timestamps. Based on DFA, the proposed dynamic encoder layer aggregates the temporal features within the action time ranges and guarantees the discriminability of the extracted representations. Moreover, using DFA helps to develop a Dynamic TAD head (DyHead), which adaptively aggregates the multi-scale features with adjusted parameters and learned receptive fields to better detect the action instances with diverse ranges from videos. With the proposed encoder layer and DyHead, a new dynamic TAD model, DyFADet, achieves promising performance on a series of challenging TAD benchmarks, including HACS-Segment, THUMOS14, ActivityNet-1.3, Epic-Kitchen 100, Ego4D-Moment QueriesV1.0, and FineAction. Code is released at https://github.com/yangle15/DyFADet-pytorch.

Submitted 3 July, 2024; originally announced July 2024.

Comments: ECCV 2024

arXiv:2407.00946 [pdf]

Atomic cluster expansion interatomic potential for defects and thermodynamics of Cu-W system

Authors: Jiahao Pan, Huiqun Cheng, Gaosheng Yan, Lei Zhang, Wenshan Yu, Shengping Shen

Abstract: The unique properties exhibited in immiscible metals, such as excellent strength, hardness, and radiation-damage tolerance, have stimulated the interest of many researchers. As a typical immiscible metal system, the Cu-W nano-multilayers combine the plasticity of copper and the strength of tungsten, making them suitable candidates for applications in aerospace, nuclear fusion engineering, and electronic packaging. To understand the atomistic origin of the defects and thermodynamics of the Cu-W immiscible system, we have developed an accurate machine learning interatomic potential (ML-IAP) for Cu-W based on the atomic cluster expansion (ACE) method. The Cu-W ACE potential can faithfully reproduce the fundamental properties of Cu and W predicted by density functional theory (DFT). Moreover, the thermodynamic properties, such as the melting point, coefficient of thermal expansion, diffusion coefficient, and equation-of-state curve of the Cu-W solid solution, are calculated and compared against DFT and experiments. Monte Carlo Molecular Dynamics (MC-MD) simulations performed with the Cu-W ACE potential predict the experimentally observed phase separation and uphill diffusion phenomena. Our findings not only provide an accurate ACE potential for describing the Cu-W immiscible system, but also shed light on the atomistic mechanism of the Cu-W nano-multilayer formation process.

Submitted 30 June, 2024; originally announced July 2024.

Comments: 26 pages, 14 figures

arXiv:2406.18658 [pdf, ps, other]

Sample Complexity of Locally Differentially Private Quantum Hypothesis Testing

Authors: Hao-Chung Cheng, Christoph Hirche, Cambyse Rouzé

Abstract: Quantum state discrimination is an important problem in many information processing tasks. In this work we are concerned with finding its best possible sample complexity when the states are preprocessed by a quantum channel that is required to be locally differentially private. To that end we provide achievability and converse bounds for different settings. This includes symmetric state discrimination in various regimes and the asymmetric case. On the way, we also prove new sample complexity bounds for the general unconstrained setting. Important tools in this endeavor are new entropy inequalities that we believe to be of independent interest.

Submitted 26 June, 2024; originally announced June 2024.

Comments: 24 pages. Short version accepted at ISIT 2024. This work is independent and concurrent to "Contraction of Private Quantum Channels and Private Quantum Hypothesis Testing" by Theshani Nuradha and Mark M. Wilde

arXiv:2406.14927 [pdf, other]

Gaussian-Informed Continuum for Physical Property Identification and Simulation

Authors: Junhao Cai, Yuji Yang, Weihao Yuan, Yisheng He, Zilong Dong, Liefeng Bo, Hui Cheng, Qifeng Chen

Abstract: This paper studies the problem of estimating physical properties (system identification) through visual observations. To facilitate geometry-aware guidance in physical property estimation, we introduce a novel hybrid framework that leverages 3D Gaussian representation to not only capture explicit shapes but also enable the simulated continuum to deduce implicit shapes during training. We propose a new dynamic 3D Gaussian framework based on motion factorization to recover the object as 3D Gaussian point sets across different time states. Furthermore, we develop a coarse-to-fine filling strategy to generate the density fields of the object from the Gaussian reconstruction, allowing for the extraction of object continuums along with their surfaces and the integration of Gaussian attributes into these continuums. In addition to the extracted object surfaces, the Gaussian-informed continuum also enables the rendering of object masks during simulations, serving as implicit shape guidance for physical property estimation. Extensive experimental evaluations demonstrate that our pipeline achieves state-of-the-art performance across multiple benchmarks and metrics. Additionally, we illustrate the effectiveness of the proposed method through real-world demonstrations, showcasing its practical utility. Our project page is at https://jukgei.github.io/project/gic.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 19 pages, 8 figures

arXiv:2406.13159 [pdf, other]

Ultrastable vacuum-gap Fabry-Pérot cavities operated in air

Authors: Yifan Liu, Naijun Jin, Dahyeon Lee, Charles McLemore, Takuma Nakamura, Megan Kelleher, Haotian Cheng, Susan Schima, Nazanin Hoghooghi, Scott Diddams, Peter Rakich, Franklyn Quinlan

Abstract: We demonstrate a vacuum-gap ultrastable optical reference cavity that does not require a vacuum enclosure. Our simple method of optical contact bonding in a vacuum environment allows for cavity operation in air while maintaining vacuum between the cavity mirrors. Vacuum is maintained long term, with no observed degradation in cavity stability for over 1 year after bonding. For a 1550 nm laser stabilized to a 9.7 mL in-vacuum bonded cavity, the measured Allan deviation is $2.4\times 10^{-14}$ at 1 s and its phase noise is thermal-noise-limited from 0.1 Hz to 10 kHz, reaching about -105 dBc/Hz at 10 kHz offset frequency. This represents the highest stability of any oscillator operated without a vacuum enclosure. Furthermore, we demonstrate a 0.5 mL in-vacuum bonded cavity created using microfabricated mirrors and cavity dicing, with phase noise reaching -95 dBc/Hz at 10 kHz offset frequency. By relieving the need for high-vacuum enclosures, we greatly enhance the portability and utility of low noise, compact cavity-stabilized lasers, with applications ranging from environmental sensing to mobile optical clocks to ultralow noise microwave generation.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 10 pages, 6 figures

arXiv:2406.11472 [pdf, other]

Learning from Exemplars for Interactive Image Segmentation

Authors: Kun Li, Hao Cheng, George Vosselman, Michael Ying Yang

Abstract: Interactive image segmentation enables users to interact minimally with a machine, facilitating the gradual refinement of the segmentation mask for a target of interest. Previous studies have demonstrated impressive performance in extracting a single target mask through interactive segmentation. However, the information cues of previously interacted objects have been overlooked in the existing methods, which can be further explored to speed up interactive segmentation for multiple targets in the same category. To this end, we introduce novel interactive segmentation frameworks for both a single object and multiple objects in the same category. Specifically, our model leverages transformer backbones to extract interaction-focused visual features from the image and the interactions to obtain a satisfactory mask of a target as an exemplar. For multiple objects, we propose an exemplar-informed module to enhance the learning of similarities among the objects of the target category. To combine attended features from different modules, we incorporate cross-attention blocks followed by a feature fusion module. Experiments conducted on mainstream benchmarks demonstrate that our models achieve superior performance compared to previous methods. Particularly, our model reduces users' labor by around 15%, requiring two fewer clicks to achieve target IoUs of 85% and 90%. The results highlight our models' potential as a flexible and practical annotation tool. The source code will be released after publication.

Submitted 17 June, 2024; originally announced June 2024.

Comments: Under review

arXiv:2406.10810 [pdf, other]

RGBlimp-Q: Robotic Gliding Blimp With Moving Mass Control Based on a Bird-Inspired Continuum Arm

Authors: Hao Cheng, Feitian Zhang

Abstract: Robotic blimps, as lighter-than-air aerial systems, offer prolonged duration and enhanced safety in human-robot interactions due to their buoyant lift. However, robust flight against environmental airflow disturbances remains a significant challenge, limiting the broader application of these robots. Drawing inspiration from the flight mechanics of birds and their ability to perch against natural wind, this article introduces RGBlimp-Q, a robotic gliding blimp equipped with a bird-inspired continuum arm. This arm allows for flexible attitude adjustments through moving mass control to enhance disturbance resilience, while also enabling object capture by using claws to counteract environmental disturbances, similar to a bird. This article presents the design, modeling, and prototyping of RGBlimp-Q, thus extending the advantages of robotic blimps to more complex environments. To the best of the authors' knowledge, this is the first interdisciplinary design integrating continuum mechanisms onto robotic blimps. Experimental results from both indoor and outdoor settings validate the improved flight robustness against environmental disturbances offered by this novel design.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.05346 [pdf, other]

ProG: A Graph Prompt Learning Benchmark

Authors: Chenyi Zi, Haihong Zhao, Xiangguo Sun, Yiqing Lin, Hong Cheng, Jia Li

Abstract: Artificial general intelligence on graphs has shown significant advancements across various applications, yet the traditional 'Pre-train & Fine-tune' paradigm faces inefficiencies and negative transfer issues, particularly in complex and few-shot settings. Graph prompt learning emerges as a promising alternative, leveraging lightweight prompts to manipulate data and fill the task gap by reformulating downstream tasks to the pretext. However, several critical challenges still remain: how to unify diverse graph prompt models, how to evaluate the quality of graph prompts, and how to improve their usability for practical comparisons and selection. In response to these challenges, we introduce the first comprehensive benchmark for graph prompt learning. Our benchmark integrates SIX pre-training methods and FIVE state-of-the-art graph prompt techniques, evaluated across FIFTEEN diverse datasets to assess performance, flexibility, and efficiency. We also present 'ProG', an easy-to-use open-source library that streamlines the execution of various graph prompt models, facilitating objective evaluations. Additionally, we propose a unified framework that categorizes existing graph prompt methods into two main approaches: prompts as graphs and prompts as tokens. This framework enhances the applicability and comparison of graph prompt techniques. The code is available at: https://github.com/sheldonresearch/ProG.

Submitted 19 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.04520 [pdf, other]

NATURAL PLAN: Benchmarking LLMs on Natural Language Planning

Authors: Huaixiu Steven Zheng, Swaroop Mishra, Hugh Zhang, Xinyun Chen, Minmin Chen, Azade Nova, Le Hou, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou

Abstract: We introduce NATURAL PLAN, a realistic planning benchmark in natural language containing 3 key tasks: Trip Planning, Meeting Planning, and Calendar Scheduling. We focus our evaluation on the planning capabilities of LLMs with full information on the task, by providing outputs from tools such as Google Flights, Google Maps, and Google Calendar as contexts to the models. This eliminates the need for a tool-use environment for evaluating LLMs on Planning. We observe that NATURAL PLAN is a challenging benchmark for state-of-the-art models. For example, in Trip Planning, GPT-4 and Gemini 1.5 Pro could only achieve 31.1% and 34.8% solve rates, respectively. We find that model performance drops drastically as the complexity of the problem increases: all models perform below 5% when there are 10 cities, highlighting a significant gap in planning in natural language for SoTA LLMs. We also conduct extensive ablation studies on NATURAL PLAN to further shed light on the (in)effectiveness of approaches such as self-correction, few-shot generalization, and in-context planning with long-contexts on improving LLM planning.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03240 [pdf, other]

Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion Strategy

Authors: Yuankun Xie, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Xiaopeng Wang, Haonnan Cheng, Long Ye, Jianhua Tao

Abstract: With the proliferation of deepfake audio, there is an urgent need to investigate their attribution. Current source tracing methods can effectively distinguish in-distribution (ID) categories. However, the rapid evolution of deepfake algorithms poses a critical challenge in the accurate identification of out-of-distribution (OOD) novel deepfake algorithms. In this paper, we propose the Real Emphasis and Fake Dispersion (REFD) strategy for audio deepfake algorithm recognition, demonstrating its effectiveness in discriminating ID samples while identifying OOD samples. For effective OOD detection, we first explore current post-hoc OOD methods and propose NSD, a novel OOD approach for identifying novel deepfake algorithms through the similarity consideration of both feature and logits scores. REFD achieves 86.83% F1-score as a single system in Audio Deepfake Detection Challenge 2023 Track3, showcasing its state-of-the-art performance.

Submitted 8 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted by INTERSPEECH 2024

arXiv:2406.03215 [pdf, other]

Searching Priors Makes Text-to-Video Synthesis Better

Authors: Haoran Cheng, Liang Peng, Linxuan Xia, Yuepeng Hu, Hengjia Li, Qinglin Lu, Xiaofei He, Boxi Wu

Abstract: Significant advancements in video diffusion models have brought substantial progress to the field of text-to-video (T2V) synthesis. However, existing T2V synthesis models struggle to accurately generate complex motion dynamics, leading to a reduction in video realism. One possible solution is to collect massive data and train the model on it, but this would be extremely expensive. To alleviate this problem, in this paper, we reformulate the typical T2V generation process as a search-based generation pipeline. Instead of scaling up the model training, we employ existing videos as the motion prior database. Specifically, we divide the T2V generation process into two steps: (i) For a given prompt input, we search existing text-video datasets to find videos with text labels that closely match the prompt motions. We propose a tailored search algorithm that emphasizes object motion features. (ii) Retrieved videos are processed and distilled into motion priors to fine-tune a pre-trained base T2V model, followed by generating desired videos using the input prompt. By utilizing the priors gleaned from the searched videos, we enhance the realism of the generated videos' motion. All operations can be finished on a single NVIDIA RTX 4090 GPU. We validate our method against state-of-the-art T2V models across diverse prompt inputs. The code will be public.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.02983 [pdf, other]

FREA: Feasibility-Guided Generation of Safety-Critical Scenarios with Reasonable Adversariality

Authors: Keyu Chen, Yuheng Lei, Hao Cheng, Haoran Wu, Wenchao Sun, Sifa Zheng

Abstract: Generating safety-critical scenarios, which are essential yet difficult to collect at scale, offers an effective method to evaluate the robustness of autonomous vehicles (AVs). Existing methods focus on optimizing adversariality while preserving the naturalness of scenarios, aiming to achieve a balance through data-driven approaches. However, without an appropriate upper bound for adversariality, the scenarios might exhibit excessive adversariality, potentially leading to unavoidable collisions. In this paper, we introduce FREA, a novel safety-critical scenario generation method that incorporates the Largest Feasible Region (LFR) of the AV as guidance to ensure the reasonableness of the adversarial scenarios. Concretely, FREA initially pre-calculates the LFR of the AV from offline datasets. Subsequently, it learns a reasonable adversarial policy that controls critical background vehicles (CBVs) in the scene to generate adversarial yet AV-feasible scenarios by maximizing a novel feasibility-dependent objective function. Extensive experiments illustrate that FREA can effectively generate safety-critical scenarios, yielding considerable near-miss events while ensuring the AV's feasibility. Generalization analysis also confirms the robustness of FREA in AV testing across various surrogate AV methods and traffic environments.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 19 pages. Under review

arXiv:2406.02020 [pdf, ps, other]

doi 10.1093/mnras/stae1056

A large population of neutron star low-mass X-ray binaries with long outburst recurrence time?

Authors: E. Meyer-Hofmeister, Huaqing Cheng, B. F. Liu

Abstract: Low-mass X-ray binaries (LMXBs) with neutron stars show quite different features which depend on the rate of mass transfer from the donor star. With a high transfer rate the Z sources are in a persistent soft spectral state; with a moderate rate the transient Atoll sources have outburst cycles like the black hole X-ray binaries. The observations document very long outburst recurrence times for quite a number of sources. We follow with our computations the evolution of the accretion disc until the onset of the ionization instability. For sources with a low mass transfer rate the accumulation of matter in the disc is essentially reduced due to the continuous evaporation of matter from the disc to the coronal flow. Different mass transfer rates result in nearly the same amount of matter accumulated for the outburst, which means the outburst properties are similar for sources with short and sources with long outburst cycles, contrary to some expectations. Fewer of the systems with long recurrence times will then be detected, and the total population of LMXBs could be larger than it appears. This would relieve the apparent problem that the observed number of LMXBs as progenitors of millisecond pulsars (MSPs) is too small compared to the number of MSPs. Concerning the few quasi-persistent sources with year-long soft states we argue that these states are not outbursts, but quasi-stationary hot states as in Z sources.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 7 pages, 4 figures, published in MNRAS

Journal ref: Monthly Notices of the Royal Astronomical Society, Volume 531, Issue 1, pp.1578-1584, 2024

arXiv:2406.02013 [pdf, other]

Mamba as Decision Maker: Exploring Multi-scale Sequence Modeling in Offline Reinforcement Learning

Authors: Jiahang Cao, Qiang Zhang, Ziqing Wang, Jiaxu Wang, Hao Cheng, Yecheng Shao, Wen Zhao, Gang Han, Yijie Guo, Renjing Xu

Abstract: Sequential modeling has demonstrated remarkable capabilities in offline reinforcement learning (RL), with Decision Transformer (DT) being one of the most notable representatives, achieving significant success. However, RL trajectories possess unique properties to be distinguished from the conventional sequence (e.g., text or audio): (1) local correlation, where the next states in RL are theoretically determined solely by current states and actions based on the Markov Decision Process (MDP), and (2) global correlation, where each step's features are related to long-term historical information due to the time-continuous nature of trajectories. In this paper, we propose a novel action sequence predictor, named Mamba Decision Maker (MambaDM), where Mamba is expected to be a promising alternative for sequence modeling paradigms, owing to its efficient modeling of multi-scale dependencies. In particular, we introduce a novel mixer module that proficiently extracts and integrates both global and local features of the input sequence, effectively capturing interrelationships in RL datasets. Extensive experiments demonstrate that MambaDM achieves state-of-the-art performance in Atari and OpenAI Gym datasets. Furthermore, we empirically investigate the scaling laws of MambaDM, finding that increasing model size does not bring performance improvement, but scaling the dataset amount by 2x for MambaDM can obtain up to 33.7% score improvement on Atari dataset. This paper delves into the sequence modeling capabilities of MambaDM in the RL domain, paving the way for future advancements in robust and efficient decision-making systems. Our code will be available at https://github.com/AndyCao1125/MambaDM.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 16 pages, 5 figures

arXiv:2406.01598 [pdf]

D2E-An Autonomous Decision-making Dataset involving Driver States and Human Evaluation

Authors: Zehong Ke, Yanbo Jiang, Yuning Wang, Hao Cheng, Jinhao Li, Jianqiang Wang

Abstract: With the advancement of deep learning technology, data-driven methods are increasingly used in the decision-making of autonomous driving, and the quality of datasets greatly influences the model performance. Although current datasets have made significant progress in the collection of vehicle and environment data, emphasis on human-end data including the driver states and human evaluation is not sufficient. In addition, existing datasets consist mostly of simple scenarios such as car following, resulting in low interaction levels. In this paper, we introduce the Driver to Evaluation dataset (D2E), an autonomous decision-making dataset that contains data on driver states, vehicle states, environmental situations, and evaluation scores from human reviewers, covering a comprehensive process of vehicle decision-making. Apart from regular agents and surrounding environment information, we not only collect driver factor data including first-person view videos, physiological signals, and eye attention data, but also provide subjective rating scores from 40 human volunteers. The dataset mixes driving simulator scenes and real-road ones. High-interaction situations are designed and filtered to ensure behavior diversity. Through data organization, analysis, and preprocessing, D2E contains over 1100 segments of interactive driving case data, ranging from human driver factors to evaluation results, supporting the development of data-driven decision-making related algorithms.

Submitted 12 April, 2024; originally announced June 2024.

Comments: Submitted to ITSC 2024

arXiv:2405.20606 [pdf, other]

Vision-Language Meets the Skeleton: Progressively Distillation with Cross-Modal Knowledge for 3D Action Representation Learning

Authors: Yang Chen, Tian He, Junfeng Fu, Ling Wang, Jingcai Guo, Hong Cheng

Abstract: Supervised and self-supervised learning are two main training paradigms for skeleton-based human action recognition. However, the former one-hot classification requires labor-intensive predefined action categories annotations, while the latter involves skeleton transformations (e.g., cropping) in the pretext tasks that may impair the skeleton structure. To address these challenges, we introduce a novel skeleton-based training framework (C$^2$VL) based on Cross-modal Contrastive learning that uses progressive distillation to learn task-agnostic human skeleton action representation from the Vision-Language knowledge prompts. Specifically, we establish the vision-language action concept space through vision-language knowledge prompts generated by pre-trained large multimodal models (LMMs), which enrich the fine-grained details that the skeleton action space lacks. Moreover, we propose the intra-modal self-similarity and inter-modal cross-consistency softened targets in the cross-modal contrastive process to progressively control and guide the degree of pulling vision-language knowledge prompts and corresponding skeletons closer. These soft instance discrimination and self-knowledge distillation strategies contribute to the learning of better skeleton-based action representations from the noisy skeleton-vision-language pairs. During the inference phase, our method requires only the skeleton data as the input for action recognition and no longer needs vision-language prompts. Extensive experiments show that our method achieves state-of-the-art results on NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD datasets. The code will be available in the future.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.20090 [pdf, other]

Typography Leads Semantic Diversifying: Amplifying Adversarial Transferability across Multimodal Large Language Models

Authors: Hao Cheng, Erjia Xiao, Jiahang Cao, Le Yang, Kaidi Xu, Jindong Gu, Renjing Xu

Abstract: Following the advent of the Artificial Intelligence (AI) era of large models, Multimodal Large Language Models (MLLMs) with the ability to understand cross-modal interactions between vision and text have attracted wide attention. Adversarial examples with human-imperceptible perturbation are shown to possess a characteristic known as transferability, which means that a perturbation generated by one model could also mislead another different model. Augmenting the diversity in input data is one of the most significant methods for enhancing adversarial transferability. This method has been certified as a way to significantly enlarge the threat impact under black-box conditions. Research works also demonstrate that MLLMs can be exploited to generate adversarial examples in the white-box scenario. However, the adversarial transferability of such perturbations is quite limited, failing to achieve effective black-box attacks across different models. In this paper, we propose the Typographic-based Semantic Transfer Attack (TSTA), which is inspired by two observations: (1) MLLMs tend to process semantic-level information; (2) Typographic Attack could effectively distract the visual information captured by MLLMs. In the scenarios of Harmful Word Insertion and Important Information Protection, our TSTA demonstrates superior performance.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19119 [pdf, other]

Can Graph Learning Improve Task Planning?

Authors: Xixi Wu, Yifei Shen, Caihua Shan, Kaitao Song, Siwei Wang, Bohang Zhang, Jiarui Feng, Hong Cheng, Wei Chen, Yun Xiong, Dongsheng Li

Abstract: Task planning is emerging as an important research topic alongside the development of large language models (LLMs). It aims to break down complex user requests into solvable sub-tasks, thereby fulfilling the original requests. In this context, the sub-tasks can be naturally viewed as a graph, where the nodes represent the sub-tasks, and the edges denote the dependencies among them. Consequently, task planning is a decision-making problem that involves selecting a connected path or subgraph within the corresponding graph and invoking it. In this paper, we explore graph learning-based methods for task planning, a direction that is orthogonal to the prevalent focus on prompt design. Our interest in graph learning stems from a theoretical discovery: the biases of attention and auto-regressive loss impede LLMs' ability to effectively navigate decision-making on graphs, which is adeptly addressed by graph neural networks (GNNs). This theoretical insight led us to integrate GNNs with LLMs to enhance overall performance. Extensive experiments demonstrate that GNN-based methods surpass existing solutions even without training, and minimal training can further enhance their performance. Additionally, our approach complements prompt engineering and fine-tuning techniques, with performance further enhanced by improved prompts or a fine-tuned model.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.17678 [pdf, other]

TIMA: Text-Image Mutual Awareness for Balancing Zero-Shot Adversarial Robustness and Generalization Ability

Authors: Fengji Ma, Li Liu, Hei Victor Cheng

Abstract: This work addresses the challenge of achieving zero-shot adversarial robustness while preserving zero-shot generalization in large-scale foundation models, with a focus on the popular Contrastive Language-Image Pre-training (CLIP). Although foundation models were reported to have exceptional zero-shot generalization, they are highly vulnerable to adversarial perturbations. Existing methods achieve a comparably good tradeoff between zero-shot adversarial robustness and generalization under small adversarial perturbations. However, they fail to achieve a good tradeoff under large adversarial perturbations. To this end, we propose a novel Text-Image Mutual Awareness (TIMA) method that strikes a balance between zero-shot adversarial robustness and generalization. More precisely, we propose an Image-Aware Text (IAT) tuning mechanism that increases the inter-class distance of text embeddings by incorporating the Minimum Hyperspherical Energy (MHE). Simultaneously, fixed pre-trained image embeddings are used as cross-modal auxiliary supervision to maintain the similarity between the MHE-tuned and original text embeddings through knowledge distillation, preserving semantic information between different classes. Besides, we introduce a Text-Aware Image (TAI) tuning mechanism, which increases inter-class distance between image embeddings during the training stage by Text-distance based Adaptive Margin (TAM). Similarly, knowledge distillation is used to retain the similarity between fine-tuned and pre-trained image embeddings. Extensive experimental results demonstrate the effectiveness of our approach, showing impressive zero-shot performance against a wide range of adversarial perturbations while preserving the zero-shot generalization capabilities of the original CLIP model.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16672 [pdf, other]

Transfer Learning Under High-Dimensional Graph Convolutional Regression Model for Node Classification

Authors: Jiachen Chen, Danyang Huang, Liyuan Wang, Kathryn L. Lunetta, Debarghya Mukherjee, Huimin Cheng

Abstract: Node classification is a fundamental task, but obtaining node classification labels can be challenging and expensive in many real-world scenarios. Transfer learning has emerged as a promising solution to address this challenge by leveraging knowledge from source domains to enhance learning in a target domain. Existing transfer learning methods for node classification primarily focus on integrating Graph Convolutional Networks (GCNs) with various transfer learning techniques. While these approaches have shown promising results, they often suffer from a lack of theoretical guarantees, restrictive conditions, and high sensitivity to hyperparameter choices. To overcome these limitations, we propose a Graph Convolutional Multinomial Logistic Regression (GCR) model and a transfer learning method based on the GCR model, called Trans-GCR. We provide theoretical guarantees of the estimate obtained under the GCR model in high-dimensional settings. Moreover, Trans-GCR demonstrates superior empirical performance, has a low computational cost, and requires fewer hyperparameters than existing methods.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.13992 [pdf, other]

Learning Cut Generating Functions for Integer Programming

Authors: Hongyu Cheng, Amitabh Basu

Abstract: The branch-and-cut algorithm is the method of choice to solve large scale integer programming problems in practice. A key ingredient of branch-and-cut is the use of cutting planes which are derived constraints that reduce the search space for an optimal solution. Selecting effective cutting planes to produce small branch-and-cut trees is a critical challenge in the branch-and-cut algorithm. Recent advances have employed a data-driven approach to select optimal cutting planes from a parameterized family, aimed at reducing the branch-and-bound tree size (in expectation) for a given distribution of integer programming instances. We extend this idea to the selection of the best cut generating function (CGF), which is a tool in the integer programming literature for generating a wide variety of cutting planes that generalize the well-known Gomory Mixed-Integer (GMI) cutting planes. We provide rigorous sample complexity bounds for the selection of an effective CGF from certain parameterized families that provably performs well for any specified distribution on the problem instances. Our empirical results show that the selected CGF can outperform the GMI cuts for certain distributions. Additionally, we explore the sample complexity of using neural networks for instance-dependent CGF selection.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.12538 [pdf, other]

Bridging the Intent Gap: Knowledge-Enhanced Visual Generation

Authors: Yi Cheng, Ziwei Xu, Dongyun Lin, Harry Cheng, Yongkang Wong, Ying Sun, Joo Hwee Lim, Mohan Kankanhalli

Abstract: For visual content generation, discrepancies between user intentions and the generated content have been a longstanding problem. This discrepancy arises from two main factors. First, user intentions are inherently complex, with subtle details not fully captured by input prompts. The absence of such details makes it challenging for generative models to accurately reflect the intended meaning, leading to a mismatch between the desired and generated output. Second, generative models trained on visual-label pairs lack the comprehensive knowledge to accurately represent all aspects of the input data in their generated outputs. To address these challenges, we propose a knowledge-enhanced iterative refinement framework for visual content generation. We begin by analyzing and identifying the key challenges faced by existing generative models. Then, we introduce various knowledge sources, including human insights, pre-trained models, logic rules, and world knowledge, which can be leveraged to address these challenges. Furthermore, we propose a novel visual generation framework that incorporates a knowledge-based feedback module to iteratively refine the generation process. This module gradually improves the alignment between the generated content and user intentions. We demonstrate the efficacy of the proposed framework through preliminary results, highlighting the potential of knowledge-enhanced generative models for intention-aligned content generation.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.07148 [pdf, other]

Investigate the efficiency of incompressible flow simulations on CPUs and GPUs with BSAMR

Authors: Dewen Liu, Shuai He, Haoran Cheng, Yadong Zeng

Abstract: Adaptive mesh refinement (AMR) is a classical technique for refining the mesh locally in space where needed, thus effectively reducing computational costs for HPC-based physics simulations. Although AMR has been used for many years, little reproducible research discusses the impact of software-based parameters on block-structured AMR (BSAMR) efficiency and how to choose them. This article primarily conducts parametric studies to investigate the computational efficiency of incompressible flows on a block-structured adaptive mesh. The parameters include refining block size, refining frequency, maximum level, and cycling method. A new projection skipping (PS) method is proposed, which brings insights about when and where the projections on coarser levels are safe to be omitted. We conduct extensive tests on different CPUs/GPUs for various 2D/3D incompressible flow cases, including bubble, RT instability, Taylor Green vortex, etc. Several valuable empirical conclusions are obtained to help guide simulations with BSAMR. Codes and all profiling data are available on GitHub.

Submitted 11 May, 2024; originally announced May 2024.

Comments: 22 pages including references, 9 figures

arXiv:2405.06388 [pdf, other]

Recovery of transversely-isotropic elastic material parameters in induction motor rotors

Authors: Hanz Martin Cheng, Tapio Helin, Ville-Petteri Manninen, Timo Holopainen, Juha Jokinen, Samu Sorvari, Andreas Rupp

Abstract: We propose numerical algorithms for recovering parameters in eigenvalue problems for linear elasticity of transversely isotropic materials. Specifically, the algorithms are used to recover the elastic constants of a rotor core. Numerical tests show that in the noiseless setup, two pairs of bending modes are sufficient for recovering one to four parameters accurately. To recover all five parameters that govern the elastic properties of electric engines accurately, we require three pairs of bending modes and one torsional mode. Moreover, we study the stability of the inversion method against multiplicative noise; for tests in which the data contained multiplicative noise of at most $1\%$, we find that all parameters can be recovered with an error less than $10\%$.

Submitted 10 May, 2024; originally announced May 2024.

MSC Class: 65Z05; 65C20

arXiv:2405.05363 [pdf, other]

LOC-ZSON: Language-driven Object-Centric Zero-Shot Object Retrieval and Navigation

Authors: Tianrui Guan, Yurou Yang, Harry Cheng, Muyuan Lin, Richard Kim, Rajasimman Madhivanan, Arnie Sen, Dinesh Manocha

Abstract: In this paper, we present LOC-ZSON, a novel Language-driven Object-Centric image representation for the object navigation task within complex scenes. We propose an object-centric image representation and corresponding losses for visual-language model (VLM) fine-tuning, which can handle complex object-level queries. In addition, we design a novel LLM-based augmentation and prompt templates for stability during training and zero-shot inference. We implement our method on the Astro robot and deploy it in both simulated and real-world environments for zero-shot object navigation. We show that our proposed method can achieve an improvement of 1.38% to 13.38% in terms of text-to-image recall on different benchmark settings for the retrieval task. For object navigation, we show the benefit of our approach in simulation and the real world, with 5% and 16.67% improvement in terms of navigation success rate, respectively.

Submitted 8 May, 2024; originally announced May 2024.

Comments: Accepted to ICRA 2024

arXiv:2405.04880 [pdf, other]

The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio

Authors: Yuankun Xie, Yi Lu, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Jianhua Tao, Xin Qi, Xiaopeng Wang, Yukun Liu, Haonan Cheng, Long Ye, Yi Sun

Abstract: With the proliferation of Audio Language Model (ALM) based deepfake audio, there is an urgent need for generalized detection methods. ALM-based deepfake audio is currently widespread, highly deceptive, and versatile in type, posing a significant challenge to current audio deepfake detection (ADD) models trained solely on vocoded data. To effectively detect ALM-based deepfake audio, we focus on the mechanism of the ALM-based audio generation method, the conversion from neural codec to waveform. We initially construct the Codecfake dataset, an open-source large-scale dataset, including 2 languages, over 1M audio samples, and various test conditions, focused on ALM-based audio detection. As a countermeasure, to achieve universal detection of deepfake audio and tackle the domain ascent bias issue of the original SAM, we propose the CSAM strategy to learn a domain balanced and generalized minima. In our experiments, we first demonstrate that ADD models trained with the Codecfake dataset can effectively detect ALM-based audio. Furthermore, our proposed generalization countermeasure yields the lowest average Equal Error Rate (EER) of 0.616% across all test conditions compared to baseline models. The dataset and associated code are available online.

Submitted 15 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.03194 [pdf, other]

CityLLaVA: Efficient Fine-Tuning for VLMs in City Scenario

Authors: Zhizhao Duan, Hao Cheng, Duo Xu, Xi Wu, Xiangxie Zhang, Xi Ye, Zhen Xie

Abstract: In the vast and dynamic landscape of urban settings, Traffic Safety Description and Analysis plays a pivotal role in applications ranging from insurance inspection to accident prevention. This paper introduces CityLLaVA, a novel fine-tuning framework for Visual Language Models (VLMs) designed for urban scenarios. CityLLaVA enhances model comprehension and prediction accuracy through (1) employing bounding boxes for optimal visual data preprocessing, including video best-view selection and visual prompt engineering during both training and testing phases; (2) constructing concise Question-Answer sequences and designing textual prompts to refine instruction comprehension; (3) implementing block expansion to fine-tune large VLMs efficiently; and (4) advancing prediction accuracy via a unique sequential questioning-based prediction augmentation. Demonstrating top-tier performance, our method achieved a benchmark score of 33.4308, securing the leading position on the leaderboard. The code can be found at https://github.com/alibaba/AICITY2024_Track2_AliOpenTrek_CityLLaVA

Submitted 6 May, 2024; originally announced May 2024.

Comments: Accepted by AICITY2024 Workshop Track2 at CVPR2024

arXiv:2405.00079 [pdf]

A global evidence map of human well-being and biodiversity co-benefits and trade-offs of natural climate solutions

Authors: Charlotte H. Chang, James T. Erbaugh, Paola Fajardo, Luci Lu, István Molnár, Dávid Papp, Brian E. Robinson, Kemen Austin, Susan Cook-Patton, Timm Kroeger, Lindsey Smart, Miguel Castro, Samantha H. Cheng, Peter W. Ellis, Rob I. McDonald, Teevrat Garg, Erin E. Poor, Preston Welker, Andrew R. Tilman, Stephen A. Wood, Yuta J. Masuda

Abstract: Natural climate solutions (NCS) are critical for mitigating climate change through ecosystem-based carbon removal and emissions reductions. NCS implementation can also generate biodiversity and human well-being co-benefits and trade-offs ("NCS co-impacts"), but the volume of evidence on NCS co-impacts has grown rapidly across disciplines, is poorly understood, and remains to be systematically collated and synthesized. A global evidence map of NCS co-impacts would overcome key barriers to NCS implementation by providing relevant information on co-benefits and trade-offs where carbon mitigation potential alone does not justify NCS projects. We employ large language models to assess over two million articles, finding 257,266 relevant articles on NCS co-impacts. We analyze this large and dispersed body of literature using innovative machine learning methods to extract relevant data (e.g., study location, species, and other key variables), and create a global evidence map on NCS co-impacts. Evidence on NCS co-impacts has grown approximately ten-fold in three decades, although some of the most abundant evidence is associated with pathways that have less mitigation potential. We find that studies often examine multiple NCS pathways, indicating natural NCS pathway complements, and each NCS is often associated with two or more co-impacts. Finally, NCS co-impacts evidence and priority areas for NCS are often mismatched: some countries with high mitigation potential from NCS have few published studies on the broader co-impacts of NCS implementation. Our work advances and makes available novel methods and systematic and representative data of NCS co-impacts studies, thus providing timely insights to inform NCS research and action globally.

Submitted 30 April, 2024; originally announced May 2024.

Comments: 28 pages, 5 figures

arXiv:2404.18580 [pdf, other]

Data-Driven Dynamics Modeling of Miniature Robotic Blimps Using Neural ODEs With Parameter Auto-Tuning

Authors: Yongjian Zhu, Hao Cheng, Feitian Zhang

Abstract: Miniature robotic blimps, as one type of lighter-than-air aerial vehicles, have attracted increasing attention in the science and engineering community for their enhanced safety, extended endurance, and quieter operation compared to quadrotors. Accurately modeling the dynamics of these robotic blimps poses a significant challenge due to the complex aerodynamics stemming from their large lifting bodies. Traditional first-principle models have difficulty obtaining accurate aerodynamic parameters and often overlook high-order nonlinearities, thus reaching their limits in modeling the motion dynamics of miniature robotic blimps. To tackle this challenge, this letter proposes the Auto-tuning Blimp-oriented Neural Ordinary Differential Equation method (ABNODE), a data-driven approach that integrates first-principle and neural network modeling. Spiraling motion experiments of robotic blimps are conducted, comparing the ABNODE with first-principle and other data-driven benchmark models, the results of which demonstrate the effectiveness of the proposed method.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 8 pages, 8 figures

arXiv:2404.17317 [pdf, other]

Colosseum: The Open RAN Digital Twin

Authors: Michele Polese, Leonardo Bonati, Salvatore D'Oro, Pedram Johari, Davide Villa, Sakthivel Velumani, Rajeev Gangula, Maria Tsampazi, Clifton Paul Robinson, Gabriele Gemmi, Andrea Lacava, Stefano Maxenti, Hai Cheng, Tommaso Melodia

Abstract: Recent years have witnessed the Open Radio Access Network (RAN) paradigm transforming the fundamental ways cellular systems are deployed, managed, and optimized. This shift is led by concepts such as openness, softwarization, programmability, interoperability, and intelligence of the network, all of which had never been applied to the cellular ecosystem before. The realization of the Open RAN vision into practical architectures, intelligent data-driven control loops, and efficient software implementations, however, is a multifaceted challenge, which requires (i) datasets to train Artificial Intelligence (AI) and Machine Learning (ML) models; (ii) facilities to test models without disrupting production networks; (iii) continuous and automated validation of the RAN software; and (iv) significant testing and integration efforts. This paper poses itself as a tutorial on how Colosseum - the world's largest wireless network emulator with hardware in the loop - can provide the research infrastructure and tools to fill the gap between the Open RAN vision, and the deployment and commercialization of open and programmable networks. We describe how Colosseum implements an Open RAN digital twin through a high-fidelity Radio Frequency (RF) channel emulator and end-to-end softwarized O-RAN and 5G-compliant protocol stacks, thus allowing users to reproduce and experiment upon topologies representative of real-world cellular deployments. Then, we detail the twinning infrastructure of Colosseum, as well as the automation pipelines for RF and protocol stack twinning. Finally, we showcase a broad range of Open RAN use cases implemented on Colosseum, including the real-time connection between the digital twin and real-world networks, and the development, prototyping, and testing of AI/ML solutions for Open RAN.

Submitted 26 April, 2024; originally announced April 2024.

Comments: 13 pages, 8 figures, 1 table, submitted to IEEE for publication

arXiv:2404.17152 [pdf, other]

CSCO: Connectivity Search of Convolutional Operators

Authors: Tunhou Zhang, Shiyu Li, Hsin-Pai Cheng, Feng Yan, Hai Li, Yiran Chen

Abstract: Exploring dense connectivity of convolutional operators establishes critical "synapses" to communicate feature vectors from different levels and enriches the set of transformations on Computer Vision applications. Yet, even with heavy-machinery approaches such as Neural Architecture Search (NAS), discovering effective connectivity patterns requires tremendous efforts due to either constrained connectivity design space or a sub-optimal exploration process induced by an unconstrained search space. In this paper, we propose CSCO, a novel paradigm that fabricates effective connectivity of convolutional operators with minimal utilization of existing design motifs and further utilizes the discovered wiring to construct high-performing ConvNets. CSCO guides the exploration via a neural predictor as a surrogate of the ground-truth performance. We introduce Graph Isomorphism as data augmentation to improve sample efficiency and propose a Metropolis-Hastings Evolutionary Search (MH-ES) to evade locally optimal architectures and advance search quality. Results on ImageNet show ~0.6% performance improvement over hand-crafted and NAS-crafted dense connectivity. Our code is publicly available.

Submitted 26 April, 2024; originally announced April 2024.

Comments: To appear on Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (2024)

arXiv:2404.16425 [pdf, other]

Soft X-ray prompt emission from a high-redshift gamma-ray burst EP240315a

Authors: Y. Liu, H. Sun, D. Xu, D. S. Svinkin, J. Delaunay, N. R. Tanvir, H. Gao, C. Zhang, Y. Chen, X. -F. Wu, B. Zhang, W. Yuan, J. An, G. Bruni, D. D. Frederiks, G. Ghirlanda, J. -W. Hu, A. Li, C. -K. Li, J. -D. Li, D. B. Malesani, L. Piro, G. Raman, R. Ricci, E. Troja, et al. (170 additional authors not shown)

Abstract: Long gamma-ray bursts (GRBs) are believed to originate from core collapse of massive stars. High-redshift GRBs can probe the star formation and reionization history of the early universe, but their detection remains rare. Here we report the detection of a GRB triggered in the 0.5--4 keV band by the Wide-field X-ray Telescope (WXT) on board the Einstein Probe (EP) mission, designated as EP240315a, whose bright peak was also detected by the Swift Burst Alert Telescope and Konus-Wind through off-line analyses. At a redshift of $z=4.859$, EP240315a showed a much longer and more complicated light curve in the soft X-ray band than in gamma-rays. Benefiting from a large field-of-view ($\sim$3600 deg$^2$) and a high sensitivity, EP-WXT captured the earlier engine activation and extended late engine activity through a continuous detection. With a peak X-ray flux at the faint end of previously known high-$z$ GRBs, the detection of EP240315a demonstrates the great potential for EP to study the early universe via GRBs.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 41 pages, 8 figures, 7 tables

arXiv:2404.14890 [pdf, other]

DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition

Authors: Haozhe Cheng, Cheng Ju, Haicheng Wang, Jinxiang Liu, Mengting Chen, Qiang Hu, Xiaoyun Zhang, Yanfeng Wang

Abstract: As one of the fundamental video tasks in computer vision, Open-Vocabulary Action Recognition (OVAR) has recently gained increasing attention with the development of vision-language pre-training. To enable generalization to arbitrary classes, existing methods treat class labels as text descriptions, then formulate OVAR as evaluating embedding similarity between visual samples and textual classes. However, one crucial issue is completely ignored: the class descriptions given by users may be noisy, e.g., misspellings and typos, limiting the real-world practicality of vanilla OVAR. To fill the research gap, this paper is the first to evaluate existing methods by simulating multi-level noises of various types, and reveals their poor robustness. To tackle the noisy OVAR task, we further propose a novel DENOISER framework, covering two parts: generation and discrimination. Concretely, the generative part denoises noisy class-text names via one decoding process, i.e., it proposes text candidates, then utilizes inter-modal and intra-modal information to vote for the best. In the discriminative part, we use vanilla OVAR models to assign visual samples to class-text names, thus obtaining more semantics. For optimization, we alternately iterate between generative and discriminative parts for progressive refinements. The denoised text classes help OVAR models classify visual samples more accurately; in return, classified visual samples help better denoising. On three datasets, we carry out extensive experiments to show our superior robustness, and thorough ablations to dissect the effectiveness of each component.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14815 [pdf, other]

Time-aware Heterogeneous Graph Transformer with Adaptive Attention Merging for Health Event Prediction

Authors: Shibo Li, Hengliang Cheng, Weihua Li

Abstract: The widespread application of Electronic Health Records (EHR) data in the medical field has led to early successes in disease risk prediction using deep learning methods. These methods typically require extensive data for training due to their large parameter sets. However, existing works do not exploit the full potential of EHR data. A significant challenge arises from the infrequent occurrence of many medical codes within EHR data, limiting their clinical applicability. Current research often lacks in critical areas: 1) incorporating disease domain knowledge; 2) heterogeneously learning disease representations with rich meanings; 3) capturing the temporal dynamics of disease progression. To overcome these limitations, we introduce a novel heterogeneous graph learning model designed to assimilate disease domain knowledge and elucidate the intricate relationships between drugs and diseases. This model innovatively incorporates temporal data into visit-level embeddings and leverages a time-aware transformer alongside an adaptive attention mechanism to produce patient representations. When evaluated on two healthcare datasets, our approach demonstrated notable enhancements in both prediction accuracy and interpretability over existing methodologies, signifying a substantial advancement towards personalized and proactive healthcare management.

Submitted 10 May, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

Comments: 16 pages, 9 figures, 4 tables

arXiv:2404.03202 [pdf, other]

OmniGS: Omnidirectional Gaussian Splatting for Fast Radiance Field Reconstruction using Omnidirectional Images

Authors: Longwei Li, Huajian Huang, Sai-Kit Yeung, Hui Cheng

Abstract: Photorealistic reconstruction relying on 3D Gaussian Splatting has shown promising potential in robotics. However, the current 3D Gaussian Splatting system only supports radiance field reconstruction using undistorted perspective images. In this paper, we present OmniGS, a novel omnidirectional Gaussian splatting system, to take advantage of omnidirectional images for fast radiance field reconstruction. Specifically, we conduct a theoretical analysis of spherical camera model derivatives in 3D Gaussian Splatting. According to the derivatives, we then implement a new GPU-accelerated omnidirectional rasterizer that directly splats 3D Gaussians onto the equirectangular screen space for omnidirectional image rendering. As a result, we realize differentiable optimization of the radiance field without the requirement of cube-map rectification or tangent-plane approximation. Extensive experiments conducted in egocentric and roaming scenarios demonstrate that our method achieves state-of-the-art reconstruction quality and high rendering speed using omnidirectional images. To benefit the research community, the code will be made publicly available once the paper is published.

Submitted 7 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: 7 pages, 4 figures

arXiv:2404.01350 [pdf, other]

Analysis of Hadronic Weak Decays of Charmed Baryons in the Topological Diagrammatic Approach

Authors: Huiling Zhong, Fanrong Xu, Hai-Yang Cheng

Abstract: We perform a global fit to the experimental data of two-body charmed baryon decays based on the topological diagrammatic approach (TDA) and take into account the phase shifts between $S$- and $P$-wave amplitudes as inspired by the recent BESIII measurement of the decay asymmetry in the decay $Λ_c^+\to Ξ^0K^+$. The TDA has the advantage that it is more intuitive, graphic and easier to implement model calculations. The measured branching fractions and decay asymmetries are well accommodated in the TDA except for a few modes, in particular, the predicted ${\cal B}(Ξ_c^0\to Ξ^-π^+)=(2.83\pm0.10)\%$ is larger than its current value. The equivalence of the TDA and the irreducible SU(3) approach (IRA) is established. We show that the number of the minimum set of tensor invariants in the IRA and the topological amplitudes in the TDA is the same and present their relations. The predicted magnitudes of $S$- and $P$-wave amplitudes and their phase shifts are presented for measured and yet-to-be-measured modes in both the TDA and IRA which can be tested in the near future. Besides the decay $Λ_c^+\to Ξ^0 K^+$, there exist several modes which proceed only through $W$-exchange. In particular, the observed channel $Ξ_c^0\to Σ^+ K^-$ should have phase shifts similar to that in $Λ_c^+\to Ξ^0 K^+$ and its decay asymmetry is predicted to be $-0.21\pm0.17$ which can be used to test our theoretical framework. In contrast, the TDA leads to a large $α$ of order $-0.93$ for the decay $Ξ_c^+\to Ξ^0π^+$ even after the phase-shift effect is incorporated in the fit.

Submitted 20 May, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

Comments: 24 pages, 1 figure. Tables I, II, V and VI revised, accepted by PRD. arXiv admin note: text overlap with arXiv:2401.15926

arXiv:2403.17868 [pdf, other]

An invitation to the sample complexity of quantum hypothesis testing

Authors: Hao-Chung Cheng, Nilanjana Datta, Nana Liu, Theshani Nuradha, Robert Salzmann, Mark M. Wilde

Abstract: Quantum hypothesis testing (QHT) has been traditionally studied from the information-theoretic perspective, wherein one is interested in the optimal decay rate of error probabilities as a function of the number of samples of an unknown state. In this paper, we study the sample complexity of QHT, wherein the goal is to determine the minimum number of samples needed to reach a desired error probability. By making use of the wealth of knowledge that already exists in the literature on QHT, we characterize the sample complexity of binary QHT in the symmetric and asymmetric settings, and we provide bounds on the sample complexity of multiple QHT. In more detail, we prove that the sample complexity of symmetric binary QHT depends logarithmically on the inverse error probability and inversely on the negative logarithm of the fidelity. As a counterpart of the quantum Stein's lemma, we also find that the sample complexity of asymmetric binary QHT depends logarithmically on the inverse type II error probability and inversely on the quantum relative entropy, provided that the type II error probability is sufficiently small. We then provide lower and upper bounds on the sample complexity of multiple QHT, with it remaining an intriguing open question to improve these bounds. The final part of our paper outlines and reviews how sample complexity of QHT is relevant to a broad swathe of research areas and can enhance understanding of many fundamental concepts, including quantum algorithms for simulation and search, quantum learning and classification, and foundations of quantum mechanics. As such, we view our paper as an invitation to researchers coming from different communities to study and contribute to the problem of sample complexity of QHT, and we outline a number of open directions for future research.

Submitted 16 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

Comments: v3: 58 pages, 1 figure, correction to Corollary 10; see independent and concurrent work of Pensia, Jog, Loh at arXiv:2403.16981

arXiv:2403.17807 [pdf, other]

Towards Inclusive Video Commenting: Introducing Signmaku for the Deaf and Hard-of-Hearing

Authors: Si Chen, Haocong Cheng, Jason Situ, Desirée Kirst, Suzy Su, Saumya Malhotra, Lawrence Angrave, Qi Wang, Yun Huang

Abstract: Previous research underscored the potential of danmaku--a text-based commenting feature on videos--in engaging hearing audiences. Yet, for many Deaf and hard-of-hearing (DHH) individuals, American Sign Language (ASL) takes precedence over English. To improve inclusivity, we introduce "Signmaku," a new commenting mechanism that uses ASL, serving as a sign language counterpart to danmaku. Through a need-finding study (N=12) and a within-subject experiment (N=20), we evaluated three design styles: real human faces, cartoon-like figures, and robotic representations. The results showed that cartoon-like signmaku not only entertained but also encouraged participants to create and share ASL comments, with fewer privacy concerns compared to the other designs. Conversely, the robotic representations faced challenges in accurately depicting hand movements and facial expressions, resulting in higher cognitive demands on users. Signmaku featuring real human faces elicited the lowest cognitive load and was the most comprehensible among all three types. Our findings offered novel design implications for leveraging generative AI to create signmaku comments, enriching co-learning experiences for DHH individuals.

Submitted 26 March, 2024; originally announced March 2024.

Comments: 14 pages, CHI 2024

ACM Class: F.2.2; I.2.7

arXiv:2403.14450 [pdf, ps, other]

Maximal $α$-Leakage for Quantum Privacy Mechanisms

Authors: Bo-Yu Yang, Hsuan Yu, Hao-Chung Cheng

Abstract: In this work, maximal $α$-leakage is introduced to quantify how much a quantum adversary can learn about any sensitive information of data upon observing its disturbed version via a quantum privacy mechanism. We first show that an adversary's maximal expected $α$-gain using optimal measurement is characterized by measured conditional Rényi entropy. This can be viewed as a parametric generalization of König et al.'s famous guessing probability formula [IEEE Trans. Inf. Theory, 55(9), 2009]. Then, we prove that the $α$-leakage and maximal $α$-leakage for a quantum privacy mechanism are determined by measured Arimoto information and measured Rényi capacity, respectively. Various properties of maximal $α$-leakage, such as data processing inequality and composition property are established as well. Moreover, we show that regularized $α$-leakage and regularized maximal $α$-leakage for identical and independent quantum privacy mechanisms coincide with $α$-tilted sandwiched Rényi information and sandwiched Rényi capacity, respectively.

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.14338 [pdf, ps, other]

Optimal Second-Order Rates for Quantum Information Decoupling

Authors: Yu-Chen Shen, Li Gao, Hao-Chung Cheng

Abstract: In this paper, we consider the standard quantum information decoupling, in which Alice aims to decouple her system from the environment by local operations and discarding some of her systems. To achieve an $\varepsilon$-decoupling with trace distance as the error criterion, we establish a near-optimal one-shot characterization for the largest dimension of the remainder system in terms of the conditional $(1-\varepsilon)$-hypothesis-testing entropy. When the underlying system is independent and identically prepared, our result leads to the matched second-order rate as well as the matched moderate deviation rate. As an application, we find an achievability bound in the entanglement distillation protocol, where the objective is for Alice and Bob to transform their quantum state into a maximally entangled state with the largest possible dimension using only local operations and one-way classical communications.

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.13584 [pdf, ps, other]

On Strong Converse Theorems for Quantum Hypothesis Testing and Channel Coding

Authors: Hao-Chung Cheng, Li Gao

Abstract: Strong converse theorems refer to the study of impossibility results in information theory. In particular, Mosonyi and Ogawa established a one-shot strong converse bound for quantum hypothesis testing [Comm. Math. Phys, 334(3), 2014], which serves as a primitive tool for establishing a variety of tight strong converse theorems in quantum information theory. In this short note, we demonstrate an alternative one-line proof for this bound via the variational expression of measured Rényi divergences [Lett. Math. Phys, 107(12), 2017]. Then, we show that the variational expression is a direct consequence of Hölder's inequality.

Submitted 20 March, 2024; originally announced March 2024.

Comments: one-shot strong converse bound by Mosonyi and Ogawa [arXiv:1309.3228], variational expression by Berta, Fawzi, and Tomamichel [arXiv:1512.02615]

arXiv:2403.13112 [pdf, other]

Efficient Encoder-Decoder Transformer Decoding for Decomposable Tasks

Authors: Bo-Ru Lu, Nikita Haduong, Chien-Yu Lin, Hao Cheng, Noah A. Smith, Mari Ostendorf

Abstract: Transformer-based NLP models are powerful but have high computational costs that limit deployment. Finetuned encoder-decoder models are popular in specialized domains and can outperform larger more generalized decoder-only models, such as GPT-4. We introduce a new configuration for encoder-decoder models that improves efficiency on structured output and decomposable tasks where multiple outputs are required for a single shared input. Our method, prompt-in-decoder (PiD), encodes the input once and decodes the output in parallel, boosting both training and inference efficiency by avoiding duplicate input encoding and increasing the operational intensity (ratio of the number of arithmetic operations to memory accesses) of the decoding process by sharing the input key-value cache. We achieve computation reduction that roughly scales with the number of subtasks, gaining up to 4.6x speed-up over state-of-the-art models for dialogue state tracking, summarization, and question-answering tasks, with comparable or better performance.

Submitted 23 May, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: 14 pages, 4 figures. https://github.com/boru-roylu/encode-once-and-decode-in-parallel

arXiv:2403.11238 [pdf, other]

JUMBO: Fully Asynchronous BFT Consensus Made Truly Scalable

Authors: Hao Cheng, Yuan Lu, Zhenliang Lu, Qiang Tang, Yuxuan Zhang, Zhenfeng Zhang

Abstract: Recent progress in asynchronous Byzantine fault-tolerant (BFT) consensus, e.g. Dumbo-NG (CCS' 22) and Tusk (EuroSys' 22), shows promising performance through decoupling transaction dissemination and block agreement. However, when executed with a larger number $n$ of nodes, like several hundreds, they would suffer from significant degradation in performance. Their dominating scalability bottleneck is the huge authenticator complexity: each node has to multicast $O(n)$ quorum certificates (QCs) and subsequently verify them for each block. This paper systematically investigates and resolves the above scalability issue. We first propose a signature-free asynchronous BFT consensus FIN-NG that adapts a recent signature-free asynchronous common subset protocol FIN (CCS' 23) into the state-of-the-art framework of concurrent broadcast and agreement. The liveness of FIN-NG relies on our non-trivial redesign of FIN's multi-valued validated Byzantine agreement towards achieving optimal quality. FIN-NG greatly improves the performance of FIN and already outperforms Dumbo-NG in most deployment settings. To further overcome the scalability limit of FIN-NG due to $O(n^3)$ messages, we propose JUMBO, a scalable instantiation of Dumbo-NG, with only $O(n^2)$ complexities for both authenticators and messages. We use various aggregation and dispersal techniques for QCs to significantly reduce the authenticator complexity of original Dumbo-NG implementations by up to $O(n^2)$ orders. We also propose a "fairness" patch for JUMBO, thus preventing a flooding adversary from controlling an overwhelming portion of transactions in its output.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.10628 [pdf, other]

A Terahertz Bandwidth Nonmagnetic Isolator

Authors: Haotian Cheng, Yishu Zhou, Freek Ruesink, Margaret Pavlovich, Shai Gertler, Andrew L. Starbuck, Andrew J. Leenheer, Andrew T. Pomerene, Douglas C. Trotter, Christina Dallo, Matthew Boady, Katherine M. Musick, Michael Gehl, Ashok Kodigala, Matt Eichenfield, Anthony L. Lentine, Nils T. Otterstrom, Peter T. Rakich

Abstract: Integrated photonics could bring transformative breakthroughs in computing, networking, imaging, sensing, and quantum information processing, enabled by increasingly sophisticated optical functionalities on a photonic chip. However, wideband optical isolators, which are essential for the robust operation of practically all optical systems, have been challenging to realize in integrated form due to the incompatibility of magnetic media with these circuit technologies. Here, we present the first-ever demonstration of an integrated non-magnetic optical isolator with terahertz-level optical bandwidth. The system is comprised of two acousto-optic frequency-shifting beam splitters which create a non-reciprocal multimode interferometer exhibiting high-contrast, nonreciprocal light transmission. We dramatically enhance the isolation bandwidth of this system by precisely dispersion balancing the paths of the interferometer. Using this approach, we demonstrate integrated nonmagnetic isolators with an optical contrast as high as 28 dB, insertion losses as low as -2.16 dB, and optical bandwidths as high as 2 THz (16 nm). We also show that the center frequency and direction of optical isolation are rapidly reconfigurable by tuning the relative phase of the microwave signals used to drive the acousto-optic beam splitters. With their CMOS compatibility, wideband operation, low losses, and rapid reconfigurability, such integrated isolators could address a key barrier to the integration of a wide range of photonic functionalities on a chip. Looking beyond the current demonstration, this bandwidth-scalable approach to nonmagnetic isolation opens the door to ultrawideband (>10 THz) isolators, which are needed to shrink state-of-the-art imaging, sensing, and communications systems into photonic integrated circuits.

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.10064 [pdf, other]

Progressive Divide-and-Conquer via Subsampling Decomposition for Accelerated MRI

Authors: Chong Wang, Lanqing Guo, Yufei Wang, Hao Cheng, Yi Yu, Bihan Wen

Abstract: Deep unfolding networks (DUN) have emerged as a popular iterative framework for accelerated magnetic resonance imaging (MRI) reconstruction. However, conventional DUN aims to reconstruct all the missing information within the entire null space in each iteration. Thus it could be challenging when dealing with highly ill-posed degradation, usually leading to unsatisfactory reconstruction. In this work, we propose a Progressive Divide-And-Conquer (PDAC) strategy, aiming to break down the subsampling process in the actual severe degradation and thus perform reconstruction sequentially. Starting from decomposing the original maximum-a-posteriori problem of accelerated MRI, we present a rigorous derivation of the proposed PDAC framework, which could be further unfolded into an end-to-end trainable network. Specifically, each iterative stage in PDAC focuses on recovering a distinct moderate degradation according to the decomposition. Furthermore, as part of the PDAC iteration, such decomposition is adaptively learned as an auxiliary task through a degradation predictor which provides an estimation of the decomposed sampling mask. Following this prediction, the sampling mask is further integrated via a severity conditioning module to ensure awareness of the degradation severity at each stage. Extensive experiments demonstrate that our proposed method achieves superior performance on the publicly available fastMRI and Stanford2D FSE datasets in both multi-coil and single-coil settings.

Submitted 15 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024

arXiv:2403.09707 [pdf]

Understanding data analysis aspects of TMS-EEG in clinical study: a mini review and a case study with open dataset

Authors: Hua Cheng

Abstract: Concurrent transcranial magnetic stimulation with electroencephalography (TMS-EEG) is a powerful and challenging methodology for basic research and clinical applications. Aspects considered in experiments for effective TMS-EEG recordings and analysis include artifact management, data analysis and interpretation, and protocols. The mini review offers extensive insight into TMS-EEG methodology in experimental and computational procedures. The case study aims to leverage an openly available, high-quality EEG dataset to delve into the alterations in cortical activity. By applying intermittent theta-burst stimulation (iTBS) and continuous theta-burst stimulation (cTBS) to the left dorsolateral prefrontal cortex (DLPFC) in healthy individuals, we observe changes in oscillatory patterns within the EEG data. The dataset includes meticulously extracted resting-state EEG recordings, TMS-evoked potential data, and MRI scans. To process these data, we utilized Brainstorm, an open-source Matlab application, which facilitated noise reduction through independent component analysis and signal-space projection techniques. It allowed us to identify, visualize, and analyze TMS-evoked potentials (TEPs) and TMS-induced oscillations (TIOs). In addition, the study presents detailed plots of resting-state EEG power, local mean field power (LMFP), TMS-related spectral perturbation (TSRP), and inter-trial phase clustering (ITPC). Paired t-tests and cluster-based permutation tests have been performed for statistical analysis.
The wealth and quality of this dataset make it ideal for examining the neuromodulatory impact of TBS on the prefrontal cortex. Brainstorm's extensive feature set greatly supports the exploration of such neurological data. Future research directions could concentrate on conducting source localization analyses and comparative group studies.

Submitted 9 March, 2024; originally announced March 2024.

Comments: 39 pages, 36 figures, TMS-EEG data analysis

arXiv:2403.08857 [pdf, other]

DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation

Authors: Minbin Huang, Yanxin Long, Xinchi Deng, Ruihang Chu, Jiangfeng Xiong, Xiaodan Liang, Hong Cheng, Qinglin Lu, Wei Liu

Abstract: Text-to-image (T2I) generation models have significantly advanced in recent years. However, effective interaction with these models is challenging for average users due to the need for specialized prompt engineering knowledge and the inability to perform multi-turn image generation, hindering a dynamic and iterative creation process. Recent attempts have tried to equip Multi-modal Large Language Models (MLLMs) with T2I models to bring the user's natural language instructions into reality. Hence, the output modality of MLLMs is extended, and the multi-turn generation quality of T2I models is enhanced thanks to the strong multi-modal comprehension ability of MLLMs. However, many of these works face challenges in identifying correct output modalities and generating coherent images accordingly as the number of output modalities increases and the conversations go deeper. Therefore, we propose DialogGen, an effective pipeline to align off-the-shelf MLLMs and T2I models to build a Multi-modal Interactive Dialogue System (MIDS) for multi-turn Text-to-Image generation. It is composed of drawing prompt alignment, careful training data curation, and error correction. Moreover, as the field of MIDS flourishes, comprehensive benchmarks are urgently needed to evaluate MIDS fairly in terms of output modality correctness and multi-modal output coherence.
To address this issue, we introduce the Multi-modal Dialogue Benchmark (DialogBen), a comprehensive bilingual benchmark designed to assess the ability of MLLMs to generate accurate and coherent multi-modal content that supports image editing. It contains two evaluation metrics to measure the model's ability to switch modalities and the coherence of the output images. Our extensive experiments on DialogBen and user study demonstrate the effectiveness of DialogGen compared with other State-of-the-Art models.

Submitted 3 July, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

Comments: Project page: https://hunyuan-dialoggen.github.io/

Showing 1–50 of 1,430 results for author: Cheng, H