
Strong quantum nonlocality without entanglement in every $(n-1)$-partition

Authors: Huaqi Zhou, Ting Gao, Fengli Yan

Abstract: Orthogonal product sets that are locally irreducible in every bipartition have the strongest nonlocality but also require a large number of quantum states. In this paper, we construct orthogonal product sets with strong quantum nonlocality in every possible $n$-partite system with $n>3$. Rigorous proofs show that these sets are locally irreducible in every $(n-1)$-partition. They not only exhibit a property stronger than nonlocality while using fewer quantum states than the strongest nonlocal sets, but also give a positive answer to the open question "how to construct different strength nonlocality of orthogonal product states for general multipartite and high-dimensional quantum systems" posed by Zhang et al. [Phys. Rev. A 99, 062108 (2019)]. Our results also deepen the understanding of nonlocality without entanglement.
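To make "orthogonal product set" concrete, the sketch below uses the well-known two-qutrit "domino" basis of Bennett et al. (a classic bipartite example, not the $n$-partite construction of this paper) and checks that its nine product states are mutually orthogonal; for product states the overlap factorises into a product of local overlaps. It also counts the partitions involved: an $n$-party system has $S(n,2)=2^{n-1}-1$ bipartitions but only $S(n,n-1)=\binom{n}{2}$ $(n-1)$-partitions, where $S$ is the Stirling number of the second kind. The names `DOMINO` and `stirling2` are illustrative.

```python
from itertools import product
from math import isclose, sqrt

S = 1 / sqrt(2)
E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # qutrit basis |0>, |1>, |2>

def combo(i, j, sign):
    """(|i> + sign*|j>)/sqrt(2) as a real 3-vector."""
    return [S * (E[i][k] + sign * E[j][k]) for k in range(3)]

# Domino states: each entry is a (party A, party B) pair of local vectors.
DOMINO = [(E[1], E[1])]
for sign in (+1, -1):
    DOMINO += [
        (E[0], combo(0, 1, sign)),
        (E[2], combo(1, 2, sign)),
        (combo(1, 2, sign), E[0]),
        (combo(0, 1, sign), E[2]),
    ]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def product_overlap(x, y):
    """<x|y> for product states: the product of the local overlaps."""
    result = 1.0
    for u, v in zip(x, y):
        result *= dot(u, v)
    return result

def is_orthonormal(states):
    return all(
        isclose(product_overlap(x, y), 1.0 if i == j else 0.0, abs_tol=1e-12)
        for (i, x), (j, y) in product(enumerate(states), repeat=2)
    )

def stirling2(n, k):
    """Number of ways to split n parties into exactly k nonempty groups."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)
```

For example, with $n=5$ parties there are `stirling2(5, 2)` = 15 bipartitions but only `stirling2(5, 4)` = 10 partitions into four groups.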

Submitted 3 July, 2024; originally announced July 2024.

Comments: 13 pages, 6 figures

arXiv:2407.01548 [pdf, ps, other]

From Cognition to Computation: A Comparative Review of Human Attention and Transformer Architectures

Authors: Minglu Zhao, Dehong Xu, Tao Gao

Abstract: Attention is a cornerstone of human cognition that facilitates the efficient extraction of information in everyday life. Recent developments in artificial intelligence like the Transformer architecture also incorporate the idea of attention in model designs. However, despite the shared fundamental principle of selectively attending to information, human attention and the Transformer model display notable differences, particularly in their capacity constraints, attention pathways, and intentional mechanisms. Our review aims to provide a comparative analysis of these mechanisms from a cognitive-functional perspective, thereby shedding light on several open research questions. The exploration encourages interdisciplinary efforts to derive insights from human attention mechanisms in the pursuit of developing more generalized artificial intelligence.
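On the Transformer side of the comparison, the core operation is standard scaled dot-product attention, $\mathrm{softmax}(QK^\top/\sqrt{d})\,V$: each query distributes a fixed unit of weight over all keys. The pure-Python sketch below is a generic single-head version without masking or learned projections; it is not code from the review.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [x / total for x in exps]

def attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d)) V for lists of row vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # non-negative, sums to 1 across keys
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

Because the weights always sum to one, attending more to one key necessarily means attending less to the others, which is one simple sense in which this mechanism has a "capacity" budget.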

Submitted 25 April, 2024; originally announced July 2024.

arXiv:2406.19247 [pdf, other]

Local Manifold Learning for No-Reference Image Quality Assessment

Authors: Timin Gao, Wensheng Pan, Yan Zhang, Sicheng Zhao, Shengchuan Zhang, Xiawu Zheng, Ke Li, Liujuan Cao, Rongrong Ji

Abstract: Contrastive learning has considerably advanced the field of Image Quality Assessment (IQA), emerging as a widely adopted technique. The core mechanism of contrastive learning involves minimizing the distance between quality-similar (positive) examples while maximizing the distance between quality-dissimilar (negative) examples. Despite its successes, current contrastive learning methods often neglect the importance of preserving the local manifold structure. This oversight can result in a high degree of similarity among hard examples within the feature space, thereby impeding effective differentiation and assessment. To address this issue, we propose an innovative framework that integrates local manifold learning with contrastive learning for No-Reference Image Quality Assessment (NR-IQA). Our method begins by sampling multiple crops from a given image and identifying the most visually salient crop. This crop is then used to cluster other crops from the same image as the positive class, while crops from different images are treated as negative classes to increase inter-class distance. Uniquely, our approach also treats non-salient crops from the same image as intra-class negative classes to preserve their distinctiveness. Additionally, we employ a mutual learning framework, which further enhances the model's ability to adaptively learn and identify visually salient regions.
Our approach outperforms state-of-the-art methods on 7 standard datasets, achieving PLCC values of 0.942 on TID2013 (compared to 0.908) and 0.914 on LIVEC (compared to 0.894).
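PLCC, the metric quoted above, is the Pearson linear correlation between a model's predicted quality scores and subjective (MOS) scores. A minimal sketch follows. Many IQA papers fit a nonlinear (for example logistic) mapping to the predictions before computing PLCC; this sketch omits that step, and the exact protocol used in this paper is not stated in the abstract.

```python
import math

def plcc(predicted, subjective):
    """Pearson linear correlation between predicted and subjective scores."""
    n = len(predicted)
    mx = sum(predicted) / n
    my = sum(subjective) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(predicted, subjective))
    sx = math.sqrt(sum((x - mx) ** 2 for x in predicted))
    sy = math.sqrt(sum((y - my) ** 2 for y in subjective))
    return cov / (sx * sy)
```

For instance, `plcc([1, 2, 3, 4], [1, 3, 2, 4])` gives 0.8: the predictions track the subjective scores linearly but swap one adjacent pair.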

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.19080 [pdf, other]

$G_q$-concurrence and entanglement constraints in multiqubit systems

Authors: Hui Li, Ting Gao, Fengli Yan

Abstract: In this paper, we introduce a family of one-parameter bipartite entanglement quantifiers, termed $G_q$-concurrence ($q>1$), and show rigorously that they satisfy all the axiomatic conditions of an entanglement measure and can be regarded as a generalization of concurrence. In addition, we establish an analytic formula relating $G_q$-concurrence to concurrence for $1<q\leq2$ in two-qubit systems. Furthermore, we present a polygamy relation based on the $G_q$-concurrence of assistance in multiqubit systems. $G_q$-concurrence ($1<q\leq2$) itself does not obey the monogamy relation, but we prove that its square does. By means of this monogamy inequality, we construct a set of entanglement indicators that can detect genuinely multiqubit entangled states even when the tangle loses its efficacy.
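The monogamy relation discussed above generalises the Coffman-Kundu-Wootters (CKW) inequality $C^2_{A|BC}\ge C^2_{AB}+C^2_{AC}$ for ordinary concurrence. The sketch below checks that inequality (ordinary concurrence, not the $G_q$ version) for the three-qubit GHZ and W states. It assumes the two-qubit reduced states are X states (true for GHZ and W), so it can use the closed-form X-state concurrence rather than the general Wootters formula; it is not valid for arbitrary states. The residual (the tangle) is 1 for GHZ and 0 for W, which illustrates how the tangle can miss genuine three-qubit entanglement.

```python
import math

def conj(z):
    return complex(z).conjugate()

def rho_ab(psi):
    """Trace out qubit C. psi is indexed 4a+2b+c; the output basis is indexed 2a+b."""
    return [[sum(psi[2 * i + c] * conj(psi[2 * j + c]) for c in range(2))
             for j in range(4)] for i in range(4)]

def rho_ac(psi):
    """Trace out qubit B. The output basis is indexed 2a+c."""
    def idx(i, b):
        return 4 * (i // 2) + 2 * b + i % 2
    return [[sum(psi[idx(i, b)] * conj(psi[idx(j, b)]) for b in range(2))
             for j in range(4)] for i in range(4)]

def rho_a(psi):
    """Trace out qubits B and C."""
    return [[sum(psi[4 * x + k] * conj(psi[4 * y + k]) for k in range(4))
             for y in range(2)] for x in range(2)]

def x_state_concurrence(r):
    """Concurrence of a two-qubit X state (only diagonal and anti-diagonal entries nonzero)."""
    d = [r[k][k].real for k in range(4)]
    return 2 * max(0.0,
                   abs(r[0][3]) - math.sqrt(d[1] * d[2]),
                   abs(r[1][2]) - math.sqrt(d[0] * d[3]))

def residual_tangle(psi):
    """C^2_{A|BC} - C^2_{AB} - C^2_{AC}; the CKW inequality says this is >= 0."""
    ra = rho_a(psi)
    one_tangle = 4 * (ra[0][0] * ra[1][1] - ra[0][1] * ra[1][0]).real  # 4 det(rho_A) for a pure state
    return (one_tangle
            - x_state_concurrence(rho_ab(psi)) ** 2
            - x_state_concurrence(rho_ac(psi)) ** 2)

S3 = 1 / math.sqrt(3)
W = [0, S3, S3, 0, S3, 0, 0, 0]      # (|001> + |010> + |100>)/sqrt(3)
G = 1 / math.sqrt(2)
GHZ = [G, 0, 0, 0, 0, 0, 0, G]       # (|000> + |111>)/sqrt(2)
```

For W, each pair has concurrence 2/3 and the inequality is saturated ($8/9 = 4/9 + 4/9$), so the tangle is zero even though the state is genuinely entangled.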

Submitted 27 June, 2024; originally announced June 2024.

Comments: 11 pages, 7 figures

arXiv:2406.15686 [pdf, other]

The Case for Transport-Level Encryption in Datacenter Networks

Authors: Tianyi Gao, Xinshu Ma, Suhas Narreddy, Eugenio Luo, Steven W. D. Chien, Michio Honda

Abstract: Cloud applications need network data encryption to isolate themselves from other tenants and to protect their data from potential eavesdroppers in the network infrastructure. This paper presents SDP, a protocol design for emerging datacenter transport protocols, such as pHost, NDP, and Homa, that integrates data encryption with existing NIC offloading of cryptographic operations designed for TLS over TCP. SDP could therefore enable a deployment path for new transport protocols in datacenters without giving up hardware offloading support, the lack of which would otherwise make encryption on those protocols even slower than TLS over TCP. SDP is based on Homa and outperforms TLS over TCP by up to 29% in throughput. SDP currently supports two real-world applications: Redis, where it improves throughput by up to 24%, and in-kernel NVMe-oF, where it cuts P99 latency by up to 21%.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.10462 [pdf, other]

CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation

Authors: Wei Chen, Lin Li, Yongqi Yang, Bin Wen, Fan Yang, Tingting Gao, Yu Wu, Long Chen

Abstract: Interleaved image-text generation has emerged as a crucial multimodal task, which aims to create sequences of interleaved visual and textual content given a query. Despite notable advancements in recent multimodal large language models (MLLMs), generating integrated image-text sequences that exhibit narrative coherence and entity and style consistency remains challenging due to poor training data quality. To address this gap, we introduce CoMM, a high-quality Coherent interleaved image-text MultiModal dataset designed to enhance the coherence, consistency, and alignment of generated multimodal content. CoMM first harnesses raw data from diverse sources, focusing on instructional content and visual storytelling, which establishes a foundation for coherent and consistent content. To further refine the data quality, we devise a multi-perspective filter strategy that leverages advanced pre-trained models to ensure the development of sentences, consistency of inserted images, and semantic alignment between them. Various quality evaluation metrics are designed to verify the high quality of the filtered dataset. Meanwhile, extensive few-shot experiments on various downstream tasks demonstrate CoMM's effectiveness in significantly enhancing the in-context learning capabilities of MLLMs. Moreover, we propose four new tasks to evaluate MLLMs' interleaved generation abilities, supported by a comprehensive evaluation framework. We believe CoMM opens a new avenue for advanced MLLMs with superior multimodal in-context learning and understanding ability.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 22 pages

arXiv:2405.14705 [pdf, other]

Learning Multi-dimensional Human Preference for Text-to-Image Generation

Authors: Sixian Zhang, Bohan Wang, Junqiang Wu, Yan Li, Tingting Gao, Di Zhang, Zhongyuan Wang

Abstract: Current evaluation of text-to-image models typically relies on statistical metrics, which inadequately represent real human preferences. Although recent work attempts to learn these preferences via human-annotated images, it reduces the rich tapestry of human preference to a single overall score. However, preference results vary when humans evaluate images from different aspects. Therefore, to learn multi-dimensional human preferences, we propose the Multi-dimensional Preference Score (MPS), the first multi-dimensional preference scoring model for the evaluation of text-to-image models. MPS adds a preference condition module on top of the CLIP model to learn these diverse preferences. It is trained on our Multi-dimensional Human Preference (MHP) Dataset, which comprises 918,315 human preference choices across four dimensions (i.e., aesthetics, semantic alignment, detail quality and overall assessment) on 607,541 images. The images are generated by a wide range of the latest text-to-image models. MPS outperforms existing scoring methods across 3 datasets in 4 dimensions, making it a promising metric for evaluating and improving text-to-image generation.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.09394 [pdf, other]

SA-FedLora: Adaptive Parameter Allocation for Efficient Federated Learning with LoRA Tuning

Authors: Yuning Yang, Xiaohong Liu, Tianrun Gao, Xiaodong Xu, Guangyu Wang

Abstract: Fine-tuning large-scale pre-trained models via transfer learning has emerged as an important paradigm for a wide range of downstream tasks, with performance heavily reliant on extensive data. Federated learning (FL), as a distributed framework, provides a secure solution to train models on local datasets while safeguarding raw sensitive data. However, FL networks incur high communication costs due to the massive number of parameters in large-scale pre-trained models, necessitating parameter-efficient methods. Notably, parameter-efficient fine-tuning methods such as Low-Rank Adaptation (LoRA) have shown remarkable success in fine-tuning pre-trained models. However, prior research indicates that a fixed parameter budget may be prone to overfitting or slower convergence. To address this challenge, we propose a Simulated Annealing-based Federated Learning with LoRA tuning (SA-FedLoRA) approach that reduces the number of trainable parameters. Specifically, SA-FedLoRA comprises two stages: initiating and annealing. (1) In the initiating stage, we implement a parameter regularization approach during the early rounds of aggregation, aiming to mitigate client drift and accelerate convergence for the subsequent tuning. (2) In the annealing stage, we allocate a higher parameter budget during the early 'heating' phase and then gradually shrink the budget until the 'cooling' phase. This strategy not only facilitates convergence to the global optimum but also reduces communication costs.
Experimental results demonstrate that SA-FedLoRA is an efficient FL method, achieving superior performance to FedAvg and significantly reducing communication parameters by up to 93.62%.
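For scale: a LoRA adapter on a $d_{out}\times d_{in}$ weight trains $B\in\mathbb{R}^{d_{out}\times r}$ and $A\in\mathbb{R}^{r\times d_{in}}$, i.e. $r(d_{out}+d_{in})$ parameters instead of $d_{out}d_{in}$, and in FL these are what clients exchange each round. The abstract does not give SA-FedLoRA's actual schedule, so `annealed_rank` below is a hypothetical stand-in (constant rank while "heating", then linear shrinkage toward a minimum) used only to show how a shrinking budget reduces total communication.

```python
def lora_params(d_out, d_in, rank):
    """Trainable parameters in one LoRA adapter: B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)

def annealed_rank(round_idx, total_rounds, r_max, r_min, heat_frac=0.3):
    """HYPOTHETICAL schedule (not from the paper): hold r_max for the first heat_frac
    of rounds, then shrink linearly so the final round uses r_min."""
    heat_rounds = int(heat_frac * total_rounds)
    if round_idx < heat_rounds:
        return r_max
    span = max(1, total_rounds - 1 - heat_rounds)
    progress = min(1.0, (round_idx - heat_rounds) / span)
    return max(r_min, round(r_max - progress * (r_max - r_min)))
```

For a 4096 x 4096 layer, `lora_params(4096, 4096, 8)` is 65,536 parameters versus about 16.8 million for full fine-tuning; summing `lora_params` over an annealed schedule gives less traffic than holding `r_max` for every round.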

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.08856 [pdf, ps, other]

Chiral properties of the nucleon interpolating current and $θ$-dependent observables

Authors: Yohei Ema, Ting Gao, Maxim Pospelov, Adam Ritz

Abstract: We revisit the chiral properties of nucleon interpolating currents, and show that of the two leading order currents $j_1$ and $j_2$, only two linear combinations $j_1\pm j_2$ transform covariantly under the anomalous $U(1)_A$ symmetry. As a result, calculations of quantities which vanish by symmetry in the chiral limit may produce unphysical results if carried out with different linear combinations of the currents. This includes observables such as electric dipole moments, induced by the QCD parameter $θ$, and the $θ$-dependence of the nucleon mass. For completeness, we also exhibit the leading order results for nucleon electric dipole moments ($d_{n,p}$) induced by $θ$, and the nucleon magnetic moments ($μ_{n,p}$), when calculated using QCD sum rules for both the covariant choices of the nucleon interpolating current. The results in each channel, conveniently expressed as the ratios, $d_{n,p}/μ_{n,p}$, are numerically consistent, and reflect the required physical dependence on $θ$.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 10 pages

Report number: UMN-TH-4319/24, FTPI-MINN-24-10

arXiv:2405.07518 [pdf, other]

SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts

Authors: Raghu Prabhakar, Ram Sivaramakrishnan, Darshan Gandhi, Yun Du, Mingran Wang, Xiangyu Song, Kejie Zhang, Tianren Gao, Angela Wang, Karen Li, Yongning Sheng, Joshua Brot, Denis Sokolov, Apurv Vivek, Calvin Leung, Arjun Sabnis, Jiayu Bai, Tuowen Zhao, Mark Gottscho, David Jackson, Mark Luttrell, Manish K. Shah, Edison Chen, Kaizhao Liang, Swayambhoo Jain, et al. (5 additional authors not shown)

Abstract: Monolithic large language models (LLMs) like GPT-4 have paved the way for modern generative AI applications. Training, serving, and maintaining monolithic LLMs at scale, however, remains prohibitively expensive and challenging. The disproportionate increase in the compute-to-memory ratio of modern AI accelerators has created a memory wall, necessitating new methods to deploy AI. Composition of Experts (CoE) is an alternative modular approach that lowers the cost and complexity of training and serving. However, this approach presents two key challenges when using conventional hardware: (1) without fused operations, smaller models have lower operational intensity, which makes high utilization more challenging to achieve; and (2) hosting a large number of models can be either prohibitively expensive or slow when dynamically switching between them. In this paper, we describe how combining CoE, streaming dataflow, and a three-tier memory system scales the AI memory wall. We describe Samba-CoE, a CoE system with 150 experts and a trillion total parameters. We deploy Samba-CoE on the SambaNova SN40L Reconfigurable Dataflow Unit (RDU), a commercial dataflow accelerator architecture that has been co-designed for enterprise inference and training applications. The chip introduces a new three-tier memory system with on-chip distributed SRAM, on-package HBM, and off-package DDR DRAM. A dedicated inter-RDU network enables scaling up and out over multiple sockets.
We demonstrate speedups ranging from 2x to 13x on various benchmarks running on eight RDU sockets compared with an unfused baseline. We show that for CoE inference deployments, the 8-socket RDU Node reduces machine footprint by up to 19x, speeds up model switching time by 15x to 31x, and achieves an overall speedup of 3.7x over a DGX H100 and 6.6x over a DGX A100.
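Operational (arithmetic) intensity is FLOPs per byte of memory traffic; under the roofline model, attainable throughput is $\min(\text{peak}, \text{intensity}\times\text{bandwidth})$. The sketch below compares a GEMM followed by an elementwise op run unfused (the intermediate is written to memory and read back) versus fused. Its traffic model is a simplification that assumes each operand crosses the memory boundary exactly once per kernel and 2-byte elements; it does not model the SN40L's actual three-tier memory system.

```python
def gemm_flops(m, n, k):
    return 2 * m * n * k  # one multiply and one add per multiply-accumulate

def intensity_unfused(m, n, k, bytes_per_elem=2):
    """(m x k) @ (k x n) GEMM, then a separate elementwise op that re-reads and re-writes C."""
    flops = gemm_flops(m, n, k) + m * n
    traffic = (m * k + k * n + m * n) + (m * n + m * n)  # GEMM: read A, B, write C; eltwise: read C, write D
    return flops / (traffic * bytes_per_elem)

def intensity_fused(m, n, k, bytes_per_elem=2):
    """Same work, but the elementwise op is applied before C leaves on-chip memory."""
    flops = gemm_flops(m, n, k) + m * n
    traffic = m * k + k * n + m * n  # read A, B; write D only
    return flops / (traffic * bytes_per_elem)

def attainable_flops(intensity, peak_flops, mem_bandwidth):
    """Roofline: performance is capped by compute or by memory bandwidth."""
    return min(peak_flops, intensity * mem_bandwidth)
```

With `m = 1` (a single token through a 4096 x 4096 layer) the intensity is roughly 1 FLOP/byte, so such kernels are bandwidth-bound on any accelerator whose peak-FLOPs-to-bandwidth ratio is much larger; fusion and larger batches both raise the intensity.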

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.03163 [pdf]

doi 10.1021/acs.jpcc.9b07620

Magnetic Ordering of Ammonium Cations in NH$_4$I, NH$_4$Br and NH$_4$Cl

Authors: Fei Yen, Lei Meng, Tian Gao, Sixia Hu

Abstract: The different types of magnetism arise mainly from how electrons move and interact with each other. In this work, we show how protons (H$^+$) also exhibit magnetic behavior. We measured the magnetic susceptibility of the ammonium halides and identified pronounced increases at 232 K, 233 K and 243 K for NH$_4$I, NH$_4$Br and NH$_4$Cl, respectively, all of which coincide with the geometric ordering of the ammonium cations. Since an extensive literature establishes that the ammonium cations exhibit rotational motion even at the lowest temperatures, we take into account that the orbital motion of the protons carries a magnetic moment and find it to be larger than that of the paired electrons. Consequently, the structural phase transitions are magnetically driven as the system attempts to lift 8-fold energy degeneracies of the proton orbitals via Jahn-Teller distortions. Our findings show that NH$_4^+$ cations are capable of hosting magnetism, which appears to be ubiquitous in ammonia-based molecular solids.

Submitted 6 May, 2024; originally announced May 2024.

Comments: Manuscript + Supporting Information file (19 + 4 pages, 5 + 3 figures). Sorry for not uploading this back in 2020!

Journal ref: J. Phys. Chem. C 123, 23655-23660 (2019)

arXiv:2404.19525 [pdf, other]

MicroDreamer: Zero-shot 3D Generation in $\sim$20 Seconds by Score-based Iterative Reconstruction

Authors: Luxi Chen, Zhengyi Wang, Zihan Zhou, Tingting Gao, Hang Su, Jun Zhu, Chongxuan Li

Abstract: Optimization-based approaches, such as score distillation sampling (SDS), show promise in zero-shot 3D generation but suffer from low efficiency, primarily due to the high number of function evaluations (NFEs) required for each sample. In this paper, we introduce score-based iterative reconstruction (SIR), an efficient and general algorithm mimicking a differentiable 3D reconstruction process to reduce the NFEs. Given a single set of images sampled from a multi-view score-based diffusion model, SIR repeatedly optimizes 3D parameters, unlike the single-step optimization in SDS. With other improvements in training, we present an efficient approach called MicroDreamer that generally applies to various 3D representations and 3D generation tasks. In particular, while retaining comparable performance, MicroDreamer is 5-20 times faster than SDS in generating neural radiance fields and takes about 20 seconds to generate meshes from 3D Gaussian splatting on a single A100 GPU, halving the time of the fastest zero-shot baseline, DreamGaussian. Our code is available at https://github.com/ML-GSAI/MicroDreamer .

Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.19412 [pdf]

Enhancing Robotic Adaptability: Integrating Unsupervised Trajectory Segmentation and Conditional ProMPs for Dynamic Learning Environments

Authors: Tianci Gao

Abstract: We propose a novel framework for enhancing robotic adaptability and learning efficiency, which integrates unsupervised trajectory segmentation with adaptive probabilistic movement primitives (ProMPs). By employing a cutting-edge deep learning architecture that combines autoencoders and Recurrent Neural Networks (RNNs), our approach autonomously pinpoints critical transitional points in continuous, unlabeled motion data, thus significantly reducing dependence on extensively labeled datasets. This innovative method dynamically adjusts motion trajectories using conditional variables, significantly enhancing the flexibility and accuracy of robotic actions under dynamic conditions while also reducing the computational overhead associated with traditional robotic programming methods. Our experimental validation demonstrates superior learning efficiency and adaptability compared to existing techniques, paving the way for advanced applications in industrial and service robotics.
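In the standard ProMP formulation, a 1-D trajectory is $y(t)=\phi(t)^\top w$ with weights $w\sim\mathcal{N}(\mu,\Sigma)$, and adapting it to pass through a new point $y^*$ at phase $t^*$ is Gaussian conditioning: $K=\Sigma\phi/(\phi^\top\Sigma\phi+\sigma_y^2)$, $\mu'=\mu+K(y^*-\phi^\top\mu)$, $\Sigma'=\Sigma-K\phi^\top\Sigma$. The sketch below shows only that generic step; the basis width, the number of basis functions and the observation noise are assumed values, and the paper's conditional-variable mechanism and RNN/autoencoder segmentation are not reproduced.

```python
import math

def rbf_features(t, n_basis=8, width=0.02):
    """Normalised Gaussian basis functions over phase t in [0, 1] (assumed settings)."""
    centers = [i / (n_basis - 1) for i in range(n_basis)]
    raw = [math.exp(-((t - c) ** 2) / (2 * width)) for c in centers]
    total = sum(raw)
    return [r / total for r in raw]

def condition(mu, sigma, t_star, y_star, obs_noise=1e-6):
    """Condition the weight distribution N(mu, sigma) on y(t_star) = y_star."""
    n = len(mu)
    phi = rbf_features(t_star, n)
    s_phi = [sum(sigma[i][j] * phi[j] for j in range(n)) for i in range(n)]  # Sigma @ phi
    denom = sum(p * sp for p, sp in zip(phi, s_phi)) + obs_noise          # phi^T Sigma phi + noise
    gain = [sp / denom for sp in s_phi]
    innovation = y_star - sum(p * m for p, m in zip(phi, mu))
    new_mu = [m + g * innovation for m, g in zip(mu, gain)]
    # Sigma' = Sigma - K (Sigma phi)^T, using the symmetry of Sigma
    new_sigma = [[sigma[i][j] - gain[i] * s_phi[j] for j in range(n)] for i in range(n)]
    return new_mu, new_sigma

def mean_trajectory(mu, ts):
    return [sum(p * m for p, m in zip(rbf_features(t, len(mu)), mu)) for t in ts]
```

Conditioning a zero-mean prior with identity covariance on `y(0.5) = 1.0` pulls the mean trajectory through that point while shrinking the predicted variance there; phases far from 0.5 are barely affected.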

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.19377 [pdf]

doi 10.1038/s41565-024-01666-6

Toroidic phase transitions in a direct-kagome artificial spin ice

Authors: Wen-Cheng Yue, Zixiong Yuan, Peiyuan Huang, Yizhe Sun, Tan Gao, Yang-Yang Lyu, Xuecou Tu, Sining Dong, Liang He, Ying Dong, Xun Cao, Lin Kang, Huabing Wang, Peiheng Wu, Cristiano Nisoli, Yong-Lei Wang

Abstract: Ferrotoroidicity, the fourth form of primary ferroic order, breaks both space and time inversion symmetry. So far, direct observation of ferrotoroidicity in natural materials remains elusive, which impedes the exploration of ferrotoroidic phase transitions. Here, we overcome the limitations of natural materials using an artificial nanomagnet system that can be characterized at the constituent level and at different effective temperatures. We design a nanomagnet array to realize a direct-kagome spin ice. This artificial spin ice exhibits robust toroidal moments and a quasi-degenerate ground state with two distinct low-temperature toroidal phases: ferrotoroidicity and paratoroidicity. Using magnetic force microscopy and Monte Carlo simulation, we demonstrate a phase transition between ferrotoroidicity and paratoroidicity, along with a crossover to a non-toroidal paramagnetic phase. Our quasi-degenerate artificial spin ice in a direct-kagome structure provides a model system for the investigation of magnetic states and phase transitions that are inaccessible in natural materials.
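A common definition of the toroidal moment of a set of point magnetic moments is $\mathbf{t}=\tfrac12\sum_i \mathbf{r}_i\times\mathbf{m}_i$; for in-plane nanomagnets only the out-of-plane component survives. The sketch below treats each nanomagnet as a point dipole on a triangle (an assumption; it is not the paper's analysis code) and shows that moments circulating around the triangle give a nonzero toroidal moment that flips sign under time reversal ($\mathbf{m}\to-\mathbf{m}$), while a radial arrangement gives none. Both arrangements have zero net moment, so the result does not depend on the choice of origin.

```python
import math

def toroidal_moment_z(positions, moments):
    """z-component of t = (1/2) * sum_i r_i x m_i for in-plane positions and moments."""
    return 0.5 * sum(x * my - y * mx for (x, y), (mx, my) in zip(positions, moments))

ANGLES = [math.radians(a) for a in (90, 210, 330)]
TRIANGLE = [(math.cos(a), math.sin(a)) for a in ANGLES]        # unit distance from the centre
TANGENTIAL = [(-math.sin(a), math.cos(a)) for a in ANGLES]     # circulating counter-clockwise
RADIAL = [(math.cos(a), math.sin(a)) for a in ANGLES]          # pointing outward
```

Here each tangential moment contributes $\tfrac12$, giving $t_z = 1.5$ (in units of moment times distance), and reversing every moment gives $-1.5$.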

Submitted 30 April, 2024; originally announced April 2024.

Journal ref: Nature Nanotechnology (2024)

arXiv:2404.16033 [pdf, other]

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

Authors: Timin Gao, Peixian Chen, Mengdan Zhang, Chaoyou Fu, Yunhang Shen, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Xing Sun, Liujuan Cao, Rongrong Ji

Abstract: With the advent of large language models (LLMs) enhanced by the chain-of-thought (CoT) methodology, a visual reasoning problem is usually decomposed into manageable sub-tasks and tackled sequentially with various external tools. However, such a paradigm faces the challenge of potential "determining hallucinations" in decision-making due to insufficient visual information and the limitation of low-level perception tools that fail to provide the abstract summaries necessary for comprehensive reasoning. We argue that combining visual context acquisition and logical reasoning is pivotal for tackling visual reasoning tasks. This paper delves into the realm of multimodal CoT to solve intricate visual reasoning tasks with multimodal large language models (MLLMs) and their cognitive capability. To this end, we propose an innovative multimodal CoT framework, termed Cantor, characterized by a perception-decision architecture. Cantor first acts as a decision generator and integrates visual inputs to analyze the image and problem, ensuring a closer alignment with the actual context. Furthermore, Cantor leverages the advanced cognitive functions of MLLMs to act as multifaceted experts for deriving higher-level information, enhancing the CoT generation process. Our extensive experiments demonstrate the efficacy of the proposed framework, showing significant improvements in multimodal CoT performance across two complex visual reasoning datasets, without necessitating fine-tuning or ground-truth rationales. Project Page: https://ggg0919.github.io/cantor/ .

Submitted 24 April, 2024; originally announced April 2024.

Comments: The project page is available at https://ggg0919.github.io/cantor/

arXiv:2404.15013 [pdf, other]

Quantifying multipartite quantum states by ($k+1$)-partite entanglement measures

Authors: Hui Li, Ting Gao, Fengli Yan

Abstract: In this paper, we investigate how to quantify $n$-particle quantum states from the perspective of $(k+1)$-partite entanglement $(1\leq k\leq n-1)$, which plays an instrumental role in quantum nonlocality and quantum metrology. We put forward two families of entanglement measures, termed $q$-$(k+1)$-PE concurrence $(q>1)$ and $α$-$(k+1)$-PE concurrence $(0\leqα<1)$, respectively. For pure states, they are defined via the minimum of the entanglement. We also provide rigorous proofs that both types of quantifiers fulfill all the requirements of an entanglement measure. In addition, we propose two alternative kinds of entanglement measures, named $q$-$(k+1)$-GPE concurrence $(q>1)$ and $α$-$(k+1)$-GPE concurrence $(0\leqα<1)$, respectively, where the quantification of any pure state is given by the geometric mean of the entanglement under all partitions satisfying the preconditions. Furthermore, we present lower bounds on these measures in terms of the entanglement of the permutationally invariant (PI) part of quantum states and establish the connections among these measures. Moreover, we compare these measures and explain the similarities and differences among them. Finally, for computational convenience, we consider enhanced versions of the above quantifiers that can be used to determine whether a multipartite state is genuinely strong $k$-producible.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 10 pages, 2 figures

arXiv:2404.14949 [pdf, other]

Multi-Modal Prompt Learning on Blind Image Quality Assessment

Authors: Wensheng Pan, Timin Gao, Yan Zhang, Runze Hu, Xiawu Zheng, Enwei Zhang, Yuting Gao, Yutao Liu, Yunhang Shen, Ke Li, Shengchuan Zhang, Liujuan Cao, Rongrong Ji

Abstract: Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly. Currently, leveraging semantic information to enhance IQA is a crucial research direction. Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness. However, the generalist nature of these pre-trained Vision-Language (VL) models often renders them suboptimal for IQA-specific tasks. Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings. Existing prompt-based VL models focus excessively on incremental semantic information from text, neglecting the rich insights available from visual data analysis. This imbalance limits their performance improvements in IQA tasks. This paper introduces an innovative multi-modal prompt-based methodology for IQA. Our approach employs carefully crafted prompts that synergistically mine incremental semantic information from both visual and linguistic data. Specifically, in the visual branch, we introduce a multi-layer prompt structure to enhance the VL model's adaptability. In the text branch, we deploy a dual-prompt scheme that steers the model to recognize and differentiate between scene category and distortion type, thereby refining the model's capacity to assess image quality. Our experimental findings underscore the effectiveness of our method over existing Blind Image Quality Assessment (BIQA) approaches. Notably, it demonstrates competitive performance across various datasets. Our method achieves Spearman Rank Correlation Coefficient (SRCC) values of 0.961 on CSIQ (surpassing 0.946) and 0.941 on KADID (exceeding 0.930), illustrating its robustness and accuracy in diverse contexts.
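The SRCC values quoted above measure how well the ranking of predicted quality scores agrees with the ranking of subjective ratings (for example, mean opinion scores). A minimal sketch of the standard computation in plain Python, with tied values given their average rank; the function names are illustrative and are not taken from the authors' code:

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srcc(predicted, subjective):
    """Spearman rank correlation: the Pearson correlation of the two rank vectors.

    Assumes neither input is constant (a constant input has zero rank variance).
    """
    if len(predicted) != len(subjective) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(predicted), average_ranks(subjective)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

Because only ranks enter the formula, SRCC rewards a model that orders images correctly even if its raw scores are on a different scale from the subjective ratings.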

Submitted 18 May, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.07458 [pdf, other]

I-mode Plasma Confinement Improvement by Real-time Lithium Injection and its Classification on EAST Tokamak

Authors: X. M. Zhong, X. L. Zou, A. D. Liu, Y. T. Song, G. Zhuang, H. Q. Liu, L. Q. Xu, E. Z. Li, B. Zhang, G. Z. Zuo, Z. Wang, C. Zhou, J. Zhang, W. X. Shi, L. T. Gao, S. F. Wang, W. Gao, T. Q. Jia, Q. Zang, H. L. Zhao, M. Wang, H. D. Xu, X. J. Wang, X. Gao, X. D. Lin, et al. (3 additional authors not shown)

Abstract: I-mode is a promising regime for future fusion reactors owing to its high energy confinement and moderate particle confinement. However, the effect of lithium, which has been widely used for particle recycling and impurity control, on I-mode plasma is still unclear. Recently, experiments with real-time lithium powder injection into I-mode plasmas have been carried out on the EAST tokamak. It was found that the confinement performance of the I-mode can be improved by lithium powder injection, which can strongly reduce electron turbulence (ET) and then trigger ion turbulence (IT). Four different regimes of I-mode have been identified in EAST. Type I I-mode plasma is characterized by the weakly coherent mode (WCM) and the geodesic-acoustic mode (GAM). Type II I-mode is characterized by the WCM and the edge temperature ring oscillation (ETRO). Type III I-mode corresponds to plasma in which ETRO, GAM, and WCM coexist. Type IV I-mode denotes plasma with only the WCM, without ETRO or GAM. WCM and ETRO are observed to increase with lithium powder injection, owing to the reduction of ion and electron turbulence and the enhancement of the pedestal electron temperature gradient. EAST experiments demonstrate that lithium powder injection is an effective tool for real-time control and confinement improvement of I-mode plasma.
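The four-type taxonomy above is a decision rule on which edge modes are present. As a sketch, assuming the presence or absence of each mode has already been decided from the diagnostics (that detection step is not described in the abstract and is not shown), the mapping can be written as:

```python
def classify_imode(has_wcm: bool, has_gam: bool, has_etro: bool) -> str:
    """Map observed edge modes to the I-mode type labels listed in the abstract.

    All four listed types include the WCM, so a discharge without it is
    rejected rather than assigned a type.
    """
    if not has_wcm:
        raise ValueError("all four listed I-mode types include the WCM")
    if has_gam and has_etro:
        return "Type III"   # ETRO, GAM and WCM coexist
    if has_gam:
        return "Type I"     # WCM + GAM
    if has_etro:
        return "Type II"    # WCM + ETRO
    return "Type IV"        # WCM only
```

Checking the three-mode case first keeps Types I and II from absorbing discharges in which all three modes coexist.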

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.01960 [pdf, ps, other]

$(n,m,p)$-type quantum network configuration and its nonlocality

Authors: Zan-Jia Li, Ying-Qiu He, Dong Ding, Ming-Xing Yu, Ting Gao, Feng-Li Yan

Abstract: A quantum network that shares entangled sources among distant nodes enables us to distribute entanglement across the network by suitable measurements. Network nonlocality means that the observed correlations do not admit a network model involving local variables emitted from independent sources. In this work, we construct an $(n,m,p)$-type quantum network configuration and then derive the corresponding $n$-local correlation inequalities based on the assumption of independent sources. As a universal acyclic network configuration, it covers most existing network models, such as the typical chain and star networks, and admits both centerless and asymmetric configurations. We then demonstrate the non-$n$-locality of the present network by calculating the violation of the $n$-local inequality with bipartite entangled sources and Pauli measurements.
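The simplest chain network such a configuration covers has two independent sources and three parties (Alice, Bob, Charlie). For orientation, here is a sketch of that bilocal case using the well-known bilocality inequality of Branciard et al., $\sqrt{|I|} + \sqrt{|J|} \le 1$ with $I = \frac14\sum_{x,z}\langle A_x B^0 C_z\rangle$ and $J = \frac14\sum_{x,z}(-1)^{x+z}\langle A_x B^1 C_z\rangle$. This is not the paper's $(n,m,p)$ inequality; the $|\Phi^+\rangle$ sources and the Pauli-based settings below are illustrative assumptions:

```python
import math

R = 1 / math.sqrt(2)
Z = [[1.0, 0.0], [0.0, -1.0]]
X = [[0.0, 1.0], [1.0, 0.0]]


def phi_plus_corr(p, q):
    """<Phi+| p (x) q |Phi+> for real 2x2 observables, i.e. sum_ab p[a][b] q[a][b] / 2."""
    return sum(p[a][b] * q[a][b] for a in range(2) for b in range(2)) / 2


def lin_comb(c1, m1, c2, m2):
    """Return the 2x2 matrix c1*m1 + c2*m2."""
    return [[c1 * m1[i][j] + c2 * m2[i][j] for j in range(2)] for i in range(2)]


def bilocal_value():
    """sqrt|I| + sqrt|J| for the bilocal chain with two independent Phi+ sources.

    Alice and Charlie measure (Z + (-1)^k X)/sqrt(2) for input k; Bob measures
    Z(x)Z (y=0) or X(x)X (y=1) on his two qubits. Because the sources are
    independent and Bob's observables are products, each three-party correlator
    factorizes into two Phi+ correlators.
    """
    outer = [lin_comb(R, Z, R * (-1) ** k, X) for k in range(2)]
    i_val = j_val = 0.0
    for x in range(2):
        for z in range(2):
            i_val += phi_plus_corr(outer[x], Z) * phi_plus_corr(Z, outer[z]) / 4
            j_val += ((-1) ** (x + z) * phi_plus_corr(outer[x], X)
                      * phi_plus_corr(X, outer[z]) / 4)
    return math.sqrt(abs(i_val)) + math.sqrt(abs(j_val))
```

With these settings $I = J = 1/2$, so `bilocal_value()` evaluates to $\sqrt{2} \approx 1.414$, above the bilocal bound of 1.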

Submitted 2 April, 2024; originally announced April 2024.

Comments: 7 pages, 2 figures

arXiv:2403.15538 [pdf, other]

Momentum shift and on-shell constructible massive amplitudes

Authors: Yohei Ema, Ting Gao, Wenqi Ke, Zhen Liu, Kun-Feng Lyu, Ishmam Mahbub

Abstract: We construct tree-level amplitudes for massive particles using on-shell recursion relations based on two classes of momentum shifts: an all-line transverse shift that deforms each momentum by its transverse polarization vector, and a massive BCFW-type shift. We illustrate that these shifts allow us to correctly calculate four-point and five-point amplitudes in massive QED, without the ambiguity associated with contact terms that may arise from a simple "gluing" of lower-point on-shell amplitudes. We discuss various aspects and the applicability of the two shifts, including the large-z behavior and complexity scaling. We show that there exists a "good" all-line transverse shift for all possible little group configurations of the external particles, which can be extended to a broader class of theories with massive particles, such as massive QCD and theories with massive spin-1 particles. The massive BCFW-type shift is simpler, but a "good" shift does not exist for all spin states due to the specific choice of spin axis.
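The on-shell property behind a transverse shift can be checked numerically: if a massive momentum $p$ and a transverse polarization vector $\epsilon$ satisfy $p\cdot\epsilon = 0$ and $\epsilon\cdot\epsilon = 0$, then $(p + z\epsilon)^2 = p^2 = m^2$ for every complex $z$. A minimal sketch for a single leg moving along the third spatial axis; the kinematics and the choice $\epsilon_+ = (0, 1, i, 0)/\sqrt{2}$ are illustrative assumptions, and momentum conservation across all legs, which the all-line shift must also respect, is not modelled:

```python
import math


def minkowski_dot(a, b):
    """Bilinear (not Hermitian) product with metric signature (+, -, -, -)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]


def shifted(p, eps, z):
    """Deform the momentum p by z times the polarization vector eps."""
    return [pi + z * ei for pi, ei in zip(p, eps)]


# Illustrative on-shell massive momentum along the third spatial axis.
m, pz = 1.5, 2.0
E = math.sqrt(m ** 2 + pz ** 2)
p = [E, 0.0, 0.0, pz]

# Transverse polarization: orthogonal to p and null under the bilinear product.
eps_plus = [0.0, 1 / math.sqrt(2), 1j / math.sqrt(2), 0.0]

# The shifted momentum stays on the mass shell for any complex z, e.g.:
# minkowski_dot(shifted(p, eps_plus, 2 - 1j), shifted(p, eps_plus, 2 - 1j)) ~ m**2
```

The bilinear (rather than Hermitian) product matters here: $\epsilon_+$ is null only because $i^2 = -1$ cancels the first transverse component.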

Submitted 22 March, 2024; originally announced March 2024.

Comments: 26 pages, 1 figure and comments are welcome

Report number: UMN-TH-4316/24, FTPI-MINN-24-07

arXiv:2403.15133 [pdf, other]

Observation of sub-Poissonian correlation in spin-orbit coupled polariton vortex pairs at room temperature

Authors: Xiaokun Zhai, Ying Gao, Xuekai Ma, Chunzi Xing, Xiao Wang, Anlian Pan, Marc Assmann, Stefan Schumacher, Tingge Gao

Abstract: Coupling of orbital and spin degrees of freedom gives rise to intriguing physical phenomena in bosonic condensates, such as the formation of stripe phases and domains with vortex arrays. However, robustly locking the spin and orbital degrees of freedom of nonlinear topological objects, such as vortex pairs with sub-Poissonian fluctuation, in bosonic condensates remains challenging. In the present work, we realize a non-equilibrium room-temperature condensate in a liquid crystal (LC) planar photonic microcavity with the perovskite CsPbBr3 as the optically active material. We use the interplay of TE-TM mode splitting and Rashba-Dresselhaus spin-orbit coupling (RDSOC) to realize electrically tunable polariton vortex pairs with locked spin and orbital angular momentum. Remarkably, the difference in counts between opposite-wavevector states shows sub-Poissonian fluctuation, indicating a correlation between the two vortices. Our results are robust against sample imperfections and pave the way to investigating the coupling and locking of correlated vortex orbital and spin degrees of freedom in a quantum fluid of light at room temperature, offering potential for generating complex squeezed states of light for quantum optical information processing with optoelectronic chips.
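Sub-Poissonian fluctuation of a count difference is commonly quantified by normalizing its variance to the shot-noise level: for two independent Poissonian channels, $\mathrm{Var}(N_1 - N_2) = \langle N_1\rangle + \langle N_2\rangle$. A minimal sketch of that normalized variance; the estimator and names are assumptions for illustration, since the abstract does not state how the authors normalized their counts:

```python
from statistics import mean, pvariance


def noise_reduction_factor(counts_k, counts_minus_k):
    """Var(N1 - N2) / (<N1> + <N2>) over paired count records.

    Equals 1 in expectation for independent Poissonian channels; values below 1
    indicate sub-Poissonian fluctuation of the difference.
    """
    if len(counts_k) != len(counts_minus_k) or not counts_k:
        raise ValueError("need paired, non-empty count records")
    diffs = [a - b for a, b in zip(counts_k, counts_minus_k)]
    return pvariance(diffs) / (mean(counts_k) + mean(counts_minus_k))
```

Normalizing by the summed means, rather than by the individual variances, is what makes the Poissonian reference level exactly 1.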

Submitted 5 June, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.11091 [pdf, other]

Multitask frame-level learning for few-shot sound event detection

Authors: Liang Zou, Genwei Yan, Ruoyu Wang, Jun Du, Meng Lei, Tian Gao, Xin Fang

Abstract: This paper focuses on few-shot Sound Event Detection (SED), which aims to automatically recognize and classify sound events with limited samples. However, prevailing methods in few-shot SED predominantly rely on segment-level predictions, which often fail to provide detailed, fine-grained predictions, particularly for events of brief duration. Although frame-level prediction strategies have been proposed to overcome these limitations, these strategies commonly face difficulties with prediction truncation caused by background noise. To alleviate this issue, we introduce an innovative multitask frame-level SED framework. In addition, we introduce TimeFilterAug, a linear timing mask for data augmentation, to increase the model's robustness and adaptability to diverse acoustic environments. The proposed method achieves an F-score of 63.8%, ranking first in the few-shot bioacoustic event detection category of the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2023.
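The reported F-score is the harmonic mean of precision and recall over detected events, $F = 2PR/(P + R)$. A minimal sketch from true-positive, false-positive and false-negative counts; how predicted and reference events are matched (the challenge's own matching criterion) is assumed to have happened beforehand and is not reproduced here:

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Event-based F1: harmonic mean of precision and recall (0.0 if undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, 6 correct detections with 2 false alarms and 4 misses give P = 0.75, R = 0.6 and F of roughly 0.667; truncated predictions tend to lower both terms at once.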

Submitted 17 March, 2024; originally announced March 2024.

Comments: 6 pages, 4 figures, conference

arXiv:2403.10405 [pdf, other]

Action Functional as an Early Warning Indicator in the Space of Probability Measures via Schrödinger Bridge

Authors: Peng Zhang, Ting Gao, Jin Guo, Jinqiao Duan

Abstract: Critical transitions and tipping phenomena between two meta-stable states in stochastic dynamical systems represent an important problem. In this work, we extend the methodology of the traditional Onsager-Machlup action functional, which typically identifies the most probable transition pathway between two meta-stable states, to investigate the evolutionary transition dynamics between two meta-stable invariant sets. To address this, we incorporate a comprehensive framework derived from the Schrödinger bridge and optimal transport. In contrast to existing approaches to early warning indicators, such as statistical analysis, bifurcation theory, information theory, statistical physics, topology, and graph theory, we introduce a novel perspective on early warning signals in the space of probability measures, which enables the development of indicators grounded in action functionals. To validate our framework, we apply this methodology to the Morris-Lecar model, which exhibits repetitive firing in certain neurons resulting from a saddle-node bifurcation on an invariant circle. By varying the applied current, we investigate the transition dynamics between a meta-stable state and a stable invariant set (the limit cycle or homoclinic orbit) within the Morris-Lecar model. Additionally, we analyze real Alzheimer's data from the ADNI database to explore early warning signals indicating the transition from healthy to pre-AD states. This framework not only extends the transition pathway to measures between two specified densities on invariant sets but also demonstrates the potential of early warning indicators or biomarkers in complex diseases.
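For reference, the traditional Onsager-Machlup functional that the paper starts from can be evaluated on a discretized path. For a one-dimensional SDE $dX = b(X)\,dt + \varepsilon\,dW$ it reads $S[x] = \frac12\int \left[(\dot x - b(x))^2/\varepsilon^2 + b'(x)\right] dt$. A minimal sketch; the double-well drift in the usage note is an illustrative assumption, and the paper's extension to probability measures via the Schrödinger bridge is not shown:

```python
def onsager_machlup_action(path, dt, drift, drift_prime, eps):
    """Discretized 1-D Onsager-Machlup action for dX = b(X) dt + eps dW.

    Approximates S[x] = 1/2 * integral of ((x' - b(x))**2 / eps**2 + b'(x)) dt
    with forward differences and midpoint evaluation of b and b'.
    """
    action = 0.0
    for x0, x1 in zip(path[:-1], path[1:]):
        xm = 0.5 * (x0 + x1)           # midpoint of the time step
        velocity = (x1 - x0) / dt      # finite-difference estimate of x'
        action += 0.5 * ((velocity - drift(xm)) ** 2 / eps ** 2 + drift_prime(xm)) * dt
    return action
```

With the double-well drift $b(x) = x - x^3$, a path that rests at the stable equilibrium $x = 1$ for unit time has $b = 0$ and $b' = -2$, so its action is $-1$; the most probable transition path is the one minimizing $S$ between the two wells.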

Submitted 8 May, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: 18 pages

arXiv:2403.07420 [pdf, other]

DragAnything: Motion Control for Anything using Entity Representation

Authors: Weijia Wu, Zhuang Li, Yuchao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, Di Zhang

Abstract: We introduce DragAnything, which utilizes an entity representation to achieve motion control for any object in controllable video generation. Compared with existing motion control methods, DragAnything offers several advantages. First, trajectory-based interaction is more user-friendly, since acquiring other guidance signals (e.g., masks, depth maps) is labor-intensive: users only need to draw a line (trajectory) during interaction. Second, our entity representation serves as an open-domain embedding capable of representing any object, enabling motion control for diverse entities, including the background. Lastly, our entity representation allows simultaneous and distinct motion control for multiple objects. Extensive experiments demonstrate that DragAnything achieves state-of-the-art performance on FVD, FID, and a user study, particularly in object motion control, where our method surpasses previous methods (e.g., DragNUWA) by 26% in human voting.
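Trajectory-based control reduces user input to a drawn polyline, while a video generator needs one target position per frame. As a hypothetical preprocessing sketch (not DragAnything's published code; the function name and the arc-length sampling are assumptions, and the entity representation is not modelled), a drawn stroke could be resampled like this:

```python
import math


def resample_trajectory(points, num_frames):
    """Resample a drawn polyline into num_frames points evenly spaced by arc length."""
    if num_frames < 2 or len(points) < 2:
        raise ValueError("need at least 2 points and 2 frames")
    # Cumulative arc length at each vertex of the stroke.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    out, seg = [], 0
    for k in range(num_frames):
        target = total * k / (num_frames - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out
```

Sampling by arc length gives constant on-screen speed; a real system might instead preserve the timing of the user's stroke.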

Submitted 15 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: The project website is at: https://weijiawu.github.io/draganything_page/ . The code is at: https://github.com/showlab/DragAnything

arXiv:2403.01391 [pdf, other]

Planar two-region multi-partite maximally entangled states

Authors: Yanwen Liang, Fengli Yan, Ting Gao

Abstract: In entanglement theory, there are different ways of deciding whether one state is more entangled than another. The "maximally" entangled states of a multipartite system can be defined from an axiomatic perspective. Depending on the selection criteria, there are many specific types of maximally entangled quantum states, such as absolutely maximally entangled states, planar maximally entangled states, and so on. In this paper we propose a new type of maximally entangled state, the planar two-region multipartite maximally entangled state. The requirement for this maximally entangled state is weaker than that for the absolutely maximally entangled state and differs from that for the planar maximally entangled state. We show that two-region four-partite maximally entangled states exist in 4-qubit and 7-qubit planar systems, although no absolutely maximally entangled state exists in these systems. We prove that planar two-region four-partite maximally entangled states exist in quantum systems with both even and odd numbers of particles. Additionally, starting from some planar two-region four-partite maximally entangled states, we generate new ones. We also provide some important examples of planar two-region multipartite maximally entangled states.

Submitted 2 March, 2024; originally announced March 2024.

Comments: 12 pages, 8 figures

arXiv:2403.00929 [pdf, other]

PRIME: Scaffolding Manipulation Tasks with Behavior Primitives for Data-Efficient Imitation Learning

Authors: Tian Gao, Soroush Nasiriany, Huihan Liu, Quantao Yang, Yuke Zhu

Abstract: Imitation learning has shown great potential for enabling robots to acquire complex manipulation behaviors. However, these algorithms suffer from high sample complexity in long-horizon tasks, where compounding errors accumulate over the task horizons. We present PRIME (PRimitive-based IMitation with data Efficiency), a behavior primitive-based framework designed for improving the data efficiency of imitation learning. PRIME scaffolds robot tasks by decomposing task demonstrations into primitive sequences, followed by learning a high-level control policy to sequence primitives through imitation learning. Our experiments demonstrate that PRIME achieves a significant performance improvement in multi-stage manipulation tasks, with 10-34% higher success rates in simulation over state-of-the-art baselines and 20-48% on physical hardware.

Submitted 10 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.16617 [pdf, other]

Long-Context Language Modeling with Parallel Context Encoding

Authors: Howard Yen, Tianyu Gao, Danqi Chen

Abstract: Extending large language models (LLMs) to process longer inputs is crucial for a wide range of applications. However, the substantial computational cost of transformers and the limited generalization of positional encoding restrict the size of their context window. We introduce Context Expansion with Parallel Encoding (CEPE), a framework that can be applied to any existing decoder-only LLM to extend its context window. CEPE employs a small encoder to process long inputs chunk by chunk, enabling the frozen decoder to utilize additional contexts via cross-attention. CEPE is efficient, generalizable, and versatile: trained with 8K-token documents, it extends the context window of LLAMA-2 to 128K tokens, offering 10x the throughput with only 1/6 of the memory. CEPE yields strong performance on language modeling and in-context learning. CEPE also excels in retrieval-augmented applications, whereas existing long-context models degenerate with retrieved contexts. We further introduce a CEPE variant that can extend the context window of instruction-tuned models using only unlabeled data, and showcase its effectiveness on LLAMA-2-CHAT, leading to a strong instruction-following model that can leverage very long contexts on downstream tasks.
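The chunk-then-cross-attend idea can be sketched in a few lines: the long input is split into fixed-size chunks that a small encoder processes independently, and each decoder position attends over the concatenated encoder states. This is a minimal single-head illustration in plain Python; the chunk size, the toy encoder in the usage note, and all names are assumptions, and CEPE's actual encoder and the cross-attention layers it adds to the frozen decoder are not reproduced:

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def chunk(tokens: Sequence[int], size: int) -> List[Sequence[int]]:
    """Split a long input into consecutive chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def encode_chunks(tokens: Sequence[int], size: int,
                  encoder: Callable[[Sequence[int]], List[Vector]]) -> List[Vector]:
    """Encode each chunk independently (so chunks could run in parallel) and
    concatenate the per-token states into one memory for the decoder."""
    states: List[Vector] = []
    for piece in chunk(tokens, size):
        states.extend(encoder(piece))
    return states


def cross_attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention of one decoder query over encoder states."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max before exp for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]
```

Usage, with a hypothetical toy encoder: `states = encode_chunks(list(range(10)), 4, lambda piece: [[float(t), 1.0] for t in piece])` yields 10 states from chunks of length 4, 4 and 2. Because encoder self-attention only ever spans one chunk, its cost grows linearly with the number of chunks rather than quadratically with the full input length.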

Submitted 11 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: ACL 2024. Code, models, and data are available at https://github.com/princeton-nlp/CEPE. arXiv admin note: text overlap with arXiv:1912.01214 by other authors

arXiv:2402.14073 [pdf, other]

Improving Language Understanding from Screenshots

Authors: Tianyu Gao, Zirui Wang, Adithya Bhaskar, Danqi Chen

Abstract: An emerging family of language models (LMs), capable of processing both text and images within a single visual view, promises to unlock complex tasks such as chart understanding and UI navigation. We refer to these models as screenshot language models. Despite their appeal, existing screenshot LMs substantially lag behind text-only models on language understanding tasks. To close this gap, we adopt a simplified setting where the model inputs are plain-text-rendered screenshots, and we focus on improving the text ability of screenshot LMs. We propose a novel Patch-and-Text Prediction (PTP) objective, which masks and recovers both image patches of screenshots and text within screenshots. We also conduct extensive ablation studies on masking rates and patch sizes, as well as designs for improving training stability. Our pre-trained model, while solely taking visual inputs, achieves performance comparable to BERT on 6 out of 8 GLUE tasks (within 2%) and improves by up to 8% over prior work. Additionally, we extend PTP to train autoregressive screenshot LMs and demonstrate its effectiveness: our models can significantly reduce perplexity by utilizing the screenshot context. Together, we hope our findings can inspire future research on developing powerful screenshot LMs and extending their reach to broader applications.

Submitted 21 February, 2024; originally announced February 2024.

Comments: Our model and code are available at https://github.com/princeton-nlp/PTP

arXiv:2402.04111 [pdf, ps, other]

Vector Approximate Message Passing With Arbitrary I.I.D. Noise Priors

Authors: Mohamed Akrout, Tiancheng Gao, Faouzi Bellili, Amine Mezghani

Abstract: Approximate message passing (AMP) algorithms are devised under the assumption that the measurement noise vector is Gaussian. In this work, we relax this assumption within the vector AMP (VAMP) framework to arbitrary independent and identically distributed (i.i.d.) noise priors. We do so by rederiving the linear minimum mean square error (LMMSE) estimator to accommodate both the noise and signal estimations within the message passing steps of VAMP. Numerical results demonstrate how our proposed algorithm handles non-Gaussian noise models compared with VAMP. This extension to general noise priors enables the use of AMP algorithms in a wider range of engineering applications where non-Gaussian noise models are more appropriate.
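For orientation, the LMMSE estimator has a simple closed form in the scalar case $y = a x + w$, with prior mean $\mu_x$, prior variance $\sigma_x^2$, and zero-mean noise of variance $\sigma_w^2$ independent of $x$: $\hat x = \mu_x + \frac{a\sigma_x^2}{a^2\sigma_x^2 + \sigma_w^2}(y - a\mu_x)$. The estimator depends only on these means and variances; it coincides with the MMSE estimator when both $x$ and $w$ are Gaussian, but for non-Gaussian noise it is generally not MMSE-optimal. This textbook baseline is a sketch, not the authors' generalized VAMP update:

```python
def scalar_lmmse(y, a, mu_x, var_x, var_w):
    """Return (estimate, error variance) of x from the observation y = a*x + w."""
    denom = a * a * var_x + var_w
    gain = a * var_x / denom                  # weight placed on the innovation
    estimate = mu_x + gain * (y - a * mu_x)   # prior mean corrected by the innovation
    error_var = var_x * var_w / denom         # posterior error variance
    return estimate, error_var
```

With unit prior and noise variances, the estimator splits the difference between the prior mean and the observation: `scalar_lmmse(2.0, 1.0, 0.0, 1.0, 1.0)` gives an estimate of 1.0 with error variance 0.5.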

Submitted 6 February, 2024; originally announced February 2024.

Comments: Accepted to the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

arXiv:2402.02462 [pdf, ps, other]

Quantum teleportation based on the elegant joint measurement

Authors: Dong Ding, Ming-Xing Yu, Ying-Qiu He, Hao-Sen Ji, Ting Gao, Feng-Li Yan

Abstract: As a generalization of the well-known Bell state measurement (BSM), the elegant joint measurement (EJM) is a novel two-qubit joint measurement, parameterized by a subtle phase factor $\theta \in [0, \pi/2]$. We explore quantum teleportation based on the EJM, inspired by Gisin's idea that quantum entanglement provides not only the quantum channel but also the quantum joint measurement for quantum teleportation. The resulting teleportation is probabilistic, owing to undesired nonunitary quantum evolution. The present scenario has two interesting features. First, it goes beyond the conventional teleportation scenario, which it includes as a special case. Second, unlike the BSM, which has a single input and four outcomes, it can provide an adjustable input setting or even multiple measurement settings for the sender (or the controller). Moreover, we show in detail feasible quantum circuits that realize the present scenario using a few unitary operations and a nonunitary quantum gate.
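Because the conventional scenario is a special case of the EJM-based one, it may help to see the BSM baseline spelled out. A minimal state-vector sketch of standard teleportation through a $|\Phi^+\rangle$ channel in plain Python; it implements neither the EJM nor the paper's nonunitary gate, and the function names are illustrative:

```python
import math

S = 1 / math.sqrt(2)
# Bell states on (qubit 0, qubit 1); amplitudes indexed by 2*q0 + q1.
BELL = {
    "Phi+": [S, 0, 0, S],
    "Phi-": [S, 0, 0, -S],
    "Psi+": [0, S, S, 0],
    "Psi-": [0, S, -S, 0],
}


def apply_x(v):
    return [v[1], v[0]]


def apply_z(v):
    return [v[0], -v[1]]


# Pauli correction Bob applies for each BSM outcome (X before Z for Psi-).
CORRECTION = {
    "Phi+": lambda v: v,
    "Phi-": apply_z,
    "Psi+": apply_x,
    "Psi-": lambda v: apply_z(apply_x(v)),
}


def teleport(alpha, beta):
    """Teleport alpha|0> + beta|1> (normalized); return {outcome: (probability, fidelity)}."""
    # Three-qubit state (alpha|0> + beta|1>) (x) Phi+ on qubits 1, 2; index 4*q0 + 2*q1 + q2.
    state = [0j] * 8
    for q0, amp in ((0, alpha), (1, beta)):
        state[4 * q0 + 0] += amp * S   # |q0, 0, 0>
        state[4 * q0 + 3] += amp * S   # |q0, 1, 1>
    results = {}
    for name, bell in BELL.items():
        # Bob's unnormalized qubit after projecting qubits 0 and 1 onto this Bell state.
        bob = [sum(bell[i] * state[2 * i + q2] for i in range(4)) for q2 in range(2)]
        prob = abs(bob[0]) ** 2 + abs(bob[1]) ** 2
        fixed = CORRECTION[name](bob)
        overlap = (alpha.conjugate() * fixed[0] + beta.conjugate() * fixed[1]) / math.sqrt(prob)
        results[name] = (prob, abs(overlap) ** 2)
    return results
```

For any normalized input, for example `teleport(0.6, 0.8j)`, each of the four outcomes occurs with probability 1/4 and the corrected state has fidelity 1, which is the deterministic behaviour the EJM-based scheme departs from.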

Submitted 4 February, 2024; originally announced February 2024.

Comments: 8 pages, 3 figures

arXiv:2402.01168 [pdf, other]

Measurement of transverse polarization of $\Lambda$/$\bar{\Lambda}$ within jet in $pp$ collisions at STAR

Authors: Taoya Gao

Abstract: Spontaneous polarization of $\Lambda/\bar{\Lambda}$ in unpolarized hadron interactions has been observed experimentally for nearly half a century and still eludes a definitive explanation. One possible origin is the effect of polarizing fragmentation functions (pFFs), which describe the production of polarized hadrons from the fragmentation of an unpolarized parton. Recently, significant transverse polarization of $\Lambda/\bar{\Lambda}$ has been observed in unpolarized $e^{+}e^{-}$ annihilation at the Belle experiment, along the normal to the plane defined by the thrust axis and the $\Lambda$ momentum. In unpolarized $pp$ collisions, measuring the transverse polarization of $\Lambda/\bar{\Lambda}$ within jets could also provide important constraints on, and a universality test of, the pFFs. In this contribution, preliminary results of the first measurement of $\Lambda/\bar{\Lambda}$ polarization within jets in $pp$ collisions at $\sqrt{s} = 200$ GeV are reported. The data used for this measurement were taken by the STAR experiment at RHIC in 2015.
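Hyperon polarization of this kind is read off from the parity-violating weak decay $\Lambda \to p\pi^-$: in the $\Lambda$ rest frame the proton angular distribution is proportional to $1 + \alpha_\Lambda P \cos\theta^*$, where $\theta^*$ is measured from the polarization axis, so $\langle\cos\theta^*\rangle = \alpha_\Lambda P/3$. A minimal moment-method sketch; the decay parameter must be taken from the PDG (the value in the usage note is only a placeholder), and a real analysis must correct for detector acceptance, which this does not:

```python
def polarization_from_cosines(cos_theta, alpha):
    """Moment estimate of P from dN/dcos(theta*) proportional to 1 + alpha*P*cos(theta*).

    For that distribution <cos(theta*)> = alpha*P/3, so P = 3 <cos(theta*)> / alpha.
    Detector acceptance and efficiency corrections are ignored.
    """
    if not cos_theta:
        raise ValueError("need at least one decay")
    mean_cos = sum(cos_theta) / len(cos_theta)
    return 3.0 * mean_cos / alpha
```

For example, `polarization_from_cosines([0.25, 0.25], alpha=0.75)` returns 1.0, where 0.75 is a placeholder rather than the measured $\alpha_\Lambda$; for $\bar\Lambda$ the corresponding $\bar p\pi^+$ decay parameter would be used instead.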

Submitted 5 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: SPIN2023 proceeding

arXiv:2402.00987 [pdf, other]

Self-Supervised Contrastive Pre-Training for Multivariate Point Processes

Authors: Xiao Shou, Dharmashankar Subramanian, Debarun Bhattacharjya, Tian Gao, Kristin P. Bennet

Abstract: Self-supervision is one of the hallmarks of representation learning in the increasingly popular suite of foundation models, including large language models such as BERT and GPT-3, but to the best of our knowledge it has not been pursued in the context of multivariate event streams. We introduce a new paradigm for self-supervised learning for multivariate point processes using a transformer encoder. Specifically, we design a novel pre-training strategy for the encoder in which we not only mask random event epochs but also insert randomly sampled "void" epochs where an event does not occur; this differs from typical discrete-time pretext tasks such as word masking in BERT and expands the effectiveness of masking to better capture continuous-time dynamics. To improve downstream tasks, we introduce a contrasting module that compares real events to simulated void instances. The pre-trained model can subsequently be fine-tuned on a potentially much smaller event dataset, conceptually similar to the typical transfer of popular pre-trained language models. We demonstrate the effectiveness of our proposed paradigm on the next-event prediction task using synthetic datasets and 3 real applications, observing a relative performance boost of up to 20% compared to state-of-the-art models.
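The pretext task described above differs from word masking in that it also inserts "void" epochs at which nothing happens. A hypothetical sketch of how one such training example could be built from a single sequence of event times; event types are omitted, and the uniform sampling scheme, the rates, and the names are assumptions rather than the authors' settings:

```python
import random


def mask_and_insert_voids(event_times, horizon, mask_rate, num_voids, seed=0):
    """Build (kept, masked, voids) for one continuous-time event sequence.

    A random subset of events (round(mask_rate * n) of them) is hidden, and
    num_voids uniformly sampled times in [0, horizon] at which no event occurs
    are added as void epochs.
    """
    rng = random.Random(seed)
    n_mask = round(mask_rate * len(event_times))
    masked_idx = set(rng.sample(range(len(event_times)), n_mask))
    kept = [t for i, t in enumerate(event_times) if i not in masked_idx]
    masked = [t for i, t in enumerate(event_times) if i in masked_idx]
    occupied = set(event_times)
    voids = []
    while len(voids) < num_voids:
        t = rng.uniform(0.0, horizon)
        if t not in occupied:  # a void epoch must not coincide with a real event
            voids.append(t)
    return kept, masked, sorted(voids)
```

An encoder would then see the kept events plus the void epochs, be trained to recover the masked events, and (in the spirit of the contrasting module) learn to tell real epochs from voids.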

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2402.00330 [pdf, other]

Night-Rider: Nocturnal Vision-aided Localization in Streetlight Maps Using Invariant Extended Kalman Filtering

Authors: Tianxiao Gao, Mingle Zhao, Chengzhong Xu, Hui Kong

Abstract: Vision-aided localization for low-cost mobile robots in diverse environments has attracted widespread attention recently. Although many current systems work in daytime environments, nocturnal visual localization is still an open problem owing to the lack of stable visual information. A key observation about most nocturnal scenes is that static, bright streetlights are reliable visual information for localization. Hence we propose a nocturnal vision-aided localization system in streetlight maps with a novel data association and matching scheme using object detection methods. We leverage the Invariant Extended Kalman Filter (InEKF) to fuse IMU, odometer, and camera measurements for consistent state estimation at night. Furthermore, a tracking recovery module is designed to handle tracking failures. Experimental results indicate that our proposed system achieves accurate and robust localization, with a relative error of less than $0.2\%$ of the trajectory length in four nocturnal environments.
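A relative error with respect to trajectory length normalizes a position error by the distance travelled. The abstract does not say which error is normalized; this sketch assumes the end-point drift, one common convention for odometry, and the function names are illustrative:

```python
import math


def trajectory_length(positions):
    """Total path length of a sequence of (x, y) positions."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))


def relative_error_percent(estimated, ground_truth):
    """End-point drift as a percentage of the ground-truth trajectory length."""
    drift = math.dist(estimated[-1], ground_truth[-1])
    return 100.0 * drift / trajectory_length(ground_truth)
```

For a 200 m ground-truth path, staying under 0.2% in this sense means ending within 0.4 m of the true final position.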

Submitted 3 March, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

arXiv:2401.16254 [pdf, ps, other]

YingLong: Skillful High Resolution Regional Short Term Forecasting with Boundary Smoothing

Authors: Pengbo Xu, Tianyan Gao, Yu Wang, Junping Yin, Juan Zhang, Xiaogu Zheng, Zhimin Zhang, Xiaoguang Hu, Xiaoxu Chen

Abstract: In numerical weather forecasting, achieving higher resolution demands more computational resources and time, and leveraging deep learning networks trained solely on data significantly reduces the time spent during forecasting. Recently, several global artificial-intelligence-based forecasting models have been developed, mainly trained on reanalysis datasets with a spatial resolution of approximately 25 km. However, regional forecasting requires higher spatial resolution, and boundary information for the region also plays an important role in regional forecasting, which turns out to be a major difference from global forecasting. Here we introduce YingLong, a high-resolution, short-term, artificial-intelligence-based regional weather forecasting model capable of hourly prediction of weather fields, including wind speed, temperature, and specific humidity, at 3 km resolution. YingLong uses a parallel structure of global and local blocks to capture multiscale meteorological features and is trained on an analysis dataset. Additionally, the necessary information around the regional boundary is introduced into YingLong through a boundary smoothing strategy, which significantly improves the regional forecasting results. Compared with forecasts from WRF-ARW, one of the best numerical prediction models, YingLong demonstrates superior forecasting performance in most cases, especially for surface variables.

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.06428 [pdf, ps, other]

First Exploration of Monopole-Driven Shell Evolution above the N = 126 shell closure: new Millisecond Isomers in $^{213}$Tl and $^{215}$Tl

Authors: T. T. Yeung, A. I. Morales, J. Wu, M. Liu, C. Yuan, S. Nishimura, V. H. Phong, N. Fukuda, J. L. Tain, T. Davinson, K. P. Rykaczewski, R. Yokoyama, T. Isobe, M. Niikura, Zs. Podolyak, G. Alcala, A. Algora, J. Agramunt, C. Appleton, H. Baba, R. Caballero-Folch, P. Calvino, M. P. Carpenter, I. Dillmann, A. Estrade, et al. (30 additional authors not shown)

Abstract: Isomer spectroscopy of heavy neutron-rich nuclei beyond the $N=126$ closed shell has been performed for the first time at the Radioactive Isotope Beam Factory of the RIKEN Nishina Center. New millisecond isomers have been identified at low excitation energies, 985.3(19) keV in $^{213}$Tl and 874(5) keV in $^{215}$Tl. The measured half-lives of 1.34(5) ms in $^{213}$Tl and 3.0(3) ms in $^{215}$Tl suggest spin and parity $11/2^-$, with the single proton-hole configuration $h_{11/2}$ as the leading component. They are populated via E1 transitions by the decay of higher-lying isomeric states with proposed spin and parity $17/2^+$, interpreted as arising from a single $s_{1/2}$ proton hole coupled to the $8^+$ seniority isomer in the $^{A+1}$Pb cores. The lowering of the $11/2^-$ states is ascribed to an increase of the $h_{11/2}$ proton effective single-particle energy as the second $g_{9/2}$ orbital is filled by neutrons, owing to a significant reduction of the proton-neutron monopole interaction between the $h_{11/2}$ and $g_{9/2}$ orbitals. The new ms-isomers provide the first experimental observation of shell evolution in the almost unexplored $N>126$ nuclear region below doubly-magic $^{208}$Pb.

Submitted 25 April, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

Comments: 9 pages, 3 figures, 1 table

arXiv:2401.03625 [pdf, other]

Optically controllable localization of exciton polariton condensates in a potential lattice

Authors: Qiang Ai, Jan Wingenbach, Xinmiao Yang, Jing Wei, Zaharias Hatzopoulos, Pavlos G. Savvidis, Stefan Schumacher, Xuekai Ma, Tingge Gao

Abstract: Exciton polaritons are inherently non-Hermitian systems with adjustable gain and loss coefficients. In this work we show that exciton polariton condensates can be selectively localized in an optically induced lattice with equal potential depths by judiciously controlling a second, much smaller focused pump. Specifically, the localized polariton condensate can be switched among different potential traps by adjusting the relative distance between the small pump spot and the potential lattice. Adjusting the excitation position of the smaller pump, in combination with the larger pump that creates the potential, induces a position-dependent loss distribution across the system. The localization of the exciton polariton condensate and its control are independent of the orientation of the potential lattice; thus, even in a slightly disordered system, one can selectively excite such localized polariton condensates. Our results illuminate a path to manipulating non-Hermitian bosonic condensates in integrated photonic chips.

Submitted 7 January, 2024; originally announced January 2024.

arXiv:2401.02311 [pdf, other]

Fourier neural operator based fluid-structure interaction for predicting the vesicle dynamics

Authors: Wang Xiao, Ting Gao, Kai Liu, Jinqiao Duan, Meng Zhao

Abstract: Solving complex fluid-structure interaction (FSI) problems, characterized by nonlinear partial differential equations, is crucial in various scientific and engineering applications. Traditional computational fluid dynamics (CFD) solvers are insufficient to meet the growing requirements for large-scale and long-period simulations. Fortunately, the rapid advancement of neural networks, especially neural operators that learn mappings between function spaces, has introduced novel approaches to tackle these challenges via data-driven modeling. In this paper, we propose a Fourier neural operator-based fluid-structure interaction solver (FNO-based FSI solver) for efficient simulation of FSI problems, in which a solid solver based on the finite difference method is seamlessly integrated with the Fourier neural operator, which predicts the incompressible flow, using the immersed boundary method. We analyze the performance of the FNO-based FSI solver in three situations: training data with or without the steady state, training with one-step or multi-step labels, and prediction by interpolation or extrapolation. We find that the best performance for interpolation is achieved by training the operator with multi-step labels using steady-state data. Finally, we train the FNO-based FSI solver using this optimal training method and apply it to vesicle dynamics. The results show that the FNO-based FSI solver is capable of capturing the variations in the fluid and the vesicle.

Submitted 4 January, 2024; originally announced January 2024.
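The FNO-based FSI solver abstract above builds on the Fourier neural operator. As a rough illustration of what a single Fourier layer does, the sketch below is a generic, single-channel NumPy version, not the authors' solver: it transforms the input to frequency space, keeps only the lowest modes, multiplies them by complex weights (learned in a real FNO, fixed here), transforms back, and adds a pointwise linear term before a nonlinearity. The function names, the single-channel simplification, and the choice of ReLU are all assumptions made for illustration.

```python
import numpy as np


def spectral_conv_1d(u: np.ndarray, weights: np.ndarray, n_modes: int) -> np.ndarray:
    """Spectral convolution of a real 1D signal on a uniform periodic grid.

    Only the lowest `n_modes` Fourier modes are kept; each is multiplied by a
    complex weight (learned in a real FNO, fixed here). Higher modes are dropped.
    """
    u_hat = np.fft.rfft(u)                        # physical -> Fourier space
    if n_modes > u_hat.shape[0]:
        raise ValueError("n_modes exceeds the number of available Fourier modes")
    out_hat = np.zeros_like(u_hat)                # truncated modes stay zero
    out_hat[:n_modes] = weights[:n_modes] * u_hat[:n_modes]
    return np.fft.irfft(out_hat, n=u.shape[0])    # Fourier -> physical space


def fourier_layer(u, weights, n_modes, w_local, b):
    """Hypothetical layer: spectral path plus a pointwise linear path, then ReLU."""
    return np.maximum(spectral_conv_1d(u, weights, n_modes) + w_local * u + b, 0.0)


# With unit weights, keeping 4 modes removes the high-frequency sin(10x) term.
x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
u = np.sin(x) + 0.5 * np.sin(10 * x)
smoothed = spectral_conv_1d(u, np.ones(4, dtype=complex), n_modes=4)
```

Truncating to a few modes is what makes the layer resolution-independent and cheap. In a trained operator the weights are tensors over channels rather than a single complex vector.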

arXiv:2401.01184 [pdf]

A high resolution rovibronic molecular cross-section of MgH+ molecular cation

Authors: Huagang Xiao, Tao Gao

Abstract: A high-resolution rovibronic line list of the MgH+ molecular cation is presented in this work. The potential energy curves are calculated by the multireference configuration interaction method with Davidson correction (MRCI+Q), including the spin-orbit coupling (SOC) effect. Spectroscopic constants are fitted, and the results are in good agreement with experiment, confirming the accuracy of the electronic structure. Based on the potential energy curves and transition dipole moments, the Franck-Condon factors and Einstein coefficients of the transitions are obtained. These calculations are used to obtain accurate partition functions and a line list for the molecule. Using the data obtained from the ab initio calculations, the absorption cross-sections at different temperatures and pressures were simulated. Our work could provide some theoretical insights into solar and cold-planet spectra.

Submitted 2 January, 2024; originally announced January 2024.

arXiv:2401.01065 [pdf, other]

BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving

Authors: Tao Tang, Dafeng Wei, Zhengyu Jia, Tian Gao, Changwei Cai, Chengkai Hou, Peng Jia, Kun Zhan, Haiyang Sun, Jingchen Fan, Yixing Zhao, Fu Liu, Xiaodan Liang, Xianpeng Lang, Yang Wang

Abstract: The rapid development of the autonomous driving industry has led to a significant accumulation of autonomous driving data. Consequently, there is a growing demand for retrieving data to support specialized optimization. However, directly applying previous image retrieval methods faces several challenges, such as the lack of global feature representation and inadequate text retrieval ability for complex driving scenes. To address these issues, we first propose the BEV-TSR framework, which leverages descriptive text as input to retrieve corresponding scenes in the Bird's Eye View (BEV) space. Then, to facilitate complex scene retrieval with extensive text descriptions, we employ a large language model (LLM) to extract the semantic features of the text inputs and incorporate knowledge graph embeddings to enhance the semantic richness of the language embedding. To achieve feature alignment between the BEV feature and the language embedding, we propose Shared Cross-modal Embedding, with a set of shared learnable embeddings to bridge the gap between these two modalities, and employ a caption generation task to further enhance the alignment. Furthermore, there is a lack of well-formed retrieval datasets for effective evaluation. To this end, we establish a multi-level retrieval dataset, nuScenes-Retrieval, based on the widely adopted nuScenes dataset. Experimental results on the multi-level nuScenes-Retrieval show that BEV-TSR achieves state-of-the-art performance, e.g., 85.78% and 87.66% top-1 accuracy on scene-to-text and text-to-scene retrieval, respectively.
Code and datasets will be made available.

Submitted 18 June, 2024; v1 submitted 2 January, 2024; originally announced January 2024.
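The BEV-TSR results above are reported as top-1 accuracy in both retrieval directions. The sketch below shows one common way such a number is computed from a query-by-candidate similarity matrix. It assumes paired data, so query i's correct match is candidate i, and it assumes cosine similarity between embeddings. It is not the authors' evaluation code, and the helper names are hypothetical.

```python
import numpy as np


def cosine_similarity(queries: np.ndarray, candidates: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of two embedding matrices."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return q @ c.T


def top1_accuracy(similarity: np.ndarray) -> float:
    """Fraction of queries whose best-scoring candidate is the paired one.

    similarity[i, j] scores query i against candidate j; the ground-truth
    match for query i is assumed to be candidate i (the diagonal).
    """
    predicted = similarity.argmax(axis=1)
    return float((predicted == np.arange(similarity.shape[0])).mean())


# Rows = text queries, columns = scenes (toy scores, not real results).
sim = np.array([[0.9, 0.1, 0.0],
                [0.2, 0.8, 0.3],
                [0.4, 0.5, 0.45]])
text_to_scene = top1_accuracy(sim)     # 2 of 3 rows peak on the diagonal
scene_to_text = top1_accuracy(sim.T)   # transposing swaps the retrieval direction
```

The toy matrix shows why the two directions can differ: the same scores give different argmax results depending on whether rows or columns are treated as queries.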

arXiv:2401.01014 [pdf, other]

Entanglement hierarchies in multipartite scenarios

Authors: Hui Li, Ting Gao, Fengli Yan

Abstract: In this paper, we investigate the hierarchical structure of $n$-partite quantum states. We present a whole set of hierarchical quantifications as a method of characterizing quantum states, which go beyond genuine multipartite entanglement measures and allow for fine identification among distinct entanglement contributions. These quantifications, termed $k$-GM concurrence, can unambiguously classify entangled states into $(n-1)$ distinct classes from the perspective of $k$-nonseparability with $k$ running from $n$ down to 2, and comply with the axiomatic conditions of an entanglement measure. Compared to $k$-ME concurrence [\href{https://journals.aps.org/pra/abstract/10.1103/PhysRevA.86.062323} {Phys. Rev. A \textbf{86}, 062323 (2012)}], the hierarchical measures we propose have advantages in distinguishing entangled states of the same class and in measuring continuity. In addition, we establish the relation between $k$-ME concurrence and $k$-GM concurrence, and further derive a strong lower bound on the $k$-GM concurrence by exploiting the permutationally invariant part of a quantum state. Furthermore, we parametrize $k$-GM concurrence to obtain two more general and complete categories of quantifications, $q$-$k$-GM concurrence $(q>1)$ and $\alpha$-$k$-GM concurrence $(0\leq\alpha<1)$, which also obey the properties enjoyed by $k$-GM concurrence.
In particular, $\alpha$-$2$-GM concurrence $(0<\alpha<1)$ determines that the GHZ state and the $W$ state belong to the same hierarchy, and it is proven in detail to satisfy the requirement that the GHZ state is more entangled than the $W$ state in multiqubit systems.

Submitted 23 May, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

Comments: 10 pages, 3 figures

arXiv:2401.00744 [pdf, other]

Towards Harmonization of SO(3)-Equivariance and Expressiveness: a Hybrid Deep Learning Framework for Electronic-Structure Hamiltonian Prediction

Authors: Shi Yin, Xinyang Pan, Xudong Zhu, Tianyu Gao, Haochong Zhang, Feng Wu, Lixin He

Abstract: Deep learning for predicting the electronic-structure Hamiltonian of quantum systems must satisfy the covariance laws, among which achieving SO(3)-equivariance without sacrificing the non-linear expressive capability of networks remains unsolved. To harmonize equivariance and expressiveness, we propose a deep learning method that synergizes two distinct categories of neural mechanisms in a two-stage encoding and regression framework. The first stage corresponds to group theory-based neural mechanisms with inherent SO(3)-equivariant properties prior to the parameter learning process, while the second stage is characterized by a non-linear 3D graph Transformer network we propose, featuring strong non-linear expressiveness. The novelty of the combination is that the first stage predicts baseline Hamiltonians with abundant SO(3)-equivariant features extracted, assisting the second stage in empirical learning of equivariance; in turn, the second stage refines the first stage's output into a fine-grained prediction of Hamiltonians using powerful non-linear neural mappings, compensating for the intrinsically weak non-linear expressiveness of the first-stage mechanisms. Our method enables precise, generalizable predictions while capturing SO(3)-equivariance under rotational transformations, and achieves state-of-the-art performance in Hamiltonian prediction on six benchmark databases.

Submitted 21 June, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

arXiv:2312.12844 [pdf, other]

Effective Causal Discovery under Identifiable Heteroscedastic Noise Model

Authors: Naiyu Yin, Tian Gao, Yue Yu, Qiang Ji

Abstract: Capturing the underlying structural causal relations represented by Directed Acyclic Graphs (DAGs) has been a fundamental task in various AI disciplines. Causal DAG learning via the continuous optimization framework has recently achieved promising performance in terms of both accuracy and efficiency. However, most methods make the strong assumption of homoscedastic noise, i.e., that exogenous noises have equal variances across variables, observations, or even both. The noises in real data usually violate both assumptions due to the biases introduced by different data collection processes. To address the issue of heteroscedastic noise, we introduce relaxed and implementable sufficient conditions, proving the identifiability of a general class of structural equation models (SEMs) subject to these conditions. Based on the identifiable general SEM, we propose a novel formulation for DAG learning that accounts for the variation in noise variance across variables and observations. We then propose an effective two-phase iterative DAG learning algorithm to address the increasing optimization difficulties and to learn a causal DAG from data with heteroscedastic variable noise under varying variance. We show significant empirical gains of the proposed approaches over state-of-the-art methods on both synthetic data and real data.

Submitted 9 June, 2024; v1 submitted 20 December, 2023; originally announced December 2023.
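To make the homoscedastic versus heteroscedastic distinction in the abstract above concrete, here is a toy simulation of the two-variable linear SEM x1 -> x2 with x2 = 2*x1 + e2. In the homoscedastic case e2 has one shared standard deviation. In the heteroscedastic case its standard deviation grows with |x1|, so the noise variance differs across observations. The model, the coefficients, and the helper names are illustrative assumptions. This is not the identifiability conditions or the learning algorithm proposed in the paper.

```python
import numpy as np


def simulate_chain(n: int, heteroscedastic: bool, seed: int = 0):
    """Sample the toy linear SEM  x1 -> x2  with  x2 = 2*x1 + e2."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(0.0, 1.0, size=n)
    if heteroscedastic:
        scale = 0.5 + np.abs(x1)    # noise std changes from observation to observation
    else:
        scale = np.full(n, 1.0)     # one shared noise std for every observation
    x2 = 2.0 * x1 + rng.normal(0.0, 1.0, size=n) * scale
    return x1, x2


def residual_variance_ratio(x1: np.ndarray, x2: np.ndarray) -> float:
    """Variance of x2 - 2*x1 where |x1| > 1.5, divided by that where |x1| < 0.5."""
    resid = x2 - 2.0 * x1
    return float(resid[np.abs(x1) > 1.5].var() / resid[np.abs(x1) < 0.5].var())


ratio_hom = residual_variance_ratio(*simulate_chain(20000, heteroscedastic=False))
ratio_het = residual_variance_ratio(*simulate_chain(20000, heteroscedastic=True))
```

Under homoscedastic noise the ratio stays near 1. Under heteroscedastic noise it is several times larger, which is the kind of observation-dependent variance that a homoscedastic DAG-learning objective does not model.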

arXiv:2312.10593 [pdf, other]

A Novel RFID Authentication Protocol Based on A Block-Order-Modulus Variable Matrix Encryption Algorithm

Authors: Yan Wang, Ruiqi Liu, Tong Gao, Feng Shu, Xuemei Lei, Guan Gui, Jiangzhou Wang

Abstract: In this paper, authentication for mobile radio frequency identification (RFID) systems with low-cost tags is studied. Firstly, an adaptive modulus (AM) encryption algorithm is proposed. Subsequently, in order to enhance security without additional storage of new key matrices, a self-updating encryption order (SUEO) algorithm is designed. Furthermore, a diagonal block local transpose key matrix (DBLTKM) encryption algorithm is presented, which effectively expands the feasible domain of the key space. Based on the above three algorithms, a novel joint AM-SUEO-DBLTKM encryption algorithm is constructed. Making full use of the advantages of the proposed joint algorithm, a two-way RFID authentication protocol, named AM-SUEO-DBLTKM-RFID, is proposed for mobile RFID systems. In addition, Burrows-Abadi-Needham (BAN) logic and security analysis indicate that the proposed AM-SUEO-DBLTKM-RFID protocol can effectively combat various typical attacks. Numerical results demonstrate that the proposed AM-SUEO-DBLTKM algorithm can save 99.59% of tag storage compared with traditional algorithms. Finally, the low computational complexity and low storage cost of the proposed AM-SUEO-DBLTKM-RFID protocol facilitate deployment on low-cost RFID tags.

Submitted 9 May, 2024; v1 submitted 16 December, 2023; originally announced December 2023.

arXiv:2312.07849 [pdf, other]

Encoder-minimal and Decoder-minimal Framework for Remote Sensing Image Dehazing

Authors: Yuanbo Wen, Tao Gao, Ziqi Li, Jing Zhang, Ting Chen

Abstract: Haze obscures remote sensing images, hindering the extraction of valuable information. To this end, we propose RSHazeNet, an encoder-minimal and decoder-minimal framework for efficient remote sensing image dehazing. Specifically, for merging features within the same level, we develop an innovative module called the intra-level transposed fusion module (ITFM). This module employs adaptive transposed self-attention to capture comprehensive context-aware information, facilitating robust context-aware feature fusion. Meanwhile, we present a cross-level multi-view interaction module (CMIM) to enable effective interactions between features from various levels, mitigating the loss of information due to repeated sampling operations. In addition, we propose a multi-view progressive extraction block (MPEB) that partitions the features into four distinct components and employs convolution with varying kernel sizes, groups, and dilation factors to facilitate view-progressive feature learning. Extensive experiments demonstrate the superiority of our proposed RSHazeNet. We release the source code and all pre-trained models at \url{https://github.com/chdwyb/RSHazeNet}.

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.06158 [pdf, other]

Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity

Authors: Xudong Li, Timin Gao, Runze Hu, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Jingyuan Zheng, Yunhang Shen, Ke Li, Yutao Liu, Pingyang Dai, Rongrong Ji

Abstract: Current state-of-the-art No-Reference Image Quality Assessment (NR-IQA) methods typically rely on feature extraction from upstream semantic backbone networks, assuming that all extracted features are relevant. However, we make a key observation that not all features are beneficial, and some may even be harmful, necessitating careful selection. Empirically, we find that many image pairs with small feature spatial distances can have vastly different quality scores, indicating that the extracted features may contain a significant amount of quality-irrelevant noise. To address this issue, we propose a Quality-Aware Feature Matching IQA Metric (QFM-IQM) that employs an adversarial perspective to remove harmful semantic noise features from the upstream task. Specifically, QFM-IQM enhances the ability to distinguish semantic noise by matching image pairs with similar quality scores but varying semantic features as adversarial semantic noise, and by adaptively adjusting the upstream task's features to reduce sensitivity to adversarial noise perturbation. Furthermore, we utilize a distillation framework to expand the dataset and improve the model's generalization ability. Our approach achieves superior performance to state-of-the-art NR-IQA methods on eight standard IQA datasets.

Submitted 26 May, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.00962 [pdf, other]

MBot: A Modular Ecosystem for Scalable Robotics Education

Authors: Peter Gaskell, Jana Pavlasek, Tom Gao, Abhishek Narula, Stanley Lewis, Odest Chadwicke Jenkins

Abstract: The Michigan Robotics MBot is a low-cost mobile robot platform that has been used to train over 1,400 students in autonomous navigation since 2014 at the University of Michigan and our collaborating colleges. The MBot platform was designed to meet the needs of teaching robotics at scale, matching the growth of robotics as a field and an academic discipline. Transformative advancements in robot navigation over the past decades have led to a significant demand for skilled roboticists across industry and academia. This demand has sparked a need for robotics courses in higher education, spanning all levels of undergraduate and graduate experiences. Incorporating real robot platforms into such courses and curricula is effective for conveying the unique challenges of programming embodied agents in real-world environments and for sparking student interest. However, teaching with real robots remains challenging due to the cost of hardware and the development effort involved in adapting existing hardware for a new course. In this paper, we describe the design and evolution of the MBot platform, and the underlying principles of scalability and flexibility that are key to its success.

Submitted 1 December, 2023; originally announced December 2023.

arXiv:2311.14294 [pdf, other]

Decouple Content and Motion for Conditional Image-to-Video Generation

Authors: Cuifeng Shen, Yulu Gan, Chen Chen, Xiongwei Zhu, Lele Cheng, Tingting Gao, Jinzhi Wang

Abstract: The goal of conditional image-to-video (cI2V) generation is to create a believable new video starting from the condition, i.e., one image and text. Previous cI2V generation methods conventionally operate in RGB pixel space, with limitations in modeling motion consistency and visual continuity. Additionally, the efficiency of generating videos in pixel space is quite low. In this paper, we propose a novel approach to address these challenges by disentangling the target RGB pixels into two distinct components: spatial content and temporal motions. Specifically, we predict temporal motions, which include motion vectors and residuals, based on a 3D-UNet diffusion model. By explicitly modeling temporal motions and warping them to the starting image, we improve the temporal consistency of generated videos. This results in a reduction of spatial redundancy, emphasizing temporal details. Our proposed method achieves performance improvements by disentangling content and motion, all without introducing new structural complexities to the model. Extensive experiments on various datasets confirm our approach's superior performance over the majority of state-of-the-art methods in both effectiveness and efficiency.

Submitted 14 December, 2023; v1 submitted 24 November, 2023; originally announced November 2023.
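The cI2V abstract above describes representing motion as motion vectors plus a residual applied to the starting image. The sketch below shows the generic reconstruction step only: backward-warp a reference frame by per-pixel offsets, then add a residual. It assumes integer offsets with border clipping. Real systems, likely including this one, use sub-pixel bilinear sampling, and the diffusion model that predicts the motion is not shown. The function name is hypothetical.

```python
import numpy as np


def warp_with_residual(frame: np.ndarray, motion: np.ndarray, residual: np.ndarray) -> np.ndarray:
    """Reconstruct a frame as warp(frame, motion) + residual (backward warping).

    frame:    (H, W) reference image, e.g. the conditioning image
    motion:   (H, W, 2) integer (dy, dx) offsets; output pixel (y, x) is read
              from frame[y + dy, x + dx], clipped to the image border
    residual: (H, W) correction added after warping
    """
    h, w = frame.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(ys + motion[..., 0], 0, h - 1)
    src_x = np.clip(xs + motion[..., 1], 0, w - 1)
    return frame[src_y, src_x] + residual


# Shift content one pixel to the right: every output pixel reads from x - 1.
frame = np.arange(16.0).reshape(4, 4)
motion = np.zeros((4, 4, 2), dtype=int)
motion[..., 1] = -1
shifted = warp_with_residual(frame, motion, np.zeros((4, 4)))
```

Splitting a frame this way means the motion field carries most of the temporal change, while the residual only has to explain what warping cannot, such as newly revealed regions.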

arXiv:2311.14284 [pdf, other]

Paragraph-to-Image Generation with Information-Enriched Diffusion Model

Authors: Weijia Wu, Zhuang Li, Yefei He, Mike Zheng Shou, Chunhua Shen, Lele Cheng, Yan Li, Tingting Gao, Di Zhang, Zhongyuan Wang

Abstract: Text-to-image (T2I) models have recently experienced rapid development, achieving astonishing performance in terms of fidelity and textual alignment. However, given a long paragraph (up to 512 words), these generation models still struggle to achieve strong alignment and are unable to generate images depicting complex scenes. In this paper, we introduce an information-enriched diffusion model for the paragraph-to-image generation task, termed ParaDiffusion, which explores transferring the extensive semantic comprehension capabilities of large language models to image generation. At its core, a large language model (e.g., Llama V2) encodes the long-form text and is then fine-tuned with LoRA to align the text-image feature spaces in the generation task. To facilitate the training of long-text semantic alignment, we also curated a high-quality paragraph-image pair dataset, namely ParaImage. This dataset contains a small amount of high-quality, meticulously annotated data, and a large-scale synthetic dataset with long text descriptions generated using a vision-language model. Experiments demonstrate that ParaDiffusion outperforms state-of-the-art models (SD XL, DeepFloyd IF) on ViLG-300 and ParaPrompts, achieving up to 15% and 45% human voting rate improvements for visual appeal and text faithfulness, respectively. The code and dataset will be released to foster community research on long-text alignment.

Submitted 29 November, 2023; v1 submitted 24 November, 2023; originally announced November 2023.

Comments: The project website is at: https://weijiawu.github.io/ParaDiffusionPage/. Code: https://github.com/weijiawu/ParaDiffusion
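ParaDiffusion, as described above, fine-tunes its language-model encoder with LoRA. The sketch below shows the commonly used LoRA form for one linear layer: the pretrained weight W stays frozen, and a trainable low-rank update (alpha / r) * B A is added. B starts at zero, so the adapted layer initially matches the original. This is a generic NumPy illustration, not ParaDiffusion's code. The class name, rank, alpha, and initialization scale are assumptions.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight: np.ndarray, rank: int, alpha: float, seed: int = 0):
        out_dim, in_dim = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                                  # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))   # trainable, small random init
        self.B = np.zeros((out_dim, rank))                    # trainable, zero init
        self.scaling = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, in_dim) -> (batch, out_dim)
        return x @ self.weight.T + self.scaling * (x @ self.A.T) @ self.B.T

    def trainable_parameters(self) -> int:
        # Only A and B are updated during fine-tuning.
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16))
layer = LoRALinear(W, rank=2, alpha=4.0)
x = rng.normal(size=(5, 16))
y = layer(x)   # equals x @ W.T until B is trained away from zero
```

The appeal for aligning a large text encoder is the parameter count. In this toy layer, 48 values are trainable instead of the 128 in W, and the gap widens sharply for realistic layer sizes.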

arXiv:2311.12320 [pdf, other]

A Survey on Multimodal Large Language Models for Autonomous Driving

Authors: Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Yang Zhou, Kaizhao Liang, Jintai Chen, Juanwu Lu, Zichong Yang, Kuei-Da Liao, Tianren Gao, Erlong Li, Kun Tang, Zhipeng Cao, Tong Zhou, Ao Liu, Xinrui Yan, Shuqi Mei, Jianguo Cao, Ziran Wang, Chao Zheng

Abstract: With the emergence of Large Language Models (LLMs) and Vision Foundation Models (VFMs), multimodal AI systems benefiting from large models have the potential to perceive the real world, make decisions, and control tools as humans do. In recent months, LLMs have attracted widespread attention in autonomous driving and map systems. Despite their immense potential, there is still a lack of comprehensive understanding of the key challenges, opportunities, and future endeavors in applying LLMs to driving systems. In this paper, we present a systematic investigation in this field. We first introduce the background of Multimodal Large Language Models (MLLMs), the development of multimodal models using LLMs, and the history of autonomous driving. Then, we overview existing MLLM tools for driving, transportation, and map systems together with existing datasets and benchmarks. Moreover, we summarize the works in the 1st WACV Workshop on Large Language and Vision Models for Autonomous Driving (LLVM-AD), the first workshop of its kind regarding LLMs in autonomous driving. To further promote the development of this field, we also discuss several important problems regarding the use of MLLMs in autonomous driving systems that need to be solved by both academia and industry.

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.09539 [pdf, ps, other]

Entanglement constraint on wave-particle duality for tripartite systems

Authors: Zanjia Li, Yingqiu He, Dong Ding, Ting Gao, Fengli Yan

Abstract: A global multipartite entanglement may place a constraint on the wave-particle duality. We investigate this constraint relation between the global entanglement and the quantitative wave-particle duality in tripartite systems. We perform quantum state tomography to reconstruct the reduced density matrix by using the OriginQ quantum computing cloud platform. As a result, we show, theoretically and experimentally, that the quantitative wave-particle duality is indeed constrained by the global tripartite entanglement.

Submitted 15 November, 2023; originally announced November 2023.

Comments: 10 pages, 3 figures

Showing 1–50 of 421 results for author: Gao, T