
Haar Nuclear Norms with Applications to Remote Sensing Imagery Restoration

Authors: Shuang Xu, Chang Yu, Jiangjun Peng, Xiangyong Cao

Abstract: Remote sensing image restoration aims to reconstruct missing or corrupted areas within images. To date, low-rank based models have garnered significant interest in this field. This paper proposes a novel low-rank regularization term, named the Haar nuclear norm (HNN), for efficient and effective remote sensing image restoration. It leverages the low-rank properties of wavelet coefficients derived from the 2-D frontal slice-wise Haar discrete wavelet transform, effectively modeling the low-rank prior for separated coarse-grained structure and fine-grained textures in the image. Experimental evaluations conducted on hyperspectral image inpainting, multi-temporal image cloud removal, and hyperspectral image denoising have revealed the HNN's potential. Typically, HNN achieves a performance improvement of 1-4 dB and a speedup of 10-28x compared to some state-of-the-art methods (e.g., tensor correlated total variation, and fully-connected tensor network) for inpainting tasks.
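The regularizer described in this abstract can be illustrated with a rough sketch. The abstract does not give the exact definition, so the code below makes several assumptions: a single-level orthonormal Haar transform, an unfolding of each subband into a (pixels x bands) matrix, and an unweighted sum of the four nuclear norms. The function names `haar_dwt2` and `haar_nuclear_norm` are hypothetical and are not taken from the paper.

```python
import numpy as np


def haar_dwt2(slice_2d):
    """Single-level orthonormal 2-D Haar transform of one H x W slice (H, W even)."""
    s = np.sqrt(2.0)
    # Transform along rows: pairwise averages (low-pass) and differences (high-pass).
    lo = (slice_2d[0::2, :] + slice_2d[1::2, :]) / s
    hi = (slice_2d[0::2, :] - slice_2d[1::2, :]) / s
    # Transform along columns to obtain the four subbands.
    ll = (lo[:, 0::2] + lo[:, 1::2]) / s  # coarse structure
    lh = (lo[:, 0::2] - lo[:, 1::2]) / s  # detail subbands
    hl = (hi[:, 0::2] + hi[:, 1::2]) / s
    hh = (hi[:, 0::2] - hi[:, 1::2]) / s
    return ll, lh, hl, hh


def haar_nuclear_norm(x):
    """Assumed form of an HNN-style penalty for an H x W x B tensor.

    Each frontal slice x[:, :, k] is Haar-transformed, the coefficients are
    collected per subband, each subband tensor is unfolded to a matrix with
    B columns, and the nuclear norms of the four matrices are summed.
    """
    x = np.asarray(x, dtype=float)
    h, w, b = x.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even for this sketch")
    bands = [np.empty((h // 2, w // 2, b)) for _ in range(4)]
    for k in range(b):
        for band, coeff in zip(bands, haar_dwt2(x[:, :, k])):
            band[:, :, k] = coeff
    return sum(np.linalg.norm(band.reshape(-1, b), ord="nuc") for band in bands)
```

For a constant tensor, only the coarse subband is nonzero and its unfolding has rank one, so the value of this sketch equals the Frobenius norm of the input.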

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.08377 [pdf, other]

Long-range Turbulence Mitigation: A Large-scale Dataset and A Coarse-to-fine Framework

Authors: Shengqi Xu, Run Sun, Yi Chang, Shuning Cao, Xueyao Xiao, Luxin Yan

Abstract: Long-range imaging inevitably suffers from atmospheric turbulence with severe geometric distortions due to random refraction of light. The further the distance, the more severe the disturbance. Although existing research has achieved great progress in tackling short-range turbulence, less attention has been paid to long-range turbulence with significant distortions. To address this dilemma and advance the field, we construct a large-scale real long-range atmospheric turbulence dataset (RLR-AT), including 1500 turbulence sequences spanning distances from 1 km to 13 km. Compared with existing datasets, RLR-AT offers turbulence over longer distances and with higher diversity, and scenes of greater variety and larger scale. Moreover, most existing work adopts either registration-based or decomposition-based methods to address distortions through one-step mitigation. However, they fail to effectively handle long-range turbulence due to its significant pixel displacements. In this work, we propose a coarse-to-fine framework to handle severe distortions, in which dynamic turbulence and static background priors cooperate (CDSP). On the one hand, we discover the pixel motion statistical prior of turbulence, and propose a frequency-aware reference frame for better large-scale distortion registration, greatly reducing the burden of refinement. On the other hand, we take advantage of the static prior of background, and propose a subspace-based low-rank tensor refinement model to eliminate the misalignments inevitably left by registration while well preserving details. The dynamic and static priors complement each other, enabling us to progressively mitigate long-range turbulence with severe distortions. Extensive experiments demonstrate that the proposed method outperforms SOTA methods on different datasets.

Submitted 11 July, 2024; originally announced July 2024.

Comments: This paper is accepted by ECCV 2024

arXiv:2407.08224 [pdf, other]

stEnTrans: Transformer-based deep learning for spatial transcriptomics enhancement

Authors: Shuailin Xue, Fangfang Zhu, Changmiao Wang, Wenwen Min

Abstract: The spatial location of cells within tissues and organs is crucial for the manifestation of their specific functions. Spatial transcriptomics technology enables comprehensive measurement of the gene expression patterns in tissues while retaining spatial information. However, current popular spatial transcriptomics techniques either have shallow sequencing depth or low resolution. We present stEnTrans, a deep learning method based on Transformer architecture that provides comprehensive predictions for gene expression in unmeasured areas or unexpectedly lost areas and enhances gene expression in original and imputed spots. Utilizing a self-supervised learning approach, stEnTrans establishes proxy tasks on gene expression profile without requiring additional data, mining intrinsic features of the tissues as supervisory information. We evaluate stEnTrans on six datasets and the results indicate superior performance in enhancing spot resolution and predicting gene expression in unmeasured areas compared to other deep learning and traditional interpolation methods. Additionally, our method can also help discover spatial patterns in spatial transcriptomics and enrich more biologically significant pathways. Our source code is available at https://github.com/shuailinxue/stEnTrans.

Submitted 11 July, 2024; originally announced July 2024.

Comments: ISBRA2024, Code: https://github.com/shuailinxue/stEnTrans

arXiv:2407.08200 [pdf, other]

Deep Understanding of Soccer Match Videos

Authors: Shikun Xu, Yandong Zhu, Gen Li, Changhu Wang

Abstract: Soccer is one of the most popular sports worldwide, with live broadcasts frequently available for major matches. However, extracting detailed, frame-by-frame information on player actions from these videos remains a challenge. Utilizing state-of-the-art computer vision technologies, our system can detect key objects such as soccer balls, players and referees. It also tracks the movements of players and the ball, recognizes player numbers, classifies scenes, and identifies highlights such as goal kicks. By analyzing live TV streams of soccer matches, our system can generate highlight GIFs, tactical illustrations, and diverse summary graphs of ongoing games. Through these visual recognition techniques, we deliver a comprehensive understanding of soccer game videos, enriching the viewer's experience with detailed and insightful analysis.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.08156 [pdf, other]

AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization

Authors: Shixiong Xu, Chenghao Zhang, Lubin Fan, Gaofeng Meng, Shiming Xiang, Jieping Ye

Abstract: In this study, we introduce a new problem raised by social media and photojournalism, named Image Address Localization (IAL), which aims to predict the readable textual address where an image was taken. Existing two-stage approaches involve predicting geographical coordinates and converting them into human-readable addresses, which can lead to ambiguity and be resource-intensive. In contrast, we propose an end-to-end framework named AddressCLIP to solve the problem with more semantics, consisting of two key ingredients: i) image-text alignment to align images with addresses and scene captions by contrastive learning, and ii) image-geography matching to constrain image features with the spatial distance in terms of manifold learning. Additionally, we have built three datasets from Pittsburgh and San Francisco on different scales specifically for the IAL problem. Experiments demonstrate that our approach achieves compelling performance on the proposed datasets and outperforms representative transfer learning methods for vision-language models. Furthermore, extensive ablations and visualizations exhibit the effectiveness of the proposed method. The datasets and source code are available at https://github.com/xsx1001/AddressCLIP.

Submitted 10 July, 2024; originally announced July 2024.

Comments: Accepted at ECCV 2024

arXiv:2407.07651 [pdf, other]

Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$

Authors: M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, et al. (645 additional authors not shown)

Abstract: The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15σ$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07178 [pdf]

Uniaxial plasmon polaritons $\textit{via}$ charge transfer at the graphene/CrSBr interface

Authors: Daniel J. Rizzo, Eric Seewald, Fangzhou Zhao, Jordan Cox, Kaichen Xie, Rocco A. Vitalone, Francesco L. Ruta, Daniel G. Chica, Yinming Shao, Sara Shabani, Evan J. Telford, Matthew C. Strasbourg, Thomas P. Darlington, Suheng Xu, Siyuan Qiu, Aravind Devarakonda, Takashi Taniguchi, Kenji Watanabe, Xiaoyang Zhu, P. James Schuck, Cory R. Dean, Xavier Roy, Andrew J. Millis, Ting Cao, Angel Rubio, et al. (2 additional authors not shown)

Abstract: Graphene is a privileged 2D platform for hosting confined light-matter excitations known as surface plasmon-polaritons (SPPs), as it possesses low intrinsic losses with a high degree of optical confinement. However, the inherently isotropic optical properties of graphene limit its ability to guide and focus SPPs, making it less suitable than anisotropic elliptical and hyperbolic materials as a platform for polaritonic lensing and canalization. Here, we present the graphene/CrSBr heterostructure as an engineered 2D interface that hosts highly anisotropic SPP propagation over a wide range of frequencies in the mid-infrared and terahertz. Using a combination of scanning tunneling microscopy (STM), scattering-type scanning near-field optical microscopy (s-SNOM), and first-principles calculations, we demonstrate mutual doping in excess of 10$^{13}$ cm$^{-2}$ holes/electrons between the interfacial layers of graphene/CrSBr heterostructures. SPPs in graphene activated by charge transfer interact with charge-induced anisotropic intra- and interband transitions in the interfacial doped CrSBr, leading to preferential SPP propagation along the quasi-1D chains that compose each CrSBr layer. This multifaceted proximity effect both creates SPPs and endows them with anisotropic transport and propagation lengths that differ by an order-of-magnitude between the two in-plane crystallographic axes of CrSBr.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06192 [pdf, other]

Multi-Object Hallucination in Vision-Language Models

Authors: Xuweiyi Chen, Ziqiao Ma, Xuejun Zhang, Sihan Xu, Shengyi Qian, Jianing Yang, David F. Fouhey, Joyce Chai

Abstract: Large vision language models (LVLMs) often suffer from object hallucination, producing objects not present in the given images. While current benchmarks for object hallucination primarily concentrate on the presence of a single object class rather than individual entities, this work systematically investigates multi-object hallucination, examining how models misperceive (e.g., invent nonexistent objects or become distracted) when tasked with focusing on multiple objects simultaneously. We introduce Recognition-based Object Probing Evaluation (ROPE), an automated evaluation protocol that considers the distribution of object classes within a single image during testing and uses visual referring prompts to eliminate ambiguity. With comprehensive empirical studies and analysis of potential factors leading to multi-object hallucination, we found that (1) LVLMs suffer more hallucinations when focusing on multiple objects than on a single object; (2) the tested object class distribution affects hallucination behaviors, indicating that LVLMs may follow shortcuts and spurious correlations; and (3) hallucinatory behaviors are influenced by data-specific factors, salience and frequency, and model intrinsic behaviors. We hope to enable LVLMs to recognize and reason about multiple objects that often occur in realistic visual scenes, provide insights, and quantify our progress towards mitigating the issues.

Submitted 8 July, 2024; originally announced July 2024.

Comments: Accepted to ALVR @ ACL 2024 | Project page: https://multi-object-hallucination.github.io/

arXiv:2407.06064 [pdf, other]

Pan-denoising: Guided Hyperspectral Image Denoising via Weighted Represent Coefficient Total Variation

Authors: Shuang Xu, Qiao Ke, Jiangjun Peng, Xiangyong Cao, Zixiang Zhao

Abstract: This paper introduces a novel paradigm for hyperspectral image (HSI) denoising, which is termed \textit{pan-denoising}. In a given scene, panchromatic (PAN) images capture similar structures and textures to HSIs but with less noise. This enables the utilization of PAN images to guide the HSI denoising process. Consequently, pan-denoising, which incorporates an additional prior, has the potential to uncover underlying structures and details beyond the internal information modeling of traditional HSI denoising methods. However, the proper modeling of this additional prior poses a significant challenge. To alleviate this issue, the paper proposes a novel regularization term, Panchromatic Weighted Representation Coefficient Total Variation (PWRCTV). It employs the gradient maps of PAN images to automatically assign different weights of TV regularization for each pixel, resulting in larger weights for smooth areas and smaller weights for edges. This regularization forms the basis of a pan-denoising model, which is solved using the Alternating Direction Method of Multipliers. Extensive experiments on synthetic and real-world datasets demonstrate that PWRCTV outperforms several state-of-the-art methods in terms of metrics and visual quality. Furthermore, an HSI classification experiment confirms that PWRCTV, as a preprocessing method, can enhance the performance of downstream classification tasks. The code and data are available at https://github.com/shuangxu96/PWRCTV.
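The weighting idea in this abstract (larger TV weights in smooth areas, smaller weights at edges, derived from PAN gradient maps) can be sketched as follows. The abstract does not state the actual weight formula, so the inverse-gradient rule, the `eps` parameter, and the function name `pan_tv_weights` are all assumptions made for illustration; the authors' repository would be the place to check the real definition.

```python
import numpy as np


def pan_tv_weights(pan, eps=1e-3):
    """Per-pixel TV weights from a PAN image (illustrative assumption only).

    Weights are inversely related to the local gradient magnitude, so flat
    regions receive weights near 1 and strong edges receive weights near 0.
    """
    pan = np.asarray(pan, dtype=float)
    # Forward differences; the last column/row is replicated, so the
    # gradient at the right and bottom borders is zero.
    gx = np.diff(pan, axis=1, append=pan[:, -1:])
    gy = np.diff(pan, axis=0, append=pan[-1:, :])
    magnitude = np.hypot(gx, gy)
    weights = 1.0 / (magnitude + eps)
    # Normalize so the largest weight is 1.
    return weights / weights.max()
```

On a PAN image with a single vertical step edge, the pixels next to the step get a weight close to zero while the flat regions keep a weight of one, which matches the behaviour the abstract describes.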

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05647 [pdf, other]

Learning to Adapt Category Consistent Meta-Feature of CLIP for Few-Shot Classification

Authors: Jiaying Shi, Xuetong Xue, Shenghui Xu

Abstract: The recent CLIP-based methods have shown promising zero-shot and few-shot performance on image classification tasks. Existing approaches such as CoOp and Tip-Adapter only focus on high-level visual features that are fully aligned with textual features representing the "summary" of the image. However, the goal of few-shot learning is to classify unseen images of the same category with few labeled samples. In particular, in contrast to high-level representations, local representations (LRs) at a low level are more consistent between seen and unseen samples. Based on this point, we propose the Meta-Feature Adaption method (MF-Adapter) that combines the complementary strengths of both LRs and high-level semantic representations. Specifically, we introduce the Meta-Feature Unit (MF-Unit), which is a simple yet effective local similarity metric to measure category-consistent local context in an inductive manner. Then we train an MF-Adapter to map image features to MF-Unit for adequately generalizing the intra-class knowledge between unseen images and the support set. Extensive experiments show that our proposed method is superior to the state-of-the-art CLIP downstream few-shot classification methods, even showing stronger performance on a set of challenging visual classification tasks.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05645 [pdf, other]

OneDiff: A Generalist Model for Image Difference

Authors: Erdong Hu, Longteng Guo, Tongtian Yue, Zijia Zhao, Shuning Xue, Jing Liu

Abstract: In computer vision, Image Difference Captioning (IDC) is crucial for accurately describing variations between closely related images. Traditional IDC methods often rely on specialist models, which restrict their applicability across varied contexts. This paper introduces the OneDiff model, a novel generalist approach that utilizes a robust vision-language model architecture, integrating a siamese image encoder with a Visual Delta Module. This innovative configuration allows for the precise detection and articulation of fine-grained differences between image pairs. OneDiff is trained through a dual-phase strategy, encompassing Coupled Sample Training and multi-task learning across a diverse array of data types, supported by our newly developed DiffCap Dataset. This dataset merges real-world and synthetic data, enhancing the training process and bolstering the model's robustness. Extensive testing on diverse IDC benchmarks, such as Spot-the-Diff, CLEVR-Change, and Birds-to-Words, shows that OneDiff consistently outperforms existing state-of-the-art models in accuracy and adaptability, achieving improvements of up to 85% CIDEr points on average. By setting a new benchmark in IDC, OneDiff paves the way for more versatile and effective applications in detecting and describing visual differences. The code, models, and data will be made publicly available.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05563 [pdf, other]

LLMBox: A Comprehensive Library for Large Language Models

Authors: Tianyi Tang, Yiwen Hu, Bingqian Li, Wenyang Luo, Zijing Qin, Haoxiang Sun, Jiapeng Wang, Shiyi Xu, Xiaoxue Cheng, Geyang Guo, Han Peng, Bowen Zheng, Yiru Tang, Yingqian Min, Yushuo Chen, Jie Chen, Yuanqian Zhao, Luran Ding, Yuhao Wang, Zican Dong, Chunxuan Xia, Junyi Li, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen

Abstract: To facilitate research on large language models (LLMs), this paper presents a comprehensive and unified library, LLMBox, to ease the development, use, and evaluation of LLMs. This library features three main merits: (1) a unified data interface that supports the flexible implementation of various training strategies, (2) a comprehensive evaluation that covers extensive tasks, datasets, and models, and (3) more practical considerations, especially on user-friendliness and efficiency. With our library, users can easily reproduce existing methods, train new models, and conduct comprehensive performance comparisons. To rigorously test LLMBox, we conduct extensive experiments across a diverse range of evaluation settings, and experimental results demonstrate the effectiveness and efficiency of our library in supporting various implementations related to LLMs. The detailed introduction and usage guidance can be found at https://github.com/RUCAIBox/LLMBox.

Submitted 7 July, 2024; originally announced July 2024.

Comments: Accepted by ACL 2024 Demo

arXiv:2407.04237 [pdf, other]

GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction

Authors: Yuxuan Mu, Xinxin Zuo, Chuan Guo, Yilin Wang, Juwei Lu, Xiaofeng Wu, Songcen Xu, Peng Dai, Youliang Yan, Li Cheng

Abstract: We present GSD, a diffusion model approach based on Gaussian Splatting (GS) representation for 3D object reconstruction from a single view. Prior works suffer from inconsistent 3D geometry or mediocre rendering quality due to improper representations. We take a step towards resolving these shortcomings by utilizing the recent state-of-the-art 3D explicit representation, Gaussian Splatting, and an unconditional diffusion model. This model learns to generate 3D objects represented by sets of GS ellipsoids. With these strong generative 3D priors, though learning unconditionally, the diffusion model is ready for view-guided reconstruction without further model fine-tuning. This is achieved by propagating fine-grained 2D features through the efficient yet flexible splatting function and the guided denoising sampling process. In addition, a 2D diffusion model is further employed to enhance rendering fidelity, and improve reconstructed GS quality by polishing and re-using the rendered images. The final reconstructed objects explicitly come with high-quality 3D structure and texture, and can be efficiently rendered in arbitrary views. Experiments on the challenging real-world CO3D dataset demonstrate the superiority of our approach. Project page: https://yxmu.foo/GSD/

Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

Comments: Accepted for ECCV 2024

arXiv:2407.03308 [pdf, other]

Accelerated Proton Resonance Frequency-based Magnetic Resonance Thermometry by Optimized Deep Learning Method

Authors: Sijie Xu, Shenyan Zong, Chang-Sheng Mei, Guofeng Shen, Yueran Zhao, He Wang

Abstract: Proton resonance frequency (PRF) based MR thermometry is essential for focused ultrasound (FUS) thermal ablation therapies. This work aims to enhance temporal resolution in dynamic MR temperature map reconstruction using an improved deep learning method. The training-optimized methods and five classical neural networks were applied to the 2-fold and 4-fold under-sampled k-space data to reconstruct the temperature maps. The enhanced training modules included offline/online data augmentations, knowledge distillation, and the amplitude-phase decoupling loss function. The heating experiments were performed by a FUS transducer on phantom and ex vivo tissues, respectively. These data were manually under-sampled to imitate acceleration procedures and used to train our method to obtain the reconstruction model. About a dozen additional testing datasets were separately obtained for evaluating the real-time performance and temperature accuracy. Acceleration factors of 1.9 and 3.7 were found for the 2-fold and 4-fold k-space under-sampling strategies, and the ResUNet-based deep learning reconstruction performed exceptionally well. In the 2-fold acceleration scenario, the RMSE of temperature map patches was 0.888 °C and 1.145 °C on the phantom and ex vivo testing datasets. The DICE value of temperature areas enclosed by the 43 °C isotherm was 0.809, and the Bland-Altman analysis showed a bias of -0.253 °C with a spread of ±2.16 °C. In the 4-fold under-sampling case, these evaluation values decreased by approximately 10%. This study demonstrates that deep learning-based reconstruction can significantly enhance the accuracy and efficiency of MR thermometry for clinical FUS thermal therapies.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03248 [pdf, ps, other]

Section conjectures over $\mathbb{C}$ and Kodaira fibrations

Authors: Simon Shuofeng Xu

Abstract: In this paper we propose and study topological and Hodge-theoretic analogues of Grothendieck's section conjecture over the complex numbers. We study these questions in the context of families of curves, in particular Kodaira fibrations, and in the context of the family of Jacobians associated to a Kodaira fibration. We show that in the case of families of curves, both the topological and Hodge-theoretic analogues of the injectivity part of the section conjecture hold, and that in the case of families of Jacobians, the topological analogue of the surjectivity part of the section conjecture does not hold in general. For families of curves, we also reduce the topological analogue of the surjectivity part of the section conjecture to the case where the families have no algebraic sections.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 26 pages

MSC Class: 14D05; 55R37; 14H10; 14J29

arXiv:2407.03034 [pdf, ps, other]

Attention Incorporated Network for Sharing Low-rank, Image and K-space Information during MR Image Reconstruction to Achieve Single Breath-hold Cardiac Cine Imaging

Authors: Siying Xu, Kerstin Hammernik, Andreas Lingg, Jens Kuebler, Patrick Krumm, Daniel Rueckert, Sergios Gatidis, Thomas Kuestner

Abstract: Cardiac Cine Magnetic Resonance Imaging (MRI) provides an accurate assessment of heart morphology and function in clinical practice. However, MRI requires long acquisition times, with recent deep learning-based methods showing great promise to accelerate imaging and enhance reconstruction quality. Existing networks exhibit some common limitations that constrain further acceleration possibilities, including single-domain learning, reliance on a single regularization term, and equal feature contribution. To address these limitations, we propose to embed information from multiple domains, including low-rank, image, and k-space, in a novel deep learning network for MRI reconstruction, which we denote as A-LIKNet. A-LIKNet adopts a parallel-branch structure, enabling independent learning in the k-space and image domain. Coupled information sharing layers realize the information exchange between domains. Furthermore, we introduce attention mechanisms into the network to assign greater weights to more critical coils or important temporal frames. Training and testing were conducted on an in-house dataset, including 91 cardiovascular patients and 38 healthy subjects scanned with 2D cardiac Cine using retrospective undersampling. Additionally, we evaluated A-LIKNet on the real-time 8x prospectively undersampled data from the OCMR dataset. The results demonstrate that our proposed A-LIKNet outperforms existing methods and provides high-quality reconstructions. The network can effectively reconstruct highly retrospectively undersampled dynamic MR images up to 24x accelerations, indicating its potential for single breath-hold imaging.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02899 [pdf, other]

Measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H.-R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (639 additional authors not shown)

Abstract: A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the BESIII detector at the BEPCII storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be $\mathcal{B}(J/ψ\to p \bar{p} η(η\to γγ)) = (1.480 \pm 0.001 \pm 0.024)\times\,10^{-3}$ and $\mathcal{B}(J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)) = (1.557 \pm 0.003 \pm 0.038)\times\,10^{-3}$, where the first uncertainties are statistical and the second systematic. Both results are compatible within their uncorrelated systematic uncertainties. The combined result is $\mathcal{B}(J/ψ\to p \bar{p} η)=(1.495 \pm 0.001 \pm 0.023)\times\,10^{-3}$ where the first uncertainty is the combined statistical uncertainty and the second one the combined systematic uncertainty of both analyses, incorporating correlations between them. In addition, the $p \bar{p}$ threshold region is investigated for a potential threshold enhancement, and no evidence for one is observed.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02793 [pdf, other]

Learning Positional Attention for Sequential Recommendation

Authors: Fan Luo, Juan Zhang, Shenghui Xu

Abstract: Self-attention-based networks have achieved remarkable performance in sequential recommendation tasks. A crucial component of these models is positional encoding. In this study, we delve into the learned positional embedding, demonstrating that it often captures the distance between tokens. Building on this insight, we introduce novel attention models that directly learn positional relations. Extensive experiments reveal that our proposed models, \textbf{PARec} and \textbf{FPARec}, outperform previous self-attention-based approaches. Our code is available at the link for anonymous review: https://anonymous.4open.science/r/FPARec-2C55/

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.02772 [pdf, other]

Automatic gradient descent with generalized Newton's method

Authors: Zhiqi Bu, Shiyun Xu

Abstract: We propose the generalized Newton's method (GeN) -- a Hessian-informed approach that applies to any optimizer such as SGD and Adam, and covers the Newton-Raphson method as a sub-case. Our method automatically and dynamically selects the learning rate that accelerates the convergence, without the intensive tuning of the learning rate scheduler. In practice, our method is easily implementable, since it only requires additional forward passes with almost zero computational overhead (in terms of training time and memory cost), if the overhead is amortized over many iterations. We present extensive experiments on language and vision tasks (e.g. GPT and ResNet) to showcase that GeN optimizers match the state-of-the-art performance, which was achieved with carefully tuned learning rate schedulers. Code to be released at \url{https://github.com/ShiyunXu/AutoGeN}.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.02188 [pdf, other]

Structure-Aware Consensus Network on Graphs with Few Labeled Nodes

Authors: Shuaike Xu, Xiaolin Zhang, Peng Zhang, Kun Zhan

Abstract: Graph node classification with few labeled nodes presents significant challenges due to limited supervision. Conventional methods often exploit the graph in a transductive learning manner. They fail to effectively utilize the abundant unlabeled data and the structural information inherent in graphs. To address these issues, we introduce a Structure-Aware Consensus Network (SACN) from three perspectives. Firstly, SACN leverages a novel structure-aware consensus learning strategy between two strongly augmented views. The proposed strategy can fully exploit the potentially useful information of the unlabeled nodes and the structural information of the entire graph. Secondly, SACN uniquely integrates the graph's structural information to achieve strong-to-strong consensus learning, improving the utilization of unlabeled data while maintaining multiview learning. Thirdly, unlike two-branch graph neural network-based methods, SACN is designed for multiview feature learning within a single-branch architecture. Furthermore, a class-aware pseudolabel selection strategy helps address class imbalance and achieve effective weak-to-strong supervision. Extensive experiments on three benchmark datasets demonstrate SACN's superior performance in node classification tasks, particularly at very low label rates, outperforming state-of-the-art methods while maintaining computational simplicity. The source code is available at https://github.com/kunzhan/SACN

Submitted 2 July, 2024; originally announced July 2024.

Comments: under review

arXiv:2407.01552 [pdf]

High Spectral-Efficiency, Ultra-low MIMO SDM Transmission over a Field-Deployed Multi-Core OAM Fiber

Authors: Junyi Liu, Zengquan Xu, Shuqi Mo, Yuming Huang, Yining Huang, Zhenhua Li, Yuying Guo, Lei Shen, Shuo Xu, Ran Gao, Cheng Du, Qian Feng, Jie Luo, Jie Liu, Siyuan Yu

Abstract: Few-mode multi-core fiber (FM-MCF) based Space-Division Multiplexing (SDM) systems possess the potential to maximize the number of multiplexed spatial channels per fiber by harnessing both the space (fiber cores) and mode (optical mode per core) dimensions. However, to date, no SDM transmissions over field-deployed FM-MCFs in realistic outdoor settings have been reported, which contrasts with SDM schemes demonstrated using single-mode multi-core fibers (SM-MCFs) installed in practical fiber cable ducts. In this paper, we present the successful demonstration of bidirectional SDM transmission over a 5-km field-deployed seven ring-core fiber (7-RCF) with a cladding diameter of 178 $μ$m, achieving a Spectral Efficiency (SE) of 2$\times$201.6 bit/s/Hz. This work establishes a new record for the highest SE attained in SDM demonstrations utilizing field-deployed fiber cables, achieving an approximate 10x increase compared to the SE of reported field-deployed optical fiber cable transmission systems. Notably, these results are realized through the utilization of small-scale modular 4$\times$4 multiple-input multiple-output (MIMO) processing with a time-domain equalization (TDE) tap number not exceeding 15, maintaining a complexity per unit capacity comparable to that of MIMO equalization in SDM demonstrations employing weakly coupled SM-MCF cables. These results underscore the significant potential for achieving heightened SE and expanding capacity per individual fiber using SDM techniques in practical applications.

Submitted 29 April, 2024; originally announced July 2024.

Comments: 17 pages, 8 figures

arXiv:2407.01035 [pdf]

Off-site production of plasma-activated water for efficient sterilization: the crucial role of high-valence NOx and new chemical pathways

Authors: Zifeng Wang, Xiangyu Wang, Shenghang Xu, Renwu Zhou, Mingyan Zhang, Wanchun Li, Zizhu Zhang, Luge Wang, Jinkun Chen, Jishen Zhang, Li Guo, Dandan Pei, Dingxin Liu, Mingzhe Rong

Abstract: Efficient sterilization of pathogens with cleaner methods is a critical concern for environmental disinfection and clinical anti-infective treatment. Plasma-activated water (PAW) is a promising alternative to chemical disinfectants and antibiotics because of its strong sterilization ability and lack of acute toxicity, and only water and air are consumed during production. For more efficient water activation, plasma sources are commonly placed as near as possible to, or fully in contact with, water, but the risks of electrode corrosion and metal contamination of water threaten the safety and stability of PAW production. Herein, plasma-activated gas rich in high-valence NOx is generated by a hybrid plasma configuration and introduced into water for off-site PAW production. Plasma-generated O3 is found to dominate the gas-phase reactions for the formation of high-valence NOx. With the time-evolution of O3 concentration, gaseous NO3 radicals are produced behind N2O5 formation, but will be decomposed before N2O5 quenching. By decoupling the roles of gaseous NO3, N2O5, and O3 in the water activation, results show that short-lived aqueous species induced by gaseous NO3 radicals play the most crucial role in PAW sterilization, and the acidic environment induced by N2O5 is also essential. Moreover, SEM photographs and biomacromolecule leakage assays demonstrate that PAW disrupts the cell membranes of bacteria to achieve inactivation. In real-life applications, an integrated device for off-site PAW production with a yield of 2 L/h and a bactericidal efficiency of >99.9% is developed. A 50 mL volume of PAW produced in 3 minutes using this device is more effective in disinfection than 0.5% NaClO and 3% H2O2 with the same bacterial contact time. This work provides new avenues for efficient PAW production and deepens insights into the fundamental processes that govern the reactive chemistry in PAW sterilization.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.00909 [pdf, other]

Heterogeneous Graph-based Framework with Disentangled Representations Learning for Multi-target Cross Domain Recommendation

Authors: Xiaopeng Liu, Juan Zhang, Chongqi Ren, Shenghui Xu, Zhaoming Pan, Zhimin Zhang

Abstract: CDR (Cross-Domain Recommendation), i.e., leveraging information from multiple domains, is a critical solution to the data sparsity problem in recommendation systems. The majority of previous research either focused on single-target CDR (STCDR) by utilizing data from the source domains to improve the model's performance on the target domain, or applied dual-target CDR (DTCDR) by integrating data from the source and target domains. In addition, multi-target CDR (MTCDR) is a generalization of DTCDR, which is able to capture the link among different domains. In this paper we present HGDR (Heterogeneous Graph-based Framework with Disentangled Representations Learning), an end-to-end heterogeneous network architecture in which graph convolutional layers are applied to model relations among different domains, while disentangled representations are used to separate domain-shared and domain-specific information. First, a shared heterogeneous graph is generated by gathering users and items from several domains without any further side information. Second, we use HGDR to compute disentangled representations for users and items in all domains. Experiments on real-world datasets and online A/B tests prove that our proposed model can transmit information among domains effectively and reach the SOTA performance.

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2407.00383 [pdf, other]

FANFOLD: Graph Normalizing Flows-driven Asymmetric Network for Unsupervised Graph-Level Anomaly Detection

Authors: Rui Cao, Shijie Xue, Jindong Li, Qi Wang, Yi Chang

Abstract: Unsupervised graph-level anomaly detection (UGAD) has attracted increasing interest due to its widespread application. In recent studies, knowledge distillation-based methods have been widely used in unsupervised anomaly detection to improve model efficiency and generalization. However, the inherent symmetry between the source (teacher) and target (student) networks typically results in consistent outputs across both architectures, making it difficult to distinguish abnormal graphs from normal graphs. Also, existing methods mainly rely on graph features to distinguish anomalies, which may be unstable with complex and diverse data and fail to capture the essence that differentiates normal graphs from abnormal ones. In this work, we propose a Graph Normalizing Flows-driven Asymmetric Network For Unsupervised Graph-Level Anomaly Detection (FANFOLD in short). We introduce normalizing flows to unsupervised graph-level anomaly detection due to their successful application and superior quality in learning the underlying distribution of samples. Specifically, we adopt the knowledge distillation technique and apply normalizing flows on the source network, achieving the asymmetric network. In the training stage, FANFOLD transforms the original distribution of normal graphs to a standard normal distribution. During inference, FANFOLD computes the anomaly score using the source-target loss to discriminate between normal and anomalous graphs. We conduct extensive experiments on 15 datasets of different fields with 9 baseline methods to validate the superiority of FANFOLD.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2407.00348 [pdf, other]

Accretion of the degenerate Fermi gas onto a Reissner-Nordström black hole

Authors: Ping Li, Jiang-he Yang, Siwei Xu

Abstract: We investigate the stationary, spherically symmetric accretion of a degenerate relativistic Fermi gas onto a Reissner-Nordström black hole. The accretion theory is based on the Boyer-Lindquist coordinates and the Fermi gas follows Fermi-Dirac statistics at infinity. We have derived the expression for the particle current density, the stress energy-momentum tensor, and three accretion rates. As the charged particle falls into the black hole, both the mass and the charge of the black hole increase. Consequently, the mass accretion rate and charge accretion rate are proportional to the particle accretion rate. We have also provided analytical results at infinity and numerical results within a finite range for these quantities. Our results indicate that the accretion rate decreases as the charge of the black hole increases. Additionally, we found that the Vlasov gas accretion is no longer an isotropic perfect fluid accretion theory in the Boyer-Lindquist coordinates at infinity, mainly due to non-vanishing non-diagonal terms of the stress energy-momentum tensor. Despite this, the radial pressure remains smaller than the tangential pressure even at infinity. This study also suggests that naked singularities are unavoidable in black hole accretion theory.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2406.19659 [pdf]

Object Space is Embodied

Authors: Shan Xu, Xinran Feng, Yuannan Li, Jia Liu

Abstract: The perceived similarity between objects has often been attributed to their physical and conceptual features, such as appearance and animacy, and the theoretical framework of object space is accordingly conceived. Here, we extend this framework by proposing that object space may also be defined by embodied features, specifically action possibilities that objects afford to an agent (i.e., affordance) and their spatial relation with the agent (i.e., situatedness). To test this proposal, we quantified the embodied features with a set of action atoms. We found that embodied features explained the subjective similarity among familiar objects along with the objects' visual features. This observation was further replicated with novel objects. Our study demonstrates that embodied features, which place objects within an ecological context, are essential in constructing object space in the human visual system, emphasizing the importance of incorporating embodiment as a fundamental dimension in our understanding of the visual world.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19578 [pdf, other]

PathAlign: A vision-language model for whole slide images in histopathology

Authors: Faruk Ahmed, Andrew Sellergren, Lin Yang, Shawn Xu, Boris Babenko, Abbi Ward, Niels Olson, Arash Mohtashamian, Yossi Matias, Greg S. Corrado, Quang Duong, Dale R. Webster, Shravya Shetty, Daniel Golden, Yun Liu, David F. Steiner, Ellery Wulczyn

Abstract: Microscopic interpretation of histopathology images underlies many important diagnostic and treatment decisions. While advances in vision-language modeling raise new opportunities for analysis of such images, the gigapixel-scale size of whole slide images (WSIs) introduces unique challenges. Additionally, pathology reports simultaneously highlight key findings from small regions while also aggregating interpretation across multiple slides, often making it difficult to create robust image-text pairs. As such, pathology reports remain a largely untapped source of supervision in computational pathology, with most efforts relying on region-of-interest annotations or self-supervision at the patch-level. In this work, we develop a vision-language model based on the BLIP-2 framework using WSIs paired with curated text from pathology reports. This enables applications utilizing a shared image-text embedding space, such as text or image retrieval for finding cases of interest, as well as integration of the WSI encoder with a frozen large language model (LLM) for WSI-based generative text capabilities such as report generation or AI-in-the-loop interactions. We utilize a de-identified dataset of over 350,000 WSIs and diagnostic text pairs, spanning a wide range of diagnoses, procedure types, and tissue types. We present pathologist evaluation of text generation and text retrieval using WSI embeddings, as well as results for WSI classification and workflow prioritization (slide-level triaging). Model-generated text for WSIs was rated by pathologists as accurate, without clinically significant error or omission, for 78% of WSIs on average. This work demonstrates exciting potential capabilities for language-aligned WSI embeddings.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 9 main pages and 19 pages of supplemental material; 3 main tables, 3 main figures and 11 supplemental tables, 7 supplemental figures

arXiv:2406.19190 [pdf, ps, other]

Improved measurement of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H.-R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (643 additional authors not shown)

Abstract: Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential decay rate of $D^+_s\to K^0 e^+ν_e$ to be $f^{K^0}_+(0)=0.636\pm0.049\pm0.013$. For both measurements, the first uncertainty is statistical and the second systematic. The branching fraction and form factor measurements are factors of 1.6 and 1.7 more precise than the previous world averages, respectively.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 13 pages, 6 figures

arXiv:2406.18605 [pdf, other]

The neutron array of the compact spectrometer for heavy ion experiments in Fermi energy region

Authors: Dawei Si, Sheng Xiao, Yuhao Qin, Yijie Wang, Junhuai Xu, Baiting Tian, Boyuan Zhang, Dong Guo, Qin Zhi, Xiaobao Wei, Yibo Hao, Zengxiang Wang, Tianren Zhuo, Yuansheng Yang, Xianglun Wei, Herun Yang, Peng Ma, Limin Duan, Fangfang Duan, Junbing Ma, Shiwei Xu, Zhen Bai, Guo Yang, Yanyun Yang, Zhigang Xiao

Abstract: The emission of neutrons from heavy ion reactions is an important observable for studying the asymmetric nuclear equation of state and the reaction dynamics. A 20-unit neutron array has been developed and mounted on the compact spectrometer for heavy ion experiments (CSHINE) to measure the neutron spectra, neutron-neutron and neutron-proton correlation functions. Each unit consists of a $\rm 15\times 15\times 15~cm^3$ plastic scintillator coupled to a $ φ=52 ~\rm mm$ photomultiplier. The Geant4 simulation with optical process is performed to investigate the time resolution and the neutron detection efficiency. The inherent time resolution of 212 ps is obtained by cosmic ray coincidence test. The n-$γ$ discrimination and time-of-flight performance are given by $\rm ^{252}Cf$ radioactive source test and beam test. The neutron energy spectra have been obtained in the angle range $30^\circ \le θ_{\rm lab} \le 51^\circ$ in the beam experiment of $^{124}$Sn+$^{124}$Sn at 25 MeV/u with CSHINE.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 8 pages, 11 figures

arXiv:2406.18579 [pdf, other]

Hire: Hybrid-modal Interaction with Multiple Relational Enhancements for Image-Text Matching

Authors: Xuri Ge, Fuhai Chen, Songpei Xu, Fuxiang Tao, Jie Wang, Joemon M. Jose

Abstract: Image-text matching (ITM) is a fundamental problem in computer vision. The key issue lies in jointly learning the visual and textual representation to estimate their similarity accurately. Most existing methods focus on feature enhancement within modality or feature interaction across modalities, which, however, neglects the contextual information of the object representation based on the inter-object relationships that match the corresponding sentences with rich contextual semantics. In this paper, we propose a Hybrid-modal Interaction with multiple Relational Enhancements (termed \textit{Hire}) for image-text matching, which correlates the intra- and inter-modal semantics between objects and words with implicit and explicit relationship modelling. In particular, the explicit intra-modal spatial-semantic graph-based reasoning network is designed to improve the contextual representation of visual objects with salient spatial and semantic relational connectivities, guided by the explicit relationships of the objects' spatial positions and their scene graph. We use implicit relationship modelling for potential relationship interactions before explicit modelling to improve the fault tolerance of explicit relationship detection. Then the visual and textual semantic representations are refined jointly via inter-modal interactive attention and cross-modal alignment. To correlate the context of objects with the textual context, we further refine the visual semantic representation via cross-level object-sentence and word-image-based interactive attention. Extensive experiments validate that the proposed hybrid-modal interaction with implicit and explicit modelling is more beneficial for image-text matching. The proposed \textit{Hire} also obtains new state-of-the-art results on MS-COCO and Flickr30K benchmarks.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 22 pages, 5 figures, 6 tables, the extension of CMSEI in WACV23, and submitted to ACM TIST. arXiv admin note: text overlap with arXiv:2210.08908

arXiv:2406.18184 [pdf, other]

Wide-binary eccentricity distribution in young star clusters: dependence on the binary separation and mass

Authors: S. S. Mathew, S. Xu, C. Federrath, Y. Hu, A. Seta

Abstract: We study the wide-binary eccentricity ($e$) distribution in young star clusters and the role of turbulence in setting the form of the $e$ distribution using magnetohydrodynamical (MHD) simulations of star cluster formation. The simulations incorporate gravity, turbulence, magnetic fields, protostellar heating, and jets/outflows. We find that (1) simulations that employ purely compressive turbulence driving produce binaries with a superthermal $e$ distribution ($α>1$ in $p(e) \propto e^α$), while simulations with purely solenoidal driving or natural mixture of driving modes produce subthermal/thermal distributions ($α\leq$ 1), (2) the $e$ distribution over the full range of binary separations in our simulations is set at the early stages of the star cluster formation process, (3) while binaries (separation of $r_{\mathrm{pair}} \leq 1000\, \mathrm{AU}$) have subthermal to thermal $e$ distributions ($α\sim 0.8$), wide binaries ($r_{\mathrm{pair}} > 1000\, \mathrm{AU}$) have a superthermal distribution ($α\sim 1.8$), and (4) low-mass binary systems (system masses of $M_{\mathrm{sys}} \leq 0.8\, \mathrm{M_\odot}$) have a highly superthermal distribution ($α\sim 2.4$), whereas high-mass systems ($M_{\mathrm{sys}} > 0.8\, \mathrm{M_\odot}$) exhibit a subthermal/thermal distribution ($α\sim 0.8$). The binary eccentricity distribution is often modelled as a thermal distribution. However, our results suggest that the $e$ distribution depends on the range of separation of the sampled binaries, which agrees with the findings from recent Gaia observations. We conclude that the dependence of the $e$ distribution on the binary separation and mass is linked to the binary formation mechanism governed by the turbulent properties of the parent cloud.

Submitted 26 June, 2024; originally announced June 2024.

Comments: 15 pages, 9 figures, 1 table (MNRAS submitted, first referee report received)
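
The power-law form $p(e) \propto e^α$ quoted in the abstract above can be made concrete with a small sampler. The following is only an illustrative sketch, not code from the paper: it assumes the distribution is normalized on $0 \le e \le 1$, so that $p(e) = (α+1)e^α$ and the cumulative distribution is $e^{α+1}$, and it draws samples by inverting that cumulative distribution. The function names are hypothetical.

```python
import random


def sample_eccentricities(alpha, n, seed=0):
    """Draw n eccentricities from p(e) = (alpha + 1) * e**alpha on [0, 1].

    Assumption (not from the source): the power law is normalized on [0, 1].
    """
    if alpha <= -1:
        raise ValueError("alpha must exceed -1 for a normalizable distribution")
    rng = random.Random(seed)
    # The CDF is F(e) = e**(alpha + 1); inverting gives e = u**(1 / (alpha + 1)).
    return [rng.random() ** (1.0 / (alpha + 1.0)) for _ in range(n)]


def mean_eccentricity(alpha):
    """Analytic mean of the normalized power law: (alpha + 1) / (alpha + 2)."""
    return (alpha + 1.0) / (alpha + 2.0)


if __name__ == "__main__":
    # alpha values quoted in the abstract: ~0.8 (subthermal/thermal),
    # 1.0 (thermal), ~1.8 (superthermal wide binaries).
    for label, alpha in [("subthermal", 0.8), ("thermal", 1.0), ("superthermal", 1.8)]:
        sample = sample_eccentricities(alpha, 100_000)
        print(label, alpha, sum(sample) / len(sample), mean_eccentricity(alpha))
```

Under this normalization the thermal case ($α = 1$) has a mean eccentricity of 2/3, and larger $α$ shifts the weight toward more eccentric orbits, which is the sense in which a distribution is called superthermal.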

arXiv:2406.18183 [pdf, other]

Measurement of the cross sections of $e^+e^-\to K^{-}\barΞ^{+}Λ/Σ^{0}$ at center-of-mass energies between 3.510 and 4.914 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H.-R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (638 additional authors not shown)

Abstract: Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914 GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$, evidence for $ψ(4160) \to K^{-}\barΞ^{+}Λ$ is found for the first time with a significance of 4.4$σ$, including systematic uncertainties. No evidence for other possible resonances is found. In addition, the products of electronic partial width and branching fraction for all assumed resonances decaying into $K^{-}\barΞ^{+}Λ/Σ^{0}$ are determined.

Submitted 26 June, 2024; originally announced June 2024.

Comments: 26 pages, 5 tables, 4 figures

arXiv:2406.18083 [pdf, other]

Measurements of $K_S^0$-$K_L^0$ asymmetries in the decays $Λ_c^+ \to pK_{L,S}^0$, $pK_{L,S}^0π^+π^-$ and $pK_{L,S}^0π^0$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (643 additional authors not shown)

Abstract: Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0.04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, and $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^0)=(2.02 \pm 0.13 \pm 0.05)\%$, where the first uncertainties are statistical and the second systematic. Combining with the known branching fractions of $Λ_c^+ \to pK_{S}^{0}$, $Λ_c^+ \to pK_{S}^{0}π^+π^-$, and $Λ_c^+ \to pK_{S}^{0}π^0$, we present the first measurements of the $K_{S}^{0}$-$K_{L}^{0}$ asymmetries $R(Λ_c^+, K_{S,L}^0X) = \frac{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) - \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) + \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}$ in charmed baryon decays: $R(Λ_c^+, pK_{S,L}^0) = -0.025 \pm 0.031$, $R(Λ_c^+, pK_{S,L}^0π^+π^-) = -0.027 \pm 0.048$, and $R(Λ_c^+, pK_{S,L}^0π^0) =-0.015 \pm 0.046$. No significant asymmetries within the uncertainties are observed.

Submitted 26 June, 2024; originally announced June 2024.

Comments: 19 pages, 2 figures

arXiv:2406.18028 [pdf]

Plasmonic polarization sensing of electrostatic superlattice potentials

Authors: Shuai Zhang, Jordan Fonseca, Daniel Bennett, Zhiyuan Sun, Junhe Zhang, Ran Jing, Suheng Xu, Leo He, S. L. Moore, S. E. Rossi, Dmitry Ovchinnikov, David Cobden, Pablo Jarillo-Herrero, M. M. Fogler, Philip Kim, Efthimios Kaxiras, Xiaodong Xu, D. N. Basov

Abstract: Plasmon polaritons are formed by coupling light with delocalized electrons. The half-light and half-matter nature of plasmon polaritons endows them with unparalleled tunability via a range of parameters, such as dielectric environments and carrier density. Therefore, plasmon polaritons are expected to be tuned when in proximity to polar materials since the carrier density is tuned by an electrostatic potential; conversely, the plasmon polariton response might enable the sensing of polarization. Here, we use infrared nano-imaging and nano-photocurrent measurements to investigate heterostructures composed of graphene and twisted hexagonal boron nitride (t-BN), with alternating polarization in a triangular network of moiré stacking domains. We observe that the carrier density and the corresponding plasmonic response of graphene are modulated by polar domains in t-BN. In addition, we demonstrate that the nanometer-wide domain walls of graphene moiré superlattices, created by the polar domains of t-BN, provide momenta to assist the plasmonic excitations. Furthermore, our studies establish that the plasmon of graphene could function as a delicate sensor for polarization textures. The evolution of polarization textures in t-BN under uniform electric fields is tomographically examined via plasmonic imaging. Strikingly, no noticeable polarization switching is observed under applied electric fields up to 0.23 V/nm, at variance with transport reports. Our nano-images unambiguously reveal that t-BN with triangular domains acts like a ferrielectric, rather than the ferroelectric claimed by many previous studies.

Submitted 25 June, 2024; originally announced June 2024.

Comments: 41 pages, 20 figures

arXiv:2406.17988 [pdf, other]

DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image

Authors: Qingxuan Wu, Zhiyang Dou, Sirui Xu, Soshi Shimada, Chen Wang, Zhengming Yu, Yuan Liu, Cheng Lin, Zeyu Cao, Taku Komura, Vladislav Golyanik, Christian Theobalt, Wenping Wang, Lingjie Liu

Abstract: Reconstructing 3D hand-face interactions with deformations from a single image is a challenging yet crucial task with broad applications in AR, VR, and gaming. The challenges stem from self-occlusions during single-view hand-face interactions, diverse spatial relationships between hands and face, complex deformations, and the ambiguity of the single-view setting. The first and only method for hand-face interaction recovery, Decaf, introduces a global fitting optimization guided by contact and deformation estimation networks trained on studio-collected data with 3D annotations. However, Decaf suffers from a time-consuming optimization process and limited generalization capability due to its reliance on 3D annotations of hand-face interaction data. To address these issues, we present DICE, the first end-to-end method for Deformation-aware hand-face Interaction reCovEry from a single image. DICE estimates the poses of hands and faces, contacts, and deformations simultaneously using a Transformer-based architecture. It features disentangling the regression of local deformation fields and global mesh vertex locations into two network branches, enhancing deformation and contact estimation for precise and robust hand-face mesh recovery. To improve generalizability, we propose a weakly-supervised training approach that augments the training set using in-the-wild images without 3D ground-truth annotations, employing the depths of 2D keypoints estimated by off-the-shelf models and adversarial priors of poses for supervision. Our experiments demonstrate that DICE achieves state-of-the-art performance on a standard benchmark and in-the-wild data in terms of accuracy and physical plausibility. Additionally, our method operates at an interactive rate (20 fps) on an Nvidia 4090 GPU, whereas Decaf requires more than 15 seconds for a single image. Our code will be publicly available upon publication.

Submitted 25 June, 2024; originally announced June 2024.

Comments: 23 pages, 9 figures, 3 tables

arXiv:2406.17841 [pdf, other]

Probing many-body Bell correlation depth with superconducting qubits

Authors: Ke Wang, Weikang Li, Shibo Xu, Mengyao Hu, Jiachen Chen, Yaozu Wu, Chuanyu Zhang, Feitong Jin, Xuhao Zhu, Yu Gao, Ziqi Tan, Aosai Zhang, Ning Wang, Yiren Zou, Tingting Li, Fanhao Shen, Jiarun Zhong, Zehang Bao, Zitian Zhu, Zixuan Song, Jinfeng Deng, Hang Dong, Xu Zhang, Pengfei Zhang, Wenjie Jiang, et al. (10 additional authors not shown)

Abstract: Quantum nonlocality describes a stronger form of quantum correlation than that of entanglement. It refutes Einstein's belief of local realism and is among the most distinctive and enigmatic features of quantum mechanics. It is a crucial resource for achieving quantum advantages in a variety of practical applications, ranging from cryptography and certified random number generation via self-testing to machine learning. Nevertheless, the detection of nonlocality, especially in quantum many-body systems, is notoriously challenging. Here, we report an experimental certification of genuine multipartite Bell correlations, which signal nonlocality in quantum many-body systems, up to 24 qubits with a fully programmable superconducting quantum processor. In particular, we employ energy as a Bell correlation witness and variationally decrease the energy of a many-body system across a hierarchy of thresholds, below which an increasing Bell correlation depth can be certified from experimental data. As an illustrative example, we variationally prepare the low-energy state of a two-dimensional honeycomb model with 73 qubits and certify its Bell correlations by measuring an energy that surpasses the corresponding classical bound with up to 48 standard deviations. In addition, we variationally prepare a sequence of low-energy states and certify their genuine multipartite Bell correlations up to 24 qubits via energies measured efficiently by parity oscillation and multiple quantum coherence techniques. Our results establish a viable approach for preparing and certifying multipartite Bell correlations, which provide not only a finer benchmark beyond entanglement for quantum devices, but also a valuable guide towards exploiting multipartite Bell correlation in a wide spectrum of practical applications.

Submitted 25 June, 2024; originally announced June 2024.

Comments: 11 pages, 6 figures + 14 pages, 6 figures

arXiv:2406.17452 [pdf, ps, other]

Study of the $f_{0}(980)$ through the decay $D_{s}^{+}\rightarrow π^{+}π^{+}π^{-}π^{0}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (649 additional authors not shown)

Abstract: We perform the first amplitude analysis of $D^+_s \to π^+π^+π^-π^0$ decays, based on data samples of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of 7.33~fb$^{-1}$. We report the observation of $D_{s}^{+} \to f_0(980)ρ(770)^{+}$ with a statistical significance greater than 10$σ$ and determine the branching fractions $\mathcal{B}(D_s^+\toπ^+π^+π^-π^0|_{{\rm non}-η})=(2.04\pm0.08_{\rm stat.}\pm0.05_{\rm syst.})\%$ and $\mathcal{B}(D_s^+\toηπ^+)=(1.56\pm0.09_{\rm stat.}\pm0.04_{\rm syst.})\%$. Moreover, we measure the relative branching fraction between $φ\toπ^+π^-π^0$ and $φ\to K^+K^-$ to be $\frac{\mathcal{B}(φ(1020) \to π^+π^-π^0)}{\mathcal{B}(φ(1020) \to K^+K^-)}=0.230 \pm 0.014_{\rm stat.} \pm 0.010_{\rm syst.}$, which deviates from the world average value by more than $4σ$.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.17248 [pdf, other]

MindSpore Quantum: A User-Friendly, High-Performance, and AI-Compatible Quantum Computing Framework

Authors: Xusheng Xu, Jiangyu Cui, Zidong Cui, Runhong He, Qingyu Li, Xiaowei Li, Yanling Lin, Jiale Liu, Wuxin Liu, Jiale Lu, Maolin Luo, Chufan Lyu, Shijie Pan, Mosharev Pavel, Runqiu Shu, Jialiang Tang, Ruoqian Xu, Shu Xu, Kang Yang, Fan Yu, Qingguo Zeng, Haiying Zhao, Qiang Zheng, Junyuan Zhou, Xu Zhou, et al. (14 additional authors not shown)

Abstract: We introduce MindSpore Quantum, a pioneering hybrid quantum-classical framework with a primary focus on the design and implementation of noisy intermediate-scale quantum (NISQ) algorithms. Leveraging the robust support of MindSpore, an advanced open-source deep learning training/inference framework, MindSpore Quantum exhibits exceptional efficiency in the design and training of variational quantum algorithms on both CPU and GPU platforms, delivering remarkable performance. Furthermore, this framework places a strong emphasis on enhancing the operational efficiency of quantum algorithms when executed on real quantum hardware. This encompasses the development of algorithms for quantum circuit compilation and qubit mapping, crucial components for achieving optimal performance on quantum processors. In addition to the core framework, we introduce QuPack, a meticulously crafted quantum computing acceleration engine. QuPack significantly accelerates the simulation speed of MindSpore Quantum, particularly in variational quantum eigensolver (VQE), quantum approximate optimization algorithm (QAOA), and tensor network simulations, providing astonishing speed. This combination of cutting-edge technologies empowers researchers and practitioners to explore the frontiers of quantum computing with unprecedented efficiency and performance.

Submitted 10 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16771 [pdf, other]

An antiferromagnetic diode effect in even-layered MnBi2Te4

Authors: Anyuan Gao, Shao-Wen Chen, Barun Ghosh, Jian-Xiang Qiu, Yu-Fei Liu, Yugo Onishi, Chaowei Hu, Tiema Qian, Damien Bérubé, Thao Dinh, Houchen Li, Christian Tzschaschel, Seunghyun Park, Tianye Huang, Shang-Wei Lien, Zhe Sun, Sheng-Chin Ho, Bahadur Singh, Kenji Watanabe, Takashi Taniguchi, David C. Bell, Arun Bansil, Hsin Lin, Tay-Rong Chang, Amir Yacoby, et al. (4 additional authors not shown)

Abstract: In a PN junction, the separation between positive and negative charges leads to diode transport. In the past few years, the intrinsic diode transport in noncentrosymmetric polar conductors has attracted great interest, because it suggests novel nonlinear applications and provides a symmetry-sensitive probe of the Fermi surface. Recently, such studies have been extended to noncentrosymmetric superconductors, realizing the superconducting diode effect. Here, we show that, even in a centrosymmetric crystal without directional charge separation, the spins of an antiferromagnet (AFM) can generate a spatial directionality, leading to an AFM diode effect. We observe large second-harmonic transport in a nonlinear electronic device enabled by the compensated AFM state of even-layered MnBi2Te4. We also report a novel electrical sum-frequency generation (SFG), which has been rarely explored in contrast to the well-known optical SFG in wide-gap insulators. We demonstrate that the AFM enables an in-plane field-effect transistor and harvesting of wireless electromagnetic energy. The electrical SFG establishes a powerful method to study nonlinear electronics built by quantum materials. The AFM diode effect paves the way for potential device concepts including AFM logic circuits, self-powered AFM spintronics, and other applications that potentially bridge nonlinear electronics with AFM spintronics.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 33+8 pages, 14+2 figures

arXiv:2406.16028 [pdf, other]

TimeAutoDiff: Combining Autoencoder and Diffusion model for time series tabular data synthesizing

Authors: Namjoon Suh, Yuning Yang, Din-Yin Hsieh, Qitong Luan, Shirong Xu, Shixiang Zhu, Guang Cheng

Abstract: In this paper, we leverage the power of latent diffusion models to generate synthetic time series tabular data. Along with the temporal and feature correlations, the heterogeneous nature of the features in the table has been one of the main obstacles in time series tabular data modeling. We tackle this problem by combining the ideas of the variational auto-encoder (VAE) and the denoising diffusion probabilistic model (DDPM). Our model, named \texttt{TimeAutoDiff}, has several key advantages including (1) Generality: the ability to handle the broad spectrum of time series tabular data from single to multi-sequence datasets; (2) Good fidelity and utility guarantees: numerical experiments on six publicly available datasets demonstrating significant improvements over state-of-the-art models in generating time series tabular data, across four metrics measuring fidelity and utility; (3) Fast sampling speed: entire time series data generation as opposed to the sequential data sampling schemes implemented in the existing diffusion-based models, eventually leading to significant improvements in sampling speed; (4) Entity conditional generation: the first implementation of conditional generation of multi-sequence time series tabular data with heterogeneous features in the literature, enabling scenario exploration across multiple scientific and engineering domains. Codes are in preparation for release to the public, but available upon request.

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.15030 [pdf, ps, other]

Search for the $e^+e^- \to φχ_{c1}(3872)$ process at BESIII

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (639 additional authors not shown)

Abstract: Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies 4.914 and 4.946 GeV by the BESIII detector, the $e^+e^- \to φχ_{c1}(3872)$ process is searched for for the first time. No significant signal is observed and the upper limits at the 90\% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction $\mathcal{B}[χ_{c1}(3872)\toπ^+π^- J/ψ]$ at 4.914 and 4.946 GeV are set to be 0.85 and 0.96 pb, respectively. These measurements provide useful information for the production of the $χ_{c1}(3872)$ at $e^+e^-$ colliders and deepen our understanding of the nature of this particle.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 11 pages, 3 figures

arXiv:2406.14910 [pdf, ps, other]

Towards Dynamic Resource Allocation and Client Scheduling in Hierarchical Federated Learning: A Two-Phase Deep Reinforcement Learning Approach

Authors: Xiaojing Chen, Zhenyuan Li, Wei Ni, Xin Wang, Shunqing Zhang, Yanzan Sun, Shugong Xu, Qingqi Pei

Abstract: Federated learning (FL) is a viable technique to train a shared machine learning model without sharing data. Hierarchical FL (HFL) systems have yet to be studied regarding their multiple levels of energy, computation, communication, and client scheduling, especially when it comes to clients relying on energy harvesting to power their operations. This paper presents a new two-phase deep deterministic policy gradient (DDPG) framework, referred to as ``TP-DDPG'', to balance online the learning delay and model accuracy of an FL process in an energy harvesting-powered HFL system. The key idea is that we divide optimization decisions into two groups, and employ DDPG to learn one group in the first phase, while interpreting the other group as part of the environment to provide rewards for training the DDPG in the second phase. Specifically, the DDPG learns the selection of participating clients, and their CPU configurations and the transmission powers. A new straggler-aware client association and bandwidth allocation (SCABA) algorithm efficiently optimizes the other decisions and evaluates the reward for the DDPG. Experiments demonstrate that with a substantially reduced number of learnable parameters, the TP-DDPG can quickly converge to effective policies that can shorten the training time of HFL by 39.4% compared to its benchmarks, when the required test accuracy of HFL is 0.9.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.14772 [pdf, other]

Consistent community detection in multi-layer networks with heterogeneous differential privacy

Authors: Yaoming Zhen, Shirong Xu, Junhui Wang

Abstract: As network data has become increasingly prevalent, a substantial amount of attention has been paid to the privacy issue in publishing network data. One of the critical challenges for data publishers is to preserve the topological structures of the original network while protecting sensitive information. In this paper, we propose a personalized edge flipping mechanism that allows data publishers to protect edge information based on each node's privacy preference. It can achieve differential privacy while preserving the community structure under the multi-layer degree-corrected stochastic block model after appropriate debiasing, and thus consistent community detection in the privatized multi-layer networks is achievable. Theoretically, we establish the consistency of community detection in the privatized multi-layer network and show that better privacy protection of edges can be obtained for a proportion of nodes while allowing other nodes to give up their privacy. Furthermore, the advantage of the proposed personalized edge-flipping mechanism is also supported by its numerical performance on various synthetic networks and a real-life multi-layer network.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14621 [pdf, other]

A mid-circuit erasure check on a dual-rail cavity qubit using the joint-photon number-splitting regime of circuit QED

Authors: Stijn J. de Graaf, Sophia H. Xue, Benjamin J. Chapman, James D. Teoh, Takahiro Tsunoda, Patrick Winkel, John W. O. Garmon, Kathleen M. Chang, Luigi Frunzio, Shruti Puri, Robert J. Schoelkopf

Abstract: Quantum control of a linear oscillator using a static dispersive coupling to a nonlinear ancilla underpins a wide variety of experiments in circuit QED. Extending this control to more than one oscillator while minimizing the required connectivity to the ancilla would enable hardware-efficient multi-mode entanglement and measurements. We show that the spectrum of an ancilla statically coupled to a single mode can be made to depend on the joint photon number in two modes by applying a strong parametric beamsplitter coupling between them. This `joint-photon number-splitting' regime extends single-oscillator techniques to two-oscillator control, which we use to realize a hardware-efficient erasure check for a dual-rail qubit encoded in two superconducting cavities. By leveraging the beamsplitter coupling already required for single-qubit gates, this scheme permits minimal connectivity between circuit elements. Furthermore, the flexibility to choose the pulse shape allows us to limit the susceptibility to different error channels. We use this scheme to detect leakage errors with a missed erasure fraction of $(9.0 \pm 0.5)\times10^{-4}$, while incurring an erasure rate of $2.92 \pm 0.01\%$ and a Pauli error rate of $0.31 \pm 0.01\%$, both of which are dominated by cavity errors.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14541 [pdf, other]

Are LLMs Naturally Good at Synthetic Tabular Data Generation?

Authors: Shengzhe Xu, Cho-Ting Lee, Mandar Sharma, Raquib Bin Yousuf, Nikhil Muralidhar, Naren Ramakrishnan

Abstract: Large language models (LLMs) have demonstrated their prowess in generating synthetic text and images; however, their potential for generating tabular data -- arguably the most common data type in business and scientific applications -- is largely underexplored. This paper demonstrates that LLMs, used as-is, or after traditional fine-tuning, are severely inadequate as synthetic table generators. Due to the autoregressive nature of LLMs, fine-tuning with random order permutation runs counter to the importance of modeling functional dependencies, and renders LLMs unable to model conditional mixtures of distributions (key to capturing real world constraints). We showcase how LLMs can be made to overcome some of these deficiencies by making them permutation-aware.

Submitted 21 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14358 [pdf]

The neural correlates of logical-mathematical symbol systems processing resemble that of spatial cognition more than natural language processing

Authors: Yuannan Li, Shan Xu, Jia Liu

Abstract: The ability to manipulate logical-mathematical symbols (LMS), encompassing tasks such as calculation, reasoning, and programming, is a cognitive skill arguably unique to humans. Considering the relatively recent emergence of this ability in human evolutionary history, it has been suggested that LMS processing may build upon more fundamental cognitive systems, possibly through neuronal recycling. Previous studies have pinpointed two primary candidates, natural language processing and spatial cognition. Existing comparisons between these domains largely relied on task-level comparison, which may be confounded by task idiosyncrasy. The present study instead compared the neural correlates at the domain level with both automated meta-analysis and synthesized maps based on three representative LMS tasks, reasoning, calculation, and mental programming. Our results revealed a more substantial cortical overlap between LMS processing and spatial cognition, in contrast to language processing. Furthermore, in regions activated by both spatial and language processing, the multivariate activation pattern for LMS processing exhibited greater multivariate similarity to spatial cognition than to language processing. A hierarchical clustering analysis further indicated that typical LMS tasks were indistinguishable from spatial cognition tasks at the neural level, suggesting an inherent connection between these two cognitive processes. Taken together, our findings support the hypothesis that spatial cognition is likely the basis of LMS processing, which may shed light on the limitations of large language models in logical reasoning, particularly those trained exclusively on textual data without explicit emphasis on spatial content.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14289 [pdf]

Electrical switching of chirality in rhombohedral graphene Chern insulators

Authors: Jing Ding, Hanxiao Xiang, Jiannan Hua, Wenqiang Zhou, Naitian Liu, Le Zhang, Na Xin, Kenji Watanabe, Takashi Taniguchi, Wei Zhu, Shuigang Xu

Abstract: A Chern insulator hosts topologically protected chiral edge currents with quantized conductance characterized by its Chern number. Switching the chirality of the Chern insulator, namely, the direction of the edge current, is highly challenging due to topologically forbidden backscattering but is of considerable importance for the design of topological devices. Nevertheless, this can be achieved by reversing the sign of the Chern number through a topological phase transition. Here, we report electrically switchable chirality in rhombohedral heptalayer graphene-based Chern insulators. The surface flat band and giant Berry curvature in rhombohedral multilayer graphene provide a highly tunable platform for engineering the topological states. By introducing moire superlattices in rhombohedral heptalayer graphene, we observed a cascade of topological phase transitions at quarter electron filling of a moire band. The Chern number can be continuously tuned from 0, -1, 1 to 2 by electric fields, manifesting as a large anomalous Hall effect and following Streda's formula. Sign reversal and the anomalous Hall effect also occurred at non-integer fillings, suggesting the possibility of electrically tunable topological phase transitions within the regime of fractional Chern insulators. Our work establishes rhombohedral heptalayer graphene moire superlattices as a versatile platform for topological engineering. The realization of switchable chirality enhances the potential application of chiral edge currents in topological circuit interconnects.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 21 pages, 4 figures in main text

arXiv:2406.14025 [pdf]

Direct Observation of Dendrites Nucleation in Li Metal Battery by Machine Learning Accelerated Molecular Simulations under Realistic Electrochemical Conditions

Authors: Taiping Hu, Haichao Huang, Guobing Zhou, Xinyan Wang, Zheng Cheng, Fangjia Fu, Xiaoxu Wang, Fuzhi Dai, Kuang Yu, Shenzhen Xu

Abstract: Uncontrollable dendrites growth during electrochemical cycles leads to low Coulombic efficiency and critical safety issues in Li metal batteries. Hence, a comprehensive understanding of the dendrite formation mechanism is essential for further enhancing the performance of Li metal batteries. Machine learning accelerated molecular dynamics (MD) simulations can provide atomic-scale resolution for various key processes at an ab-initio level accuracy. However, traditional MD simulation tools hardly capture Li electrochemical depositions, due to lack of an electrochemical constant potential (ConstP) condition. In this work, we propose a ConstP approach that combines a machine learning force field with the charge equilibration method to reveal the dynamic process of Li dendrites nucleation at Li metal anode surfaces. Our results show that both dead Li cluster formation and inhomogeneous Li electro-depositions can induce Li dendrites nucleation. We further reveal that the local aggregation of Li atoms in amorphous inorganic components of solid electrolyte interphase is the key factor triggering the nucleation process. Overall, our simulations provide microscopic insights for Li dendrites formations in Li metal anodes. More importantly, we present an efficient and accurate simulation method for modeling realistic ConstP conditions, which holds considerable potential for broader applications in modeling of complex electrochemical interfaces.

Submitted 3 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14005 [pdf, other]

Information Guided Regularization for Fine-tuning Language Models

Authors: Mandar Sharma, Nikhil Muralidhar, Shengzhe Xu, Raquib Bin Yousuf, Naren Ramakrishnan

Abstract: The pretraining-fine-tuning paradigm has been the de facto strategy for transfer learning in modern language modeling. With the understanding that task adaptation in LMs is often a function of parameters shared across tasks, we argue that a more surgical approach to regularization needs to exist for smoother transfer learning. Towards this end, we investigate how the pretraining loss landscape is affected by these task-sensitive parameters through an information-theoretic lens. We then leverage the findings from our investigations to devise a novel approach to dropout for improved model regularization and better downstream generalization. This approach, named guided dropout, is both task & architecture agnostic and adds no computational overhead to the fine-tuning process. Through empirical evaluations, we showcase that our approach to regularization yields consistently better performance, even in scenarios of data paucity, compared to standardized baselines.

Submitted 21 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.13919 [pdf, other]

SPL: A Socratic Playground for Learning Powered by Large Language Model

Authors: Liang Zhang, Jionghao Lin, Ziyi Kuang, Sheng Xu, Mohammed Yeasin, Xiangen Hu

Abstract: Dialogue-based Intelligent Tutoring Systems (ITSs) have significantly advanced adaptive and personalized learning by automating sophisticated human tutoring strategies within interactive dialogues. However, replicating the nuanced patterns of expert human communication remains a challenge in Natural Language Processing (NLP). Recent advancements in NLP, particularly Large Language Models (LLMs) such as OpenAI's GPT-4, offer promising solutions by providing human-like and context-aware responses based on extensive pre-trained knowledge. Motivated by the effectiveness of LLMs in various educational tasks (e.g., content creation and summarization, problem-solving, and automated feedback provision), our study introduces the Socratic Playground for Learning (SPL), a dialogue-based ITS powered by the GPT-4 model, which employs the Socratic teaching method to foster critical thinking among learners. Through extensive prompt engineering, SPL can generate specific learning scenarios and facilitates efficient multi-turn tutoring dialogues. The SPL system aims to enhance personalized and adaptive learning experiences tailored to individual needs, specifically focusing on improving critical thinking skills. Our pilot experimental results from essay writing tasks demonstrate SPL has the potential to improve tutoring interactions and further enhance dialogue-based ITS functionalities. Our study, exemplified by SPL, demonstrates how LLMs enhance dialogue-based ITSs and expand the accessibility and efficacy of educational technologies.

Submitted 20 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

Showing 1–50 of 2,626 results for author: Xu, S