Rich and diverse molecular gas environments of closely-separated dual quasars viewed by ALMA

Authors: Shenli Tang, John D. Silverman, Zhaoxuan Liu, Manda Banerji, Tomoko Suzuki, Seiji Fujimoto, Andy Goulding, Masatoshi Imanishi, Toshihiro Kawaguchi, Connor Bottrell, Tilman Hartwig, Knud Jahnke, Masafusa Onoue, Malte Schramm, Yoshihiro Ueda

Abstract: We present a study of the molecular gas in five closely-spaced ($R_{\perp}<20$ kpc) dual quasars ($L_{\rm bol}\gtrsim10^{44}~\mathrm{erg~s}^{-1}$) at redshifts $0.4<z<0.8$ with the Atacama Large Millimeter/submillimeter Array. The dual quasar phase represents a distinctive stage during the interaction between two galaxies for investigating quasar fueling and feedback effects on the gas reservoir. The dual quasars were selected from the Sloan Digital Sky Survey and Subaru/Hyper Suprime-Cam Subaru Strategic Program, with confirmatory spectroscopic validation. Based on the detection of the CO J=2--1 emission line with Band 4, we derived key properties including CO luminosities, line widths, and molecular gas masses for these systems. Among the ten quasars of the five pairs, eight have line detections exceeding $5σ$. The detected sources prominently harbor substantial molecular gas reservoirs, with molecular gas masses ($M_{\text{molgas}}$) between $10^{9.6-10.5}~\mathrm{M_{\odot}}$, and molecular gas-to-stellar mass ratios ($μ_{\text{molgas}}$) spanning $18-97\%$. The overall $μ_{\text{molgas}}$ of these dual quasars agrees with that of inactive star-forming main-sequence galaxies at comparable redshifts, indicating no clear evidence of quenching. However, intriguing features in each individual system show possible evidence of AGN feedback, matter transfer, and compaction processes.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09336 [pdf, other]

Guidelines for Augmentation Selection in Contrastive Learning for Time Series Classification

Authors: Ziyu Liu, Azadeh Alavi, Minyi Li, Xiang Zhang

Abstract: Self-supervised contrastive learning has become a key technique in deep learning, particularly in time series analysis, due to its ability to learn meaningful representations without explicit supervision. Augmentation is a critical component in contrastive learning, where different augmentations can dramatically impact performance, sometimes influencing accuracy by over 30%. However, augmentations are predominantly selected either empirically, which can be suboptimal, or by grid search, which is time-consuming. In this paper, we establish a principled framework for selecting augmentations based on dataset characteristics such as trend and seasonality. Specifically, we construct 12 synthetic datasets incorporating trend, seasonality, and integration weights. We then evaluate the effectiveness of 8 different augmentations across these synthetic datasets, thereby inducing generalizable associations between time series characteristics and augmentation efficiency. Additionally, we evaluated the induced associations across 6 real-world datasets encompassing domains such as activity recognition, disease diagnosis, traffic monitoring, electricity usage, mechanical fault prognosis, and finance. These real-world datasets are diverse, covering a range from 1 to 12 channels, 2 to 10 classes, sequence lengths of 14 to 1280, and data frequencies from 250 Hz to daily intervals.
The experimental results show that our proposed trend-seasonality-based augmentation recommendation algorithm can accurately identify the effective augmentations for a given time series dataset, achieving an average Recall@3 of 0.667, outperforming baselines. Our work provides guidance for studies employing contrastive learning in time series analysis, with wide-ranging applications. All the code, datasets, and analysis results will be released at https://github.com/DL4mHealth/TS-Contrastive-Augmentation-Recommendation.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 20 pages, 11 figures

arXiv:2407.09139 [pdf, other]

Measurement of $CP$ asymmetries in $B^0 \to K^0_S π^0 γ$ decays at Belle II

Authors: Belle II Collaboration, I. Adachi, L. Aggarwal, H. Ahmed, H. Aihara, N. Akopov, A. Aloisio, N. Anh Ky, D. M. Asner, H. Atmacan, T. Aushev, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, S. Bahinipati, P. Bambade, Sw. Banerjee, S. Bansal, M. Barrett, J. Baudot, A. Baur, A. Beaubien, F. Becherer, et al. (414 additional authors not shown)

Abstract: We report measurements of time-dependent $CP$ asymmetries in $B^0 \to K^0_S π^0 γ$ decays based on a data sample of $(388\pm6)\times10^6$ $B\bar{B}$ events collected at the $Υ(4S)$ resonance with the Belle II detector. The Belle II experiment operates at the SuperKEKB asymmetric-energy $e^+e^-$ collider. We measure decay-time distributions to determine $CP$-violating parameters $S$ and $C$. We determine these parameters for two ranges of $K^0_S π^0$ invariant mass: $m(K^0_S π^0)\in (0.8, 1.0)$ $GeV/c^2$, which is dominated by $B^0 \to K^{*0} (\to K^0_S π^0) γ$ decays, and a complementary region $m(K^0_S π^0)\in (0.6, 0.8)\cup(1.0, 1.8)$ $GeV/c^2$. Our results have improved precision as compared to previous measurements and are consistent with theory predictions.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 10 pages, 4 figures

Report number: Belle II Preprint 2024-009, KEK Preprint 2024-1

arXiv:2407.08984 [pdf, ps, other]

Measurement of branching fractions, CP asymmetry, and isospin asymmetry for $\boldsymbol{B\rightarrowργ}$ decays using Belle and Belle II data

Authors: Belle II Collaboration, I. Adachi, K. Adamczyk, L. Aggarwal, H. Aihara, N. Akopov, A. Aloisio, N. Anh Ky, D. M. Asner, H. Atmacan, T. Aushev, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, S. Bahinipati, P. Bambade, Sw. Banerjee, S. Bansal, M. Barrett, J. Baudot, A. Baur, A. Beaubien, F. Becherer, et al. (385 additional authors not shown)

Abstract: We present measurements of $B^{+}\rightarrowρ^{+}γ$ and $B^{0}\rightarrowρ^{0}γ$ decays using a combined data sample of $772 \times 10^6$ $B\overline{B}$ pairs collected by the Belle experiment and $387\times 10^6$ $B\overline{B}$ pairs collected by the Belle II experiment in $e^{+}e^{-}$ collisions at the $Υ(4S)$ resonance. After an optimized selection, a simultaneous fit to the Belle and Belle II data sets yields $114\pm 12$ $B^{+}\rightarrowρ^{+}γ$ and $99\pm 12$ $B^{0}\rightarrowρ^{0}γ$ decays. The measured branching fractions are $(13.1^{+2.0 +1.3}_{-1.9 -1.2})\times 10^{-7}$ and $(7.5\pm 1.3^{+1.0}_{-0.8})\times 10^{-7}$ for $B^{+}\rightarrowρ^{+}γ$ and $B^{0}\rightarrowρ^{0}γ$ decays, respectively, where the first uncertainty is statistical and the second is systematic. We also measure the isospin asymmetry $A_{\rm I}(B\rightarrowργ)=(10.9^{+11.2 +7.8}_{-11.7 -7.3})\%$ and the direct CP asymmetry $A_{CP}(B^{+}\rightarrowρ^{+}γ)=(-8.2\pm 15.2^{+1.6}_{-1.2})\%$.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 12 pages, 4 figures

Report number: Belle II Preprint 2023-019; KEK Preprint 2023-37

arXiv:2407.08725 [pdf, other]

MetaUrban: A Simulation Platform for Embodied AI in Urban Spaces

Authors: Wayne Wu, Honglin He, Yiran Wang, Chenda Duan, Jack He, Zhizheng Liu, Quanyi Li, Bolei Zhou

Abstract: Public urban spaces like streetscapes and plazas serve residents and accommodate social life in all its vibrant variations. Recent advances in Robotics and Embodied AI make public urban spaces no longer exclusive to humans. Food delivery bots and electric wheelchairs have started sharing sidewalks with pedestrians, while diverse robot dogs and humanoids have recently emerged in the street. Ensuring the generalizability and safety of these forthcoming mobile machines is crucial when navigating through the bustling streets in urban spaces. In this work, we present MetaUrban, a compositional simulation platform for Embodied AI research in urban spaces. MetaUrban can construct an infinite number of interactive urban scenes from compositional elements, covering a vast array of ground plans, object placements, pedestrians, vulnerable road users, and other mobile agents' appearances and dynamics. We design point navigation and social navigation tasks as the pilot study using MetaUrban for embodied AI research and establish various baselines of Reinforcement Learning and Imitation Learning. Experiments demonstrate that the compositional nature of the simulated environments can substantially improve the generalizability and safety of the trained mobile agents. MetaUrban will be made publicly available to provide more research opportunities and foster safe and trustworthy embodied AI in urban spaces.

Submitted 11 July, 2024; originally announced July 2024.

Comments: Technical report. Project page: https://metadriverse.github.io/metaurban/

arXiv:2407.08579 [pdf, other]

Instantaneous and Retarded Interactions in Coherent Radiation

Authors: Zhuoyuan Liu, Xiujie Deng, Tong Li, Lixin Yan

Abstract: In coherent radiation of an ensemble of electrons, the radiation field from the electrons resonantly drives the other electrons inside to produce stimulated emission. The radiation reaction force on the electrons accounting for this stimulated radiation loss is classically described by the Lienard-Wiechert potential. Despite its being the foundation of beam physics for decades, we show that using the "acceleration field" in the Lienard-Wiechert potential to describe radiative interactions leads to divergences due to its implicit dependence on instantaneous interactions. Here, we propose an alternative theory for electromagnetic radiation by decomposing the interactions into an instantaneous part and a retarded part. It is shown that only the retarded part contributes to the irreversible radiation loss and the instantaneous part describes the space charge related effects. We further apply this theory to study the coherent synchrotron radiation wake, which hopefully will reshape our understanding of coherent radiation and collective interactions.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 5 pages, 3 figures

arXiv:2407.08565 [pdf, other]

Information matrix test for normality of innovations in stationary time series models

Authors: Zixuan Liu, Junmo Song

Abstract: This study focuses on the problem of testing for normality of innovations in stationary time series models. To achieve this, we introduce an information matrix (IM) based test. While the IM test was originally developed to test for model misspecification, our study shows that the test can also be used to test for the normality of innovations in various time series models. We provide sufficient conditions under which the limiting null distribution of the test statistics exists. As applications, a first-order threshold moving average model, GARCH model and double autoregressive model are considered. We conduct simulations to evaluate the performance of the proposed test and compare it with other tests, and provide a real data analysis.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.08554 [pdf, other]

Establishing Rigorous and Cost-effective Clinical Trials for Artificial Intelligence Models

Authors: Wanling Gao, Yunyou Huang, Dandan Cui, Zhuoming Yu, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Gangyuan Zhao, Chongrong Jiang, Fan Huang, Tianyi Wei, Suqin Tang, Bingjie Xia, Zhifei Zhang, Jianfeng Zhan

Abstract: A profound gap persists between artificial intelligence (AI) and clinical practice in medicine, primarily due to the lack of rigorous and cost-effective evaluation methodologies. State-of-the-art and state-of-the-practice AI model evaluations are limited to laboratory studies on medical datasets or direct clinical trials with no or solely patient-centered controls. Moreover, the crucial role of clinicians in collaborating with AI, pivotal for determining its impact on clinical practice, is often overlooked. For the first time, we emphasize the critical necessity for rigorous and cost-effective evaluation methodologies for AI models in clinical practice, featuring patient/clinician-centered (dual-centered) AI randomized controlled trials (DC-AI RCTs) and virtual clinician-based in-silico trials (VC-MedAI) as an effective proxy for DC-AI RCTs. Leveraging 7500 diagnosis records from two-phase inaugural DC-AI RCTs across 14 medical centers with 125 clinicians, our results demonstrate the necessity of DC-AI RCTs and the effectiveness of VC-MedAI. Notably, VC-MedAI performs comparably to human clinicians, replicating insights and conclusions from prospective DC-AI RCTs. We envision DC-AI RCTs and VC-MedAI as pivotal advancements, presenting innovative and transformative evaluation methodologies for AI models in clinical practice, offering a preclinical-like setting mirroring conventional medicine, and reshaping development paradigms in a cost-effective and fast-iterative manner. Chinese Clinical Trial Registration: ChiCTR2400086816.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 23 pages

arXiv:2407.08351 [pdf, other]

AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models

Authors: Xiang Lisa Li, Evan Zheran Liu, Percy Liang, Tatsunori Hashimoto

Abstract: Evaluation is critical for assessing capabilities, tracking scientific progress, and informing model selection. In this paper, we present three desiderata for a good benchmark for language models: (i) salience (e.g., knowledge about World War II is more salient than a random day in history), (ii) novelty (i.e., the benchmark reveals new trends in model rankings not shown by previous benchmarks), and (iii) difficulty (i.e., the benchmark should be difficult for existing models, leaving headroom for future improvement). We operationalize these three desiderata and cast benchmark creation as a search problem, that of finding benchmarks that satisfy all three desiderata. To tackle this search problem, we present AutoBencher, which uses a language model to automatically search for datasets that meet the three desiderata. AutoBencher uses privileged information (e.g., relevant documents) to construct reliable datasets, and adaptivity with reranking to optimize for the search objective. We use AutoBencher to create datasets for math, multilingual, and knowledge-intensive question answering. The scalability of AutoBencher allows it to test fine-grained categories and tail knowledge, creating datasets that are on average 27% more novel and 22% more difficult than existing benchmarks.
A closer investigation of our constructed datasets shows that we can identify specific gaps in language model knowledge that are not captured by existing benchmarks, such as Gemini Pro performing much worse on question answering about the Permian Extinction and Fordism, while OpenAGI-7B performs surprisingly well on QA about COVID-19.

Submitted 11 July, 2024; originally announced July 2024.

Comments: preprint

arXiv:2407.08044 [pdf, other]

RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization

Authors: Xijie Huang, Zechun Liu, Shih-Yang Liu, Kwang-Ting Cheng

Abstract: Low-Rank Adaptation (LoRA), as a representative Parameter-Efficient Fine-Tuning (PEFT) method, significantly enhances the training efficiency by updating only a small portion of the weights in Large Language Models (LLMs). Recently, weight-only quantization techniques have also been applied to LoRA methods to reduce the memory footprint of fine-tuning. However, applying weight-activation quantization to the LoRA pipeline is under-explored, and we observe substantial performance degradation primarily due to the presence of activation outliers. In this work, we propose RoLoRA, the first LoRA-based scheme for effective weight-activation quantization. RoLoRA utilizes rotation for outlier elimination and proposes rotation-aware fine-tuning to preserve the outlier-free characteristics in rotated LLMs. Experimental results show RoLoRA consistently improves low-bit LoRA convergence and post-training quantization robustness in weight-activation settings. We evaluate RoLoRA across LLaMA2-7B/13B and LLaMA3-8B models, achieving up to 29.5% absolute accuracy gain of 4-bit weight-activation quantized LLaMA2-13B on commonsense reasoning tasks compared to the LoRA baseline. We further demonstrate its effectiveness on Large Multimodal Models (LLaVA-1.5-7B). Codes are available at https://github.com/HuangOwen/RoLoRA

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07780 [pdf, other]

Cross Domain Object Detection via Multi-Granularity Confidence Alignment based Mean Teacher

Authors: Jiangming Chen, Li Liu, Wanxia Deng, Zhen Liu, Yu Liu, Yingmei Wei, Yongxiang Liu

Abstract: Cross domain object detection learns an object detector for an unlabeled target domain by transferring knowledge from an annotated source domain. Promising results have been achieved via Mean Teacher; however, pseudo labeling, which is the bottleneck of mutual learning, remains to be further explored. In this study, we find that confidence misalignment of the predictions, including category-level overconfidence, instance-level task confidence inconsistency, and image-level confidence misfocusing, injects noisy pseudo labels into the training process and leads to suboptimal performance on the target domain. To tackle this issue, we present a novel general framework termed Multi-Granularity Confidence Alignment Mean Teacher (MGCAMT) for cross domain object detection, which alleviates confidence misalignment across category-, instance-, and image-levels simultaneously to obtain high quality pseudo supervision for better teacher-student learning. Specifically, to align confidence with accuracy at category level, we propose Classification Confidence Alignment (CCA) to model category uncertainty based on Evidential Deep Learning (EDL) and filter out the category incorrect labels via an uncertainty-aware selection strategy. Furthermore, to mitigate the instance-level misalignment between classification and localization, we design Task Confidence Alignment (TCA) to enhance the interaction between the two task branches and allow each classification feature to adaptively locate the optimal feature for the regression.
Finally, we develop imagery Focusing Confidence Alignment (FCA) adopting another way of pseudo label learning, i.e., we use the original outputs from the Mean Teacher network for supervised learning without label assignment to concentrate on holistic information in the target image. These three procedures benefit from each other from a cooperative learning perspective.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07702 [pdf, other]

Leveraging Self-Supervised Learning for MIMO-OFDM Channel Representation and Generation

Authors: Zongxi Liu, Jiacheng Chen, Yunting Xu, Ting Ma, Jingbo Liu, Haibo Zhou, Dusit Niyato

Abstract: In communications theory, the capacity of multiple input multiple output-orthogonal frequency division multiplexing (MIMO-OFDM) systems is fundamentally determined by wireless channels, which exhibit both diversity and correlation in spatial, frequency and temporal domains. It is further envisioned to exploit the inherent nature of channels, namely representation, to achieve geolocation-based MIMO transmission for 6G, exemplified by the fully-decoupled radio access network (FD-RAN). Accordingly, this paper first employs self-supervised learning to obtain channel representation from unlabeled channel, then proposes a channel generation assisted approach for determining MIMO precoding matrix solely based on geolocation. Specifically, we exploit the small-scale temporal domain variations of channels at a fixed geolocation, and design an ingenious pretext task tailored for contrastive learning. Then, a Transformer-based encoder is trained to output channel representations. We further develop a conditional diffusion generator to generate channel representations from geolocation. Finally, a Transformer-encoder-based decoder is utilized to reconstruct channels from generated representations, where the optimal channel is selected for calculating the precoding matrix for both single and dual BS transmission. We conduct experiments on a public ray-tracing channel dataset, and the extensive simulation results demonstrate the effectiveness of our channel representation method, and also showcase the performance improvement in geolocation-based MIMO transmission.

Submitted 23 May, 2024; originally announced July 2024.

arXiv:2407.07672 [pdf, other]

StoryDiffusion: How to Support UX Storyboarding With Generative-AI

Authors: Zhaohui Liang, Xiaoyu Zhang, Kevin Ma, Zhao Liu, Xipei Ren, Kosa Goucher-Lambert, Can Liu

Abstract: Storyboarding is an established method for designing user experiences. Generative AI can support this process by helping designers quickly create visual narratives. However, existing tools only focus on accurate text-to-image generation. Currently, it is not clear how to effectively support the entire creative process of storyboarding and how to develop AI-powered tools to support designers' individual workflows. In this work, we iteratively developed and implemented StoryDiffusion, a system that integrates text-to-text and text-to-image models, to support the generation of narratives and images in a single pipeline. With a user study, we observed 12 UX designers using the system for both concept ideation and illustration tasks. Our findings identified AI-directed vs. user-directed creative strategies in both tasks and revealed the importance of supporting the interchange between narrative iteration and image generation. We also found effects of the design tasks on their strategies and preferences, providing insights for future development.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07667 [pdf, other]

VEnhancer: Generative Space-Time Enhancement for Video Generation

Authors: Jingwen He, Tianfan Xue, Dongyang Liu, Xinqi Lin, Peng Gao, Dahua Lin, Yu Qiao, Wanli Ouyang, Ziwei Liu

Abstract: We present VEnhancer, a generative space-time enhancement framework that improves the existing text-to-video results by adding more details in spatial domain and synthetic detailed motion in temporal domain. Given a generated low-quality video, our approach can increase its spatial and temporal resolution simultaneously with arbitrary up-sampling space and time scales through a unified video diffusion model. Furthermore, VEnhancer effectively removes generated spatial artifacts and temporal flickering of generated videos. To achieve this, based on a pretrained video diffusion model, we train a video ControlNet and inject it into the diffusion model as a condition on low frame-rate and low-resolution videos. To effectively train this video ControlNet, we design space-time data augmentation as well as video-aware conditioning. Benefiting from the above designs, VEnhancer is stable during training and shares an elegant end-to-end training manner. Extensive experiments show that VEnhancer surpasses existing state-of-the-art video super-resolution and space-time super-resolution methods in enhancing AI-generated videos. Moreover, with VEnhancer, the existing open-source state-of-the-art text-to-video method, VideoCrafter-2, reaches the top one in video generation benchmark -- VBench.

Submitted 10 July, 2024; originally announced July 2024.

Comments: technical report

arXiv:2407.07651 [pdf, other]

Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$

Authors: M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, et al. (645 additional authors not shown)

Abstract: The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15σ$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07501 [pdf]

Electronic Correlation and Pseudogap-like Behavior of High-Temperature Superconductor La3Ni2O7

Authors: Yidian Li, Xian Du, Yantao Cao, Cuiying Pei, Mingxin Zhang, Wenxuan Zhao, Kaiyi Zhai, Runzhe Xu, Zhongkai Liu, Zhiwei Li, Jinkui Zhao, Gang Li, Yanpeng Qi, Hanjie Guo, Yulin Chen, Lexian Yang

Abstract: High-temperature superconductivity (HTSC) remains one of the most challenging and fascinating mysteries in condensed matter physics. Recently, superconductivity with a transition temperature exceeding liquid-nitrogen temperature was discovered in La3Ni2O7 at high pressure, which provides a new platform to explore the unconventional HTSC. In this work, using high-resolution angle-resolved photoemission spectroscopy and ab-initio calculation, we systematically investigate the electronic structures of La3Ni2O7 at ambient pressure. Our experiments are in good agreement with ab-initio calculations after considering an orbital-dependent band renormalization effect. The strong electron correlation effect pushes a flat band of $d_{z^2}$ orbital component below the Fermi level ($E_F$), which is predicted to lie right at $E_F$ under high pressure. Moreover, the $d_{x^2-y^2}$ band shows a pseudogap-like behavior with suppressed spectral weight and diminished quasiparticle peak near $E_F$. Our findings provide important insights into the electronic structure of La3Ni2O7, which will shed light on the understanding of the unconventional superconductivity in nickelates.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07349 [pdf, other]

doi 10.1103/PhysRevB.110.024105

Ferromagnetic polar metals via epitaxial strain: a case study of SrCoO$_3$

Authors: Zhiwei Liu, Qiuyue Li, Hanghui Chen

Abstract: While polar metals are a metallic analogue of ferroelectrics, magnetic polar metals can be considered as a metallic analogue of multiferroics. There have been a number of attempts to integrate magnetism into a polar metal by synthesizing new materials or heterostructures. Here we use a simple yet widely used approach--epitaxial strain in the search for intrinsic magnetic polar metals. Via first-principles calculations, we study strain engineering of a ferromagnetic metallic oxide SrCoO$_3$, whose bulk form crystallizes in a cubic structure. We find that under an experimentally feasible biaxial strain on the $ab$ plane, collective Co polar displacements are stabilized in SrCoO$_3$. Specifically, a compressive strain stabilizes Co polar displacements along the $c$ axis, while a tensile strain stabilizes Co polar displacements along the diagonal line in the $ab$ plane. In both cases, we find an intrinsic ferromagnetic polar metallic state in SrCoO$_3$. In addition, we also find that a sufficiently large biaxial strain ($> 4\%$) can yield a ferromagnetic-to-antiferromagnetic transition in SrCoO$_3$. Our work demonstrates that in addition to yielding emergent multiferroics, epitaxial strain is also a viable approach to inducing magnetic polar metallic states in quantum materials.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 19 pages, 6 figures

Journal ref: Phys. Rev. B 110, 024105 (2024)

arXiv:2407.07325 [pdf, other]

HiLight: Technical Report on the Motern AI Video Language Model

Authors: Zhiting Wang, Qiangong Zhou, Kangjie Yang, Zongyang Liu, Xin Mao

Abstract: This technical report presents the implementation of a state-of-the-art video encoder for video-text modal alignment and a video conversation framework called HiLight, which features dual visual towers. The work is divided into two main parts: 1. alignment of video and text modalities; 2. a convenient and efficient way to interact with users. Our goal is to address the task of video comprehension in the context of billiards. The report includes a discussion of the concepts and the final solution developed during the task's implementation.

Submitted 11 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.07061 [pdf, other]

Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence

Authors: Weize Chen, Ziming You, Ran Li, Yitong Guan, Chen Qian, Chenyang Zhao, Cheng Yang, Ruobing Xie, Zhiyuan Liu, Maosong Sun

Abstract: The rapid advancement of large language models (LLMs) has paved the way for the development of highly capable autonomous agents. However, existing multi-agent frameworks often struggle with integrating diverse capable third-party agents due to reliance on agents defined within their own ecosystems. They also face challenges in simulating distributed environments, as most frameworks are limited to single-device setups. Furthermore, these frameworks often rely on hard-coded communication pipelines, limiting their adaptability to dynamic task requirements. Inspired by the concept of the Internet, we propose the Internet of Agents (IoA), a novel framework that addresses these limitations by providing a flexible and scalable platform for LLM-based multi-agent collaboration. IoA introduces an agent integration protocol, an instant-messaging-like architecture design, and dynamic mechanisms for agent teaming and conversation flow control. Through extensive experiments on general assistant tasks, embodied AI tasks, and retrieval-augmented generation benchmarks, we demonstrate that IoA consistently outperforms state-of-the-art baselines, showcasing its ability to facilitate effective collaboration among heterogeneous agents. IoA represents a step towards linking diverse agents in an Internet-like environment, where agents can seamlessly collaborate to achieve greater intelligence and capabilities. Our codebase has been released at https://github.com/OpenBMB/IoA.

Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

Comments: work in progress

arXiv:2407.06767 [pdf, other]

Enhancing Robustness and Security in ISAC Network Design: Leveraging Transmissive Reconfigurable Intelligent Surface with RSMA

Authors: Ziwei Liu, Wen Chen, Qingqing Wu, Zhendong Li, Xusheng Zhu, Qiong Wu, Nan Cheng

Abstract: In this paper, we propose a novel transmissive reconfigurable intelligent surface transceiver-enhanced robust and secure integrated sensing and communication network. A time-division sensing communication mechanism is designed for the scenario, which enables communication and sensing to share wireless resources. To address the interference management problem and hinder eavesdropping, we implement rate-splitting multiple access (RSMA), where the common stream is designed as a useful signal and an artificial noise, while taking into account the imperfect channel state information and modeling the channel for the illegal users in a fine-grained manner as well as giving an upper bound on the error. We introduce the secrecy outage probability and construct an optimization problem with secrecy sum-rate as the objective function to optimize the common stream beamforming matrix, the private stream beamforming matrix and the timeslot duration variable. Due to the coupling of the optimization variables and the infinity of the error set, the proposed problem is a nonconvex optimization problem that cannot be solved directly. In order to address the above challenges, the block coordinate descent-based second-order cone programming algorithm is used to decouple the optimization variables and solve the problem. Specifically, the problem is decoupled into two subproblems concerning the common stream beamforming matrix, the private stream beamforming matrix, and the timeslot duration variable, which are solved by alternating optimization until convergence is reached. To solve the problem, S-procedure, Bernstein's inequality and successive convex approximation are employed to deal with the objective function and non-convex constraints. Numerical simulation results verify the superiority of the proposed scheme in improving the secrecy energy efficiency and the Cramér-Rao boundary.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06664 [pdf, other]

PDEformer-1: A Foundation Model for One-Dimensional Partial Differential Equations

Authors: Zhanhong Ye, Xiang Huang, Leheng Chen, Zining Liu, Bingyang Wu, Hongsheng Liu, Zidong Wang, Bin Dong

Abstract: This paper introduces PDEformer-1, a versatile neural solver capable of simultaneously addressing various partial differential equations (PDEs). With the PDE represented as a computational graph, we facilitate the seamless integration of symbolic and numeric information inherent in a PDE. A graph Transformer and an implicit neural representation (INR) are employed subsequently to generate mesh-free predicted solutions. We generated a dataset with up to three million samples involving diverse one-dimensional PDEs to pretrain our model. Compared with baseline models trained specifically on benchmark datasets, our pretrained model achieves comparable accuracy via zero-shot inference, and the advantage expands after finetuning. For PDEs new or unseen in the pretraining stage, our model can adapt quickly by finetuning on a relatively small set of examples from the target equation. Additionally, PDEformer-1 demonstrates promising results in the inverse problem of PDE scalar coefficient recovery and coefficient field recovery.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06487 [pdf, other]

Unconventional Spin-Orbit Torques from Sputtered MoTe2 Films

Authors: Shuchen Li, Jonathan Gibbons, Stasiu Chyczewski, Zetai Liu, Hsu-Chih Ni, Jiangchao Qian, Jian-Min Zuo, Jun-Fei Zheng, Wenjuan Zhu, Axel Hoffmann

Abstract: Materials with strong spin-orbit coupling and low crystalline symmetry are promising for generating large unconventional spin-orbit torques (SOTs), such as in-plane field-like (FL) torques and out-of-plane damping-like (DL) torques, which can effectively manipulate and deterministically switch an out-of-plane magnetization without the need for additional external in-plane magnetic fields. Here, we report SOTs generated by magnetron-sputtered 1T' MoTe2/Permalloy (Py; Ni80Fe20)/MgO heterostructures using both spin-torque ferromagnetic resonance (ST-FMR) and second harmonic Hall measurements. We observed unconventional FL and DL torques in our samples due to spins polarized normal to the interface of MoTe2 and Py layers, and studied the influence of crystallographic order and MoTe2 layer thickness on the SOTs. By comparing the Raman spectra of 1T' MoTe2 samples prepared in different ways, we found a tensile strain in sputtered MoTe2 films, which might further enhance the generation of unconventional torques by reducing the symmetry of 1T' MoTe2.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.06249 [pdf, other]

CodeUpdateArena: Benchmarking Knowledge Editing on API Updates

Authors: Zeyu Leo Liu, Shrey Pandit, Xi Ye, Eunsol Choi, Greg Durrett

Abstract: Large language models (LLMs) are increasingly being used to synthesize and reason about source code. However, the static nature of these models' knowledge does not reflect the fact that libraries and API functions they invoke are continuously evolving, with functionality being added or changing. While numerous benchmarks evaluate how LLMs can generate code, no prior work has studied how an LLM's knowledge about code API functions can be updated. To fill this gap, we present CodeUpdateArena, a benchmark for knowledge editing in the code domain. An instance in our benchmark consists of a synthetic API function update paired with a program synthesis example that uses the updated functionality; our goal is to update an LLM to be able to solve this program synthesis example without providing documentation of the update at inference time. Compared to knowledge editing for facts encoded in text, success here is more challenging: a code LLM must correctly reason about the semantics of the modified function rather than just reproduce its syntax. Our dataset is constructed by first prompting GPT-4 to generate atomic and executable function updates. Then, for each update, we generate program synthesis examples whose code solutions are prone to use the update. Our benchmark covers updates of various types to 54 functions from seven diverse Python packages, with a total of 670 program synthesis examples. Our experiments show that prepending documentation of the update to open-source code LLMs (i.e., DeepSeek, CodeLlama) does not allow them to incorporate changes for problem solving, and existing knowledge editing techniques also have substantial room for improvement. We hope our benchmark will inspire new methods for knowledge updating in code LLMs.

Submitted 8 July, 2024; originally announced July 2024.

Comments: Under Review

arXiv:2407.06190 [pdf, other]

4D Contrastive Superflows are Dense 3D Representation Learners

Authors: Xiang Xu, Lingdong Kong, Hui Shuai, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Qingshan Liu

Abstract: In the realm of autonomous driving, accurate 3D perception is the foundation. However, developing such models relies on extensive human annotations -- a process that is both costly and labor-intensive. To address this challenge from a data representation learning perspective, we introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing spatiotemporal pretraining objectives. SuperFlow stands out by integrating two key designs: 1) a dense-to-sparse consistency regularization, which promotes insensitivity to point cloud density variations during feature learning, and 2) a flow-based contrastive learning module, carefully crafted to extract meaningful temporal cues from readily available sensor calibrations. To further boost learning efficiency, we incorporate a plug-and-play view consistency module that enhances the alignment of the knowledge distilled from camera views. Extensive comparative and ablation studies across 11 heterogeneous LiDAR datasets validate our effectiveness and superiority. Additionally, we observe several interesting emerging properties by scaling up the 2D and 3D backbones during pretraining, shedding light on the future research of 3D foundation models for LiDAR-based perception.

Submitted 9 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

Comments: ECCV 2024; 36 pages, 11 figures, 11 tables; Code at https://github.com/Xiangxu-0103/SuperFlow

arXiv:2407.06188 [pdf, other]

CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation

Authors: Xinying Guo, Mingyuan Zhang, Haozhe Xie, Chenyang Gu, Ziwei Liu

Abstract: Crowd Motion Generation is essential in entertainment industries such as animation and games as well as in strategic fields like urban simulation and planning. This new task requires an intricate integration of control and generation to realistically synthesize crowd dynamics under specific spatial and semantic constraints, whose challenges are yet to be fully explored. On the one hand, existing human motion generation models typically focus on individual behaviors, neglecting the complexities of collective behaviors. On the other hand, recent methods for multi-person motion generation depend heavily on pre-defined scenarios and are limited to a fixed, small number of inter-person interactions, thus hampering their practicality. To overcome these challenges, we introduce CrowdMoGen, a zero-shot text-driven framework that harnesses the power of a Large Language Model (LLM) to incorporate the collective intelligence into the motion generation framework as guidance, thereby enabling generalizable planning and generation of crowd motions without paired training data. Our framework consists of two key components: 1) Crowd Scene Planner that learns to coordinate motions and dynamics according to specific scene contexts or introduced perturbations, and 2) Collective Motion Generator that efficiently synthesizes the required collective motions based on the holistic plans. Extensive quantitative and qualitative experiments have validated the effectiveness of our framework, which not only fills a critical gap by providing scalable and generalizable solutions for the Crowd Motion Generation task but also achieves high levels of realism and flexibility.

Submitted 8 July, 2024; originally announced July 2024.

Comments: Project page: https://gxyes.github.io/projects/CrowdMoGen.html

arXiv:2407.06129 [pdf, other]

Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization

Authors: Hannah K. Bako, Arshnoor Bhutani, Xinyi Liu, Kwesi A. Cobbina, Zhicheng Liu

Abstract: Automatically generating data visualizations in response to human utterances on datasets necessitates a deep semantic understanding of the data utterance, including implicit and explicit references to data attributes, visualization tasks, and necessary data preparation steps. Natural Language Interfaces (NLIs) for data visualization have explored ways to infer such information, yet challenges persist due to inherent uncertainty in human speech. Recent advances in Large Language Models (LLMs) provide an avenue to address these challenges, but their ability to extract the relevant semantic information remains unexplored. In this study, we evaluate four publicly available LLMs (GPT-4, Gemini-Pro, Llama3, and Mixtral), investigating their ability to comprehend utterances even in the presence of uncertainty and identify the relevant data context and visual tasks. Our findings reveal that LLMs are sensitive to uncertainties in utterances. Despite this sensitivity, they are able to extract the relevant data context. However, LLMs struggle with inferring visualization tasks. Based on these results, we highlight future research directions on using LLMs for visualization generation.

Submitted 9 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

Comments: 5 pages, 4 figures, IEEE VIS short papers

arXiv:2407.06067 [pdf, other]

Faraday laser pumped cesium beam clock

Authors: Hangbo Shi, Xiaomin Qin, Haijun Chen, Yufei Yan, Ziqi Lu, Zhiyang Wang, Zijie Liu, Xiaolei Guan, Qiang Wei, Tiantian Shi, Jingbiao Chen

Abstract: We realize a high-performance compact optically pumped cesium beam clock using a Faraday laser simultaneously as the pumping and detection laser. The Faraday laser, which is frequency stabilized by the modulation transfer spectroscopy (MTS) technique, has narrow linewidth and superior frequency stability. Measured by the optical heterodyne method between two identical systems, the linewidth of the Faraday laser is 2.5 kHz after MTS locking, and the fractional frequency stability of the Faraday laser is optimized to $1.8\times{10}^{-12}/\sqrt{τ}$. Based on this high-performance Faraday laser, the cesium beam clock realizes a signal-to-noise ratio (SNR) in 1 Hz bandwidth of $39600$ when the cesium oven temperature is 130°C. Frequency-compared with a hydrogen maser, the fractional frequency stability of the Faraday laser pumped cesium beam clock can reach $1.3\times{10}^{-12}/\sqrt{τ}$ and drops to $1.4\times{10}^{-14}$ at 10000 s when the cesium oven temperature is 110°C. This Faraday laser pumped cesium beam clock demonstrates its excellent performance, and its great potential in the fields of timekeeping, navigation, and communication. Meanwhile, the Faraday laser, as a high-performance optical frequency standard, can also contribute to the development of other applications in quantum metrology, precision measurement and atomic physics.

Submitted 11 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.06020 [pdf, other]

Electron-only reconnection and inverse magnetic-energy transfer at sub-ion scales

Authors: Zhuo Liu, Caio Silva, Lucio M. Milanese, Muni Zhou, Noah R. Mandell, Nuno F. Loureiro

Abstract: We derive, and validate numerically, an analytical model for electron-only magnetic reconnection applicable to strongly magnetized (low-beta) plasmas. Our model predicts sub-ion-scale reconnection rates significantly higher than those pertaining to large-scale reconnection, aligning with recent observations and simulations. We apply this reconnection model to the problem of inverse magnetic-energy transfer at sub-ion scales. We derive time-dependent scaling laws for the magnetic energy decay and the typical magnetic structure dimensions that differ from those previously found in the MHD regime. These scaling laws are validated via two- and three-dimensional simulations, demonstrating that sub-ion scale magnetic fields can reach large, system-size scales via successive coalescence.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05758 [pdf, other]

Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports

Authors: Yutong Zhang, Yi Pan, Tianyang Zhong, Peixin Dong, Kangni Xie, Yuxiao Liu, Hanqi Jiang, Zhengliang Liu, Shijie Zhao, Tuo Zhang, Xi Jiang, Dinggang Shen, Tianming Liu, Xin Zhang

Abstract: Medical images and radiology reports are crucial for diagnosing medical conditions, highlighting the importance of quantitative analysis for clinical decision-making. However, the diversity and cross-source heterogeneity of these data challenge the generalizability of current data-mining methods. Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence (AGI) for computer vision, showcasing their potential in the biomedical domain. In this study, we evaluated the performance of the Gemini, GPT-4, and 4 popular large models for an exhaustive evaluation across 14 medical imaging datasets, including 5 medical imaging categories (dermatology, radiology, dentistry, ophthalmology, and endoscopy), and 3 radiology report datasets. The investigated tasks encompass disease classification, lesion segmentation, anatomical localization, disease diagnosis, report generation, and lesion detection. Our experimental results demonstrated that Gemini-series models excelled in report generation and lesion detection but faced challenges in disease classification and anatomical localization. Conversely, GPT-series models exhibited proficiency in lesion segmentation and anatomical localization but encountered difficulties in disease diagnosis and lesion detection. Additionally, both the Gemini series and GPT series contain models that have demonstrated commendable generation efficiency. While both models hold promise in reducing physician workload, alleviating pressure on limited healthcare resources, and fostering collaboration between clinical practitioners and artificial intelligence technologies, substantial enhancements and comprehensive validations remain imperative before clinical deployment.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05681 [pdf]

Bulk high-temperature superconductivity in the high-pressure tetragonal phase of bilayer La2PrNi2O7

Authors: Ningning Wang, Gang Wang, Xiaoling Shen, Jun Hou, Jun Luo, Xiaoping Ma, Huaixin Yang, Lifen Shi, Jie Dou, Jie Feng, Jie Yang, Yunqing Shi, Zhian Ren, Hanming Ma, Pengtao Yang, Ziyi Liu, Yue Liu, Hua Zhang, Xiaoli Dong, Yuxin Wang, Kun Jiang, Jiangping Hu, Stuart Calder, Jiaqiang Yan, Jianping Sun , et al. (4 additional authors not shown)

Abstract: The Ruddlesden-Popper (R-P) bilayer nickelate, La3Ni2O7, was recently found to show signatures of high-temperature superconductivity (HTSC) at pressures above 14 GPa. Subsequent investigations achieved zero resistance in single- and poly-crystalline samples under hydrostatic pressure conditions. Yet, obvious diamagnetic signals, the other hallmark of superconductors, are still lacking owing to the filamentary nature with low superconducting volume fraction. The presence of a novel "1313" polymorph and competing R-P phases obscured proper identification of the phase for HTSC. Thus, achieving bulk HTSC and identifying the phase at play are the most prominent tasks at present. Here, we address these issues in the praseodymium (Pr)-doped La2PrNi2O7 polycrystalline samples. We find that the substitution of Pr for La effectively inhibits the intergrowth of different R-P phases, resulting in a nearly pure bilayer structure. For La2PrNi2O7, a pressure-induced orthorhombic-to-tetragonal structural transition takes place at $P_c$ ~ 11 GPa, above which HTSC emerges gradually upon further compression. The superconducting transition temperatures at 18-20 GPa reach $T_c^{\mathrm{onset}}$ = 82.5 K and $T_c^{\mathrm{zero}}$ = 60 K, which are the highest values among known nickelate superconductors. More importantly, bulk HTSC was verified by detecting clear diamagnetic signals below ~75 K corresponding to an estimated superconducting volume fraction ~ 57(5)% at 20 GPa. Our results not only resolve the existing controversies but also illuminate directions for exploring bulk HTSC in the bilayer nickelates.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05671 [pdf, other]

MSTF: Multiscale Transformer for Incomplete Trajectory Prediction

Authors: Zhanwen Liu, Chao Li, Nan Yang, Yang Wang, Jiaqi Ma, Guangliang Cheng, Xiangmo Zhao

Abstract: Motion forecasting plays a pivotal role in autonomous driving systems, enabling vehicles to execute collision warnings and rational local-path planning based on predictions of the surrounding vehicles. However, prevalent methods often assume complete observed trajectories, neglecting the potential impact of missing values induced by object occlusion, scope limitation, and sensor failures. Such oversights inevitably compromise the accuracy of trajectory predictions. To tackle this challenge, we propose an end-to-end framework, termed Multiscale Transformer (MSTF), meticulously crafted for incomplete trajectory prediction. MSTF integrates a Multiscale Attention Head (MAH) and an Information Increment-based Pattern Adaptive (IIPA) module. Specifically, the MAH component concurrently captures multiscale motion representation of trajectory sequence from various temporal granularities, utilizing a multi-head attention mechanism. This approach facilitates the modeling of global dependencies in motion across different scales, thereby mitigating the adverse effects of missing values. Additionally, the IIPA module adaptively extracts continuity representation of motion across time steps by analyzing missing patterns in the data. The continuity representation delineates motion trend at a higher level, guiding MSTF to generate predictions consistent with motion continuity. We evaluate our proposed MSTF model using two large-scale real-world datasets. Experimental results demonstrate that MSTF surpasses state-of-the-art (SOTA) models in the task of incomplete trajectory prediction, showcasing its efficacy in addressing the challenges posed by missing values in motion forecasting for autonomous driving systems.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05467 [pdf, other]

The infrastructure powering IBM's Gen AI model development

Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla, Lan Hoang, Danny Barnett, I-Hsin Chung, Apoorve Mohan, Ming-Hung Chen, Lixiang Luo, Robert Walkup, Constantinos Evangelinos, Shweta Salaria, Marc Dombrowa, Yoonho Park, Apo Kayi, Liran Schour, Alim Alim, Ali Sydney, Pavlos Maniotis, Laurent Schares, Bernard Metzler, Bengi Karacali-Akyamac, Sophia Wen, Tatsuhiro Chiba, et al. (121 additional authors not shown)

Abstract: AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering efficient and high-performing AI training requires an end-to-end solution that combines hardware, software and holistic telemetry to cater for multiple types of AI workloads. In this report, we describe IBM's hybrid cloud infrastructure that powers our generative AI model development. This infrastructure includes (1) Vela: an AI-optimized supercomputing capability directly integrated into the IBM Cloud, delivering scalable, dynamic, multi-tenant and geographically distributed infrastructure for large-scale model training and other AI workflow steps and (2) Blue Vela: a large-scale, purpose-built, on-premises hosting environment that is optimized to support our largest and most ambitious AI model training tasks. Vela provides IBM with the dual benefit of high performance for internal use along with the flexibility to adapt to an evolving commercial landscape. Blue Vela provides us with the benefits of rapid development of our largest and most ambitious models, as well as future-proofing against the evolving model landscape in the industry. Taken together, they provide IBM with the ability to rapidly innovate in the development of both AI models and commercial offerings.

Submitted 7 July, 2024; originally announced July 2024.

Comments: Corresponding Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla

arXiv:2407.05382 [pdf, other]

Rethinking Unsupervised Outlier Detection via Multiple Thresholding

Authors: Zhonghang Liu, Panzhong Lu, Guoyang Xie, Zhichao Lu, Wen-Yan Lin

Abstract: In the realm of unsupervised image outlier detection, assigning outlier scores holds greater significance than its subsequent task: thresholding for predicting labels. This is because determining the optimal threshold on non-separable outlier score functions is an ill-posed problem. However, the lack of predicted labels not only hinders some real applications of current outlier detectors but also causes these methods not to be enhanced by leveraging the dataset's self-supervision. To advance existing scoring methods, we propose a multiple thresholding (Multi-T) module. It generates two thresholds that isolate inliers and outliers from the unlabelled target dataset, whereas outliers are employed to obtain better feature representation while inliers provide an uncontaminated manifold. Extensive experiments verify that Multi-T can significantly improve proposed outlier scoring methods. Moreover, Multi-T contributes to a naive distance-based method being state-of-the-art.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.05334 [pdf, other]

Pion photo-production of nucleon excited states with Hamiltonian effective field theory

Authors: Yu Zhuge, Zhan-Wei Liu, Derek B. Leinweber, Anthony W. Thomas

Abstract: We refine our previous calculation of multipole amplitude $E_{0+}$ for pion photo-production process, $γN\rightarrowπN$. The treatment of final state interactions is based upon an earlier analysis of pion-nucleon scattering within Hamiltonian effective field theory, supplemented by incorporating contributions from the $N^*(1650)$ and the $KΛ$ coupled channel. The contribution from the bare state corresponding to the $N^*(1650)$ significantly enhances our results. Additionally, we also compute the multipole amplitude $M_{1-}$, which is of direct relevance to the Roper resonance. The results are comparable with other dynamical coupled channel models, even though the contribution from the bare state (interpreted as a 2s excitation) in this channel is small because of its large mass.

Submitted 7 July, 2024; originally announced July 2024.

Comments: 12 pages, 7 figures, 3 tables

Report number: ADP-24-11/T1250

arXiv:2407.05117 [pdf, ps, other]

Search for the baryon number and lepton number violating decays $τ^-\to Λπ^-$ and $τ^-\to \barΛπ^-$ at Belle II

Authors: Belle II Collaboration, I. Adachi, L. Aggarwal, H. Ahmed, H. Aihara, N. Akopov, A. Aloisio, N. Althubiti, N. Anh Ky, D. M. Asner, H. Atmacan, T. Aushev, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, S. Bahinipati, P. Bambade, Sw. Banerjee, S. Bansal, M. Barrett, J. Baudot, A. Baur, A. Beaubien, et al. (349 additional authors not shown)

Abstract: We present a search for the baryon number $B$ and lepton number $L$ violating decays $τ^- \rightarrow Λπ^-$ and $τ^- \rightarrow \barΛ π^-$ produced from the $e^+e^-\to τ^+τ^-$ process, using a 364 fb$^{-1}$ data sample collected by the Belle II experiment at the SuperKEKB collider. No evidence of signal is found in either decay mode, which have $|Δ(B-L)|$ equal to $2$ and $0$, respectively. Upper limits at 90% credibility level on the branching fractions of $τ^- \rightarrow Λπ^-$ and $τ^- \rightarrow \barΛπ^-$ are determined to be $4.7 \times 10^{-8}$ and $4.3 \times 10^{-8}$, respectively.

Submitted 6 July, 2024; originally announced July 2024.

Comments: 8 pages, 4 figures

Report number: Belle II Preprint 2024-020; KEK Preprint 2024-17

arXiv:2407.04894 [pdf, other]

An embedding-aware continuum thin shell formulation

Authors: Abhishek Ghosh, Andrew McBride, Zhaowei Liu, Luca Heltai, Paul Steinmann, Prashant Saxena

Abstract: Cutting-edge smart materials are transforming the domains of soft robotics, actuators, and sensors by harnessing diverse non-mechanical stimuli, such as electric and magnetic fields. Accurately modelling their physical behaviour necessitates an understanding of the complex interactions between the structural deformation and the fields in the surrounding medium. For thin shell structures, this challenge is addressed by developing a shell model that effectively incorporates the three-dimensional field it is embedded in by appropriately accounting for the relevant boundary conditions. This study presents a model for the nonlinear deformation of thin hyperelastic shells, incorporating Kirchhoff-Love assumptions and a rigorous variational approach. The shell theory is derived from 3D nonlinear elasticity by dimension reduction while preserving the boundary conditions at the top and bottom surfaces of the shell. Consequently, unlike classical shell theories, this approach can distinguish between pressure loads applied at the top and bottom surfaces, and delivers a platform to include multi-physics coupling. Numerical examples are presented to illustrate the theory and provide a physical interpretation of the novel mechanical variables of the model.

Submitted 5 July, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2308.12300

arXiv:2407.04794 [pdf, other]

On Evaluating The Performance of Watermarked Machine-Generated Texts Under Adversarial Attacks

Authors: Zesen Liu, Tianshuo Cong, Xinlei He, Qi Li

Abstract: Large Language Models (LLMs) excel in various applications, including text generation and complex tasks. However, the misuse of LLMs raises concerns about the authenticity and ethical implications of the content they produce, such as deepfake news, academic fraud, and copyright infringement. Watermarking techniques, which embed identifiable markers in machine-generated text, offer a promising solution to these issues by allowing for content verification and origin tracing. Unfortunately, the robustness of current LLM watermarking schemes under potential watermark removal attacks has not been comprehensively explored. In this paper, to fill this gap, we first systematically comb the mainstream watermarking schemes and removal attacks on machine-generated texts, and then we categorize them into pre-text (before text generation) and post-text (after text generation) classes so that we can conduct diversified analyses. In our experiments, we evaluate eight watermarks (five pre-text, three post-text) and twelve attacks (two pre-text, ten post-text) across 87 scenarios. Evaluation results indicate that (1) KGW and Exponential watermarks offer high text quality and watermark retention but remain vulnerable to most attacks; (2) Post-text attacks are found to be more efficient and practical than pre-text attacks; (3) Pre-text watermarks are generally more imperceptible, as they do not alter text fluency, unlike post-text watermarks; (4) Additionally, combined attack methods can significantly increase effectiveness, highlighting the need for more robust watermarking solutions. Our study underscores the vulnerabilities of current techniques and the necessity for developing more resilient schemes.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.04416 [pdf, other]

Improving Audio Generation with Visual Enhanced Caption

Authors: Yi Yuan, Dongya Jia, Xiaobin Zhuang, Yuanzhe Chen, Zhengxi Liu, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xubo Liu, Mark D. Plumbley, Wenwu Wang

Abstract: Generative models have shown significant achievements in audio generation tasks. However, existing models struggle with complex and detailed prompts, leading to potential performance degradation. We hypothesize that this problem stems from the low quality and relatively small quantity of training data. In this work, we aim to create a large-scale audio dataset with rich captions for improving audio generation models. We develop an automated pipeline to generate detailed captions for audio-visual datasets by transforming predicted visual captions, audio captions, and tagging labels into comprehensive descriptions using a Large Language Model (LLM). We introduce Sound-VECaps, a dataset comprising 1.66M high-quality audio-caption pairs with enriched details including audio event orders, occurred places and environment information. We demonstrate that training with Sound-VECaps significantly enhances the capability of text-to-audio generation models to comprehend and generate audio from complex input prompts, improving overall system performance. Furthermore, we conduct ablation studies of Sound-VECaps across several audio-language tasks, suggesting its potential in advancing audio-text representation learning. Our dataset and models are available online.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 5 pages with 1 appendix

arXiv:2407.03961 [pdf, other]

Leveraging Latent Diffusion Models for Training-Free In-Distribution Data Augmentation for Surface Defect Detection

Authors: Federico Girella, Ziyue Liu, Franco Fummi, Francesco Setti, Marco Cristani, Luigi Capogrosso

Abstract: Defect detection is the task of identifying defects in production samples. Usually, defect detection classifiers are trained on ground-truth data formed by normal samples (negative data) and samples with defects (positive data), where the latter are consistently fewer than normal samples. State-of-the-art data augmentation procedures add synthetic defect data by superimposing artifacts to normal samples to mitigate problems related to unbalanced training data. These techniques often produce out-of-distribution images, resulting in systems that learn what is not a normal sample but cannot accurately identify what a defect looks like. In this work, we introduce DIAG, a training-free Diffusion-based In-distribution Anomaly Generation pipeline for data augmentation. Unlike conventional image generation techniques, we implement a human-in-the-loop pipeline, where domain experts provide multimodal guidance to the model through text descriptions and region localization of the possible anomalies. This strategic shift enhances the interpretability of results and fosters a more robust human feedback loop, facilitating iterative improvements of the generated outputs. Remarkably, our approach operates in a zero-shot manner, avoiding time-consuming fine-tuning procedures while achieving superior performance. We demonstrate the efficacy and versatility of DIAG with respect to state-of-the-art data augmentation approaches on the challenging KSDD2 dataset, with an improvement in AP of approximately 18% when positive samples are available and 28% when they are missing. The source code is available at https://github.com/intelligolabs/DIAG.

Submitted 11 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

Comments: Accepted at the 21st International Conference on Content-Based Multimedia Indexing (CBMI 2024)

arXiv:2407.03868 [pdf]

Observation of exceptional line semimetal in three-dimensional non-Hermitian phononic crystals

Authors: Yejian Hu, Jien Wu, Peidong Ye, Weiyin Deng, Jiuyang Lu, Xueqin Huang, Ziyu Wang, Manzhu Ke, Zhengyou Liu

Abstract: Non-Hermitian topological phases, which exhibit unique features such as skin effect and exceptional points originating from nontrivial band topologies in complex plane, have attracted enormous attention in condensed-matter physics and metamaterials. Here we report the realization of an exceptional line semimetal in a three-dimensional non-Hermitian phononic crystal. A pair of exceptional rings with opposite topologies are connected by the drumhead bulk states in the first Brillouin zone. The exceptional rings not only possess wave-function topology and thus result in the drumhead surface states, but also host spectral topology and thereby give rise to the hybrid-order geometry-dependent skin effect in three dimensions. Our experimental results evidence the complete non-Hermitian bulk-boundary correspondence of the three-dimensional exceptional line semimetal, and may pave the way for designing non-Hermitian acoustic devices.

Submitted 4 July, 2024; originally announced July 2024.

Comments: 5 figures

arXiv:2407.03816 [pdf]

Compact ultra-broadband light coupling on chip via nonadiabatic pumping

Authors: Weiwei Liu, Chijun Li, Bing Wang, Tianyan Chai, Lingzhi Zheng, Zhuoxiong Liu, Haoru Zhang, Shuaifei Ren, Xiaohong Li, Cheng Zeng, Jinsong Xia, Peixiang Lu

Abstract: Enlarging bandwidth capacity of the integrated photonic systems demands efficient and broadband light coupling among optical elements, which has been a vital issue in integrated photonics. Here, we have developed a compact ultra-broadband light coupling strategy based on nonadiabatic pumping in coupled optical waveguides, and experimentally demonstrated the designs in thin-film lithium niobate on insulator (LNOI) platform. We found that nonadiabatic transition would produce a decreased dispersion of the phases related to eigenstates in the waveguides. As a consequence, we realized high-efficiency directional transfer between edgestates for various wavelengths covering a 1-dB bandwidth of ~320 nm in experiment (>400 nm in simulation), with a coupling length (~50 μm) approximately 1/10 of that required in the adiabatic regime. Furthermore, we have constructed complex functional devices including beamsplitter and multiple-level cascaded networks for broadband light routing and splitting. Our work preserves significant advantages simultaneously in extending the operation bandwidth and minimizing the footprint, which demonstrates great potential for large-scale and compact photonic integration on chip.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.03699 [pdf, other]

Generalized Robust Fundus Photography-based Vision Loss Estimation for High Myopia

Authors: Zipei Yan, Zhile Liang, Zhengji Liu, Shuai Wang, Rachel Ka-Man Chun, Jizhou Li, Chea-su Kee, Dong Liang

Abstract: High myopia significantly increases the risk of irreversible vision loss. Traditional perimetry-based visual field (VF) assessment provides systematic quantification of visual loss but it is subjective and time-consuming. Consequently, machine learning models utilizing fundus photographs to estimate VF have emerged as promising alternatives. However, due to the high variability and the limited availability of VF data, existing VF estimation models fail to generalize well, particularly when facing out-of-distribution data across diverse centers and populations. To tackle this challenge, we propose a novel, parameter-efficient framework to enhance the generalized robustness of VF estimation on both in- and out-of-distribution data. Specifically, we design a Refinement-by-Denoising (RED) module for feature refinement and adaptation from pretrained vision models, aiming to learn high-entropy feature representations and to mitigate the domain gap effectively and efficiently. Through independent validation on two distinct real-world datasets from separate centers, our method significantly outperforms existing approaches in RMSE, MAE and correlation coefficient for both internal and external validation. Our proposed framework benefits both in- and out-of-distribution VF estimation, offering significant clinical implications and potential utility in real-world ophthalmic practices.

Submitted 4 July, 2024; originally announced July 2024.

Comments: Accepted by MICCAI 2024, code: https://github.com/yanzipei/VF_RED

arXiv:2407.03697 [pdf, other]

Charm physics with overlap fermions on 2+1-flavor domain wall fermion configurations

Authors: Donghao Li, Ying Chen, Ming Gong, Keh-Fei Liu, Zhaofeng Liu, Tingxiao Wang

Abstract: Decay constants of pseudoscalar mesons $D$, $D_s$, $η_c$ and vector mesons $D^*$, $D_s^*$, $J/ψ$ are determined from $N_f=2+1$ lattice QCD at a lattice spacing $a\sim0.08$ fm. For vector mesons, the decay constants defined by tensor currents are given in the $\overline{\rm MS}$ scheme at $2$ GeV. The calculation is performed on domain wall fermion configurations generated by the RBC-UKQCD Collaborations and the overlap fermion action is used for the valence quarks. Comparing the current results with our previous ones at a coarser lattice spacing $a\sim0.11$ fm gives us a better understanding of the discretization error. We obtain $f_{D_s^*}^T(\overline{\rm MS},\text{2 GeV})/f_{D_s^*}=0.907(20)$ with a better precision than our previous result. Combining our $f_{D_s^*}=277(11)$ MeV with the total width of $D_s^*$ determined in a recent work gives a branching fraction $4.26(52)\times10^{-5}$ for $D_s^*$ leptonic decay.

Submitted 4 July, 2024; originally announced July 2024.

Comments: 18 pages, 7 figures, 12 tables

arXiv:2407.03541 [pdf]

Parallel fast random bit generation based on spectrotemporally uncorrelated Brillouin random fiber lasing oscillation

Authors: Yuxi Pang, Shaonian Ma, Qiang Ji, Xian Zhao, Zengguang Qin, Zhaojun Liu, Ping Lu, Xiaoyi Bao, Yanping Xu

Abstract: Correlations existing between spectral components in multi-wavelength lasers have been the key challenge that hinders these laser sources from being developed to chaotic comb entropy sources for parallel random bit generation. Herein, spectrotemporally uncorrelated multi-order Stokes/anti-Stokes emissions are achieved by cooperatively exploiting nonlinear optical processes including cascaded stimulated Brillouin scattering and quasi-phase-matched four-wave mixing in a Brillouin random fiber laser. Chaotic instabilities induced by random mode resonance are enhanced and disorderly redistributed among different lasing lines through complex nonlinear optical interactions, which comprehensively releases the inherent correlation among multiple Stokes/anti-Stokes emission lines, realizing a chaotic frequency comb with multiple spectrotemporally uncorrelated channels. Parallel fast random bit generation is fulfilled with 31 channels, single-channel bit rate of 35-Gbps and total bit rate of 1.085-Tbps. National Institute of Standards and Technology statistical tests verify the randomness of generated bit streams. This work, in a simple and efficient way, breaks the correlation barrier for utilizing multi-wavelength laser to achieve high-quality spectrotemporally uncorrelated chaotic laser source, opening new avenues for achieving greatly accelerated random bit generation through parallelization and potentially revolutionizing the current architecture of secure communication and high-performance computation.

Submitted 3 July, 2024; originally announced July 2024.
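
The aggregate bit rate quoted in the abstract above follows directly from the channel count and the per-channel rate. The short check below only reproduces that arithmetic; the variable names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract above; names are illustrative only.
channels = 31
rate_per_channel_gbps = 35

# Aggregate rate when every channel runs in parallel.
total_gbps = channels * rate_per_channel_gbps
total_tbps = total_gbps / 1000  # 1 Tbps = 1000 Gbps

print(f"{total_gbps} Gbps = {total_tbps:.3f} Tbps")  # prints "1085 Gbps = 1.085 Tbps"
```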

arXiv:2407.03388 [pdf]

Passenger Route and Departure Time Guidance under Disruptions in Oversaturated Urban Rail Transit Networks

Authors: Siyu Zhuo, Xiaoning Zhu, Pan Shang, Zhengke Liu

Abstract: The urban rail transit (URT) system attracts many commuters with its punctuality and convenience. However, it is vulnerable to disruptions caused by factors like extreme weather and temporary equipment failures, which greatly impact passengers' journeys and diminish the system's service quality. In this study, we propose targeted travel guidance for passengers at different space-time locations by devising passenger rescheduling strategies during disruptions. This guidance not only offers insights into route changes but also provides practical recommendations for delaying departure times when required. We present a novel three-feature four-group passenger classification principle, integrating temporal, spatial, and spatio-temporal features to classify passengers in disrupted URT networks. This approach results in the creation of four distinct solution spaces based on passenger groups. A mixed integer programming model is built based on individual level considering the First-in-First-out (FIFO) rule in oversaturated networks. Additionally, we present a two-stage solution approach for handling the complex issues in large-scale networks. Experimental results from both small-scale artificial networks and the real-world Beijing URT network validate the efficacy of our proposed passenger rescheduling strategies in mitigating disruptions. Specifically, when compared to scenarios with no travel guidance during disruptions, our strategies achieve a substantial reduction in total passenger travel time by 29.7% and 50.9% respectively, underscoring the effectiveness in managing unexpected disruptions.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03195 [pdf, other]

Incremental Gauss--Newton Methods with Superlinear Convergence Rates

Authors: Zhiling Zhou, Zhuanghua Liu, Chengchang Liu, Luo Luo

Abstract: This paper addresses the challenge of solving large-scale nonlinear equations with Hölder continuous Jacobians. We introduce a novel Incremental Gauss--Newton (IGN) method with explicit superlinear convergence rate, which outperforms existing methods that only achieve linear convergence rate. In particular, we formulate our problem by the nonlinear least squares with finite-sum structure, and our method incrementally iterates with the information of one component in each round. We also provide a mini-batch extension to our IGN method that obtains an even faster superlinear convergence rate. Furthermore, we conduct numerical experiments to show the advantages of the proposed methods.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 37 pages, 9 figures
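
For readers unfamiliar with the baseline that the entry above builds on, here is a minimal sketch of the classical (full-batch) Gauss-Newton iteration for a nonlinear least-squares problem with finite-sum structure. It is not the paper's incremental IGN method, which uses one component per round. The three residuals, the starting point, and all function names are assumptions chosen for illustration; the system has the exact root x0 = x1 = sqrt(2).

```python
import math

def residuals(x):
    # Hypothetical system of three equations in two unknowns (illustration only).
    x0, x1 = x
    return [x0 * x0 + x1 * x1 - 4.0, x0 - x1, x0 * x1 - 2.0]

def jacobian(x):
    # Rows are the gradients of the residuals above.
    x0, x1 = x
    return [[2.0 * x0, 2.0 * x1], [1.0, -1.0], [x1, x0]]

def gauss_newton(x, iters=20, tol=1e-12):
    """Classical Gauss-Newton: solve (J^T J) d = -J^T r, then update x += d."""
    for _ in range(iters):
        r = residuals(x)
        J = jacobian(x)
        # Entries of the 2x2 matrix J^T J and of the vector J^T r.
        a = sum(row[0] * row[0] for row in J)
        b = sum(row[0] * row[1] for row in J)
        c = sum(row[1] * row[1] for row in J)
        g0 = sum(row[0] * ri for row, ri in zip(J, r))
        g1 = sum(row[1] * ri for row, ri in zip(J, r))
        # Closed-form solve of the 2x2 normal equations.
        det = a * c - b * b
        d0 = (-g0 * c + g1 * b) / det
        d1 = (-g1 * a + g0 * b) / det
        x = (x[0] + d0, x[1] + d1)
        if math.hypot(d0, d1) < tol:
            break
    return x

solution = gauss_newton((1.0, 2.0))
print(solution)
```

Each step here evaluates all residuals and the full Jacobian; the incremental variant described in the abstract instead refreshes the information of a single component per round.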

arXiv:2407.03037 [pdf, other]

Vision-driven Automated Mobile GUI Testing via Multimodal Large Language Model

Authors: Zhe Liu, Cheng Li, Chunyang Chen, Junjie Wang, Boyu Wu, Yawen Wang, Jun Hu, Qing Wang

Abstract: With the advancement of software rendering techniques, GUI pages in mobile apps now encompass a wealth of visual information, where the visual semantics of each page contribute to the overall app logic, presenting new challenges to software testing. Despite the progress in automated Graphical User Interface (GUI) testing, the absence of testing oracles has constrained its efficacy to identify only crash bugs with evident abnormal signals. Nonetheless, there are still a considerable number of non-crash bugs, ranging from unexpected behaviors to misalignments, often evading detection by existing techniques. While these bugs can exhibit visual cues that serve as potential testing oracles, they often entail a sequence of screenshots, and detecting them necessitates an understanding of the operational logic among GUI page transitions, which is challenging for traditional techniques. Considering the remarkable performance of Multimodal Large Language Models (MLLM) in visual and language understanding, this paper proposes a vision-driven automated GUI testing approach VisionDroid to detect non-crash functional bugs with MLLM. It begins by extracting GUI text information and aligning it with screenshots to form a vision prompt, enabling MLLM to understand GUI context. The function-aware explorer then employs MLLM for deeper and function-oriented GUI page exploration, while the logic-aware bug detector segments the entire exploration history into logically cohesive parts and prompts the MLLM for bug detection. We evaluate VisionDroid on three datasets and compare it with 10 baselines, demonstrating its excellent performance. The ablation study further proves the contribution of each module. Moreover, VisionDroid identifies 29 new bugs on Google Play, of which 19 have been confirmed and fixed.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02899 [pdf, other]

Measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (639 additional authors not shown)

Abstract: A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the BESIII detector at the BEPCII storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be $\mathcal{B}(J/ψ\to p \bar{p} η(η\to γγ)) = (1.480 \pm 0.001 \pm 0.024)\times\,10^{-3}$ and $\mathcal{B}(J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)) = (1.557 \pm 0.003 \pm 0.038)\times\,10^{-3}$, where the first uncertainties are statistical and the second systematic. Both results are compatible within their uncorrelated systematic uncertainties. The combined result is $\mathcal{B}(J/ψ\to p \bar{p} η)=(1.495 \pm 0.001 \pm 0.023)\times\,10^{-3}$ where the first uncertainty is the combined statistical uncertainty and the second one the combined systematic uncertainty of both analyses, incorporating correlations between them. In addition, the $p \bar{p}$ threshold region is investigated for a potential threshold enhancement, and no evidence for one is observed.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02824 [pdf, other]

Exploring the Capabilities of LLMs for Code Change Related Tasks

Authors: Lishui Fan, Jiakun Liu, Zhongxin Liu, David Lo, Xin Xia, Shanping Li

Abstract: Developers deal with code-change-related tasks daily, e.g., reviewing code. Pre-trained code and code-change-oriented models have been adapted to help developers with such tasks. Recently, large language models (LLMs) have shown their effectiveness in code-related tasks. However, existing LLMs for code focus on general code syntax and semantics rather than the differences between two code versions. Thus, it is an open question how LLMs perform on code-change-related tasks. To answer this question, we conduct an empirical study using LLMs with more than 1B parameters on three code-change-related tasks, i.e., code review generation, commit message generation, and just-in-time comment update, with in-context learning (ICL) and parameter-efficient fine-tuning (PEFT, including LoRA and prefix-tuning). We observe that the performance of LLMs is poor without examples and generally improves with examples, but more examples do not always lead to better performance. LLMs tuned with LoRA have comparable performance to the state-of-the-art small pre-trained models. Larger models are not always better, but the Llama 2 and Code Llama families are always the best. The best LLMs outperform small pre-trained models on the code changes that only modify comments and perform comparably on other code changes. We suggest future work should focus more on guiding LLMs to learn the knowledge specific to the changes related to code rather than comments for code-change-related tasks.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02779 [pdf, other]

Croppable Knowledge Graph Embedding

Authors: Yushan Zhu, Wen Zhang, Zhiqiang Liu, Mingyang Chen, Lei Liang, Huajun Chen

Abstract: Knowledge Graph Embedding (KGE) is a common method for Knowledge Graphs (KGs) to serve various artificial intelligence tasks. The suitable dimensions of the embeddings depend on the storage and computing conditions of the specific application scenarios. Once a new dimension is required, a new KGE model needs to be trained from scratch, which greatly increases the training cost and limits the efficiency and flexibility of KGE in serving various scenarios. In this work, we propose a novel KGE training framework, MED, through which we can train once to get a croppable KGE model applicable to multiple scenarios with different dimensional requirements: sub-models of the required dimensions can be cropped out of it and used directly without any additional training. In MED, we propose a mutual learning mechanism to improve the low-dimensional sub-models' performance and make the high-dimensional sub-models retain the capacity that low-dimensional sub-models have, an evolutionary improvement mechanism to promote the high-dimensional sub-models to master the knowledge that the low-dimensional sub-models cannot learn, and a dynamic loss weight to balance the multiple losses adaptively. Experiments on 3 KGE models over 4 standard KG completion datasets, 3 real application scenarios over a real-world large-scale KG, and the experiments of extending MED to the language model BERT show the effectiveness, high efficiency, and flexible extensibility of MED.

Submitted 2 July, 2024; originally announced July 2024.

Showing 1–50 of 8,001 results for author: Liu, Z