subscribe to arXiv mailings

arXiv:2406.17201 [pdf, ps, other]

Qualitative analysis on a spatial SIS epidemic model with linear source in advective environments II saturated incidence

Authors: Qi Wang

Abstract: In this paper, we consider a reaction-diffusion-advection SIS epidemic model with saturated incidence rate and linear source. We study the uniform bounds of parabolic system and some asymptotic behavior of the basic reproduction number $\mathcal{R}_0$, according to which the threshold-type dynamics are given. Furthermore, we show the globally asymptotic stability of the endemic equilibrium in a sp… ▽ More In this paper, we consider a reaction-diffusion-advection SIS epidemic model with saturated incidence rate and linear source. We study the uniform bounds of parabolic system and some asymptotic behavior of the basic reproduction number $\mathcal{R}_0$, according to which the threshold-type dynamics are given. Furthermore, we show the globally asymptotic stability of the endemic equilibrium in a special case with a large saturated incidence rate. Meanwhile, we investigate the effects of diffusion, advection, saturation and linear source on the asymptotic profiles of the endemic equilibrium. Our study shows that advection may induce the concentration phenomenon and the disease will be eradicated with small dispersal rate of infected individuals. Moreover, our results also reveal that the linear source can enhance persistence of infectious disease and large saturation can help speed up the elimination of disease. △ Less

Submitted 24 June, 2024; originally announced June 2024.

MSC Class: 35K57; 35J57; 35B40; 92D25; 92D30

arXiv:2406.16360 [pdf, other]

MIRReS: Multi-bounce Inverse Rendering using Reservoir Sampling

Authors: Yuxin Dai, Qi Wang, Jingsen Zhu, Dianbing Xi, Yuchi Huo, Chen Qian, Ying He

Abstract: We present MIRReS, a novel two-stage inverse rendering framework that jointly reconstructs and optimizes the explicit geometry, material, and lighting from multi-view images. Unlike previous methods that rely on implicit irradiance fields or simplified path tracing algorithms, our method extracts an explicit geometry (triangular mesh) in stage one, and introduces a more realistic physically-based… ▽ More We present MIRReS, a novel two-stage inverse rendering framework that jointly reconstructs and optimizes the explicit geometry, material, and lighting from multi-view images. Unlike previous methods that rely on implicit irradiance fields or simplified path tracing algorithms, our method extracts an explicit geometry (triangular mesh) in stage one, and introduces a more realistic physically-based inverse rendering model that utilizes multi-bounce path tracing and Monte Carlo integration. By leveraging multi-bounce path tracing, our method effectively estimates indirect illumination, including self-shadowing and internal reflections, which improves the intrinsic decomposition of shape, material, and lighting. Moreover, we incorporate reservoir sampling into our framework to address the noise in Monte Carlo integration, enhancing convergence and facilitating gradient-based optimization with low sample counts. Through qualitative and quantitative evaluation of several scenarios, especially in challenging scenarios with complex shadows, we demonstrate that our method achieves state-of-the-art performance on decomposition results. Additionally, our optimized explicit geometry enables applications such as scene editing, relighting, and material editing with modern graphics engines or CAD software. The source code is available at https://brabbitdousha.github.io/MIRReS/ △ Less

Submitted 24 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

Comments: 16 pages, 14 figures

arXiv:2406.16298 [pdf, other]

Bell nonlocality and entanglement in $e^{+}e^{-} \rightarrow Y\bar{Y}$ at BESIII

Authors: Sihao Wu, Chen Qian, Qun Wang, Xiao-Rong Zhou

Abstract: The Bell nonlocality and entanglement are two kinds of quantum correlations in quantum systems. Due to the recent upgrade in Beijing Spectrometer III (BESIII) experiment, it is possible to explore the nonlocality and entanglement in hyperon-antihyperon systems produced in electron-positron annihilation with high precision data. We provide a systematic method for studying quantum correlations in sp… ▽ More The Bell nonlocality and entanglement are two kinds of quantum correlations in quantum systems. Due to the recent upgrade in Beijing Spectrometer III (BESIII) experiment, it is possible to explore the nonlocality and entanglement in hyperon-antihyperon systems produced in electron-positron annihilation with high precision data. We provide a systematic method for studying quantum correlations in spin-1/2 hyperon-antihyperon systems through the measures for the nonlocality and entanglement. We find that with nonvanishing polarizations of the hyperon and its antihyperon, the kinematic region of nonlocality in the hyperon-antihyperon system is more restricted than the $τ^{+}τ^{-}$ system in which polarizations of $τ$ leptons are vanishing. We also present an experimental proposal to probe the nonlocality and entanglement in hyperon-antihyperon systems at BSEIII. △ Less

Submitted 28 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

Comments: 9 pages, 4 figures, 4 tables. We corrected a few errors in plotting figures from analytical formula. Some results in tables read from figures have also been corrected. A new table (Table III) was added for the maximum concurrence and their corresponding angles. A few references were added

arXiv:2406.16272 [pdf, other]

Repairing Catastrophic-Neglect in Text-to-Image Diffusion Models via Attention-Guided Feature Enhancement

Authors: Zhiyuan Chang, Mingyang Li, Junjie Wang, Yi Liu, Qing Wang, Yang Liu

Abstract: Text-to-Image Diffusion Models (T2I DMs) have garnered significant attention for their ability to generate high-quality images from textual descriptions. However, these models often produce images that do not fully align with the input prompts, resulting in semantic inconsistencies. The most prominent issue among these semantic inconsistencies is catastrophic-neglect, where the images generated by… ▽ More Text-to-Image Diffusion Models (T2I DMs) have garnered significant attention for their ability to generate high-quality images from textual descriptions. However, these models often produce images that do not fully align with the input prompts, resulting in semantic inconsistencies. The most prominent issue among these semantic inconsistencies is catastrophic-neglect, where the images generated by T2I DMs miss key objects mentioned in the prompt. We first conduct an empirical study on this issue, exploring the prevalence of catastrophic-neglect, potential mitigation strategies with feature enhancement, and the insights gained. Guided by the empirical findings, we propose an automated repair approach named Patcher to address catastrophic-neglect in T2I DMs. Specifically, Patcher first determines whether there are any neglected objects in the prompt, and then applies attention-guided feature enhancement to these neglected objects, resulting in a repaired prompt. Experimental results on three versions of Stable Diffusion demonstrate that Patcher effectively repairs the issue of catastrophic-neglect, achieving 10.1%-16.3% higher Correct Rate in image generation compared to baselines. △ Less

Submitted 23 June, 2024; originally announced June 2024.

Comments: 11 pages, 3 figures

arXiv:2406.15420 [pdf]

A comprehensive overview of diffuse correlation spectroscopy: theoretical framework, recent advances in hardware, analysis, and applications

Authors: Quan Wang, Mingliang Pan, Lucas Kreiss, Saeed Samaei, Stefan A. Carp, Johannes D. Johansson, Yuanzhe Zhang, Melissa Wu, Roarke Horstmeyer, Mamadou Diop, David Day-Uei Li

Abstract: Diffuse correlation spectroscopy (DCS) is a powerful tool for assessing microvascular hemodynamic in deep tissues. Recent advances in sensors, lasers, and deep learning have further boosted the development of new DCS methods. However, newcomers might feel overwhelmed, not only by the already complex DCS theoretical framework but also by the broad range of component options and system architectures… ▽ More Diffuse correlation spectroscopy (DCS) is a powerful tool for assessing microvascular hemodynamic in deep tissues. Recent advances in sensors, lasers, and deep learning have further boosted the development of new DCS methods. However, newcomers might feel overwhelmed, not only by the already complex DCS theoretical framework but also by the broad range of component options and system architectures. To facilitate new entry into this exciting field, we present a comprehensive review of DCS hardware architectures (continuous-wave, frequency-domain, and time-domain) and summarize corresponding theoretical models. Further, we discuss new applications of highly integrated silicon single-photon avalanche diode (SPAD) sensors in DCS, compare SPADs with existing sensors, and review other components (lasers, fibers, and correlators), as well as new data analysis tools, including deep learning. Potential applications in medical diagnosis are discussed, and an outlook for the future directions is provided, to offer effective guidance to embark on DCS research. △ Less

Submitted 18 May, 2024; originally announced June 2024.

arXiv:2406.15160 [pdf, other]

Exploring Audio-Visual Information Fusion for Sound Event Localization and Detection In Low-Resource Realistic Scenarios

Authors: Ya Jiang, Qing Wang, Jun Du, Maocheng Hu, Pengfei Hu, Zeyan Liu, Shi Cheng, Zhaoxu Nian, Yuxuan Dong, Mingqi Cai, Xin Fang, Chin-Hui Lee

Abstract: This study presents an audio-visual information fusion approach to sound event localization and detection (SELD) in low-resource scenarios. We aim at utilizing audio and video modality information through cross-modal learning and multi-modal fusion. First, we propose a cross-modal teacher-student learning (TSL) framework to transfer information from an audio-only teacher model, trained on a rich c… ▽ More This study presents an audio-visual information fusion approach to sound event localization and detection (SELD) in low-resource scenarios. We aim at utilizing audio and video modality information through cross-modal learning and multi-modal fusion. First, we propose a cross-modal teacher-student learning (TSL) framework to transfer information from an audio-only teacher model, trained on a rich collection of audio data with multiple data augmentation techniques, to an audio-visual student model trained with only a limited set of multi-modal data. Next, we propose a two-stage audio-visual fusion strategy, consisting of an early feature fusion and a late video-guided decision fusion to exploit synergies between audio and video modalities. Finally, we introduce an innovative video pixel swapping (VPS) technique to extend an audio channel swapping (ACS) method to an audio-visual joint augmentation. Evaluation results on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 Challenge data set demonstrate significant improvements in SELD performances. Furthermore, our submission to the SELD task of the DCASE 2023 Challenge ranks first place by effectively integrating the proposed techniques into a model ensemble. △ Less

Submitted 21 June, 2024; originally announced June 2024.

Comments: accepted by icme2024

arXiv:2406.15062 [pdf, other]

Decoupled static and dynamical charge correlations in La$_{2-x}$Sr$_x$CuO$_4$

Authors: L. Martinelli, I. Biało, X. Hong, J. Oppliger, C. Lin, T. Schaller, J. Küspert, M. H. Fischer, T. Kurosawa, N. Momono, M. Oda, J. Choi, S. Agrestini, M. Garcia-Fernandez, Ke-Jin Zhou, Q. Wang, J. Chang

Abstract: The relation between charge order, its quantum fluctuations and optical phonon modes in cuprate superconductors remains an unsolved problem. The exploration of these excitations is however complicated by the presence of twinned domains. Here, we use uniaxial strain in combination with ultra-high-resolution Resonant Inelastic X-ray Scattering (RIXS) at the oxygen K- and copper L3-edges to study the… ▽ More The relation between charge order, its quantum fluctuations and optical phonon modes in cuprate superconductors remains an unsolved problem. The exploration of these excitations is however complicated by the presence of twinned domains. Here, we use uniaxial strain in combination with ultra-high-resolution Resonant Inelastic X-ray Scattering (RIXS) at the oxygen K- and copper L3-edges to study the excitations stemming from the charge ordering wave vector in La1.875Sr0.125CuO4. By detwinning stripe ordering, we demonstrate that the optical phonon anomalies do not show any stripe anisotropy. The low-energy charge excitations also retain an in-plane four-fold symmetry. As such, we find that both phonon and charge excitations are decoupled entirely from the strength of static charge ordering. The almost isotropic character of charge excitations remains a possible source for the strange metal properties found in the normal state of cuprate superconductors. △ Less

Submitted 21 June, 2024; originally announced June 2024.

Comments: 9 pages, 4 figures

arXiv:2406.15030 [pdf, ps, other]

Search for the $e^+e^- \to φχ_{c1}(3872)$ process at BESIII

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (639 additional authors not shown)

Abstract: Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies 4.914 and 4.946 GeV by the BESIII detector, the $e^+e^- \to φχ_{c1}(3872)$ process is searched for the first time. No significant signal is observed and the upper limits at the 90\% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction… ▽ More Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies 4.914 and 4.946 GeV by the BESIII detector, the $e^+e^- \to φχ_{c1}(3872)$ process is searched for the first time. No significant signal is observed and the upper limits at the 90\% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction $\mathcal{B}[χ_{c1}(3872)\toπ^+π^- J/ψ]$ at 4.914 and 4.946 GeV are set to be 0.85 and 0.96 pb, respectively. These measurements provide useful information for the production of the $χ_{c1}(3872)$ at $e^+e^-$ collider and deepen our understanding about the nature of this particle. △ Less

Submitted 21 June, 2024; originally announced June 2024.

Comments: 11 pages, 3 figures

arXiv:2406.14868 [pdf, other]

Direct Multi-Turn Preference Optimization for Language Agents

Authors: Wentao Shi, Mengqi Yuan, Junkang Wu, Qifan Wang, Fuli Feng

Abstract: Adapting Large Language Models (LLMs) for agent tasks is critical in developing language agents. Direct Preference Optimization (DPO) is a promising technique for this adaptation with the alleviation of compounding errors, offering a means to directly optimize Reinforcement Learning (RL) objectives. However, applying DPO to multi-turn tasks presents challenges due to the inability to cancel the pa… ▽ More Adapting Large Language Models (LLMs) for agent tasks is critical in developing language agents. Direct Preference Optimization (DPO) is a promising technique for this adaptation with the alleviation of compounding errors, offering a means to directly optimize Reinforcement Learning (RL) objectives. However, applying DPO to multi-turn tasks presents challenges due to the inability to cancel the partition function. Overcoming this obstacle involves making the partition function independent of the current state and addressing length disparities between preferred and dis-preferred trajectories. In this light, we replace the policy constraint with the state-action occupancy measure constraint in the RL objective and add length normalization to the Bradley-Terry model, yielding a novel loss function named DMPO for multi-turn agent tasks with theoretical explanations. Extensive experiments on three multi-turn agent task datasets confirm the effectiveness and superiority of the DMPO loss. △ Less

Submitted 25 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.14631 [pdf, other]

Multistructured accretion flow of Sgr A* II: Signatures of a Cool Accretion Disk in Hydrodynamic Simulations of Stellar Winds

Authors: Mayura Balakrishnan, Christopher M. P. Russell, Lia Corrales, Diego Calderón, Jorge Cuadra, Daryl Haggard, Sera Markoff, Joey Neilsen, Michael Nowak, Q. Daniel Wang, Fred Baganoff

Abstract: Hydrodynamic simulations of the stellar winds from Wolf-Rayet stars within the Galactic Center can provide predictions for the X-ray spectrum of supermassive black hole Sgr A*. Herein, we present results from updated smooth particle hydrodynamics simulations, building on the architecture of Cuadra et al. (2015); Russell et al. (2017), finding that a cold gas disk forms around Sgr A* with a simulat… ▽ More Hydrodynamic simulations of the stellar winds from Wolf-Rayet stars within the Galactic Center can provide predictions for the X-ray spectrum of supermassive black hole Sgr A*. Herein, we present results from updated smooth particle hydrodynamics simulations, building on the architecture of Cuadra et al. (2015); Russell et al. (2017), finding that a cold gas disk forms around Sgr A* with a simulation runtime of 3500 years. This result is consistent with previous grid-based simulations, demonstrating that a cold disk can form regardless of numerical method. We examine the plasma scenarios arising from an environment with and without this cold disk, by generating synthetic spectra for comparison to the quiescent Fe K alpha Sgr A* spectrum from Chandra HETG-S, taken through the Chandra X-ray Visionary Program. We find that current and future X-ray missions are unlikely to distinguish between the kinematic signatures in the plasma in these two scenarios. Nonetheless, the stellar wind plasma model presents a good fit to the dispersed Chandra spectra within 1.5" of Sgr A*. We compare our results to the Radiatively Inefficient Accretion Flow (RIAF) model fit to the HETG-S spectrum presented in Paper I and find that the Bayesian model evidence does not strongly favor either model. With 9" angular resolution and high spectral resolution of the X-IFU, NewAthena will offer a clearer differentiation between the RIAF plasma model and hydrodynamic simulations, but only a future X-ray mission with arcsecond resolution will significantly advance our understanding of Sgr A*'s accretion flow in X-rays. △ Less

Submitted 20 June, 2024; originally announced June 2024.

Comments: 14 pages, 6 figures. Accepted for publication in ApJ

arXiv:2406.14630 [pdf, other]

Multistructured accretion flow of Sgr A* I: Examination of a RIAF model

Authors: Mayura Balakrishnan, Lia Corrales, Sera Markoff, Michael Nowak, Daryl Haggard, Q. Daniel Wang, Joey Neilsen, Christopher M. P. Russell, Diego Calderón, Jorge Cuadra, Fred Baganoff

Abstract: The extreme low-luminosity supermassive black hole Sgr A* provides a unique laboratory in which to test radiatively inefficient accretion flow (RIAF) models. Previous fits to the quiescent Chandra ACIS-S spectrum found a RIAF model with an equal inflow-outflow balance works well. In this work, we apply the RIAF model to the Chandra HETG-S spectrum obtained through the Chandra X-ray Visionary Progr… ▽ More The extreme low-luminosity supermassive black hole Sgr A* provides a unique laboratory in which to test radiatively inefficient accretion flow (RIAF) models. Previous fits to the quiescent Chandra ACIS-S spectrum found a RIAF model with an equal inflow-outflow balance works well. In this work, we apply the RIAF model to the Chandra HETG-S spectrum obtained through the Chandra X-ray Visionary Program, which displays features suggestive of temperature and velocity structures within the plasma. A comprehensive forward model analysis accounting for the accretion flow geometry and HETG-S instrumental effects is required for a full interpretation of the quiescent Chandra HETG-S spectrum. We present a RIAF model that takes these effects into account. Our fits to the high-resolution gratings spectrum indicate an inflow balanced by an outflow ($s \sim 1$) alongside a temperature profile that appears shallower than what would be expected from a gravitational potential following $1/r$. The data require that the abundance of Iron relative to solar is $Z_{Fe} < 0.32 Z_\odot$ (90\% credible interval), much lower than the $2~Z_\odot$ metallicity measured in nearby late-type giants. While future missions like NewAthena will provide higher spectral resolution, source separation will continue to be a problem. Leveraging Chandra's unparalleled spatial resolution, which is not expected to be surpassed for decades, remains essential for detailed investigations of the densely populated Galactic Center in X-rays. △ Less

Submitted 20 June, 2024; originally announced June 2024.

Comments: 11 pages, 5 figures, 1 table. Accepted for publication in ApJ

arXiv:2406.14261 [pdf, other]

Unleashing the Potential of Tracklets for Unsupervised Video Person Re-Identification

Authors: Nanxing Meng, Qizao Wang, Bin Li, Xiangyang Xue

Abstract: With rich temporal-spatial information, video-based person re-identification methods have shown broad prospects. Although tracklets can be easily obtained with ready-made tracking models, annotating identities is still expensive and impractical. Therefore, some video-based methods propose using only a few identity annotations or camera labels to facilitate feature learning. They also simply averag… ▽ More With rich temporal-spatial information, video-based person re-identification methods have shown broad prospects. Although tracklets can be easily obtained with ready-made tracking models, annotating identities is still expensive and impractical. Therefore, some video-based methods propose using only a few identity annotations or camera labels to facilitate feature learning. They also simply average the frame features of each tracklet, overlooking unexpected variations and inherent identity consistency within tracklets. In this paper, we propose the Self-Supervised Refined Clustering (SSR-C) framework without relying on any annotation or auxiliary information to promote unsupervised video person re-identification. Specifically, we first propose the Noise-Filtered Tracklet Partition (NFTP) module to reduce the feature bias of tracklets caused by noisy tracking results, and sequentially partition the noise-filtered tracklets into "sub-tracklets". Then, we cluster and further merge sub-tracklets using the self-supervised signal from tracklet partition, which is enhanced through a progressive strategy to generate reliable pseudo labels, facilitating intra-class cross-tracklet aggregation. Moreover, we propose the Class Smoothing Classification (CSC) loss to efficiently promote model learning. Extensive experiments on the MARS and DukeMTMC-VideoReID datasets demonstrate that our proposed SSR-C for unsupervised video person re-identification achieves state-of-the-art results and is comparable to advanced supervised methods. △ Less

Submitted 20 June, 2024; originally announced June 2024.

Comments: The first two authors contributed equally

arXiv:2406.13645 [pdf, other]

Advancing UWF-SLO Vessel Segmentation with Source-Free Active Domain Adaptation and a Novel Multi-Center Dataset

Authors: Hongqiu Wang, Xiangde Luo, Wu Chen, Qingqing Tang, Mei Xin, Qiong Wang, Lei Zhu

Abstract: Accurate vessel segmentation in Ultra-Wide-Field Scanning Laser Ophthalmoscopy (UWF-SLO) images is crucial for diagnosing retinal diseases. Although recent techniques have shown encouraging outcomes in vessel segmentation, models trained on one medical dataset often underperform on others due to domain shifts. Meanwhile, manually labeling high-resolution UWF-SLO images is an extremely challenging,… ▽ More Accurate vessel segmentation in Ultra-Wide-Field Scanning Laser Ophthalmoscopy (UWF-SLO) images is crucial for diagnosing retinal diseases. Although recent techniques have shown encouraging outcomes in vessel segmentation, models trained on one medical dataset often underperform on others due to domain shifts. Meanwhile, manually labeling high-resolution UWF-SLO images is an extremely challenging, time-consuming and expensive task. In response, this study introduces a pioneering framework that leverages a patch-based active domain adaptation approach. By actively recommending a few valuable image patches by the devised Cascade Uncertainty-Predominance (CUP) selection strategy for labeling and model-finetuning, our method significantly improves the accuracy of UWF-SLO vessel segmentation across diverse medical centers. In addition, we annotate and construct the first Multi-center UWF-SLO Vessel Segmentation (MU-VS) dataset to promote this topic research, comprising data from multiple institutions. This dataset serves as a valuable resource for cross-center evaluation, verifying the effectiveness and robustness of our approach. Experimental results demonstrate that our approach surpasses existing domain adaptation and active learning methods, considerably reducing the gap between the Upper and Lower bounds with minimal annotations, highlighting our method's practical clinical value. We will release our dataset and code to facilitate relevant research: https://github.com/whq-xxh/SFADA-UWF-SLO. △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: MICCAI 2024 Early Accept

arXiv:2406.12718 [pdf, other]

AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

Authors: Wenbin An, Feng Tian, Sicong Leng, Jiahao Nie, Haonan Lin, QianYing Wang, Guang Dai, Ping Chen, Shijian Lu

Abstract: Despite their great success across various multimodal tasks, Large Vision-Language Models (LVLMs) are facing a prevalent problem with object hallucinations, where the generated textual responses are inconsistent with ground-truth objects in the given image. This paper investigates various LVLMs and pinpoints attention deficiency toward discriminative local image features as one root cause of objec… ▽ More Despite their great success across various multimodal tasks, Large Vision-Language Models (LVLMs) are facing a prevalent problem with object hallucinations, where the generated textual responses are inconsistent with ground-truth objects in the given image. This paper investigates various LVLMs and pinpoints attention deficiency toward discriminative local image features as one root cause of object hallucinations. Specifically, LVLMs predominantly attend to prompt-independent global image features, while failing to capture prompt-relevant local features, consequently undermining the visual grounding capacity of LVLMs and leading to hallucinations. To this end, we propose Assembly of Global and Local Attention (AGLA), a training-free and plug-and-play approach that mitigates object hallucinations by exploring an ensemble of global features for response generation and local features for visual discrimination simultaneously. Our approach exhibits an image-prompt matching scheme that captures prompt-relevant local features from images, leading to an augmented view of the input image where prompt-relevant content is reserved while irrelevant distractions are masked. With the augmented view, a calibrated decoding distribution can be derived by integrating generative global features from the original image and discriminative local features from the augmented image. Extensive experiments show that AGLA consistently mitigates object hallucinations and enhances general perception capability for LVLMs across various discriminative and generative benchmarks. Our code will be released at https://github.com/Lackel/AGLA. △ Less

Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12386 [pdf, other]

IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models

Authors: Qiyao Wang, Jianguo Huang, Shule Lu, Yuan Lin, Kan Xu, Liang Yang, Hongfei Lin

Abstract: The rapid development of Large Language Models (LLMs) in vertical domains, including intellectual property (IP), lacks a specific evaluation benchmark for assessing their understanding, application, and reasoning abilities. To fill this gap, we introduce IPEval, the first evaluation benchmark tailored for IP agency and consulting tasks. IPEval comprises 2657 multiple-choice questions across four m… ▽ More The rapid development of Large Language Models (LLMs) in vertical domains, including intellectual property (IP), lacks a specific evaluation benchmark for assessing their understanding, application, and reasoning abilities. To fill this gap, we introduce IPEval, the first evaluation benchmark tailored for IP agency and consulting tasks. IPEval comprises 2657 multiple-choice questions across four major dimensions: creation, application, protection, and management of IP. These questions span patent rights (inventions, utility models, designs), trademarks, copyrights, trade secrets, and other related laws. Evaluation methods include zero-shot, 5-few-shot, and Chain of Thought (CoT) for seven LLM types, predominantly in English or Chinese. Results show superior English performance by models like GPT series and Qwen series, while Chinese-centric LLMs excel in Chinese tests, albeit specialized IP LLMs lag behind general-purpose ones. Regional and temporal aspects of IP underscore the need for LLMs to grasp legal nuances and evolving laws. IPEval aims to accurately gauge LLM capabilities in IP and spur development of specialized models. Website: \url{https://ipeval.github.io/} △ Less

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12324 [pdf, other]

AutoDSL: Automated domain-specific language design for structural representation of procedures with constraints

Authors: Yu-Zhe Shi, Haofei Hou, Zhangqian Bi, Fanxu Meng, Xiang Wei, Lecheng Ruan, Qining Wang

Abstract: Accurate representation of procedures in restricted scenarios, such as non-standardized scientific experiments, requires precise depiction of constraints. Unfortunately, Domain-specific Language (DSL), as an effective tool to express constraints structurally, often requires case-by-case hand-crafting, necessitating customized, labor-intensive efforts. To overcome this challenge, we introduce the A… ▽ More Accurate representation of procedures in restricted scenarios, such as non-standardized scientific experiments, requires precise depiction of constraints. Unfortunately, Domain-specific Language (DSL), as an effective tool to express constraints structurally, often requires case-by-case hand-crafting, necessitating customized, labor-intensive efforts. To overcome this challenge, we introduce the AutoDSL framework to automate DSL-based constraint design across various domains. Utilizing domain specified experimental protocol corpora, AutoDSL optimizes syntactic constraints and abstracts semantic constraints. Quantitative and qualitative analyses of the DSLs designed by AutoDSL across five distinct domains highlight its potential as an auxiliary module for language models, aiming to improve procedural planning and execution. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (ACL'24)

arXiv:2406.12214 [pdf, other]

Is Your HD Map Constructor Reliable under Sensor Corruptions?

Authors: Xiaoshuai Hao, Mengchuan Wei, Yifan Yang, Haimei Zhao, Hui Zhang, Yi Zhou, Qiang Wang, Weiming Li, Lingdong Kong, Jing Zhang

Abstract: Driving systems often rely on high-definition (HD) maps for precise environmental information, which is crucial for planning and navigation. While current HD map constructors perform well under ideal conditions, their resilience to real-world challenges, \eg, adverse weather and sensor failures, is not well understood, raising safety concerns. This work introduces MapBench, the first comprehensive… ▽ More Driving systems often rely on high-definition (HD) maps for precise environmental information, which is crucial for planning and navigation. While current HD map constructors perform well under ideal conditions, their resilience to real-world challenges, \eg, adverse weather and sensor failures, is not well understood, raising safety concerns. This work introduces MapBench, the first comprehensive benchmark designed to evaluate the robustness of HD map construction methods against various sensor corruptions. Our benchmark encompasses a total of 29 types of corruptions that occur from cameras and LiDAR sensors. Extensive evaluations across 31 HD map constructors reveal significant performance degradation of existing methods under adverse weather conditions and sensor failures, underscoring critical safety concerns. We identify effective strategies for enhancing robustness, including innovative approaches that leverage multi-modal fusion, advanced data augmentation, and architectural techniques. These insights provide a pathway for developing more reliable HD map construction methods, which are essential for the advancement of autonomous driving technology. The benchmark toolkit and affiliated code and model checkpoints have been made publicly accessible. △ Less

Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: project url: https://mapbench.github.io/

arXiv:2406.12053 [pdf, other]

InternalInspector $I^2$: Robust Confidence Estimation in LLMs through Internal States

Authors: Mohammad Beigi, Ying Shen, Runing Yang, Zihao Lin, Qifan Wang, Ankith Mohan, Jianfeng He, Ming Jin, Chang-Tien Lu, Lifu Huang

Abstract: Despite their vast capabilities, Large Language Models (LLMs) often struggle with generating reliable outputs, frequently producing high-confidence inaccuracies known as hallucinations. Addressing this challenge, our research introduces InternalInspector, a novel framework designed to enhance confidence estimation in LLMs by leveraging contrastive learning on internal states including attention st… ▽ More Despite their vast capabilities, Large Language Models (LLMs) often struggle with generating reliable outputs, frequently producing high-confidence inaccuracies known as hallucinations. Addressing this challenge, our research introduces InternalInspector, a novel framework designed to enhance confidence estimation in LLMs by leveraging contrastive learning on internal states including attention states, feed-forward states, and activation states of all layers. Unlike existing methods that primarily focus on the final activation state, InternalInspector conducts a comprehensive analysis across all internal states of every layer to accurately identify both correct and incorrect prediction processes. By benchmarking InternalInspector against existing confidence estimation methods across various natural language understanding and generation tasks, including factual question answering, commonsense reasoning, and reading comprehension, InternalInspector achieves significantly higher accuracy in aligning the estimated confidence scores with the correctness of the LLM's predictions and lower calibration error. Furthermore, InternalInspector excels at HaluEval, a hallucination detection benchmark, outperforming other internal-based confidence estimation methods in this task. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: 8 pages

arXiv:2406.11497 [pdf, other]

CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG

Authors: Boyi Deng, Wenjie Wang, Fengbin Zhu, Qifan Wang, Fuli Feng

Abstract: Retrieval-Augmented Generation (RAG) can alleviate hallucinations of Large Language Models (LLMs) by referencing external documents. However, the misinformation in external documents may mislead LLMs' generation. To address this issue, we explore the task of "credibility-aware RAG", in which LLMs automatically adjust the influence of retrieved documents based on their credibility scores to counter… ▽ More Retrieval-Augmented Generation (RAG) can alleviate hallucinations of Large Language Models (LLMs) by referencing external documents. However, the misinformation in external documents may mislead LLMs' generation. To address this issue, we explore the task of "credibility-aware RAG", in which LLMs automatically adjust the influence of retrieved documents based on their credibility scores to counteract misinformation. To this end, we introduce a plug-and-play method named $\textbf{Cr}$edibility-aware $\textbf{A}$ttention $\textbf{M}$odification (CrAM). CrAM identifies influential attention heads in LLMs and adjusts their attention weights based on the credibility of the documents, thereby reducing the impact of low-credibility documents. Experiments on Natual Questions and TriviaQA using Llama2-13B, Llama3-8B, and Qwen-7B show that CrAM improves the RAG performance of LLMs against misinformation pollution by over 20%, even surpassing supervised fine-tuning methods. △ Less

Submitted 27 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: Under review

arXiv:2406.11446 [pdf, other]

Approximate Angular Domain Expression for Near-Field XL-MIMO Channel

Authors: Hongbo Xing, Yuxiang Zhang, Jianhua Zhang, Huixin Xu, Guangyi Liu, Qixing Wang

Abstract: As Extremely Large-Scale Multiple-Input-Multiple-Output (XL-MIMO) technology advances and frequency band rises, the near-field effects in communication are intensifying. A concise and accurate near-field XL-MIMO channel model serves as the cornerstone for investigating the near-field effects. However, existing angular domain XL-MIMO channel models under near-field conditions require non-closed-for… ▽ More As Extremely Large-Scale Multiple-Input-Multiple-Output (XL-MIMO) technology advances and frequency band rises, the near-field effects in communication are intensifying. A concise and accurate near-field XL-MIMO channel model serves as the cornerstone for investigating the near-field effects. However, existing angular domain XL-MIMO channel models under near-field conditions require non-closed-form wave-number domain integrals for computation, which is complicated. To obtain a more succinct channel model, this paper introduces a closed-form approximate expression based on the principle of stationary phase. It was subsequently shown that when the scatterer distance is larger than the array aperture, the closed-form model can be further simplified as a trapezoidal spectrum. We validate the accuracy of the proposed approximation through simulations of power angular spectrum similarity. The results indicate that the proposed approximation can accurately approximate the near-field angular domain channel within the effective Rayleigh distance. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.10855 [pdf, other]

ALPS: An Auto-Labeling and Pre-training Scheme for Remote Sensing Segmentation With Segment Anything Model

Authors: Song Zhang, Qingzhong Wang, Junyi Liu, Haoyi Xiong

Abstract: In the fast-growing field of Remote Sensing (RS) image analysis, the gap between massive unlabeled datasets and the ability to fully utilize these datasets for advanced RS analytics presents a significant challenge. To fill the gap, our work introduces an innovative auto-labeling framework named ALPS (Automatic Labeling for Pre-training in Segmentation), leveraging the Segment Anything Model (SAM)… ▽ More In the fast-growing field of Remote Sensing (RS) image analysis, the gap between massive unlabeled datasets and the ability to fully utilize these datasets for advanced RS analytics presents a significant challenge. To fill the gap, our work introduces an innovative auto-labeling framework named ALPS (Automatic Labeling for Pre-training in Segmentation), leveraging the Segment Anything Model (SAM) to predict precise pseudo-labels for RS images without necessitating prior annotations or additional prompts. The proposed pipeline significantly reduces the labor and resource demands traditionally associated with annotating RS datasets. By constructing two comprehensive pseudo-labeled RS datasets via ALPS for pre-training purposes, our approach enhances the performance of downstream tasks across various benchmarks, including iSAID and ISPRS Potsdam. Experiments demonstrate the effectiveness of our framework, showcasing its ability to generalize well across multiple tasks even under the scarcity of extensively annotated datasets, offering a scalable solution to automatic segmentation and annotation challenges in the field. In addition, the proposed a pipeline is flexible and can be applied to medical image segmentation, remarkably boosting the performance. Note that ALPS utilizes pre-trained SAM to semi-automatically annotate RS images without additional manual annotations. Though every component in the pipeline has bee well explored, integrating clustering algorithms with SAM and novel pseudo-label alignment significantly enhances RS segmentation, as an off-the-shelf tool for pre-training data preparation. Our source code is available at: https://github.com/StriveZs/ALPS. △ Less

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.10744 [pdf, other]

Technique Report of CVPR 2024 PBDL Challenges

Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu , et al. (75 additional authors not shown)

Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a… ▽ More The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, and medium properties from images. In recent years, deep learning has shown promising improvements for various vision tasks, and when combined with physics-based vision, these approaches can enhance the robustness and accuracy of vision systems. This technical report summarizes the outcomes of the Physics-Based Vision Meets Deep Learning (PBDL) 2024 challenge, held in CVPR 2024 workshop. The challenge consisted of eight tracks, focusing on Low-Light Enhancement and Detection as well as High Dynamic Range (HDR) Imaging. This report details the objectives, methodologies, and results of each track, highlighting the top-performing solutions and their innovative approaches. △ Less

Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

arXiv:2406.10626 [pdf, ps, other]

The coupling mechanism between crossed-beams energy transfer and stimulated Brillouin scattering in homogeneous plasmas

Authors: Y. Chen, Q. Wang, C. Y. Zheng, Z. J. Liu, L. H. Cao, C. Z. Xiao

Abstract: The coupling mechanism between crossed beams energy transfer and stimulated Brillouin scattering in homogeneous plasmas are studied by theoretical analysis, fluid simulations and particle in cell(PIC) simulations. The numerical models of laser plasma instabilities are constructed by solving coupling equations with Schodinger equations form, and the fluid simulation results are confirmed by fluid t… ▽ More The coupling mechanism between crossed beams energy transfer and stimulated Brillouin scattering in homogeneous plasmas are studied by theoretical analysis, fluid simulations and particle in cell(PIC) simulations. The numerical models of laser plasma instabilities are constructed by solving coupling equations with Schodinger equations form, and the fluid simulation results are confirmed by fluid theory and PIC simulations.In the parameter regime when the pump depletion does not occur in CBET and the reflectivity of SBS is lower than 1%, SBS will be affected by CBET, the CBET energy gain will still agree with theoretical predications. However, In the parameter regime when the pump depletion does occur in CBET and the reflectivity of SBS is higher than 1%, the CBET spatial gain will be reduced by the interaction of CBET and SBS, and the huge difference of SBS reflectivity for two crossed laser beams is observed.In the PIC simulations, we found that lower ZTe=Ti will significantly reduce the interaction between CBET and SBS (Z is the ion charge, Teis the electron temperature, Ti is the ion temperature). △ Less

Submitted 1 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: 10pages,11 figures

arXiv:2406.10283 [pdf, other]

Attentive Merging of Hidden Embeddings from Pre-trained Speech Model for Anti-spoofing Detection

Authors: Zihan Pan, Tianchi Liu, Hardik B. Sailor, Qiongqiong Wang

Abstract: Self-supervised learning (SSL) speech representation models, trained on large speech corpora, have demonstrated effectiveness in extracting hierarchical speech embeddings through multiple transformer layers. However, the behavior of these embeddings in specific tasks remains uncertain. This paper investigates the multi-layer behavior of the WavLM model in anti-spoofing and proposes an attentive me… ▽ More Self-supervised learning (SSL) speech representation models, trained on large speech corpora, have demonstrated effectiveness in extracting hierarchical speech embeddings through multiple transformer layers. However, the behavior of these embeddings in specific tasks remains uncertain. This paper investigates the multi-layer behavior of the WavLM model in anti-spoofing and proposes an attentive merging method to leverage the hierarchical hidden embeddings. Results demonstrate the feasibility of fine-tuning WavLM to achieve the best equal error rate (EER) of 0.65%, 3.50%, and 3.19% on the ASVspoof 2019LA, 2021LA, and 2021DF evaluation sets, respectively. Notably, We find that the early hidden transformer layers of the WavLM large model contribute significantly to anti-spoofing task, enabling computational efficiency by utilizing a partial pre-trained model. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.10095 [pdf, other]

Calculation of the chiral Lagrangian coefficients with light vector mesons

Authors: Zi-Kan Geng, Qing Wang

Abstract: We calculate the coefficients in the effective chiral Lagrangian from QCD, which includes pseudo-scalar mesons and vector mesons (with hidden symmetry), up to O(p4). This encompasses both the normal and anomalous parts. Our work builds on a previous study that derived the chiral Lagrangian from first principles of QCD, where the low-energy coefficients are defined in terms of specific Green's func… ▽ More We calculate the coefficients in the effective chiral Lagrangian from QCD, which includes pseudo-scalar mesons and vector mesons (with hidden symmetry), up to O(p4). This encompasses both the normal and anomalous parts. Our work builds on a previous study that derived the chiral Lagrangian from first principles of QCD, where the low-energy coefficients are defined in terms of specific Green's functions in QCD. This research extends our earlier efforts that focused on calculating the low-energy coefficients of the chiral Lagrangian for pure pseudo-scalar mesons. This marks the first calculation of chiral Lagrangian coefficients for vector mesons from QCD, particularly for the important parameters a and g, which are typically considered inputs in existing literature. Notably, the regularization method used previously is inadequate for this broader scope. We find that cut-off regularization yields reasonable results for both pseudo-scalar mesons and vector mesons, though it has certain limitations. Finally, we demonstrate that our method aligns with the Weinberg sum rules. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: 34 pages, 2figures

arXiv:2406.09813 [pdf, other]

Diffuse X-ray Explorer: a high-resolution X-ray spectroscopic sky surveyor on the China Space Station

Authors: Hai Jin, Junjie Mao, Liubiao Chen, Naihui Chen, Wei Cui, Bo Gao, Jinjin Li, Xinfeng Li, Jiejia Liu, Jia Quan, Chunyang Jiang, Guole Wang, Le Wang, Qian Wang, Sifan Wang, Aimin Xiao, Shuo Zhang

Abstract: DIffuse X-ray Explorer (DIXE) is a proposed high-resolution X-ray spectroscopic sky surveyor on the China Space Station (CSS). DIXE will focus on studying hot baryons in the Milky Way. Galactic hot baryons like the X-ray emitting Milky Way halo and eROSITA bubbles are best observed in the sky survey mode with a large field of view. DIXE will take advantage of the orbital motion of the CSS to scan… ▽ More DIffuse X-ray Explorer (DIXE) is a proposed high-resolution X-ray spectroscopic sky surveyor on the China Space Station (CSS). DIXE will focus on studying hot baryons in the Milky Way. Galactic hot baryons like the X-ray emitting Milky Way halo and eROSITA bubbles are best observed in the sky survey mode with a large field of view. DIXE will take advantage of the orbital motion of the CSS to scan a large fraction of the sky. High-resolution X-ray spectroscopy, enabled by superconducting microcalorimeters based on the transition-edge sensor (TES) technology, will probe the physical properties (e.g., temperature, density, elemental abundances, kinematics) of the Galactic hot baryons. This will complement the high-resolution imaging data obtained with the eROSITA mission. Here we present the preliminary design of DIXE. The payload consists mainly of a detector assembly and a cryogenic cooling system. The key components of the detector assembly are a microcalorimeter array and frequency-domain multiplexing readout electronics. To provide a working temperature for the detector assembly, the cooling system consists of an adiabatic demagnetization refrigerator and a mechanical cryocooler system. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: 12 pages, 6 figures, the full version is published by Journal of Low Temperature Physics

arXiv:2406.09475 [pdf, other]

Search for $X(1870)$ via the decay $J/ψ\to ωK^+ K^-η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (644 additional authors not shown)

Abstract: Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the… ▽ More Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the $90\%$ confidence level. In addition, the branching faction $B(J/ψ\toωK^+ K^- η)$ is measured to be $(3.33\pm0.02(\rm{stat.})\pm 0.12(\rm{syst.}))\times 10^{-4}$. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09410 [pdf, other]

STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery

Authors: Yansheng Li, Linlin Wang, Tingzhu Wang, Xue Yang, Junwei Luo, Qi Wang, Youming Deng, Wenbin Wang, Xian Sun, Haifeng Li, Bo Dang, Yongjun Zhang, Yi Yu, Junchi Yan

Abstract: Scene graph generation (SGG) in satellite imagery (SAI) benefits promoting understanding of geospatial scenarios from perception to cognition. In SAI, objects exhibit great variations in scales and aspect ratios, and there exist rich relationships between objects (even between spatially disjoint objects), which makes it attractive to holistically conduct SGG in large-size very-high-resolution (VHR… ▽ More Scene graph generation (SGG) in satellite imagery (SAI) benefits promoting understanding of geospatial scenarios from perception to cognition. In SAI, objects exhibit great variations in scales and aspect ratios, and there exist rich relationships between objects (even between spatially disjoint objects), which makes it attractive to holistically conduct SGG in large-size very-high-resolution (VHR) SAI. However, there lack such SGG datasets. Due to the complexity of large-size SAI, mining triplets <subject, relationship, object> heavily relies on long-range contextual reasoning. Consequently, SGG models designed for small-size natural imagery are not directly applicable to large-size SAI. This paper constructs a large-scale dataset for SGG in large-size VHR SAI with image sizes ranging from 512 x 768 to 27,860 x 31,096 pixels, named STAR (Scene graph generaTion in lArge-size satellite imageRy), encompassing over 210K objects and over 400K triplets. To realize SGG in large-size SAI, we propose a context-aware cascade cognition (CAC) framework to understand SAI regarding object detection (OBD), pair pruning and relationship prediction for SGG. We also release a SAI-oriented SGG toolkit with about 30 OBD and 10 SGG methods which need further adaptation by our devised modules on our challenging STAR dataset. The dataset and toolkit are available at: https://linlin-dev.github.io/project/STAR. △ Less

Submitted 3 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: 18 pages, 11 figures

arXiv:2406.09179 [pdf, other]

Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning

Authors: Qizhou Wang, Bo Han, Puning Yang, Jianing Zhu, Tongliang Liu, Masashi Sugiyama

Abstract: The compelling goal of eradicating undesirable data behaviors, while preserving usual model functioning, underscores the significance of machine unlearning within the domain of large language models (LLMs). Recent research has begun to approach LLM unlearning via gradient ascent (GA) -- increasing the prediction risk for those training strings targeted to be unlearned, thereby erasing their parame… ▽ More The compelling goal of eradicating undesirable data behaviors, while preserving usual model functioning, underscores the significance of machine unlearning within the domain of large language models (LLMs). Recent research has begun to approach LLM unlearning via gradient ascent (GA) -- increasing the prediction risk for those training strings targeted to be unlearned, thereby erasing their parameterized responses. Despite their simplicity and efficiency, we suggest that GA-based methods face the propensity towards excessive unlearning, resulting in various undesirable model behaviors, such as catastrophic forgetting, that diminish their practical utility. In this paper, we suggest a set of metrics that can capture multiple facets of real-world utility and propose several controlling methods that can regulate the extent of excessive unlearning. Accordingly, we suggest a general framework to better reflect the practical efficacy of various unlearning methods -- we begin by controlling the unlearning procedures/unlearned models such that no excessive unlearning occurs and follow by the evaluation for unlearning efficacy. Our experimental analysis on established benchmarks revealed that GA-based methods are far from perfect in practice, as strong unlearning is at the high cost of hindering the model utility. We conclude that there is still a long way towards practical and effective LLM unlearning, and more efforts are required in this field. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09166 [pdf, other]

Fine-Grained Domain Generalization with Feature Structuralization

Authors: Wenlong Yu, Dongyue Chen, Qilong Wang, Qinghua Hu

Abstract: Fine-grained domain generalization (FGDG) is a more challenging task than traditional DG tasks due to its small inter-class variations and relatively large intra-class disparities. When domain distribution changes, the vulnerability of subtle features leads to a severe deterioration in model performance. Nevertheless, humans inherently demonstrate the capacity for generalizing to out-of-distributi… ▽ More Fine-grained domain generalization (FGDG) is a more challenging task than traditional DG tasks due to its small inter-class variations and relatively large intra-class disparities. When domain distribution changes, the vulnerability of subtle features leads to a severe deterioration in model performance. Nevertheless, humans inherently demonstrate the capacity for generalizing to out-of-distribution data, leveraging structured multi-granularity knowledge that emerges from discerning the commonality and specificity within categories. Likewise, we propose a Feature Structuralized Domain Generalization (FSDG) model, wherein features experience structuralization into common, specific, and confounding segments, harmoniously aligned with their relevant semantic concepts, to elevate performance in FGDG. Specifically, feature structuralization (FS) is accomplished through joint optimization of five constraints: a decorrelation function applied to disentangled segments, three constraints ensuring common feature consistency and specific feature distinctiveness, and a prediction calibration term. By imposing these stipulations, FSDG is prompted to disentangle and align features based on multi-granularity knowledge, facilitating robust subtle distinctions among categories. Extensive experimentation on three benchmarks consistently validates the superiority of FSDG over state-of-the-art counterparts, with an average improvement of 6.2% in FGDG performance. Beyond that, the explainability analysis on explicit concept matching intensity between the shared concepts among categories and the model channels, along with experiments on various mainstream model architectures, substantiates the validity of FS. △ Less

Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08757 [pdf, other]

SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding

Authors: Jiefeng Ma, Yan Wang, Chenyu Liu, Jun Du, Yu Hu, Zhenrong Zhang, Pengfei Hu, Qing Wang, Jianshu Zhang

Abstract: Accurately identifying and organizing textual content is crucial for the automation of document processing in the field of form understanding. Existing datasets, such as FUNSD and XFUND, support entity classification and relationship prediction tasks but are typically limited to local and entity-level annotations. This limitation overlooks the hierarchically structured representation of documents,… ▽ More Accurately identifying and organizing textual content is crucial for the automation of document processing in the field of form understanding. Existing datasets, such as FUNSD and XFUND, support entity classification and relationship prediction tasks but are typically limited to local and entity-level annotations. This limitation overlooks the hierarchically structured representation of documents, constraining comprehensive understanding of complex forms. To address this issue, we present the SRFUND, a hierarchically structured multi-task form understanding benchmark. SRFUND provides refined annotations on top of the original FUNSD and XFUND datasets, encompassing five tasks: (1) word to text-line merging, (2) text-line to entity merging, (3) entity category classification, (4) item table localization, and (5) entity-based full-document hierarchical structure recovery. We meticulously supplemented the original dataset with missing annotations at various levels of granularity and added detailed annotations for multi-item table regions within the forms. Additionally, we introduce global hierarchical structure dependencies for entity relation prediction tasks, surpassing traditional local key-value associations. The SRFUND dataset includes eight languages including English, Chinese, Japanese, German, French, Spanish, Italian, and Portuguese, making it a powerful tool for cross-lingual form understanding. Extensive experimental results demonstrate that the SRFUND dataset presents new challenges and significant opportunities in handling diverse layouts and global hierarchical structures of forms, thus providing deep insights into the field of form understanding. The original dataset and implementations of baseline methods are available at https://sprateam-ustc.github.io/SRFUND △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: NeurIPS 2024 Track on Datasets and Benchmarks under review

arXiv:2406.08664 [pdf, ps, other]

Contractibility of Vietoris-Rips Complexes of dense subsets in $(\mathbb{R}^n, \ell_1)$ via hyperconvex embeddings

Authors: Qingsong Wang

Abstract: We consider the contractibility of Vietoris-Rips complexes of dense subsets of $(\mathbb{R}^n,\ell_1)$ with sufficiently large scales. This is motivated by a question by Matthew Zaremsky regarding whether for each $n$ natural there is a $r_n>0$ so that the Vietoris-Rips complex of $(\mathbb{Z}^n,\ell_1)$ at scale $r$ is contractible for all $r\geq r_n$. We approach this question using results that… ▽ More We consider the contractibility of Vietoris-Rips complexes of dense subsets of $(\mathbb{R}^n,\ell_1)$ with sufficiently large scales. This is motivated by a question by Matthew Zaremsky regarding whether for each $n$ natural there is a $r_n>0$ so that the Vietoris-Rips complex of $(\mathbb{Z}^n,\ell_1)$ at scale $r$ is contractible for all $r\geq r_n$. We approach this question using results that relates to the neighborhood of embeddings into hyperconvex metric space of a metric space $X$ and its connection to the Vietoris-Rips complex of $X$. In this manner, we provide positive answers to the question above for the case $n=2$ and $3$. △ Less

Submitted 12 June, 2024; originally announced June 2024.

MSC Class: 51F99; 55N31

arXiv:2406.08589 [pdf]

FireBench: A High-fidelity Ensemble Simulation Framework for Exploring Wildfire Behavior and Data-driven Modeling

Authors: Qing Wang, Matthias Ihme, Cenk Gazen, Yi-Fan Chen, John Anderson

Abstract: Background. Wildfire research uses ensemble methods to analyze fire behaviors and assess uncertainties. Nonetheless, current research methods are either confined to simple models or complex simulations with limits. Modern computing tools could allow for efficient, high-fidelity ensemble simulations. Aims. This study proposes a high-fidelity ensemble wildfire simulation framework for studying wildf… ▽ More Background. Wildfire research uses ensemble methods to analyze fire behaviors and assess uncertainties. Nonetheless, current research methods are either confined to simple models or complex simulations with limits. Modern computing tools could allow for efficient, high-fidelity ensemble simulations. Aims. This study proposes a high-fidelity ensemble wildfire simulation framework for studying wildfire behavior, ML tasks, fire-risk assessment, and uncertainty analysis. Methods. In this research, we present a simulation framework that integrates the Swirl-Fire large-eddy simulation tool for wildfire predictions with the Vizier optimization platform for automated run-time management of ensemble simulations and large-scale batch processing. All simulations are executed on tensor-processing units to enhance computational efficiency. Key results. A dataset of 117 simulations is created, each with 1.35 billion mesh points. The simulations are compared to existing experimental data and show good agreement in terms of fire rate of spread. Computations are done for fire acceleration, mean rate of spread, and fireline intensity. Conclusions. Strong coupling between these 2 parameters are observed for the fire spread and intermittency. A critical Froude number that delineates fires from plume-driven to convection-driven is identified and confirmed with literature observations. Implications. The ensemble simulation framework is efficient in facilitating parametric wildfire studies. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08234 [pdf, other]

MaIL: Improving Imitation Learning with Mamba

Authors: Xiaogang Jia, Qian Wang, Atalay Donat, Bowen Xing, Ge Li, Hongyi Zhou, Onur Celik, Denis Blessing, Rudolf Lioutikov, Gerhard Neumann

Abstract: This work introduces Mamba Imitation Learning (MaIL), a novel imitation learning (IL) architecture that offers a computationally efficient alternative to state-of-the-art (SoTA) Transformer policies. Transformer-based policies have achieved remarkable results due to their ability in handling human-recorded data with inherently non-Markovian behavior. However, their high performance comes with the… ▽ More This work introduces Mamba Imitation Learning (MaIL), a novel imitation learning (IL) architecture that offers a computationally efficient alternative to state-of-the-art (SoTA) Transformer policies. Transformer-based policies have achieved remarkable results due to their ability in handling human-recorded data with inherently non-Markovian behavior. However, their high performance comes with the drawback of large models that complicate effective training. While state space models (SSMs) have been known for their efficiency, they were not able to match the performance of Transformers. Mamba significantly improves the performance of SSMs and rivals against Transformers, positioning it as an appealing alternative for IL policies. MaIL leverages Mamba as a backbone and introduces a formalism that allows using Mamba in the encoder-decoder structure. This formalism makes it a versatile architecture that can be used as a standalone policy or as part of a more advanced architecture, such as a diffuser in the diffusion process. Extensive evaluations on the LIBERO IL benchmark and three real robot experiments show that MaIL: i) outperforms Transformers in all LIBERO tasks, ii) achieves good performance even with small datasets, iii) is able to effectively process multi-modal sensory inputs, iv) is more robust to input noise compared to Transformers. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08225 [pdf, ps, other]

Observation of $η_{c}$(1S, 2S) and $χ_{cJ}$ decays to 2$(π^{+}π^{-})η$ via $ψ$(3686) radiative transitions

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (636 additional authors not shown)

Abstract: Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measur… ▽ More Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measured in both destructive and constructive interference scenarios for the first time. The mass and width of the $η_{c}(1S)$ are measured to be $M=(2984.14 \pm 0.13 \pm 0.38)$ MeV/$c^{2}$ and $Γ=(28.82 \pm 0.11 \pm 0.82)$ MeV, respectively. Clear signals for the decays of the $χ_{cJ}(J=0,1,2)$ and the $η_{c}(2S)$ to $2(π^{+}π^{-})η$ are also observed for the first time, and the corresponding branching fractions are measured. The ratio of the branching fractions between the $η_{c}(2S)$ and $η_{c}(1S)$ decays is significantly lower than the theoretical prediction, which might suggest different dynamics in their decays. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08184 [pdf, other]

MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents

Authors: Luyuan Wang, Yongyu Deng, Yiwei Zha, Guodong Mao, Qinmin Wang, Tianchen Min, Wei Chen, Shoufa Chen

Abstract: Large language model (LLM)-based mobile agents are increasingly popular due to their capability to interact directly with mobile phone Graphic User Interfaces (GUIs) and their potential to autonomously manage daily tasks. Despite their promising prospects in both academic and industrial sectors, little research has focused on benchmarking the performance of existing mobile agents, due to the inexh… ▽ More Large language model (LLM)-based mobile agents are increasingly popular due to their capability to interact directly with mobile phone Graphic User Interfaces (GUIs) and their potential to autonomously manage daily tasks. Despite their promising prospects in both academic and industrial sectors, little research has focused on benchmarking the performance of existing mobile agents, due to the inexhaustible states of apps and the vague definition of feasible action sequences. To address this challenge, we propose an efficient and user-friendly benchmark, MobileAgentBench, designed to alleviate the burden of extensive manual testing. We initially define 100 tasks across 10 open-source apps, categorized by multiple levels of difficulty. Subsequently, we evaluate several existing mobile agents, including AppAgent and MobileAgent, to thoroughly and systematically compare their performance. All materials are accessible on our project webpage: https://MobileAgentBench.github.io, contributing to the advancement of both academic and industrial fields. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08101 [pdf, other]

CoXQL: A Dataset for Parsing Explanation Requests in Conversational XAI Systems

Authors: Qianli Wang, Tatiana Anikina, Nils Feldhus, Simon Ostermann, Sebastian Möller

Abstract: Conversational explainable artificial intelligence (ConvXAI) systems based on large language models (LLMs) have garnered significant interest from the research community in natural language processing (NLP) and human-computer interaction (HCI). Such systems can provide answers to user questions about explanations in dialogues, have the potential to enhance users' comprehension and offer more infor… ▽ More Conversational explainable artificial intelligence (ConvXAI) systems based on large language models (LLMs) have garnered significant interest from the research community in natural language processing (NLP) and human-computer interaction (HCI). Such systems can provide answers to user questions about explanations in dialogues, have the potential to enhance users' comprehension and offer more information about the decision-making and generation processes of LLMs. Currently available ConvXAI systems are based on intent recognition rather than free chat, as this has been found to be more precise and reliable in identifying users' intentions. However, the recognition of intents still presents a challenge in the case of ConvXAI, since little training data exist and the domain is highly specific, as there is a broad range of XAI methods to map requests onto. In order to bridge this gap, we present CoXQL, the first dataset for user intent recognition in ConvXAI, covering 31 intents, seven of which require filling multiple slots. Subsequently, we enhance an existing parsing approach by incorporating template validations, and conduct an evaluation of several LLMs on CoXQL using different parsing strategies. We conclude that the improved parsing approach (MP+) surpasses the performance of previous approaches. We also discover that intents with multiple slots remain highly challenging for LLMs. △ Less

Submitted 12 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: 4 pages, short paper

arXiv:2406.07003 [pdf, other]

GraphCoder: Enhancing Repository-Level Code Completion via Code Context Graph-based Retrieval and Language Model

Authors: Wei Liu, Ailun Yu, Daoguang Zan, Bo Shen, Wei Zhang, Haiyan Zhao, Zhi Jin, Qianxiang Wang

Abstract: The performance of repository-level code completion depends upon the effective leverage of both general and repository-specific knowledge. Despite the impressive capability of code LLMs in general code completion tasks, they often exhibit less satisfactory performance on repository-level completion due to the lack of repository-specific knowledge in these LLMs. To address this problem, we propose… ▽ More The performance of repository-level code completion depends upon the effective leverage of both general and repository-specific knowledge. Despite the impressive capability of code LLMs in general code completion tasks, they often exhibit less satisfactory performance on repository-level completion due to the lack of repository-specific knowledge in these LLMs. To address this problem, we propose GraphCoder, a retrieval-augmented code completion framework that leverages LLMs' general code knowledge and the repository-specific knowledge via a graph-based retrieval-generation process. In particular, GraphCoder captures the context of completion target more accurately through code context graph (CCG) that consists of control-flow, data- and control-dependence between code statements, a more structured way to capture the completion target context than the sequence-based context used in existing retrieval-augmented approaches; based on CCG, GraphCoder further employs a coarse-to-fine retrieval process to locate context-similar code snippets with the completion target from the current repository. Experimental results demonstrate both the effectiveness and efficiency of GraphCoder: Compared to baseline retrieval-augmented methods, GraphCoder achieves higher exact match (EM) on average, with increases of +6.06 in code match and +6.23 in identifier match, while using less time and space. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.06889 [pdf, other]

Universal spatial inflation of human mobility

Authors: Lu Zhong, Lei Dong, Qi Wang, Chaoming Song, Jianxi Gao

Abstract: Understanding the interplay between egocentric preference and urban structure in shaping human mobility has profound implications for improving epidemic intervention, social equity, and urban resilience. However, numerous existing studies either solely identify the egocentric preferences -- the anchoring effects from home -- or the impact of hierarchical urban structures. Here, we propose a networ… ▽ More Understanding the interplay between egocentric preference and urban structure in shaping human mobility has profound implications for improving epidemic intervention, social equity, and urban resilience. However, numerous existing studies either solely identify the egocentric preferences -- the anchoring effects from home -- or the impact of hierarchical urban structures. Here, we propose a network-based approach to present human mobility in both spatial and topological aspects within the urban system, using cell phone trajectory data from millions of users across three countries. By segmenting mobility trajectories into modules and examining their overlap with urban scales, we have observed the inflation law that the geospatial extent of these modules increases sub-linearly with their distance from home. Moreover, the egocentric preference for higher urban levels leads to this increase. This universal finding indicates that home-based preferences distort the hierarchical scales of human mobility in the urban environment, regardless of demographics or geography. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: 15 pages, 6 figures

arXiv:2406.06839 [pdf, other]

EAVE: Efficient Product Attribute Value Extraction via Lightweight Sparse-layer Interaction

Authors: Li Yang, Qifan Wang, Jianfeng Chi, Jiahao Liu, Jingang Wang, Fuli Feng, Zenglin Xu, Yi Fang, Lifu Huang, Dongfang Liu

Abstract: Product attribute value extraction involves identifying the specific values associated with various attributes from a product profile. While existing methods often prioritize the development of effective models to improve extraction performance, there has been limited emphasis on extraction efficiency. However, in real-world scenarios, products are typically associated with multiple attributes, ne… ▽ More Product attribute value extraction involves identifying the specific values associated with various attributes from a product profile. While existing methods often prioritize the development of effective models to improve extraction performance, there has been limited emphasis on extraction efficiency. However, in real-world scenarios, products are typically associated with multiple attributes, necessitating multiple extractions to obtain all corresponding values. In this work, we propose an Efficient product Attribute Value Extraction (EAVE) approach via lightweight sparse-layer interaction. Specifically, we employ a heavy encoder to separately encode the product context and attribute. The resulting non-interacting heavy representations of the context can be cached and reused for all attributes. Additionally, we introduce a light encoder to jointly encode the context and the attribute, facilitating lightweight interactions between them. To enrich the interaction within the lightweight encoder, we design a sparse-layer interaction module to fuse the non-interacting heavy representation into the lightweight encoder. Comprehensive evaluation on two benchmarks demonstrate that our method achieves significant efficiency gains with neutral or marginal loss in performance when the context is long and number of attributes is large. Our code is available \href{https://anonymous.4open.science/r/EAVE-EA18}{here}. △ Less

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.06571 [pdf, other]

SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM

Authors: Quandong Wang, Yuxuan Yuan, Xiaoyu Yang, Ruike Zhang, Kang Zhao, Wei Liu, Jian Luan, Daniel Povey, Bin Wang

Abstract: While Large Language Models (LLMs) have achieved remarkable success in various fields, the efficiency of training and inference remains a major challenge. To address this issue, we propose SUBLLM, short for Subsampling-Upsampling-Bypass Large Language Model, an innovative architecture that extends the core decoder-only framework by incorporating subsampling, upsampling, and bypass modules. The sub… ▽ More While Large Language Models (LLMs) have achieved remarkable success in various fields, the efficiency of training and inference remains a major challenge. To address this issue, we propose SUBLLM, short for Subsampling-Upsampling-Bypass Large Language Model, an innovative architecture that extends the core decoder-only framework by incorporating subsampling, upsampling, and bypass modules. The subsampling modules are responsible for shortening the sequence, while the upsampling modules restore the sequence length, and the bypass modules enhance convergence. In comparison to LLaMA, the proposed SUBLLM exhibits significant enhancements in both training and inference speeds as well as memory usage, while maintaining competitive few-shot performance. During training, SUBLLM increases speeds by 26% and cuts memory by 10GB per GPU. In inference, it boosts speeds by up to 37% and reduces memory by 1GB per GPU. The training and inference speeds can be enhanced by 34% and 52% respectively when the context window is expanded to 8192. We shall release the source code of the proposed architecture in the published version. △ Less

Submitted 17 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: 9 pages, 3 figures, submitted to ECAI 2024

ACM Class: I.2.7

arXiv:2406.06410 [pdf, other]

Spatial dependence of local density of states in semiconductor-superconductor hybrids

Authors: Qingzhen Wang, Yining Zhang, Saurabh Karwal, Srijit Goswami

Abstract: Majorana bound states are expected to appear in one-dimensional semiconductor-superconductor hybrid systems, provided they are homogenous enough to host a global topological phase. In order to experimentally investigate the uniformity of the system, we study the spatial dependence of the local density of states in multiprobe devices where several local tunnelling probes are positioned along a gate… ▽ More Majorana bound states are expected to appear in one-dimensional semiconductor-superconductor hybrid systems, provided they are homogenous enough to host a global topological phase. In order to experimentally investigate the uniformity of the system, we study the spatial dependence of the local density of states in multiprobe devices where several local tunnelling probes are positioned along a gate-defined wire in a two-dimensional electron gas. Spectroscopy at each probe reveals a hard induced gap, and an absence of subgap states at zero magnetic field. However, subgap states emerging at finite magnetic field are not always correlated between different probes. Moreover, we find that the extracted critical field and effective $g$-factor of the lowest energy subgap state varies significantly across the length of the wire. Upon studying several such devices we do however find examples of striking correlations in the local density of states measured at different tunnel probes. We discuss possible sources of variations across devices. △ Less

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.06118 [pdf, other]

Strong and weak $CP$ tests in sequential decays of polarized $Σ^0$ hyperons

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (644 additional authors not shown)

Abstract: The $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ processes and subsequent decays are studied using the world's largest $J/ψ$ and $ψ(3686)$ data samples collected with the BESIII detector. The strong-$CP$ symmetry is tested in the decays of the $Σ^0$ hyperons for the first time by measuring the decay parameters, $α_{Σ^0} = -0.0017 \pm 0.0021 \pm 0.0018$ and $\barα_{Σ^0} = 0.0021 \pm 0.0020 \pm 0.0022$. The wea… ▽ More The $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ processes and subsequent decays are studied using the world's largest $J/ψ$ and $ψ(3686)$ data samples collected with the BESIII detector. The strong-$CP$ symmetry is tested in the decays of the $Σ^0$ hyperons for the first time by measuring the decay parameters, $α_{Σ^0} = -0.0017 \pm 0.0021 \pm 0.0018$ and $\barα_{Σ^0} = 0.0021 \pm 0.0020 \pm 0.0022$. The weak-$CP$ test is performed in the subsequent decays of their daughter particles $Λ$ and $\barΛ$. Also for the first time, the transverse polarizations of the $Σ^0$ hyperons in $J/ψ$ and $ψ(3686)$ decays are observed with opposite directions, and the ratios between the S-wave and D-wave contributions of the $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ decays are obtained. These results are crucial to understand the decay dynamics of the charmonium states and the production mechanism of the $Σ^0-\barΣ^0$ pairs. △ Less

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.05974 [pdf, other]

Inter-slice Super-resolution of Magnetic Resonance Images by Pre-training and Self-supervised Fine-tuning

Authors: Xin Wang, Zhiyun Song, Yitao Zhu, Sheng Wang, Lichi Zhang, Dinggang Shen, Qian Wang

Abstract: In clinical practice, 2D magnetic resonance (MR) sequences are widely adopted. While individual 2D slices can be stacked to form a 3D volume, the relatively large slice spacing can pose challenges for both image visualization and subsequent analysis tasks, which often require isotropic voxel spacing. To reduce slice spacing, deep-learning-based super-resolution techniques are widely investigated.… ▽ More In clinical practice, 2D magnetic resonance (MR) sequences are widely adopted. While individual 2D slices can be stacked to form a 3D volume, the relatively large slice spacing can pose challenges for both image visualization and subsequent analysis tasks, which often require isotropic voxel spacing. To reduce slice spacing, deep-learning-based super-resolution techniques are widely investigated. However, most current solutions require a substantial number of paired high-resolution and low-resolution images for supervised training, which are typically unavailable in real-world scenarios. In this work, we propose a self-supervised super-resolution framework for inter-slice super-resolution of MR images. Our framework is first featured by pre-training on video dataset, as temporal correlation of videos is found beneficial for modeling the spatial relation among MR slices. Then, we use public high-quality MR dataset to fine-tune our pre-trained model, for enhancing awareness of our model to medical data. Finally, given a target dataset at hand, we utilize self-supervised fine-tuning to further ensure our model works well with user-specific super-resolution tasks. The proposed method demonstrates superior performance compared to other self-supervised methods and also holds the potential to benefit various downstream applications. △ Less

Submitted 9 June, 2024; originally announced June 2024.

Comments: ISBI 2024

arXiv:2406.05857 [pdf, other]

doi 10.1109/TPAMI.2024.3412632

Self-supervised Adversarial Training of Monocular Depth Estimation against Physical-World Attacks

Authors: Zhiyuan Cheng, Cheng Han, James Liang, Qifan Wang, Xiangyu Zhang, Dongfang Liu

Abstract: Monocular Depth Estimation (MDE) plays a vital role in applications such as autonomous driving. However, various attacks target MDE models, with physical attacks posing significant threats to system security. Traditional adversarial training methods, which require ground-truth labels, are not directly applicable to MDE models that lack ground-truth depth. Some self-supervised model hardening techn… ▽ More Monocular Depth Estimation (MDE) plays a vital role in applications such as autonomous driving. However, various attacks target MDE models, with physical attacks posing significant threats to system security. Traditional adversarial training methods, which require ground-truth labels, are not directly applicable to MDE models that lack ground-truth depth. Some self-supervised model hardening techniques (e.g., contrastive learning) overlook the domain knowledge of MDE, resulting in suboptimal performance. In this work, we introduce a novel self-supervised adversarial training approach for MDE models, leveraging view synthesis without the need for ground-truth depth. We enhance adversarial robustness against real-world attacks by incorporating L_0-norm-bounded perturbation during training. We evaluate our method against supervised learning-based and contrastive learning-based approaches specifically designed for MDE. Our experiments with two representative MDE networks demonstrate improved robustness against various adversarial attacks, with minimal impact on benign performance. △ Less

Submitted 9 June, 2024; originally announced June 2024.

Comments: Accepted in TPAMI'24. Extended from our ICLR'23 publication (arXiv:2301.13487). arXiv admin note: substantial text overlap with arXiv:2301.13487

arXiv:2406.05827 [pdf, ps, other]

Measurement of the integrated luminosity of the data collected at 3.773 GeV by BESIII from 2021 to 2024

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (634 additional authors not shown)

Abstract: We present a measurement of the integrated luminosity of $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at a center-of-mass energy of $E_{\rm cm} = 3.773$~GeV. The integrated luminosities of the data sets taken from December 2021 to June 2022, from November 2022 to June 2023, and from October 2023 to February 2024 are determined to be $4.995 \pm 0.019$~fb$^{-1}$,… ▽ More We present a measurement of the integrated luminosity of $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at a center-of-mass energy of $E_{\rm cm} = 3.773$~GeV. The integrated luminosities of the data sets taken from December 2021 to June 2022, from November 2022 to June 2023, and from October 2023 to February 2024 are determined to be $4.995 \pm 0.019$~fb$^{-1}$, $8.157 \pm 0.031$~fb$^{-1}$, and $4.191 \pm 0.016$~fb$^{-1}$, respectively, by analyzing large angle Bhabha scattering events. The uncertainties are dominated by systematic effects and the statistical uncertainties are negligible. Our results provide essential input for future analyses and precision measurements. △ Less

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.05810 [pdf, other]

ControlLoc: Physical-World Hijacking Attack on Visual Perception in Autonomous Driving

Authors: Chen Ma, Ningfei Wang, Zhengyu Zhao, Qian Wang, Qi Alfred Chen, Chao Shen

Abstract: Recent research in adversarial machine learning has focused on visual perception in Autonomous Driving (AD) and has shown that printed adversarial patches can attack object detectors. However, it is important to note that AD visual perception encompasses more than just object detection; it also includes Multiple Object Tracking (MOT). MOT enhances the robustness by compensating for object detectio… ▽ More Recent research in adversarial machine learning has focused on visual perception in Autonomous Driving (AD) and has shown that printed adversarial patches can attack object detectors. However, it is important to note that AD visual perception encompasses more than just object detection; it also includes Multiple Object Tracking (MOT). MOT enhances the robustness by compensating for object detection errors and requiring consistent object detection results across multiple frames before influencing tracking results and driving decisions. Thus, MOT makes attacks on object detection alone less effective. To attack such robust AD visual perception, a digital hijacking attack has been proposed to cause dangerous driving scenarios. However, this attack has limited effectiveness. In this paper, we introduce a novel physical-world adversarial patch attack, ControlLoc, designed to exploit hijacking vulnerabilities in entire AD visual perception. ControlLoc utilizes a two-stage process: initially identifying the optimal location for the adversarial patch, and subsequently generating the patch that can modify the perceived location and shape of objects with the optimal location. Extensive evaluations demonstrate the superior performance of ControlLoc, achieving an impressive average attack success rate of around 98.1% across various AD visual perceptions and datasets, which is four times greater effectiveness than the existing hijacking attack. The effectiveness of ControlLoc is further validated in physical-world conditions, including real vehicle tests under different conditions such as outdoor light conditions with an average attack success rate of 77.5%. AD system-level impact assessments are also included, such as vehicle collision, using industry-grade AD systems and production-grade AD simulators with an average vehicle collision rate and unnecessary emergency stop rate of 81.3%. △ Less

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.05438 [pdf, other]

A Roadmap for Software Testing in Open Collaborative Development Environments

Authors: Qing Wang, Junjie Wang, Mingyang Li, Yawen Wang, Zhe Liu

Abstract: Amidst the ever-expanding digital sphere, the evolution of the Internet has not only fostered an atmosphere of information transparency and sharing but has also sparked a revolution in software development practices. The distributed nature of open collaborative development, along with its diverse contributors and rapid iterations, presents new challenges for ensuring software quality. This paper o… ▽ More Amidst the ever-expanding digital sphere, the evolution of the Internet has not only fostered an atmosphere of information transparency and sharing but has also sparked a revolution in software development practices. The distributed nature of open collaborative development, along with its diverse contributors and rapid iterations, presents new challenges for ensuring software quality. This paper offers a comprehensive review and analysis of recent advancements in software quality assurance within open collaborative development environments. Our examination covers various aspects, including process management, personnel dynamics, and technological advancements, providing valuable insights into effective approaches for maintaining software quality in such collaborative settings. Furthermore, we delve into the challenges and opportunities arising from emerging technologies such as LLMs and the AI model-centric development paradigm. By addressing these topics, our study contributes to a deeper understanding of software quality assurance in open collaborative environments and lays the groundwork for future exploration and innovation. △ Less

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.05379 [pdf]

Tunable huge asymmetric transmission in mid-infrared range through a monolayer graphene chiral metasurface

Authors: Qi Wang, Haotian Hao, Wei Xin

Abstract: Metasurface has witnessed an urgent need for reconfigurable chiral manipulation recently, while the graphene-based devices attracted significant attention from researchers due to their inherent tunable properties. Here, an L-shaped metasurface composed of three hollow monolayer graphene resonant rings is proposed. A huge dual-frequency asymmetric transmission (AT) in the mid-infrared band (7800 nm… ▽ More Metasurface has witnessed an urgent need for reconfigurable chiral manipulation recently, while the graphene-based devices attracted significant attention from researchers due to their inherent tunable properties. Here, an L-shaped metasurface composed of three hollow monolayer graphene resonant rings is proposed. A huge dual-frequency asymmetric transmission (AT) in the mid-infrared band (7800 nm-9000 nm) with transmission efficiency peaks of 0.14 and 0.26 is revealed, which is optimal of its similar monolayer counterparts in the current band. By changing the Fermi level of graphene and the relaxation time of its carriers, the AT spectrum including the positions and widths of the peaks can be effectively modulated. More importantly, the structure designed here gives priority to the machinability of the metasurface configuration, ensuring its practicality and promising applications in chiral applications in the future. △ Less

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.05089 [pdf, other]

Discovery of An Apparent Red, High-Velocity Type Ia Supernova at z = 2.9 with JWST

Authors: J. D. R. Pierel, M. Engesser, D. A. Coulter, C. Decoursey, M. R. Siebert, A. Rest, E. Egami, W. Chen, O. D. Fox, D. O. Jones, B. A. Joshi, T. J. Moriya, Y. Zenati, A. J. Bunker, P. A. Cargile, M. Curti, D. J. Eisenstein, S. Gezari, S. Gomez, M. Guolo, B. D. Johnson, M. Karmen, R. Maiolino, Robert M. Quimby, B. Robertson , et al. (5 additional authors not shown)

Abstract: We present the JWST discovery of SN 2023adsy, a transient object located in a host galaxy JADES-GS$+53.13485$$-$$27.82088$ with a host spectroscopic redshift of $2.903\pm0.007$. The transient was identified in deep James Webb Space Telescope (JWST)/NIRCam imaging from the JWST Advanced Deep Extragalactic Survey (JADES) program. Photometric and spectroscopic followup with NIRCam and NIRSpec, respec… ▽ More We present the JWST discovery of SN 2023adsy, a transient object located in a host galaxy JADES-GS$+53.13485$$-$$27.82088$ with a host spectroscopic redshift of $2.903\pm0.007$. The transient was identified in deep James Webb Space Telescope (JWST)/NIRCam imaging from the JWST Advanced Deep Extragalactic Survey (JADES) program. Photometric and spectroscopic followup with NIRCam and NIRSpec, respectively, confirm the redshift and yield UV-NIR light-curve, NIR color, and spectroscopic information all consistent with a Type Ia classification. Despite its classification as a likely SN Ia, SN 2023adsy is both fairly red (E(B-V)$\sim0.9$) despite a host galaxy with low-extinction and has a high Ca II velocity ($19,000\pm2,000$km/s) compared to the general population of SNe Ia. While these characteristics are consistent with some Ca-rich SNe Ia, particularly SN 2016hnk, SN 2023adsy is intrinsically brighter than the low-z Ca-rich population. Although such an object is too red for any low-z cosmological sample, we apply a fiducial standardization approach to SN 2023adsy and find that the SN 2023adsy luminosity distance measurement is in excellent agreement ($\lesssim1σ$) with $Λ$CDM. Therefore unlike low-z Ca-rich SNe Ia, SN 2023adsy is standardizable and gives no indication that SN Ia standardized luminosities change significantly with redshift. A larger sample of distant SNe Ia is required to determine if SN Ia population characteristics at high-z truly diverge from their low-z counterparts, and to confirm that standardized luminosities nevertheless remain constant with redshift. △ Less

Submitted 10 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

Comments: Submitted to ApJL

Showing 51–100 of 5,068 results for author: Wang, Q