-
Lifelong Histopathology Whole Slide Image Retrieval via Distance Consistency Rehearsal
Authors:
Xinyu Zhu,
Zhiguo Jiang,
Kun Wu,
Jun Shi,
Yushan Zheng
Abstract:
Content-based histopathological image retrieval (CBHIR) has gained attention in recent years, offering the capability to return histopathology images that are content-wise similar to the query one from an established database. However, in clinical practice, the continuously expanding size of whole slide image (WSI) databases limits the practical application of the current CBHIR methods. In this paper, we propose a Lifelong Whole Slide Retrieval (LWSR) framework to address the challenge of catastrophic forgetting caused by progressive model updating on a continuously growing retrieval database. Our framework aims to achieve a balance between stability and plasticity during continuous learning. To preserve system plasticity, we utilize a local memory bank with a reservoir sampling method to save instances, which can comprehensively encompass the feature spaces of both old and new tasks. Furthermore, a distance consistency rehearsal (DCR) module is designed to ensure the retrieval queue's consistency for previous tasks, which is regarded as stability within a lifelong CBHIR system. We evaluated the proposed method on four public WSI datasets from TCGA projects. The experimental results demonstrate that the proposed method is effective and superior to the state-of-the-art methods.
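The abstract above names reservoir sampling as the way instances are saved into the memory bank but does not give details. As a rough illustration only, the following is a minimal sketch of textbook reservoir sampling (Algorithm R), assuming a fixed-capacity bank filled from a stream of instances. The function names `reservoir_update` and `fill_memory` and all parameters are hypothetical and are not taken from the paper.

```python
import random


def reservoir_update(memory, item, n_seen, capacity, rng=random):
    """Offer `item`, the n_seen-th stream element (1-indexed), to the bank.

    Hypothetical helper: standard Algorithm R, not the paper's code.
    """
    if len(memory) < capacity:
        # Bank not full yet: always keep the new instance.
        memory.append(item)
    else:
        # Pick a slot uniformly in [0, n_seen); replace only if it falls
        # inside the bank, so each item is kept with prob. capacity / n_seen.
        j = rng.randrange(n_seen)
        if j < capacity:
            memory[j] = item
    return memory


def fill_memory(stream, capacity, seed=0):
    """Run reservoir sampling over an iterable and return the final bank."""
    rng = random.Random(seed)
    memory = []
    for n_seen, item in enumerate(stream, start=1):
        reservoir_update(memory, item, n_seen, capacity, rng)
    return memory
```

Under this scheme every instance seen so far has the same probability of being in the bank, regardless of which task it arrived with, which is one plausible reading of how a bank could cover both old and new tasks.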
Submitted 12 July, 2024; v1 submitted 10 July, 2024;
originally announced July 2024.
-
Bayesian Detector Combination for Object Detection with Crowdsourced Annotations
Authors:
Zhi Qin Tan,
Olga Isupova,
Gustavo Carneiro,
Xiatian Zhu,
Yunpeng Li
Abstract:
Acquiring fine-grained object detection annotations in unconstrained images is time-consuming, expensive, and prone to noise, especially in crowdsourcing scenarios. Most prior object detection methods assume accurate annotations; a few recent works have studied object detection with noisy crowdsourced annotations, with evaluation on distinct synthetic crowdsourced datasets of varying setups under artificial assumptions. To address these algorithmic limitations and evaluation inconsistency, we first propose a novel Bayesian Detector Combination (BDC) framework to more effectively train object detectors with noisy crowdsourced annotations, with the unique ability of automatically inferring the annotators' label qualities. Unlike previous approaches, BDC is model-agnostic, requires no prior knowledge of the annotators' skill level, and seamlessly integrates with existing object detection models. Due to the scarcity of real-world crowdsourced datasets, we introduce large synthetic datasets by simulating varying crowdsourcing scenarios. This allows consistent evaluation of different models at scale. Extensive experiments on both real and synthetic crowdsourced datasets show that BDC outperforms existing state-of-the-art methods, demonstrating its superiority in leveraging crowdsourced data for object detection. Our code and data are available at https://github.com/zhiqin1998/bdc.
Submitted 10 July, 2024;
originally announced July 2024.
-
Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$
Authors:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (645 additional authors not shown)
Abstract:
The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15\sigma$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes.
Submitted 10 July, 2024;
originally announced July 2024.
-
GLBench: A Comprehensive Benchmark for Graph with Large Language Models
Authors:
Yuhan Li,
Peisong Wang,
Xiao Zhu,
Aochuan Chen,
Haiyun Jiang,
Deng Cai,
Victor Wai Kin Chan,
Jia Li
Abstract:
The emergence of large language models (LLMs) has revolutionized the way we interact with graphs, leading to a new paradigm called GraphLLM. Despite the rapid development of GraphLLM methods in recent years, the progress and understanding of this field remain unclear due to the lack of a benchmark with consistent experimental protocols. To bridge this gap, we introduce GLBench, the first comprehensive benchmark for evaluating GraphLLM methods in both supervised and zero-shot scenarios. GLBench provides a fair and thorough evaluation of different categories of GraphLLM methods, along with traditional baselines such as graph neural networks. Through extensive experiments on a collection of real-world datasets with consistent data processing and splitting strategies, we have uncovered several key findings. Firstly, GraphLLM methods outperform traditional baselines in supervised settings, with LLM-as-enhancers showing the most robust performance. However, using LLMs as predictors is less effective and often leads to uncontrollable output issues. We also notice that no clear scaling laws exist for current GraphLLM methods. In addition, both structures and semantics are crucial for effective zero-shot transfer, and our proposed simple baseline can even outperform several models tailored for zero-shot scenarios. The data and code of the benchmark can be found at https://github.com/NineAbyss/GLBench.
Submitted 11 July, 2024; v1 submitted 10 July, 2024;
originally announced July 2024.
-
Alternating Subspace Approximate Message Passing
Authors:
Xu Zhu,
Yufei Ma,
Xiaoguang Li,
Tiejun Li
Abstract:
Numerous renowned algorithms for tackling the compressed sensing problem employ an alternating strategy, which typically involves data matching in one module and denoising in another. Based on an in-depth analysis of the connection between the message passing and operator splitting, we present a novel approach, the Alternating Subspace Method (ASM), which intuitively combines the principles of the greedy methods (e.g., the orthogonal matching pursuit type methods) and the splitting methods (e.g., the approximate message passing type methods). Essentially, ASM modifies the splitting method by achieving fidelity in a subspace-restricted fashion. We reveal that such confining strategy still yields a consistent fixed point iteration and establish its local geometric convergence on the lasso problem. Numerical experiments on both the lasso and channel estimation problems demonstrate its high convergence rate and its capacity to incorporate different prior distributions. Further theoretical analysis also demonstrates the advantage of the motivated message-passing splitting by incorporating quasi-variance degree of freedom even for the classical lasso optimization problem. Overall, the proposed method is promising in efficiency, accuracy and flexibility, which has the potential to be competitive in different sparse recovery applications.
Submitted 10 July, 2024;
originally announced July 2024.
-
Uniaxial plasmon polaritons $\textit{via}$ charge transfer at the graphene/CrSBr interface
Authors:
Daniel J. Rizzo,
Eric Seewald,
Fangzhou Zhao,
Jordan Cox,
Kaichen Xie,
Rocco A. Vitalone,
Francesco L. Ruta,
Daniel G. Chica,
Yinming Shao,
Sara Shabani,
Evan J. Telford,
Matthew C. Strasbourg,
Thomas P. Darlington,
Suheng Xu,
Siyuan Qiu,
Aravind Devarakonda,
Takashi Taniguchi,
Kenji Watanabe,
Xiaoyang Zhu,
P. James Schuck,
Cory R. Dean,
Xavier Roy,
Andrew J. Millis,
Ting Cao,
Angel Rubio
, et al. (2 additional authors not shown)
Abstract:
Graphene is a privileged 2D platform for hosting confined light-matter excitations known as surface plasmon-polaritons (SPPs), as it possesses low intrinsic losses with a high degree of optical confinement. However, the inherently isotropic optical properties of graphene limit its ability to guide and focus SPPs, making it less suitable than anisotropic elliptical and hyperbolic materials as a platform for polaritonic lensing and canalization. Here, we present the graphene/CrSBr heterostructure as an engineered 2D interface that hosts highly anisotropic SPP propagation over a wide range of frequencies in the mid-infrared and terahertz. Using a combination of scanning tunneling microscopy (STM), scattering-type scanning near-field optical microscopy (s-SNOM), and first-principles calculations, we demonstrate mutual doping in excess of 10$^{13}$ cm$^{-2}$ holes/electrons between the interfacial layers of graphene/CrSBr heterostructures. SPPs in graphene activated by charge transfer interact with charge-induced anisotropic intra- and interband transitions in the interfacial doped CrSBr, leading to preferential SPP propagation along the quasi-1D chains that compose each CrSBr layer. This multifaceted proximity effect both creates SPPs and endows them with anisotropic transport and propagation lengths that differ by an order-of-magnitude between the two in-plane crystallographic axes of CrSBr.
Submitted 9 July, 2024;
originally announced July 2024.
-
HTD-Mamba: Efficient Hyperspectral Target Detection with Pyramid State Space Model
Authors:
Dunbin Shen,
Xuanbing Zhu,
Jiacheng Tian,
Jianjun Liu,
Zhenrong Du,
Hongyu Wang,
Xiaorui Ma
Abstract:
Hyperspectral target detection (HTD) identifies objects of interest from complex backgrounds at the pixel level, playing a vital role in Earth observation. However, HTD faces challenges due to limited prior knowledge and spectral variations, leading to underfitting models and unreliable performance. To address these challenges, this paper proposes an efficient self-supervised HTD method with a pyramid state space model (SSM), named HTD-Mamba, which employs spectrally contrastive learning to distinguish between target and background based on the similarity measurement of intrinsic features. Specifically, to obtain sufficient training samples and leverage spatial contextual information, we propose a spatial-encoded spectral augmentation technique that encodes all surrounding pixels within a patch into a transformed view of the central pixel. Additionally, to explore global band correlations, we divide pixels into continuous group-wise spectral embeddings and introduce Mamba to HTD for the first time to model long-range dependencies of the spectral sequence with linear complexity. Furthermore, to alleviate spectral variation and enhance robust representation, we propose a pyramid SSM as a backbone to capture and fuse multiresolution spectral-wise intrinsic features. Extensive experiments conducted on four public datasets demonstrate that the proposed method outperforms state-of-the-art methods in both quantitative and qualitative evaluations. Code is available at https://github.com/shendb2022/HTD-Mamba.
Submitted 9 July, 2024;
originally announced July 2024.
-
Enhancing Robustness and Security in ISAC Network Design: Leveraging Transmissive Reconfigurable Intelligent Surface with RSMA
Authors:
Ziwei Liu,
Wen Chen,
Qingqing Wu,
Zhendong Li,
Xusheng Zhu,
Qiong Wu,
Nan Cheng
Abstract:
In this paper, we propose a novel transmissive reconfigurable intelligent surface transceiver-enhanced robust and secure integrated sensing and communication network. A time-division sensing communication mechanism is designed for the scenario, which enables communication and sensing to share wireless resources. To address the interference management problem and hinder eavesdropping, we implement rate-splitting multiple access (RSMA), where the common stream is designed as both a useful signal and an artificial noise, while taking into account the imperfect channel state information, modeling the channel for the illegal users in a fine-grained manner, and giving an upper bound on the error. We introduce the secrecy outage probability and construct an optimization problem with the secrecy sum-rate as the objective function to optimize the common stream beamforming matrix, the private stream beamforming matrix and the timeslot duration variable. Due to the coupling of the optimization variables and the infinity of the error set, the proposed problem is a nonconvex optimization problem that cannot be solved directly. To address these challenges, a block coordinate descent-based second-order cone programming algorithm is used to decouple the optimization variables and solve the problem. Specifically, the problem is decoupled into two subproblems concerning the common stream beamforming matrix, the private stream beamforming matrix, and the timeslot duration variable, which are solved by alternating optimization until convergence is reached. Within this procedure, the S-procedure, Bernstein's inequality and successive convex approximation are employed to deal with the objective function and non-convex constraints. Numerical simulation results verify the superiority of the proposed scheme in improving the secrecy energy efficiency and the Cramér-Rao bound.
Submitted 9 July, 2024;
originally announced July 2024.
-
TVR-Ranking: A Dataset for Ranked Video Moment Retrieval with Imprecise Queries
Authors:
Renjie Liang,
Li Li,
Chongzhi Zhang,
Jing Wang,
Xizhou Zhu,
Aixin Sun
Abstract:
In this paper, we propose the task of Ranked Video Moment Retrieval (RVMR) to locate a ranked list of matching moments from a collection of videos, through queries in natural language. Although a few related tasks have been proposed and studied by CV, NLP, and IR communities, RVMR is the task that best reflects the practical setting of moment search. To facilitate research in RVMR, we develop the TVR-Ranking dataset, based on the raw videos and existing moment annotations provided in the TVR dataset. Our key contribution is the manual annotation of relevance levels for 94,442 query-moment pairs. We then develop the $NDCG@K, IoU \geq \mu$ evaluation metric for this new task and conduct experiments to evaluate three baseline models. Our experiments show that the new RVMR task brings new challenges to existing models and we believe this new dataset contributes to the research on multi-modality search. The dataset is available at https://github.com/Ranking-VMR/TVR-Ranking.
Submitted 9 July, 2024;
originally announced July 2024.
-
Quantum Noise Spectroscopy of Critical Slowing Down in an Atomically Thin Magnet
Authors:
Mark E. Ziffer,
Francisco Machado,
Benedikt Ursprung,
Artur Lozovoi,
Aya Batoul Tazi,
Zhiyang Yuan,
Michael E. Ziebel,
Tom Delord,
Nanyu Zeng,
Evan Telford,
Daniel G. Chica,
Dane W. deQuilettes,
Xiaoyang Zhu,
James C. Hone,
Kenneth L. Shepard,
Xavier Roy,
Nathalie P. de Leon,
Emily J. Davis,
Shubhayu Chatterjee,
Carlos A. Meriles,
Jonathan S. Owen,
P. James Schuck,
Abhay N. Pasupathy
Abstract:
Low frequency critical fluctuations in magnetic materials encode important information about the physics of magnetic ordering, especially in the associated critical exponents. While a number of techniques have been established to study magnetic critical fluctuations in bulk materials, few approaches maintain the required microscopic resolution, temporal range, and signal sensitivity to quantitatively analyze critical fluctuations in magnetic phases of 2D materials. Using nitrogen-vacancy (NV) centers in diamond as quantum probes, we implement $T_2$ (spin decoherence) noise magnetometry to quantitatively study critical dynamics in a tri-layer sample of the Van der Waals magnetic material CrSBr. We characterize critical fluctuations across the magnetic phase transition in CrSBr by analyzing the NV spin echo coherence decay on time scales that approach the characteristic fluctuation correlation time $τ_c$ at criticality, allowing us to study the temperature dependence of critical slowing down. By modelling the spin echo decoherence using theoretical models for critical dynamics, we are able to extract the critical exponent $ν$ for the correlation length. We find a value for $ν$ which deviates from the Ising prediction and suggests the role of long-range dipolar interactions in modifying the critical behavior of magnetic fluctuation modes in CrSBr at the 2D limit. We further compare the divergence of correlation length in CrSBr to the predicted exponential divergence for 2D XY criticality, and find evidence suggesting the possibility of such behavior in a temperature window near $T_C$ where static magnetic domains are absent. Our work provides a first demonstration of the capability of decoherence based NV noise magnetometry to quantitatively analyze critical scaling laws in 2D materials.
Submitted 8 July, 2024;
originally announced July 2024.
-
Channel Characterization of IRS-assisted Resonant Beam Communication Systems
Authors:
Wen Fang,
Wen Chen,
Qingqing Wu,
Xusheng Zhu,
Qiong Wu,
Nan Cheng
Abstract:
To meet the growing demand for data traffic, spectrum-rich optical wireless communication (OWC) has emerged as a key technological driver for the development of 6G. The resonant beam communication (RBC) system, which employs spatially separated laser cavities as the transmitter and receiver, is a high-speed OWC technology capable of self-alignment without tracking. However, its transmission through the air is susceptible to losses caused by obstructions. In this paper, we propose an intelligent reflecting surface (IRS) assisted RBC system with the optical frequency doubling method, where the resonant beam in frequency-fundamental and frequency-doubled is transmitted through both direct line-of-sight (LoS) and IRS-assisted channels to maintain steady-state oscillation and enable communication without echo-interference, respectively. Then, we establish the channel model based on Fresnel diffraction theory under the near-field optical propagation to analyze the transmission loss and frequency-doubled power analytically. Furthermore, communication power can be maximized by dynamically controlling the beam-splitting ratio between the two channels according to the loss levels encountered over air. Numerical results validate that the IRS-assisted channel can compensate for the losses in the obstructed LoS channel and misaligned receivers, ensuring that communication performance reaches an optimal value with dynamic ratio adjustments.
Submitted 7 July, 2024;
originally announced July 2024.
-
SCSA: Exploring the Synergistic Effects Between Spatial and Channel Attention
Authors:
Yunzhong Si,
Huiying Xu,
Xinzhong Zhu,
Wenhao Zhang,
Yao Dong,
Yuxing Chen,
Hongbo Li
Abstract:
Channel and spatial attentions have respectively brought significant improvements in extracting feature dependencies and spatial structure relations for various downstream vision tasks. While their combination is more beneficial for leveraging their individual strengths, the synergy between channel and spatial attentions has not been fully explored, leaving the synergistic potential of multi-semantic information for feature guidance and the mitigation of semantic disparities underused. Our study attempts to reveal the synergistic relationship between spatial and channel attention at multiple semantic levels, proposing a novel Spatial and Channel Synergistic Attention module (SCSA). Our SCSA consists of two parts: the Shareable Multi-Semantic Spatial Attention (SMSA) and the Progressive Channel-wise Self-Attention (PCSA). SMSA integrates multi-semantic information and utilizes a progressive compression strategy to inject discriminative spatial priors into PCSA's channel self-attention, effectively guiding channel recalibration. Additionally, the robust feature interactions based on the self-attention mechanism in PCSA further mitigate the disparities in multi-semantic information among different sub-features within SMSA. We conduct extensive experiments on seven benchmark datasets, including classification on ImageNet-1K, object detection on MSCOCO 2017, segmentation on ADE20K, and four other complex scene detection datasets. Our results demonstrate that our proposed SCSA not only surpasses the current state-of-the-art attention but also exhibits enhanced generalization capabilities across various task scenarios. The code and models are available at: https://github.com/HZAI-ZJNU/SCSA.
Submitted 6 July, 2024;
originally announced July 2024.
-
Hierarchical Decoupling Capacitor Optimization for Power Distribution Network of 2.5D ICs with Co-Analysis of Frequency and Time Domains Based on Deep Reinforcement Learning
Authors:
Yuanyuan Duan,
Haiyang Feng,
Zhiping Yu,
Hanming Wu,
Leilai Shao,
Xiaolei Zhu
Abstract:
With the growing need for higher memory bandwidth and computation density, 2.5D design, which involves integrating multiple chiplets onto an interposer, emerges as a promising solution. However, this integration introduces significant challenges due to increasing data rates and a large number of I/Os, necessitating advanced optimization of the power distribution networks (PDNs) both on-chip and on-interposer to mitigate the small signal noise and simultaneous switching noise (SSN). Traditional PDN optimization strategies in 2.5D systems primarily focus on reducing impedance by integrating decoupling capacitors (decaps) to lessen small signal noises. Unfortunately, relying solely on frequency-domain analysis has been proven inadequate for addressing coupled SSN, as indicated by our experimental results. In this work, we introduce a novel two-phase optimization flow using deep reinforcement learning to tackle both the on-chip small signal noise and SSN. Initially, we optimize the impedance in the frequency domain to maintain the small signal noise within acceptable limits while avoiding over-design. Subsequently, in the time domain, we refine the PDN to minimize the voltage violation integral (VVI), a more accurate measure of SSN severity. To the best of our knowledge, this is the first dual-domain optimization strategy that simultaneously addresses both the small signal noise and SSN propagation through strategic decap placement in on-chip and on-interposer PDNs, offering a significant step forward in the design of robust PDNs for 2.5D integrated systems.
Submitted 2 July, 2024;
originally announced July 2024.
-
PartCraft: Crafting Creative Objects by Parts
Authors:
Kam Woh Ng,
Xiatian Zhu,
Yi-Zhe Song,
Tao Xiang
Abstract:
This paper propels creative control in generative visual AI by allowing users to "select". Departing from traditional text or sketch-based methods, we for the first time allow users to choose visual concepts by parts for their creative endeavors. The outcome is fine-grained generation that precisely captures selected visual concepts, ensuring a holistically faithful and plausible result. To achieve this, we first parse objects into parts through unsupervised feature clustering. Then, we encode parts into text tokens and introduce an entropy-based normalized attention loss that operates on them. This loss design enables our model to learn generic prior topology knowledge about object's part composition, and further generalize to novel part compositions to ensure the generation looks holistically faithful. Lastly, we employ a bottleneck encoder to project the part tokens. This not only enhances fidelity but also accelerates learning, by leveraging shared knowledge and facilitating information exchange among instances. Visual results in the paper and supplementary material showcase the compelling power of PartCraft in crafting highly customized, innovative creations, exemplified by the "charming" and creative birds. Code is released at https://github.com/kamwoh/partcraft.
Submitted 8 July, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
MineNetCD: A Benchmark for Global Mining Change Detection on Remote Sensing Imagery
Authors:
Weikang Yu,
Xiaokang Zhang,
Xiao Xiang Zhu,
Richard Gloaguen,
Pedram Ghamisi
Abstract:
Monitoring changes triggered by mining activities is crucial for industrial controlling, environmental management and regulatory compliance, yet it poses significant challenges due to the vast and often remote locations of mining sites. Remote sensing technologies have increasingly become indispensable to detect and analyze these changes over time. We thus introduce MineNetCD, a comprehensive benchmark designed for global mining change detection using remote sensing imagery. The benchmark comprises three key contributions. First, we establish a global mining change detection dataset featuring more than 70k paired patches of bi-temporal high-resolution remote sensing images and pixel-level annotations from 100 mining sites worldwide. Second, we develop a novel baseline model based on a change-aware Fast Fourier Transform (ChangeFFT) module, which enhances various backbones by leveraging essential spectrum components within features in the frequency domain and capturing the channel-wise correlation of bi-temporal feature differences to learn change-aware representations. Third, we construct a unified change detection (UCD) framework that integrates over 13 advanced change detection models. This framework is designed for streamlined and efficient processing, utilizing the cloud platform hosted by HuggingFace. Extensive experiments have been conducted to demonstrate the superiority of the proposed baseline model compared with 12 state-of-the-art change detection approaches. Empirical studies on modularized backbones comprehensively confirm the efficacy of different representation learners on change detection. This contribution represents significant advancements in the field of remote sensing and change detection, providing a robust resource for future research and applications in global mining monitoring. Dataset and Codes are available via the link.
Submitted 4 July, 2024;
originally announced July 2024.
-
M^3: Manipulation Mask Manufacturer for Arbitrary-Scale Super-Resolution Mask
Authors:
Xinyu Yang,
Xiaochen Ma,
Xuekang Zhu,
Bo Du,
Lei Su,
Bingkui Tong,
Zeyu Lei,
Jizhe Zhou
Abstract:
In the field of image manipulation localization (IML), the small quantity and poor quality of existing datasets have always been major issues. A dataset containing various types of manipulations would greatly help improve the accuracy of IML models. Images on the internet (such as those on Baidu Tieba's PS Bar) are manipulated using various techniques, and creating a dataset from these images would significantly enrich the types of manipulations in our data. However, images on the internet suffer from resolution and clarity issues, and the masks obtained by simply subtracting the manipulated image from the original contain various kinds of noise. This noise is difficult to remove, rendering the masks unusable for IML models. Inspired by the field of change detection, we treat the original and manipulated images as changes over time for the same image and view the data generation task as a change detection task. However, due to clarity issues between images, conventional change detection models perform poorly. Therefore, we introduce a super-resolution module and propose the Manipulation Mask Manufacturer (MMM) framework. It enhances the resolution of both the original and tampered images, thereby improving image details for better comparison. Simultaneously, the framework converts the original and tampered images into feature embeddings and concatenates them, effectively modeling the context. Additionally, we create the Manipulation Mask Manufacturer Dataset (MMMD), a dataset that covers a wide range of manipulation techniques. We aim to contribute to the fields of image forensics and manipulation detection by providing more realistic manipulation data through MMM and MMMD. Detailed information about MMMD and the download link can be found at: the code and datasets will be made available.
Submitted 4 July, 2024;
originally announced July 2024.
-
GPT-4 vs. Human Translators: A Comprehensive Evaluation of Translation Quality Across Languages, Domains, and Expertise Levels
Authors:
Jianhao Yan,
Pingchuan Yan,
Yulong Chen,
Judy Li,
Xianchao Zhu,
Yue Zhang
Abstract:
This study comprehensively evaluates the translation quality of Large Language Models (LLMs), specifically GPT-4, against human translators of varying expertise levels across multiple language pairs and domains. Through carefully designed annotation rounds, we find that GPT-4 performs comparably to junior translators in terms of total errors made but lags behind medium and senior translators. We also observe the imbalanced performance across different languages and domains, with GPT-4's translation capability gradually weakening from resource-rich to resource-poor directions. In addition, we qualitatively study the translation given by GPT-4 and human translators, and find that GPT-4 translator suffers from literal translations, but human translators sometimes overthink the background information. To our knowledge, this study is the first to evaluate LLMs against human translators and analyze the systematic differences between their outputs, providing valuable insights into the current state of LLM-based translation and its potential limitations.
Submitted 4 July, 2024;
originally announced July 2024.
-
OrbitGrasp: $SE(3)$-Equivariant Grasp Learning
Authors:
Boce Hu,
Xupeng Zhu,
Dian Wang,
Zihao Dong,
Haojie Huang,
Chenghao Wang,
Robin Walters,
Robert Platt
Abstract:
While grasp detection is an important part of any robotic manipulation pipeline, reliable and accurate grasp detection in $SE(3)$ remains a research challenge. Many robotics applications in unstructured environments such as the home or warehouse would benefit a lot from better grasp performance. This paper proposes a novel framework for detecting $SE(3)$ grasp poses based on point cloud input. Our main contribution is to propose an $SE(3)$-equivariant model that maps each point in the cloud to a continuous grasp quality function over the 2-sphere $S^2$ using a spherical harmonic basis. Compared with reasoning about a finite set of samples, this formulation improves the accuracy and efficiency of our model when a large number of samples would otherwise be needed. In order to accomplish this, we propose a novel variation on EquiFormerV2 that leverages a UNet-style backbone to enlarge the number of points the model can handle. Our resulting method, which we name $\textit{OrbitGrasp}$, significantly outperforms baselines in both simulation and physical experiments.
Submitted 3 July, 2024;
originally announced July 2024.
-
Passenger Route and Departure Time Guidance under Disruptions in Oversaturated Urban Rail Transit Networks
Authors:
Siyu Zhuo,
Xiaoning Zhu,
Pan Shang,
Zhengke Liu
Abstract:
The urban rail transit (URT) system attracts many commuters with its punctuality and convenience. However, it is vulnerable to disruptions caused by factors like extreme weather and temporary equipment failures, which greatly impact passengers' journeys and diminish the system's service quality. In this study, we propose targeted travel guidance for passengers at different space-time locations by devising passenger rescheduling strategies during disruptions. This guidance not only offers insights into route changes but also provides practical recommendations for delaying departure times when required. We present a novel three-feature four-group passenger classification principle, integrating temporal, spatial, and spatio-temporal features to classify passengers in disrupted URT networks. This approach results in the creation of four distinct solution spaces based on passenger groups. A mixed integer programming model is built based on individual level considering the First-in-First-out (FIFO) rule in oversaturated networks. Additionally, we present a two-stage solution approach for handling the complex issues in large-scale networks. Experimental results from both small-scale artificial networks and the real-world Beijing URT network validate the efficacy of our proposed passenger rescheduling strategies in mitigating disruptions. Specifically, when compared to scenarios with no travel guidance during disruptions, our strategies achieve a substantial reduction in total passenger travel time by 29.7% and 50.9% respectively, underscoring the effectiveness in managing unexpected disruptions.
Submitted 3 July, 2024;
originally announced July 2024.
-
Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models
Authors:
Haritz Puerto,
Tilek Chubakov,
Xiaodan Zhu,
Harish Tayyar Madabushi,
Iryna Gurevych
Abstract:
Requiring a Large Language Model to generate intermediary reasoning steps has been shown to be an effective way of boosting performance. In fact, it has been found that instruction tuning on these intermediary reasoning steps improves model performance. In this work, we present a novel method of further improving performance by requiring models to compare multiple reasoning chains before generating a solution in a single inference step. We call this method Divergent CoT (DCoT). We find that instruction tuning on DCoT datasets boosts the performance of even smaller, and therefore more accessible, LLMs. Through a rigorous set of experiments spanning a wide range of tasks that require various reasoning types, we show that fine-tuning on DCoT consistently improves performance over the CoT baseline across model families and scales (1.3B to 70B). Through a combination of empirical and manual evaluation, we additionally show that these performance gains stem from models generating multiple divergent reasoning chains in a single inference step, indicative of the enabling of self-correction in language models. Our code and data are publicly available at https://github.com/UKPLab/arxiv2024-divergent-cot.
Submitted 3 July, 2024;
originally announced July 2024.
-
Properties of the QCD Matter -- An Experimental Review of Selected Results from RHIC BES Program
Authors:
Jinhui Chen,
Xin Dong,
Xionghong He,
Huanzhong Huang,
Feng Liu,
Xiaofeng Luo,
Yu-Gang Ma,
Lijuan Ruan,
Ming Shao,
Shusu Shi,
Xu Sun,
Aihong Tang,
Zebo Tang,
Fuqiang Wang,
Hai Wang,
Yi Wang,
Zhigang Xiao,
Guannan Xie,
Nu Xu,
Qinghua Xu,
Zhangbu Xu,
Chi Yang,
Shuai Yang,
Wangmei Zha,
Yapeng Zhang
, et al. (3 additional authors not shown)
Abstract:
In the paper, we discuss the development of the multi-gap resistive plate chamber Time-of-Flight (TOF) technology and the production of the STAR TOF detector in China at the beginning of the 21st century. Then we review recent experimental results from the first beam energy scan program (BES-I) at the Relativistic Heavy Ion Collider (RHIC). Topics cover measurements of collectivity, chirality, criticality, global polarization, strangeness, heavy-flavor, di-lepton and light nuclei productions.
Submitted 3 July, 2024;
originally announced July 2024.
-
Measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the {BESIII} detector at the {BEPCII} storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be $\mathcal{B}(J/ψ\to p \bar{p} η(η\to γγ)) = (1.480 \pm 0.001 \pm 0.024)\times\,10^{-3}$ and $\mathcal{B}(J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)) = (1.557 \pm 0.003 \pm 0.038)\times\,10^{-3}$, where the first uncertainties are statistical and the second systematic. Both results are compatible within their uncorrelated systematic uncertainties. The combined result is $\mathcal{B}(J/ψ\to p \bar{p} η)=(1.495 \pm 0.001 \pm 0.023)\times\,10^{-3}$ where the first uncertainty is the combined statistical uncertainty and the second one the combined systematic uncertainty of both analyses, incorporating correlations between them. In addition, the $p \bar{p}$ threshold region is investigated for a potential threshold enhancement, and no evidence for one is observed.
Submitted 3 July, 2024;
originally announced July 2024.
-
Fast, Scalable, Energy-Efficient Non-element-wise Matrix Multiplication on FPGA
Authors:
Xuqi Zhu,
Huaizhi Zhang,
JunKyu Lee,
Jiacheng Zhu,
Chandrajit Pal,
Sangeet Saha,
Klaus D. McDonald-Maier,
Xiaojun Zhai
Abstract:
Modern Neural Network (NN) architectures rely heavily on vast numbers of multiply-accumulate operations, which constitute the predominant computational cost. Therefore, this paper proposes a high-throughput, scalable, and energy-efficient non-element-wise matrix multiplication unit on FPGAs as a basic component of NNs. We first streamline the inter-layer and intra-layer redundancies of the MADDNESS algorithm, a LUT-based approximate matrix multiplication method, to design a fast, efficient, and scalable approximate matrix multiplication module termed the "Approximate Multiplication Unit (AMU)". The AMU further optimizes LUT-based matrix multiplication through dedicated memory management and access design, decoupling computational overhead from input resolution and significantly boosting the efficiency of FPGA-based NN accelerators. Experimental results show that the AMU achieves up to 9x higher throughput and 112x higher energy efficiency than state-of-the-art FPGA-based Quantised Neural Network (QNN) accelerators.
Submitted 7 July, 2024; v1 submitted 2 July, 2024;
originally announced July 2024.
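The idea of LUT-based approximate matrix multiplication named in the abstract above can be illustrated with a minimal sketch. This is not the authors' AMU design and not the MADDNESS algorithm itself: MADDNESS is understood to encode inputs with a learned hash-based step, whereas the sketch below assumes a simpler nearest-prototype encoder. The function names, the block layout, and the hand-picked prototypes are all hypothetical.

```python
def nearest_prototype(sub_vector, prototypes):
    """Return the index of the prototype closest (squared Euclidean) to sub_vector."""
    def sq_dist(p):
        return sum((x - y) ** 2 for x, y in zip(sub_vector, p))
    return min(range(len(prototypes)), key=lambda k: sq_dist(prototypes[k]))


def build_tables(prototypes_per_block, B, block_size):
    """Precompute tables[c][k][j] = dot(prototype k of block c, column j of B's block c)."""
    n_cols = len(B[0])
    tables = []
    for c, prototypes in enumerate(prototypes_per_block):
        rows = B[c * block_size:(c + 1) * block_size]
        tables.append([
            [sum(p[i] * rows[i][j] for i in range(block_size)) for j in range(n_cols)]
            for p in prototypes
        ])
    return tables


def lut_matmul(A, prototypes_per_block, tables, block_size):
    """Approximate A @ B using only encoding, table lookups, and additions."""
    n_cols = len(tables[0][0])
    out = []
    for row in A:
        acc = [0.0] * n_cols
        for c, prototypes in enumerate(prototypes_per_block):
            # Encode this block of the row as the index of its nearest prototype.
            code = nearest_prototype(row[c * block_size:(c + 1) * block_size], prototypes)
            # Accumulate the precomputed partial products instead of multiplying.
            for j in range(n_cols):
                acc[j] += tables[c][code][j]
        out.append(acc)
    return out


# Hypothetical toy data: every block of A coincides with a prototype,
# so the approximation is exact in this particular case.
prototypes_per_block = [[[1, 0], [0, 1]], [[2, 2], [0, 0]]]
B = [[1, 2], [3, 4], [5, 6], [7, 8]]
A = [[1, 0, 2, 2], [0, 1, 0, 0]]
tables = build_tables(prototypes_per_block, B, block_size=2)
result = lut_matmul(A, prototypes_per_block, tables, block_size=2)
```

Once the tables are built, the per-row cost depends on the number of blocks and output columns rather than on multiplications with the entries of B. In general the result is only approximate, with the error governed by how well the prototypes cover the input blocks.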
-
ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation
Authors:
Zhiyuan Ma,
Yuxiang Wei,
Yabin Zhang,
Xiangyu Zhu,
Zhen Lei,
Lei Zhang
Abstract:
By leveraging text-to-image diffusion priors, score distillation can synthesize 3D content without paired text-3D training data. Instead of spending hours of online optimization per text prompt, recent studies have focused on learning a text-to-3D generative network that amortizes multiple text-3D relations and can synthesize 3D content in seconds. However, existing score distillation methods are hard to scale up to a large number of text prompts due to the difficulty of aligning the pretrained diffusion prior with the distribution of rendered images from various text prompts. Current state-of-the-art methods such as Variational Score Distillation finetune the pretrained diffusion model to minimize the noise prediction error so as to align the distributions; this, however, is unstable to train and impairs the model's ability to comprehend numerous text prompts. Based on the observation that diffusion models tend to have lower noise prediction errors at earlier timesteps, we propose Asynchronous Score Distillation (ASD), which minimizes the noise prediction error by shifting the diffusion timestep to earlier ones. ASD is stable to train and can scale up to 100k prompts. It reduces the noise prediction error without changing the weights of the pretrained diffusion model, thus preserving its strong ability to comprehend prompts. We conduct extensive experiments across different 2D diffusion models, including Stable Diffusion and MVDream, and text-to-3D generators, including Hyper-iNGP, 3DConv-Net and Triplane-Transformer. The results demonstrate ASD's effectiveness in stable 3D generator training, high-quality 3D content synthesis, and superior prompt consistency, especially with a large prompt corpus.
Submitted 2 July, 2024;
originally announced July 2024.
-
Aeroengine performance prediction using a physical-embedded data-driven method
Authors:
Tong Mo,
Shiran Dai,
An Fu,
Xiaomeng Zhu,
Shuxiao Li
Abstract:
Accurate and efficient prediction of aeroengine performance is of paramount importance for engine design, maintenance, and optimization endeavours. However, existing methodologies often struggle to strike an optimal balance among predictive accuracy, computational efficiency, modelling complexity, and data dependency. To address these challenges, we propose a strategy that synergistically combines domain knowledge from both the aeroengine and neural network realms to enable real-time prediction of engine performance parameters. Leveraging aeroengine domain knowledge, we judiciously design the network structure and regulate the internal information flow. Concurrently, drawing upon neural network domain expertise, we devise four distinct feature fusion methods and introduce an innovative loss function formulation. To rigorously evaluate the effectiveness and robustness of our proposed strategy, we conduct comprehensive validation across two distinct datasets. The empirical results demonstrate: (1) the evident advantages of our tailored loss function; (2) our model's ability to maintain equal or superior performance with a reduced parameter count; (3) our model's reduced data dependency compared to generalized neural network architectures; and (4) our model's greater interpretability compared to traditional black-box machine learning methods.
Submitted 29 June, 2024;
originally announced July 2024.
-
ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees
Authors:
Zhiyuan Wang,
Jinhao Duan,
Lu Cheng,
Yue Zhang,
Qingni Wang,
Hengtao Shen,
Xiaofeng Zhu,
Xiaoshuang Shi,
Kaidi Xu
Abstract:
Uncertainty quantification (UQ) in natural language generation (NLG) tasks remains an open challenge, exacerbated by the intricate nature of the recent large language models (LLMs). This study investigates adapting conformal prediction (CP), which can convert any heuristic measure of uncertainty into rigorous theoretical guarantees by constructing prediction sets, for black-box LLMs in open-ended NLG tasks. We propose a sampling-based uncertainty measure leveraging self-consistency and develop a conformal uncertainty criterion by integrating the uncertainty condition aligned with correctness into the design of the CP algorithm. Experimental results indicate that our uncertainty measure generally surpasses prior state-of-the-art methods. Furthermore, we calibrate the prediction sets within the model's unfixed answer distribution and achieve strict control over the correctness coverage rate across 6 LLMs on 4 free-form NLG datasets, spanning general-purpose and medical domains, while the small average set size further highlights the efficiency of our method in providing trustworthy guarantees for practical open-ended NLG applications.
Submitted 29 June, 2024;
originally announced July 2024.
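The conformal prediction step referred to in the abstract above can be sketched in its generic split-conformal form. This is a textbook illustration under stated assumptions, not the ConU criterion: the nonconformity scores are assumed to be given (lower means more plausible), and the function names and toy numbers are hypothetical.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Split-conformal threshold for a target miscoverage level alpha.

    calibration_scores holds the nonconformity score of the correct answer
    for each held-out calibration example.
    """
    n = len(calibration_scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: nothing can be excluded.
        return math.inf
    return sorted(calibration_scores)[rank - 1]


def prediction_set(candidate_scores, threshold):
    """Keep every candidate answer whose nonconformity score is within the threshold."""
    return [answer for answer, score in candidate_scores.items() if score <= threshold]


# Hypothetical calibration scores and candidate answers for one new question.
calibration = [0.7, 0.1, 0.5, 0.3, 0.9, 0.2, 0.6]
threshold = conformal_threshold(calibration, alpha=0.25)
kept = prediction_set({"answer A": 0.2, "answer B": 0.8}, threshold)
```

Under the usual exchangeability assumption, sets built this way contain the correct answer with probability at least 1 - alpha. How the scores are defined for free-form generations is exactly what a method such as the one in the abstract has to specify, and this sketch leaves it open.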
-
ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models
Authors:
Yuxiang Zhang,
Jing Chen,
Junjie Wang,
Yaxin Liu,
Cheng Yang,
Chufan Shi,
Xinyu Zhu,
Zihao Lin,
Hanwen Wan,
Yujiu Yang,
Tetsuya Sakai,
Tian Feng,
Hayato Yamana
Abstract:
Tool-augmented large language models (LLMs) are rapidly being integrated into real-world applications. Due to the lack of benchmarks, the community still needs to fully understand the hallucination issues within these models. To address this challenge, we introduce a comprehensive diagnostic benchmark, ToolBH. Specifically, we assess the LLM's hallucinations through two perspectives: depth and breadth. In terms of depth, we propose a multi-level diagnostic process, including (1) solvability detection, (2) solution planning, and (3) missing-tool analysis. For breadth, we consider three scenarios based on the characteristics of the toolset: missing necessary tools, potential tools, and limited functionality tools. Furthermore, we developed seven tasks and collected 700 evaluation samples through multiple rounds of manual annotation. The results show the significant challenges presented by the ToolBH benchmark. The current advanced models Gemini-1.5-Pro and GPT-4o only achieve a total score of 45.3 and 37.0, respectively, on a scale of 100. In this benchmark, larger model parameters do not guarantee better performance; the training data and response strategies also play a crucial role in tool-enhanced LLM scenarios. Our diagnostic analysis indicates that the primary reason for model errors lies in assessing task solvability. Additionally, open-weight models suffer from performance drops with verbose replies, whereas proprietary models excel with longer reasoning.
Submitted 28 June, 2024;
originally announced June 2024.
-
Unified Framework for Calculating Convex Roof Resource Measures
Authors:
Xuanran Zhu,
Chao Zhang,
Zheng An,
Bei Zeng
Abstract:
Quantum resource theories (QRTs) provide a comprehensive and practical framework for the analysis of diverse quantum phenomena. A fundamental task within QRTs is the quantification of resources inherent in a given quantum state. In this letter, we introduce a unified computational framework for a class of widely utilized quantum resource measures, derived from convex roof extensions. We establish that the computation of these convex roof resource measures can be reformulated as an optimization problem over a Stiefel manifold, which can be further unconstrained through polar projection. Compared to existing methods employing semi-definite programming (SDP), gradient-based techniques or seesaw strategy, our approach not only demonstrates superior computational efficiency but also maintains applicability across various scenarios within a streamlined workflow. We substantiate the efficacy of our method by applying it to several key quantum resources, including entanglement, coherence, and magic states. Moreover, our methodology can be readily extended to other convex roof quantities beyond the domain of resource theories, suggesting broad applicability in the realm of quantum information theory.
Submitted 28 June, 2024;
originally announced June 2024.
-
The Model Arena for Cross-lingual Sentiment Analysis: A Comparative Study in the Era of Large Language Models
Authors:
Xiliang Zhu,
Shayna Gardiner,
Tere Roldán,
David Rossouw
Abstract:
Sentiment analysis serves as a pivotal component in Natural Language Processing (NLP). Advancements in multilingual pre-trained models such as XLM-R and mT5 have contributed to the increasing interest in cross-lingual sentiment analysis. The recent emergence of Large Language Models (LLMs) has significantly advanced general NLP tasks; however, the capability of such LLMs in cross-lingual sentiment analysis has not been fully studied. This work undertakes an empirical analysis to compare the cross-lingual transfer capability of public Small Multilingual Language Models (SMLMs) like XLM-R against English-centric LLMs such as Llama-3, in the context of sentiment analysis across English, Spanish, French and Chinese. Our findings reveal that among public models, SMLMs exhibit superior zero-shot cross-lingual performance relative to LLMs. However, in few-shot cross-lingual settings, public LLMs demonstrate an enhanced adaptive potential. In addition, we observe that proprietary GPT-3.5 and GPT-4 lead in zero-shot cross-lingual capability, but are outpaced by public models in few-shot scenarios.
Submitted 27 June, 2024;
originally announced June 2024.
-
Improved measurement of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential decay rate of $D^+_s\to K^0 e^+ν_e$ to be $f^{K^0}_+(0)=0.636\pm0.049\pm0.013$. For both measurements, the first uncertainty is statistical and the second systematic. The branching fraction and form factor measurements are factors of 1.6 and 1.7 more precise than the previous world averages, respectively.
Submitted 27 June, 2024;
originally announced June 2024.
-
Geometric heat pumping under continuous modulation in thermal diffusion
Authors:
Hao-Ran Yan,
Pei-Chao Cao,
Yan-Xiang Wang,
Xue-Feng Zhu,
Ying Li
Abstract:
The Berry (geometric) phase has attracted a lot of interest since its discovery and has permeated all areas of physics, including photonics, crystal dynamics, electromagnetism and heat transfer, leading to various unprecedented effects in both classical and quantum systems, such as the Hannay angle, the quantum Hall effect, orbital magnetism and Thouless pumping. Heat pumping is one of the most prominent applications of the geometric phase in heat transport. Here we derive a general heat pumping theory based on the classical diffusion equation and continuous modulation of system parameters in a macroscopic thermal diffusion system, and obtain a formula that is reminiscent of the relation between the Berry phase and the Berry curvature. Furthermore, we discuss two cases of non-trivial zero heat flux after one cycle, which are fundamentally different in physical nature from the trivial zero heat flux generated by a static zero thermal bias. We then analyze the dependence of the effect on the system's thermal parameters, including some counterintuitive phenomena. Finally, guided by this theory, we conduct an experiment to demonstrate its accuracy and effectiveness, and observe the heat pumping effect both in the presence and in the absence of a thermal bias between the two ports of the system. In general, our work derives the universal form of heat pumping theory under an arbitrary form of modulation in a macroscopic thermal diffusion system, which is of great significance for improved heat energy transport and heat manipulation. It also establishes a foundation for achieving other non-reciprocal or topological devices with the aid of spatiotemporal modulation.
Submitted 27 June, 2024;
originally announced June 2024.
-
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
Authors:
Le Zhuo,
Ruoyi Du,
Han Xiao,
Yangguang Li,
Dongyang Liu,
Rongjie Huang,
Wenze Liu,
Lirui Zhao,
Fu-Yun Wang,
Zhanyu Ma,
Xu Luo,
Zehan Wang,
Kaipeng Zhang,
Xiangyang Zhu,
Si Liu,
Xiangyu Yue,
Dingning Liu,
Wanli Ouyang,
Ziwei Liu,
Yu Qiao,
Hongsheng Li,
Peng Gao
Abstract:
Lumina-T2X is a nascent family of Flow-based Large Diffusion Transformers that establishes a unified framework for transforming noise into various modalities, such as images and videos, conditioned on text instructions. Despite its promising capabilities, Lumina-T2X still encounters challenges including training instability, slow inference, and extrapolation artifacts. In this paper, we present Lumina-Next, an improved version of Lumina-T2X, showcasing stronger generation performance with increased training and inference efficiency. We begin with a comprehensive analysis of the Flag-DiT architecture and identify several suboptimal components, which we address by introducing the Next-DiT architecture with 3D RoPE and sandwich normalizations. To enable better resolution extrapolation, we thoroughly compare different context extrapolation methods applied to text-to-image generation with 3D RoPE, and propose Frequency- and Time-Aware Scaled RoPE tailored for diffusion transformers. Additionally, we introduce a sigmoid time discretization schedule to reduce sampling steps in solving the Flow ODE and the Context Drop method to merge redundant visual tokens for faster network evaluation, effectively boosting the overall sampling speed. Thanks to these improvements, Lumina-Next not only improves the quality and efficiency of basic text-to-image generation but also demonstrates superior resolution extrapolation capabilities and multilingual generation using decoder-based LLMs as the text encoder, all in a zero-shot manner. To further validate Lumina-Next as a versatile generative framework, we instantiate it on diverse tasks including visual recognition, multi-view, audio, music, and point cloud generation, showcasing strong performance across these domains. By releasing all code and model weights, we aim to advance the development of next-generation generative AI capable of universal modeling.
Submitted 5 June, 2024;
originally announced June 2024.
-
Measurement of the cross sections of $e^+e^-\to K^{-}\barΞ^{+}Λ/Σ^{0}$ at center-of-mass energies between 3.510 and 4.914 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914 GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$, evidence for $ψ(4160) \to K^{-}\barΞ^{+}Λ$ is found for the first time with a significance of 4.4$σ$, including systematic uncertainties. No evidence for other possible resonances is found. In addition, the products of electronic partial width and branching fraction for all assumed resonances decaying into $K^{-}\barΞ^{+}Λ/Σ^{0}$ are determined.
Submitted 26 June, 2024;
originally announced June 2024.
-
Measurements of $K_S^0$-$K_L^0$ asymmetries in the decays $Λ_c^+ \to pK_{L,S}^0$, $pK_{L,S}^0π^+π^-$ and $pK_{L,S}^0π^0$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0.04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, and $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^0)=(2.02 \pm 0.13 \pm 0.05)\%$, where the first uncertainties are statistical and the second systematic. Combining with the known branching fractions of $Λ_c^+ \to pK_{S}^{0}$, $Λ_c^+ \to pK_{S}^{0}π^+π^-$, and $Λ_c^+ \to pK_{S}^{0}π^0$, we present the first measurements of the $K_{S}^{0}$-$K_{L}^{0}$ asymmetries $R(Λ_c^+, K_{S,L}^0X) = \frac{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) - \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) + \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}$ in charmed baryon decays: $R(Λ_c^+, pK_{S,L}^0) = -0.025 \pm 0.031$, $R(Λ_c^+, pK_{S,L}^0π^+π^-) = -0.027 \pm 0.048$, and $R(Λ_c^+, pK_{S,L}^0π^0) = -0.015 \pm 0.046$. No significant asymmetries within the uncertainties are observed.
Submitted 26 June, 2024;
originally announced June 2024.
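As a rough illustration of the asymmetry defined in the abstract above, the following Python sketch evaluates $R = (\mathcal{B}_S - \mathcal{B}_L)/(\mathcal{B}_S + \mathcal{B}_L)$ and inverts it. The function names are hypothetical, and the implied $K_S^0$ branching fraction is only a back-of-envelope consistency check built from the quoted central values; it is not a number reported in the listing.

```python
def asymmetry(b_s: float, b_l: float) -> float:
    """K_S-K_L asymmetry R = (B_S - B_L) / (B_S + B_L), as defined in the abstract."""
    return (b_s - b_l) / (b_s + b_l)


def implied_b_s(r: float, b_l: float) -> float:
    """Invert the definition of R: B_S = B_L * (1 + R) / (1 - R)."""
    return b_l * (1.0 + r) / (1.0 - r)


# Central values quoted in the abstract (uncertainties ignored in this sketch).
b_l = 1.67   # B(Lambda_c+ -> p K_L0), in percent
r = -0.025   # R(Lambda_c+, p K_{S,L}0)

b_s = implied_b_s(r, b_l)
print(f"implied B(p K_S0) ~ {b_s:.2f}%")
print(f"round-trip R = {asymmetry(b_s, b_l):+.3f}")
```

A negative R means the $K_L^0$ mode has the slightly larger central value, although the quoted uncertainty makes the result compatible with zero.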
-
Few-Shot Medical Image Segmentation with High-Fidelity Prototypes
Authors:
Song Tang,
Shaxu Yan,
Xiaozhi Qi,
Jianxin Gao,
Mao Ye,
Jianwei Zhang,
Xiatian Zhu
Abstract:
Few-shot Semantic Segmentation (FSS) aims to adapt a pretrained model to new classes with as few as a single labelled training sample per class. Although prototype-based approaches have achieved substantial success, existing models are limited to imaging scenarios with clearly distinct objects and backgrounds that are not highly complex, e.g., natural images. This makes such models suboptimal for medical imaging, where neither condition holds. To address this problem, we propose a novel Detail Self-refined Prototype Network (DSPNet) to construct high-fidelity prototypes that represent the object foreground and the background more comprehensively. Specifically, to construct global semantics while maintaining the captured detail semantics, we learn the foreground prototypes by modelling the multi-modal structures with clustering and then fusing each in a channel-wise manner. Considering that the background often has no apparent semantic relation in the spatial dimensions, we integrate channel-specific structural information under sparse channel-aware regulation. Extensive experiments on three challenging medical image benchmarks show the superiority of DSPNet over previous state-of-the-art methods.
Submitted 26 June, 2024;
originally announced June 2024.
-
Probing many-body Bell correlation depth with superconducting qubits
Authors:
Ke Wang,
Weikang Li,
Shibo Xu,
Mengyao Hu,
Jiachen Chen,
Yaozu Wu,
Chuanyu Zhang,
Feitong Jin,
Xuhao Zhu,
Yu Gao,
Ziqi Tan,
Aosai Zhang,
Ning Wang,
Yiren Zou,
Tingting Li,
Fanhao Shen,
Jiarun Zhong,
Zehang Bao,
Zitian Zhu,
Zixuan Song,
Jinfeng Deng,
Hang Dong,
Xu Zhang,
Pengfei Zhang,
Wenjie Jiang
, et al. (10 additional authors not shown)
Abstract:
Quantum nonlocality describes a stronger form of quantum correlation than that of entanglement. It refutes Einstein's belief in local realism and is among the most distinctive and enigmatic features of quantum mechanics. It is a crucial resource for achieving quantum advantages in a variety of practical applications, ranging from cryptography and certified random number generation via self-testing to machine learning. Nevertheless, the detection of nonlocality, especially in quantum many-body systems, is notoriously challenging. Here, we report an experimental certification of genuine multipartite Bell correlations, which signal nonlocality in quantum many-body systems, up to 24 qubits with a fully programmable superconducting quantum processor. In particular, we employ energy as a Bell correlation witness and variationally decrease the energy of a many-body system across a hierarchy of thresholds, below which an increasing Bell correlation depth can be certified from experimental data. As an illustrative example, we variationally prepare the low-energy state of a two-dimensional honeycomb model with 73 qubits and certify its Bell correlations by measuring an energy that surpasses the corresponding classical bound by up to 48 standard deviations. In addition, we variationally prepare a sequence of low-energy states and certify their genuine multipartite Bell correlations up to 24 qubits via energies measured efficiently by parity oscillation and multiple quantum coherence techniques. Our results establish a viable approach for preparing and certifying multipartite Bell correlations, which provide not only a finer benchmark beyond entanglement for quantum devices, but also a valuable guide towards exploiting multipartite Bell correlation in a wide spectrum of practical applications.
Submitted 25 June, 2024;
originally announced June 2024.
-
Study of the $f_{0}(980)$ through the decay $D_{s}^{+}\rightarrow π^{+}π^{+}π^{-}π^{0}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (649 additional authors not shown)
Abstract:
We perform the first amplitude analysis of $D^+_s \to π^+π^+π^-π^0$ decays, based on data samples of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of 7.33~fb$^{-1}$. We report the observation of $D_{s}^{+} \to f_0(980)ρ(770)^{+}$ with a statistical significance greater than 10$σ$ and determine the branching fractions $\mathcal{B}(D_s^+\toπ^+π^+π^-π^0|_{{\rm non}-η})=(2.04\pm0.08_{\rm stat.}\pm0.05_{\rm syst.})\%$ and $\mathcal{B}(D_s^+\toηπ^+)=(1.56\pm0.09_{\rm stat.}\pm0.04_{\rm syst.})\%$. Moreover, we measure the relative branching fraction between $φ\toπ^+π^-π^0$ and $φ\to K^+K^-$ to be $\frac{\mathcal{B}(φ(1020) \to π^+π^-π^0)}{\mathcal{B}(φ(1020) \to K^+K^-)}=0.230 \pm 0.014_{\rm stat.} \pm 0.010_{\rm syst.}$, which deviates from the world average value by more than $4σ$.
Submitted 25 June, 2024;
originally announced June 2024.
-
Inception: Efficiently Computable Misinformation Attacks on Markov Games
Authors:
Jeremy McMahan,
Young Wu,
Yudong Chen,
Xiaojin Zhu,
Qiaomin Xie
Abstract:
We study security threats to Markov games due to information asymmetry and misinformation. We consider an attacker player who can spread misinformation about its reward function to influence the robust victim player's behavior. Given a fixed fake reward function, we derive the victim's policy under worst-case rationality and present polynomial-time algorithms to compute the attacker's optimal worst-case policy based on linear programming and backward induction. Then, we provide an efficient inception ("planting an idea in someone's mind") attack algorithm to find the optimal fake reward function within a restricted set of reward functions with dominant strategies. Importantly, our methods exploit the universal assumption of rationality to compute attacks efficiently. Thus, our work exposes a security vulnerability arising from standard game assumptions under misinformation.
Submitted 24 June, 2024;
originally announced June 2024.
-
Probing the nature of the $χ_{c1}(3872)$ state using radiative decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1094 additional authors not shown)
Abstract:
The radiative decays $χ_{c1}(3872)\rightarrowψ(2S)γ$ and $χ_{c1}(3872)\rightarrow J/ψγ$ are used to probe the nature of the $χ_{c1}(3872)$ state using proton-proton collision data collected with the LHCb detector, corresponding to an integrated luminosity of 9 fb$^{-1}$. Using the $B^+\rightarrow χ_{c1}(3872)K^+$ decay, the $χ_{c1}(3872)\rightarrow ψ(2S)γ$ process is observed for the first time and the ratio of its partial width to that of the $χ_{c1}(3872)\rightarrow J/ψγ$ decay is measured to be $$ \frac{Γ_{χ_{c1}(3872)\rightarrow ψ(2S)γ}}{Γ_{χ_{c1}(3872)\rightarrow J/ψγ}} = 1.67 \pm 0.21 \pm 0.12 \pm 0.04 , $$ where the first uncertainty is statistical, the second systematic and the third is due to the uncertainties on the branching fractions of the $ψ(2S)$ and $J/ψ$ mesons. The measured ratio makes the interpretation of the $χ_{c1}(3872)$ state as a pure $D^0\bar{D}^{*0}+\bar{D}^0D^{*0}$ molecule questionable and strongly indicates a sizeable compact charmonium or tetraquark component within the $χ_{c1}(3872)$ state.
Submitted 24 June, 2024;
originally announced June 2024.
-
Time-Domain Signatures of Distinct Correlated Insulators in a Moiré Superlattice
Authors:
Eric A. Arsenault,
Yiliu Li,
Birui Yang,
Takashi Taniguchi,
Kenji Watanabe,
James C. Hone,
Cory R. Dean,
Xiaodong Xu,
X. -Y. Zhu
Abstract:
Among expanding discoveries of quantum phases in moiré superlattices, correlated insulators stand out as both the most stable and most commonly observed. Despite the central importance of these states in moiré physics, little is known about their underlying nature. Here, we use pump-probe spectroscopy to show distinct time-domain signatures of correlated insulators at fillings of one (v = -1) and two (v = -2) holes per moiré unit cell in the angle-aligned WSe2/WS2 system. Following photo-doping, we find that the disordering time of the v = -1 state is independent of excitation density (n_ex), as expected from the characteristic phonon response time associated with a polaronic state. In contrast, the disordering time of the v = -2 state scales with (n_ex)^-0.5, in agreement with plasmonic screening from free holons and doublons. These states display disparate reordering behavior dominated either by first order (v = -1) or second order (v = -2) recombination, suggesting the presence of Hubbard excitons and free carrier-like holons/doublons, respectively. Our work delineates the roles of electron-phonon (e-ph) versus electron-electron (e-e) interactions in correlated insulators on the moiré landscape and establishes non-equilibrium responses as mechanistic signatures for distinguishing and discovering quantum phases.
Submitted 21 June, 2024;
originally announced June 2024.
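The abstract above contrasts first-order with second-order recombination and quotes a disordering time that scales as (n_ex)^-0.5. The sketch below uses the textbook rate equations dn/dt = -k1 n and dn/dt = -k2 n^2 to show how the two regimes can be told apart: the first-order half-life does not depend on the initial density, while the second-order half-life shrinks as 1/n0. The rate constants, function names, and the prefactor of the power law are hypothetical placeholders, and this is not the fitting model used in the paper.

```python
import math


def first_order(n0: float, k1: float, t: float) -> float:
    """Solution of dn/dt = -k1 * n: a single exponential decay."""
    return n0 * math.exp(-k1 * t)


def second_order(n0: float, k2: float, t: float) -> float:
    """Solution of dn/dt = -k2 * n**2: a hyperbolic decay."""
    return n0 / (1.0 + k2 * n0 * t)


def half_life_first_order(k1: float) -> float:
    """Time for n to fall to n0/2 under first-order kinetics (no n0 dependence)."""
    return math.log(2.0) / k1


def half_life_second_order(n0: float, k2: float) -> float:
    """Time for n to fall to n0/2 under second-order kinetics (scales as 1/n0)."""
    return 1.0 / (k2 * n0)


def power_law_time(n_ex: float, prefactor: float = 1.0) -> float:
    """Disordering time scaling as n_ex**-0.5; the prefactor is a placeholder."""
    return prefactor * n_ex ** -0.5


for n0 in (1.0, 4.0):
    print(n0, half_life_first_order(k1=1.0), half_life_second_order(n0, k2=1.0))
```

Under the quoted power law, quadrupling the excitation density halves the disordering time, which is the kind of density dependence that is absent in the v = -1 state.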
-
Search for the $e^+e^- \to φχ_{c1}(3872)$ process at BESIII
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies of 4.914 and 4.946 GeV by the BESIII detector, a search for the $e^+e^- \to φχ_{c1}(3872)$ process is performed for the first time. No significant signal is observed, and the upper limits at the 90\% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction $\mathcal{B}[χ_{c1}(3872)\toπ^+π^- J/ψ]$ at 4.914 and 4.946 GeV are set to be 0.85 and 0.96 pb, respectively. These measurements provide useful information on the production of the $χ_{c1}(3872)$ at $e^+e^-$ colliders and deepen our understanding of the nature of this particle.
Submitted 21 June, 2024;
originally announced June 2024.
-
Relighting Scenes with Object Insertions in Neural Radiance Fields
Authors:
Xuening Zhu,
Renjiao Yi,
Xin Wen,
Chenyang Zhu,
Kai Xu
Abstract:
The insertion of objects into a scene and relighting are commonly utilized applications in augmented reality (AR). Previous methods focused on inserting virtual objects using CAD models or real objects from single-view images, resulting in highly limited AR application scenarios. We propose a novel NeRF-based pipeline for inserting object NeRFs into scene NeRFs, enabling novel view synthesis and realistic relighting, supporting physical interactions like casting shadows onto each other, from two sets of images depicting the object and scene. The lighting environment is in a hybrid representation of Spherical Harmonics and Spherical Gaussians, representing both high- and low-frequency lighting components very well, and supporting non-Lambertian surfaces. Specifically, we leverage the benefits of volume rendering and introduce an innovative approach for efficient shadow rendering by comparing the depth maps between the camera view and the light source view and generating vivid soft shadows. The proposed method achieves realistic relighting effects in extensive experimental evaluations.
Submitted 20 June, 2024;
originally announced June 2024.
-
Two-Loop Spacelike Splitting Amplitude for N=4 Super-Yang-Mills Theory
Authors:
Johannes Henn,
Rourou Ma,
Yongqun Xu,
Kai Yan,
Yang Zhang,
Hua Xing Zhu
Abstract:
The study of collinear behavior for gauge theories in the spacelike region is of great phenomenological and theoretical importance. We analytically calculate the two-loop spacelike splitting amplitude for the full-color N=4 super Yang-Mills theory. The result is derived by two complementary methods starting from the known amplitude: one is based on a discontinuity analysis, while the other is based on analytic continuation. Our result explicitly shows terms that violate naive factorization. However, we show that factorization is restored at the level of color-summed unpolarized squared amplitudes at next-to-next-to-next-to-leading order. We conjecture that the two-loop tripole terms in the generalized splitting amplitudes in QCD are identical to what we obtain in N=4 super Yang-Mills theory.
Submitted 20 June, 2024;
originally announced June 2024.
-
Mitigating Social Biases in Language Models through Unlearning
Authors:
Omkar Dige,
Diljot Singh,
Tsz Fung Yau,
Qixuan Zhang,
Borna Bolandraftar,
Xiaodan Zhu,
Faiza Khan Khattak
Abstract:
Mitigating bias in language models (LMs) has become a critical problem due to the widespread deployment of LMs. Numerous approaches revolve around data pre-processing and fine-tuning of language models, tasks that can be both time-consuming and computationally demanding. Consequently, there is a growing interest in machine unlearning techniques given their capacity to induce the forgetting of undesired behaviors of the existing pre-trained or fine-tuned models with lower computational cost. In this work, we explore two unlearning methods, (1) Partitioned Contrastive Gradient Unlearning (PCGU) applied on decoder models and (2) Negation via Task Vector, to reduce social biases in state-of-the-art and open-source LMs such as LLaMA-2 and OPT. We also implement distributed PCGU for large models. It is empirically shown, through quantitative and qualitative analyses, that the Negation via Task Vector method outperforms PCGU in debiasing with minimum deterioration in performance and perplexity of the models. On LLaMA-2 7B, Negation via Task Vector reduces the bias score by 11.8%.
Submitted 19 June, 2024;
originally announced June 2024.
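To make "Negation via Task Vector" in the abstract above more concrete, here is a minimal sketch of the usual task-arithmetic formulation: a task vector is the difference between fine-tuned and base weights, and negation subtracts a scaled copy of it from the base model. Whether the paper uses exactly this scaling is an assumption; the `Params` alias, the function names, the `scale` argument, and the toy weights are all hypothetical, and a real model would use tensors rather than Python lists.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat weights.
Params = Dict[str, List[float]]


def task_vector(base: Params, finetuned: Params) -> Params:
    """Task vector tau = theta_finetuned - theta_base, computed per parameter."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def negate_task_vector(base: Params, finetuned: Params, scale: float = 1.0) -> Params:
    """Return theta_base - scale * tau, moving away from the fine-tuned behavior."""
    tau = task_vector(base, finetuned)
    return {
        name: [b - scale * t for b, t in zip(base[name], tau[name])]
        for name in base
    }


# Toy example: "biased" plays the role of a copy fine-tuned on biased text.
base = {"w": [1.0, 2.0]}
biased = {"w": [1.5, 1.0]}
debiased = negate_task_vector(base, biased, scale=1.0)
print(debiased)
```

In this formulation the only tunable quantity is the scale, which trades how far the weights move away from the undesired behavior against how much of the base model's general performance is retained.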
-
Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video
Authors:
Xiangming Zhu,
Huayu Deng,
Haochen Yuan,
Yunbo Wang,
Xiaokang Yang
Abstract:
We introduce latent intuitive physics, a transfer learning framework for physics simulation that can infer hidden properties of fluids from a single 3D video and simulate the observed fluid in novel scenes. Our key insight is to use latent features drawn from a learnable prior distribution conditioned on the underlying particle states to capture the invisible and complex physical properties. To achieve this, we train a parametrized prior learner given visual observations to approximate the visual posterior of inverse graphics, and both the particle states and the visual posterior are obtained from a learned neural renderer. The converged prior learner is embedded in our probabilistic physics engine, allowing us to perform novel simulations on unseen geometries, boundaries, and dynamics without knowledge of the true physical parameters. We validate our model in three ways: (i) novel scene simulation with the learned visual-world physics, (ii) future prediction of the observed fluid dynamics, and (iii) supervised particle simulation. Our model demonstrates strong performance in all three tasks.
Submitted 18 June, 2024;
originally announced June 2024.
-
DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence?
Authors:
Zhouhong Gu,
Lin Zhang,
Xiaoxuan Zhu,
Jiangjie Chen,
Wenhao Huang,
Yikai Zhang,
Shusen Wang,
Zheyu Ye,
Yan Gao,
Hongwei Feng,
Yanghua Xiao
Abstract:
Detecting evidence within the context is a key step in reasoning tasks. Evaluating and enhancing the capabilities of LLMs in evidence detection will strengthen context-based reasoning performance. This paper proposes a benchmark called DetectBench for verifying the ability to detect and piece together implicit evidence within a long context. DetectBench contains 3,928 multiple-choice questions, with an average of 994 tokens per question. Each question contains an average of 4.55 pieces of implicit evidence, and solving the problem typically requires 7.62 logical jumps to find the correct answer. To enhance the performance of LLMs in evidence detection, this paper proposes Detective Reasoning Prompt and Finetune. Experiments demonstrate that the abilities of existing LLMs to detect evidence in long contexts are far inferior to those of humans. However, the Detective Reasoning Prompt effectively enhances the capability of powerful LLMs in evidence detection, while the Finetuning method shows significant effects in enhancing the performance of weaker LLMs. Moreover, when the abilities of LLMs in evidence detection are improved, their final reasoning performance is also enhanced accordingly.
Submitted 18 June, 2024;
originally announced June 2024.
-
Diverse Responses in Lattice Thermal Conductivity of $n$-type/$p$-type Semiconductors Driven by Asymmetric Electron-Phonon Interactions
Authors:
Jianshi Sun,
Shouhang Li,
Zhen Tong,
Cheng Shao,
Han Xie,
Meng An,
Chuang Zhang,
Xiongfei Zhu,
Chen Huang,
Yucheng Xiong,
Xiangjun Liu
Abstract:
Accurately assessing the impact of electron-phonon interaction (EPI) on the lattice thermal conductivity of semiconductors is crucial for the thermal management of electronic devices and a unified physical understanding of this issue is highly desired. In this work, we predict the lattice thermal conductivities of typical direct and indirect bandgap semiconductors accounting for EPI based on mode-level first-principles calculations. It is found that EPI has a larger effect on the lattice thermal conductivity of $p$-type doping compared to $n$-type doping in the same semiconductor at high charge carrier concentrations. The stronger EPI in $p$-type doping is attributed to the relatively higher electron density of states caused by the relatively larger $p$-orbital component. Furthermore, EPI has a stronger influence on the lattice thermal conductivity of $n$-type indirect bandgap semiconductors than $n$-type direct bandgap semiconductors. This is attributed to the relatively lower electron density of states in direct bandgap semiconductors stemming from the $s$-orbital component. This work reveals that there exist diverse responses in lattice thermal conductivity of $n$-type/$p$-type semiconductors, which can be attributed to asymmetric EPIs.
Submitted 17 June, 2024;
originally announced June 2024.
-
ChatEMG: Synthetic Data Generation to Control a Robotic Hand Orthosis for Stroke
Authors:
Jingxi Xu,
Runsheng Wang,
Siqi Shang,
Ava Chen,
Lauren Winterbottom,
To-Liang Hsu,
Wenxi Chen,
Khondoker Ahmed,
Pedro Leandro La Rotta,
Xinyue Zhu,
Dawn M. Nilsen,
Joel Stein,
Matei Ciocarlie
Abstract:
Intent inferral on a hand orthosis for stroke patients is challenging due to the difficulty of data collection from impaired subjects. Additionally, EMG signals exhibit significant variations across different conditions, sessions, and subjects, making it hard for classifiers to generalize. Traditional approaches require a large labeled dataset from the new condition, session, or subject to train intent classifiers; however, this data collection process is burdensome and time-consuming. In this paper, we propose ChatEMG, an autoregressive generative model that can generate synthetic EMG signals conditioned on prompts (i.e., a given sequence of EMG signals). ChatEMG enables us to collect only a small dataset from the new condition, session, or subject and expand it with synthetic samples conditioned on prompts from this new context. ChatEMG leverages a vast repository of previous data via generative training while still remaining context-specific via prompting. Our experiments show that these synthetic samples are classifier-agnostic and can improve intent inferral accuracy for different types of classifiers. We demonstrate that our complete approach can be integrated into a single patient session, including the use of the classifier for functional orthosis-assisted tasks. To the best of our knowledge, this is the first time an intent classifier trained partially on synthetic data has been deployed for functional control of an orthosis by a stroke survivor. Videos and additional information can be found at https://jxu.ai/chatemg.
Submitted 17 June, 2024;
originally announced June 2024.
-
Precision measurement of the $Ξ^-_b$ baryon lifetime
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis,
et al. (1064 additional authors not shown)
Abstract:
A sample of $pp$ collision data, corresponding to an integrated luminosity of 5.5 fb$^{-1}$ and collected by the LHCb experiment during Run 2, is used to measure the ratio of the lifetime of the $Ξ^-_b$ baryon to that of the $Λ^0_b$ baryon, $r_τ\equivτ_{Ξ^-_b}/τ_{Λ^0_b}$. The value ${r_τ^{\rm Run\,2}=1.076\pm0.013\pm0.006}$ is obtained, where the first uncertainty is statistical and the second systematic. This value is averaged with the corresponding value from Run 1 to obtain ${r_τ^{\rm Run\,1,2} = 1.078\pm0.012\pm0.007}$. Multiplying by the world-average value of the $Λ^0_b$ lifetime yields $τ_{Ξ^-_b}^{\rm Run~1,2} = 1.578\pm0.018\pm0.010\pm0.011$ ps, where the uncertainties are statistical, systematic, and due to the limited knowledge of the $Λ^0_b$ lifetime. This measurement improves the precision of the current world average of the $Ξ^-_b$ lifetime by about a factor of two, and is in good agreement with the most recent theoretical predictions.
Submitted 17 June, 2024;
originally announced June 2024.
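The arithmetic linking the quoted ratio to the quoted lifetime can be checked with a short sketch. It uses only the numbers in the abstract above. The abstract does not state the world-average $Λ^0_b$ lifetime, so the value below is the one implied by dividing the quoted lifetime by the quoted ratio (an assumption for illustration, not a cited average), and the quadrature sum assumes the three uncertainty components are independent.

```python
import math

# Combined Run 1 and Run 2 ratio and its uncertainties, as quoted in the abstract.
r_tau, r_stat, r_syst = 1.078, 0.012, 0.007

# Quoted Xi_b^- lifetime in ps with statistical, systematic, and external
# (Lambda_b lifetime) uncertainties, as quoted in the abstract.
tau_xib, tau_stat, tau_syst, tau_ext = 1.578, 0.018, 0.010, 0.011

# Assumption: the Lambda_b lifetime is not given in the abstract, so we infer
# the value implied by the quoted numbers instead of citing a world average.
tau_lb_implied = tau_xib / r_tau

# Multiplying the ratio uncertainties by the implied Lambda_b lifetime should
# reproduce the quoted statistical and systematic lifetime uncertainties.
stat_scaled = r_stat * tau_lb_implied
syst_scaled = r_syst * tau_lb_implied

# Total uncertainty, assuming the three components are independent.
total = math.sqrt(tau_stat**2 + tau_syst**2 + tau_ext**2)

print(f"implied Lambda_b lifetime: {tau_lb_implied:.3f} ps")
print(f"scaled stat: {stat_scaled:.3f} ps, scaled syst: {syst_scaled:.3f} ps")
print(f"total uncertainty: {total:.3f} ps")
```

The scaled statistical and systematic terms round to the 0.018 ps and 0.010 ps quoted in the abstract, which is consistent with the lifetime being obtained by a simple multiplication of the ratio.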
-
When Box Meets Graph Neural Network in Tag-aware Recommendation
Authors:
Fake Lin,
Ziwei Zhao,
Xi Zhu,
Da Zhang,
Shitian Shen,
Xueying Li,
Tong Xu,
Suojuan Zhang,
Enhong Chen
Abstract:
Last year witnessed a resurgence of tag-aware recommender systems supported by LLM-enriched tags. Unfortunately, despite considerable effort, current solutions may fail to describe the diversity and uncertainty inherent in user preferences with only tag-driven profiles. Recently, with the development of geometry-based techniques such as box embedding, the diversity of user preferences can be fully modeled as the range within a box in a high-dimensional space. However, a defect remains: these approaches are incapable of capturing high-order neighbor signals, i.e., semantic-rich multi-hop relations within the user-tag-item tripartite graph, which severely limits the effectiveness of user modeling. To address this challenge, we propose a novel algorithm, called BoxGNN, that performs message aggregation via a combination of logical operations, thereby incorporating high-order signals. Specifically, we first embed users, items, and tags as hyper-boxes rather than simple points in the representation space, and define two logical operations to facilitate the subsequent process. Next, we perform message aggregation via the combination of logical operations to obtain the corresponding high-order box representations. Finally, we adopt a volume-based learning objective with Gumbel smoothing techniques to refine the box representations. Extensive experiments on two publicly available datasets and one LLM-enhanced e-commerce dataset validate the superiority of BoxGNN over various state-of-the-art baselines. The code is released online.
Submitted 17 June, 2024;
originally announced June 2024.
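The box-embedding idea in the abstract above can be illustrated with a minimal sketch. This is not the BoxGNN implementation: the abstract does not name its two logical operations, so intersection and a union-like enclosing box are assumed here, and a softplus-smoothed side length stands in for the Gumbel smoothing the authors mention. All function names and the `beta` parameter are hypothetical.

```python
import math

# A box is a pair (lower_corner, upper_corner) of equal-length float lists.

def intersect(a, b):
    """Intersection box: elementwise max of lower corners, min of upper corners."""
    lo = [max(x, y) for x, y in zip(a[0], b[0])]
    hi = [min(x, y) for x, y in zip(a[1], b[1])]
    return lo, hi

def enclose(a, b):
    """Smallest box containing both inputs (an assumed union-like operation)."""
    lo = [min(x, y) for x, y in zip(a[0], b[0])]
    hi = [max(x, y) for x, y in zip(a[1], b[1])]
    return lo, hi

def soft_volume(box, beta=0.1):
    """Smoothed volume: softplus of each side length, scaled by beta.

    Disjoint boxes (negative side lengths) keep a small positive volume,
    so a volume-based objective still yields a usable training signal.
    """
    lo, hi = box
    vol = 1.0
    for l, h in zip(lo, hi):
        vol *= beta * math.log1p(math.exp((h - l) / beta))
    return vol

# Hypothetical 2-D boxes for one user and one tag.
user = ([0.0, 0.0], [2.0, 2.0])
tag = ([1.0, 1.0], [3.0, 3.0])

overlap = intersect(user, tag)
# Fraction of the tag box covered by the user box, as a simple match score.
score = soft_volume(overlap) / soft_volume(tag)
print(overlap, round(score, 3))
```

Representing a user as a box rather than a point lets the overlap volume express how broad or narrow a preference is, which is the property the abstract attributes to box embeddings.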