-
SoupLM: Model Integration in Large Language and Multi-Modal Models
Authors:
Yue Bai,
Zichen Zhang,
Jiasen Lu,
Yun Fu
Abstract:
Training large language models (LLMs) and multimodal LLMs necessitates significant computing resources, and existing publicly available LLMs are typically pre-trained on diverse, privately curated datasets spanning various tasks. For instance, LLaMA, Vicuna, and LLaVA are three LLM variants trained with LLaMA base models using very different training recipes, tasks, and data modalities. The training cost and complexity for such LLM variants grow rapidly. In this study, we propose to use a soup strategy to assemble these LLM variants into a single well-generalized multimodal LLM (SoupLM) in a cost-efficient manner. Assembling these LLM variants efficiently integrates the knowledge and specialities learned from different domains and data modalities into a single model (e.g., chatbot speciality from user-shared conversations for Vicuna, and visual capacity from vision-language data for LLaVA), thereby avoiding the computing costs of repetitive training on several different domains. We propose a series of soup strategies to systematically benchmark performance gains across various configurations, and probe the soup behavior across base models in the interpolation space.
Submitted 11 July, 2024;
originally announced July 2024.
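The soup idea in this abstract can be illustrated with plain linear interpolation of parameters between two models that share a base architecture. The abstract does not spell out the specific soup strategies, so this is only a minimal sketch under that assumption; `soup_weights`, the toy parameter dictionaries, and the mixing coefficient `alpha` are hypothetical names introduced for illustration.

```python
from typing import Dict, List

StateDict = Dict[str, List[float]]


def soup_weights(a: StateDict, b: StateDict, alpha: float) -> StateDict:
    """Return alpha * a + (1 - alpha) * b, parameter by parameter."""
    if a.keys() != b.keys():
        raise ValueError("models must share the same parameter names")
    souped: StateDict = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise linear interpolation of the two parameter vectors.
        souped[name] = [alpha * x + (1.0 - alpha) * y
                        for x, y in zip(a[name], b[name])]
    return souped


# Toy stand-ins for two fine-tuned variants of one base model (made-up values).
chat_variant = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
vision_variant = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
souped = soup_weights(chat_variant, vision_variant, alpha=0.5)
```

Sweeping `alpha` from 0 to 1 would trace one line through the interpolation space between the two variants; a per-layer or learned coefficient would be a natural extension, though whether SoupLM does this is not stated in the abstract.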
-
Spectroastrometry and Reverberation Mapping (SARM) of Active Galactic Nuclei. I. The H$β$ Broad-line Region Structure and Black Hole Mass of Five Quasars
Authors:
Yan-Rong Li,
Chen Hu,
Zhu-Heng Yao,
Yong-Jie Chen,
Hua-Rui Bai,
Sen Yang,
Pu Du,
Feng-Na Fang,
Yi-Xin Fu,
Jun-Rong Liu,
Yue-Chang Peng,
Yu-Yang Songsheng,
Yi-Lin Wang,
Ming Xiao,
Shuo Zhai,
Hartmut Winkler,
Jin-Ming Bai,
Luis C. Ho,
Romain G. Petrov,
Jesus Aceituno,
Jian-Min Wang
Abstract:
We conduct a reverberation mapping (RM) campaign to spectroscopically monitor a sample of selected bright active galactic nuclei with large anticipated broad-line region (BLR) sizes adequate for spectroastrometric observations by the GRAVITY instrument on the Very Large Telescope Interferometer. We report the first results for five objects, IC 4329A, Mrk 335, Mrk 509, Mrk 1239, and PDS 456, among which Mrk 1239 and PDS 456 are for the first time spectroscopically monitored. We obtain multi-year monitoring data and perform multi-component spectral decomposition to extract the broad H$β$ profiles. We detect significant time lags between the H$β$ and continuum variations, generally obeying the previously established BLR size-luminosity relation. Velocity-resolved H$β$ time lags illustrate diverse, possibly evolving BLR kinematics. We further measure the H$β$ line widths from mean and rms spectra and the resulting virial products show good consistency among different seasons. Adopting a unity virial factor and the full width at half maximum of the broad H$β$ line from the mean spectrum as the measure of velocity, the obtained black hole mass averaged over seasons is $\log M_\bullet/M_\odot=8.02_{-0.14}^{+0.09}$, $6.92_{-0.12}^{+0.12}$, $8.01_{-0.25}^{+0.16}$, $7.44_{-0.14}^{+0.13}$, and $8.59_{-0.11}^{+0.07}$ for the five objects, respectively. The black hole mass estimations using other line width measures are also reported (up to the virial factors). For objects with previous RM campaigns, our mass estimates are in agreement with earlier results. In a companion paper, we will employ BLR dynamical modeling to directly infer the black hole mass and thereby determine the virial factors.
Submitted 10 July, 2024;
originally announced July 2024.
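For reference, the virial product and black hole mass quoted in this abstract follow the standard reverberation-mapping relation. The notation below is introduced here for illustration and does not appear in the abstract: $\tau_{\mathrm{H}\beta}$ is the measured H$β$ time lag, $V$ the chosen line-width measure (here the FWHM from the mean spectrum), and $f$ the virial factor, set to unity in the quoted masses.

```latex
M_\bullet = f\,\frac{c\,\tau_{\mathrm{H}\beta}\,V^{2}}{G},
\qquad
\text{virial product} \equiv \frac{c\,\tau_{\mathrm{H}\beta}\,V^{2}}{G}
```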
-
Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$
Authors:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (645 additional authors not shown)
Abstract:
The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15σ$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes.
Submitted 10 July, 2024;
originally announced July 2024.
-
PerlDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models
Authors:
Jinhua Zhang,
Hualian Sheng,
Sijia Cai,
Bing Deng,
Qiao Liang,
Wen Li,
Ying Fu,
Jieping Ye,
Shuhang Gu
Abstract:
Controllable generation is considered a potentially vital approach to address the challenge of annotating 3D data, and the precision of such controllable generation becomes particularly imperative in the context of data production for autonomous driving. Existing methods focus on the integration of diverse generative information into controlling inputs, utilizing frameworks such as GLIGEN or ControlNet, to produce commendable outcomes in controllable generation. However, such approaches intrinsically restrict generation performance to the learning capacities of predefined network architectures. In this paper, we explore the integration of controlling information and introduce PerlDiff (Perspective-Layout Diffusion Models), a method for effective street view image generation that fully leverages perspective 3D geometric information. PerlDiff employs 3D geometric priors to guide the generation of street view images with precise object-level control within the network learning process, resulting in a more robust and controllable output. Moreover, it demonstrates superior controllability compared to alternative layout control methods. Empirical results show that PerlDiff markedly enhances the precision of generation on the NuScenes and KITTI datasets. Our code and models are publicly available at https://github.com/LabShuHangGU/PerlDiff.
Submitted 8 July, 2024;
originally announced July 2024.
-
MVGT: A Multi-view Graph Transformer Based on Spatial Relations for EEG Emotion Recognition
Authors:
Yanjie Cui,
Xiaohong Liu,
Jing Liang,
Yamin Fu
Abstract:
Electroencephalography (EEG), a medical imaging technique that captures scalp electrical activity of brain structures via electrodes, has been widely used in affective computing. The spatial domain of EEG is rich in affective information. However, few existing studies have simultaneously analyzed EEG signals from the multiple perspectives of geometric and anatomical structures in the spatial domain. In this paper, we propose a multi-view Graph Transformer (MVGT) based on spatial relations, which integrates information from the temporal, frequency, and spatial domains, including geometric and anatomical structures, so as to comprehensively enhance the expressive power of the model. We incorporate the spatial information of EEG channels into the model as an encoding, thereby improving its ability to perceive the spatial structure of the channels. Experimental results on publicly available datasets demonstrate that our proposed model outperforms recent state-of-the-art methods. In addition, the results show that the MVGT can effectively extract information from multiple domains and capture inter-channel relationships in EEG emotion recognition tasks.
Submitted 8 July, 2024; v1 submitted 3 July, 2024;
originally announced July 2024.
-
Measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10\,087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the BESIII detector at the BEPCII storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be $\mathcal{B}(J/ψ\to p \bar{p} η(η\to γγ)) = (1.480 \pm 0.001 \pm 0.024)\times\,10^{-3}$ and $\mathcal{B}(J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)) = (1.557 \pm 0.003 \pm 0.038)\times\,10^{-3}$, where the first uncertainties are statistical and the second systematic. Both results are compatible within their uncorrelated systematic uncertainties. The combined result is $\mathcal{B}(J/ψ\to p \bar{p} η)=(1.495 \pm 0.001 \pm 0.023)\times\,10^{-3}$ where the first uncertainty is the combined statistical uncertainty and the second one the combined systematic uncertainty of both analyses, incorporating correlations between them. In addition, the $p \bar{p}$ threshold region is investigated for a potential threshold enhancement, and no evidence for one is observed.
Submitted 3 July, 2024;
originally announced July 2024.
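To make the idea of combining the two channel measurements concrete, the sketch below computes a naive inverse-variance weighted mean of the two branching fractions quoted in this abstract. It assumes all uncertainties are uncorrelated, which the abstract says is not the case for the published combination, so it is not expected to reproduce the quoted combined value; the function name `combine` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def combine(values: Sequence[float],
            stat: Sequence[float],
            syst: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted mean, treating all uncertainties as uncorrelated."""
    # Total variance per measurement: statistical and systematic added in quadrature.
    variances = [s ** 2 + y ** 2 for s, y in zip(stat, syst)]
    weights = [1.0 / v for v in variances]
    mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    sigma = math.sqrt(1.0 / sum(weights))
    return mean, sigma


# Branching fractions in units of 1e-3, taken from the abstract
# (eta -> gamma gamma channel first, eta -> pi+ pi- pi0 channel second).
mean, sigma = combine([1.480, 1.557], stat=[0.001, 0.003], syst=[0.024, 0.038])
```

A full treatment would replace the per-measurement variances with a covariance matrix that includes the correlated systematic terms, which is presumably what the collaboration's combination does.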
-
Inducing superconductivity in quantum anomalous Hall regime
Authors:
Yu Huang,
Yu Fu,
Peng Zhang,
Kang L. Wang,
Qing Lin He
Abstract:
Interfacing the quantum anomalous Hall insulator with a conventional superconductor is known to be a promising route to realizing a topological superconductor, which has been continuously pursued for years. Such a proximity route depends to a great extent on the control of the delicate interfacial coupling of the two constituents. However, a recent experiment reported the failure to reproduce such a topological superconductor, which is ascribed to the neglect of the electrical short by the superconductor in the theoretical proposal. Here, we reproduce this topological superconductor with attention to the interface control. The resulting conductance matrix under a wide magnetic field range agrees with the fingerprint of this topological superconductor. This allows us to develop a phase diagram that unveils three regions parameterized by various coupling limits, which not only supports the feasibility of fabricating the topological superconductor by proximity but also fully explains the origin of the previous debate. The present work provides a comprehensible guide on fabricating the topological superconductor.
Submitted 2 July, 2024;
originally announced July 2024.
-
HC-GLAD: Dual Hyperbolic Contrastive Learning for Unsupervised Graph-Level Anomaly Detection
Authors:
Yali Fu,
Jindong Li,
Jiahong Liu,
Qianli Xing,
Qi Wang,
Irwin King
Abstract:
Unsupervised graph-level anomaly detection (UGAD) has garnered increasing attention in recent years due to its significance. However, most existing methods rely only on traditional graph neural networks to explore pairwise relationships, but such pairwise edges are not enough to describe multifaceted relationships involving anomalies. There is an urgent need to exploit node group information, which plays a crucial role in UGAD. In addition, most previous works ignore the global underlying properties (e.g., hierarchy and power-law structure) which are common in real-world graph datasets and are therefore indispensable factors in the UGAD task. In this paper, we propose a novel Dual Hyperbolic Contrastive Learning for Unsupervised Graph-Level Anomaly Detection (HC-GLAD for short). To exploit node group connections, we construct hypergraphs based on gold motifs and subsequently perform hypergraph convolution. Furthermore, to preserve the hierarchy of real-world graphs, we introduce hyperbolic geometry into this field and conduct both graph and hypergraph embedding learning in hyperbolic space with the hyperboloid model. To the best of our knowledge, this is the first work to simultaneously apply hypergraphs with node group connections and hyperbolic geometry in this field. Extensive experiments on several real-world datasets from different fields demonstrate the superiority of HC-GLAD on the UGAD task. The code is available at https://github.com/Yali-F/HC-GLAD.
Submitted 2 July, 2024;
originally announced July 2024.
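The hyperboloid model mentioned in this abstract can be illustrated with its distance function. The sketch below assumes the unit hyperboloid (curvature -1) and uses the standard Lorentzian inner product; it is not taken from the HC-GLAD code, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def lift_to_hyperboloid(v: Sequence[float]) -> List[float]:
    """Map a Euclidean vector v to the point (sqrt(1 + |v|^2), v) on the unit hyperboloid."""
    x0 = math.sqrt(1.0 + sum(c * c for c in v))
    return [x0, *v]


def lorentz_inner(x: Sequence[float], y: Sequence[float]) -> float:
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def hyperboloid_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Geodesic distance arccosh(-<x, y>_L) on the unit hyperboloid."""
    # Clamp to 1.0 so rounding error cannot push the argument outside acosh's domain.
    return math.acosh(max(1.0, -lorentz_inner(x, y)))


origin = lift_to_hyperboloid([0.0, 0.0])
p = lift_to_hyperboloid([0.3, -0.2])
q = lift_to_hyperboloid([1.0, 0.5])
d_pq = hyperboloid_distance(p, q)
```

Distances grow roughly logarithmically with Euclidean norm far from the origin, which is the property that makes hyperbolic space a natural fit for tree-like, hierarchical graphs.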
-
Investigating the Effects of Large-Scale Pseudo-Stereo Data and Different Speech Foundation Model on Dialogue Generative Spoken Language Model
Authors:
Yu-Kuan Fu,
Cheng-Kuang Lee,
Hsiu-Hsuan Wang,
Hung-yi Lee
Abstract:
Recent efforts in Spoken Dialogue Modeling aim to synthesize spoken dialogue without the need for direct transcription, thereby preserving the wealth of non-textual information inherent in speech. However, this approach faces a challenge when speakers talk simultaneously, requiring stereo dialogue data with speakers recorded on separate channels, a notably scarce resource. To address this, we have developed an innovative pipeline capable of transforming single-channel dialogue data into pseudo-stereo data. This expanded our training dataset from a mere 2,000 to an impressive 17,600 hours, significantly enriching the diversity and quality of the training examples available. The inclusion of this pseudo-stereo data has proven to be effective in improving the performance of spoken dialogue language models. Additionally, we explored the use of discrete units of different speech foundation models for spoken dialogue generation.
Submitted 1 July, 2024;
originally announced July 2024.
-
MG-Verilog: Multi-grained Dataset Towards Enhanced LLM-assisted Verilog Generation
Authors:
Yongan Zhang,
Zhongzhi Yu,
Yonggan Fu,
Cheng Wan,
Yingyan Celine Lin
Abstract:
Large Language Models (LLMs) have recently shown promise in streamlining hardware design processes by encapsulating vast amounts of domain-specific data. In addition, they allow users to interact with the design processes through natural language instructions, thus making hardware design more accessible to developers. However, effectively leveraging LLMs in hardware design necessitates providing domain-specific data during inference (e.g., through in-context learning), fine-tuning, or pre-training. Unfortunately, existing publicly available hardware datasets are often limited in size, complexity, or detail, which hinders the effectiveness of LLMs in hardware design tasks. To address this issue, we first propose a set of criteria for creating high-quality hardware datasets that can effectively enhance LLM-assisted hardware design. Based on these criteria, we propose a Multi-Grained-Verilog (MG-Verilog) dataset, which encompasses descriptions at various levels of detail and corresponding code samples. To benefit the broader hardware design community, we have developed an open-source infrastructure that facilitates easy access, integration, and extension of the dataset to meet specific project needs. Furthermore, to fully exploit the potential of the MG-Verilog dataset, which varies in complexity and detail, we introduce a balanced fine-tuning scheme. This scheme serves as a unique use case to leverage the diverse levels of detail provided by the dataset. Extensive experiments demonstrate that the proposed dataset and fine-tuning scheme consistently improve the performance of LLMs in hardware design tasks.
Submitted 3 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
Data on the Move: Traffic-Oriented Data Trading Platform Powered by AI Agent with Common Sense
Authors:
Yi Yu,
Shengyue Yao,
Tianchen Zhou,
Yexuan Fu,
Jingru Yu,
Ding Wang,
Xuhong Wang,
Cen Chen,
Yilun Lin
Abstract:
In the digital era, data has become a pivotal asset, advancing technologies such as autonomous driving. Despite this, data trading faces challenges like the absence of robust pricing methods and the lack of trustworthy trading mechanisms. To address these challenges, we introduce a traffic-oriented data trading platform named Data on The Move (DTM), integrating traffic simulation, data trading, and Artificial Intelligence (AI) agents. The DTM platform supports evidence-based data value evaluation and AI-based trading mechanisms. Leveraging the common sense capabilities of Large Language Models (LLMs) to assess traffic state and data value, DTM can determine reasonable traffic data pricing through multi-round interaction and simulations. Moreover, DTM provides pricing method validation by simulating traffic systems, multi-agent interactions, and the heterogeneity and irrational behaviors of individuals in the trading market. Within the DTM platform, entities such as connected vehicles and traffic light controllers can engage in information collection, data pricing, trading, and decision-making. Simulation results demonstrate that our proposed AI agent-based pricing approach enhances data trading by offering rational prices, as evidenced by the observed improvement in traffic efficiency. This underscores the effectiveness and practical value of DTM, offering new perspectives for the evolution of data markets and smart cities. To the best of our knowledge, this is the first study employing LLMs in data pricing and a pioneering data trading practice in the field of intelligent vehicles and smart cities.
Submitted 1 July, 2024;
originally announced July 2024.
-
Multi-field quantum conferencing overcomes the network capacity limit
Authors:
Yuan-Mei Xie,
Yu-Shuo Lu,
Yao Fu,
Hua-Lei Yin,
Zeng-Bing Chen
Abstract:
Quantum conferencing enables multiple nodes within a quantum network to share a secure group key for private message broadcasting. The key rate, however, is limited by the repeaterless capacity to distribute multiparticle entangled states across the network. Currently, in the finite-size regime, no feasible schemes utilizing existing experimental techniques can overcome the fundamental rate-distance limit of quantum conferencing in quantum networks without repeaters. Here, we propose a practical, multi-field scheme that breaks this limit, involving virtually establishing Greenberger-Horne-Zeilinger states through post-measurement coincidence matching. This proposal features a measurement-device-independent characteristic and can directly scale to support any number of users. Simulations show that the fundamental limitation on the group key rate can be overcome in a reasonable running time of sending $10^{14}$ pulses. We predict that it offers an efficient design for long-distance broadcast communication in future quantum networks.
Submitted 30 June, 2024;
originally announced July 2024.
-
Frequency-resolved Raman Thermometry Analysis via a Multi-layer Heat Transfer Model for Bulk and Low-dimensional Materials
Authors:
Taocheng Yu,
Yilu Fu,
Chenguang Fu,
Tiejun Zhu,
Wee-Liat Ong
Abstract:
Raman thermometry is advantageous for measuring the thermal transport of low-dimensional materials due to its non-contact nature. Transient Raman methods have improved the accuracy of steady-state Raman thermometry by removing the need for accurate temperature calibration and laser absorption evaluation. However, current methods often resort to finite element analysis (FEA) to decipher the measured signals. This step is time-consuming and impedes widespread adoption of the technique. In this work, we replace the FEA by fitting the transient-state Raman signal to a three-dimensional (3D) analytical heat transfer model for measuring the thermal conductivity of two bulk layered materials [i.e., molybdenum disulfide (MoS2) and bismuth selenide (Bi2Se3) crystals] and the interfacial thermal conductance (h) of CVD-grown MoS2 and molybdenum diselenide (MoSe2) on quartz (SiO2). Our measured results agree reasonably well with literature values and theoretical calculations. We also performed a quantitative sensitivity analysis to give insights into how to improve the measurement sensitivity. Our work provides an efficient way to process the data of transient-based Raman thermometry for high-throughput measurements.
Submitted 30 June, 2024;
originally announced July 2024.
-
Imaging of single barium atoms in a second matrix site in solid xenon for barium tagging in a $^{136}$Xe double beta decay experiment
Authors:
M. Yvaine,
D. Fairbank,
J. Soderstrom,
C. Taylor,
J. Stanley,
T. Walton,
C. Chambers,
A. Iverson,
W. Fairbank,
S. Al Kharusi,
A. Amy,
E. Angelico,
A. Anker,
I. J. Arnquist,
A. Atencio,
J. Bane,
V. Belov,
E. P. Bernard,
T. Bhatta,
A. Bolotnikov,
J. Breslin,
P. A. Breur,
J. P. Brodsky,
E. Brown,
T. Brunner
, et al. (112 additional authors not shown)
Abstract:
Neutrinoless double beta decay is one of the most sensitive probes for new physics beyond the Standard Model of particle physics. One of the isotopes under investigation is $^{136}$Xe, which would double beta decay into $^{136}$Ba. Detecting the single $^{136}$Ba daughter provides a sort of ultimate tool in the discrimination against backgrounds. Previous work demonstrated the ability to perform single atom imaging of Ba atoms in a single-vacancy site of a solid xenon matrix. In this paper, the effort to identify signal from individual barium atoms is extended to Ba atoms in a hexa-vacancy site in the matrix and is achieved despite increased photobleaching in this site. Abrupt fluorescence turn-off of a single Ba atom is also observed. Significant recovery of fluorescence signal lost through photobleaching is demonstrated upon annealing of Ba deposits in the Xe ice. Following annealing, it is observed that Ba atoms in the hexa-vacancy site exhibit antibleaching while Ba atoms in the tetra-vacancy site exhibit bleaching. This may be evidence for a matrix site transfer upon laser excitation. Our findings offer a path of continued research toward tagging of Ba daughters in all significant sites in solid xenon.
Submitted 28 June, 2024;
originally announced July 2024.
-
Improved measurement of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential decay rate of $D^+_s\to K^0 e^+ν_e$ to be $f^{K^0}_+(0)=0.636\pm0.049\pm0.013$. For both measurements, the first uncertainty is statistical and the second systematic. The branching fraction and form factor measurements are factors of 1.6 and 1.7 more precise than the previous world averages, respectively.
Submitted 27 June, 2024;
originally announced June 2024.
-
FFN: a Fine-grained Chinese-English Financial Domain Parallel Corpus
Authors:
Yuxin Fu,
Shijing Si,
Leyi Mai,
Xi-ang Li
Abstract:
Large Language Models (LLMs) have stunningly advanced the field of machine translation, though their effectiveness within the financial domain remains largely underexplored. To probe this issue, we constructed a fine-grained Chinese-English parallel corpus of financial news called FFN. We acquired financial news articles published from January 1, 2014, to December 31, 2023, from mainstream media websites such as CNN, FOX, and China Daily. The dataset consists of 1,013 main texts and 809 titles, all of which have been manually corrected. We measured the translation quality of two LLMs -- ChatGPT and ERNIE-bot, utilizing BLEU, TER and chrF scores as the evaluation metrics. For comparison, we also trained an OpenNMT model based on our dataset. We detail problems of LLMs and provide in-depth analysis, intending to stimulate further research and solutions in this largely uncharted territory. Our research underlines the need to optimize LLMs within the specific field of financial translation to ensure accuracy and quality.
Submitted 26 June, 2024;
originally announced June 2024.
-
Microscopic characteristics of SF6 partial discharge induced by a floating linear metal particle
Authors:
Zihao Feng,
Yuanyuan Jiang,
Liyang Zhang,
Zhigang Liu,
Kai Wang,
Xinxin Wang,
Xiaobing Zou,
Haiyun Luo,
Yangyang Fu
Abstract:
Direct current (DC) gas insulated transmission lines (GILs) have been widely used in power transmission, but might be threatened by partial discharge due to the presence of floating impurities (e.g., dust and metal particles) inside the sealed chamber. In this letter, by using a 2D fluid model we characterize the microscopic properties of the partial discharge induced by a floating linear metal particle in SF6 (both the discharge propagation and interaction between space charge and metal particle) under negative high voltage direct current (HVDC) conditions. Due to the strong electronegativity of SF6, the spatiotemporal distributions of the charged species (electrons, positive and negative ions), space charge, and reduced electric field are rather different from those in air. Notably, a negative ion region is observed around the top tip of the metal particle, and it plays an important role in the generation and propagation of primary and secondary streamers in SF6, which may lead to severe motion characteristics of the particle and aliasing of partial discharge signals. Additionally, we analyze the charging process and electric force reversal phenomenon, which may provide a more precise understanding of the underlying mechanisms of the firefly motion previously reported for DC GILs.
Submitted 26 June, 2024;
originally announced June 2024.
-
EFCNet: Every Feature Counts for Small Medical Object Segmentation
Authors:
Lingjie Kong,
Qiaoling Wei,
Chengming Xu,
Han Chen,
Yanwei Fu
Abstract:
This paper explores the segmentation of very small medical objects with significant clinical value. While Convolutional Neural Networks (CNNs), particularly UNet-like models, and recent Transformers have shown substantial progress in image segmentation, our empirical findings reveal their poor performance in segmenting the small medical objects and lesions concerned in this paper. This limitation may be attributed to information loss during their encoding and decoding process. In response to this challenge, we propose a novel model named EFCNet for small object segmentation in medical images. Our model incorporates two modules: the Cross-Stage Axial Attention Module (CSAA) and the Multi-Precision Supervision Module (MPS). These modules address information loss during encoding and decoding procedures, respectively. Specifically, CSAA integrates features from all stages of the encoder to adaptively learn suitable information needed in different decoding stages, thereby reducing information loss in the encoder. On the other hand, MPS introduces a novel multi-precision supervision mechanism to the decoder. This mechanism prioritizes attention to low-resolution features in the initial stages of the decoder, mitigating information loss caused by subsequent convolution and sampling processes and enhancing the model's global perception. We evaluate our model on two benchmark medical image datasets. The results demonstrate that EFCNet significantly outperforms previous segmentation methods designed for both medical and normal images.
Submitted 26 June, 2024;
originally announced June 2024.
-
Measurement of the cross sections of $e^+e^-\to K^{-}\barΞ^{+}Λ/Σ^{0}$ at center-of-mass energies between 3.510 and 4.914 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914 GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$, evidence for $ψ(4160) \to K^{-}\barΞ^{+}Λ$ is found for the first time with a significance of 4.4$σ$, including systematic uncertainties. No evidence for other possible resonances is found. In addition, the products of electronic partial width and branching fraction for all assumed resonances decaying into $K^{-}\barΞ^{+}Λ/Σ^{0}$ are determined.
Submitted 26 June, 2024;
originally announced June 2024.
-
Measurements of $K_S^0$-$K_L^0$ asymmetries in the decays $Λ_c^+ \to pK_{L,S}^0$, $pK_{L,S}^0π^+π^-$ and $pK_{L,S}^0π^0$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0.04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, and $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^0)=(2.02 \pm 0.13 \pm 0.05)\%$, where the first uncertainties are statistical and the second systematic. Combining with the known branching fractions of $Λ_c^+ \to pK_{S}^{0}$, $Λ_c^+ \to pK_{S}^{0}π^+π^-$, and $Λ_c^+ \to pK_{S}^{0}π^0$, we present the first measurements of the $K_{S}^{0}$-$K_{L}^{0}$ asymmetries $R(Λ_c^+, K_{S,L}^0X) = \frac{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) - \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) + \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}$ in charmed baryon decays: $R(Λ_c^+, pK_{S,L}^0) = -0.025 \pm 0.031$, $R(Λ_c^+, pK_{S,L}^0π^+π^-) = -0.027 \pm 0.048$, and $R(Λ_c^+, pK_{S,L}^0π^0) = -0.015 \pm 0.046$. No significant asymmetries within the uncertainties are observed.
Submitted 26 June, 2024;
originally announced June 2024.
-
Study of the $f_{0}(980)$ through the decay $D_{s}^{+}\rightarrow π^{+}π^{+}π^{-}π^{0}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (649 additional authors not shown)
Abstract:
We perform the first amplitude analysis of $D^+_s \to π^+π^+π^-π^0$ decays, based on data samples of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of 7.33~fb$^{-1}$. We report the observation of $D_{s}^{+} \to f_0(980)ρ(770)^{+}$ with a statistical significance greater than 10$σ$ and determine the branching fractions $\mathcal{B}(D_s^+\toπ^+π^+π^-π^0|_{{\rm non}-η})=(2.04\pm0.08_{\rm stat.}\pm0.05_{\rm syst.})\%$ and $\mathcal{B}(D_s^+\toηπ^+)=(1.56\pm0.09_{\rm stat.}\pm0.04_{\rm syst.})\%$. Moreover, we measure the relative branching fraction between $φ\toπ^+π^-π^0$ and $φ\to K^+K^-$ to be $\frac{\mathcal{B}(φ(1020) \to π^+π^-π^0)}{\mathcal{B}(φ(1020) \to K^+K^-)}=0.230 \pm 0.014_{\rm stat.} \pm 0.010_{\rm syst.}$, which deviates from the world average value by more than $4σ$.
Submitted 25 June, 2024;
originally announced June 2024.
-
Efficient source-independent quantum conference key agreement
Authors:
Yu Bao,
Yi-Ran Xiao,
Yu-Chen Song,
Yao Fu,
Xiao-Yu Cao,
Hua-Lei Yin,
Zeng-Bing Chen
Abstract:
Quantum conference key agreement (QCKA) enables the unconditionally secure distribution of conference keys among multiple participants. Due to challenges in high-fidelity preparation and long-distance distribution of multi-photon entanglement, entanglement-based QCKA faces severe limitations in both key rate and scalability. Here, we propose a source-independent QCKA scheme utilizing the post-matching method, feasible within the entangled photon pair distribution network. We introduce an equivalent protocol that distributes virtual multi-photon entanglement to provide an unconditional security proof even in the case of coherent attacks. For the symmetric star network, compared with the previous $n$-photon entanglement protocol, the conference key rate is improved from $O(η^{n})$ to $O(η^{2})$, where $η$ is the transmittance from the entanglement source to one participant. Simulation results show that our protocol outperforms the previous protocol by multiple orders of magnitude at intercity distances. We anticipate that our approach will demonstrate its potential in the implementation of quantum networks.
Submitted 25 June, 2024;
originally announced June 2024.
-
TRAWL: Tensor Reduced and Approximated Weights for Large Language Models
Authors:
Yiran Luo,
Het Patel,
Yu Fu,
Dawon Ahn,
Jia Chen,
Yue Dong,
Evangelos E. Papalexakis
Abstract:
Large language models (LLMs) have fundamentally transformed artificial intelligence, catalyzing recent advancements while imposing substantial environmental and computational burdens. We introduce TRAWL (Tensor Reduced and Approximated Weights for Large Language Models), a novel methodology for optimizing LLMs through tensor decomposition. TRAWL leverages diverse strategies to exploit matrices within transformer-based architectures, realizing notable performance enhancements without necessitating retraining. The most significant improvements were observed through a layer-by-layer intervention strategy, particularly when applied to fully connected weights of the final layers, yielding up to 16% enhancement in accuracy without the need for additional data or fine-tuning. These results underscore the importance of targeted and adaptive techniques in increasing the efficiency and effectiveness of large language model optimization, thereby promoting the development of more sustainable and accessible AI systems.
Submitted 25 June, 2024;
originally announced June 2024.
-
Make Graph Neural Networks Great Again: A Generic Integration Paradigm of Topology-Free Patterns for Traffic Speed Prediction
Authors:
Yicheng Zhou,
Pengfei Wang,
Hao Dong,
Denghui Zhang,
Dingqi Yang,
Yanjie Fu,
Pengyang Wang
Abstract:
Urban traffic speed prediction aims to estimate the future traffic speed for improving urban transportation services. Enormous efforts have been made to exploit Graph Neural Networks (GNNs) for modeling spatial correlations and temporal dependencies of traffic speed evolving patterns, regularized by graph topology. While achieving promising results, current traffic speed prediction methods still suffer from ignoring topology-free patterns, which cannot be captured by GNNs. To tackle this challenge, we propose a generic model for enabling the current GNN-based methods to preserve topology-free patterns. Specifically, we first develop a Dual Cross-Scale Transformer (DCST) architecture, including a Spatial Transformer and a Temporal Transformer, to preserve the cross-scale topology-free patterns and associated dynamics, respectively. Then, to further integrate both topology-regularized and topology-free patterns, we propose a distillation-style learning framework, in which the existing GNN-based methods are considered as the teacher model, and the proposed DCST architecture is considered as the student model. The teacher model injects the learned topology-regularized patterns into the student model for integrating topology-free patterns. Extensive experimental results demonstrate the effectiveness of our methods.
Submitted 24 June, 2024;
originally announced June 2024.
-
Odd Dipole Screening in Radial Inflation
Authors:
Yang Fu,
H. George E. Hentschel,
Pawandeep Kaur,
Avanish Kumar,
Itamar Procaccia
Abstract:
The inflation of an inner radial (or spherical) cavity in an amorphous solid confined in a disk (or a sphere) has served as a fruitful model case for studying the effects of plastic deformations on the mechanical response. It was shown that when the field associated with Eshelby quadrupolar charges is non-uniform, the displacement field is riddled with dipole charges that screen elasticity, reminiscent of Debye monopole screening in electrostatics. In this paper we look deeper into the screening phenomenon, taking into account the consequences of irreversibility that are associated with the breaking of chiral symmetry. We consider the equations for the displacement field in the presence of "Odd Dipole Screening", solve them analytically and compare with numerical simulations. Suggestions for how to test the theory in experiments are provided.
Submitted 27 June, 2024; v1 submitted 23 June, 2024;
originally announced June 2024.
-
Repeater-Like Asynchronous Measurement-Device-Independent Quantum Conference Key Agreement
Authors:
Yu-Shuo Lu,
Yuan-Mei Xie,
Yao Fu,
Hua-Lei Yin,
Zeng-Bing Chen
Abstract:
Quantum conference key agreement facilitates secure communication among multiple parties through multipartite entanglement and is anticipated to be an important cryptographic primitive for future quantum networks. However, the experimental complexity and low efficiency associated with the synchronous detection of multipartite entangled states have significantly hindered their practical application. In this work, we propose a measurement-device-independent conference key agreement protocol that utilizes asynchronous Greenberger-Horne-Zeilinger state measurement. This approach achieves a linear scaling of the conference key rate among multiple parties, exhibiting performance similar to that of the single-repeater scheme in quantum networks. The asynchronous measurement strategy bypasses the need for complex global phase locking technologies, concurrently extending the intercity transmission distance with composable security in the finite key regime. Additionally, our work showcases the advantages of the asynchronous pairing concept in multiparty quantum entanglement.
Submitted 22 June, 2024;
originally announced June 2024.
-
Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration
Authors:
Zhongzhi Yu,
Zheng Wang,
Yonggan Fu,
Huihong Shi,
Khalid Shaikh,
Yingyan Celine Lin
Abstract:
Attention is a fundamental component behind the remarkable achievements of large language models (LLMs). However, our current understanding of the attention mechanism, especially regarding how attention distributions are established, remains limited. Inspired by recent studies that explore the presence of an attention sink in the initial token, which receives disproportionately large attention scores despite its lack of semantic importance, this work delves deeper into this phenomenon. We aim to provide a more profound understanding of the existence of attention sinks within LLMs and to uncover ways to enhance the achievable accuracy of LLMs by directly optimizing the attention distributions, without the need for weight finetuning. Specifically, this work begins with comprehensive visualizations of the attention distributions in LLMs during inference across various inputs and tasks. Based on these visualizations, to the best of our knowledge, we are the first to discover that (1) attention sinks occur not only at the start of sequences but also within later tokens of the input, and (2) not all attention sinks have a positive impact on the achievable accuracy of LLMs. Building upon our findings, we propose a training-free Attention Calibration Technique (ACT) that automatically optimizes the attention distributions on the fly during inference in an input-adaptive manner. Extensive experiments validate that ACT consistently enhances the accuracy of various LLMs across different applications. Specifically, ACT achieves an average improvement of up to 7.30% in accuracy across different datasets when applied to Llama-30B. Our code is available at https://github.com/GATECH-EIC/ACT.
Submitted 22 June, 2024;
originally announced June 2024.
-
Search for the $e^+e^- \to φχ_{c1}(3872)$ process at BESIII
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies 4.914 and 4.946 GeV by the BESIII detector, a search for the $e^+e^- \to φχ_{c1}(3872)$ process is performed for the first time. No significant signal is observed, and the upper limits at the 90\% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction $\mathcal{B}[χ_{c1}(3872)\toπ^+π^- J/ψ]$ at 4.914 and 4.946 GeV are set to be 0.85 and 0.96 pb, respectively. These measurements provide useful information for the production of the $χ_{c1}(3872)$ at $e^+e^-$ colliders and deepen our understanding of the nature of this particle.
Submitted 21 June, 2024;
originally announced June 2024.
-
Through the Theory of Mind's Eye: Reading Minds with Multimodal Video Large Language Models
Authors:
Zhawnen Chen,
Tianchun Wang,
Yizhou Wang,
Michal Kosinski,
Xiang Zhang,
Yun Fu,
Sheng Li
Abstract:
Can large multimodal models have a human-like ability for emotional and social reasoning, and if so, how does it work? Recent research has discovered emergent theory-of-mind (ToM) reasoning capabilities in large language models (LLMs). LLMs can reason about people's mental states by solving various text-based ToM tasks that ask questions about the actors' ToM (e.g., human belief, desire, intention). However, human reasoning in the wild is often grounded in dynamic scenes across time. Thus, we consider videos a new medium for examining spatio-temporal ToM reasoning ability. Specifically, we ask explicit probing questions about videos with abundant social and emotional reasoning content. We develop a pipeline for multimodal LLM for ToM reasoning using video and text. We also enable explicit ToM reasoning by retrieving key frames for answering a ToM question, which reveals how multimodal LLMs reason about ToM.
Submitted 19 June, 2024;
originally announced June 2024.
-
Jogging the Memory of Unlearned Model Through Targeted Relearning Attack
Authors:
Shengyuan Hu,
Yiwei Fu,
Zhiwei Steven Wu,
Virginia Smith
Abstract:
Machine unlearning is a promising approach to mitigate undesirable memorization of training data in ML models. However, in this work we show that existing approaches for unlearning in LLMs are surprisingly susceptible to a simple set of targeted relearning attacks. With access to only a small and potentially loosely related set of data, we find that we can 'jog' the memory of unlearned models to reverse the effects of unlearning. We formalize this unlearning-relearning pipeline, explore the attack across three popular unlearning benchmarks, and discuss future directions and guidelines that result from our study.
Submitted 19 June, 2024;
originally announced June 2024.
-
Towards Adaptive Neighborhood for Advancing Temporal Interaction Graph Modeling
Authors:
Siwei Zhang,
Xi Chen,
Yun Xiong,
Xixi Wu,
Yao Zhang,
Yongrui Fu,
Yinglong Zhao,
Jiawei Zhang
Abstract:
Temporal Graph Networks (TGNs) have demonstrated their remarkable performance in modeling temporal interaction graphs. These works can generate temporal node representations by encoding the surrounding neighborhoods for the target node. However, an inherent limitation of existing TGNs is their reliance on fixed, hand-crafted rules for neighborhood encoding, overlooking the necessity for an adaptive and learnable neighborhood that can accommodate both personalization and temporal evolution across different timestamps. In this paper, we aim to enhance existing TGNs by introducing an adaptive neighborhood encoding mechanism. We present SEAN, a flexible plug-and-play model that can be seamlessly integrated with existing TGNs, effectively boosting their performance. To achieve this, we decompose the adaptive neighborhood encoding process into two phases: (i) representative neighbor selection, and (ii) temporal-aware neighborhood information aggregation. Specifically, we propose the Representative Neighbor Selector component, which automatically pinpoints the most important neighbors for the target node. It offers a tailored understanding of each node's unique surrounding context, facilitating personalization. Subsequently, we propose a Temporal-aware Aggregator, which synthesizes neighborhood aggregation by selectively determining the utilization of aggregation routes and decaying the outdated information, allowing our model to adaptively leverage both the contextually significant and current information during aggregation. We conduct extensive experiments by integrating SEAN into three representative TGNs, evaluating their performance on four public datasets and one financial benchmark dataset introduced in this paper. The results demonstrate that SEAN consistently leads to performance improvements across all models, achieving SOTA performance and exceptional robustness.
Submitted 14 June, 2024;
originally announced June 2024.
-
AnyMaker: Zero-shot General Object Customization via Decoupled Dual-Level ID Injection
Authors:
Lingjie Kong,
Kai Wu,
Xiaobin Hu,
Wenhui Han,
Jinlong Peng,
Chengming Xu,
Donghao Luo,
Jiangning Zhang,
Chengjie Wang,
Yanwei Fu
Abstract:
Text-to-image based object customization, aiming to generate images with the same identity (ID) as objects of interest in accordance with text prompts and reference images, has made significant progress. However, recent customization research is dominated by specialized tasks, such as human customization or virtual try-on, leaving a gap in general object customization. To this end, we introduce AnyMaker, an innovative zero-shot object customization framework capable of generating general objects with high ID fidelity and flexible text editability. The efficacy of AnyMaker stems from its novel general ID extraction, dual-level ID injection, and ID-aware decoupling. Specifically, the general ID extraction module extracts sufficient ID information with an ensemble of self-supervised models to tackle the diverse customization tasks for general objects. Then, to provide the diffusion UNet with as much of the extracted ID as possible without damaging the text editability in the generation process, we design a global-local dual-level ID injection module, in which the global-level semantic ID is injected into text descriptions while the local-level ID details are injected directly into the model through newly added cross-attention modules. In addition, we propose an ID-aware decoupling module to disentangle ID-related information from non-ID elements in the extracted representations for high-fidelity generation of both identity and text descriptions. To validate our approach and boost the research of general object customization, we create the first large-scale general ID dataset, the Multi-Category ID-Consistent (MC-IDC) dataset, with 315k text-image samples and 10k categories. Experiments show that AnyMaker presents remarkable performance in general object customization and outperforms specialized methods in corresponding tasks. Code and dataset will be released soon.
Submitted 5 July, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
Technique Report of CVPR 2024 PBDL Challenges
Authors:
Ying Fu,
Yu Li,
Shaodi You,
Boxin Shi,
Linwei Chen,
Yunhao Zou,
Zichun Wang,
Yichen Li,
Yuze Han,
Yingkai Zhang,
Jianan Wang,
Qinglin Liu,
Wei Yu,
Xiaoqian Lv,
Jianing Li,
Shengping Zhang,
Xiangyang Ji,
Yuanpei Chen,
Yuhan Zhang,
Weihang Peng,
Liwen Zhang,
Zhe Xu,
Dingyong Gou,
Cong Li,
Senyan Xu
, et al. (75 additional authors not shown)
Abstract:
The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the image formation processes to recover scene properties such as shape, reflectance, light distribution, and medium properties from images. In recent years, deep learning has shown promising improvements for various vision tasks, and when combined with physics-based vision, these approaches can enhance the robustness and accuracy of vision systems. This technical report summarizes the outcomes of the Physics-Based Vision Meets Deep Learning (PBDL) 2024 challenge, held at the CVPR 2024 workshop. The challenge consisted of eight tracks, focusing on Low-Light Enhancement and Detection as well as High Dynamic Range (HDR) Imaging. This report details the objectives, methodologies, and results of each track, highlighting the top-performing solutions and their innovative approaches.
Submitted 12 July, 2024; v1 submitted 15 June, 2024;
originally announced June 2024.
-
Radial Projections in $\mathbb{R}^n$ Revisited
Authors:
Paige Bright,
Yuqiu Fu,
Kevin Ren
Abstract:
We generalize the recent results on radial projections by Orponen, Shmerkin, and Wang using two different methods. In particular, we show that, given Borel sets $X,Y\subset \mathbb{R}^n$ with $X\neq \emptyset$, if $\dim Y \in (k,k+1]$ for some $k\in \{1,\dots, n-1\}$, then \[ \sup_{x\in X} \dim π_x(Y\setminus \{x\}) \geq \min \{\dim X + \dim Y - k, k\}. \] Our results give a new approach to solving a conjecture of Lund-Pham-Thu in all dimensions and for all ranges of $\dim Y$.
The first of our two methods for proving the above theorem is shorter, utilizing a result of the first author and Gan. Our second method, though longer, follows the original methodology of Orponen--Shmerkin--Wang, and requires a higher dimensional incidence estimate and a dual Furstenberg-set estimate for lines. These new estimates may be of independent interest.
Submitted 14 June, 2024;
originally announced June 2024.
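A worked instance may help in reading the bound stated in the abstract above. The numerical values below ($n = 2$, $\dim X = 1/2$, $\dim Y = 6/5$) are illustrative assumptions chosen for this example and are not taken from the paper; with $n = 2$ the only admissible value is $k = 1$.

```latex
% Illustrative instance (assumed values): n = 2, k = 1,
% \dim X = 1/2, \dim Y = 6/5 \in (1, 2].
\[
  \sup_{x\in X} \dim \pi_x(Y\setminus\{x\})
  \;\geq\; \min\{\dim X + \dim Y - k,\; k\}
  \;=\; \min\Bigl\{\tfrac{1}{2} + \tfrac{6}{5} - 1,\; 1\Bigr\}
  \;=\; \tfrac{7}{10}.
\]
```

In this instance the first term of the minimum is the binding one; once $\dim X + \dim Y \geq 2k$, the bound saturates at $k$.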
-
Search for $X(1870)$ via the decay $J/ψ\to ωK^+ K^-η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the $90\%$ confidence level. In addition, the branching fraction $B(J/ψ\toωK^+ K^- η)$ is measured to be $(3.33\pm0.02(\rm{stat.})\pm 0.12(\rm{syst.}))\times 10^{-4}$.
Submitted 13 June, 2024;
originally announced June 2024.
-
Adaptive Slot Attention: Object Discovery with Dynamic Slot Number
Authors:
Ke Fan,
Zechen Bai,
Tianjun Xiao,
Tong He,
Max Horn,
Yanwei Fu,
Francesco Locatello,
Zheng Zhang
Abstract:
Object-centric learning (OCL) extracts the representation of objects with slots, offering an exceptional blend of flexibility and interpretability for abstracting low-level perceptual features. A widely adopted method within OCL is slot attention, which utilizes attention mechanisms to iteratively refine slot representations. However, a major drawback of most object-centric models, including slot attention, is their reliance on predefining the number of slots. This not only necessitates prior knowledge of the dataset but also overlooks the inherent variability in the number of objects present in each instance. To overcome this fundamental limitation, we present a novel complexity-aware object auto-encoder framework. Within this framework, we introduce an adaptive slot attention (AdaSlot) mechanism that dynamically determines the optimal number of slots based on the content of the data. This is achieved by proposing a discrete slot sampling module that is responsible for selecting an appropriate number of slots from a candidate list. Furthermore, we introduce a masked slot decoder that suppresses unselected slots during the decoding process. Our framework, tested extensively on object discovery tasks with various datasets, shows performance matching or exceeding top fixed-slot models. Moreover, our analysis substantiates that our method exhibits the capability to dynamically adapt the slot number according to each instance's complexity, offering the potential for further exploration in slot attention research. Project will be available at https://kfan21.github.io/AdaSlot/
Submitted 13 June, 2024;
originally announced June 2024.
-
Blind Super-Resolution via Meta-learning and Markov Chain Monte Carlo Simulation
Authors:
Jingyuan Xia,
Zhixiong Yang,
Shengxi Li,
Shuanghui Zhang,
Yaowen Fu,
Deniz Gündüz,
Xiang Li
Abstract:
Learning-based approaches have achieved great success in blind single image super-resolution (SISR) tasks; however, handcrafted kernel priors and learning-based kernel priors are typically required. In this paper, we propose a Meta-learning and Markov Chain Monte Carlo (MCMC) based SISR approach to learn kernel priors from organized randomness. Concretely, a lightweight network is adopted as the kernel generator and is optimized by learning from MCMC simulations on random Gaussian distributions. This procedure provides an approximation of the rational blur kernel and introduces network-level Langevin dynamics into the SISR optimization process, which helps to avoid bad local optima in kernel estimation. Meanwhile, a meta-learning-based alternating optimization procedure is proposed to optimize the kernel generator and the image restorer, respectively. In contrast to the conventional alternating minimization strategy, a meta-learning-based framework is applied to learn an adaptive optimization strategy, which is less greedy and results in better convergence performance. These two procedures are iteratively processed in a plug-and-play fashion, realizing, for the first time, a learning-based yet plug-and-play blind SISR solution with unsupervised inference. Extensive simulations demonstrate the superior performance and generalization ability of the proposed approach compared with state-of-the-art methods on synthetic and real-world datasets. The code is available at https://github.com/XYLGroup/MLMC.
Submitted 13 June, 2024;
originally announced June 2024.
-
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Authors:
Xiaoshuai Song,
Muxi Diao,
Guanting Dong,
Zhengyang Wang,
Yujia Fu,
Runqi Qiao,
Zhexu Wang,
Dayuan Fu,
Huangxuan Wu,
Bin Liang,
Weihao Zeng,
Yejie Wang,
Zhuoma GongQue,
Jianing Yu,
Qiuna Tan,
Weiran Xu
Abstract:
Computer Science (CS) stands as a testament to the intricacies of human intelligence, profoundly advancing the development of artificial intelligence and modern society. However, the current community of large language models (LLMs) overly focuses on benchmarks for analyzing specific foundational skills (e.g. mathematics and code generation), neglecting an all-round evaluation of the computer science field. To bridge this gap, we introduce CS-Bench, the first bilingual (Chinese-English) benchmark dedicated to evaluating the performance of LLMs in computer science. CS-Bench comprises approximately 5K meticulously curated test samples, covering 26 subfields across 4 key areas of computer science, encompassing various task forms and divisions of knowledge and reasoning. Utilizing CS-Bench, we conduct a comprehensive evaluation of over 30 mainstream LLMs, revealing the relationship between CS performance and model scales. We also quantitatively analyze the reasons for failures in existing LLMs and highlight directions for improvements, including knowledge supplementation and CS-specific reasoning. Further cross-capability experiments show a high correlation between LLMs' capabilities in computer science and their abilities in mathematics and coding. Moreover, expert LLMs specialized in mathematics and coding also demonstrate strong performances in several CS subfields. Looking ahead, we envision CS-Bench serving as a cornerstone for LLM applications in the CS field and paving new avenues in assessing LLMs' diverse reasoning capabilities. The CS-Bench data and evaluation code are available at https://github.com/csbench/csbench.
Submitted 12 June, 2024;
originally announced June 2024.
-
Macroscopic Tunneling Probe of Moiré Spin Textures in Twisted CrI$_3$
Authors:
Bowen Yang,
Tarun Patel,
Meixin Cheng,
Kostyantyn Pichugin,
Lin Tian,
Nachiket Sherlekar,
Shaohua Yan,
Yang Fu,
Shangjie Tian,
Hechang Lei,
Michael E. Reimer,
Junichi Okamoto,
Adam W. Tsen
Abstract:
Various noncollinear spin textures and magnetic phases have been predicted in twisted two-dimensional CrI$_3$ due to competing ferromagnetic (FM) and antiferromagnetic (AFM) interlayer exchange from moiré stacking - with potential spintronic applications even when the underlying material possesses a negligible Dzyaloshinskii-Moriya or dipole-dipole interaction. Recent measurements have shown evidence of coexisting FM and AFM layer order in small-twist-angle CrI$_3$ bilayers and double bilayers. Yet, the nature of the magnetic textures remains unresolved and possibilities for their manipulation and electrical readout are unexplored. Here, we use tunneling magnetoresistance to investigate the collective spin states of twisted double-bilayer CrI$_3$ under both out-of-plane and in-plane magnetic fields together with detailed micromagnetic simulations of domain dynamics based on magnetic circular dichroism. Our results capture hysteretic and anisotropic field evolutions of the magnetic states and we further uncover two distinct non-volatile spin textures (out-of-plane and in-plane domains) at $\approx$ 1° twist angle, with a different global tunneling resistance that can be switched by magnetic field.
Submitted 12 June, 2024;
originally announced June 2024.
-
ProTrain: Efficient LLM Training via Memory-Aware Techniques
Authors:
Hanmei Yang,
Jin Zhou,
Yao Fu,
Xiaoqun Wang,
Ramine Roane,
Hui Guan,
Tongping Liu
Abstract:
Training Large Language Models (LLMs) is extremely memory-hungry. To solve this problem, existing work exploits the combination of CPU and GPU for the training process, such as ZeRO-Offload. Such a technique largely democratizes billion-scale model training, making it possible to train with a few consumer graphics cards. However, based on our observation, existing frameworks often provide coarse-grained memory management and require experienced experts for configuration tuning, leading to suboptimal hardware utilization and performance. This paper proposes ProTrain, a novel training system that intelligently balances memory usage and performance by coordinating memory, computation, and IO. ProTrain achieves adaptive memory management through Chunk-Based Model State Management and Block-Wise Activation Management, guided by a Memory-Aware Runtime Profiler without user intervention. ProTrain does not change the training algorithm and thus does not compromise accuracy. Experiments show that ProTrain improves training throughput by 1.43$\times$ to 2.71$\times$ compared to the SOTA training systems.
Submitted 12 June, 2024;
originally announced June 2024.
-
Observation of $η_{c}$(1S, 2S) and $χ_{cJ}$ decays to 2$(π^{+}π^{-})η$ via $ψ$(3686) radiative transitions
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (636 additional authors not shown)
Abstract:
Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measured in both destructive and constructive interference scenarios for the first time. The mass and width of the $η_{c}(1S)$ are measured to be $M=(2984.14 \pm 0.13 \pm 0.38)$ MeV/$c^{2}$ and $Γ=(28.82 \pm 0.11 \pm 0.82)$ MeV, respectively. Clear signals for the decays of the $χ_{cJ}(J=0,1,2)$ and the $η_{c}(2S)$ to $2(π^{+}π^{-})η$ are also observed for the first time, and the corresponding branching fractions are measured. The ratio of the branching fractions between the $η_{c}(2S)$ and $η_{c}(1S)$ decays is significantly lower than the theoretical prediction, which might suggest different dynamics in their decays.
Submitted 12 June, 2024;
originally announced June 2024.
-
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
Authors:
Haoran You,
Yichao Fu,
Zheng Wang,
Amir Yazdanbakhsh,
Yingyan Lin
Abstract:
Autoregressive Large Language Models (LLMs) have achieved impressive performance in language tasks but face two significant bottlenecks: (1) quadratic complexity in the attention module as the number of tokens increases, and (2) limited efficiency due to the sequential processing nature of autoregressive LLMs during generation. While linear attention and speculative decoding offer potential solutions, their applicability and synergistic potential for enhancing autoregressive LLMs remain uncertain. We conduct the first comprehensive study on the efficacy of existing linear attention methods for autoregressive LLMs, integrating them with speculative decoding. We introduce an augmentation technique for linear attention that ensures compatibility with speculative decoding, enabling more efficient training and serving of LLMs. Extensive experiments and ablation studies involving seven existing linear attention models and five encoder/decoder-based LLMs consistently validate the effectiveness of our augmented linearized LLMs. Notably, our approach achieves up to a 6.67 reduction in perplexity on the LLaMA model and up to a 2$\times$ speedup during generation compared to prior linear attention methods. Codes and models are available at https://github.com/GATECH-EIC/Linearized-LLM.
Submitted 11 June, 2024;
originally announced June 2024.
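The linear-complexity idea mentioned in the abstract above can be illustrated with a minimal sketch of generic causal linear attention written as a recurrent state update. This is not the paper's augmented method: the `elu(x) + 1` feature map, the function names, and the pure-Python list representation are all assumptions made for illustration. The quadratic reference function is included only to check that both forms agree.

```python
import math


def feature_map(x):
    # elu(x) + 1: a positive feature map often used for linear attention
    # (an assumed choice for this sketch).
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def causal_linear_attention(queries, keys, values):
    """Process tokens one by one, keeping a fixed-size running state."""
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    norm = [0.0] * d_k                         # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        phi_k = feature_map(k)
        for i in range(d_k):
            norm[i] += phi_k[i]
            for j in range(d_v):
                state[i][j] += phi_k[i] * v[j]
        phi_q = feature_map(q)
        denom = sum(phi_q[i] * norm[i] for i in range(d_k))
        outputs.append(
            [sum(phi_q[i] * state[i][j] for i in range(d_k)) / denom
             for j in range(d_v)]
        )
    return outputs


def causal_quadratic_reference(queries, keys, values):
    """Same computation with explicit pairwise weights (quadratic in length)."""
    outputs = []
    for t, q in enumerate(queries):
        phi_q = feature_map(q)
        weights = [
            sum(a * b for a, b in zip(phi_q, feature_map(keys[s])))
            for s in range(t + 1)
        ]
        total = sum(weights)
        outputs.append(
            [sum(w * values[s][j] for s, w in enumerate(weights)) / total
             for j in range(len(values[0]))]
        )
    return outputs
```

The recurrent form touches each token once and stores only a `d_k × d_v` state, which is why its cost grows linearly with sequence length, whereas the reference recomputes weights against all earlier tokens.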
-
Additive engineering for Sb$_2$S$_3$ indoor photovoltaics with efficiency exceeding 17%
Authors:
Xiao Chen,
Xiaoxuan Shu,
Jiangcheng Zhou,
Lei Wan,
Peng Xiao,
Yuchen Fu,
Junzhi Ye,
Yi-Teng Huang,
Bin Yan,
Dingjiang Xue,
Tao Chen,
Jiejie Chen,
Robert L. Z. Hoye,
Ru Zhou
Abstract:
Indoor photovoltaics (IPVs) have attracted increasing attention for sustainably powering Internet of Things (IoT) electronics. Sb$_2$S$_3$ is a promising IPV candidate material with a bandgap of ~1.75 eV, which is near the optimal value for indoor energy harvesting. However, the performance of Sb$_2$S$_3$ solar cells is limited by nonradiative recombination, closely associated with the poor-quality absorber films. Additive engineering is an effective strategy to improve the properties of solution-processed films. This work shows that the addition of monoethanolamine (MEA) into the precursor solution allows the nucleation and growth of Sb$_2$S$_3$ films to be controlled, enabling the deposition of high-quality Sb$_2$S$_3$ absorbers with reduced grain boundary density, optimized band positions and increased carrier concentration. Complemented with computations, it is revealed that the incorporation of MEA leads to a more efficient and energetically favorable deposition for enhanced heterogeneous nucleation on the substrate, which increases the grain size and accelerates the deposition rate of Sb$_2$S$_3$ films. Due to suppressed carrier recombination and improved charge-carrier transport in Sb$_2$S$_3$ absorber films, the MEA-modulated Sb$_2$S$_3$ solar cell yields a power conversion efficiency (PCE) of 7.22% under AM1.5G illumination, and an IPV PCE of 17.55% under 1000 lux white light emitting diode (WLED) illumination, which is the highest yet reported for Sb$_2$S$_3$ IPVs. Furthermore, we construct high performance large-area Sb$_2$S$_3$ IPV modules to power IoT wireless sensors, and realize the long-term continuous recording of environmental parameters under WLED illumination in an office. This work highlights the great prospect of Sb$_2$S$_3$ photovoltaics for indoor energy harvesting.
Submitted 10 June, 2024;
originally announced June 2024.
-
Strong and weak $CP$ tests in sequential decays of polarized $Σ^0$ hyperons
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
The $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ processes and subsequent decays are studied using the world's largest $J/ψ$ and $ψ(3686)$ data samples collected with the BESIII detector. The strong-$CP$ symmetry is tested in the decays of the $Σ^0$ hyperons for the first time by measuring the decay parameters, $α_{Σ^0} = -0.0017 \pm 0.0021 \pm 0.0018$ and $\barα_{Σ^0} = 0.0021 \pm 0.0020 \pm 0.0022$. The weak-$CP$ test is performed in the subsequent decays of their daughter particles $Λ$ and $\barΛ$. Also for the first time, the transverse polarizations of the $Σ^0$ hyperons in $J/ψ$ and $ψ(3686)$ decays are observed with opposite directions, and the ratios between the S-wave and D-wave contributions of the $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ decays are obtained. These results are crucial to understand the decay dynamics of the charmonium states and the production mechanism of the $Σ^0-\barΣ^0$ pairs.
Submitted 10 June, 2024;
originally announced June 2024.
-
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
Authors:
Haoran You,
Yipin Guo,
Yichao Fu,
Wei Zhou,
Huihong Shi,
Xiaofan Zhang,
Souvik Kundu,
Amir Yazdanbakhsh,
Yingyan Lin
Abstract:
Large language models (LLMs) have shown impressive performance on language tasks but face challenges when deployed on resource-constrained devices due to their extensive parameters and reliance on dense multiplications, resulting in high memory demands and latency bottlenecks. Shift-and-add reparameterization offers a promising solution by replacing costly multiplications with hardware-friendly primitives in both the attention and multi-layer perceptron (MLP) layers of an LLM. However, current reparameterization techniques require training from scratch or full parameter fine-tuning to restore accuracy, which is resource-intensive for LLMs. To address this, we propose accelerating pretrained LLMs through post-training shift-and-add reparameterization, creating efficient multiplication-free models, dubbed ShiftAddLLM. Specifically, we quantize each weight matrix into binary matrices paired with group-wise scaling factors. The associated multiplications are reparameterized into (1) shifts between activations and scaling factors and (2) queries and adds according to the binary matrices. To reduce accuracy loss, we present a multi-objective optimization method to minimize both weight and output activation reparameterization errors. Additionally, based on varying sensitivity across layers to reparameterization, we develop an automated bit allocation strategy to further reduce memory usage and latency. Experiments on five LLM families and eight tasks consistently validate the effectiveness of ShiftAddLLM, achieving average perplexity improvements of 5.6 and 22.7 points at comparable or lower latency compared to the most competitive quantized LLMs at 3 and 2 bits, respectively, and more than 80% memory and energy reductions over the original LLMs. Codes and models are available at https://github.com/GATECH-EIC/ShiftAddLLM.
Submitted 11 June, 2024; v1 submitted 9 June, 2024;
originally announced June 2024.
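The quantization step described in the abstract above (binary matrices paired with group-wise scaling factors) can be illustrated with a minimal sketch of greedy binary-coding quantization for a single weight group. This is a generic textbook-style approximation, not the ShiftAddLLM implementation: the function names, the greedy residual fitting, and the use of one scale per bit per group are assumptions for illustration, and the paper's shift-based reparameterization and multi-objective optimization are not modeled here.

```python
def binary_code_quantize(group, num_bits):
    """Approximate a non-empty weight group as sum_i scale_i * code_i,
    where each code_i has entries in {+1, -1}."""
    residual = list(group)
    scales, codes = [], []
    for _ in range(num_bits):
        # Sign pattern of the current residual.
        code = [1 if r >= 0 else -1 for r in residual]
        # Least-squares optimal scale for that sign pattern: mean |residual|.
        scale = sum(abs(r) for r in residual) / len(residual)
        scales.append(scale)
        codes.append(code)
        # Subtract what this bit explains and fit the next bit to the rest.
        residual = [r - scale * c for r, c in zip(residual, code)]
    return scales, codes


def dequantize(scales, codes):
    """Rebuild the approximate weights from scales and binary codes."""
    n = len(codes[0])
    return [sum(s * c[i] for s, c in zip(scales, codes)) for i in range(n)]


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

With binary codes, a dot product against activations reduces to signed additions followed by one multiplication per scale, which is the property that multiplication-reducing schemes build on.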
-
Measurement of the integrated luminosity of the data collected at 3.773 GeV by BESIII from 2021 to 2024
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (634 additional authors not shown)
Abstract:
We present a measurement of the integrated luminosity of $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at a center-of-mass energy of $E_{\rm cm} = 3.773$~GeV. The integrated luminosities of the data sets taken from December 2021 to June 2022, from November 2022 to June 2023, and from October 2023 to February 2024 are determined to be $4.995 \pm 0.019$~fb$^{-1}$, $8.157 \pm 0.031$~fb$^{-1}$, and $4.191 \pm 0.016$~fb$^{-1}$, respectively, by analyzing large angle Bhabha scattering events. The uncertainties are dominated by systematic effects and the statistical uncertainties are negligible. Our results provide essential input for future analyses and precision measurements.
Submitted 9 June, 2024;
originally announced June 2024.
-
Visualizing uniform lattice-scale pair density wave in single-layer FeSe/SrTiO3 films
Authors:
Yao Zhang,
Lianzhi Yang,
Chaofei Liu,
Wenhao Zhang,
Ying-Shuang Fu
Abstract:
Typical BCS superconductors are microscopically homogeneous in real space, governed by coherent Cooper pairs with high phase stiffness of the superfluid density, which is characterized by a coherence length. However, a periodic oscillation of the superconducting order parameter may develop, driven by the breaking of time-reversal or translational invariance. To date, such modulated orders were specific to each material system, with a periodicity much larger than the lattice constant. Here we report the direct observation of a uniform lattice-scale pair density wave (PDW) in single-layer FeSe/SrTiO3 films, enforced by the peculiar interfacial structure with broken crystal symmetries. Our spectroscopic imaging scanning tunneling microscopy unravels a spatial modulation of the Cooper-pairing gap within a single unit cell, depending on inequivalent atomic sites. A prominent periodic variation of the superfluid density is visualized via the Josephson current with a superconducting tip, indicating a real-space oscillation of phase stiffness. Such a lattice-scale superconducting modulation, which coexists with a larger length scale of PDW order, indicates the lattice-scale variation of both pairing strength and phase stiffness. Our findings provide new insights into the intertwined density-wave orders of quasiparticle character in correlated electronic systems, and provoke future studies on the unconventional pairing interaction and phase stiffness in the two-dimensional limit.
Submitted 11 June, 2024; v1 submitted 9 June, 2024;
originally announced June 2024.
-
DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation
Authors:
Shuting Wang,
Jiongnan Liu,
Shiren Song,
Jiehan Cheng,
Yuqi Fu,
Peidong Guo,
Kun Fang,
Yutao Zhu,
Zhicheng Dou
Abstract:
Retrieval-Augmented Generation (RAG) offers a promising solution to address various limitations of Large Language Models (LLMs), such as hallucination and difficulties in keeping up with real-time updates. This approach is particularly critical in expert and domain-specific applications where LLMs struggle to cover expert knowledge. Therefore, evaluating RAG models in such scenarios is crucial, yet current studies often rely on general knowledge sources like Wikipedia to assess the models' abilities in solving common-sense problems. In this paper, we evaluated LLMs in RAG settings in a domain-specific context, college enrollment. We identified six required abilities for RAG models: conversational RAG, analyzing structural information, faithfulness to external knowledge, denoising, solving time-sensitive problems, and understanding multi-document interactions. Each ability has an associated dataset with shared corpora to evaluate the RAG models' performance. We evaluated popular LLMs such as Llama, Baichuan, ChatGLM, and GPT models. Experimental results indicate that existing closed-book LLMs struggle with domain-specific questions, highlighting the need for RAG models to solve expert problems. Moreover, there is room for RAG models to improve their abilities in comprehending conversational history, analyzing structural information, denoising, processing multi-document interactions, and remaining faithful to expert knowledge. We expect that future studies can address these problems better.
Submitted 16 June, 2024; v1 submitted 9 June, 2024;
originally announced June 2024.
-
Neuro-Symbolic Temporal Point Processes
Authors:
Yang Yang,
Chao Yang,
Boyang Li,
Yinghao Fu,
Shuang Li
Abstract:
Our goal is to $\textit{efficiently}$ discover a compact set of temporal logic rules to explain irregular events of interest. We introduce a neural-symbolic rule induction framework within the temporal point process model. The negative log-likelihood is the loss that guides the learning, where the explanatory logic rules and their weights are learned end-to-end in a $\textit{differentiable}$ way. Specifically, predicates and logic rules are represented as $\textit{vector embeddings}$, where the predicate embeddings are fixed and the rule embeddings are trained via gradient descent to obtain the most appropriate compositional representations of the predicate embeddings. To make the rule learning process more efficient and flexible, we adopt a $\textit{sequential covering algorithm}$, which progressively adds rules to the model and removes the event sequences that have been explained until all event sequences have been covered. All the found rules will be fed back to the models for a final rule embedding and weight refinement. Our approach showcases notable efficiency and accuracy across synthetic and real datasets, surpassing state-of-the-art baselines by a wide margin in terms of efficiency.
Submitted 6 June, 2024;
originally announced June 2024.
-
Optical read and write of spin states in organic diradicals
Authors:
Rituparno Chowdhury,
Petri Murto,
Naitik A. Panjwani,
Yan Sun,
Pratyush Ghosh,
Yorrick Boeije,
Vadim Derkach,
Seung-Je Woo,
Oliver Millington,
Daniel G. Congrave,
Yao Fu,
Tarig B. E. Mustafa,
Miguel Monteverde,
Jesús Cerdá,
Jan Behrends,
Akshay Rao,
David Beljonne,
Alexei Chepelianskii,
Hugo Bronstein,
Richard H. Friend
Abstract:
Optical control and read-out of the ground state spin structure has been demonstrated for defect states in crystalline semiconductors, including the diamond NV- center, and these are promising systems for quantum technologies. Molecular organic semiconductors offer synthetic control of spin placement, in contrast to current limitations in these crystalline systems. Here we report the discovery of spin-optical addressability in a diradical molecule that comprises two trityl radical groups coupled via a fluorene bridge. We demonstrate the three important properties that enable operation as a spin-photon interface: (i) triplet and singlet spin states show photoluminescence peaked at 640 and 700 nm respectively; this allows easy optical measurement of ground state spin. (ii) the ground state spin exchange is small (~60 μeV), which allows preparation of ground state spin population. This can be achieved by spin-selective excited state intersystem crossing, and we report up to 8% microwave-driven contrast in photoluminescence. (iii) both singlet and triplet manifolds have near-unity photoluminescence quantum yield, which is in contrast to the near-zero quantum yields in prior reports of molecular diradicals. Our results establish these tuneable open-shell organic molecules as a platform to engineer tailor-made spin-optical interfaces.
Submitted 5 June, 2024;
originally announced June 2024.