-
Autoregressive Speech Synthesis without Vector Quantization
Authors:
Lingwei Meng,
Long Zhou,
Shujie Liu,
Sanyuan Chen,
Bing Han,
Shujie Hu,
Yanqing Liu,
Jinyu Li,
Sheng Zhao,
Xixin Wu,
Helen Meng,
Furu Wei
Abstract:
We present MELLE, a novel continuous-valued tokens based language modeling approach for text to speech synthesis (TTS). MELLE autoregressively generates continuous mel-spectrogram frames directly from text condition, bypassing the need for vector quantization, which are originally designed for audio compression and sacrifice fidelity compared to mel-spectrograms. Specifically, (i) instead of cross…
▽ More
We present MELLE, a novel continuous-valued tokens based language modeling approach for text to speech synthesis (TTS). MELLE autoregressively generates continuous mel-spectrogram frames directly from text condition, bypassing the need for vector quantization, which are originally designed for audio compression and sacrifice fidelity compared to mel-spectrograms. Specifically, (i) instead of cross-entropy loss, we apply regression loss with a proposed spectrogram flux loss function to model the probability distribution of the continuous-valued tokens. (ii) we have incorporated variational inference into MELLE to facilitate sampling mechanisms, thereby enhancing the output diversity and model robustness. Experiments demonstrate that, compared to the two-stage codec language models VALL-E and its variants, the single-stage MELLE mitigates robustness issues by avoiding the inherent flaws of sampling discrete codes, achieves superior performance across multiple metrics, and, most importantly, offers a more streamlined paradigm. See https://aka.ms/melle for demos of our work.
△ Less
Submitted 11 July, 2024;
originally announced July 2024.
-
High power GaSb-based distributed feedback laser with laterally coupled dielectric gratings at 1.95μm
Authors:
Zhengqing Ding,
Juntian Cao,
Kun Zhan,
Yihang Chen,
Lidan Zhou,
Hao Tan,
Chenao Yang,
Ying Yu,
Zhichuan Niu,
Siyuan Yu
Abstract:
Traditional Distributed Feedback (DFB) or Distributed Bragg Reflector (DBR) lasers typically utilize buried gratings as frequency-selective optical feedback mechanisms. However, the fabrication of such gratings often necessitates regrowth processes, which can pose technical challenges for materials platforms such as GaAs and GaSb. Metal gratings were also used for GaSb lasers but they introduce ad…
▽ More
Traditional Distributed Feedback (DFB) or Distributed Bragg Reflector (DBR) lasers typically utilize buried gratings as frequency-selective optical feedback mechanisms. However, the fabrication of such gratings often necessitates regrowth processes, which can pose technical challenges for materials platforms such as GaAs and GaSb. Metal gratings were also used for GaSb lasers but they introduce additional absorption loss that limits device efficiency and output power. In this paper, we introduce a novel laterally coupled dielectric Bragg grating structure, which enables highly controllable, deterministic, and stable coupling between the grating and the optical mode. Our device demonstrates a continuous-wave output power of 47.02 mW at room temperature, exhibiting stable single-mode operation from 300-1000 mA and achieving a maximum side mode suppression ratio of 46.7 dB. These results underscore the innovative lateral coupled dielectric grating as a feasible and technologically superior approach for fabricating DFB and DBR lasers, which hold universal applicability across different material platforms and wavelength bands.
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$
Authors:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (645 additional authors not shown)
Abstract:
The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be…
▽ More
The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15σ$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes.
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
Collaborative Secret and Covert Communications for Multi-User Multi-Antenna Uplink UAV Systems: Design and Optimization
Authors:
Jinpeng Xu,
Lin Bai,
Xin Xie,
Lin Zhou
Abstract:
Motivated by diverse secure requirements of multi-user in UAV systems, we propose a collaborative secret and covert transmission method for multi-antenna ground users to unmanned aerial vehicle (UAV) communications. Specifically, based on the power domain non-orthogonal multiple access (NOMA), two ground users with distinct security requirements, named Bob and Carlo, superimpose their signals and…
▽ More
Motivated by diverse secure requirements of multi-user in UAV systems, we propose a collaborative secret and covert transmission method for multi-antenna ground users to unmanned aerial vehicle (UAV) communications. Specifically, based on the power domain non-orthogonal multiple access (NOMA), two ground users with distinct security requirements, named Bob and Carlo, superimpose their signals and transmit the combined signal to the UAV named Alice. An adversary Willie attempts to simultaneously eavesdrop Bob's confidential message and detect whether Carlo is transmitting or not. We derive close-form expressions of the secrecy connection probability (SCP) and the covert connection probability (CCP) to evaluate the link reliability for wiretap and covert transmissions, respectively. Furthermore, we bound the secrecy outage probability (SOP) from Bob to Alice and the detection error probability (DEP) of Willie to evaluate the link security for wiretap and covert transmissions, respectively. To characterize the theoretical benchmark of the above model, we formulate a weighted multi-objective optimization problem to maximize the average of secret and covert transmission rates subject to constraints SOP, DEP, the beamformers of Bob and Carlo, and UAV trajectory parameters. To solve the optimization problem, we propose an iterative optimization algorithm using successive convex approximation and block coordinate descent (SCA-BCD) methods. Our results reveal the influence of design parameters of the system on the wiretap and covert rates, analytically and numerically. In summary, our study fills the gaps in joint secret and covert transmission for multi-user multi-antenna uplink UAV communications and provides insights to construct such systems.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
$\mathrm{E^{2}CFD}$: Towards Effective and Efficient Cost Function Design for Safe Reinforcement Learning via Large Language Model
Authors:
Zepeng Wang,
Chao Ma,
Linjiang Zhou,
Libing Wu,
Lei Yang,
Xiaochuan Shi,
Guojun Peng
Abstract:
Different classes of safe reinforcement learning algorithms have shown satisfactory performance in various types of safety requirement scenarios. However, the existing methods mainly address one or several classes of specific safety requirement scenario problems and cannot be applied to arbitrary safety requirement scenarios. In addition, the optimization objectives of existing reinforcement learn…
▽ More
Different classes of safe reinforcement learning algorithms have shown satisfactory performance in various types of safety requirement scenarios. However, the existing methods mainly address one or several classes of specific safety requirement scenario problems and cannot be applied to arbitrary safety requirement scenarios. In addition, the optimization objectives of existing reinforcement learning algorithms are misaligned with the task requirements. Based on the need to address these issues, we propose $\mathrm{E^{2}CFD}$, an effective and efficient cost function design framework. $\mathrm{E^{2}CFD}$ leverages the capabilities of a large language model (LLM) to comprehend various safety scenarios and generate corresponding cost functions. It incorporates the \textit{fast performance evaluation (FPE)} method to facilitate rapid and iterative updates to the generated cost function. Through this iterative process, $\mathrm{E^{2}CFD}$ aims to obtain the most suitable cost function for policy training, tailored to the specific tasks within the safety scenario. Experiments have proven that the performance of policies trained using this framework is superior to traditional safe reinforcement learning algorithms and policies trained with carefully designed cost functions.
△ Less
Submitted 7 July, 2024;
originally announced July 2024.
-
Determining the viewing angle from TeV light curve of GRB 221009A
Authors:
Lin Zhou,
Yuan-Chuan Zou
Abstract:
Gamma-ray bursts (GRBs) are among the most powerful explosive events in the universe. LHAASO recently observed the most luminous one: GRB 221009A, and unveiled its TeV light curve. The light curve exhibits a distinct jet break at around 670 seconds, enabling the derivation of the viewing angle based on the smoothness of the jet break. We constructed two models with or without considering the high-…
▽ More
Gamma-ray bursts (GRBs) are among the most powerful explosive events in the universe. LHAASO recently observed the most luminous one: GRB 221009A, and unveiled its TeV light curve. The light curve exhibits a distinct jet break at around 670 seconds, enabling the derivation of the viewing angle based on the smoothness of the jet break. We constructed two models with or without considering the high-latitude radiation, where the viewing angle was treated as a free parameter, to fit the TeV light curve. We obtained the viewing angles being 9.4 $\times 10^{-4}$ radians and 5.9 $\times 10^{-3}$ radians, respectively. These values closely resemble an on-axis scenario, given the opening angle is 1.4 $\times 10^{-2}$ radians.
△ Less
Submitted 4 July, 2024;
originally announced July 2024.
-
Visual Grounding with Attention-Driven Constraint Balancing
Authors:
Weitai Kang,
Luowei Zhou,
Junyi Wu,
Changchang Sun,
Yan Yan
Abstract:
Unlike Object Detection, Visual Grounding task necessitates the detection of an object described by complex free-form language. To simultaneously model such complex semantic and visual representations, recent state-of-the-art studies adopt transformer-based models to fuse features from both modalities, further introducing various modules that modulate visual features to align with the language exp…
▽ More
Unlike Object Detection, Visual Grounding task necessitates the detection of an object described by complex free-form language. To simultaneously model such complex semantic and visual representations, recent state-of-the-art studies adopt transformer-based models to fuse features from both modalities, further introducing various modules that modulate visual features to align with the language expressions and eliminate the irrelevant redundant information. However, their loss function, still adopting common Object Detection losses, solely governs the bounding box regression output, failing to fully optimize for the above objectives. To tackle this problem, in this paper, we first analyze the attention mechanisms of transformer-based models. Building upon this, we further propose a novel framework named Attention-Driven Constraint Balancing (AttBalance) to optimize the behavior of visual features within language-relevant regions. Extensive experimental results show that our method brings impressive improvements. Specifically, we achieve constant improvements over five different models evaluated on four different benchmarks. Moreover, we attain a new state-of-the-art performance by integrating our method into QRNet.
△ Less
Submitted 6 July, 2024; v1 submitted 3 July, 2024;
originally announced July 2024.
-
NOEMA formIng Cluster survEy (NICE): Characterizing eight massive galaxy groups at $1.5 < z < 4$ in the COSMOS field
Authors:
Nikolaj B. Sillassen,
Shuowen Jin,
Georgios E. Magdis,
Emanuele Daddi,
Tao Wang,
Shiying Lu,
Hanwen Sun,
Vinod Arumugam,
Daizhong Liu,
Malte Brinch,
Chiara D'Eugenio,
Raphael Gobat,
Carlos Gómez-Guijarro,
Michael Rich,
Eva Schinnerer,
Veronica Strazzullo,
Qinghua Tan,
Francesco Valentino,
Yijun Wang,
Mengyuan Xiao,
Luwenjia Zhou,
David Blánquez-Sesé,
Zheng Cai,
Yanmei Chen,
Laure Ciesla
, et al. (19 additional authors not shown)
Abstract:
The NOEMA formIng Cluster survEy (NICE) is a large program targeting 69 massive galaxy group candidates at $z>2$ in six deep fields. We report spectroscopic confirmation of eight groups at $1.65\leq z\leq3.61$ in COSMOS. Homogeneously selected as significant overdensities of red IRAC sources with red Herschel colors, four groups are confirmed by CO and [CI] with NOEMA 3mm observations, three are c…
▽ More
The NOEMA formIng Cluster survEy (NICE) is a large program targeting 69 massive galaxy group candidates at $z>2$ in six deep fields. We report spectroscopic confirmation of eight groups at $1.65\leq z\leq3.61$ in COSMOS. Homogeneously selected as significant overdensities of red IRAC sources with red Herschel colors, four groups are confirmed by CO and [CI] with NOEMA 3mm observations, three are confirmed with ALMA, and one is confirmed by H$α$ from Subaru/FMOS. We constructed the integrated FIR SEDs for the eight groups, obtaining total IR SFR $=260-1300~{\rm M_\odot}$~yr$^{-1}$. We adopted six methods to estimate the dark matter masses, including stellar mass to halo mass relations, overdensity with galaxy bias, and NFW profile fitting to radial stellar mass density. We found the radial stellar mass density are consistent with a NFW profile, supporting that they are collapsed structures hosted by a single dark matter halo. The best halo mass estimates are $\log(M_{\rm h}/{\rm M_\odot})=12.8-13.7$ with uncertainty of 0.3 dex. From halo mass estimates, we derive baryonic accretion rate ${\rm BAR}=(1-8)\times10^{3}\,{\rm M_{\odot}/yr}$ for this sample. We find a quasi-linear correlation between the integrated SFR/BAR and the theoretical halo mass limit for cold streams, $M_{\rm stream}/M_{\rm h}$, with ${\rm SFR/BAR}=10^{-0.46\pm0.22}\left({M_{\rm stream}/M_{\rm h}}\right)^{0.71\pm0.16}$ with a scatter of $0.40\,{\rm dex}$. Further, we compare halo masses and stellar masses with simulations, and find all structures are consistent with being progenitors of $M_{\rm h}(z=0)>10^{14}\,{\rm M_{\odot}}$ galaxy clusters, and the most massive central galaxies have stellar masses consistent with brightest cluster galaxies (BCGs) progenitors in the TNG300 simulation. The results strongly suggest these structures are forming massive galaxy clusters via baryonic and dark matter accretion.
△ Less
Submitted 5 July, 2024; v1 submitted 3 July, 2024;
originally announced July 2024.
-
Measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the {BESIII} detector at the {BEPCII} storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be…
▽ More
A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the {BESIII} detector at the {BEPCII} storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be $\mathcal{B}(J/ψ\to p \bar{p} η(η\to γγ)) = (1.480 \pm 0.001 \pm 0.024)\times\,10^{-3}$ and $\mathcal{B}(J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)) = (1.557 \pm 0.003 \pm 0.038)\times\,10^{-3}$, where the first uncertainties are statistical and the second systematic. Both results are compatible within their uncorrelated systematic uncertainties. The combined result is $\mathcal{B}(J/ψ\to p \bar{p} η)=(1.495 \pm 0.001 \pm 0.023)\times\,10^{-3}$ where the first uncertainty is the combined statistical uncertainty and the second one the combined systematic uncertainty of both analyses, incorporating correlations between them. In addition, the $p \bar{p}$ threshold region is investigated for a potential threshold enhancement, and no evidence for one is observed.
△ Less
Submitted 3 July, 2024;
originally announced July 2024.
-
Large and Small Deviations for Statistical Sequence Matching
Authors:
Lin Zhou,
Qianyun Wang,
Jingjing Wang,
Lin Bai,
Alfred O. Hero
Abstract:
We revisit the problem of statistical sequence matching between two databases of sequences initiated by Unnikrishnan (TIT 2015) and derive theoretical performance guarantees for the generalized likelihood ratio test (GLRT). We first consider the case where the number of matched pairs of sequences between the databases is known. In this case, the task is to accurately find the matched pairs of sequ…
▽ More
We revisit the problem of statistical sequence matching between two databases of sequences initiated by Unnikrishnan (TIT 2015) and derive theoretical performance guarantees for the generalized likelihood ratio test (GLRT). We first consider the case where the number of matched pairs of sequences between the databases is known. In this case, the task is to accurately find the matched pairs of sequences among all possible matches between the sequences in the two databases. We analyze the performance of the GLRT by Unnikrishnan and explicitly characterize the tradeoff between the mismatch and false reject probabilities under each hypothesis in both large and small deviations regimes. Furthermore, we demonstrate the optimality of Unnikrishnan's GLRT test under the generalized Neyman-Person criterion for both regimes and illustrate our theoretical results via numerical examples. Subsequently, we generalize our achievability analyses to the case where the number of matched pairs is unknown, and an additional error probability needs to be considered. When one of the two databases contains a single sequence, the problem of statistical sequence matching specializes to the problem of multiple classification introduced by Gutman (TIT 1989). For this special case, our result for the small deviations regime strengthens previous result of Zhou, Tan and Motani (Information and Inference 2020) by removing unnecessary conditions on the generating distributions.
△ Less
Submitted 3 July, 2024;
originally announced July 2024.
-
Multiple topological transitions and spectral singularities in non-Hermitian Floquet systems
Authors:
Weiwei Zhu,
Longwen Zhou,
Linhu Li,
Jiangbin Gong
Abstract:
The interplay between Floquet driving and non-Hermitian gain/loss could give rise to intriguing phenomena including topological funneling of light, edge-state delocalization, anomalous topological transitions and Floquet non-Hermitian skin effects. In this work, we uncover two unique phenomena in Floquet systems caused by gain and loss. First, multiple topological transitions from anomalous Floque…
▽ More
The interplay between Floquet driving and non-Hermitian gain/loss could give rise to intriguing phenomena including topological funneling of light, edge-state delocalization, anomalous topological transitions and Floquet non-Hermitian skin effects. In this work, we uncover two unique phenomena in Floquet systems caused by gain and loss. First, multiple topological transitions from anomalous Floquet second-order topological insulators to anomalous Floquet first-order topological insulators and then to normal insulators can be induced by gain and loss. Interestingly, the resulting anomalous Floquet insulators further carry hybrid skin-topological boundary modes, which could either be fully localized or localized to different edges at different time slices and traversing along all edges in a single driving period. The topological phase transitions are also shown to be detectable through studies of transmission properties in the setting of coupled ring resonators. Second, gain and loss are found to induce singularities in the Floquet spectral, around which anomalous transmissions at flat quasienergy bands are predicted. These discoveries not only enhanced our understanding of topological matter and phase transitions in driven non-Hermitian systems, but also promoted their experimental realizations in optical and acoustic settings.
△ Less
Submitted 3 July, 2024;
originally announced July 2024.
-
Machine Learning-Assisted 3D Printing of Thermoelectric Materials of Ultrahigh Performances at Room Temperature
Authors:
Kaidong Song,
Guoyue Xu,
A. N. M. Tanvir,
Ke Wang,
Md Omarsany Bappy,
Haijian Yang,
Wenjie Shang,
Le Zhou,
Alexander Dowling,
Tengei Luo,
Yanliang Zhang
Abstract:
Thermoelectric energy conversion is an attractive technology for generating electricity from waste heat and using electricity for solid-state cooling. However, conventional manufacturing processes for thermoelectric devices are costly and limited to simple device geometries. This work reports an extrusion printing method to fabricate high-performance thermoelectric materials with complex 3D archit…
▽ More
Thermoelectric energy conversion is an attractive technology for generating electricity from waste heat and using electricity for solid-state cooling. However, conventional manufacturing processes for thermoelectric devices are costly and limited to simple device geometries. This work reports an extrusion printing method to fabricate high-performance thermoelectric materials with complex 3D architectures. By integrating high-throughput experimentation and Bayesian optimization (BO), our approach significantly accelerates the simultaneous search for the optimal ink formulation and printing parameters that deliver high thermoelectric performances while maintaining desired shape fidelity. A Gaussian process regression (GPR)-based machine learning model is employed to expeditiously predict thermoelectric power factor as a function of ink formulation and printing parameters. The printed bismuth antimony telluride (BiSbTe)-based thermoelectric materials under the optimized conditions exhibit an ultrahigh room temperature zT of 1.3, which is by far the highest in the printed thermoelectric materials. The machine learning-guided ink-based printing strategy can be highly generalizable to a wide range of functional materials and devices for broad technological applications.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Full Iso-recursive Types
Authors:
Litao Zhou,
Qianyong Wan,
Bruno C. d. S. Oliveira
Abstract:
There are two well-known formulations of recursive types: iso-recursive and equi-recursive types. Abadi and Fiore [1996] have shown that iso- and equi-recursive types have the same expressive power. However, their encoding of equi-recursive types in terms of iso-recursive types requires explicit coercions. These coercions come with significant additional computational overhead, and complicate reas…
▽ More
There are two well-known formulations of recursive types: iso-recursive and equi-recursive types. Abadi and Fiore [1996] have shown that iso- and equi-recursive types have the same expressive power. However, their encoding of equi-recursive types in terms of iso-recursive types requires explicit coercions. These coercions come with significant additional computational overhead, and complicate reasoning about the equivalence of the two formulations of recursive types.
This paper proposes a generalization of iso-recursive types called full iso-recursive types. Full iso-recursive types allow encoding all programs with equi-recursive types without computational overhead. Instead of explicit term coercions, all type transformations are captured by computationally irrelevant casts, which can be erased at runtime without affecting the semantics of the program. Consequently, reasoning about the equivalence between the two approaches can be greatly simplified. We present a calculus called $λ^μ_{Fi}$, which extends the simply typed lambda calculus (STLC) with full iso-recursive types. The $λ^μ_{Fi}$ calculus is proved to be type sound, and shown to have the same expressive power as a calculus with equi-recursive types. We also extend our results to subtyping, and show that equi-recursive subtyping can be expressed in terms of iso-recursive subtyping with cast operators.
△ Less
Submitted 7 July, 2024; v1 submitted 30 June, 2024;
originally announced July 2024.
-
When is the Four-phonon Effect in Half-Heusler Materials more Pronounced?
Authors:
Yu Wu,
Shengnan Dai,
Linxuan Ji,
Yimin Ding,
Jiong Yang,
Liujiang Zhou
Abstract:
Suppressed three-phonon scattering processes have been considered to be the direct cause of materials exhibiting significant higher-order four-phonon interactions. However, after calculating the phonon-phonon interactions of 128 Half-Heusler materials by high-throughput, we find that the acoustic phonon bandwidth dominates the three-phonon and four-phonon scattering channels and keeps them roughly…
▽ More
Suppressed three-phonon scattering processes have been considered to be the direct cause of materials exhibiting significant higher-order four-phonon interactions. However, after calculating the phonon-phonon interactions of 128 Half-Heusler materials by high-throughput, we find that the acoustic phonon bandwidth dominates the three-phonon and four-phonon scattering channels and keeps them roughly in a co-increasing or decreasing behavior. The $aao$ and $aaa$ three-phonon scattering channels in Half-Heusler materials are weakly affected by the acoustic-optical gap and acoustic bunched features respectively only when acoustic phonon bandwidths are close. Finally, we found that Half-Heusler materials with smaller acoustic bandwidths tend to have a more pronounced four-phonon effect, although three-phonon scattering may not be significantly suppressed at this time.
△ Less
Submitted 30 June, 2024;
originally announced July 2024.
-
Axiomatization of Gradient Smoothing in Neural Networks
Authors:
Linjiang Zhou,
Xiaochuan Shi,
Chao Ma,
Zepeng Wang
Abstract:
Gradients play a pivotal role in neural networks explanation. The inherent high dimensionality and structural complexity of neural networks result in the original gradients containing a significant amount of noise. While several approaches were proposed to reduce noise with smoothing, there is little discussion of the rationale behind smoothing gradients in neural networks. In this work, we propos…
▽ More
Gradients play a pivotal role in neural networks explanation. The inherent high dimensionality and structural complexity of neural networks result in the original gradients containing a significant amount of noise. While several approaches were proposed to reduce noise with smoothing, there is little discussion of the rationale behind smoothing gradients in neural networks. In this work, we proposed a gradient smooth theoretical framework for neural networks based on the function mollification and Monte Carlo integration. The framework intrinsically axiomatized gradient smoothing and reveals the rationale of existing methods. Furthermore, we provided an approach to design new smooth methods derived from the framework. By experimental measurement of several newly designed smooth methods, we demonstrated the research potential of our framework.
△ Less
Submitted 29 June, 2024;
originally announced July 2024.
-
Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
S. Ahmed,
M. Albrecht,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
X. H. Bai,
Y. Bai,
O. Bakina,
R. Baldini Ferroli,
I. Balossino,
Y. Ban,
K. Begzsuren,
N. Berger,
M. Bertani,
D. Bettoni,
F. Bianchi,
J. Bloms,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (495 additional authors not shown)
Abstract:
Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions…
▽ More
Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions $\frac{\mathcal{B}(h_c\rightarrow e^+e^-η_c)}{\mathcal{B}(h_c\rightarrow γη_c)}$ separately for the $h_c$ samples produced via $ψ(3686)\toπ^0h_c$ and $e^+e^-\toπ^+π^-h_c$. The average ratio is determined to be $(0.59\pm0.10(\text{stat.})\pm0.04(\text{syst.}))\%$, where the uncertainty includes both statistical and systematic components.
△ Less
Submitted 2 July, 2024; v1 submitted 28 June, 2024;
originally announced July 2024.
-
Improved measurement of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential dec…
▽ More
Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential decay rate of $D^+_s\to K^0 e^+ν_e$ to be $f^{K^0}_+(0)=0.636\pm0.049\pm0.013$. For both measurements, the first uncertainty is statistical and the second systematic. The branching fraction and form factor measurements are factors of 1.6 and 1.7 more precise than the previous world averages, respectively.
△ Less
Submitted 27 June, 2024;
originally announced June 2024.
-
Measurement of the cross sections of $e^+e^-\to K^{-}\barΞ^{+}Λ/Σ^{0}$ at center-of-mass energies between 3.510 and 4.914 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of…
▽ More
Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$, evidence for $ψ(4160) \to K^{-}\barΞ^{+}Λ$ is found for the first time with a significance of 4.4$σ$, including systematic uncertainties. No evidence for other possible resonances is found. In addition, the products of electronic partial width and branching fraction for all assumed resonances decaying into $K^{-}\barΞ^{+}Λ/Σ^{0}$ are determined.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
Measurements of $K_S^0$-$K_L^0$ asymmetries in the decays $Λ_c^+ \to pK_{L,S}^0$, $pK_{L,S}^0π^+π^-$ and $pK_{L,S}^0π^0$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0. 04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, an…
▽ More
Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0. 04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, and $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^0)=(2.02 \pm 0.13 \pm 0.05)\%$, where the first uncertainties are statistical and the second systematic. Combining with the known branching fractions of $Λ_c^+ \to pK_{S}^{0}$, $Λ_c^+ \to pK_{S}^{0}π^+π^-$, and $Λ_c^+ \to pK_{S}^{0}π^0$, we present the first measurements of the $K_{S}^{0}$-$K_{L}^{0}$ asymmetries $R(Λ_c^+, K_{S,L}^0X) = \frac{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) - \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) + \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}$ in charmed baryon decays: $R(Λ_c^+, pK_{S,L}^0) = -0.025 \pm 0.031$, $R(Λ_c^+, pK_{S,L}^0π^+π^-) = -0.027 \pm 0.048$, and $R(Λ_c^+, pK_{S,L}^0π^0) =-0.015 \pm 0.046$. No significant asymmetries within the uncertainties are observed.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
PharmaGPT: Domain-Specific Large Language Models for Bio-Pharmaceutical and Chemistry
Authors:
Linqing Chen,
Weilei Wang,
Zilong Bai,
Peng Xu,
Yan Fang,
Jie Fang,
Wentao Wu,
Lizhi Zhou,
Ruiji Zhang,
Yubin Xia,
Chaobo Xu,
Ran Hu,
Licong Xu,
Qijun Cai,
Haoran Hua,
Jing Sun,
Jin Liu,
Tian Qiu,
Haowen Liu,
Meng Hu,
Xiuwen Li,
Fei Gao,
Yufu Wang,
Lin Tie,
Chaochao Wang
, et al. (11 additional authors not shown)
Abstract:
Large language models (LLMs) have revolutionized Natural Language Processing (NLP) by minimizing the need for complex feature engineering. However, the application of LLMs in specialized domains like biopharmaceuticals and chemistry remains largely unexplored. These fields are characterized by intricate terminologies, specialized knowledge, and a high demand for precision areas where general purpo…
▽ More
Large language models (LLMs) have revolutionized Natural Language Processing (NLP) by minimizing the need for complex feature engineering. However, the application of LLMs in specialized domains like biopharmaceuticals and chemistry remains largely unexplored. These fields are characterized by intricate terminologies, specialized knowledge, and a high demand for precision areas where general purpose LLMs often fall short. In this study, we introduce PharmaGPT, a suite of domain specilized LLMs with 13 billion and 70 billion parameters, specifically trained on a comprehensive corpus tailored to the Bio-Pharmaceutical and Chemical domains. Our evaluation shows that PharmaGPT surpasses existing general models on specific-domain benchmarks such as NAPLEX, demonstrating its exceptional capability in domain-specific tasks. Remarkably, this performance is achieved with a model that has only a fraction, sometimes just one-tenth-of the parameters of general-purpose large models. This advancement establishes a new benchmark for LLMs in the bio-pharmaceutical and chemical fields, addressing the existing gap in specialized language modeling. It also suggests a promising path for enhanced research and development, paving the way for more precise and effective NLP applications in these areas.
△ Less
Submitted 9 July, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
Operating Single-Photon Circulator by Spinning Optical Resonators
Authors:
Jing Li,
Tian-Xiang Lu,
Meiyu Peng,
Le-Man Kuang,
Hui Jing,
Lan Zhou
Abstract:
A circulator is one of the crucial devices in quantum networks and simulations. We propose a four-port circulator that regulate the flow of single photons at muti-frequency points by studying the coherent transmission of a single photon in a coupled system of two resonators and two waveguides. When both resonators are static or rotate at the same angular velocity, single-photon transport demonstra…
▽ More
A circulator is one of the crucial devices in quantum networks and simulations. We propose a four-port circulator that regulate the flow of single photons at muti-frequency points by studying the coherent transmission of a single photon in a coupled system of two resonators and two waveguides. When both resonators are static or rotate at the same angular velocity, single-photon transport demonstrates reciprocity; however, when the angular velocities differ, four distinct frequency points emerge where photon circulation can occur. In particular, when the angular velocities of the two resonators are equal and opposite, there are two different frequency points where photon circulation can be achieved, and there is a frequency point where a single photon input from any waveguide can be completely routed to the other waveguide. Interestingly, by rotating the two resonators, the single-photon circulation suppressed by the internal defect-induced backscattering can be restored.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
Study of the $f_{0}(980)$ through the decay $D_{s}^{+}\rightarrow π^{+}π^{+}π^{-}π^{0}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (649 additional authors not shown)
Abstract:
We perform the first amplitude analysis of $D^+_s \to π^+π^+π^-π^0$ decays, based on data samples of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of 7.33~fb$^{-1}$. We report the observation of $D_{s}^{+} \to f_0(980)ρ(770)^{+}$ with a statistical significance greater than 10$σ$ and…
▽ More
We perform the first amplitude analysis of $D^+_s \to π^+π^+π^-π^0$ decays, based on data samples of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of 7.33~fb$^{-1}$. We report the observation of $D_{s}^{+} \to f_0(980)ρ(770)^{+}$ with a statistical significance greater than 10$σ$ and determine the branching fractions $\mathcal{B}(D_s^+\toπ^+π^+π^-π^0|_{{\rm non}-η})=(2.04\pm0.08_{\rm stat.}\pm0.05_{\rm syst.})\%$ and $\mathcal{B}(D_s^+\toηπ^+)=(1.56\pm0.09_{\rm stat.}\pm0.04_{\rm syst.})\%$. Moreover, we measure the relative branching fraction between $φ\toπ^+π^-π^0$ and $φ\to K^+K^-$ to be $\frac{\mathcal{B}(φ(1020) \to π^+π^-π^0)}{\mathcal{B}(φ(1020) \to K^+K^-)}=0.230 \pm 0.014_{\rm stat.} \pm 0.010_{\rm syst.}$, which deviates from the world average value by more than $4σ$.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
Cavity Modified Oscillating Bound States with a $Λ$-type giant emitter in a linear waveguide
Authors:
Ge Sun,
Ya Yang,
Jing Li,
Jing Lu,
Lan Zhou
Abstract:
We study a system composed by a three-level giant atom (3GA), a waveguide initially in the vacuum state, and a single-mode cavity. The 3GA-cavity system is in a strong-coupling regime, and the distance between the coupling points is comparable to the coherent length of a spontaneously emitted photon. The dynamics of the 3GA and its radiative field in the waveguide for long time are analyzed. Besid…
▽ More
We study a system composed by a three-level giant atom (3GA), a waveguide initially in the vacuum state, and a single-mode cavity. The 3GA-cavity system is in a strong-coupling regime, and the distance between the coupling points is comparable to the coherent length of a spontaneously emitted photon. The dynamics of the 3GA and its radiative field in the waveguide for long time are analyzed. Besides the steady value, we also found an oscillatory character of the excited state population, a signature of oscillating bound states which is generated by the superposition of two bound states in the continuum. The radiative field propagates in the cavity-like geometry formed by the coupling points. When one bound state is emergent, a sine-like interference pattern is visible for the emitted field intensity in spacetime. An oscillatory character in time and a beat in space for the emitted field intensity are observed when two bound states are emergent in a subspace. The wavelengths and the periods are controlled by the number of the photons in the cavity.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
Search for the $e^+e^- \to φχ_{c1}(3872)$ process at BESIII
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies 4.914 and 4.946 GeV by the BESIII detector, the $e^+e^- \to φχ_{c1}(3872)$ process is searched for the first time. No significant signal is observed and the upper limits at the 90\% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction…
▽ More
Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies 4.914 and 4.946 GeV by the BESIII detector, the $e^+e^- \to φχ_{c1}(3872)$ process is searched for the first time. No significant signal is observed and the upper limits at the 90\% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction $\mathcal{B}[χ_{c1}(3872)\toπ^+π^- J/ψ]$ at 4.914 and 4.946 GeV are set to be 0.85 and 0.96 pb, respectively. These measurements provide useful information for the production of the $χ_{c1}(3872)$ at $e^+e^-$ collider and deepen our understanding about the nature of this particle.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
An LLM Feature-based Framework for Dialogue Constructiveness Assessment
Authors:
Lexin Zhou,
Youmna Farag,
Andreas Vlachos
Abstract:
Research on dialogue constructiveness assessment focuses on (i) analysing conversational factors that influence individuals to take specific actions, win debates, change their perspectives or broaden their open-mindedness and (ii) predicting constructive outcomes following dialogues for such use cases. These objectives can be achieved by training either interpretable feature-based models (which of…
▽ More
Research on dialogue constructiveness assessment focuses on (i) analysing conversational factors that influence individuals to take specific actions, win debates, change their perspectives or broaden their open-mindedness and (ii) predicting constructive outcomes following dialogues for such use cases. These objectives can be achieved by training either interpretable feature-based models (which often involve costly human annotations) or neural models such as pre-trained language models (which have empirically shown higher task accuracy but lack interpretability). We propose a novel LLM feature-based framework that combines the strengths of feature-based and neural approaches while mitigating their downsides, in assessing dialogue constructiveness. The framework first defines a set of dataset-independent and interpretable linguistic features, which can be extracted by both prompting an LLM and simple heuristics. Such features are then used to train LLM feature-based models. We apply this framework to three datasets of dialogue constructiveness and find that our LLM feature-based models significantly outperform standard feature-based models and neural models, and tend to learn more robust prediction rules instead of relying on superficial shortcuts (as seen with neural models). Further, we demonstrate that interpreting these LLM feature-based models can yield valuable insights into what makes a dialogue constructive.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.
-
Super-resolution 3D tomography of vector near-fields in dielectric resonators
Authors:
Bingbing Zhu,
Qingnan Cai,
Yaxin Liu,
Sheng Zhang,
Weifeng Liu,
Qiong He,
Lei Zhou,
Zhensheng Tao
Abstract:
All-dielectric optical resonators, exhibiting exotic near-field distributions upon excitations, have emerged as low-loss, versatile and highly adaptable components in nanophotonic structures for manipulating electromagnetic waves and enhancing light-matter interactions. However, achieving experimental full three-dimensional characterization of near-fields within dielectric materials poses signific…
▽ More
All-dielectric optical resonators, exhibiting exotic near-field distributions upon excitations, have emerged as low-loss, versatile and highly adaptable components in nanophotonic structures for manipulating electromagnetic waves and enhancing light-matter interactions. However, achieving experimental full three-dimensional characterization of near-fields within dielectric materials poses significant challenges. Here, we develop a novel technique using high-order sideband generation to image near-field wave patterns inside dielectric optical resonators. By exploiting the phase-sensitivity of various harmonic orders that enables the detection of near-field distributions at distinct depths, we realize three-dimensional tomographic and super-resolution near-field imaging inside a micrometer-thick silicon anapole resonator. Furthermore, our method offers high-contrast polarization sensitivity and phase-resolving capability, providing comprehensive vectorial near-field information. Our approach can potentially be applied to diverse dielectric metamaterials, and becomes a valuable tool for comprehensive characterization of near-field wave phenomena within dielectric materials.
△ Less
Submitted 18 June, 2024;
originally announced June 2024.
-
FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture
Authors:
Wenyan Li,
Xinyu Zhang,
Jiaang Li,
Qiwei Peng,
Raphael Tang,
Li Zhou,
Weijia Zhang,
Guimin Hu,
Yifei Yuan,
Anders Søgaard,
Daniel Hershcovich,
Desmond Elliott
Abstract:
Food is a rich and varied dimension of cultural heritage, crucial to both individuals and social groups. To bridge the gap in the literature on the often-overlooked regional diversity in this domain, we introduce FoodieQA, a manually curated, fine-grained image-text dataset capturing the intricate features of food cultures across various regions in China. We evaluate vision-language Models (VLMs)…
▽ More
Food is a rich and varied dimension of cultural heritage, crucial to both individuals and social groups. To bridge the gap in the literature on the often-overlooked regional diversity in this domain, we introduce FoodieQA, a manually curated, fine-grained image-text dataset capturing the intricate features of food cultures across various regions in China. We evaluate vision-language Models (VLMs) and large language models (LLMs) on newly collected, unseen food images and corresponding questions. FoodieQA comprises three multiple-choice question-answering tasks where models need to answer questions based on multiple images, a single image, and text-only descriptions, respectively. While LLMs excel at text-based question answering, surpassing human accuracy, the open-sourced VLMs still fall short by 41\% on multi-image and 21\% on single-image VQA tasks, although closed-weights models perform closer to human levels (within 10\%). Our findings highlight that understanding food and its cultural implications remains a challenging and under-explored direction.
△ Less
Submitted 16 June, 2024;
originally announced June 2024.
-
ESCoT: Towards Interpretable Emotional Support Dialogue Systems
Authors:
Tenggan Zhang,
Xinjie Zhang,
Jinming Zhao,
Li Zhou,
Qin Jin
Abstract:
Understanding the reason for emotional support response is crucial for establishing connections between users and emotional support dialogue systems. Previous works mostly focus on generating better responses but ignore interpretability, which is extremely important for constructing reliable dialogue systems. To empower the system with better interpretability, we propose an emotional support respo…
▽ More
Understanding the reason for emotional support response is crucial for establishing connections between users and emotional support dialogue systems. Previous works mostly focus on generating better responses but ignore interpretability, which is extremely important for constructing reliable dialogue systems. To empower the system with better interpretability, we propose an emotional support response generation scheme, named $\textbf{E}$motion-Focused and $\textbf{S}$trategy-Driven $\textbf{C}$hain-$\textbf{o}$f-$\textbf{T}$hought ($\textbf{ESCoT}$), mimicking the process of $\textit{identifying}$, $\textit{understanding}$, and $\textit{regulating}$ emotions. Specially, we construct a new dataset with ESCoT in two steps: (1) $\textit{Dialogue Generation}$ where we first generate diverse conversation situations, then enhance dialogue generation using richer emotional support strategies based on these situations; (2) $\textit{Chain Supplement}$ where we focus on supplementing selected dialogues with elements such as emotion, stimuli, appraisal, and strategy reason, forming the manually verified chains. Additionally, we further develop a model to generate dialogue responses with better interpretability. We also conduct extensive experiments and human evaluations to validate the effectiveness of the proposed ESCoT and generated dialogue responses. Our data and code are available at $\href{https://github.com/TeigenZhang/ESCoT}{https://github.com/TeigenZhang/ESCoT}$.
△ Less
Submitted 16 June, 2024;
originally announced June 2024.
-
Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning
Authors:
Jifan Zhang,
Lalit Jain,
Yang Guo,
Jiayi Chen,
Kuan Lok Zhou,
Siddharth Suresh,
Andrew Wagenmaker,
Scott Sievert,
Timothy Rogers,
Kevin Jamieson,
Robert Mankoff,
Robert Nowak
Abstract:
We present a novel multimodal preference dataset for creative tasks, consisting of over 250 million human ratings on more than 2.2 million captions, collected through crowdsourcing rating data for The New Yorker's weekly cartoon caption contest over the past eight years. This unique dataset supports the development and evaluation of multimodal large language models and preference-based fine-tuning…
▽ More
We present a novel multimodal preference dataset for creative tasks, consisting of over 250 million human ratings on more than 2.2 million captions, collected through crowdsourcing rating data for The New Yorker's weekly cartoon caption contest over the past eight years. This unique dataset supports the development and evaluation of multimodal large language models and preference-based fine-tuning algorithms for humorous caption generation. We propose novel benchmarks for judging the quality of model-generated captions, utilizing both GPT4 and human judgments to establish ranking-based evaluation strategies. Our experimental results highlight the limitations of current fine-tuning methods, such as RLHF and DPO, when applied to creative tasks. Furthermore, we demonstrate that even state-of-the-art models like GPT4 and Claude currently underperform top human contestants in generating humorous captions. As we conclude this extensive data collection effort, we release the entire preference dataset to the research community, fostering further advancements in AI humor generation and evaluation.
△ Less
Submitted 15 June, 2024;
originally announced June 2024.
-
High-efficiency generation of vectorial holograms with metasurfaces
Authors:
Tong Liu,
Changhong Dai,
Dongyi Wang,
Lei Zhou
Abstract:
Holography plays a crucial role in optics applications, but it traditionally requires complex setup and bulky devices, being unfavourable for optics integration. While metasurface-based holograms are ultra-compact and easy to realize, holographic images generated are mostly restricted to scalar ones, with a few recent attempts on vectorial holograms suffering from complex meta-structures and low e…
▽ More
Holography plays a crucial role in optics applications, but it traditionally requires complex setup and bulky devices, being unfavourable for optics integration. While metasurface-based holograms are ultra-compact and easy to realize, holographic images generated are mostly restricted to scalar ones, with a few recent attempts on vectorial holograms suffering from complex meta-structures and low efficiencies. Here, we propose and experimentally demonstrate an efficient meta-platform to generate vectorial holograms with arbitrarily designed wave fronts and polarization distributions based on ultra-compact metaatoms. Combining GS algorithm and the wave-decomposition technique, we establish a generic strategy to retrieve the optical property, i.e., the distributions of reflection phase and polarization-conversion capability of the metasurface to generate a target vectorial holographic image. We next design a series of high-efficiency and deep-subwavelength metaatoms exhibiting arbitrarily designed reflection phases and polarization-conversion capabilities, and experimentally characterize their optical properties. Based on these metaatoms, we finally realize a series of meta-holograms that can generate pre-designed vectorial holographic images upon external illuminations, and experimentally characterize their working performances. Our work provides a high-efficiency and ultra-thin platform to generate vectorial holographic images, which can find many applications in onchip photonics.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
Search for $X(1870)$ via the decay $J/ψ\to ωK^+ K^-η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the…
▽ More
Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the $90\%$ confidence level. In addition, the branching faction $B(J/ψ\toωK^+ K^- η)$ is measured to be $(3.33\pm0.02(\rm{stat.})\pm 0.12(\rm{syst.}))\times 10^{-4}$.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
Direct generation of multi-photon hyperentanglement
Authors:
Peng Zhao,
Jia-Wei Ying,
Meng-Ying Yang,
Wei Zhong,
Ming-Ming Du,
Shu-Ting Shen,
Yun-Xi Li,
An-Lei Zhang,
Lan Zhou,
Yu-Bo Sheng
Abstract:
Multi-photon hyperentangement is of fundamental importance in optical quantum information processing. Existing theory and experiment producing multi-photon hyperentangled states have until now relied on the outcome post-selection, a procedure where only the measurement results corresponding to the desired state are considered. Such approach severely limits the usefulness of the resulting hyperenta…
▽ More
Multi-photon hyperentangement is of fundamental importance in optical quantum information processing. Existing theory and experiment producing multi-photon hyperentangled states have until now relied on the outcome post-selection, a procedure where only the measurement results corresponding to the desired state are considered. Such approach severely limits the usefulness of the resulting hyperentangled states. We present the protocols of direct production of three- and four-photon hyperentanglement and extend the approach to an arbitrary number of photons through a straightforward cascade of spontaneous parametric down-conversion (SPDC) sources. The generated multi-photon hyperentangled states are encoded in polarization-spatial modes and polarization-time bin degrees of freedom, respectively. Numerical calculation shows that if the average photon number $μ$ is set to 1, the down conversion efficiency is $7.6*10^{-6}$ and the repetition frequency of the laser is $10^9$ Hz, the number of the generation of three-photon and four-photon hyperentanglement after cascading can reach about $5.78*10^{-2}$ and $4.44*10^{-7}$ pairs per second, respectively. By eliminating the constraints of outcome post-selection, our protocols may represent important progresses for multi-photon hyperentangement generation and providing a pivotal role in future multi-party and high-capacity communication networks.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
A Plug-and-Play Untrained Neural Network for Full Waveform Inversion in Reconstructing Sound Speed Images of Ultrasound Computed Tomography
Authors:
Weicheng Yan,
Qiude Zhang,
Yun Wu,
Zhaohui Liu,
Liang Zhou,
Mingyue Ding,
Ming Yuchi,
Wu Qiu
Abstract:
Ultrasound computed tomography (USCT), as an emerging technology, can provide multiple quantitative parametric images of human tissue, such as sound speed and attenuation images, distinguishing it from conventional B-mode (reflection) ultrasound imaging. Full waveform inversion (FWI) is acknowledged as a technique with the greatest potential for reconstructing high-resolution sound speed images in…
▽ More
Ultrasound computed tomography (USCT), as an emerging technology, can provide multiple quantitative parametric images of human tissue, such as sound speed and attenuation images, distinguishing it from conventional B-mode (reflection) ultrasound imaging. Full waveform inversion (FWI) is acknowledged as a technique with the greatest potential for reconstructing high-resolution sound speed images in USCT. However, traditional FWI for sound speed image reconstruction suffers from high sensitivity to the initial model caused by its strong non-convex nonlinearity, resulting in poor performance when ultrasound signals are at high frequencies. This limitation significantly restricts the application of FWI in the USCT imaging field. In this paper, we propose an untrained neural network (UNN) that can be integrated into the traditional iteration-based FWI framework as an implicit regularization prior. This integration allows for seamless deployment as a plug-and-play module within existing FWI algorithms or their variants. Notably, the proposed UNN method can be trained in an unsupervised fashion, a vital aspect in medical imaging where ground truth data is often unavailable. Evaluations of the numerical simulation and phantom experiment of the breast demonstrate that the proposed UNN improves the robustness of image reconstruction, reduces image artifacts, and achieves great image contrast. To the best of our knowledge, this study represents the first attempt to propose an implicit UNN for FWI in reconstructing sound speed images for USCT.
△ Less
Submitted 13 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
Observation of $η_{c}$(1S, 2S) and $χ_{cJ}$ decays to 2$(π^{+}π^{-})η$ via $ψ$(3686) radiative transitions
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (636 additional authors not shown)
Abstract:
Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measur…
▽ More
Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measured in both destructive and constructive interference scenarios for the first time. The mass and width of the $η_{c}(1S)$ are measured to be $M=(2984.14 \pm 0.13 \pm 0.38)$ MeV/$c^{2}$ and $Γ=(28.82 \pm 0.11 \pm 0.82)$ MeV, respectively. Clear signals for the decays of the $χ_{cJ}(J=0,1,2)$ and the $η_{c}(2S)$ to $2(π^{+}π^{-})η$ are also observed for the first time, and the corresponding branching fractions are measured. The ratio of the branching fractions between the $η_{c}(2S)$ and $η_{c}(1S)$ decays is significantly lower than the theoretical prediction, which might suggest different dynamics in their decays.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment
Authors:
Bing Han,
Long Zhou,
Shujie Liu,
Sanyuan Chen,
Lingwei Meng,
Yanming Qian,
Yanqing Liu,
Sheng Zhao,
Jinyu Li,
Furu Wei
Abstract:
With the help of discrete neural audio codecs, large language models (LLM) have increasingly been recognized as a promising methodology for zero-shot Text-to-Speech (TTS) synthesis. However, sampling based decoding strategies bring astonishing diversity to generation, but also pose robustness issues such as typos, omissions and repetition. In addition, the high sampling rate of audio also brings h…
▽ More
With the help of discrete neural audio codecs, large language models (LLM) have increasingly been recognized as a promising methodology for zero-shot Text-to-Speech (TTS) synthesis. However, sampling based decoding strategies bring astonishing diversity to generation, but also pose robustness issues such as typos, omissions and repetition. In addition, the high sampling rate of audio also brings huge computational overhead to the inference process of autoregression. To address these issues, we propose VALL-E R, a robust and efficient zero-shot TTS system, building upon the foundation of VALL-E. Specifically, we introduce a phoneme monotonic alignment strategy to strengthen the connection between phonemes and acoustic sequence, ensuring a more precise alignment by constraining the acoustic tokens to match their associated phonemes. Furthermore, we employ a codec-merging approach to downsample the discrete codes in shallow quantization layer, thereby accelerating the decoding speed while preserving the high quality of speech output. Benefiting from these strategies, VALL-E R obtains controllablity over phonemes and demonstrates its strong robustness by approaching the WER of ground truth. In addition, it requires fewer autoregressive steps, with over 60% time reduction during inference. This research has the potential to be applied to meaningful projects, including the creation of speech for those affected by aphasia. Audio samples will be available at: https://aka.ms/valler.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Strong and weak $CP$ tests in sequential decays of polarized $Σ^0$ hyperons
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
The $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ processes and subsequent decays are studied using the world's largest $J/ψ$ and $ψ(3686)$ data samples collected with the BESIII detector. The strong-$CP$ symmetry is tested in the decays of the $Σ^0$ hyperons for the first time by measuring the decay parameters, $α_{Σ^0} = -0.0017 \pm 0.0021 \pm 0.0018$ and $\barα_{Σ^0} = 0.0021 \pm 0.0020 \pm 0.0022$. The wea…
▽ More
The $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ processes and subsequent decays are studied using the world's largest $J/ψ$ and $ψ(3686)$ data samples collected with the BESIII detector. The strong-$CP$ symmetry is tested in the decays of the $Σ^0$ hyperons for the first time by measuring the decay parameters, $α_{Σ^0} = -0.0017 \pm 0.0021 \pm 0.0018$ and $\barα_{Σ^0} = 0.0021 \pm 0.0020 \pm 0.0022$. The weak-$CP$ test is performed in the subsequent decays of their daughter particles $Λ$ and $\barΛ$. Also for the first time, the transverse polarizations of the $Σ^0$ hyperons in $J/ψ$ and $ψ(3686)$ decays are observed with opposite directions, and the ratios between the S-wave and D-wave contributions of the $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ decays are obtained. These results are crucial to understand the decay dynamics of the charmonium states and the production mechanism of the $Σ^0-\barΣ^0$ pairs.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Measurement of the integrated luminosity of the data collected at 3.773 GeV by BESIII from 2021 to 2024
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (634 additional authors not shown)
Abstract:
We present a measurement of the integrated luminosity of $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at a center-of-mass energy of $E_{\rm cm} = 3.773$~GeV. The integrated luminosities of the data sets taken from December 2021 to June 2022, from November 2022 to June 2023, and from October 2023 to February 2024 are determined to be $4.995 \pm 0.019$~fb$^{-1}$,…
▽ More
We present a measurement of the integrated luminosity of $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at a center-of-mass energy of $E_{\rm cm} = 3.773$~GeV. The integrated luminosities of the data sets taken from December 2021 to June 2022, from November 2022 to June 2023, and from October 2023 to February 2024 are determined to be $4.995 \pm 0.019$~fb$^{-1}$, $8.157 \pm 0.031$~fb$^{-1}$, and $4.191 \pm 0.016$~fb$^{-1}$, respectively, by analyzing large angle Bhabha scattering events. The uncertainties are dominated by systematic effects and the statistical uncertainties are negligible. Our results provide essential input for future analyses and precision measurements.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
Authors:
Sanyuan Chen,
Shujie Liu,
Long Zhou,
Yanqing Liu,
Xu Tan,
Jinyu Li,
Sheng Zhao,
Yao Qian,
Furu Wei
Abstract:
This paper introduces VALL-E 2, the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity for the first time. Based on its predecessor, VALL-E, the new iteration introduces two significant enhancements: Repetition Aware Sampling refines the original nucleus sampling process by accounting for token repetition in…
▽ More
This paper introduces VALL-E 2, the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity for the first time. Based on its predecessor, VALL-E, the new iteration introduces two significant enhancements: Repetition Aware Sampling refines the original nucleus sampling process by accounting for token repetition in the decoding history. It not only stabilizes the decoding but also circumvents the infinite loop issue. Grouped Code Modeling organizes codec codes into groups to effectively shorten the sequence length, which not only boosts inference speed but also addresses the challenges of long sequence modeling. Our experiments on the LibriSpeech and VCTK datasets show that VALL-E 2 surpasses previous systems in speech robustness, naturalness, and speaker similarity. It is the first of its kind to reach human parity on these benchmarks. Moreover, VALL-E 2 consistently synthesizes high-quality speech, even for sentences that are traditionally challenging due to their complexity or repetitive phrases. The advantages of this work could contribute to valuable endeavors, such as generating speech for individuals with aphasia or people with amyotrophic lateral sclerosis. See https://aka.ms/valle2 for demos of VALL-E 2.
△ Less
Submitted 17 June, 2024; v1 submitted 8 June, 2024;
originally announced June 2024.
-
Measurements of the branching fractions of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^-π^0/η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Based on $(2712.4\pm 14.3)\times10^{6}$ $ψ(3686)$ events, we investigate four hadronic decay modes of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^- π^0/η$ ($h=π$ or $K$) via the process $ψ(3686) \to π^{0}h_c$ at BESIII. The $h_c \to π^+ π^- π^0$ decay is observed with a significance of 9.6$σ$ after taking into account systematic uncertainties. Evidences for…
▽ More
Based on $(2712.4\pm 14.3)\times10^{6}$ $ψ(3686)$ events, we investigate four hadronic decay modes of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^- π^0/η$ ($h=π$ or $K$) via the process $ψ(3686) \to π^{0}h_c$ at BESIII. The $h_c \to π^+ π^- π^0$ decay is observed with a significance of 9.6$σ$ after taking into account systematic uncertainties. Evidences for $h_c \to K^+ K^- π^0$ and $h_c \to K^+ K^- η$ are found with significances of $3.5σ$ and $3.3σ$, respectively, after considering the systematic uncertainties. The branching fractions of these decays are measured to be $\mathcal{B}(h_c \to π^+ π^- π^0)=(1.36\pm0.16\pm0.14)\times10^{-3}$, $\mathcal{B}(h_c \to K^+ K^- π^0)=(3.26\pm0.84\pm0.36)\times10^{-4}$, and $\mathcal{B}(h_c \to K^+ K^- η)=(3.13\pm1.08\pm0.38)\times10^{-4}$, where the first uncertainties are statistical and the second are systematic. No significant signal of $h_c\toπ^+π^-η$ is found, and the upper limit of its decay branching fraction is determined to be $\mathcal{B}(h_c\toπ^+π^-η) < 4.0 \times 10^{-4}$ at 90% confidence level.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
UA-Track: Uncertainty-Aware End-to-End 3D Multi-Object Tracking
Authors:
Lijun Zhou,
Tao Tang,
Pengkun Hao,
Zihang He,
Kalok Ho,
Shuo Gu,
Wenbo Hou,
Zhihui Hao,
Haiyang Sun,
Kun Zhan,
Peng Jia,
Xianpeng Lang,
Xiaodan Liang
Abstract:
3D multiple object tracking (MOT) plays a crucial role in autonomous driving perception. Recent end-to-end query-based trackers simultaneously detect and track objects, which have shown promising potential for the 3D MOT task. However, existing methods overlook the uncertainty issue, which refers to the lack of precise confidence about the state and location of tracked objects. Uncertainty arises…
▽ More
3D multiple object tracking (MOT) plays a crucial role in autonomous driving perception. Recent end-to-end query-based trackers simultaneously detect and track objects, which have shown promising potential for the 3D MOT task. However, existing methods overlook the uncertainty issue, which refers to the lack of precise confidence about the state and location of tracked objects. Uncertainty arises owing to various factors during motion observation by cameras, especially occlusions and the small size of target objects, resulting in an inaccurate estimation of the object's position, label, and identity. To this end, we propose an Uncertainty-Aware 3D MOT framework, UA-Track, which tackles the uncertainty problem from multiple aspects. Specifically, we first introduce an Uncertainty-aware Probabilistic Decoder to capture the uncertainty in object prediction with probabilistic attention. Secondly, we propose an Uncertainty-guided Query Denoising strategy to further enhance the training process. We also utilize Uncertainty-reduced Query Initialization, which leverages predicted 2D object location and depth information to reduce query uncertainty. As a result, our UA-Track achieves state-of-the-art performance on the nuScenes benchmark, i.e., 66.3% AMOTA on the test split, surpassing the previous best end-to-end solution by a significant margin of 8.9% AMOTA.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Reciprocal Reward Influence Encourages Cooperation From Self-Interested Agents
Authors:
John L. Zhou,
Weizhe Hong,
Jonathan C. Kao
Abstract:
Emergent cooperation among self-interested individuals is a widespread phenomenon in the natural world, but remains elusive in interactions between artificially intelligent agents. Instead, naïve reinforcement learning algorithms typically converge to Pareto-dominated outcomes in even the simplest of social dilemmas. An emerging class of opponent-shaping methods have demonstrated the ability to re…
▽ More
Emergent cooperation among self-interested individuals is a widespread phenomenon in the natural world, but remains elusive in interactions between artificially intelligent agents. Instead, naïve reinforcement learning algorithms typically converge to Pareto-dominated outcomes in even the simplest of social dilemmas. An emerging class of opponent-shaping methods have demonstrated the ability to reach prosocial outcomes by influencing the learning of other agents. However, they rely on higher-order derivatives through the predicted learning step of other agents or learning meta-game dynamics, which in turn rely on stringent assumptions over opponent learning rules or exponential sample complexity, respectively. To provide a learning rule-agnostic and sample-efficient alternative, we introduce Reciprocators, reinforcement learning agents which are intrinsically motivated to reciprocate the influence of an opponent's actions on their returns. This approach effectively seeks to modify other agents' $Q$-values by increasing their return following beneficial actions (with respect to the Reciprocator) and decreasing it after detrimental actions, guiding them towards mutually beneficial actions without attempting to directly shape policy updates. We show that Reciprocators can be used to promote cooperation in a variety of temporally extended social dilemmas during simultaneous learning.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation
Authors:
Enhui Ma,
Lijun Zhou,
Tao Tang,
Zhan Zhang,
Dong Han,
Junpeng Jiang,
Kun Zhan,
Peng Jia,
Xianpeng Lang,
Haiyang Sun,
Di Lin,
Kaicheng Yu
Abstract:
Using generative models to synthesize new data has become a de-facto standard in autonomous driving to address the data scarcity issue. Though existing approaches are able to boost perception models, we discover that these approaches fail to improve the performance of planning of end-to-end autonomous driving models as the generated videos are usually less than 8 frames and the spatial and tempora…
▽ More
Using generative models to synthesize new data has become a de-facto standard in autonomous driving to address the data scarcity issue. Though existing approaches are able to boost perception models, we discover that these approaches fail to improve the performance of planning of end-to-end autonomous driving models as the generated videos are usually less than 8 frames and the spatial and temporal inconsistencies are not negligible. To this end, we propose Delphi, a novel diffusion-based long video generation method with a shared noise modeling mechanism across the multi-views to increase spatial consistency, and a feature-aligned module to achieves both precise controllability and temporal consistency. Our method can generate up to 40 frames of video without loss of consistency which is about 5 times longer compared with state-of-the-art methods. Instead of randomly generating new data, we further design a sampling policy to let Delphi generate new data that are similar to those failure cases to improve the sample efficiency. This is achieved by building a failure-case driven framework with the help of pre-trained visual language models. Our extensive experiment demonstrates that our Delphi generates a higher quality of long videos surpassing previous state-of-the-art methods. Consequentially, with only generating 4% of the training dataset size, our framework is able to go beyond perception and prediction tasks, for the first time to the best of our knowledge, boost the planning performance of the end-to-end autonomous driving model by a margin of 25%.
△ Less
Submitted 6 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Measurements of the branching fractions of semileptonic $D^{+}_s$ decays via $e^+e^-\to D_s^{*+}D_s^{*-}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
We measure the absolute branching fractions of semileptonic $D^+_s$ decays via the $e^+e^-\to D_s^{*+}D_s^{*-}$ process using $e^+e^-$ collision data corresponding to an integrated luminosity of $10.64~\mathrm{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies between 4.237 and 4.699 GeV. The branching fractions are…
▽ More
We measure the absolute branching fractions of semileptonic $D^+_s$ decays via the $e^+e^-\to D_s^{*+}D_s^{*-}$ process using $e^+e^-$ collision data corresponding to an integrated luminosity of $10.64~\mathrm{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies between 4.237 and 4.699 GeV. The branching fractions are ${\mathcal B}(D_s^+\to ηe^+ν_e)=(2.35\pm0.11_{\rm stat}\pm 0.10_{\rm syst})\%,$ ${\mathcal
B}(D_s^+\to η^\prime e^+ν_e)=(0.82\pm0.09_{\rm stat}\pm 0.04_{\rm syst})\%,$ ${\mathcal B}(D_s^+\to φe^+ν_e)=(2.21\pm0.16_{\rm stat}\pm 0.11_{\rm syst})\%,$ ${\mathcal B}(D_s^+\to f_0(980) e^+ν_e,f_0(980)\toπ^+π^-)=(0.15\pm0.02_{\rm stat}\pm 0.01_{\rm syst})\%,$ ${\mathcal
B}(D_s^+\to K^0 e^+ν_e)=(0.24\pm0.04_{\rm stat}\pm 0.01_{\rm syst})\%,$ and ${\mathcal B}(D_s^+\to K^{*0} e^+ν_e)=(0.19\pm0.03_{\rm stat}\pm 0.01_{\rm syst})\%.$ These results are consistent with those measured via the $e^+e^-\to D_s^{*\pm}D_s^{\mp}$ process by BESIII and CLEO. The hadronic transition form factors $D^+_s\to ηe^+ν_e$, $D^+_s\to η^\prime e^+ν_e$, and $D^+_s\to K^0 e^+ν_e$ at four-momentum transfer squared $q^2$ = 0 are determined to be $f^η_+(0) = 0.482 \pm 0.011_{\rm stat} \pm 0.009_{\rm syst}\pm0.004_{\rm input},$ $f^{η^{\prime}}_+(0) = 0.562 \pm 0.031_{\rm stat} \pm 0.014_{\rm
syst}\pm0.003_{\rm input},$ and $f^{K^0}_+(0) = 0.624 \pm 0.052_{\rm
stat} \pm 0.013_{\rm syst}\pm0.002_{\rm input}.$
△ Less
Submitted 4 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay
Authors:
Daya Bay collaboration,
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
J. Cheng,
Y. -C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng
, et al. (177 additional authors not shown)
Abstract:
This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive…
▽ More
This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive region, the relative $\overlineν_{e}$ rates and energy spectra variation among the near and far detectors gives $\mathrm{sin}^22θ_{13} = 0.0759_{-0.0049}^{+0.0050}$ and $Δm^2_{32} = (2.72^{+0.14}_{-0.15})\times10^{-3}$ eV$^2$ assuming the normal neutrino mass ordering, and $Δm^2_{32} = (-2.83^{+0.15}_{-0.14})\times10^{-3}$ eV$^2$ for the inverted neutrino mass ordering. This estimate of $\sin^2 2θ_{13}$ is consistent with and essentially independent from the one obtained using the capture-on-gadolinium sample at Daya Bay. The combination of these two results yields $\mathrm{sin}^22θ_{13}= 0.0833\pm0.0022$, which represents an 8% relative improvement in precision regarding the Daya Bay full 3158-day capture-on-gadolinium result.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
Search for $e^{+}e^{-}\toη'ψ(2S)$ at center-of-mass energies from 4.66 to 4.95 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using data samples with an integrated luminosity of $4.67~\mathrm{fb}^{-1}$ collected by the BESIII detector operating at the BEPCII collider, we search for the process $e^+e^- \rightarrow η' ψ(2S)$ at center-of-mass energies from $4.66$ to $4.95~\mathrm{GeV}$. No significant signal is observed, and upper limits for the Born cross sections $σ^B(e^+e^-\rightarrowη'ψ(2S))$ at the 90\% confidence lev…
▽ More
Using data samples with an integrated luminosity of $4.67~\mathrm{fb}^{-1}$ collected by the BESIII detector operating at the BEPCII collider, we search for the process $e^+e^- \rightarrow η' ψ(2S)$ at center-of-mass energies from $4.66$ to $4.95~\mathrm{GeV}$. No significant signal is observed, and upper limits for the Born cross sections $σ^B(e^+e^-\rightarrowη'ψ(2S))$ at the 90\% confidence level are determined.
△ Less
Submitted 31 May, 2024;
originally announced May 2024.
-
Realization of a cold atom gyroscope in space
Authors:
Jinting Li,
Xi Chen,
Danfang Zhang,
Wenzhang Wang,
Yang Zhou,
Meng He,
Jie Fang,
Lin Zhou,
Chuan He,
Junjie Jiang,
Huanyao Sun,
Qunfeng Chen,
Lei Qin,
Xiao Li,
Yibo Wang,
Xiaowei Zhang,
Jiaqi Zhong,
Runbing Li,
Meizhen An,
Long Zhang,
Shuquan Wang,
Zongfeng Li,
Jin Wang,
Mingsheng Zhan
Abstract:
High precision gyroscopes in space are important for sophisticated scientific experiments and deep space navigation. Microgravity in the space provides an ideal condition for operation of a cold atom gyroscope. To demonstrate this advantage, an atom interferometer (AI) was launched and installed in the China Space Station in 2022. Here reported is a realization of the cold atom gyroscope with this…
▽ More
High precision gyroscopes in space are important for sophisticated scientific experiments and deep space navigation. Microgravity in the space provides an ideal condition for operation of a cold atom gyroscope. To demonstrate this advantage, an atom interferometer (AI) was launched and installed in the China Space Station in 2022. Here reported is a realization of the cold atom gyroscope with this AI. By applying point source interferometry, spatial fringes are obtained and acceleration and rotation are extracted. The angles of the Raman lasers are precisely calibrated to avoid measurement error, and other systematic errors are also considered for the rotation measurement. The evaluated rotation measurement is (-115.64+/-1.71)*10^-5 rad/s in space, and an acceleration measurement resolution of 1.03*10^-6 m/s^2 is also obtained for a single image. This study conducts the first AI-based gyroscope in space and paves a way for future space-based AI experiments.
△ Less
Submitted 31 May, 2024;
originally announced May 2024.
-
Study of the decays $χ_{cJ} \rightarrow Λ\barΛφ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (637 additional authors not shown)
Abstract:
Based on $(2712.4 \pm 14.3) \times 10^{6}$ $ e^{+}e^{-}\toψ(3686)$ events collected with the BESIII detector operating at the BEPCII collider, we report the first evidence of $χ_{c0}\to Λ\bar Λφ$ decays and the first observation of $χ_{c1,2}\to Λ\bar Λφ$ decays, with significances of $4.5σ$, $11.3σ$ and $13.0σ$, respectively. The decay branching fractions of $χ_{c0,1,2}\to Λ\bar Λφ$ are measured t…
▽ More
Based on $(2712.4 \pm 14.3) \times 10^{6}$ $ e^{+}e^{-}\toψ(3686)$ events collected with the BESIII detector operating at the BEPCII collider, we report the first evidence of $χ_{c0}\to Λ\bar Λφ$ decays and the first observation of $χ_{c1,2}\to Λ\bar Λφ$ decays, with significances of $4.5σ$, $11.3σ$ and $13.0σ$, respectively. The decay branching fractions of $χ_{c0,1,2}\to Λ\bar Λφ$ are measured to be $( 2.99\pm1.24\pm0.19) \times 10^{-5}$, $(6.01\pm0.90\pm0.40 )\times 10^{-5}$, and $(7.13\pm0.81\pm0.36) \times 10^{-5}$, where the first uncertainties are statistical and the second systematic. No obvious enhancement near the $Λ\barΛ$ production threshold or excited $Λ$ state is found in the $Λφ$ (or $\barΛφ$) system.
△ Less
Submitted 31 May, 2024;
originally announced May 2024.
-
Emergent Oscillating bound states in a semi-infinite linear waveguide with a point-like $Λ$-type quantum emitter driven by a classical field
Authors:
YuPing He,
Ge Sun,
Jing Li,
Ya Yang,
Ling Lu,
Lan Zhou
Abstract:
An oscillating bound state is a phenomenon where excitations mediated by the continuum modes oscillate persistently. Although it is generated by the superposition of two bound states in the continuum (BICs), such phenomenon is said to be unique to giant atoms. We present the phenomenon of an oscillating bound state with an alternative waveguide QED system, which is a one-dimensional semi-infinite…
▽ More
An oscillating bound state is a phenomenon where excitations mediated by the continuum modes oscillate persistently. Although it is generated by the superposition of two bound states in the continuum (BICs), such phenomenon is said to be unique to giant atoms. We present the phenomenon of an oscillating bound state with an alternative waveguide QED system, which is a one-dimensional semi-infinite waveguide coupled to a \textit{point-like} quantum emitter. This \textit{point-like} quantum emitter is $Λ$-type quantum system with one transition driven by a classical field.
△ Less
Submitted 28 May, 2024;
originally announced May 2024.
-
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation
Authors:
Chenyang Le,
Yao Qian,
Dongmei Wang,
Long Zhou,
Shujie Liu,
Xiaofei Wang,
Midia Yousefi,
Yanmin Qian,
Jinyu Li,
Sheng Zhao,
Michael Zeng
Abstract:
There is a rising interest and trend in research towards directly translating speech from one language to another, known as end-to-end speech-to-speech translation. However, most end-to-end models struggle to outperform cascade models, i.e., a pipeline framework by concatenating speech recognition, machine translation and text-to-speech models. The primary challenges stem from the inherent complex…
▽ More
There is a rising interest and trend in research towards directly translating speech from one language to another, known as end-to-end speech-to-speech translation. However, most end-to-end models struggle to outperform cascade models, i.e., a pipeline framework by concatenating speech recognition, machine translation and text-to-speech models. The primary challenges stem from the inherent complexities involved in direct translation tasks and the scarcity of data. In this study, we introduce a novel model framework TransVIP that leverages diverse datasets in a cascade fashion yet facilitates end-to-end inference through joint probability. Furthermore, we propose two separated encoders to preserve the speaker's voice characteristics and isochrony from the source speech during the translation process, making it highly suitable for scenarios such as video dubbing. Our experiments on the French-English language pair demonstrate that our model outperforms the current state-of-the-art speech-to-speech translation model.
△ Less
Submitted 28 May, 2024;
originally announced May 2024.
-
JUNO Sensitivity to Invisible Decay Modes of Neutrons
Authors:
JUNO Collaboration,
Angel Abusleme,
Thomas Adam,
Kai Adamowicz,
Shakeel Ahmad,
Rizwan Ahmed,
Sebastiano Aiello,
Fengpeng An,
Qi An,
Giuseppe Andronico,
Nikolay Anfimov,
Vito Antonelli,
Tatiana Antoshkina,
João Pedro Athayde Marcondes de André,
Didier Auguste,
Weidong Bai,
Nikita Balashov,
Wander Baldini,
Andrea Barresi,
Davide Basilico,
Eric Baussan,
Marco Bellato,
Marco Beretta,
Antonio Bergnoli,
Daniel Bick
, et al. (635 additional authors not shown)
Abstract:
We explore the bound neutrons decay into invisible particles (e.g., $n\rightarrow 3 ν$ or $nn \rightarrow 2 ν$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $ n \rightarrow { inv} $ and $ nn \rightarrow { inv} $. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation mode…
▽ More
We explore the bound neutrons decay into invisible particles (e.g., $n\rightarrow 3 ν$ or $nn \rightarrow 2 ν$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $ n \rightarrow { inv} $ and $ nn \rightarrow { inv} $. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation modes of the excited residual nuclei can produce a time- and space-correlated triple coincidence signal in the JUNO detector. Based on a full Monte Carlo simulation informed with the latest available data, we estimate all backgrounds, including inverse beta decay events of the reactor antineutrino $\barν_e$, natural radioactivity, cosmogenic isotopes and neutral current interactions of atmospheric neutrinos. Pulse shape discrimination and multivariate analysis techniques are employed to further suppress backgrounds. With two years of exposure, JUNO is expected to give an order of magnitude improvement compared to the current best limits. After 10 years of data taking, the JUNO expected sensitivities at a 90% confidence level are $τ/B( n \rightarrow { inv} ) > 5.0 \times 10^{31} \, {\rm yr}$ and $τ/B( nn \rightarrow { inv} ) > 1.4 \times 10^{32} \, {\rm yr}$.
△ Less
Submitted 27 May, 2024;
originally announced May 2024.