Sentence-Level or Token-Level? A Comprehensive Study on Knowledge Distillation

Authors: Jingxuan Wei, Linzhuang Sun, Yichong Leng, Xu Tan, Bihui Yu, Ruifeng Guo

Abstract: Knowledge distillation, transferring knowledge from a teacher model to a student model, has emerged as a powerful technique in neural machine translation for compressing models or simplifying training targets. Knowledge distillation encompasses two primary methods: sentence-level distillation and token-level distillation. In sentence-level distillation, the student model is trained to align with the output of the teacher model, which can alleviate the training difficulty and give student model a comprehensive understanding of global structure. Differently, token-level distillation requires the student model to learn the output distribution of the teacher model, facilitating a more fine-grained transfer of knowledge. Studies have revealed divergent performances between sentence-level and token-level distillation across different scenarios, leading to the confusion on the empirical selection of knowledge distillation methods. In this study, we argue that token-level distillation, with its more complex objective (i.e., distribution), is better suited for "simple" scenarios, while sentence-level distillation excels in "complex" scenarios. To substantiate our hypothesis, we systematically analyze the performance of distillation methods by varying the model size of student models, the complexity of text, and the difficulty of decoding procedure. While our experimental results validate our hypothesis, defining the complexity level of a given scenario remains a challenging task. So we further introduce a novel hybrid method that combines token-level and sentence-level distillation through a gating mechanism, aiming to leverage the advantages of both individual methods. Experiments demonstrate that the hybrid method surpasses the performance of token-level or sentence-level distillation methods and the previous works by a margin, demonstrating the effectiveness of the proposed hybrid method.
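
The two objectives contrasted in this abstract, and a gated mix of them, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the losses are computed for a single target position, and the gate is assumed to be a fixed scalar in [0, 1], whereas the abstract does not say how the gate is computed. Student probabilities are assumed to be strictly positive.

```python
import math


def token_level_loss(teacher_probs, student_probs):
    """Cross-entropy between teacher and student distributions at one position."""
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs) if t > 0)


def sentence_level_loss(teacher_token, student_probs):
    """Negative log-likelihood of the teacher's decoded token under the student."""
    return -math.log(student_probs[teacher_token])


def hybrid_loss(teacher_probs, teacher_token, student_probs, gate):
    """Gated mix of the two losses (assumption: gate is a fixed scalar in [0, 1])."""
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    token_term = token_level_loss(teacher_probs, student_probs)
    sentence_term = sentence_level_loss(teacher_token, student_probs)
    return gate * token_term + (1.0 - gate) * sentence_term


teacher = [0.7, 0.2, 0.1]   # teacher output distribution over a toy 3-word vocabulary
student = [0.5, 0.3, 0.2]   # student output distribution at the same position
print(hybrid_loss(teacher, teacher_token=0, student_probs=student, gate=0.5))
```

With `gate=1.0` the sketch reduces to pure token-level distillation, and with `gate=0.0` to pure sentence-level distillation on the teacher's decoded token.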

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14710 [pdf, other]

Challenges of Using Pre-trained Models: the Practitioners' Perspective

Authors: Xin Tan, Taichuan Li, Ruohe Chen, Fang Liu, Li Zhang

Abstract: The challenges associated with using pre-trained models (PTMs) have not been specifically investigated, which hampers their effective utilization. To address this knowledge gap, we collected and analyzed a dataset of 5,896 PTM-related questions on Stack Overflow. We first analyze the popularity and difficulty trends of PTM-related questions. We find that PTM-related questions are becoming more and more popular over time. However, it is noteworthy that PTM-related questions not only have a lower response rate but also exhibit a longer response time compared to many well-researched topics in software engineering. This observation emphasizes the significant difficulty and complexity associated with the practical application of PTMs. To delve into the specific challenges, we manually annotate 430 PTM-related questions, categorizing them into a hierarchical taxonomy of 42 codes (i.e., leaf nodes) and three categories. This taxonomy encompasses many PTM prominent challenges such as fine-tuning, output understanding, and prompt customization, which reflects the gaps between current techniques and practical needs. We discuss the implications of our study for PTM practitioners, vendors, and educators, and suggest possible directions and solutions for future research.

Submitted 1 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: SANER 2024

arXiv:2404.14700 [pdf, other]

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Authors: Zhen Ye, Zeqian Ju, Haohe Liu, Xu Tan, Jianyi Chen, Yiwen Lu, Peiwen Sun, Jiahao Pan, Weizhen Bian, Shulin He, Qifeng Liu, Yike Guo, Wei Xue

Abstract: Recent progress in large-scale zero-shot speech synthesis has been significantly advanced by language models and diffusion models. However, the generation process of both methods is slow and computationally intensive. Efficient speech synthesis using a lower computing budget to achieve quality on par with previous work remains a significant challenge. In this paper, we present FlashSpeech, a large-scale zero-shot speech synthesis system with approximately 5% of the inference time compared with previous work. FlashSpeech is built on the latent consistency model and applies a novel adversarial consistency training approach that can train from scratch without the need for a pre-trained diffusion model as the teacher. Furthermore, a new prosody generator module enhances the diversity of prosody, making the rhythm of the speech sound more natural. The generation processes of FlashSpeech can be achieved efficiently with one or two sampling steps while maintaining high audio quality and high similarity to the audio prompt for zero-shot speech generation. Our experimental results demonstrate the superior performance of FlashSpeech. Notably, FlashSpeech can be about 20 times faster than other zero-shot speech synthesis systems while maintaining comparable performance in terms of voice quality and similarity. Furthermore, FlashSpeech demonstrates its versatility by efficiently performing tasks like voice conversion, speech editing, and diverse speech sampling. Audio samples can be found in https://flashspeech.github.io/.

Submitted 24 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: Efficient zero-shot speech synthesis

arXiv:2404.13659 [pdf, other]

LMFNet: An Efficient Multimodal Fusion Approach for Semantic Segmentation in High-Resolution Remote Sensing

Authors: Tong Wang, Guanzhou Chen, Xiaodong Zhang, Chenxi Liu, Xiaoliang Tan, Jiaqi Wang, Chanjuan He, Wenlin Zhou

Abstract: Despite the rapid evolution of semantic segmentation for land cover classification in high-resolution remote sensing imagery, integrating multiple data modalities such as Digital Surface Model (DSM), RGB, and Near-infrared (NIR) remains a challenge. Current methods often process only two types of data, missing out on the rich information that additional modalities can provide. Addressing this gap, we propose a novel Lightweight Multimodal data Fusion Network (LMFNet) to accomplish the tasks of fusion and semantic segmentation of multimodal remote sensing images. LMFNet uniquely accommodates various data types simultaneously, including RGB, NirRG, and DSM, through a weight-sharing, multi-branch vision transformer that minimizes parameter count while ensuring robust feature extraction. Our proposed multimodal fusion module integrates a Multimodal Feature Fusion Reconstruction Layer and Multimodal Feature Self-Attention Fusion Layer, which can reconstruct and fuse multimodal features. Extensive testing on public datasets such as US3D, ISPRS Potsdam, and ISPRS Vaihingen demonstrates the effectiveness of LMFNet. Specifically, it achieves a mean Intersection over Union ($mIoU$) of 85.09% on the US3D dataset, marking a significant improvement over existing methods. Compared to unimodal approaches, LMFNet shows a 10% enhancement in $mIoU$ with only a 0.5M increase in parameter count. Furthermore, against bimodal methods, our approach with trilateral inputs enhances $mIoU$ by 0.46 percentage points.
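
The results in this abstract are reported as mean Intersection over Union. As a reminder of what that metric measures, here is a minimal sketch of the standard definition over flattened per-pixel labels. It is not the authors' evaluation code; the function name is hypothetical, and skipping classes that appear in neither the prediction nor the ground truth is one common convention, assumed here.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes, given flat sequences of per-pixel class labels."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: classes absent from both inputs are skipped
            ious.append(intersection / union)
    if not ious:
        raise ValueError("no class is present in pred or target")
    return sum(ious) / len(ious)


# Toy 4-pixel example with two classes.
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.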

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.13276 [pdf]

Giant Rashba-Splitting of One-Dimensional Metallic States in Bi Dimer Lines on InAs(100)

Authors: Polina M. Sheverdyaeva, Gustav Bihlmayer, Silvio Modesti, Vitaliy Feyer, Matteo Jugovac, Giovanni Zamborlini, Christian Tusche, Ying-Jiun Chen, Xin Liang Tan, Kenta Hagiwara, Luca Petaccia, Sangeeta Thakur, Asish K. Kundu, Carlo Carbone, Paolo Moras

Abstract: Bismuth produces different types of ordered superstructures on the InAs(100) surface, depending on the growth procedure and coverage. The (2x1) phase forms at completion of a Bi monolayer and consists of a uniformly oriented array of parallel lines of Bi dimers. Scanning tunneling and core level spectroscopies demonstrate its metallic character, in contrast with the semiconducting properties expected on the basis of the electron counting principle. The weak electronic coupling among neighboring lines gives rise to quasi one-dimensional Bi-derived bands with open contours at the Fermi level. Spin- and angle-resolved photoelectron spectroscopy reveals a giant Rashba splitting of these bands, in good agreement with ab-initio electronic structure calculations. The very high density of the dimer lines, the metallic and quasi one-dimensional band dispersion and the Rashba-like spin texture make the Bi/InAs(100)-(2x1) phase an intriguing system, where novel transport regimes can be studied.

Submitted 20 April, 2024; originally announced April 2024.

Comments: 4 figures, includes supplemental material file

arXiv:2404.13258 [pdf, ps, other]

Human Motor Learning Dynamics in High-dimensional Tasks

Authors: Ankur Kamboj, Rajiv Ranganathan, Xiaobo Tan, Vaibhav Srivastava

Abstract: Conventional approaches to enhancing movement coordination, such as providing instructions and visual feedback, are often inadequate in complex motor tasks with multiple degrees of freedom (DoFs). To effectively address coordination deficits in such complex motor systems, it becomes imperative to develop interventions grounded in a model of human motor learning; however, modeling such learning processes is challenging due to the large DoFs. In this paper, we present a computational motor learning model that leverages the concept of motor synergies to extract low-dimensional learning representations in the high-dimensional motor space and the internal model theory of motor control to capture both fast and slow motor learning processes. We establish the model's convergence properties and validate it using data from a target capture game played by human participants. We study the influence of model parameters on several motor learning trade-offs such as speed-accuracy, exploration-exploitation, satisficing, and flexibility-performance, and show that the human motor learning system tunes these parameters to optimize learning and various output performance metrics.

Submitted 20 April, 2024; originally announced April 2024.

Comments: 22 pages (single column), 9 figures

arXiv:2404.12964 [pdf, ps, other]

On the McKean-Vlasov SDE with branching

Authors: Julien Claisse, Jiazhi Kang, Xiaolu Tan

Abstract: We study a nonlinear branching diffusion process in the sense of McKean, i.e., where particles are subjected to a mean-field interaction. We consider first a strong formulation of the problem and we provide an existence and uniqueness result by using contraction arguments. Then we consider the notion of weak solution and its equivalent martingale problem formulation. In this setting, we provide a general weak existence result, as well as a propagation of chaos property, i.e., the McKean-Vlasov branching diffusion is the limit of a large population branching process with mean-field interaction when the population size grows to infinity.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.12000 [pdf, other]

How far are AI-powered programming assistants from meeting developers' needs?

Authors: Xin Tan, Xiao Long, Xianjun Ni, Yinghao Zhu, Jing Jiang, Li Zhang

Abstract: Recent In-IDE AI coding assistant tools (ACATs) like GitHub Copilot have significantly impacted developers' coding habits. While some studies have examined their effectiveness, there lacks in-depth investigation into the actual assistance process. To bridge this gap, we simulate real development scenarios encompassing three typical types of software development tasks and recruit 27 computer science students to investigate their behavior with three popular ACATs. Our goal is to comprehensively assess ACATs' effectiveness, explore characteristics of recommended code, identify reasons for modifications, and understand users' challenges and expectations. To facilitate the study, we develop an experimental platform that includes a data collection plugin for VSCode IDE and provides functions for screen recording, code evaluation, and automatic generation of personalized interview and survey questions. Through analysis of the collected data, we find that ACATs generally enhance task completion rates, reduce time, improve code quality, and increase self-perceived productivity. However, the improvement is influenced by both the nature of coding tasks and users' experience level. Notably, for experienced participants, the use of ACATs may even increase completion time. We observe that "edited line completion" is the most frequently recommended way, while "comments completion" and "string completion" have the lowest acceptance rates. The primary reasons for modifying recommended code are disparities between output formats and requirements, flawed logic, and inconsistent code styles. In terms of challenges and expectations, optimization of service access and help documentation is also concerned by participants except for functionality and performance. Our study provides valuable insights into the effectiveness and usability of ACATs, informing further improvements in their design and implementation.

Submitted 24 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.08087 [pdf, other]

Time-resolved Hubble Space Telescope Wide Field Camera 3 Spectrophotometry Reveals Inefficient Day-to-Night Heat Redistribution in the Highly Irradiated Brown Dwarf SDSS 1557B

Authors: Rachael C. Amaro, Daniel Apai, Ben W. P. Lew, Yifan Zhou, Joshua D. Lothringer, Sarah L. Casewell, Xianyu Tan, Travis Barman, Mark S. Marley, L. C. Mayorga, Vivien Parmentier

Abstract: Brown dwarfs in ultra-short period orbits around white dwarfs offer a unique opportunity to study the properties of tidally-locked, fast rotating (1-3 hr), and highly-irradiated atmospheres. Here, we present phase-resolved spectrophotometry of the white dwarf-brown dwarf (WD-BD) binary SDSS 1557, which is the fifth WD-BD binary in our six-object sample. Using the Hubble Space Telescope Wide Field Camera 3 Near-infrared G141 instrument, the 1.1 to 1.7 $μ$m phase curves show rotational modulations with semi-amplitudes of 10.5$\pm$0.1%. We observe a wavelength dependent amplitude, with longer wavelengths producing larger amplitudes, while no wavelength dependent phase shifts were identified. The phase-resolved extracted BD spectra exhibit steep slopes and are nearly featureless. A simple radiative energy redistribution atmospheric model recreates the hemisphere integrated brightness temperatures at three distinct phases and finds evidence for weak redistribution efficiency. Our model also predicts a higher inclination than previously published. We find that SDSS 1557B, the second most irradiated BD in our sample, is likely dominated by clouds on the night side, whereas the featureless day side spectrum is likely dominated by H$^-$ opacity and a temperature inversion, much like the other highly-irradiated BD EPIC2122B.

Submitted 11 April, 2024; originally announced April 2024.

Comments: 19 pages and 11 figures. Accepted to Astrophysical Journal

arXiv:2404.07787 [pdf, other]

Research on fine co-focus adjustment method for segmented solar telescope

Authors: Kunyan Wang, Yichun Dai, Bin Wang, Xu Tan, Dehua Yang, Zhenyu Jin

Abstract: For segmented telescopes, achieving fine co-focus adjustment is essential for realizing co-phase adjustment and maintenance, which involves adjusting the millimeter-scale piston between segments to fall within the capture range of the co-phase detection system. CGST proposes using a SHWFS for piston detection during the co-focus adjustment stage. However, the residual piston after adjustment exceeds the capture range of the broadband PSF phasing algorithm ($\pm 30 μm$), and the multi-wavelength PSF algorithm requires even higher precision in co-focus adjustment. To improve the co-focus adjustment accuracy of CGST, a fine co-focus adjustment based on cross-calibration is proposed. This method utilizes a high-precision detector to calibrate and fit the measurements from the SHWFS, thereby reducing the impact of atmospheric turbulence and systematic errors on piston measurement accuracy during co-focus adjustment. Simulation results using CGST demonstrate that the proposed method significantly enhances adjustment accuracy compared to the SHWFS detection method. Additionally, the residual piston after fine co-focus adjustment using this method falls within the capture range of the multi-wavelength PSF algorithm. To verify the feasibility of this method, experiments were conducted on an 800mm ring segmented mirror system, successfully achieving fine co-focus adjustment where the remaining piston of all segments fell within $\pm 15 μm$.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.07609 [pdf, other]

Achieving violation-free distributed optimization under coupling constraints

Authors: Changxin Liu, Xiao Tan, Xuyang Wu, Dimos V. Dimarogonas, Karl H. Johansson

Abstract: Constraint satisfaction is a critical component in a wide range of engineering applications, including but not limited to safe multi-agent control and economic dispatch in power systems. This study explores violation-free distributed optimization techniques for problems characterized by separable objective functions and coupling constraints. First, we incorporate auxiliary decision variables together with a network-dependent linear mapping to each coupling constraint. For the reformulated problem, we show that the projection of its feasible set onto the space of primal variables is identical to that of the original problem, which is the key to achieving all-time constraint satisfaction. Upon treating the reformulated problem as a min-min optimization problem with respect to auxiliary and primal variables, we demonstrate that the gradients in the outer minimization problem have a locally computable closed-form. Then, two violation-free distributed optimization algorithms are developed and their convergence under reasonable assumptions is analyzed. Finally, the proposed algorithm is applied to implement a control barrier function based controller in a distributed manner, and the results verify its effectiveness.

Submitted 11 April, 2024; originally announced April 2024.

Comments: 13 pages, 6 figures

arXiv:2404.07571 [pdf, other]

A continuous-time violation-free multi-agent optimization algorithm and its applications to safe distributed control

Authors: Xiao Tan, Changxin Liu, Karl H. Johansson, Dimos V. Dimarogonas

Abstract: In this work, we propose a continuous-time distributed optimization algorithm with guaranteed zero coupling constraint violation and apply it to safe distributed control in the presence of multiple control barrier functions (CBF). The optimization problem is defined over a network that collectively minimizes a separable cost function with coupled linear constraints. An equivalent optimization problem with auxiliary decision variables and a decoupling structure is proposed. A sensitivity analysis demonstrates that the subgradient information can be computed using local information. This then leads to a subgradient algorithm for updating the auxiliary variables. A case with sparse coupling constraints is further considered, and it is shown to have better memory and communication efficiency. For the specific case of a CBF-induced time-varying quadratic program (QP), an update law is proposed that achieves finite-time convergence. Numerical results involving a static resource allocation problem and a safe coordination problem for a multi-agent system demonstrate the efficiency and effectiveness of our proposed algorithms.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.07436 [pdf, other]

Measurement of $e^{+}e^{-}\to ωη^{\prime}$ cross sections at $\sqrt{s}=$ 2.000 to 3.080 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, et al. (599 additional authors not shown)

Abstract: The Born cross sections for the process $e^{+}e^{-}\to ωη^{\prime}$ are measured at 22 center-of-mass energies from 2.000 to 3.080 GeV using data collected with the BESIII detector at the BEPCII collider. A resonant structure is observed with a statistical significance of 9.6$σ$. A Breit-Wigner fit determines its mass to be $M_R=(2153\pm30\pm31)~{\rm{MeV}}/c^{2}$ and its width to be $Γ_{R}=(167\pm77\pm7)~\rm{MeV}$, where the first uncertainties are statistical and the second are systematic.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.07149 [pdf, other]

Tianyu: search for the second solar system and explore the dynamic universe

Authors: Fabo Feng, Yicheng Rui, Zhimao Du, Qing Lin, Congcong Zhang, Dan Zhou, Kaiming Cui, Masahiro Ogihara, Ming Yang, Jie Lin, Yongzhi Cai, Taozhi Yang, Xiaoying Pang, Mingjie Jian, Wenxiong Li, Hengxiao Guo, Xian Shi, Jianchun Shi, Jianyang Li, Kangrou Guo, Song Yao, Aming Chen, Peng Jia, Xianyu Tan, James S. Jenkins, et al. (10 additional authors not shown)

Abstract: Giant planets like Jupiter and Saturn, play important roles in the formation and habitability of Earth-like planets. The detection of solar system analogs that have multiple cold giant planets is essential for our understanding of planet habitability and planet formation. Although transit surveys such as Kepler and TESS have discovered thousands of exoplanets, these missions are not sensitive to long period planets due to their limited observation baseline. The Tianyu project, comprising two 1-meter telescopes (Tianyu-I and II), is designed to detect transiting cold giant planets in order to find solar system analogs. Featuring a large field of view and equipped with a high-speed CMOS camera, Tianyu-I will perform a high-precision photometric survey of about 100 million stars, measuring light curves at hour-long cadence. The candidates found by Tianyu-I will be confirmed by Tianyu-II and other surveys and follow-up facilities through multi-band photometry, spectroscopy, and high resolution imaging. Tianyu telescopes will be situated at an elevation about 4000 meters in Lenghu, China. With a photometric precision of 1% for stars with V < 18 mag, Tianyu is expected to find more than 300 transiting exoplanets, including about 12 cold giant planets, over five years. A five-year survey of Tianyu would discover 1-2 solar system analogs. Moreover, Tianyu is also designed for non-exoplanetary exploration, incorporating multiple survey modes covering timescales from sub-seconds to months, with a particular emphasis on events occurring within the sub-second to hour range. It excels in observing areas such as infant supernovae, rare variable stars and binaries, tidal disruption events, Be stars, cometary activities, and interstellar objects. These discoveries not only enhance our comprehension of the universe but also offer compelling opportunities for public engagement in scientific exploration.

Submitted 10 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

Comments: 48 pages, 16 figures, accepted by Acta Astronomica Sinica

arXiv:2404.06393 [pdf, other]

MuPT: A Generative Symbolic Music Pretrained Transformer

Authors: Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan, et al. (4 additional authors not shown)

Abstract: In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music. While the prevalent use of MIDI in music modeling is well-established, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths, thereby enhancing the model's performance in musical composition. To address the challenges associated with misaligned measures from different tracks during generation, we propose the development of a Synchronized Multi-Track ABC Notation (SMT-ABC Notation), which aims to preserve coherence across multiple musical tracks. Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set. Furthermore, we explore the implications of the Symbolic Music Scaling Law (SMS Law) on model performance. The results indicate a promising direction for future research in music generation, offering extensive resources for community-led research through our open-source contributions.

Submitted 10 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.05231 [pdf, other]

PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection

Authors: Xiaofan Li, Zhizhong Zhang, Xin Tan, Chengwei Chen, Yanyun Qu, Yuan Xie, Lizhuang Ma

Abstract: The vision-language model has brought great improvement to few-shot industrial anomaly detection, which usually needs to design of hundreds of prompts through prompt engineering. For automated scenarios, we first use conventional prompt learning with many-class paradigm as the baseline to automatically learn prompts but found that it can not work well in one-class anomaly detection. To address the above problem, this paper proposes a one-class prompt learning method for few-shot anomaly detection, termed PromptAD. First, we propose semantic concatenation which can transpose normal prompts into anomaly prompts by concatenating normal prompts with anomaly suffixes, thus constructing a large number of negative samples used to guide prompt learning in one-class setting. Furthermore, to mitigate the training challenge caused by the absence of anomaly images, we introduce the concept of explicit anomaly margin, which is used to explicitly control the margin between normal prompt features and anomaly prompt features through a hyper-parameter. For image-level/pixel-level anomaly detection, PromptAD achieves first place in 11/12 few-shot settings on MVTec and VisA.

Submitted 8 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR2024

arXiv:2404.03204 [pdf, other]

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Authors: Detai Xin, Xu Tan, Kai Shen, Zeqian Ju, Dongchao Yang, Yuancheng Wang, Shinnosuke Takamichi, Hiroshi Saruwatari, Shujie Liu, Jinyu Li, Sheng Zhao

Abstract: We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis. While previous work based on large language models (LLMs) shows impressive performance on zero-shot TTS, such methods often suffer from poor robustness, such as unstable prosody (weird pitch and rhythm/duration) and a high word error rate (WER), due to the autoregressive prediction style of language models. The core idea behind RALL-E is chain-of-thought (CoT) prompting, which decomposes the task into simpler steps to enhance the robustness of LLM-based TTS. To accomplish this idea, RALL-E first predicts prosody features (pitch and duration) of the input text and uses them as intermediate conditions to predict speech tokens in a CoT style. Second, RALL-E utilizes the predicted duration prompt to guide the computing of self-attention weights in Transformer to enforce the model to focus on the corresponding phonemes and prosody features when predicting speech tokens. Results of comprehensive objective and subjective evaluations demonstrate that, compared to a powerful baseline method VALL-E, RALL-E significantly improves the WER of zero-shot TTS from $5.6\%$ (without reranking) and $1.7\%$ (with reranking) to $2.5\%$ and $1.0\%$, respectively. Furthermore, we demonstrate that RALL-E correctly synthesizes sentences that are hard for VALL-E and reduces the error rate from $68\%$ to $4\%$.

Submitted 19 May, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

arXiv:2404.02952 [pdf, other]

Chirality-Driven Orbital Angular Momentum and Circular Dichroism in CoSi

Authors: Stefanie Suzanne Brinkman, Xin Liang Tan, Bjørnulf Brekke, Anders Christian Mathisen, Øyvind Finnseth, Richard Justin Schenk, Kenta Hagiwara, Meng-Jie Huang, Jens Buck, Matthias Kalläne, Moritz Hoesch, Kai Rossnagel, Kui-Hon Ou Yang, Minn-Tsong Lin, Guo-Jiun Shu, Ying-Jiun Chen, Christian Tusche, Hendrik Bentmann

Abstract: Chiral crystals and molecules were recently predicted to form an intriguing platform for unconventional orbital physics. Here, we report the observation of chirality-driven orbital textures in the bulk electronic structure of CoSi, a prototype member of the cubic B20 family of chiral crystals. Using circular dichroism in soft X-ray angle-resolved photoemission, we demonstrate the formation of a bulk orbital-angular-momentum texture and monopole-like orbital-momentum locking that depends on crystal handedness. We introduce the intrinsic chiral circular dichroism, icCD, as a differential photoemission observable and a natural probe of chiral electron states. Our findings render chiral crystals promising for spin-orbitronics applications.

Submitted 3 April, 2024; originally announced April 2024.

Comments: To be published in Physical Review Letters

Report number: QuSpin 2024

arXiv:2404.01532 [pdf, other]

Set-Aligning Framework for Auto-Regressive Event Temporal Graph Generation

Authors: Xingwei Tan, Yuxiang Zhou, Gabriele Pergola, Yulan He

Abstract: Event temporal graphs have been shown as convenient and effective representations of complex temporal relations between events in text. Recent studies, which employ pre-trained language models to auto-regressively generate linearised graphs for constructing event temporal graphs, have shown promising results. However, these methods have often led to suboptimal graph generation as the linearised graphs exhibit set characteristics which are instead treated sequentially by language models. This discrepancy stems from the conventional text generation objectives, leading to erroneous penalisation of correct predictions caused by the misalignment of elements in target sequences. To address these challenges, we reframe the task as a conditional set generation problem, proposing a Set-aligning Framework tailored for the effective utilisation of Large Language Models (LLMs). The framework incorporates data augmentations and set-property regularisations designed to alleviate text generation loss penalties associated with the linearised graph edge sequences, thus encouraging the generation of more relation edges. Experimental results show that our framework surpasses existing baselines for event temporal graph generation. Furthermore, under zero-shot settings, the structural knowledge introduced through our framework notably improves model generalisation, particularly when the training examples available are limited.

Submitted 1 April, 2024; originally announced April 2024.

Comments: Accepted to NAACL 2024. 9 + 10 pages

arXiv:2403.19095 [pdf]

Purposeful remixing with generative AI: Constructing designer voice in multimodal composing

Authors: Xiao Tan, Wei Xu, Chaoran Wang

Abstract: Voice, the discursive construction of the writer's identity, has been extensively studied and theorized in composition studies. In multimodal writing, students are able to mobilize both linguistic and non-linguistic resources to express their real or imagined identities. But at the same time, when students are limited to choosing from available online resources, their voices might be compromised due to the incompatibility between their authorial intentions and the existing materials. This study, therefore, investigates whether the use of generative AI tools could help student authors construct a more consistent voice in multimodal writing. In this study, we have designed a photo essay assignment where students recount a story in the form of photo essays and prompt AI image generating tools to create photos for their storytelling. Drawing on interview data, written reflection, written annotation, and multimodal products from seven focal participants, we have identified two remixing practices, through which students attempted to establish a coherent and unique voice in writing. The study sheds light on the intentional and discursive nature of multimodal writing with AI as afforded by the technological flexibility, while also highlighting the practical and ethical challenges that could be attributed to students' insufficient prompt and multimodal literacy and the innate limitations of AI systems. This study provides important implications for incorporating AI tools in designing multimodal writing tasks.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.19091 [pdf, other]

Observation of the semileptonic decays $D^0\rightarrow K_S^0π^-π^0 e^+ ν_e$ and $D^+\rightarrow K_S^0π^+π^- e^+ ν_e$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, et al. (600 additional authors not shown)

Abstract: By analyzing $e^+e^-$ annihilation data corresponding to an integrated luminosity of 2.93 $\rm fb^{-1}$ collected at a center-of-mass energy of 3.773 GeV with the BESIII detector, the first observation of the semileptonic decays $D^0\rightarrow K_S^0π^-π^0 e^+ ν_e$ and $D^+\rightarrow K_S^0π^+π^- e^+ ν_e$ is reported. With a dominant hadronic contribution from $K_1(1270)$, the branching fractions are measured to be $\mathcal{B}(D^0\rightarrow {K}_1(1270)^-(\to K^0_Sπ^-π^0)e^+ν_e)=(1.69^{+0.53}_{-0.46}\pm0.15)\times10^{-4}$ and $\mathcal{B}(D^+\to \bar{K}_1(1270)^0(\to K^0_Sπ^+π^-)e^+ν_e)=(1.47^{+0.45}_{-0.40}\pm0.20)\times10^{-4}$ with statistical significance of 5.4$σ$ and 5.6$σ$, respectively. When combined with measurements of the $K_1(1270)\to K^+π^-π$ decays, the absolute branching fractions are determined to be $\mathcal{B}(D^0\to K_1(1270)^-e^+ν_e)=(1.05^{+0.33}_{-0.28}\pm0.12\pm0.12)\times10^{-3}$ and $\mathcal{B}(D^+\to \bar{K}_1(1270)^0e^+ν_e)=(1.29^{+0.40}_{-0.35}\pm0.18\pm0.15)\times10^{-3}$. The first and second uncertainties are statistical and systematic, respectively, and the third uncertainties originate from the assumed branching fractions of the $K_1(1270)\to Kππ$ decays.

Submitted 27 March, 2024; originally announced March 2024.

Comments: 19 pages

arXiv:2403.17387 [pdf, other]

Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection

Authors: Jiacheng Zhang, Jiaming Li, Xiangru Lin, Wei Zhang, Xiao Tan, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li

Abstract: We delve into pseudo-labeling for semi-supervised monocular 3D object detection (SSM3OD) and discover two primary issues: a misalignment between the prediction quality of 3D and 2D attributes and the tendency of depth supervision derived from pseudo-labels to be noisy, leading to significant optimization conflicts with other reliable forms of supervision. We introduce a novel decoupled pseudo-labeling (DPL) approach for SSM3OD. Our approach features a Decoupled Pseudo-label Generation (DPG) module, designed to efficiently generate pseudo-labels by separately processing 2D and 3D attributes. This module incorporates a unique homography-based method for identifying dependable pseudo-labels in BEV space, specifically for 3D attributes. Additionally, we present a Depth Gradient Projection (DGP) module to mitigate optimization conflicts caused by noisy depth supervision of pseudo-labels, effectively decoupling the depth gradient and removing conflicting gradients. This dual decoupling strategy, at both the pseudo-label generation and gradient levels, significantly improves the utilization of pseudo-labels in SSM3OD. Our comprehensive experiments on the KITTI benchmark demonstrate the superiority of our method over existing approaches.

Submitted 23 April, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

Comments: To appear in CVPR2024

arXiv:2403.15127 [pdf, other]

Gradient-based Sampling for Class Imbalanced Semi-supervised Object Detection

Authors: Jiaming Li, Xiangru Lin, Wei Zhang, Xiao Tan, Yingying Li, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li

Abstract: Current semi-supervised object detection (SSOD) algorithms typically assume class balanced datasets (PASCAL VOC, etc.) or slightly class imbalanced datasets (MS-COCO, etc.). This assumption can be easily violated since real world datasets can be extremely class imbalanced in nature, thus making the performance of semi-supervised object detectors far from satisfactory. Besides, this problem in SSOD is severely under-explored. To bridge this research gap, we comprehensively study the class imbalance problem for SSOD under more challenging scenarios, thus forming the first experimental setting for class imbalanced SSOD (CI-SSOD). Moreover, we propose a simple yet effective gradient-based sampling framework that tackles the class imbalance problem from the perspective of two types of confirmation biases. To tackle confirmation bias towards majority classes, the gradient-based reweighting and gradient-based thresholding modules leverage the gradients from each class to fully balance the influence of the majority and minority classes. To tackle the confirmation bias from incorrect pseudo labels of minority classes, the class-rebalancing sampling module resamples unlabeled data following the guidance of the gradient-based reweighting module. Experiments on three proposed sub-tasks, namely MS-COCO, MS-COCO to Object365 and LVIS, suggest that our method outperforms current class imbalanced object detectors by clear margins, serving as a baseline for future research in CI-SSOD. Code will be available at https://github.com/nightkeepers/CI-SSOD.

Submitted 22 March, 2024; originally announced March 2024.

Comments: Accepted by ICCV2023

arXiv:2403.15100 [pdf, other]

Subequivariant Reinforcement Learning Framework for Coordinated Motion Control

Authors: Haoyu Wang, Xiaoyu Tan, Xihe Qiu, Chao Qu

Abstract: Effective coordination is crucial for motion control with reinforcement learning, especially as the complexity of agents and their motions increases. However, many existing methods struggle to account for the intricate dependencies between joints. We introduce CoordiGraph, a novel architecture that leverages subequivariant principles from physics to enhance coordination of motion control with reinforcement learning. This method embeds the principles of equivariance as inherent patterns in the learning process under gravity influence, which aids in modeling the nuanced relationships between joints vital for motion control. Through extensive experimentation with sophisticated agents in diverse environments, we highlight the merits of our approach. Compared to current leading methods, CoordiGraph notably enhances generalization and sample efficiency.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 7 pages, 7 figures, 2024 IEEE International Conference on Robotics and Automation

arXiv:2403.10065 [pdf, other]

Triple GNNs: Introducing Syntactic and Semantic Information for Conversational Aspect-Based Quadruple Sentiment Analysis

Authors: Binbin Li, Yuqing Li, Siyu Jia, Bingnan Ma, Yu Ding, Zisen Qi, Xingbang Tan, Menghan Guo, Shenghui Liu

Abstract: Conversational Aspect-Based Sentiment Analysis (DiaASQ) aims to detect quadruples {target, aspect, opinion, sentiment polarity} from given dialogues. In DiaASQ, elements constituting these quadruples are not necessarily confined to individual sentences but may span across multiple utterances within a dialogue. This necessitates a dual focus on both the syntactic information of individual utterances and the semantic interaction among them. However, previous studies have primarily focused on coarse-grained relationships between utterances, thus overlooking the potential benefits of detailed intra-utterance syntactic information and the granularity of inter-utterance relationships. This paper introduces the Triple GNNs network to enhance DiaASQ. It employs a Graph Convolutional Network (GCN) for modeling syntactic dependencies within utterances and a Dual Graph Attention Network (DualGATs) to construct interactions between utterances. Experiments on two standard datasets reveal that our model significantly outperforms state-of-the-art baselines. The code is available at https://github.com/nlperi2b/Triple-GNNs-.

Submitted 15 March, 2024; originally announced March 2024.

Comments: Accepted by CSCWD2024

arXiv:2403.07865 [pdf, other]

CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion

Authors: Qibing Ren, Chang Gao, Jing Shao, Junchi Yan, Xin Tan, Wai Lam, Lizhuang Ma

Abstract: The rapid advancement of Large Language Models (LLMs) has brought about remarkable generative capabilities but also raised concerns about their potential misuse. While strategies like supervised fine-tuning and reinforcement learning from human feedback have enhanced their safety, these methods primarily focus on natural languages, which may not generalize to other domains. This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs, presenting a novel environment for testing the safety generalization of LLMs. Our comprehensive studies on state-of-the-art LLMs including GPT-4, Claude-2, and Llama-2 series reveal a new and universal safety vulnerability of these models against code input: CodeAttack bypasses the safety guardrails of all models more than 80% of the time. We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization, such as encoding natural language input with data structures. Furthermore, we give our hypotheses about the success of CodeAttack: the misaligned bias acquired by LLMs during code training, prioritizing code completion over avoiding the potential safety risk. Finally, we analyze potential mitigation measures. These findings highlight new safety risks in the code domain and the need for more robust safety alignment algorithms to match the code capabilities of LLMs.

Submitted 9 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: ACL Findings 2024, Code is available at https://github.com/renqibing/CodeAttack

arXiv:2403.04467 [pdf]

doi 10.3390/mi14071439

A Magnetic Millirobot Walks on Slippery Biological Surfaces for Targeted Cargo Delivery

Authors: Moonkwang Jeong, Xiangzhou Tan, Felix Fischer, Tian Qiu

Abstract: Small-scale robots hold great potential for targeted cargo delivery in minimally-invasive medicine. However, current robots often face challenges to locomote efficiently on slippery biological tissue surfaces, especially when loaded with heavy cargos. Here, we report a magnetic millirobot that can walk on rough and slippery biological tissues by anchoring itself on the soft tissue surface alternatingly with two feet and reciprocally rotating the body to move forward. We experimentally studied the locomotion, validated it with numerical simulations and optimized the actuation parameters to fit various terrains and loading conditions. Furthermore, we developed a permanent magnet set-up to enable wireless actuation within a human-scale volume which allows precise control of the millirobot to follow complex trajectories, climb vertical walls, and carry cargo up to four times its own weight. Upon reaching the target location, it performs a deployment sequence to release the liquid drug into tissues. The robust gait of our millirobot on rough biological terrains, combined with its heavy load capacity, makes it a versatile and effective miniaturized vehicle for targeted cargo delivery.

Submitted 7 March, 2024; originally announced March 2024.

Comments: 15 pages

ACM Class: J.3

arXiv:2403.03100 [pdf, other]

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Authors: Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

Abstract: While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering speech intricately encompasses various attributes (e.g., content, prosody, timbre, and acoustic details) that pose significant challenges for generation, a natural idea is to factorize speech into individual subspaces representing different attributes and generate them individually. Motivated by it, we propose NaturalSpeech 3, a TTS system with novel factorized diffusion models to generate natural speech in a zero-shot way. Specifically, 1) we design a neural codec with factorized vector quantization (FVQ) to disentangle speech waveform into subspaces of content, prosody, timbre, and acoustic details; 2) we propose a factorized diffusion model to generate attributes in each subspace following its corresponding prompt. With this factorization design, NaturalSpeech 3 can effectively and efficiently model intricate speech with disentangled subspaces in a divide-and-conquer way. Experiments show that NaturalSpeech 3 outperforms the state-of-the-art TTS systems on quality, similarity, prosody, and intelligibility, and achieves on-par quality with human recordings. Furthermore, we achieve better performance by scaling to 1B parameters and 200K hours of training data.

Submitted 23 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: Achieving human-level quality and naturalness on multi-speaker datasets (e.g., LibriSpeech) in a zero-shot way

arXiv:2403.02905 [pdf, other]

MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model

Authors: Sen Wang, Jiangning Zhang, Weijian Cao, Xiaobin Hu, Moran Li, Xiaozhong Ji, Xin Tan, Mengtian Li, Zhifeng Xie, Chengjie Wang, Lizhuang Ma

Abstract: The body movements accompanying speech aid speakers in expressing their ideas. Co-speech motion generation is one of the important approaches for synthesizing realistic avatars. Due to the intricate correspondence between speech and motion, generating realistic and diverse motion is a challenging task. In this paper, we propose MMoFusion, a Multi-modal co-speech Motion generation framework based on the diffusion model to ensure both the authenticity and diversity of generated motion. We propose a progressive fusion strategy to enhance inter-modal and intra-modal interaction, efficiently integrating multi-modal information. Specifically, we employ a masked style matrix based on emotion and identity information to control the generation of different motion styles. Temporal modeling of speech and motion is partitioned into style-guided specific feature encoding and shared feature encoding, aiming to learn both inter-modal and intra-modal features. Besides, we propose a geometric loss to enforce the joints' velocity and acceleration coherence among frames. Our framework generates vivid, diverse, and style-controllable motion of arbitrary length through inputting speech and editing identity and emotion. Extensive experiments demonstrate that our method outperforms current co-speech motion generation methods, including on upper-body and challenging full-body motion.

Submitted 17 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.02260 [pdf, other]

Latitude-dependent Atmospheric Waves and Long-period Modulations in Luhman 16 B from the Longest Lightcurve of an Extrasolar World

Authors: Nguyen Fuda, Dániel Apai, Domenico Nardiello, Xianyu Tan, Theodora Karalidi, Luigi Rolly Bedin

Abstract: In this work, we present the longest photometric monitoring of up to 1200 hours of the strongly variable brown-dwarf binaries Luhman 16 AB and provide evidence of $\pm$5% variability on a timescale of several-to-hundreds of hours for this object. We show that short-period rotational modulation around 5 hours (k = 1 wavenumber) and 2.5 hours (k = 2 wavenumber) dominate the variability under 10 hours, where the planetary-scale waves model composed of k = 1 and k = 2 waves provides good fits to both the periodogram and light curve. In particular, models consisting of three to four sine waves could explain the variability of light curve durations up to 100 hours. We show that the relative range of k = 2 periods is narrower than that of k = 1 periods. Using simple models of zonal banding in Solar System giants, we suggest that the difference in period range arises from the difference in windspeed distribution at low and mid-to-high latitudes in the atmosphere. Lastly, we show that Luhman 16 AB also exhibits long-period $\pm$5% variability with periods ranging from 15 hours up to 100 hours over the longest monitoring of this object. Our results on k = 1 and k = 2 waves and long-period evolution are consistent with previous 3D atmosphere simulations, demonstrating that both latitude-dependent waves and slow-varying atmospheric features are potentially present in Luhman 16 AB atmospheres and are significant contributions to the light curve modulation over hundreds of rotations.

Submitted 4 March, 2024; originally announced March 2024.

Comments: 27 pages, 20 figures. Accepted for publication in ApJ (February 21, 2024)

arXiv:2403.00758 [pdf, other]

Mitigating Reversal Curse in Large Language Models via Semantic-aware Permutation Training

Authors: Qingyan Guo, Rui Wang, Junliang Guo, Xu Tan, Jiang Bian, Yujiu Yang

Abstract: While large language models (LLMs) have achieved impressive performance across diverse tasks, recent studies showcase that causal LLMs suffer from the "reversal curse". A typical example is that the model knows "A's father is B", but is unable to reason "B's child is A". This limitation poses a challenge to the advancement of artificial general intelligence (AGI), as it suggests a gap in the models' ability to comprehend and apply bidirectional reasoning. In this paper, we first conduct substantial evaluation and identify that the root cause of the reversal curse lies in the different word order between the training and inference stage, namely, the poor ability of causal language models to predict antecedent words within the training data. Accordingly, permutation on the training data is considered as a potential solution, since this can make the model predict antecedent words or tokens. However, previous permutation methods may disrupt complete phrases or entities, thereby posing challenges for the model to comprehend and learn from training data. To address this issue, we propose Semantic-aware Permutation Training (SPT), which segments the training sentences into semantic units (i.e., entities or phrases) with an assistant language model and permutes these units before feeding them into the model. Extensive experiments demonstrate that SPT effectively mitigates the reversal curse since the performance on reversed questions approximates that on the forward ones, and significantly advances the performance of existing works.

Submitted 20 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.19155 [pdf, other]

Beyond Language Models: Byte Models are Digital World Simulators

Authors: Shangda Wu, Xu Tan, Zili Wang, Rui Wang, Xiaobing Li, Maosong Sun

Abstract: Traditional deep learning often overlooks bytes, the basic units of the digital world, where all forms of information and operations are encoded and manipulated in binary format. Inspired by the success of next token prediction in natural language processing, we introduce bGPT, a model with next byte prediction to simulate the digital world. bGPT matches specialized models in performance across various modalities, including text, audio, and images, and offers new possibilities for predicting, simulating, and diagnosing algorithm or hardware behaviour. It has almost flawlessly replicated the process of converting symbolic music data, achieving a low error rate of 0.0011 bits per byte in converting ABC notation to MIDI format. In addition, bGPT demonstrates exceptional capabilities in simulating CPU behaviour, with an accuracy exceeding 99.99% in executing various operations. Leveraging next byte prediction, models like bGPT can directly learn from vast binary data, effectively simulating the intricate patterns of the digital world.

Submitted 29 February, 2024; originally announced February 2024.

Comments: 19 pages, 5 figures, 5 tables

arXiv:2402.14312 [pdf, other]

doi 10.61977/ati2024008

The Jiao Tong University Spectroscopic Telescope Project

Authors: JUST Team, Chengze Liu, Ying Zu, Fabo Feng, Zhaoyu Li, Yu Yu, Hua Bai, Xiangqun Cui, Bozhong Gu, Yizhou Gu, Jiaxin Han, Yonghui Hou, Zhongwen Hu, Hangxin Ji, Yipeng Jing, Wei Li, Zhaoxiang Qi, Xianyu Tan, Cairang Tian, Dehua Yang, Xiangyan Yuan, Chao Zhai, Congcong Zhang, Jun Zhang, Haotong Zhang , et al. (6 additional authors not shown)

Abstract: The Jiao Tong University Spectroscopic Telescope (JUST) is a 4.4-meter f/6.0 segmented-mirror telescope dedicated to spectroscopic observations. The JUST primary mirror is composed of 18 hexagonal segments, each with a diameter of 1.1 m. JUST provides two Nasmyth platforms for placing science instruments. One Nasmyth focus fits a field of view of 10 arcmin and the other has an extended field of view of 1.2 deg with correction optics. A tertiary mirror is used to switch between the two Nasmyth foci. JUST will be installed at a site at Lenghu in Qinghai Province, China, and will conduct spectroscopic observations with three types of instruments to explore the dark universe, trace the dynamic universe, and search for exoplanets: (1) a multi-fiber (2000 fibers) medium-resolution spectrometer (R=4000-5000) to spectroscopically map galaxies and large-scale structure; (2) an integral field unit (IFU) array of 500 optical fibers and/or a long-slit spectrograph dedicated to fast follow-ups of transient sources for multimessenger astronomy; (3) a high-resolution spectrometer (R~100000) designed to identify Jupiter analogs and Earth-like planets, with the capability to characterize the atmospheres of hot exoplanets.

Submitted 29 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: 28 pages, 6 figures

arXiv:2402.10739 [pdf, other]

PointMamba: A Simple State Space Model for Point Cloud Analysis

Authors: Dingkang Liang, Xin Zhou, Wei Xu, Xingkui Zhu, Zhikang Zou, Xiaoqing Ye, Xiao Tan, Xiang Bai

Abstract: Transformers have become one of the foundational architectures in point cloud analysis tasks due to their excellent global modeling ability. However, the attention mechanism has quadratic complexity, making the design of a linear complexity method with global modeling appealing. In this paper, we propose PointMamba, transferring the success of Mamba, a recent representative state space model (SSM), from NLP to point cloud analysis tasks. Unlike traditional Transformers, PointMamba employs a linear complexity algorithm, presenting global modeling capacity while significantly reducing computational costs. Specifically, our method leverages space-filling curves for effective point tokenization and adopts an extremely simple, non-hierarchical Mamba encoder as the backbone. Comprehensive evaluations demonstrate that PointMamba achieves superior performance across multiple datasets while significantly reducing GPU memory usage and FLOPs. This work underscores the potential of SSMs in 3D vision-related tasks and presents a simple yet effective Mamba-based baseline for future research. The code is available at https://github.com/LMD0311/PointMamba.

Submitted 29 May, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

Comments: Update the architecture and performance. The code is available at https://github.com/LMD0311/PointMamba

arXiv:2402.06632 [pdf, other]

Microgram $\mathrm{BaCl}_2$ Ablation Targets for Trapped Ion Experiments

Authors: Noah Greenberg, Akbar Jahangiri Jozani, Collin J. C. Epstein, Xinghe Tan, Rajibul Islam, Crystal Senko

Abstract: Trapped ions for quantum information processing have been an area of intense study due to the extraordinarily high fidelity operations that have been reported experimentally. Specifically, barium trapped ions have been shown to have exceptional state-preparation and measurement (SPAM) fidelities. The $^{133}\mathrm{Ba}^+$ ($I = 1/2$) isotope in particular is a promising candidate for large-scale quantum computing experiments. However, a major pitfall with this isotope is that it is radioactive and is thus generally used in microgram quantities to satisfy safety regulations. We describe a new method for creating microgram barium chloride ($\mathrm{BaCl}_2$) ablation targets for use in trapped ion experiments and compare our procedure to previous methods. We outline two recipes for fabrication of ablation targets that increase the production of neutral atoms for isotope-selective loading of barium ions. We show that heat-treatment of the ablation targets greatly increases the consistency with which neutral atoms can be produced and we characterize the uniformity of these targets using trap-independent techniques such as energy dispersive x-ray spectroscopy (EDS) and neutral fluorescence collection. Our comparison between fabrication techniques and demonstration of consistent neutral fluorescence paves a path towards reliable loading of $^{133}\mathrm{Ba}^+$ in surface traps and opens opportunities for scalable quantum computing with this isotope.

Submitted 16 January, 2024; originally announced February 2024.

arXiv:2402.05239 [pdf, other]

Efficient approximate unitary designs from random Pauli rotations

Authors: Jeongwan Haah, Yunchao Liu, Xinyu Tan

Abstract: We construct random walks on simple Lie groups that quickly converge to the Haar measure for all moments up to order $t$. Specifically, a step of the walk on the unitary or orthogonal group of dimension $2^{\mathsf n}$ is a random Pauli rotation $e^{\mathrm i θP /2}$. The spectral gap of this random walk is shown to be $Ω(1/t)$, which coincides with the best previously known bound for a random walk on the permutation group on $\{0,1\}^{\mathsf n}$. This implies that the walk gives an $\varepsilon$-approximate unitary $t$-design in depth $O(\mathsf n t^2 + t \log 1/\varepsilon)d$ where $d=O(\log \mathsf n)$ is the circuit depth to implement $e^{\mathrm i θP /2}$. Our simple proof uses quadratic Casimir operators of Lie algebras.

Submitted 7 February, 2024; originally announced February 2024.

Comments: 21 pages, 1 figure

arXiv:2402.03829 [pdf, ps, other]

Precise Measurement of Born Cross Sections for $e^+e^-\to D\bar{D}$ and Observation of One Structure between $\sqrt{s} = 3.80-4.95$ GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (604 additional authors not shown)

Abstract: Using data samples collected with the BESIII detector at the BEPCII collider at center-of-mass energies ranging from 3.80 to 4.95 GeV, corresponding to an integrated luminosity of 20 fb$^{-1}$, a measurement of Born cross sections for the $e^+e^-\to D^{0}\bar{D}^{0}$ and $D^{+}D^{-}$ processes is presented with unprecedented precision. By performing a simultaneous fit to the dressed cross sections for both processes, one possible new structure around 3.9 GeV/$c^2$ is observed for the first time, in addition to seven known resonances $ψ(3770)$, $ψ(4040)$, $ψ(4160)$, $Y(4230)$, $Y(4360)$, $ψ(4415)$, and $Y(4660)$. These results offer crucial experimental insights into the nature of hadron production in the open charm region.

Submitted 6 February, 2024; originally announced February 2024.

Comments: 9 pages, 4 figures, 1 tables, 1 Supplemental_Material

arXiv:2402.03636 [pdf, other]

Online Informative Sampling using Semantic Features in Underwater Environments

Authors: Shrutika Vishal Thengane, Yu Xiang Tan, Marcel Bartholomeus Prasetyo, Malika Meghjani

Abstract: The underwater world remains largely unexplored, with Autonomous Underwater Vehicles (AUVs) playing a crucial role in sub-sea explorations. However, continuous monitoring of underwater environments using AUVs can generate a significant amount of data. In addition, sending a live data feed from an underwater environment requires dedicated on-board data storage options for AUVs, which can hinder requirements of other higher priority tasks. Informative sampling techniques offer a solution by condensing observations. In this paper, we present a semantically-aware online informative sampling (ON-IS) approach which samples an AUV's visual experience in real-time. Specifically, we obtain visual features from a fine-tuned object detection model to align the sampling outcomes with the desired semantic information. Our contributions are (a) a novel Semantic Online Informative Sampling (SON-IS) algorithm, (b) a user study to validate the proposed approach and (c) a novel evaluation metric to score our proposed algorithm with respect to the samples suggested by human subjects.

Submitted 5 February, 2024; originally announced February 2024.

Comments: In proceeding of IEEE/MTS OCEANS, 2024

arXiv:2402.01767 [pdf, other]

HiQA: A Hierarchical Contextual Augmentation RAG for Massive Documents QA

Authors: Xinyue Chen, Pengyu Gao, Jiangjiang Song, Xiaoyang Tan

Abstract: As language model agents leveraging external tools rapidly evolve, significant progress has been made in question-answering (QA) methodologies utilizing supplementary documents and the Retrieval-Augmented Generation (RAG) approach. This advancement has improved the response quality of language models and alleviated the appearance of hallucination. However, these methods exhibit limited retrieval accuracy when faced with massive indistinguishable documents, presenting notable challenges in their practical application. In response to these emerging challenges, we present HiQA, an advanced framework for multi-document question-answering (MDQA) that integrates cascading metadata into content as well as a multi-route retrieval mechanism. We also release a benchmark called MasQA for evaluation and research in MDQA. Finally, HiQA demonstrates state-of-the-art performance in multi-document environments.

Submitted 31 January, 2024; originally announced February 2024.

arXiv:2401.17464 [pdf, other]

Efficient Tool Use with Chain-of-Abstraction Reasoning

Authors: Silin Gao, Jane Dwivedi-Yu, Ping Yu, Xiaoqing Ellen Tan, Ramakanth Pasunuru, Olga Golovneva, Koustuv Sinha, Asli Celikyilmaz, Antoine Bosselut, Tianlu Wang

Abstract: To achieve faithful reasoning that aligns with human expectations, large language models (LLMs) need to ground their reasoning to real-world knowledge (e.g., web facts, math and physical rules). Tools help LLMs access this external knowledge, but there remain challenges for fine-tuning LLM agents (e.g., Toolformer) to invoke tools in multi-step reasoning problems, where inter-connected tool calls require holistic and efficient tool usage planning. In this work, we propose a new method for LLMs to better leverage tools in multi-step reasoning. Our method, Chain-of-Abstraction (CoA), trains LLMs to first decode reasoning chains with abstract placeholders, and then call domain tools to reify each reasoning chain by filling in specific knowledge. This planning with abstract chains enables LLMs to learn more general reasoning strategies, which are robust to shifts of domain knowledge (e.g., math results) relevant to different reasoning questions. It also allows LLMs to perform decoding and calling of external tools in parallel, which avoids the inference delay caused by waiting for tool responses. In mathematical reasoning and Wiki QA domains, we show that our method consistently outperforms previous chain-of-thought and tool-augmented baselines on both in-distribution and out-of-distribution test sets, with an average ~6% absolute QA accuracy improvement. LLM agents trained with our method also show more efficient tool use, with inference speed being on average ~1.4x faster than baseline tool-augmented LLMs.

Submitted 26 February, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

arXiv:2401.15289 [pdf, other]

SoK: Where's the "up"?! A Comprehensive (bottom-up) Study on the Security of Arm Cortex-M Systems

Authors: Xi Tan, Zheyuan Ma, Sandro Pinto, Le Guan, Ning Zhang, Jun Xu, Zhiqiang Lin, Hongxin Hu, Ziming Zhao

Abstract: Arm Cortex-M processors are the most widely used 32-bit microcontrollers among embedded and Internet-of-Things devices. Despite the widespread usage, there has been little effort in summarizing their hardware security features, characterizing the limitations and vulnerabilities of their hardware and software stack, and systematizing the research on securing these systems. The goals and contributions of this paper are multi-fold. First, we analyze the hardware security limitations and issues of Cortex-M systems. Second, we conducted a deep study of the software stack designed for Cortex-M and revealed its limitations, which is accompanied by an empirical analysis of 1,797 real-world firmware. Third, we categorize the reported bugs in Cortex-M software systems. Finally, we systematize the efforts that aim at securing Cortex-M systems and evaluate them in terms of the protections they offer, runtime performance, required hardware features, etc. Based on the insights, we develop a set of recommendations for the research community and MCU software developers.

Submitted 13 May, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

Comments: To Appear in the 18th USENIX WOOT Conference on Offensive Technologies, August 12-13, 2024

ACM Class: C.0; K.6.5

arXiv:2401.14720 [pdf, ps, other]

Observation of structures in the processes $e^+e^-\rightarrowωχ_{c1}$ and $ωχ_{c2}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (608 additional authors not shown)

Abstract: We present measurements of the Born cross sections for the processes $e^+e^-\rightarrowωχ_{c1}$ and $ωχ_{c2}$ at center-of-mass energies $\sqrt{s}$ from 4.308 to 4.951 GeV. The measurements are performed with data samples corresponding to an integrated luminosity of 11.0 $\rm{fb}^{-1}$ collected with the BESIII detector operating at the BEPCII storage ring. Assuming the $e^+e^-\rightarrowωχ_{c2}$ signals come from a single resonance, the mass and width are determined to be $M=(4413.6\pm9.0\pm0.8)$ MeV/$c^2$ and $Γ=(110.5\pm15.0\pm2.9)$ MeV, respectively, which is consistent with the parameters of the well-established resonance $ψ(4415)$. In addition, we also use one single resonance to describe the $e^+e^-\rightarrowωχ_{c1}$ lineshape, and determine the mass and width to be $M=(4544.2\pm18.7\pm1.7)$ MeV/$c^2$ and $Γ=(116.1\pm33.5\pm1.7)$ MeV, respectively. The structure of this lineshape, observed for the first time, requires further understanding.

Submitted 24 March, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

Comments: 11 pages, 8 figures, with Supplemental Material

arXiv:2401.14711 [pdf, other]

Study of $e^{+}e^{-}\rightarrowπ^{+}π^{-}π^{0}$ at $\sqrt{s}$ from 2.00 to 3.08 GeV at BESIII

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (608 additional authors not shown)

Abstract: With the data samples taken at center-of-mass energies from 2.00 to 3.08 GeV with the BESIII detector at the BEPCII collider, a partial wave analysis on the $e^{+}e^{-}\rightarrowπ^{+}π^{-}π^{0}$ process is performed. The Born cross sections for $e^{+}e^{-}\rightarrowπ^{+}π^{-}π^{0}$ and its intermediate processes $e^{+}e^{-}\rightarrowρπ$ and $ρ(1450)π$ are measured as functions of $\sqrt{s}$. The results for $e^{+}e^{-}\rightarrowπ^{+}π^{-}π^{0}$ are consistent with previous results measured with the initial state radiation method within one standard deviation, and improve the uncertainty by a factor of ten. By fitting the line shapes of the Born cross sections for the $e^{+}e^{-}\rightarrowρπ$ and $ρ(1450)π$, a structure with mass $M = 2119\pm11\pm15\ {\rm MeV}/c^2$ and width $Γ=69\pm30\pm5 {\rm MeV}$ is observed with a significance of $5.9σ$, where the first uncertainties are statistical and the second ones are systematic. This structure can be interpreted as an excited $ω$ state.

Submitted 26 January, 2024; originally announced January 2024.

arXiv:2401.13027 [pdf, ps, other]

Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b

Authors: Taylor J. Bell, Nicolas Crouzet, Patricio E. Cubillos, Laura Kreidberg, Anjali A. A. Piette, Michael T. Roman, Joanna K. Barstow, Jasmina Blecic, Ludmila Carone, Louis-Philippe Coulombe, Elsa Ducrot, Mark Hammond, João M. Mendonça, Julianne I. Moses, Vivien Parmentier, Kevin B. Stevenson, Lucas Teinturier, Michael Zhang, Natalie M. Batalha, Jacob L. Bean, Björn Benneke, Benjamin Charnay, Katy L. Chubb, Brice-Olivier Demory, Peter Gao , et al. (58 additional authors not shown)

Abstract: Hot Jupiters are among the best-studied exoplanets, but it is still poorly understood how their chemical composition and cloud properties vary with longitude. Theoretical models predict that clouds may condense on the nightside and that molecular abundances can be driven out of equilibrium by zonal winds. Here we report a phase-resolved emission spectrum of the hot Jupiter WASP-43b measured from 5-12 $μ$m with JWST's Mid-Infrared Instrument (MIRI). The spectra reveal a large day-night temperature contrast (with average brightness temperatures of 1524$\pm$35 and 863$\pm$23 Kelvin, respectively) and evidence for water absorption at all orbital phases. Comparisons with three-dimensional atmospheric models show that both the phase curve shape and emission spectra strongly suggest the presence of nightside clouds which become optically thick to thermal emission at pressures greater than ~100 mbar. The dayside is consistent with a cloudless atmosphere above the mid-infrared photosphere. Contrary to expectations from equilibrium chemistry but consistent with disequilibrium kinetics models, methane is not detected on the nightside (2$σ$ upper limit of 1-6 parts per million, depending on model assumptions).

Submitted 23 January, 2024; originally announced January 2024.

Comments: 61 pages, 13 figures, 4 tables. This preprint has been submitted to and accepted in principle for publication in Nature Astronomy without significant changes

arXiv:2401.11372 [pdf, other]

Back-stepping Experience Replay with Application to Model-free Reinforcement Learning for a Soft Snake Robot

Authors: Xinda Qi, Dong Chen, Zhaojian Li, Xiaobo Tan

Abstract: In this paper, we propose a novel technique, Back-stepping Experience Replay (BER), that is compatible with arbitrary off-policy reinforcement learning (RL) algorithms. BER aims to enhance learning efficiency in systems with approximate reversibility, reducing the need for complex reward shaping. The method constructs reversed trajectories using back-stepping transitions to reach random or fixed targets. Interpretable as a bi-directional approach, BER addresses inaccuracies in back-stepping transitions through a distillation of the replay experience during learning. Given the intricate nature of soft robots and their complex interactions with environments, we present an application of BER in a model-free RL approach for the locomotion and navigation of a soft snake robot, which is capable of serpentine motion enabled by anisotropic friction between the body and ground. In addition, a dynamic simulator is developed to assess the effectiveness and efficiency of the BER algorithm, in which the robot demonstrates successful learning (reaching a 100% success rate) and adeptly reaches random targets, achieving an average speed 48% faster than that of the best baseline approach.

Submitted 20 January, 2024; originally announced January 2024.

Comments: Submitted to the IEEE for possible publication

arXiv:2401.09146 [pdf, other]

Continuous Piecewise-Affine Based Motion Model for Image Animation

Authors: Hexiang Wang, Fengqi Liu, Qianyu Zhou, Ran Yi, Xin Tan, Lizhuang Ma

Abstract: Image animation aims to bring static images to life according to driving videos and create engaging visual content that can be used for various purposes such as animation, entertainment, and education. Recent unsupervised methods utilize affine and thin-plate spline transformations based on keypoints to transfer the motion in driving frames to the source image. However, limited by the expressive power of the transformations used, these methods always produce poor results when the gap between the motion in the driving frame and the source image is large. To address this issue, we propose to model motion from the source image to the driving frame in highly-expressive diffeomorphism spaces. Firstly, we introduce Continuous Piecewise-Affine based (CPAB) transformation to model the motion and present a well-designed inference algorithm to generate CPAB transformation from control keypoints. Secondly, we propose a SAM-guided keypoint semantic loss to further constrain the keypoint extraction process and improve the semantic consistency between the corresponding keypoints on the source and driving images. Finally, we design a structure alignment loss to align the structure-related features extracted from driving and generated images, thus helping the generator generate results that are more consistent with the driving action. Extensive experiments on four datasets demonstrate the effectiveness of our method against state-of-the-art competitors quantitatively and qualitatively. Code will be publicly available at: https://github.com/DevilPG/AAAI2024-CPABMM.

Submitted 17 January, 2024; originally announced January 2024.

arXiv:2401.06377 [pdf, other]

Design and Nonlinear Modeling of a Modular Cable Driven Soft Robotic Arm

Authors: Xinda Qi, Yu Mei, Dong Chen, Zhaojian Li, Xiaobo Tan

Abstract: We propose a novel multi-section cable-driven soft robotic arm inspired by octopus tentacles along with a new modeling approach. Each section of the modular manipulator is made of a soft tubing backbone, a soft silicon arm body, and two rigid endcaps, which connect adjacent sections and decouple the actuation cables of different sections. The soft robotic arm is made with casting after the rigid endcaps are 3D-printed, achieving low-cost and convenient fabrication. To capture the nonlinear effect of cables pushing into the soft silicon arm body, which results from the absence of intermediate rigid cable guides for higher compliance, an analytical static model is developed to capture the relationship between the bending curvature and the cable lengths. The proposed model shows superior prediction performance in experiments over that of a baseline model, especially under large bending conditions. Based on the nonlinear static model, a kinematic model of a multi-section arm is further developed and used to derive a motion planning algorithm. Experiments show that the proposed soft arm has high flexibility and a large workspace, and the tracking errors under the algorithm based on the proposed modeling approach are up to 52$\%$ smaller than those with the algorithm derived from the baseline model. The presented modeling approach is expected to be applicable to a broad range of soft cable-driven actuators and manipulators.

Submitted 15 May, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

Comments: The paper has been accepted by IEEE Transactions on Mechatronics

arXiv:2401.06201 [pdf, other]

EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction

Authors: Siyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan, Yongliang Shen, Ren Kan, Dongsheng Li, Deqing Yang

Abstract: To address intricate real-world tasks, there has been a rising interest in tool utilization in applications of large language models (LLMs). Developing LLM-based agents usually requires LLMs to understand many tool functions from different tool documentation. But this documentation can be diverse, redundant or incomplete, which immensely affects the capability of LLMs in using tools. To solve this, we introduce EASYTOOL, a framework transforming diverse and lengthy tool documentation into a unified and concise tool instruction for easier tool usage. EasyTool purifies essential information from extensive tool documentation of different sources, and elaborates a unified interface (i.e., tool instruction) to offer standardized tool descriptions and functionalities for LLM-based agents. Extensive experiments on multiple different tasks demonstrate that EasyTool can significantly reduce token consumption and improve the performance of tool utilization in real-world scenarios. Our code will be available at https://github.com/microsoft/JARVIS/ in the future.

Submitted 27 March, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

arXiv:2401.06199 [pdf, other]

xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein

Authors: Bo Chen, Xingyi Cheng, Pan Li, Yangli-ao Geng, Jing Gong, Shen Li, Zhilei Bei, Xu Tan, Boyan Wang, Xin Zeng, Chiming Liu, Aohan Zeng, Yuxiao Dong, Jie Tang, Le Song

Abstract: Protein language models have shown remarkable success in learning biological information from protein sequences. However, most existing models are limited by either autoencoding or autoregressive pre-training objectives, which makes them struggle to handle protein understanding and generation tasks concurrently. We propose a unified protein language model, xTrimoPGLM, to address these two types of tasks simultaneously through an innovative pre-training framework. Our key technical contribution is an exploration of the compatibility and the potential for joint optimization of the two types of objectives, which has led to a strategy for training xTrimoPGLM at an unprecedented scale of 100 billion parameters and 1 trillion training tokens. Our extensive experiments reveal that 1) xTrimoPGLM significantly outperforms other advanced baselines in 18 protein understanding benchmarks across four categories. The model also facilitates an atomic-resolution view of protein structures, leading to an advanced 3D structural prediction model that surpasses existing language model-based tools. 2) xTrimoPGLM not only can generate de novo protein sequences following the principles of natural ones, but also can perform programmable generation after supervised fine-tuning (SFT) on curated sequences. These results highlight the substantial capability and versatility of xTrimoPGLM in understanding and generating protein sequences, contributing to the evolving landscape of foundation models in protein science.

Submitted 11 January, 2024; originally announced January 2024.

arXiv:2401.03859 [pdf, other]

Modeling the day-night temperature variations of ultra-hot Jupiters: confronting non-grey general circulation models and observations

Authors: Xianyu Tan, Thaddeus D. Komacek, Natasha E. Batalha, Drake Deming, Roxana Lupu, Vivien Parmentier, Raymond T. Pierrehumbert

Abstract: Ultra-hot Jupiters (UHJs) are natural laboratories to study extreme physics in planetary atmospheres and their rich observational data sets are yet to be confronted with models with varying complexities at a population level. In this work, we update the general circulation model of Tan & Komacek (2019) to include a non-grey radiative transfer scheme and apply it to simulate the realistic thermal structures, phase-dependent spectra, and wavelength-dependent phase curves of UHJs. We performed grids of models over a large range of equilibrium temperatures and rotation periods for varying assumptions, showing that the fractional day-night brightness temperature differences remain almost constant or slightly increase with increasing equilibrium temperature from the visible to mid-infrared wavelengths. This differs from previous work primarily due to the increasing planetary rotation rate with increasing equilibrium temperature for fixed host star type. Radiative effects of varying atmospheric compositions become more significant in dayside brightness temperature in longer wavelengths. Data-model comparisons of dayside brightness temperatures and phase curve amplitudes as a function of equilibrium temperature are in broad agreement. Observations show a large scatter compared to models even with a range of different assumptions, indicating significantly varying intrinsic properties in the hot Jupiter population. Our cloud-free models generally struggle to match all observations for individual targets with a single set of parameter choices, indicating the need for extra processes for understanding the heat transport of UHJs.

Submitted 8 January, 2024; originally announced January 2024.

Comments: Accepted to MNRAS; data underlying this article is available in Zenodo https://doi.org/10.5281/zenodo.10121933

Showing 51–100 of 897 results for author: Tan, X