-
Detailed Mapping of the Galactic Disk Structure in the Solar Neighborhood through LAMOST K Dwarfs
Authors:
Xi-Can Tang,
Hao Tian,
Jing Li,
Bing-qiu Chen,
Yi-Rong Chen,
Chao Liu,
Dan Qiu
Abstract:
The Galactic disk is one of the main components of the Milky Way and contributes most of its luminosity. Its structure is essential for understanding the formation and evolution of the Milky Way. Using 174,443 K-type dwarf stars observed by both LAMOST and Gaia DR3, we study the disk density profile in the local volume within 1,200 pc. In the azimuthal dimension, we find a strong asymmetric signal in the thin disk. The surface density and the scale height of the southern disk change significantly with azimuthal angle at the same galactocentric distance $R$. Meanwhile, in the vertical dimension, the scale height of the northern disk follows a quite different trend from that of the southern one. The scale height of the southern disk shows a decreasing trend at $\phi\sim-2.5^\circ$ and changes to an increasing one at $\phi\sim5.0^\circ$. Meanwhile, the scale height of the northern disk has a consistently smaller increase. Finally, we divide the entire sample into three subsamples based on metallicity, and all three subsamples show significant non-axisymmetric and north-south asymmetric signals in the Galactic disk. Furthermore, we find that the scale height of the metal-poor ([Fe/H] $<$ -0.4 dex) subsample in the northern disk is greater than that of the metal-rich ([Fe/H] $>$ -0.1 dex) subsample. However, in the southern disk, the scale height exhibits varying relationships across different metallicity slices.
Submitted 12 July, 2024;
originally announced July 2024.
-
Accurate Prior-centric Monocular Positioning with Offline LiDAR Fusion
Authors:
Jinhao He,
Huaiyang Huang,
Shuyang Zhang,
Jianhao Jiao,
Chengju Liu,
Ming Liu
Abstract:
Unmanned vehicles usually rely on Global Positioning System (GPS) and Light Detection and Ranging (LiDAR) sensors to achieve high-precision localization results for navigation purposes. However, this combination, with its associated costs and infrastructure demands, poses challenges for widespread adoption in mass-market applications. In this paper, we aim to use only a monocular camera to achieve comparable onboard localization performance by tracking deep-learning visual features on a LiDAR-enhanced visual prior map. Experiments show that the proposed algorithm can provide centimeter-level global positioning results with scale, and that it is easily integrated and well suited to low-cost robot system deployment in real-world applications.
Submitted 12 July, 2024;
originally announced July 2024.
-
Crossed real nodal-line phonons in gold monobromide
Authors:
Yilin Han,
Yichen Liu,
Chaoxi Cui,
Cheng-Cheng Liu,
Zhi-Ming Yu
Abstract:
Spacetime inversion symmetry can generate intriguing types of spinless excitations in crystalline materials. Here, we propose a topological phase protected by spacetime inversion symmetry - the crossed real nodal line (RNL) in the phonon spectrum of gold monobromide (AuBr). In AuBr, there exist four straight nodal lines, which are linked by a crossed nodal line formed by two lower bands. Remarkably, each two adjacent lines among the four straight nodal lines form a pair, yielding a crossed RNL with a nontrivial real Chern number. Such a configuration and pairing mode of RNLs has never been reported. The crossed RNL exhibits unique surface and hinge states distinct from those of conventional RNLs. The symmetry protection of the crossed RNL and its transformation under symmetry-preserving strain are also investigated. Our results open the door to a new class of topological states and predict its realization in an experimentally synthesized material.
Submitted 12 July, 2024;
originally announced July 2024.
-
Tissue-Contrastive Semi-Masked Autoencoders for Segmentation Pretraining on Chest CT
Authors:
Jie Zheng,
Ru Wen,
Haiqin Hu,
Lina Wei,
Kui Su,
Wei Chen,
Chen Liu,
Jun Wang
Abstract:
Existing Masked Image Modeling (MIM) depends on a spatial patch-based masking-reconstruction strategy to perceive objects' features from unlabeled images, which may face two limitations when applied to chest CT: 1) inefficient feature learning due to complex anatomical details presented in CT images, and 2) suboptimal knowledge transfer owing to input disparity between upstream and downstream models. To address these issues, we propose a new MIM method named Tissue-Contrastive Semi-Masked Autoencoder (TCS-MAE) for modeling chest CT images. Our method has two novel designs: 1) a tissue-based masking-reconstruction strategy to capture more fine-grained anatomical features, and 2) a dual-AE architecture with contrastive learning between the masked and original image views to bridge the gap between the upstream and downstream models. To validate our method, we systematically investigate representative contrastive, generative, and hybrid self-supervised learning methods on top of tasks involving segmenting pneumonia, mediastinal tumors, and various organs. The results demonstrate that, compared to existing methods, our TCS-MAE more effectively learns tissue-aware representations, thereby significantly enhancing segmentation performance across all tasks.
Submitted 11 July, 2024;
originally announced July 2024.
-
Unifying 3D Representation and Control of Diverse Robots with a Single Camera
Authors:
Sizhe Lester Li,
Annan Zhang,
Boyuan Chen,
Hanna Matusik,
Chao Liu,
Daniela Rus,
Vincent Sitzmann
Abstract:
Mirroring the complex structures and diverse functions of natural organisms is a long-standing challenge in robotics. Modern fabrication techniques have dramatically expanded feasible hardware, yet deploying these systems requires control software to translate desired motions into actuator commands. While conventional robots can easily be modeled as rigid links connected via joints, it remains an open challenge to model and control bio-inspired robots that are often multi-material or soft, lack sensing capabilities, and may change their material properties with use. Here, we introduce Neural Jacobian Fields, an architecture that autonomously learns to model and control robots from vision alone. Our approach makes no assumptions about the robot's materials, actuation, or sensing, requires only a single camera for control, and learns to control the robot without expert intervention by observing the execution of random commands. We demonstrate our method on a diverse set of robot manipulators, varying in actuation, materials, fabrication, and cost. Our approach achieves accurate closed-loop control and recovers the causal dynamic structure of each robot. By enabling robot control with a generic camera as the only sensor, we anticipate our work will dramatically broaden the design space of robotic systems and serve as a starting point for lowering the barrier to robotic automation.
Submitted 11 July, 2024;
originally announced July 2024.
-
How to beat a Bayesian adversary
Authors:
Zihan Ding,
Kexin Jin,
Jonas Latz,
Chenguang Liu
Abstract:
Deep neural networks and other modern machine learning models are often susceptible to adversarial attacks. Indeed, an adversary may often be able to change a model's prediction through a small, directed perturbation of the model's input - an issue in safety-critical applications. Adversarially robust machine learning is usually based on a minmax optimisation problem that minimises the machine learning loss under maximisation-based adversarial attacks.
In this work, we study adversaries that determine their attack using a Bayesian statistical approach rather than maximisation. The resulting Bayesian adversarial robustness problem is a relaxation of the usual minmax problem. To solve this problem, we propose Abram - a continuous-time particle system that shall approximate the gradient flow corresponding to the underlying learning problem. We show that Abram approximates a McKean-Vlasov process and justify the use of Abram by giving assumptions under which the McKean-Vlasov process finds the minimiser of the Bayesian adversarial robustness problem. We discuss two ways to discretise Abram and show its suitability in benchmark adversarial deep learning experiments.
Submitted 11 July, 2024;
originally announced July 2024.
-
Controlling the Fidelity and Diversity of Deep Generative Models via Pseudo Density
Authors:
Shuangqi Li,
Chen Liu,
Tong Zhang,
Hieu Le,
Sabine Süsstrunk,
Mathieu Salzmann
Abstract:
We introduce an approach to bias deep generative models, such as GANs and diffusion models, towards generating data with either enhanced fidelity or increased diversity. Our approach involves manipulating the distribution of training and generated data through a novel metric for individual samples, named pseudo density, which is based on the nearest-neighbor information from real samples. Our approach offers three distinct techniques to adjust the fidelity and diversity of deep generative models: 1) Per-sample perturbation, enabling precise adjustments for individual samples towards either more common or more unique characteristics; 2) Importance sampling during model inference to enhance either fidelity or diversity in the generated data; 3) Fine-tuning with importance sampling, which guides the generative model to learn an adjusted distribution, thus controlling fidelity and diversity. Furthermore, our fine-tuning method demonstrates the ability to improve the Frechet Inception Distance (FID) for pre-trained generative models with minimal iterations.
Submitted 11 July, 2024;
originally announced July 2024.
-
Quantum-Train Long Short-Term Memory: Application on Flood Prediction Problem
Authors:
Chu-Hsuan Abraham Lin,
Chen-Yu Liu,
Kuan-Cheng Chen
Abstract:
Flood prediction is a critical challenge in the context of climate change, with significant implications for ecosystem preservation, human safety, and infrastructure protection. In this study, we tackle this problem by applying the Quantum-Train (QT) technique to a forecasting Long Short-Term Memory (LSTM) model trained by Quantum Machine Learning (QML) with significant parameter reduction. The QT technique, originally successful in the A Matter of Taste challenge at QHack 2024, leverages QML to reduce the number of trainable parameters to a polylogarithmic function of the number of parameters in a classical neural network (NN). This innovative framework maps classical NN weights to a Hilbert space, altering quantum state probability distributions to adjust NN parameters. Our approach directly processes classical data without the need for quantum embedding and operates independently of quantum computing resources post-training, making it highly practical and accessible for real-world flood prediction applications. This model aims to improve the efficiency of flood forecasts, ultimately contributing to better disaster preparedness and response.
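To give a feel for the parameter reduction described above, the following is a minimal sketch, not the authors' implementation. It assumes (this is not stated in the abstract) that the weights of a classical network with `m` parameters are read off from the basis-state probabilities of a register of about log2(m) qubits, so the register must offer at least `m` basis states. The function name `qubits_needed` is hypothetical.

```python
def qubits_needed(m: int) -> int:
    """Smallest number of qubits n such that 2**n >= m.

    Assumption for illustration: each of the m classical weights is
    derived from one basis-state probability of an n-qubit state.
    """
    if m < 1:
        raise ValueError("m must be a positive parameter count")
    # (m - 1).bit_length() equals ceil(log2(m)) for every m >= 1,
    # and avoids floating-point rounding for large m.
    return (m - 1).bit_length()


if __name__ == "__main__":
    for m in (1_000, 1_024, 1_025, 1_000_000):
        n = qubits_needed(m)
        print(f"{m} classical weights -> {n} qubits ({2 ** n} basis states)")
```

Under this assumption the register size grows only logarithmically with the size of the classical model, which is the scaling that makes a polylogarithmic count of trainable quantum parameters plausible.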
Submitted 11 July, 2024;
originally announced July 2024.
-
SelfIE: Self-Initiated Explorable Instructions Towards Enhanced User Experience
Authors:
Hyeongcheol Kim,
Katherine Fennedy,
Georgia Zhang,
Can Liu,
Shengdong Zhao
Abstract:
Given the widespread use of procedural instructions with non-linear access (situational information retrieval), there has been a proposal to accommodate both linear and non-linear usage in instructional design. However, it has received inadequate scholarly attention, leading to limited exploration. This paper introduces Self-Initiated Explorable (SelfIE) instructions, a new design concept aiming at enabling users to navigate instructions flexibly by blending linear and non-linear access according to individual needs and situations during tasks. Using a Wizard-of-Oz protocol, we initially embodied SelfIE instructions within a toy-block assembly context and compared it with baseline instructions offering linear-only access (N=21). Results show a 71% increase in user preferences due to its ease of reflecting individual differences, empirically supporting the prior proposal. Besides, our observations identify three strategies for flexible access and suggest the potential of enhancing the user experience by considering cognitive processes and implementing flexible access in a wearable configuration. Following the design phase, we translated the WoZ-based design embodiment as working prototypes on the tablet and OHMD to assess usability and compare user experience between the two configurations (N=8). Our data yields valuable insights into managing the trade-offs between the two configurations, thereby facilitating more effective flexible access development.
Submitted 10 July, 2024;
originally announced July 2024.
-
Differentially Private Neural Network Training under Hidden State Assumption
Authors:
Ding Chen,
Chen Liu
Abstract:
We present a novel approach called differentially private stochastic block coordinate descent (DP-SBCD) for training neural networks with provable guarantees of differential privacy under the hidden state assumption. Our methodology incorporates Lipschitz neural networks and decomposes the training process of the neural network into sub-problems, each corresponding to the training of a specific layer. By doing so, we extend the analysis of differential privacy under the hidden state assumption to encompass non-convex problems and algorithms employing proximal gradient descent. Furthermore, in contrast to existing methods, we adopt a novel approach by utilizing calibrated noise sampled from adaptive distributions, yielding improved empirical trade-offs between utility and privacy.
Submitted 11 July, 2024;
originally announced July 2024.
-
General Electronic Structure Calculation Method for Twisted Systems
Authors:
Junxi Yu,
Shifeng Qian,
Cheng-Cheng Liu
Abstract:
In recent years, two-dimensional twisted systems have gained increasing attention. However, the calculation of electronic structures in twisted materials has remained a challenge. To address this, we have developed a general computational methodology that can generate twisted geometries starting from a monolayer structure and obtain the precisely relaxed twisted structure through a machine learning-based method. The electronic structure properties of the twisted material are then calculated using tight-binding (TB) and continuum model methods, so the entire process requires minimal computational resources. In this paper, we first introduce the theoretical methods for generating twisted structures and computing their electronic properties. We then provide calculations and brief analyses of the electronic structure properties for several typical two-dimensional materials with different characteristics. This work serves as a solid foundation for researchers interested in studying twisted systems.
Submitted 10 July, 2024;
originally announced July 2024.
-
Multi-task Prompt Words Learning for Social Media Content Generation
Authors:
Haochen Xue,
Chong Zhang,
Chengzhi Liu,
Fangyu Wu,
Xiaobo Jin
Abstract:
The rapid development of the Internet has profoundly changed human life. Humans are increasingly expressing themselves and interacting with others on social media platforms. However, although artificial intelligence technology has been widely used in many aspects of life, its application in social media content creation is still blank. To solve this problem, we propose a new prompt word generation framework based on multi-modal information fusion, which combines multiple tasks including topic classification, sentiment analysis, scene recognition and keyword extraction to generate more comprehensive prompt words. Subsequently, we use a template containing a set of prompt words to guide ChatGPT to generate high-quality tweets. Furthermore, in the absence of effective and objective evaluation criteria in the field of content generation, we use the ChatGPT tool to evaluate the results generated by the algorithm, making large-scale evaluation of content generation algorithms possible. Evaluation results on extensive content generation demonstrate that our cue word generation framework generates higher quality content compared to manual methods and other cueing techniques, while topic classification, sentiment analysis, and scene recognition significantly enhance content clarity and its consistency with the image.
Submitted 10 July, 2024;
originally announced July 2024.
-
StoryDiffusion: How to Support UX Storyboarding With Generative-AI
Authors:
Zhaohui Liang,
Xiaoyu Zhang,
Kevin Ma,
Zhao Liu,
Xipei Ren,
Kosa Goucher-Lambert,
Can Liu
Abstract:
Storyboarding is an established method for designing user experiences. Generative AI can support this process by helping designers quickly create visual narratives. However, existing tools only focus on accurate text-to-image generation. Currently, it is not clear how to effectively support the entire creative process of storyboarding and how to develop AI-powered tools to support designers' individual workflows. In this work, we iteratively developed and implemented StoryDiffusion, a system that integrates text-to-text and text-to-image models, to support the generation of narratives and images in a single pipeline. With a user study, we observed 12 UX designers using the system for both concept ideation and illustration tasks. Our findings identified AI-directed vs. user-directed creative strategies in both tasks and revealed the importance of supporting the interchange between narrative iteration and image generation. We also found effects of the design tasks on their strategies and preferences, providing insights for future development.
Submitted 10 July, 2024;
originally announced July 2024.
-
Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$
Authors:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (645 additional authors not shown)
Abstract:
The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15\sigma$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes.
Submitted 10 July, 2024;
originally announced July 2024.
-
Fuse, Reason and Verify: Geometry Problem Solving with Parsed Clauses from Diagram
Authors:
Ming-Liang Zhang,
Zhong-Zhi Li,
Fei Yin,
Liang Lin,
Cheng-Lin Liu
Abstract:
Geometry problem solving (GPS) requires capacities of multi-modal understanding, multi-hop reasoning and theorem knowledge application. In this paper, we propose a neural-symbolic model for plane geometry problem solving (PGPS), named PGPSNet-v2, with three key steps: modal fusion, reasoning process and knowledge verification. In modal fusion, we leverage textual clauses to express fine-grained structural and semantic content of geometry diagram, and fuse diagram with textual problem efficiently through structural-semantic pre-training. For reasoning, we design an explicable solution program to describe the geometric reasoning process, and employ a self-limited decoder to generate solution program autoregressively. To reduce solution errors, a multi-level theorem verifier is proposed to eliminate solutions that do not match geometric principles, alleviating the hallucination of the neural model. We also construct a large-scale geometry problem dataset called PGPS9K, containing fine-grained annotations of textual clauses, solution program and involved knowledge tuples. Extensive experiments on datasets Geometry3K and PGPS9K show that our PGPSNet solver outperforms existing symbolic and neural solvers in GPS performance, while maintaining good explainability and reliability, and the solver components (fusion, reasoning, verification) are all justified effective.
Submitted 9 July, 2024;
originally announced July 2024.
-
Rapid Parameter Estimation for Merging Massive Black Hole Binaries Using ODE-Based Generative Models
Authors:
Bo Liang,
Minghui Du,
He Wang,
Yuxiang Xu,
Chang Liu,
Xiaotong Wei,
Peng Xu,
Li-e Qiang,
Ziren Luo
Abstract:
Detecting the coalescences of massive black hole binaries (MBHBs) is one of the primary targets for space-based gravitational wave observatories such as LISA, Taiji, and Tianqin. The fast and accurate parameter estimation of merging MBHBs is of great significance for both astrophysics and the global fitting of all resolvable sources. However, such analyses entail significant computational costs. To address these challenges, inspired by the latest progress in generative models, we proposed a novel artificial intelligence (AI) based parameter estimation method called Variance Preserving Flow Matching Posterior Estimation (VPFMPE). Specifically, we utilize triangular interpolation to maintain variance over time, thereby constructing a transport path for training continuous normalization flows. Compared to the simple linear interpolation method used in flow matching to construct the optimal transport path, our approach better captures continuous temporal variations, making it more suitable for the parameter estimation of MBHBs. Additionally, we creatively introduce a parameter transformation method based on the symmetry in the detector's response function. This transformation is integrated within VPFMPE, allowing us to train the model using a simplified dataset, and then perform parameter estimation on more general data, hence also acting as a crucial factor in improving the training speed. In conclusion, for the first time, within a comprehensive and reasonable parameter range, we have achieved a complete and unbiased 11-dimensional rapid inference for MBHBs in the presence of astrophysical confusion noise using ODE-based generative models. In the experiments based on simulated data, our model produces posterior distributions comparable to those obtained by nested sampling.
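The variance argument above can be made concrete with a short worked comparison. The abstract does not give the interpolation formula, so the trigonometric path below is an assumption chosen for illustration; it is one common variance-preserving form, with $x_0$ a noise sample and $x_1$ a standardized parameter sample, taken to be independent with unit variance.

```latex
% Assumed trigonometric (variance-preserving) path, t in [0, 1]:
x_t = \cos\!\left(\tfrac{\pi}{2} t\right) x_0 + \sin\!\left(\tfrac{\pi}{2} t\right) x_1,
\qquad
\operatorname{Var}[x_t] = \cos^2\!\left(\tfrac{\pi}{2} t\right) + \sin^2\!\left(\tfrac{\pi}{2} t\right) = 1 .

% Linear path used in standard flow matching, for comparison:
x_t = (1 - t)\, x_0 + t\, x_1,
\qquad
\operatorname{Var}[x_t] = (1 - t)^2 + t^2 ,
\quad \text{with minimum } \tfrac{1}{2} \text{ at } t = \tfrac{1}{2} .
```

Under these assumptions the trigonometric path keeps unit variance for every $t$, whereas the linear path loses half of the variance midway between the two endpoints.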
Submitted 9 July, 2024;
originally announced July 2024.
-
Better Sampling, towards Better End-to-end Small Object Detection
Authors:
Zile Huang,
Chong Zhang,
Mingyu Jin,
Fangyu Wu,
Chengzhi Liu,
Xiaobo Jin
Abstract:
While deep learning-based general object detection has made significant strides in recent years, the effectiveness and efficiency of small object detection remain unsatisfactory. This is primarily attributed not only to the limited characteristics of such small targets but also to the high density and mutual overlap among these targets. The existing transformer-based small object detectors do not leverage the gap between accuracy and inference speed. To address challenges, we propose methods enhancing sampling within an end-to-end framework. Sample Points Refinement (SPR) constrains localization and attention, preserving meaningful interactions in the region of interest and filtering out misleading information. Scale-aligned Target (ST) integrates scale information into target confidence, improving classification for small object detection. A task-decoupled Sample Reweighting (SR) mechanism guides attention toward challenging positive examples, utilizing a weight generator module to assess the difficulty and adjust classification loss based on decoder layer outcomes. Comprehensive experiments across various benchmarks reveal that our proposed detector excels in detecting small objects. Our model demonstrates a significant enhancement, achieving a 2.9\% increase in average precision (AP) over the state-of-the-art (SOTA) on the VisDrone dataset and a 1.7\% improvement on the SODA-D dataset.
Submitted 17 May, 2024;
originally announced July 2024.
-
QTRL: Toward Practical Quantum Reinforcement Learning via Quantum-Train
Authors:
Chen-Yu Liu,
Chu-Hsuan Abraham Lin,
Chao-Han Huck Yang,
Kuan-Cheng Chen,
Min-Hsiu Hsieh
Abstract:
Quantum reinforcement learning utilizes quantum layers to process information within a machine learning model. However, both pure and hybrid quantum reinforcement learning face challenges such as data encoding and the use of quantum computers during the inference stage. We apply the Quantum-Train method to reinforcement learning tasks, called QTRL, training the classical policy network model using a quantum machine learning model with polylogarithmic parameter reduction. This QTRL approach eliminates the data encoding issues of conventional quantum machine learning and reduces the training parameters of the corresponding classical policy network. Most importantly, the training result of the QTRL is a classical model, meaning the inference stage requires only a classical computer. This is extremely practical and cost-efficient for reinforcement learning tasks, where low-latency feedback from the policy model is essential.
Submitted 8 July, 2024;
originally announced July 2024.
-
Two-Path GMM-ResNet and GMM-SENet for ASV Spoofing Detection
Authors:
Zhenchun Lei,
Hui Yan,
Changhong Liu,
Minglei Ma,
Yingen Yang
Abstract:
The automatic speaker verification system is sometimes vulnerable to various spoofing attacks. The 2-class Gaussian Mixture Model classifier for genuine and spoofed speech is usually used as the baseline for spoofing detection. However, the GMM classifier does not separately consider the scores of feature frames on each Gaussian component. In addition, the GMM accumulates the scores on all frames independently, and does not consider their correlations. We propose the two-path GMM-ResNet and GMM-SENet models for spoofing detection, whose input is the Gaussian probability features based on two GMMs trained on genuine and spoofed speech respectively. The models consider not only the score distribution on GMM components, but also the relationship between adjacent frames. A two-step training scheme is applied to improve the system robustness. Experiments on ASVspoof 2019 show that the LFCC+GMM-ResNet system can relatively reduce min-tDCF and EER by 76.1% and 76.3% on the logical access scenario compared with the GMM, and the LFCC+GMM-SENet system by 94.4% and 95.4% on the physical access scenario. After score fusion, the systems give the second-best results on both scenarios.
Submitted 8 July, 2024;
originally announced July 2024.
-
Wideband Beamforming with RIS: A Unified Framework via Space-Frequency Transformation
Authors:
Xiaowei Qian,
Xiaoling Hu,
Chenxi Liu,
Mugen Peng
Abstract:
The spectrum shift from the sub-6G band to the high-frequency band has posed an ever-increasing demand on the paradigm shift from narrowband beamforming to wideband beamforming. Despite recent research efforts, the problem of wideband beamforming design is particularly challenging in reconfigurable intelligent surface (RIS)-assisted systems, because the RIS is not capable of performing frequency-dependent phase shifts, which induces high signal processing complexity. In this paper, we propose a simple-yet-efficient wideband beamforming design for RIS-assisted systems, in which a transmitter sends wideband signals to a desired target, through the aid of the RIS. In our proposed design, we exploit space-frequency Fourier transformation and the stationary phase method to yield an approximate closed-form solution of the RIS phase shifts, which significantly reduces the signal processing complexity compared to the existing approaches. The obtained solution is then used to generate a large and flat beampattern over the desired frequency band. Through numerical results, we validate the effectiveness of our proposed beamforming design and demonstrate how it can improve system performance in terms of communication rate and sensing resolution. Beyond generating the flat beampattern, we highlight that our proposed design is capable of mimicking any desired beampattern by matching the RIS phase shift with the amplitude modulation function, thus providing valuable insights into the design of novel wideband beamforming for RIS-assisted systems.
Submitted 7 July, 2024;
originally announced July 2024.
-
Gapped Low Energy Excitations Across an Entanglement Percolation Transition in the Quantum Spin Liquid Candidate NaYbSe$_2$
Authors:
Luke Pritchard Cairns,
Yuanqi Lyu,
Chunxiao Liu,
Josue Rodriguez,
Kenneth Ng,
John Singleton,
James G. Analytis
Abstract:
The study of quantum magnetism in frustrated triangular lattices has promised the discovery of exotic excitations emerging from many-body entanglement, like the quantum spin liquid. This field is vexed by the interplay of disorder, correlations and long-range order, whose properties are challenging to control and disentangle. We study the entropy-carrying excitations of a leading candidate in this search, the material NaYbSe$_2$, as a function of site dilution to directly address this challenge. We map the evolution of the entangled spins across the percolation transition, showing unequivocal evidence for the presence of an energy gap in the excitations of the system. However, we also show that this gap onsets at the percolation transition where disorder is the greatest, strongly suggesting that it is unlikely to be associated with a quantum spin liquid. Instead we suggest the more universal scenario of a short-range ordered state with entropy-carrying boundaries.
Submitted 5 July, 2024;
originally announced July 2024.
-
Protocol for scaling up a sign-ordered Kitaev chain without magnetic flux control
Authors:
Chun-Xiao Liu,
Sebastian Miles,
Alberto Bordin,
Sebastiaan L. D. ten Haaf,
A. Mert Bozkurt,
Michael Wimmer
Abstract:
Quantum dot-superconductor arrays have emerged as a new and promising material platform for realizing Kitaev chains with Majorana zero modes. So far, experiments have implemented a two-site chain with limited protection. We propose a protocol for scaling up the Kitaev chain that is accessible to current experiments and optimizes the Majorana protection. To this end, we make use of the fact that the relative sign of normal and superconducting hoppings mediated by an Andreev bound state can be changed by electrostatic gates. In this way, our method only relies on the use of individual electrostatic gates on hybrid regions, quantum dots, and tunnel barriers, respectively, without the need for individual magnetic flux control, greatly simplifying the device design. Our work provides guidance for realizing a topologically protected Kitaev chain, which is the building block of error-resilient topological quantum computation.
Submitted 5 July, 2024;
originally announced July 2024.
-
Global analysis of fragmentation functions to charged hadrons with high-precision data from the LHC
Authors:
Jun Gao,
ChongYang Liu,
XiaoMin Shen,
Hongxi Xing,
Yuxiang Zhao
Abstract:
Fragmentation functions (FFs) are essential non-perturbative QCD inputs for predicting hadron production cross sections in high energy scatterings. In this study, we present a joint determination of FFs for light charged hadrons through a global analysis at next-to-leading order (NLO) in QCD. Our analysis incorporates a wide range of precision measurements from the LHC, as well as data from electron-positron collisions and semi-inclusive deep inelastic scatterings. By including measurements of jet fragmentation at the LHC in our global analysis, we are able to impose strong constraints on the gluon FFs. A careful selection of hadron kinematics is applied to ensure the validity of factorization and perturbative calculations of QCD. In addition, we introduce several methodological advances in fitting, resulting in a flexible parametrization form and the inclusion of theoretical uncertainties from perturbative calculations. Our best-fit predictions show very good agreement with the global data, with $χ^2/N_{pt}\sim 0.90$. We also generate a large number of Hessian error sets to estimate uncertainties and correlations of the extracted FFs. FFs to charged pions (kaons and protons) are well constrained for momentum fractions down to 0.01 (0.1). The total momenta of partons carried by light charged hadrons are determined precisely. Their values for $u$, $d$ quarks and gluon saturate at about 50\% for a lower cut of the momentum fraction of 0.01. Pulls from individual datasets and the impact of various choices of the analysis are also studied in detail. Additionally, we present an update of the FMNLO program used for calculating hadron production cross sections. Our FFs, including the error sets (denoted as NPC23), are publicly available in the form of LHAPDF6 grids.
Submitted 30 June, 2024;
originally announced July 2024.
-
Rethinking the fundamental performance limits of integrated sensing and communication systems
Authors:
Zhouyuan Yu,
Xiaoling Hu,
Chenxi Liu,
Mugen Peng
Abstract:
Integrated sensing and communication (ISAC) has been recognized as a key enabler and feature of future wireless networks. In the existing works analyzing the performance of ISAC, discrete-time systems were commonly assumed, which, however, overlooked the impacts of temporal, spectral, and spatial properties. To address this issue, we establish a unified information model for band-limited continuous-time ISAC systems. In the established information model, we employ a novel sensing performance metric, called the sensing mutual information (SMI). Through analysis, we show how the SMI can be utilized as a bridge between the mutual information domain and the mean squared error (MSE) domain. In addition, we illustrate the communication mutual information (CMI)-SMI and CMI-MSE regions to identify the performance bounds of ISAC systems in practical settings and reveal the trade-off between communication and sensing performance. Moreover, via analysis and numerical results, we provide two valuable insights into the design of novel ISAC-enabled systems: i) communication prefers waveforms of random amplitude, sensing prefers waveforms of constant amplitude, and both communication and sensing favor waveforms of low correlation with random phases; ii) there exists a linear positive proportional relationship between the allocated time-frequency resource and the achieved communication rate/sensing MSE.
Submitted 4 July, 2024;
originally announced July 2024.
-
Detection and Multi-Parameter Estimation for NLOS Targets: An IRS-assisted Framework
Authors:
Zhouyuan Yu,
Xiaoling Hu,
Chenxi Liu,
Qin Tao,
Mugen Peng
Abstract:
Intelligent reflecting surface (IRS) has the potential to enhance sensing performance, due to its capability of reshaping the echo signals. Different from the existing literature, which has commonly focused on IRS beamforming optimization, in this paper, we pay special attention to designing effective signal processing approaches to extract sensing information from IRS-reshaped echo signals. To this end, we investigate an IRS-assisted non-line-of-sight (NLOS) target detection and multi-parameter estimation problem in orthogonal frequency division multiplexing (OFDM) systems. To address this problem, we first propose a novel detection and direction estimation framework, including a low-overhead hierarchical codebook that allows the IRS to generate three-dimensional beams with adjustable beam direction and width, a delay spectrum peak-based beam training scheme for detection and direction estimation, and a beam refinement scheme for further enhancing the accuracy of the direction estimation. Then, we propose a target range and velocity estimation scheme by extracting the delay-Doppler information from the IRS-reshaped echo signals. Numerical results demonstrate that the proposed schemes can achieve a 99.7% target detection rate, a 10^{-3}-rad level direction estimation accuracy, and a 10^{-6}-m/10^{-5}-m/s level range/velocity estimation accuracy.
Submitted 4 July, 2024;
originally announced July 2024.
-
UniPlane: Unified Plane Detection and Reconstruction from Posed Monocular Videos
Authors:
Yuzhong Huang,
Chen Liu,
Ji Hou,
Ke Huo,
Shiyu Dong,
Fred Morstatter
Abstract:
We present UniPlane, a novel method that unifies plane detection and reconstruction from posed monocular videos. Unlike existing methods that detect planes from local observations and associate them across the video for the final reconstruction, UniPlane unifies both the detection and the reconstruction tasks in a single network, which allows us to directly optimize final reconstruction quality and fully leverage temporal information. Specifically, we build a Transformers-based deep neural network that jointly constructs a 3D feature volume for the environment and estimates a set of per-plane embeddings as queries. UniPlane directly reconstructs the 3D planes by taking dot products between voxel embeddings and the plane embeddings followed by binary thresholding. Extensive experiments on real-world datasets demonstrate that UniPlane outperforms state-of-the-art methods in both plane detection and reconstruction tasks, achieving +4.6 in F-score in geometry as well as consistent improvements in other geometry and segmentation metrics.
Submitted 3 July, 2024;
originally announced July 2024.
-
Incremental Gauss--Newton Methods with Superlinear Convergence Rates
Authors:
Zhiling Zhou,
Zhuanghua Liu,
Chengchang Liu,
Luo Luo
Abstract:
This paper addresses the challenge of solving large-scale nonlinear equations with Hölder continuous Jacobians. We introduce a novel Incremental Gauss--Newton (IGN) method with an explicit superlinear convergence rate, which outperforms existing methods that only achieve a linear convergence rate. In particular, we formulate our problem as a nonlinear least-squares problem with finite-sum structure, and our method incrementally iterates with the information of one component in each round. We also provide a mini-batch extension to our IGN method that obtains an even faster superlinear convergence rate. Furthermore, we conduct numerical experiments to show the advantages of the proposed methods.
Submitted 3 July, 2024;
originally announced July 2024.
-
EDPNet: An Efficient Dual Prototype Network for Motor Imagery EEG Decoding
Authors:
Can Han,
Chen Liu,
Crystal Cai,
Jun Wang,
Dahong Qian
Abstract:
Motor imagery electroencephalograph (MI-EEG) decoding plays a crucial role in developing motor imagery brain-computer interfaces (MI-BCIs). However, decoding intentions from MI remains challenging due to the inherent complexity of EEG signals relative to the small-sample size. In this paper, we propose an Efficient Dual Prototype Network (EDPNet) to enable accurate and fast MI decoding. EDPNet employs a lightweight adaptive spatial-spectral fusion module, which promotes more efficient information fusion between multiple EEG electrodes. Subsequently, a parameter-free multi-scale variance pooling module extracts more comprehensive temporal features. Furthermore, we introduce dual prototypical learning to optimize the feature space distribution and training process, thereby improving the model's generalization ability on small-sample MI datasets. Our experimental results show that the EDPNet outperforms state-of-the-art models with superior classification accuracy and kappa values (84.11% and 0.7881 for dataset BCI competition IV 2a, 86.65% and 0.7330 for dataset BCI competition IV 2b). Additionally, we use the BCI competition III IVa dataset with fewer training data to further validate the generalization ability of the proposed EDPNet. We also achieve superior performance with 82.03% classification accuracy. Benefiting from the lightweight parameters and superior decoding accuracy, our EDPNet shows great potential for MI-BCI applications. The code is publicly available at https://github.com/hancan16/EDPNet.
Submitted 3 July, 2024;
originally announced July 2024.
-
Stereo Risk: A Continuous Modeling Approach to Stereo Matching
Authors:
Ce Liu,
Suryansh Kumar,
Shuhang Gu,
Radu Timofte,
Yao Yao,
Luc Van Gool
Abstract:
We introduce Stereo Risk, a new deep-learning approach to solve the classical stereo-matching problem in computer vision. As it is well-known that stereo matching boils down to a per-pixel disparity estimation problem, the popular state-of-the-art stereo-matching approaches widely rely on regressing the scene disparity values, yet via discretization of scene disparity values. Such discretization often fails to capture the nuanced, continuous nature of scene depth. Stereo Risk departs from the conventional discretization approach by formulating the scene disparity as an optimal solution to a continuous risk minimization problem, hence the name "stereo risk". We demonstrate that $L^1$ minimization of the proposed continuous risk function enhances stereo-matching performance for deep networks, particularly for disparities with multi-modal probability distributions. Furthermore, to enable the end-to-end network training of the non-differentiable $L^1$ risk optimization, we exploited the implicit function theorem, ensuring a fully differentiable network. A comprehensive analysis demonstrates our method's theoretical soundness and superior performance over the state-of-the-art methods across various benchmark datasets, including KITTI 2012, KITTI 2015, ETH3D, SceneFlow, and Middlebury 2014.
Submitted 3 July, 2024;
originally announced July 2024.
-
GMM-ResNext: Combining Generative and Discriminative Models for Speaker Verification
Authors:
Hui Yan,
Zhenchun Lei,
Changhong Liu,
Yong Zhou
Abstract:
With the development of deep learning, many different network architectures have been explored in speaker verification. However, most network architectures rely on a single deep learning architecture, and hybrid networks combining different architectures have been little studied in ASV tasks. In this paper, we propose the GMM-ResNext model for speaker verification. Conventional GMM does not consider the score distribution of each frame feature over all Gaussian components and ignores the relationship between neighboring speech frames. So, we extract the log Gaussian probability features based on the raw acoustic features and use a ResNext-based network as the backbone to extract the speaker embedding. GMM-ResNext combines generative and discriminative models to improve the generalization ability of deep learning models and allows one to more easily specify meaningful priors on model parameters. A two-path GMM-ResNext model based on two gender-related GMMs has also been proposed. Experimental results show that the proposed GMM-ResNext achieves relative improvements of 48.1\% and 11.3\% in EER compared with ResNet34 and ECAPA-TDNN on the VoxCeleb1-O test set.
Submitted 3 July, 2024;
originally announced July 2024.
-
Measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the {BESIII} detector at the {BEPCII} storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be $\mathcal{B}(J/ψ\to p \bar{p} η(η\to γγ)) = (1.480 \pm 0.001 \pm 0.024)\times\,10^{-3}$ and $\mathcal{B}(J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)) = (1.557 \pm 0.003 \pm 0.038)\times\,10^{-3}$, where the first uncertainties are statistical and the second systematic. Both results are compatible within their uncorrelated systematic uncertainties. The combined result is $\mathcal{B}(J/ψ\to p \bar{p} η)=(1.495 \pm 0.001 \pm 0.023)\times\,10^{-3}$ where the first uncertainty is the combined statistical uncertainty and the second one the combined systematic uncertainty of both analyses, incorporating correlations between them. In addition, the $p \bar{p}$ threshold region is investigated for a potential threshold enhancement, and no evidence for one is observed.
Submitted 3 July, 2024;
originally announced July 2024.
-
VFIMamba: Video Frame Interpolation with State Space Models
Authors:
Guozhen Zhang,
Chunxu Liu,
Yutao Cui,
Xiaotong Zhao,
Kai Ma,
Limin Wang
Abstract:
Inter-frame modeling is pivotal in generating intermediate frames for video frame interpolation (VFI). Current approaches predominantly rely on convolution or attention-based models, which often either lack sufficient receptive fields or entail significant computational overheads. Recently, Selective State Space Models (S6) have emerged, tailored specifically for long sequence modeling, offering both linear complexity and data-dependent modeling capabilities. In this paper, we propose VFIMamba, a novel frame interpolation method for efficient and dynamic inter-frame modeling by harnessing the S6 model. Our approach introduces the Mixed-SSM Block (MSB), which initially rearranges tokens from adjacent frames in an interleaved fashion and subsequently applies multi-directional S6 modeling. This design facilitates the efficient transmission of information across frames while upholding linear complexity. Furthermore, we introduce a novel curriculum learning strategy that progressively cultivates proficiency in modeling inter-frame dynamics across varying motion magnitudes, fully unleashing the potential of the S6 model. Experimental findings showcase that our method attains state-of-the-art performance across diverse benchmarks, particularly excelling in high-resolution scenarios. In particular, on the X-TEST dataset, VFIMamba demonstrates a noteworthy improvement of 0.80 dB for 4K frames and 0.96 dB for 2K frames.
Submitted 2 July, 2024;
originally announced July 2024.
-
GMM-ResNet2: Ensemble of Group ResNet Networks for Synthetic Speech Detection
Authors:
Zhenchun Lei,
Hui Yan,
Changhong Liu,
Yong Zhou,
Minglei Ma
Abstract:
Deep learning models are widely used for speaker recognition and spoofing speech detection. We propose the GMM-ResNet2 for synthetic speech detection. Compared with the previous GMM-ResNet model, GMM-ResNet2 has four improvements. Firstly, the different order GMMs have different capabilities to form smooth approximations to the feature distribution, and multiple GMMs are used to extract multi-scale Log Gaussian Probability features. Secondly, the grouping technique is used to improve the classification accuracy by exposing the group cardinality while reducing both the number of parameters and the training time. The final score is obtained by an ensemble of all group classifier outputs using the averaging method. Thirdly, the residual block is improved by including one activation function and one batch normalization layer. Finally, an ensemble-aware loss function is proposed to integrate the independent loss functions of all ensemble members. On the ASVspoof 2019 LA task, the GMM-ResNet2 achieves a minimum t-DCF of 0.0227 and an EER of 0.79\%. On the ASVspoof 2021 LA task, the GMM-ResNet2 achieves a minimum t-DCF of 0.2362 and an EER of 2.19\%, representing relative reductions of 31.4\% and 76.3\% compared with the LFCC-LCNN baseline.
Submitted 2 July, 2024;
originally announced July 2024.
-
LogEval: A Comprehensive Benchmark Suite for Large Language Models In Log Analysis
Authors:
Tianyu Cui,
Shiyu Ma,
Ziang Chen,
Tong Xiao,
Shimin Tao,
Yilun Liu,
Shenglin Zhang,
Duoming Lin,
Changchang Liu,
Yuzhe Cai,
Weibin Meng,
Yongqian Sun,
Dan Pei
Abstract:
Log analysis is crucial for ensuring the orderly and stable operation of information systems, particularly in the field of Artificial Intelligence for IT Operations (AIOps). Large Language Models (LLMs) have demonstrated significant potential in natural language processing tasks. In the AIOps domain, they excel in tasks such as anomaly detection, root cause analysis of faults, operations and maintenance script generation, and alert information summarization. However, the performance of current LLMs in log analysis tasks remains inadequately validated. To address this gap, we introduce LogEval, a comprehensive benchmark suite designed to evaluate the capabilities of LLMs in various log analysis tasks for the first time. This benchmark covers tasks such as log parsing, log anomaly detection, log fault diagnosis, and log summarization. LogEval evaluates each task using 4,000 publicly available log data entries and employs 15 different prompts for each task to ensure a thorough and fair assessment. By rigorously evaluating leading LLMs, we demonstrate the impact of various LLM technologies on log analysis performance, focusing on aspects such as self-consistency and few-shot contextual learning. We also discuss findings related to model quantification, Chinese-English question-answering evaluation, and prompt engineering. These findings provide insights into the strengths and weaknesses of LLMs in multilingual environments and the effectiveness of different prompt strategies. Various evaluation methods are employed for different tasks to accurately measure the performance of LLMs in log analysis, ensuring a comprehensive assessment. The insights gained from LogEval's evaluation reveal the strengths and limitations of LLMs in log analysis tasks, providing valuable guidance for researchers and practitioners.
Submitted 1 July, 2024;
originally announced July 2024.
-
ModelVerification.jl: a Comprehensive Toolbox for Formally Verifying Deep Neural Networks
Authors:
Tianhao Wei,
Luca Marzari,
Kai S. Yun,
Hanjiang Hu,
Peizhi Niu,
Xusheng Luo,
Changliu Liu
Abstract:
Deep Neural Networks (DNN) are crucial in approximating nonlinear functions across diverse applications, ranging from image classification to control. Verifying specific input-output properties can be a highly challenging task due to the lack of a single, self-contained framework that allows a complete range of verification types. To this end, we present \texttt{ModelVerification.jl (MV)}, the first comprehensive, cutting-edge toolbox that contains a suite of state-of-the-art methods for verifying different types of DNNs and safety specifications. This versatile toolbox is designed to empower developers and machine learning practitioners with robust tools for verifying and ensuring the trustworthiness of their DNN models.
Submitted 30 June, 2024;
originally announced July 2024.
-
TCSR-SQL: Towards Table Content-aware Text-to-SQL with Self-retrieval
Authors:
Wenbo Xu,
Liang Yan,
Peiyi Han,
Haifeng Zhu,
Chuanyi Liu,
Shaoming Duan,
Cuiyun Gao,
Yingwei Liang
Abstract:
Large Language Model-based (LLM-based) Text-to-SQL methods have achieved important progress in generating SQL queries for real-world applications. When confronted with table content-aware questions in real-world scenarios, ambiguous data content keywords and non-existent database schema column names within the question lead to poor performance of existing methods. To solve this problem, we propose a novel approach towards Table Content-aware Text-to-SQL with Self-Retrieval (TCSR-SQL). It leverages the LLM's in-context learning capability to extract data content keywords within the question and infer possible related database schema, which is used to generate Seed SQL to fuzz search databases. The search results are further used to confirm the encoding knowledge with the designed encoding knowledge table, including column names and exact stored content values used in the SQL. The encoding knowledge is sent to obtain the final Precise SQL following multiple rounds of a generation-execution-revision process. To validate our approach, we introduce a table-content-aware, question-related benchmark dataset, containing 1,692 question-SQL pairs. Comprehensive experiments conducted on this benchmark demonstrate the remarkable performance of TCSR-SQL, achieving an improvement of at least 13.7% in execution accuracy compared to other state-of-the-art methods.
Submitted 12 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
CAMON: Cooperative Agents for Multi-Object Navigation with LLM-based Conversations
Authors:
Pengying Wu,
Yao Mu,
Kangjie Zhou,
Ji Ma,
Junting Chen,
Chang Liu
Abstract:
Visual navigation tasks are critical for household service robots. As these tasks become increasingly complex, effective communication and collaboration among multiple robots become imperative to ensure successful completion. In recent years, large language models (LLMs) have exhibited remarkable comprehension and planning abilities in the context of embodied agents. However, their application in household scenarios, specifically in the use of multiple agents collaborating to complete complex navigation tasks through communication, remains unexplored. Therefore, this paper proposes a framework for decentralized multi-agent navigation, leveraging LLM-enabled communication and collaboration. By designing the communication-triggered dynamic leadership organization structure, we achieve faster team consensus with fewer communication instances, leading to better navigation effectiveness and collaborative exploration efficiency. With the proposed novel communication scheme, our framework promises to be conflict-free and robust in multi-object navigation tasks, even when there is a surge in team size.
Submitted 30 June, 2024;
originally announced July 2024.
-
KOROL: Learning Visualizable Object Feature with Koopman Operator Rollout for Manipulation
Authors:
Hongyi Chen,
Abulikemu Abuduweili,
Aviral Agrawal,
Yunhai Han,
Harish Ravichandar,
Changliu Liu,
Jeffrey Ichnowski
Abstract:
Learning dexterous manipulation skills presents significant challenges due to complex nonlinear dynamics that underlie the interactions between objects and multi-fingered hands. Koopman operators have emerged as a robust method for modeling such nonlinear dynamics within a linear framework. However, current methods rely on runtime access to ground-truth (GT) object states, making them unsuitable for vision-based practical applications. Unlike image-to-action policies that implicitly learn visual features for control, we use a dynamics model, specifically the Koopman operator, to learn visually interpretable object features critical for robotic manipulation within a scene. We construct a Koopman operator using object features predicted by a feature extractor and utilize it to auto-regressively advance system states. We train the feature extractor to embed scene information into object features, thereby enabling the accurate propagation of robot trajectories. We evaluate our approach on simulated and real-world robot tasks, with results showing that it outperformed the model-based imitation learning NDP by 1.08$\times$ and the image-to-action Diffusion Policy by 1.16$\times$. The results suggest that our method maintains task success rates with learned features and extends applicability to real-world manipulation without GT object states.
Submitted 29 June, 2024;
originally announced July 2024.
-
Accurate Shear Recovery with Multi-Band Images of Hyper Suprime-Cam
Authors:
Cong Liu,
Jun Zhang,
Hekun Li,
Pedro Alonso Vaquero,
Wenting Wang
Abstract:
The existing large-scale weak lensing surveys typically reserve the best seeing conditions for a certain optical band to minimize shape measurement errors and maximize the number of usable background galaxies. This is because most popular shear measurement methods contain explicit or implicit thresholds on the galaxy-to-PSF (point spread function) size ratio, below which their shape measurement errors increase abruptly. Using the DECaLS data, we have previously demonstrated that the Fourier\_Quad method performs very well on poorly resolved galaxy images in general. It is therefore a ready tool for shear measurement with multi-band images regardless of their seeing conditions. In this paper, we apply the Fourier\_Quad pipeline to the multi-band images from the third public data release of the Hyper Suprime-Cam Subaru Strategic Program. We show that the shear catalogs from the five optical bands (g/r/i/z/y) all pass the field-distortion test with very high accuracy. Using the LOWZ and CMASS galaxies as foreground lenses, we show that the error bar in the galaxy-galaxy lensing measurement can be decreased by around 15\% by combining shear catalogs from different bands. This indicates that it is worthwhile to perform multi-band shear measurements for better shear statistics.
Submitted 29 June, 2024;
originally announced July 2024.
-
On-chip high energy photon radiation source based on microwave-dielectric undulator
Authors:
Fuming Jiang,
Xinyu Xie,
Chengpu Liu,
Ye Tian
Abstract:
A new on-chip light source configuration has been proposed, which utilizes the interaction between microwave and a dielectric nanopillar array to generate a periodic electromagnetic near field, and applies periodic transverse acceleration to relativistic electrons to generate high-energy photon radiation. Here the dielectric nanopillar array interacting with microwave acts as the electron undulator, in which the near field drives electrons to oscillate. When an electron beam operates in this nanopillar array in this light source configuration, it is subjected to a periodic transverse near-field force, and will radiate X-ray or even gamma-ray high energy photons after a relativistic frequency up-conversion. Compared with the laser-dielectric undulator based on the interaction between strong lasers and nanostructures to generate a plasmonic near field, this configuration is less prone to damage during operation.
Submitted 29 June, 2024;
originally announced July 2024.
-
Distinguishing Surface and Bulk Electromagnetism via Their Dynamics in an Intrinsic Magnetic Topological Insulator
Authors:
Khanh Duy Nguyen,
Woojoo Lee,
Jianchen Dang,
Tongyao Wu,
Gabriele Berruto,
Chenhui Yan,
Chi Ian Jess Ip,
Haoran Lin,
Qiang Gao,
Seng Huat Lee,
Binghai Yan,
Chaoxing Liu,
Zhiqiang Mao,
Xiao-Xiao Zhang,
Shuolong Yang
Abstract:
The indirect exchange interaction between local magnetic moments via surface electrons has been long predicted to bolster the surface ferromagnetism in magnetic topological insulators (MTIs), which facilitates the quantum anomalous Hall effect. This unconventional effect is critical to determining the operating temperatures of future topotronic devices. However, the experimental confirmation of this mechanism remains elusive, especially in intrinsic MTIs. Here we combine time-resolved photoemission spectroscopy with time-resolved magneto-optical Kerr effect measurements to elucidate the unique electromagnetism at the surface of an intrinsic MTI MnBi2Te4. Theoretical modeling based on 2D Ruderman-Kittel-Kasuya-Yosida interactions captures the initial quenching of a surface-rooted exchange gap within a factor of two but over-estimates the bulk demagnetization by one order of magnitude. This mechanism directly explains the sizable gap in the quasi-2D electronic state and the nonzero residual magnetization in even-layer MnBi2Te4. Furthermore, it leads to efficient light-induced demagnetization comparable to state-of-the-art magnetophotonic crystals, promising an effective manipulation of magnetism and topological orders for future topotronics.
Submitted 28 June, 2024;
originally announced July 2024.
-
Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
S. Ahmed,
M. Albrecht,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
X. H. Bai,
Y. Bai,
O. Bakina,
R. Baldini Ferroli,
I. Balossino,
Y. Ban,
K. Begzsuren,
N. Berger,
M. Bertani,
D. Bettoni,
F. Bianchi,
J. Bloms,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (495 additional authors not shown)
Abstract:
Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions $\frac{\mathcal{B}(h_c\rightarrow e^+e^-η_c)}{\mathcal{B}(h_c\rightarrow γη_c)}$ separately for the $h_c$ samples produced via $ψ(3686)\toπ^0h_c$ and $e^+e^-\toπ^+π^-h_c$. The average ratio is determined to be $(0.59\pm0.10(\text{stat.})\pm0.04(\text{syst.}))\%$, where the uncertainty includes both statistical and systematic components.
Submitted 2 July, 2024; v1 submitted 28 June, 2024;
originally announced July 2024.
-
Imaging semiconductor-to-metal transition and topological flat bands of twisted bilayer MoTe2
Authors:
Yufeng Liu,
Yu Gu,
Ting Bao,
Ning Mao,
Can Li,
Shudan Jiang,
Liang Liu,
Dandan Guan,
Yaoyi Li,
Hao Zheng,
Canhua Liu,
Kenji Watanabe,
Takashi Taniguchi,
Wenhui Duan,
Jinfeng Jia,
Xiaoxue Liu,
Yang Zhang,
Tingxin Li,
Shiyong Wang
Abstract:
Two-dimensional (2D) moiré materials have emerged as a highly tunable platform for investigating novel quantum states of matter arising from strong electronic correlations and nontrivial band topology. Recently, topological flat bands formed in 2D semiconducting moiré superlattices have attracted great interest. In particular, a series of topological quantum phases, including the long-sought fractional quantum anomalous Hall (FQAH) effect, have recently been experimentally observed in twisted bilayer MoTe2 (tMoTe2). However, the microscopic information of tMoTe2 moiré superlattice and its electronic structure is still lacking. Here, we present scanning tunneling microscopy and spectroscopy (STM/STS) studies of the tMoTe2 moiré superlattice, with twist angles ranging from about 2.3° to 2.8°. We developed a contact-STM mode to apply pressure on tMoTe2 and observed a phase transition from band insulator to metal of tMoTe2 under pressure at the charge neutrality point. STM imaging reveals a pronounced in-plane lattice reconstruction with periodic strain redistribution in the tMoTe2, which serves as gauge fields for generating topological moiré bands. Importantly, the electronic states of the low-energy moiré flat bands primarily concentrate at the XM and MX regions as revealed by STS imaging. Such spatial distributions are nicely reproduced by our first-principles calculations with a large-scale basis, suggesting the low-energy moiré flat bands are formed through the hybridization of K valley bands of the top layer and K' valley bands of the bottom layer. Overall, our findings provide compelling real-space evidence of electronic structure under pressure and topological flat bands of tMoTe2, paving the way for further STM/STS investigations of correlated topological states within the topological flat band in gate-tunable tMoTe2 devices.
Submitted 27 June, 2024;
originally announced June 2024.
-
Improved measurement of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential decay rate of $D^+_s\to K^0 e^+ν_e$ to be $f^{K^0}_+(0)=0.636\pm0.049\pm0.013$. For both measurements, the first uncertainty is statistical and the second systematic. The branching fraction and form factor measurements are factors of 1.6 and 1.7 more precise than the previous world averages, respectively.
Submitted 27 June, 2024;
originally announced June 2024.
-
AddBiomechanics Dataset: Capturing the Physics of Human Motion at Scale
Authors:
Keenon Werling,
Janelle Kaneda,
Alan Tan,
Rishi Agarwal,
Six Skov,
Tom Van Wouwe,
Scott Uhlrich,
Nicholas Bianco,
Carmichael Ong,
Antoine Falisse,
Shardul Sapkota,
Aidan Chandra,
Joshua Carter,
Ezio Preatoni,
Benjamin Fregly,
Jennifer Hicks,
Scott Delp,
C. Karen Liu
Abstract:
While reconstructing human poses in 3D from inexpensive sensors has advanced significantly in recent years, quantifying the dynamics of human motion, including the muscle-generated joint torques and external forces, remains a challenge. Prior attempts to estimate physics from reconstructed human poses have been hampered by a lack of datasets with high-quality pose and force data for a variety of movements. We present the AddBiomechanics Dataset 1.0, which includes physically accurate human dynamics of 273 human subjects, over 70 hours of motion and force plate data, totaling more than 24 million frames. To construct this dataset, novel analytical methods were required, which are also reported here. We propose a benchmark for estimating human dynamics from motion using this dataset, and present several baseline results. The AddBiomechanics Dataset is publicly available at https://addbiomechanics.org/download_data.html.
Submitted 16 May, 2024;
originally announced June 2024.
-
MatchTime: Towards Automatic Soccer Game Commentary Generation
Authors:
Jiayuan Rao,
Haoning Wu,
Chang Liu,
Yanfeng Wang,
Weidi Xie
Abstract:
Soccer is a globally popular sport with a vast audience. In this paper, we consider constructing an automatic soccer game commentary model to improve the audience's viewing experience. In general, we make the following contributions: First, observing the prevalent video-text misalignment in existing datasets, we manually annotate timestamps for 49 matches, establishing a more robust benchmark for soccer game commentary generation, termed SN-Caption-test-align; Second, we propose a multi-modal temporal alignment pipeline to automatically correct and filter the existing dataset at scale, creating a higher-quality soccer game commentary dataset for training, denoted as MatchTime; Third, based on our curated dataset, we train an automatic commentary generation model, named MatchVoice. Extensive experiments and ablation studies demonstrate the effectiveness of our alignment pipeline, and training a model on the curated dataset achieves state-of-the-art performance for commentary generation, showing that better alignment can lead to significant performance improvements in downstream tasks.
Submitted 26 June, 2024;
originally announced June 2024.
-
Measurement of the cross sections of $e^+e^-\to K^{-}\barΞ^{+}Λ/Σ^{0}$ at center-of-mass energies between 3.510 and 4.914 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914 GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$, evidence for $ψ(4160) \to K^{-}\barΞ^{+}Λ$ is found for the first time with a significance of 4.4$σ$, including systematic uncertainties. No evidence for other possible resonances is found. In addition, the products of electronic partial width and branching fraction for all assumed resonances decaying into $K^{-}\barΞ^{+}Λ/Σ^{0}$ are determined.
Submitted 26 June, 2024;
originally announced June 2024.
-
LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference
Authors:
Zhongwei Wan,
Ziang Wu,
Che Liu,
Jinfa Huang,
Zhihong Zhu,
Peng Jin,
Longyue Wang,
Li Yuan
Abstract:
Long-context Multimodal Large Language Models (MLLMs) demand substantial computational resources for inference as the growth of their multimodal Key-Value (KV) cache, in response to increasing input lengths, challenges memory and time efficiency. Unlike single-modality LLMs that manage only textual contexts, the KV cache of long-context MLLMs includes representations from multiple images with temporal and spatial relationships and related textual contexts. The predominance of image tokens means traditional optimizations for LLMs' KV caches are unsuitable for multimodal long-context settings, and no prior works have addressed this challenge. In this work, we introduce LOOK-M, a pioneering, fine-tuning-free approach that efficiently reduces the multimodal KV cache size while maintaining performance comparable to a full cache. We observe that during prompt prefill, the model prioritizes more textual attention over image features, and based on the multimodal interaction observation, a new proposed text-prior method is explored to compress the KV cache. Furthermore, to mitigate the degradation of image contextual information, we propose several compensatory strategies using KV pairs merging. LOOK-M demonstrates that with a significant reduction in KV Cache memory usage, such as reducing it by 80% in some cases, it not only achieves up to 1.5x faster decoding but also maintains or even enhances performance across a variety of long context multimodal tasks.
Submitted 26 June, 2024;
originally announced June 2024.
-
Measurements of $K_S^0$-$K_L^0$ asymmetries in the decays $Λ_c^+ \to pK_{L,S}^0$, $pK_{L,S}^0π^+π^-$ and $pK_{L,S}^0π^0$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0.04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, and $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^0)=(2.02 \pm 0.13 \pm 0.05)\%$, where the first uncertainties are statistical and the second systematic. Combining with the known branching fractions of $Λ_c^+ \to pK_{S}^{0}$, $Λ_c^+ \to pK_{S}^{0}π^+π^-$, and $Λ_c^+ \to pK_{S}^{0}π^0$, we present the first measurements of the $K_{S}^{0}$-$K_{L}^{0}$ asymmetries $R(Λ_c^+, K_{S,L}^0X) = \frac{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) - \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) + \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}$ in charmed baryon decays: $R(Λ_c^+, pK_{S,L}^0) = -0.025 \pm 0.031$, $R(Λ_c^+, pK_{S,L}^0π^+π^-) = -0.027 \pm 0.048$, and $R(Λ_c^+, pK_{S,L}^0π^0) =-0.015 \pm 0.046$. No significant asymmetries within the uncertainties are observed.
Submitted 26 June, 2024;
originally announced June 2024.
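The asymmetry $R$ defined in the abstract above is a simple ratio of branching fractions. The following is a minimal sketch of how such a ratio and its first-order uncertainty might be computed, assuming the two branching fractions are uncorrelated. The function names and the input numbers are hypothetical illustrations, not the collaboration's analysis code or measured values.

```python
import math


def ks_kl_asymmetry(b_ks: float, b_kl: float) -> float:
    """Return R = (B_KS - B_KL) / (B_KS + B_KL)."""
    total = b_ks + b_kl
    if total <= 0:
        raise ValueError("sum of branching fractions must be positive")
    return (b_ks - b_kl) / total


def asymmetry_uncertainty(b_ks: float, sigma_ks: float,
                          b_kl: float, sigma_kl: float) -> float:
    """First-order error propagation, assuming uncorrelated inputs."""
    total = b_ks + b_kl
    if total <= 0:
        raise ValueError("sum of branching fractions must be positive")
    # Partial derivatives of R with respect to each branching fraction.
    d_ks = 2.0 * b_kl / total**2
    d_kl = -2.0 * b_ks / total**2
    return math.hypot(d_ks * sigma_ks, d_kl * sigma_kl)


if __name__ == "__main__":
    # Illustrative round numbers only (hypothetical, not measured values).
    r = ks_kl_asymmetry(3.0, 1.0)
    sigma_r = asymmetry_uncertainty(1.0, 0.1, 1.0, 0.1)
    print(f"R = {r:.3f}, example uncertainty = {sigma_r:.4f}")
```

A real analysis would also need to account for correlated systematic uncertainties between the two modes, which this sketch ignores.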
-
Anatomizing Societal Recovery at the Microscale: Heterogeneity in Household Lifestyle Activities Rebounding after Disasters
Authors:
Natalie Coleman,
Chenyue Liu,
Ali Mostafavi
Abstract:
This study presents a granular analysis of societal recovery from disasters at the individual level, focusing on the aftermath of Hurricane Harvey and Hurricane Ida. Societal recovery is defined as the restoration of the societal functioning of the affected community to its normal/steady-state level. It evaluates the recovery of impacted residents based on fluctuations in their lifestyle patterns in visits to points of interest. The analysis focuses on: (1) the extent of heterogeneity in lifestyle recovery of residents in the same spatial area; and (2) the extent to which variations in lifestyle recovery and its heterogeneity among users can be explained based on hazard impact extent and social vulnerability. As lifestyle recovery progresses, heterogeneity diminishes, indicating that lower lifestyle recovery rates correlate with higher heterogeneity within a spatial area. This relationship between lifestyle recovery and heterogeneity can lead to the misestimation of recovery timelines, potentially resulting in the inefficient allocation of resources and disproportionate attention to already recovering communities. Key contributions of the study are fourfold: First, it characterizes societal recovery at the finest scale by examining fluctuations in individual lifestyles, revealing heterogeneity even among neighbors. Second, it proposes using individual lifestyle as an indicator of societal functioning to measure, more human centrically, disaster impacts and recovery speeds. Third, it introduces a method for quantifying lifestyle recovery that enables near-real-time monitoring, departing from traditional survey-based methods. Fourth, it provides empirical insights into the relationship between disaster impacts and societal recovery, showing that the severity of disaster impacts and resident income levels and percentage of minority populations influence recovery durations.
Submitted 27 June, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.