MPCODER: Multi-user Personalized Code Generator with Explicit and Implicit Style Representation Learning

Authors: Zhenlong Dai, Chang Yao, WenKang Han, Ying Yuan, Zhipeng Gao, Jingyuan Chen

Abstract: Large Language Models (LLMs) have demonstrated great potential for assisting developers in their daily development. However, most research focuses on generating correct code; how to use LLMs to generate personalized code has seldom been investigated. To bridge this gap, we propose MPCoder (Multi-user Personalized Code Generator) to generate personalized code for multiple users. To better learn coding style features, we utilize explicit coding style residual learning to capture the syntax code style standards and implicit style learning to capture the semantic code style conventions. We train a multi-user style adapter to better differentiate the implicit feature representations of different users through contrastive learning, ultimately enabling personalized code generation for multiple users. We further propose a novel evaluation metric for estimating similarities between codes of different coding styles. The experimental results show the effectiveness of our approach for this novel task.

Submitted 24 June, 2024; originally announced June 2024.

Comments: Accepted by ACL 2024, Main Conference

arXiv:2406.11382 [pdf, other]

Baryon-number-violating nucleon decays in ALP effective field theories

Authors: Tong Li, Michael A. Schmidt, Chang-Yuan Yao

Abstract: The search for baryon-number-violating (BNV) nucleon decay is an intriguing probe of new physics beyond the SM in future neutrino experiments with enhanced sensitivity. The dark sector states such as an axion or axion-like particle (ALP) can induce nucleon decays with distinct signature and kinematics from the conventional nucleon decays. In this work, we study the ALP effective field theories (EFTs) with baryon number violation and the impact of light ALP on BNV nucleon decays. We revisit the dimension-8 BNV operators in the extended EFTs with an ALP field $a$ respecting shift symmetry. The low-energy EFT operators with $|Δ(B-L)|=2$ and $|Δ(B-L)|=0$ are matched to the baryon chiral perturbation theory. We obtain the effective chiral Lagrangian and the BNV interactions between ALP and baryons/mesons. The ALP interactions lead to two-body baryon decays $B\to \ell~({\rm or}~ν)~a$ and three-body nucleon decays $N\to M~\ell~({\rm or}~ν)~a$. We obtain the constraints on the UV scale from the invisible $Λ^0$ decay search at BESIII, the invisible neutron decay search at KamLAND and proton decay search at Super-K. We also show the projections of some other baryon/nucleon decays and present the distinct distributions of kinematic observable.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 28 pages, 4 figures, 7 tables

Report number: CPPC-2024-05, DESY-24-082

arXiv:2406.06062 [pdf, other]

ProcessPainter: Learn Painting Process from Sequence Data

Authors: Yiren Song, Shijie Huang, Chen Yao, Xiaojun Ye, Hai Ci, Jiaming Liu, Yuxuan Zhang, Mike Zheng Shou

Abstract: The painting process of artists is inherently stepwise and varies significantly among different painters and styles. Generating detailed, step-by-step painting processes is essential for art education and research, yet remains largely underexplored. Traditional stroke-based rendering methods break down images into sequences of brushstrokes, yet they fall short of replicating the authentic processes of artists, with limitations confined to basic brushstroke modifications. Text-to-image models utilizing diffusion processes, which generate images through iterative denoising, also diverge substantially from artists' painting processes. To address these challenges, we introduce ProcessPainter, a text-to-video model that is initially pre-trained on synthetic data and subsequently fine-tuned with a select set of artists' painting sequences using the LoRA model. This approach successfully generates painting processes from text prompts for the first time. Furthermore, we introduce an Artwork Replication Network capable of accepting arbitrary-frame input, which facilitates the controlled generation of painting processes, decomposing images into painting sequences, and completing semi-finished artworks. This paper offers new perspectives and tools for advancing art education and image generation technology.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.03639 [pdf, ps, other]

Gravitating vortices and Symplectic Reduction by Stages

Authors: L. Álvarez-Cónsul, M. Garcia-Fernandez, O. García-Prada, V. P. Pingali, C. -J. Yao

Abstract: We undertake a novel approach to the existence problem for gravitating vortices on a Riemann surface based on symplectic reduction by stages, which seems to be new in the PDE as well as the gauge theory literature. The main technical tool for our study is the reduced $α$-K-energy, for which we establish convexity properties by means of finite-energy pluripotential theory, as recently applied to the study of constant scalar curvature Kähler metrics. Using these methods, we prove that the existence of solutions to the gravitating vortex equations on the sphere implies the polystability of the effective divisor defined by the zeroes of the Higgs field. This approach also enables us to establish the uniqueness of gravitating vortices in any admissible Kähler class, in the absence of automorphisms. Lastly, we also prove the existence of solutions for the gravitating vortex equations for genus $g\geq 1$ for certain ranges of the coupling constant $α$ and the volume.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 48 pages, no figures, comments are welcome

MSC Class: Primary 53C07; Secondary 53D20; 53C25

arXiv:2406.02430 [pdf, other]

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu, et al. (21 additional authors not shown)

Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and subjective evaluations. With fine-tuning, we achieve even higher subjective scores across these metrics. Seed-TTS offers superior controllability over various speech attributes such as emotion and is capable of generating highly expressive and diverse speech for speakers in the wild. Furthermore, we propose a self-distillation method for speech factorization, as well as a reinforcement learning approach to enhance model robustness, speaker similarity, and controllability. We additionally present a non-autoregressive (NAR) variant of the Seed-TTS model, named $\text{Seed-TTS}_\text{DiT}$, which utilizes a fully diffusion-based architecture. Unlike previous NAR-based TTS systems, $\text{Seed-TTS}_\text{DiT}$ does not depend on pre-estimated phoneme durations and performs speech generation through end-to-end processing. We demonstrate that this variant achieves comparable performance to the language model-based variant and showcase its effectiveness in speech editing. We encourage readers to listen to demos at https://bytedancespeech.github.io/seedtts_tech_report.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.00094 [pdf, other]

The flavor invariants of the $ν$SM

Authors: Christophe Grojean, Jonathan Kley, Damien Leflot, Chang-Yuan Yao

Abstract: Sixty years after the experimental discovery of CP violation in the quark sector, the existence of a similar CP violation in the lepton sector is still to be established. Actually, the structure of such a violation depends crucially on the origin of the neutrino masses. In an attempt at categorizing the leptonic sources of CP violation, we studied the $ν$SM, the Standard Model extended with three generations of sterile neutrinos, that can interpolate continuously between the Dirac and Majorana scenarios of neutrino masses. In particular, we perform a classification of the Jarlskog-like flavor invariants entering CP-violating observables and we study their suppression with the heavy Majorana mass in the seesaw limit of the model. To simplify the construction of the invariants, we introduce a graph-based method. With the guidance of the Hilbert series and plethystic logarithm of the theory, we construct the generating and primary sets of invariants for the $ν$SM for the first time. Unlike in the Standard Model and some other theories, we find that the numbers of generating invariants and the syzygies among them cannot immediately be read off from the plethystic logarithm, but require a more careful examination. Our analysis reveals that the generating set contains 459 invariants, out of which 208 are CP-even and 251 are CP-odd.
In the seesaw limit of the $ν$SM, we show that all parameters of the UV theory can be captured in the effective theory with a certain suppression with the heavy Majorana mass, while these parameters can only appear in a flavor-invariant way with a higher mass suppression. Furthermore, we discuss how the necessary and sufficient conditions for CP violation can be captured by utilizing these invariants. Along the way, we present useful algorithms to enumerate and build the flavor invariants.

Submitted 31 May, 2024; originally announced June 2024.

Comments: 27 pages + appendices, 3 figures

Report number: CERN-TH-2024-076, DESY-24-021, HU-EP-24/14

arXiv:2405.18458 [pdf]

Asymmetrical estimator for training grey-box deep photonic neural networks

Authors: Yizhi Wang, Minjia Chen, Chunhui Yao, Jie Ma, Ting Yan, Richard Penty, Qixiang Cheng

Abstract: Physical neural networks (PNNs) are emerging paradigms for neural network acceleration due to their high-bandwidth, in-propagation analogue processing. Despite the advantages of PNN for inference, training remains a challenge. The imperfect information of the physical transformation means the failure of conventional gradient-based updates from backpropagation (BP). Here, we present the asymmetrical training (AT) method, which treats the PNN structure as a grey box. AT performs training while only knowing the last layer output and neuron topological connectivity of a deep neural network structure, not requiring information about the physical control-transformation mapping. We experimentally demonstrated the AT method on deep grey-box PNNs implemented by uncalibrated photonic integrated circuits (PICs), improving the classification accuracy of Iris flower and modified MNIST hand-written digits from random guessing to near theoretical maximum. We also showcased the consistently enhanced performance of AT over BP for different datasets, including MNIST, fashion-MNIST, and Kuzushiji-MNIST. The AT method demonstrated successful training with minimal hardware overhead and reduced computational overhead, serving as a robust light-weight training alternative to fully explore the advantages of physical computation.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 17 pages, 5 figures

MSC Class: 78-05

arXiv:2405.14336 [pdf, other]

I$^2$VC: A Unified Framework for Intra- & Inter-frame Video Compression

Authors: Meiqin Liu, Chenming Xu, Yukai Gu, Chao Yao, Yao Zhao

Abstract: Video compression aims to reconstruct seamless frames by encoding the motion and residual information from existing frames. Previous neural video compression methods necessitate distinct codecs for three types of frames (I-frame, P-frame and B-frame), which hinders a unified approach and generalization across different video contexts. Intra-codec techniques lack the advanced Motion Estimation and Motion Compensation (MEMC) found in inter-codec, leading to fragmented frameworks lacking uniformity. Our proposed Intra- & Inter-frame Video Compression (I$^2$VC) framework employs a single spatio-temporal codec that guides feature compression rates according to content importance. This unified codec transforms the dependence across frames into a conditional coding scheme, thus integrating intra- and inter-frame compression into one cohesive strategy. Given the absence of explicit motion data, achieving competent inter-frame compression with only a conditional codec poses a challenge. To resolve this, our approach includes an implicit inter-frame alignment mechanism. With the pre-trained diffusion denoising process, the utilization of a diffusion-inverted reference feature rather than random noise supports the initial compression state. This process allows for selective denoising of motion-rich regions based on decoded features, facilitating accurate alignment without the need for MEMC.
Our experimental findings, across various compression configurations (AI, LD and RA) and frame types, prove that I$^2$VC outperforms the state-of-the-art perceptual learned codecs. Impressively, it exhibits a 58.4% enhancement in perceptual reconstruction performance when benchmarked against the H.266/VVC standard (VTM). Official implementation can be found at https://github.com/GYukai/I2VC.

Submitted 1 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: 19 pages, 10 figures

arXiv:2405.02313 [pdf, ps, other]

Physics-informed Data-driven Cavitation Model for a Specific MG EOS

Authors: Minsheng Huang, Chengbao Yao, Pan Wang, Lidong Cheng, Wenjun Ying

Abstract: We present a novel one-fluid cavitation model of a specific Mie-Grüneisen equation of state (EOS), named polynomial EOS, based on an artificial neural network. Not only the physics-informed equation but also the experimental data are embedded into the proposed model by an optimization problem. The physics-informed data-driven model provides the concerned pressure within the cavitation region, where the density tends to zero when the pressure falls below the saturated pressure. The present model is then applied to computing the challenging compressible multi-phase flow simulation, such as nuclear and underwater explosions. Numerical simulations show that our model in application agrees well with the corresponding experimental data, ranging from one dimension to three dimensions with the $h$-adaptive mesh refinement algorithm and load balance techniques in the structured and unstructured grid.

Submitted 5 April, 2024; originally announced May 2024.

Comments: 29 pages, 18 figures

arXiv:2405.00277 [pdf, other]

The strong-coupling quantum thermodynamics of quantum Brownian motion based on the exact solution of its reduced density matrix

Authors: Chuan-Zhe Yao, Wei-Min Zhang

Abstract: We derive the quantum thermodynamics of quantum Brownian motion from the exact solution of its reduced density matrix. We start from the total equilibrium thermal state between the Brownian particle and its reservoir, and solve analytically and exactly the reduced density matrix of the system by taking the partial trace over all the reservoir states. We find that the reduced Hamiltonian and the reduced partition function of the Brownian motion must be renormalized significantly, as shown in the general nonperturbative renormalization theory of quantum thermodynamics for open quantum systems we developed recently [Phys. Rev. Res. 4, 023141 (2022)]. The reduced Hamiltonian contains not only a frequency shift but also a squeezing pairing interaction, where a momentum-dependent potential is generated naturally from the strong coupling between the Brownian particle and the reservoir, after tracing over all the reservoir states. The resulting exact reduced density matrix of the Brownian motion is given by a squeezing thermal state. Moreover, beyond the weak coupling limit, in order to obtain correctly the reduced partition function of the Brownian motion, one must take into account the non-negligible changes of the reservoir state induced by the system-reservoir coupling.
Using the exact solutions of the reduced density matrix, the reduced Hamiltonian as well as the reduced partition function of the Brownian motion, we show that the controversial results obtained from the different definitions of internal energy and the issue of the negative heat capacity in the previous studies of strong-coupling quantum thermodynamics are resolved.

Submitted 5 July, 2024; v1 submitted 30 April, 2024; originally announced May 2024.

Comments: 21 pages, 5 figures

arXiv:2404.15016 [pdf, ps, other]

Convergence of the hypersymplectic flow on $T^4$ with $T^3$-symmetry

Authors: Joel Fine, Weiyong He, Chengjian Yao

Abstract: A hypersymplectic structure on a 4-manifold is a triple $ω_1, ω_2, ω_3$ of 2-forms for which every non-trivial linear combination $a^1ω_1 + a^2 ω_2 + a^3 ω_3$ is a symplectic form. Donaldson has conjectured that when the underlying manifold is compact, any such structure is isotopic in its cohomology class to a hyperkähler triple. We prove this conjecture for a hypersymplectic structure on $T^4$ which is invariant under the standard $T^3$ action. The proof uses the hypersymplectic flow, a geometric flow which attempts to deform a given hypersymplectic structure to a hyperkähler triple. We prove that on $T^4$, when starting from a $T^3$-invariant hypersymplectic structure, the flow exists for all time and converges modulo diffeomorphisms to the unique cohomologous hyperkähler structure.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 25 pages

MSC Class: 58J35; 53C26; 53D05

arXiv:2404.13600 [pdf, other]

Are We Ready for Planetary Exploration Robots? The TAIL-Plus Dataset for SLAM in Granular Environments

Authors: Zirui Wang, Chen Yao, Yangtao Ge, Guowei Shi, Ningbo Yang, Zheng Zhu, Kewei Dong, Hexiang Wei, Zhenzhong Jia, Jing Wu

Abstract: So far, planetary surface exploration depends on various mobile robot platforms. The autonomous navigation and decision-making of these mobile robots in complex terrains largely rely on their terrain-aware perception, localization and mapping capabilities. In this paper we release the TAIL-Plus dataset, a new challenging dataset in deformable granular environments for planetary exploration robots, which is an extension to our previous work, the TAIL (Terrain-Aware multI-modaL) dataset. We conducted field experiments on beaches that are considered as planetary surface analog environments for diverse sandy terrains. In the TAIL-Plus dataset, we provide more sequences with multiple loops and expand the scene from day to night. Benefiting from our sensor suite with modular design, we use both wheeled and quadruped robots for data collection. The sensors include a 3D LiDAR, three downward RGB-D cameras, a pair of global-shutter color cameras that can be used as a forward-looking stereo camera, an RTK-GPS device and an extra IMU. Our datasets are intended to help researchers develop multi-sensor simultaneous localization and mapping (SLAM) algorithms for robots in unstructured, deformable granular terrains. Our datasets and supplementary materials will be available at https://tailrobot.github.io/.

Submitted 21 April, 2024; originally announced April 2024.

Comments: Accepted to the IEEE ICRA Workshop on Field Robotics 2024

arXiv:2404.09986 [pdf]

Thermal conversion of ultrathin nickel hydroxide for wide bandgap 2D nickel oxides

Authors: Lu Ping, Nicholas Russo, Zifan Wang, Ching-Hsiang Yao, Kevin E. Smith, Xi Ling

Abstract: Wide bandgap (WBG) semiconductors (Eg > 2.0 eV) are integral to the advancement of next generation electronics, optoelectronics, and power industries, owing to their capability for high temperature operation, high breakdown voltage and efficient light emission. Enhanced power efficiency and functional performance can be attained through miniaturization, specifically via the integration of device fabrication into two-dimensional (2D) structure enabled by WBG 2D semiconductors. However, as an essential subgroup of WBG semiconductors, 2D transition metal oxides (TMOs) remain largely underexplored in terms of physical properties and applications in 2D opto-electronic devices, primarily due to the scarcity of sufficiently large 2D crystals. Thus, our goal is to develop synthesis pathways for 2D TMOs possessing large crystal domain (e.g. >10 nm), expanding the 2D TMOs family and providing insights for future engineering of 2D TMOs. Here, we demonstrate the synthesis of WBG 2D nickel oxide (NiO) (Eg > 2.7 eV) thermally converted from 2D nickel hydroxide (Ni(OH)2) with the lateral domain size larger than 10 μm. Moreover, the conversion process is investigated using various microscopic techniques such as atomic force microscopy (AFM), Raman spectroscopy, transmission electron microscopy (TEM) and X-ray photoelectron spectroscopy (XPS), providing significant insights on the morphology and structure variation under different oxidative conditions.
The electronic structure of the converted NixOy is further investigated using multiple soft X-ray spectroscopies, such as X-ray absorption (XAS) and emission spectroscopies (XES).

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.06853 [pdf, other]

Revealing mechanism of pore defect formation in laser directed energy deposition of aluminum alloy via in-situ synchrotron X-ray imaging

Authors: Wei Liu, Yuxiao Li, Chunxia Yao, Dongsheng Zhang, Darui Sun, Sen Chen, Yu Wu, Jun Wang, Lei Lud, Sheng-Nian Luo, Ye Tao, Bingbing Zhang

Abstract: Laser metal additive manufacturing technology is capable of producing components with complex geometries and compositions that cannot be realized by conventional manufacturing methods. However, a large number of pores generated during the additive manufacturing process greatly affect the mechanical properties of the additively manufactured parts, and the mechanism of such pore generation has not been clearly revealed by direct observation. Here, we report the mechanism of pore generation in the laser direct energy deposition process as revealed by in-situ high-speed high-resolution synchrotron X-ray imaging. We found that dissolution and re-precipitation of external gases and precipitation of metal vapors are the two main mechanisms of pore formation. We further explored the effects of different process parameters on the generation of pores and optimized the process to suppress pore generation. This work provides important insights into the formation of porosity defects during laser metal additive manufacturing, and can provide guidance for related process optimization.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 7 figures

arXiv:2404.05225 [pdf, other]

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

Authors: Chuwei Luo, Yufan Shen, Zhaoqing Zhu, Qi Zheng, Zhi Yu, Cong Yao

Abstract: Recently, leveraging large language models (LLMs) or multimodal large language models (MLLMs) for document understanding has been proven very promising. However, previous works that employ LLMs/MLLMs for document understanding have not fully explored and utilized the document layout information, which is vital for precise document understanding. In this paper, we propose LayoutLLM, an LLM/MLLM based method for document understanding. The core of LayoutLLM is a layout instruction tuning strategy, which is specially designed to enhance the comprehension and utilization of document layouts. The proposed layout instruction tuning strategy consists of two components: Layout-aware Pre-training and Layout-aware Supervised Fine-tuning. To capture the characteristics of document layout in Layout-aware Pre-training, three groups of pre-training tasks, corresponding to document-level, region-level and segment-level information, are introduced. Furthermore, a novel module called layout chain-of-thought (LayoutCoT) is devised to enable LayoutLLM to focus on regions relevant to the question and generate accurate answers. LayoutCoT is effective for boosting the performance of document understanding. Meanwhile, it brings a certain degree of interpretability, which could facilitate manual inspection and correction. Experiments on standard benchmarks show that the proposed LayoutLLM significantly outperforms existing methods that adopt open-source 7B LLMs/MLLMs for document understanding.
The training data of the LayoutLLM is publicly available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding/LayoutLLM

Submitted 8 April, 2024; originally announced April 2024.

Comments: CVPR 2024

arXiv:2403.19128 [pdf, other]

OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

Authors: Jianqiang Wan, Sibo Song, Wenwen Yu, Yuliang Liu, Wenqing Cheng, Fei Huang, Xiang Bai, Cong Yao, Zhibo Yang

Abstract: Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-based questions. Various methods have been proposed to address the challenging problem of VsTP. However, due to the diversified targets and heterogeneous schemas, previous works usually design task-specific architectures and objectives for individual tasks, which inadvertently leads to modal isolation and complex workflow. In this paper, we propose a unified paradigm for parsing visually-situated text across diverse scenarios. Specifically, we devise a universal model, called OmniParser, which can simultaneously handle three typical visually-situated text parsing tasks: text spotting, key information extraction, and table recognition. In OmniParser, all tasks share the unified encoder-decoder architecture, the unified objective: point-conditioned text generation, and the unified input & output representation: prompt & structured sequences. Extensive experiments demonstrate that the proposed OmniParser achieves state-of-the-art (SOTA) or highly competitive performances on 7 datasets for the three visually-situated text parsing tasks, despite its unified, concise design. The code is available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery.

Submitted 27 March, 2024; originally announced March 2024.

Comments: CVPR 2024

arXiv:2403.17842 [pdf, other]

Experimental Realization of Discrete Time Quasi-Crystals

Authors: Guanghui He, Bingtian Ye, Ruotian Gong, Changyu Yao, Zhongyuan Liu, Kater W. Murch, Norman Y. Yao, Chong Zu

Abstract: Floquet (periodically driven) systems can give rise to unique non-equilibrium phases of matter without equilibrium analogs. The most prominent example is the realization of discrete time crystals. An intriguing question emerges: what other novel phases can manifest when the constraint of time periodicity is relaxed? In this study, we explore quantum systems subjected to a quasi-periodic drive. Leveraging a strongly interacting spin ensemble in diamond, we identify the emergence of long-lived discrete time quasi-crystals. Unlike conventional time crystals, time quasi-crystals exhibit robust sub-harmonic responses at multiple incommensurate frequencies. Furthermore, we show that the multi-frequency nature of the quasi-periodic drive allows for the formation of diverse patterns associated with different discrete time quasi-crystalline phases. Our findings demonstrate the existence of non-equilibrium phases in quasi-Floquet settings, significantly broadening the catalog of novel phenomena in driven many-body quantum systems.

Submitted 26 March, 2024; originally announced March 2024.

Comments: 7+5 pages, 4+5 figures

arXiv:2403.16875 [pdf, other]

TAIL: A Terrain-Aware Multi-Modal SLAM Dataset for Robot Locomotion in Deformable Granular Environments

Authors: Chen Yao, Yangtao Ge, Guowei Shi, Zirui Wang, Ningbo Yang, Zheng Zhu, Hexiang Wei, Yuntian Zhao, Jing Wu, Zhenzhong Jia

Abstract: Terrain-aware perception holds the potential to improve the robustness and accuracy of autonomous robot navigation in the wilds, thereby facilitating effective off-road traversals. However, the lack of multi-modal perception across various motion patterns hinders the solutions of Simultaneous Localization And Mapping (SLAM), especially when confronting non-geometric hazards in demanding landscapes. In this paper, we first propose a Terrain-Aware multI-modaL (TAIL) dataset tailored to deformable and sandy terrains. It incorporates various types of robotic proprioception and distinct ground interactions for the unique challenges and benchmark of multi-sensor fusion SLAM. The versatile sensor suite comprises stereo frame cameras, multiple ground-pointing RGB-D cameras, a rotating 3D LiDAR, an IMU, and an RTK device. This ensemble is hardware-synchronized, well-calibrated, and self-contained. Utilizing both wheeled and quadrupedal locomotion, we efficiently collect comprehensive sequences to capture rich unstructured scenarios. It spans the spectrum of scope, terrain interactions, scene changes, ground-level properties, and dynamic robot characteristics. We benchmark several state-of-the-art SLAM methods against ground truth and provide performance validations. Corresponding challenges and limitations are also reported. All associated resources are accessible upon request at https://tailrobot.github.io/.

Submitted 25 March, 2024; originally announced March 2024.

Comments: Submitted to IEEE Robotics and Automation Letters

arXiv:2403.16662 [pdf, other]

RU22Fact: Optimizing Evidence for Multilingual Explainable Fact-Checking on Russia-Ukraine Conflict

Authors: Yirong Zeng, Xiao Ding, Yi Zhao, Xiangyu Li, Jie Zhang, Chao Yao, Ting Liu, Bing Qin

Abstract: Fact-checking is the task of verifying the factuality of a given claim by examining the available evidence. High-quality evidence plays a vital role in enhancing fact-checking systems and facilitating the generation of explanations that are understandable to humans. However, the provision of both sufficient and relevant evidence for explainable fact-checking systems poses a challenge. To tackle this challenge, we propose a method based on a Large Language Model to automatically retrieve and summarize evidence from the Web. Furthermore, we construct RU22Fact, a novel multilingual explainable fact-checking dataset on the Russia-Ukraine conflict in 2022 of 16K samples, each containing real-world claims, optimized evidence, and referenced explanation. To establish a baseline for our dataset, we also develop an end-to-end explainable fact-checking system to verify claims and generate explanations. Experimental results demonstrate the prospect of optimized evidence in increasing fact-checking performance and also indicate the possibility of further progress in the end-to-end claim verification and explanation generation tasks.

Submitted 26 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: 12 pages, 3 figures, accepted by lrec-coling2024

arXiv:2403.14023 [pdf]

A system capable of verifiably and privately screening global DNA synthesis

Authors: Carsten Baum, Jens Berlips, Walther Chen, Hongrui Cui, Ivan Damgard, Jiangbin Dong, Kevin M. Esvelt, Mingyu Gao, Dana Gretton, Leonard Foner, Martin Kysel, Kaiyi Zhang, Juanru Li, Xiang Li, Omer Paneth, Ronald L. Rivest, Francesca Sage-Ling, Adi Shamir, Yue Shen, Meicen Sun, Vinod Vaikuntanathan, Lynn Van Hauwe, Theia Vogel, Benjamin Weinstein-Raun, Yun Wang, et al. (5 additional authors not shown)

Abstract: Printing custom DNA sequences is essential to scientific and biomedical research, but the technology can be used to manufacture plagues as well as cures. Just as ink printers recognize and reject attempts to counterfeit money, DNA synthesizers and assemblers should deny unauthorized requests to make viral DNA that could be used to ignite a pandemic. There are three complications. First, we don't need to quickly update printers to deal with newly discovered currencies, whereas we regularly learn of new viruses and other biological threats. Second, anti-counterfeiting specifications on a local printer can't be extracted and misused by malicious actors, unlike information on biological threats. Finally, any screening must keep the inspected DNA sequences private, as they may constitute valuable trade secrets. Here we describe SecureDNA, a free, privacy-preserving, and fully automated system capable of verifiably screening all DNA synthesis orders of 30+ base pairs against an up-to-date database of hazards, and its operational performance and specificity when applied to 67 million base pairs of DNA synthesized by providers in the United States, Europe, and China.

Submitted 20 March, 2024; originally announced March 2024.

Comments: Main text 10 pages, 4 figures. 5 supplementary figures. Total 21 pages. Direct correspondence to: Ivan B. Damgard (ivan@cs.au.dk), Andrew C. Yao (andrewcyao@mail.tsinghua.edu.cn), Kevin M. Esvelt (esvelt@mit.edu)

arXiv:2403.13761 [pdf, other]

HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition

Authors: Yuyi Zhang, Yuanzhi Zhu, Dezhi Peng, Peirong Zhang, Zhenhua Yang, Zhibo Yang, Cong Yao, Lianwen Jin

Abstract: Text recognition, especially for complex scripts like Chinese, faces unique challenges due to its intricate character structures and vast vocabulary. Traditional one-hot encoding methods struggle with the representation of hierarchical radicals, recognition of Out-Of-Vocabulary (OOV) characters, and on-device deployment due to their computational intensity. To address these challenges, we propose HierCode, a novel and lightweight codebook that exploits the innate hierarchical nature of Chinese characters. HierCode employs a multi-hot encoding strategy, leveraging hierarchical binary tree encoding and prototype learning to create distinctive, informative representations for each character. This approach not only facilitates zero-shot recognition of OOV characters by utilizing shared radicals and structures but also excels in line-level recognition tasks by computing similarity with visual features, a notable advantage over existing methods. Extensive experiments across diverse benchmarks, including handwritten, scene, document, web, and ancient text, have showcased HierCode's superiority for both conventional and zero-shot Chinese character or text recognition, exhibiting state-of-the-art performance with significantly fewer parameters and fast inference speed.

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.13065 [pdf, other]

Aligned Yet Large Dipoles: a SMEFT Study

Authors: Quentin Bonnefoy, Jonathan Kley, Di Liu, Alejo N. Rossia, Chang-Yuan Yao

Abstract: We study a non-universal flavor scenario at the level of the Standard Model Effective Field Theory, according to which the matrix of Wilson coefficients $c_{uW}$ of an up-type electroweak quark dipole operator is aligned with the up-type Yukawa coupling. Such an alignment usually follows from the assumption of Minimal Flavor Violation (MFV), away from which we step by allowing the entries of $c_{uW}$ to be sizable along the first quark generations. A particular example, which we refer to as "inverse hierarchy MFV", features Wilson coefficients inversely proportional to quark masses, and arises from BSM models respecting MFV and containing heavy fields that replicate the mass hierarchy of SM quarks. We then analyze the phenomenology driven by $c_{uW}$ at colliders and at lower-energy flavor experiments. We show that precision measurements of the process $pp\rightarrow W h\rightarrow γγ\ellν$ at FCC-$hh$ could set an upper bound on $|c_{uW}|\lesssim\mathcal{O}(10^{-2})(Λ/{\rm TeV})^{2}$, with $Λ$ the cutoff of the effective field theory. This bound is an order of magnitude stronger than the existing LHC bounds. Moreover, we estimate that $W h\rightarrow b\bar b \ellν$ at HL-LHC could also give competitive bounds. In the low-energy regime, we consider bounds arising from rare kaon decays, which turn out to be loose, $|c_{uW}^{11}|<\mathcal{O}(1)(Λ/{\rm TeV})^{2}$. We finally demonstrate that our flavor and operator assumptions can be derived from a weakly-coupled UV model, which we choose to simultaneously illustrate the UV origin of inverse hierarchy MFV.

Submitted 19 March, 2024; originally announced March 2024.

Comments: 35 pages, 6 figures, 7 tables. Comments are welcomed

Report number: DESY-24-033, HU-EP-24/09, LAPTH-011/24, COMETA-2024-004

arXiv:2403.12008 [pdf, other]

SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion

Authors: Vikram Voleti, Chun-Han Yao, Mark Boss, Adam Letts, David Pankratz, Dmitry Tochilkin, Christian Laforte, Robin Rombach, Varun Jampani

Abstract: We present Stable Video 3D (SV3D), a latent video diffusion model for high-resolution, image-to-multi-view generation of orbital videos around a 3D object. Recent work on 3D generation proposes techniques to adapt 2D generative models for novel view synthesis (NVS) and 3D optimization. However, these methods have several disadvantages due to either limited views or inconsistent NVS, thereby affecting the performance of 3D object generation. In this work, we propose SV3D, which adapts an image-to-video diffusion model for novel multi-view synthesis and 3D generation, thereby leveraging the generalization and multi-view consistency of the video models, while further adding explicit camera control for NVS. We also propose improved 3D optimization techniques to use SV3D and its NVS outputs for image-to-3D generation. Extensive experimental results on multiple datasets with 2D and 3D metrics, as well as a user study, demonstrate SV3D's state-of-the-art performance on NVS as well as 3D reconstruction compared to prior works.

Submitted 18 March, 2024; originally announced March 2024.

Comments: Project page: https://sv3d.github.io/

arXiv:2403.11221 [pdf, other]

Lion: Minimizing Distributed Transactions through Adaptive Replica Provision (Extended Version)

Authors: Qiushi Zheng, Zhanhao Zhao, Wei Lu, Chang Yao, Yuxing Chen, Anqun Pan, Xiaoyong Du

Abstract: Distributed transaction processing often involves multiple rounds of cross-node communications, and therefore tends to be slow. To improve performance, existing approaches convert distributed transactions into single-node transactions by either migrating co-accessed partitions onto the same nodes or establishing a super node housing replicas of the entire database. However, migration-based methods might cause transactions to be blocked due to waiting for data migration, while the super node can become a bottleneck. In this paper, we present Lion, a novel transaction processing protocol that utilizes partition-based replication to reduce the occurrence of distributed transactions. Lion aims to assign a node with one replica from each partition involved in a given transaction's read or write operations. To ensure such a node is available, we propose an adaptive replica provision mechanism, enhanced with an LSTM-based workload prediction algorithm, to determine the appropriate node for locating replicas of co-accessed partitions. The adaptation of replica placement is conducted preemptively and asynchronously, thereby minimizing its impact on performance. By employing this adaptive replica placement strategy, we ensure that the majority of transactions can be efficiently processed on a single node without additional overhead. Only a small fraction of transactions will need to be treated as regular distributed transactions when such a node is unavailable. Consequently, Lion effectively minimizes distributed transactions while avoiding any disruption caused by data migration or the creation of a super node. We conduct extensive experiments to compare Lion against various transaction processing protocols. The results show that Lion achieves up to 2.7x higher throughput and 76.4% better scalability against these state-of-the-art approaches.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.10357 [pdf, other]

ANIM: Accurate Neural Implicit Model for Human Reconstruction from a single RGB-D image

Authors: Marco Pesavento, Yuanlu Xu, Nikolaos Sarafianos, Robert Maier, Ziyan Wang, Chun-Han Yao, Marco Volino, Edmond Boyer, Adrian Hilton, Tony Tung

Abstract: Recent progress in human shape learning shows that neural implicit models are effective in generating 3D human surfaces from a limited number of views, and even from a single RGB image. However, existing monocular approaches still struggle to recover fine geometric details such as face, hands or cloth wrinkles. They are also easily prone to depth ambiguities that result in distorted geometries along the camera optical axis. In this paper, we explore the benefits of incorporating depth observations in the reconstruction process by introducing ANIM, a novel method that reconstructs arbitrary 3D human shapes from single-view RGB-D images with an unprecedented level of accuracy. Our model learns geometric details from both multi-resolution pixel-aligned and voxel-aligned features to leverage depth information and enable spatial relationships, mitigating depth ambiguities. We further enhance the quality of the reconstructed shape by introducing a depth-supervision strategy, which improves the accuracy of the signed distance field estimation of points that lie on the reconstructed surface. Experiments demonstrate that ANIM outperforms state-of-the-art works that use RGB, surface normals, point cloud or RGB-D data as input. In addition, we introduce ANIM-Real, a new multi-modal dataset comprising high-quality scans paired with consumer-grade RGB-D camera, and our protocol to fine-tune ANIM, enabling high-quality reconstruction from real-world human capture.

Submitted 18 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR24; Project page: https://marcopesavento.github.io/ANIM/

arXiv:2403.00496 [pdf]

Benchmarking reconstructive spectrometer with multi-resonant cavities

Authors: Chunhui Yao, Kangning Xu, Tianhua Lin, Jie Ma, Chumeng Yao, Peng Bao, Zhitian Shi, Richard Penty, Qixiang Cheng

Abstract: Recent years have seen the rapid development of miniaturized reconstructive spectrometers (RSs), yet they still confront a range of technical challenges, such as bandwidth/resolution ratio, sensing speed, and/or power efficiency. Reported RS designs often suffer from insufficient decorrelation between sampling channels, which results in limited compressive sampling efficiency, in essence, due to inadequate engineering of sampling responses. This in turn leads to poor spectral-pixel-to-channel ratios (SPCRs), typically restricted to single digits. So far, a general guideline for manipulating RS sampling responses for the effectiveness of spectral information acquisition has been lacking. In this study, we shed light on a fundamental parameter from the compressive sensing theory, the average mutual correlation coefficient v, and provide insight into how it serves as a critical benchmark in RS design with regard to the SPCR and reconstruction accuracy. To this end, we propose a novel RS design with multi-resonant cavities, consisting of a series of partial reflective interfaces. Such multi-cavity configuration offers an expansive parameter space, facilitating the superlative optimization of sampling matrices with minimized v. As a proof-of-concept demonstration, a single-shot, dual-band RS is implemented on a SiN platform, tailored for capturing signature spectral shapes across different wavelength regions, with customized photonic crystal nanobeam mirrors. Experimentally, the device demonstrates an overall operation bandwidth of 270 nm and a <0.5 nm resolution with only 15 sampling channels per band, leading to a record high SPCR of 18.0. Moreover, the proposed multi-cavity design can be readily adapted to various photonic platforms. For instance, we showcase that by employing multi-layer coatings, an ultra-broadband RS can be optimized to exhibit a 700 nm bandwidth with an SPCR of over 100.

Submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.17232 [pdf, other]

Two-scale Neural Networks for Partial Differential Equations with Small Parameters

Authors: Qiao Zhuang, Chris Ziyi Yao, Zhongqiang Zhang, George Em Karniadakis

Abstract: We propose a two-scale neural network method for solving partial differential equations (PDEs) with small parameters using physics-informed neural networks (PINNs). We directly incorporate the small parameters into the architecture of neural networks. The proposed method enables solving PDEs with small parameters in a simple fashion, without adding Fourier features or other computationally taxing searches of truncation parameters. Various numerical examples demonstrate reasonable accuracy in capturing features of large derivatives in the solutions caused by small parameters.

Submitted 27 February, 2024; originally announced February 2024.

MSC Class: 65N35; 35B25 ACM Class: I.2.6

arXiv:2402.09152 [pdf, other]

Improved Regret for Bandit Convex Optimization with Delayed Feedback

Authors: Yuanyu Wan, Chang Yao, Mingli Song, Lijun Zhang

Abstract: We investigate bandit convex optimization (BCO) with delayed feedback, where only the loss value of the action is revealed under an arbitrary delay. Let $n,T,\bar{d}$ denote the dimensionality, time horizon, and average delay, respectively. Previous studies have achieved an $O(\sqrt{n}T^{3/4}+(n\bar{d})^{1/3}T^{2/3})$ regret bound for this problem, whose delay-independent part matches the regret of the classical non-delayed bandit gradient descent algorithm. However, there is a large gap between its delay-dependent part, i.e., $O((n\bar{d})^{1/3}T^{2/3})$, and an existing $Ω(\sqrt{\bar{d}T})$ lower bound. In this paper, we illustrate that this gap can be filled in the worst case, where $\bar{d}$ is very close to the maximum delay $d$. Specifically, we first develop a novel algorithm, and prove that it enjoys a regret bound of $O(\sqrt{n}T^{3/4}+\sqrt{dT})$ in general. Compared with the previous result, our regret bound is better for $d=O((n\bar{d})^{2/3}T^{1/3})$, and the delay-dependent part is tight in the worst case. The primary idea is to decouple the joint effect of the delays and the bandit feedback on the regret by carefully incorporating the delayed bandit feedback with a blocking update mechanism. Furthermore, we show that the proposed algorithm can improve the regret bound to $O((nT)^{2/3}\log^{1/3}T+d\log T)$ for strongly convex functions. Finally, if the action sets are unconstrained, we demonstrate that it can be simply extended to achieve an $O(n\sqrt{T\log T}+d\log T)$ regret bound for strongly convex and smooth functions.

Submitted 23 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.07625 [pdf, other]

Autonomous Data Selection with Language Models for Mathematical Texts

Authors: Yifan Zhang, Yifan Luo, Yang Yuan, Andrew Chi-Chih Yao

Abstract: To improve language models' proficiency in mathematical reasoning via continual pretraining, we introduce a novel strategy that leverages base language models for autonomous data selection. Departing from conventional supervised fine-tuning or trained classifiers with human-annotated data, our approach Autonomous Data Selection (AutoDS) utilizes meta-prompted language models as zero-shot verifiers to evaluate and select high-quality mathematical content autonomously. To demonstrate the efficacy of our method, we continuously pretrained a 7B-parameter language model on our curated dataset, achieving substantial improvements in downstream performance on the MATH, GSM8K, and BIG-Bench Hard (BBH) tasks with a token amount reduced by orders of magnitude compared to previous continual pretraining works. Our method showcases a 2 times increase in pretraining token efficiency compared to state-of-the-art baselines, underscoring the potential of our approach in enhancing models' mathematical reasoning capabilities. The AutoMathText dataset is available at https://huggingface.co/datasets/math-ai/AutoMathText. The code is available at https://github.com/yifanzhang-pro/AutoMathText.

Submitted 2 April, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.06641 [pdf, other]

A Survey of a Random Matrix Model for a Family of Cusp Forms

Authors: Owen Barrett, Zoë X. Batterman, Aditya Jambhale, Steven J. Miller, Akash L. Narayanan, Kishan Sharma, Chris Yao

Abstract: The Katz-Sarnak philosophy states that statistics of zeros of $L$-function families near the central point as the conductors tend to infinity agree with those of eigenvalues of random matrix ensembles as the matrix size tends to infinity. While numerous results support this conjecture, S. J. Miller observed that for finite conductors, very different behavior can occur for zeros near the central point in elliptic curve families. This led to the excised model of Dueñez, Huynh, Keating, Miller, and Snaith, whose predictions for quadratic twists of a given elliptic curve are beautifully fit by the data. The key ingredients are relating the discretization of central values of the $L$-functions to excising matrices based on the value of the characteristic polynomials at 1 and using lower order terms (in statistics such as the one-level density and pair-correlation) to adjust the matrix size. We discuss recent successes by the authors in extending this model to a family of quadratic twists of finite conductor of a given holomorphic cuspidal newform of odd prime level. In particular, we predict very little repulsion for forms with weight greater than 2.

Submitted 17 April, 2024; v1 submitted 28 January, 2024; originally announced February 2024.

Comments: 28 pages, 7 figures

MSC Class: 11M26; 11M50

arXiv:2401.17372 [pdf, other]

Optically-Trapped Nanodiamond-Relaxometry Detection of Nanomolar Paramagnetic Spins in Aqueous Environments

Authors: Shiva Iyer, Changyu Yao, Olivia Lazorik, Pengyun Wang, Gianna Glenn, Michael Mohs, Yinyao Shi, Michael Mansour, Erik Henriksen, Kater Murch, Shankar Mukherji, Chong Zu

Abstract: Probing electrical and magnetic properties in aqueous environments remains a frontier challenge in nanoscale sensing. Our inability to do so with quantitative accuracy imposes severe limitations, for example, on our understanding of the ionic environments in a diverse array of systems, ranging from novel materials to the living cell. The Nitrogen-Vacancy (NV) center in fluorescent nanodiamonds (FNDs) has emerged as a good candidate to sense temperature, pH, and the concentration of paramagnetic species at the nanoscale, but comes with several hurdles such as particle-to-particle variation which render calibrated measurements difficult, and the challenge to tightly confine and precisely position sensors in aqueous environment. To address this, we demonstrate relaxometry with NV centers within optically-trapped FNDs. In a proof of principle experiment, we show that optically-trapped FNDs enable highly reproducible nanomolar sensitivity to the paramagnetic ion, $\mathrm{Gd}^{3+}$. We capture the three distinct phases of our experimental data by devising a model analogous to nanoscale Langmuir adsorption combined with spin coherence dynamics. Our work provides a basis for routes to sense free paramagnetic ions and molecules in biologically relevant conditions.

Submitted 20 February, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: 6+7 pages, 3+8 figures

arXiv:2401.17254 [pdf, other]

Limiting Behavior in Missing Sums of Sumsets

Authors: Aditya Jambhale, Rauan Kaldybayev, Steven J. Miller, Chris Yao

Abstract: We study $|A + A|$ as a random variable, where $A \subseteq \{0, \dots, N\}$ is a random subset such that each $0 \le n \le N$ is included with probability $0 < p < 1$, and where $A + A$ is the set of sums $a + b$ for $a,b$ in $A$. Lazarev, Miller, and O'Bryant studied the distribution of $2N + 1 - |A + A|$, the number of summands not represented in $A + A$ when $p = 1/2$. A recent paper by Chu, King, Luntzlara, Martinez, Miller, Shao, Sun, and Xu generalizes this to all $p\in (0,1)$, calculating the first and second moments of the number of missing summands and establishing exponential upper and lower bounds on the probability of missing exactly $n$ summands, mostly working in the limit of large $N$. We provide exponential bounds on the probability of missing at least $n$ summands, find another expression for the second moment of the number of missing summands, extract its leading-order behavior in the limit of small $p$, and show that the variance grows asymptotically slower than the mean, proving that for small $p$, the number of missing summands is very likely to be near its expected value.

Submitted 1 February, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: 25 pages, 6 figures

MSC Class: 11P99; 11B13

arXiv:2401.09003 [pdf, other]

Augmenting Math Word Problems via Iterative Question Composing

Authors: Haoxiong Liu, Yifan Zhang, Yifan Luo, Andrew Chi-Chih Yao

Abstract: Despite the advancements in large language models (LLMs) for mathematical reasoning, solving competition-level math problems remains a significant challenge, especially for open-source LLMs without external tools. We introduce the MMIQC dataset, comprising a mixture of processed web data and synthetic question-response pairs, aimed at enhancing the mathematical reasoning capabilities of base language models. Models fine-tuned on MMIQC consistently surpass their counterparts in performance on the MATH benchmark across various model sizes. Notably, Qwen-72B-MMIQC achieves a 45.0% accuracy, exceeding the previous open-source state-of-the-art by 8.2% and outperforming the initial version of GPT-4 released in 2023. Extensive evaluation results on Hungarian high school finals suggest that such improvement can generalize to unseen data. Our ablation study on MMIQC reveals that a large part of the improvement can be attributed to our novel augmentation method, Iterative Question Composing (IQC), which involves iteratively composing new questions from seed problems using an LLM and applying rejection sampling through another LLM. The MMIQC dataset is available on the HuggingFace hub at https://huggingface.co/datasets/Vivacem/MMIQC. Our code is available at https://github.com/iiis-ai/IterativeQuestionComposing.

Submitted 10 February, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

arXiv:2401.07030 [pdf, ps, other]

Subsonic Euler flows in a three-dimensional finitely long cylinder with arbitrary cross section

Authors: Shangkun Weng, Changkui Yao

Abstract: This paper concerns the well-posedness of subsonic flows in a three-dimensional finitely long cylinder with arbitrary cross section. We establish the existence and uniqueness of subsonic flows in the Sobolev space by prescribing the normal component of the momentum, the vorticity, the entropy, and Bernoulli's quantity at the entrance, and the normal component of the momentum at the exit. One of the key points in the analysis is to utilize the deformation-curl decomposition for the steady Euler system introduced in \cite{WX19} to deal with the hyperbolic and elliptic modes. Another one is to employ the separation of variables to improve the regularity of solutions to a deformation-curl system near the intersection between the entrance and exit with the cylinder wall.

Submitted 13 January, 2024; originally announced January 2024.

MSC Class: 35M12; 76G25; 76N10

arXiv:2401.05638 [pdf, other]

MatSAM: Efficient Extraction of Microstructures of Materials via Visual Large Model

Authors: Changtai Li, Xu Han, Chao Yao, Xiaojuan Ban

Abstract: Efficient and accurate extraction of microstructures in micrographs of materials is essential in process optimization and the exploration of structure-property relationships. Deep learning-based image segmentation techniques that rely on manual annotation are laborious and time-consuming and hardly meet the demand for model transferability and generalization on various source images. Segment Anything Model (SAM), a large visual model with powerful deep feature representation and zero-shot generalization capabilities, has provided new solutions for image segmentation. In this paper, we propose MatSAM, a general and efficient microstructure extraction solution based on SAM. A simple yet effective point-based prompt generation strategy is designed, grounded on the distribution and shape of microstructures. Specifically, in an unsupervised and training-free way, it adaptively generates prompt points for different microscopy images, fuses the centroid points of the coarsely extracted region of interest (ROI) and native grid points, and integrates corresponding post-processing operations for quantitative characterization of microstructures of materials. For common microstructures including grain boundary and multiple phases, MatSAM achieves superior zero-shot segmentation performance to conventional rule-based methods and is even preferable to supervised learning methods evaluated on 16 microscopy datasets whose micrographs are imaged by the optical microscope (OM) and scanning electron microscope (SEM). In particular, on 4 public datasets, MatSAM shows unexpectedly competitive segmentation performance against their specialist models. We believe that, without the need for human labeling, MatSAM can significantly reduce the cost of quantitative characterization and statistical analysis of extensive microstructures of materials, and thus accelerate the design of new materials.

Submitted 2 March, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

Comments: 18 pages, 8 figures, and 5 tables. Updated with revision and code repository

arXiv:2401.05412 [pdf, other]

Spatial-Related Sensors Matters: 3D Human Motion Reconstruction Assisted with Textual Semantics

Authors: Xueyuan Yang, Chao Yao, Xiaojuan Ban

Abstract: Leveraging wearable devices for motion reconstruction has emerged as an economical and viable technique. Certain methodologies employ sparse Inertial Measurement Units (IMUs) on the human body and harness data-driven strategies to model human poses. However, the reconstruction of motion based solely on sparse IMU data is inherently fraught with ambiguity, a consequence of numerous identical IMU readings corresponding to different poses. In this paper, we explore the spatial importance of multiple sensors, supervised by text that describes specific actions. Specifically, uncertainty is introduced to derive weighted features for each IMU. We also design a Hierarchical Temporal Transformer (HTT) and apply contrastive learning to achieve precise temporal and feature alignment of sensor data with textual semantics. Experimental results demonstrate that our proposed approach achieves significant improvements in multiple metrics compared to existing methods. Notably, with textual supervision, our method not only differentiates between ambiguous actions such as sitting and standing but also produces more precise and natural motion.

Submitted 26 December, 2023; originally announced January 2024.

Comments: Accepted by AAAI 2024

arXiv:2401.01522 [pdf, other]

LORE++: Logical Location Regression Network for Table Structure Recognition with Pre-training

Authors: Rujiao Long, Hangdi Xing, Zhibo Yang, Qi Zheng, Zhi Yu, Cong Yao, Fei Huang

Abstract: Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats. Recent methods solve this problem by predicting the adjacency relations of detected cell boxes or learning to directly generate the corresponding markup sequences from the table images. However, existing approaches either count on additional heuristic rules to recover the table structures, or face challenges in capturing long-range dependencies within tables, resulting in increased complexity. In this paper, we propose an alternative paradigm. We model TSR as a logical location regression problem and propose a new TSR framework called LORE, standing for LOgical location REgression network, which for the first time regresses logical location as well as spatial location of table cells in a unified network. Our proposed LORE is conceptually simpler, easier to train, and more accurate than other paradigms of TSR. Moreover, inspired by the persuasive success of pre-trained models on a number of computer vision and natural language processing tasks, we propose two pre-training tasks to enrich the spatial and logical representations at the feature level of LORE, resulting in an upgraded version called LORE++. The incorporation of pre-training in LORE++ has proven to enjoy significant advantages, leading to a substantial enhancement in terms of accuracy, generalization, and few-shot capability compared to its predecessor. Experiments on standard benchmarks against methods of previous paradigms demonstrate the superiority of LORE++, which highlights the potential and promising prospect of the logical location regression paradigm for TSR.

Submitted 2 January, 2024; originally announced January 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2303.03730

arXiv:2312.12142 [pdf, other]

FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

Authors: Zhenhua Yang, Dezhi Peng, Yuxin Kong, Yuyi Zhang, Cong Yao, Lianwen Jin

Abstract: Automatic font generation is an imitation task, which aims to create a font library that mimics the style of reference images while preserving the content from source images. Although existing font generation methods have achieved satisfactory performance, they still struggle with complex characters and large style variations. To address these issues, we propose FontDiffuser, a diffusion-based image-to-image one-shot font generation method, which innovatively models the font imitation task as a noise-to-denoise paradigm. In our method, we introduce a Multi-scale Content Aggregation (MCA) block, which effectively combines global and local content cues across different scales, leading to enhanced preservation of intricate strokes of complex characters. Moreover, to better manage the large variations in style transfer, we propose a Style Contrastive Refinement (SCR) module, which is a novel structure for style representation learning. It utilizes a style extractor to disentangle styles from images, subsequently supervising the diffusion model via a meticulously designed style contrastive loss. Extensive experiments demonstrate FontDiffuser's state-of-the-art performance in generating diverse characters and styles. It consistently excels on complex characters and large style changes compared to previous methods. The code is available at https://github.com/yeungchenwa/FontDiffuser.

Submitted 19 December, 2023; originally announced December 2023.

Comments: Accepted to AAAI 2024; Github Page: https://github.com/yeungchenwa/FontDiffuser

Journal ref: 38th AAAI Conference on Artificial Intelligence (AAAI2024), Vancouver, BC, Canada, 2024

arXiv:2312.09613 [pdf, other]

Rethinking Causal Relationships Learning in Graph Neural Networks

Authors: Hang Gao, Chengyu Yao, Jiangmeng Li, Lingyu Si, Yifan Jin, Fengge Wu, Changwen Zheng, Huaping Liu

Abstract: Graph Neural Networks (GNNs) demonstrate their significance by effectively modeling complex interrelationships within graph-structured data. To enhance the credibility and robustness of GNNs, it becomes exceptionally crucial to bolster their ability to capture causal relationships. However, despite recent advancements that have indeed strengthened GNNs from a causal learning perspective, conducting an in-depth analysis specifically targeting the causal modeling prowess of GNNs remains an unresolved issue. In order to comprehensively analyze various GNN models from a causal learning perspective, we constructed an artificially synthesized dataset with known and controllable causal relationships between data and labels. The rationality of the generated data is further ensured through theoretical foundations. Drawing insights from analyses conducted using our dataset, we introduce a lightweight and highly adaptable GNN module designed to strengthen GNNs' causal learning capabilities across a diverse range of tasks. Through a series of experiments conducted on both synthetic datasets and other real-world datasets, we empirically validate the effectiveness of the proposed module.

Submitted 15 December, 2023; originally announced December 2023.

arXiv:2312.07823 [pdf, other]

Semantic Lens: Instance-Centric Semantic Alignment for Video Super-Resolution

Authors: Qi Tang, Yao Zhao, Meiqin Liu, Jian Jin, Chao Yao

Abstract: As a critical clue of video super-resolution (VSR), inter-frame alignment significantly impacts overall performance. However, accurate pixel-level alignment is a challenging task due to the intricate motion interweaving in the video. In response to this issue, we introduce a novel paradigm for VSR named Semantic Lens, predicated on semantic priors drawn from degraded videos. Specifically, video is modeled as instances, events, and scenes via a Semantic Extractor. Those semantics assist the Pixel Enhancer in understanding the recovered contents and generating more realistic visual results. The distilled global semantics embody the scene information of each frame, while the instance-specific semantics assemble the spatial-temporal contexts related to each instance. Furthermore, we devise a Semantics-Powered Attention Cross-Embedding (SPACE) block to bridge the pixel-level features with semantic knowledge, composed of a Global Perspective Shifter (GPS) and an Instance-Specific Semantic Embedding Encoder (ISEE). Concretely, the GPS module generates pairs of affine transformation parameters for pixel-level feature modulation conditioned on global semantics. After that, the ISEE module harnesses the attention mechanism to align the adjacent frames in the instance-centric semantic space. In addition, we incorporate a simple yet effective pre-alignment module to alleviate the difficulty of model training. Extensive experiments demonstrate the superiority of our model over existing state-of-the-art VSR methods.

Submitted 19 January, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: Accepted to AAAI 2024

arXiv:2311.17215 [pdf, other]

Applications of Moments of Dirichlet Coefficients in Elliptic Curve Families

Authors: Zoë Batterman, Aditya Jambhale, Steven J. Miller, Akash L. Narayanan, Kishan Sharma, Andrew Yang, Chris Yao

Abstract: The moments of the coefficients of elliptic curve L-functions are related to numerous arithmetic problems. Rosen and Silverman proved a conjecture of Nagao relating the first moment of one-parameter families satisfying Tate's conjecture to the rank of the corresponding elliptic surface over Q(T); one can also construct families of moderate rank by finding families with large first moments. Michel proved that if j(T) is not constant, then the second moment of the family is of size p^2 + O(p^(3/2)); these two moments show that for suitably small support the behavior of zeros near the central point agrees with that of eigenvalues from random matrix ensembles, with the higher moments impacting the rate of convergence. In his thesis, Miller noticed a negative bias in the second moment of every one-parameter family of elliptic curves over the rationals whose second moment had a calculable closed-form expression; specifically, the first lower order term which does not average to zero is on average negative. This Bias Conjecture is confirmed for many families; however, these are highly non-generic families whose resulting Legendre sums can be determined. Inspired by the recent successes by Yang-Hui He, Kyu-Hwan Lee, Thomas Oliver, Alexey Pozdnyakov and others in investigations of murmurations of elliptic curve coefficients with machine learning techniques, we pose a similar problem for trying to understand the Bias Conjecture. As a start to this program, we numerically investigate the Bias Conjecture for a family whose bias is positive for half the primes. Since the numerics do not offer conclusive evidence that negative bias for the other half is enough to overwhelm the positive bias, the Bias Conjecture cannot be verified for the family.

Submitted 17 June, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

MSC Class: 11G05; 11G40

arXiv:2311.14956 [pdf, other]

Anomalous hot electron generation from two-plasmon decay instability driven by broadband laser pulses with intensity modulations

Authors: C. Yao, J. Li, L. Hao, R. Yan, C. Wang, A. Lei, Y-K. Ding, J. Zheng

Abstract: We investigate the hot electrons generated from two-plasmon decay (TPD) instability driven by laser pulses with intensity modulated by a frequency $Δω_m$. Our primary focus lies on scenarios where $Δω_m$ is on the same order as the TPD growth rate $γ_0$ ($Δω_m \sim γ_0$), corresponding to moderate laser frequency bandwidths for TPD mitigation. With $Δω_m$ conveniently modeled by a basic two-color scheme of the laser wave fields in fully-kinetic particle-in-cell simulations, we demonstrate that the energies of TPD modes and hot electrons exhibit intermittent evolution at the frequency $Δω_m$, particularly when $Δω_m \sim γ_0$. With the dynamic TPD behavior, the overall ratio of hot electron energy to the incident laser energy, $f_{hot}$, changes significantly with $Δω_m$. While $f_{hot}$ drops notably with increasing $Δω_m$ in the large-$Δω_m$ limit as expected, it goes anomalously beyond the hot electron energy ratio for a single-frequency incident laser pulse with the same average intensity when $Δω_m$ falls below a specific threshold frequency $Δω_c$. We find this threshold frequency primarily depends on $γ_0$ and the collisional damping rate of plasma waves, with relatively lower sensitivity to the density scale length. We develop a scaling model characterizing the relation of $Δω_c$ and laser plasma conditions, enabling the potential extension of our findings to more complex and realistic scenarios.

Submitted 25 November, 2023; originally announced November 2023.

arXiv:2311.11482 [pdf, other]

Meta Prompting for AI Systems

Authors: Yifan Zhang, Yang Yuan, Andrew Chi-Chih Yao

Abstract: In this work, we present a comprehensive study of Meta Prompting (MP), an innovative technique reshaping the utilization of language models (LMs) and AI systems in problem-solving and data interaction. Grounded in type theory and category theory, Meta Prompting emphasizes the structure and syntax of information over traditional content-centric methods. The paper explores the formal definitions of Meta Prompting, sets it apart from few-shot prompting, and underlines its effectiveness in various AI applications. A key focus is applying Meta Prompting for complex reasoning tasks, showing how it effectively deconstructs intricate problems into simpler sub-problems, enhancing token efficiency, and enabling more equitable problem-solving comparisons, especially against few-shot prompting methods. Additionally, the paper introduces Meta Prompting for prompting tasks, allowing LLMs to self-generate new prompts in a recursive, metaprogramming-like manner. Empirical experiments, including using a Qwen-72B base language model equipped with meta prompt without instruction-tuning to solve MATH problems with accuracy at 46.3%, which surpasses the supervised fine-tuned counterpart trained with extensive mathematical QA instruction pairs and even the initial version of GPT-4, solving GSM8K problems with 83.5% accuracy with zero-shot meta-prompted Qwen-72B base language model, and solving the Game of 24 tasks with a 100% success rate using GPT-4, demonstrate Meta Prompting's efficacy in achieving high accuracy and efficiency, showcasing its transformative impact on AI problem-solving. The code is available at https://github.com/meta-prompting/meta-prompting.

Submitted 15 June, 2024; v1 submitted 19 November, 2023; originally announced November 2023.

arXiv:2310.18090 [pdf, ps, other]

Probabilistic Constellation Shaping for OFDM-Based ISAC Signaling

Authors: Zhen Du, Fan Liu, Yifeng Xiong, Tony Xiao Han, Weijie Yuan, Yuanhao Cui, Changhua Yao, Yonina C. Eldar

Abstract: Integrated Sensing and Communications (ISAC) has garnered significant attention as a promising technology for the upcoming sixth-generation wireless communication systems (6G). In pursuit of this goal, a common strategy is that a unified waveform, such as Orthogonal Frequency Division Multiplexing (OFDM), should serve dual-functional roles by enabling simultaneous sensing and communications (S&C) operations. However, the sensing performance of an OFDM communication signal is substantially affected by the randomness of the data symbols mapped from bit streams. Therefore, achieving a balance between preserving communication capability (i.e., the randomness) while improving sensing performance remains a challenging task. To cope with this issue, in this paper we analyze the ambiguity function of the OFDM communication signal modulated by random data. Subsequently, a probabilistic constellation shaping (PCS) method is proposed to devise the probability distributions of constellation points, which is able to strike a scalable S&C tradeoff of the random transmitted signal. Finally, the superiority of the proposed PCS method over conventional uniformly distributed constellations is validated through numerical simulations.

Submitted 27 October, 2023; originally announced October 2023.

arXiv:2310.16070 [pdf, other]

Spatial-Temporal Hypergraph Neural Network for Traffic Forecasting

Authors: Chengzhi Yao, Zhi Li, Junbo Wang

Abstract: Traffic forecasting, which benefits from mobile Internet development and position technologies, plays a critical role in Intelligent Transportation Systems. It helps to implement rich and varied transportation applications and bring convenient transportation services to people based on collected traffic data. Most existing methods usually leverage graph-based deep learning networks to model the complex road network for traffic forecasting shallowly. Despite their effectiveness, these methods are generally limited in fully capturing high-order spatial dependencies caused by road network topology and high-order temporal dependencies caused by traffic dynamics. To tackle the above issues, we focus on the essence of traffic system and propose STHODE: Spatio-Temporal Hypergraph Neural Ordinary Differential Equation Network, which combines road network topology and traffic dynamics to capture high-order spatio-temporal dependencies in traffic data. Technically, STHODE consists of a spatial module and a temporal module. On the one hand, we construct a spatial hypergraph and leverage an adaptive MixHop hypergraph ODE network to capture high-order spatial dependencies. On the other hand, we utilize a temporal hypergraph and employ a hyperedge evolving ODE network to capture high-order temporal dependencies. Finally, we aggregate the outputs of stacked STHODE layers to mutually enhance the prediction performance. Extensive experiments conducted on four real-world traffic datasets demonstrate the superior performance of our proposed model compared to various baselines.

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.12430 [pdf, other]

DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond

Authors: Cong Yao

Abstract: In this report, we introduce DocXChain, a powerful open-source toolchain for document parsing, which is designed and developed to automatically convert the rich information embodied in unstructured documents, such as text, tables and charts, into structured representations that are readable and manipulable by machines. Specifically, basic capabilities, including text detection, text recognition, table structure recognition and layout analysis, are provided. Upon these basic capabilities, we also build a set of fully functional pipelines for document parsing, i.e., general text reading, table parsing, and document structurization, to drive various applications related to documents in real-world scenarios. Moreover, DocXChain is concise, modularized and flexible, such that it can be readily integrated with existing tools, libraries or models (such as LangChain and ChatGPT), to construct more powerful systems that can accomplish more complicated and challenging tasks. The code of DocXChain is publicly available at: https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/Applications/DocXChain

Submitted 18 October, 2023; originally announced October 2023.

Comments: 4 pages, 4 figures, 2 tables

arXiv:2310.10362 [pdf, other]

Self-Pro: A Self-Prompt and Tuning Framework for Graph Neural Networks

Authors: Chenghua Gong, Xiang Li, Jianxiang Yu, Cheng Yao, Jiaqi Tan, Chengcheng Yu

Abstract: Graphs have become an important modeling tool for web applications, and Graph Neural Networks (GNNs) have achieved great success in graph representation learning. However, the performance of traditional GNNs heavily relies on a large amount of supervision. Recently, "pre-train, fine-tune" has become the paradigm to address the issues of label dependency and poor generalization. However, the pre-training strategies vary for graphs with homophily and heterophily, and the objectives for various downstream tasks also differ. This leads to a gap between pretexts and downstream tasks, resulting in "negative transfer" and poor performance. Inspired by prompt learning in Natural Language Processing (NLP), many studies turn to bridge the gap and fully leverage the pre-trained model. However, existing methods for graph prompting are tailored to homophily, neglecting inherent heterophily on graphs. Meanwhile, most of them rely on randomly initialized prompts, which negatively impact stability. Therefore, we propose Self-Prompt, a prompting framework for graphs based on the model and data itself. We first introduce asymmetric graph contrastive learning for pretext to address heterophily and align the objectives of pretext and downstream tasks. Then we reuse the component from the pre-training phase as the self adapter and introduce self-prompts based on the graph itself for task adaptation. Finally, we conduct extensive experiments on 11 benchmark datasets to demonstrate its superiority. We provide our code at https://github.com/gongchenghua/Self-Pro.

Submitted 4 June, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: Accepted at ECML-PKDD 2024

arXiv:2310.08064 [pdf]

Age Estimation Based on Graph Convolutional Networks and Multi-head Attention Mechanisms

Authors: Miaomiao Yang, Changwei Yao, Shijin Yan

Abstract: Age estimation technology is a part of facial recognition and has been applied to identity authentication. This technology achieves the development and application of a juvenile anti-addiction system by authenticating users in the game. Convolutional Neural Network (CNN) and Transformer algorithms are widely used in this application scenario. However, these two models cannot flexibly extract and model features of faces with irregular shapes, and they are ineffective in capturing key information. Furthermore, the above methods will contain a lot of background information while extracting features, which will interfere with the model. In consequence, it is easy to extract redundant information from images. In this paper, a new modeling idea is proposed to solve this problem, which can flexibly model irregular objects. The Graph Convolutional Network (GCN) is used to extract features from irregular face images effectively, and multi-head attention mechanisms are added to avoid redundant features and capture key region information in the image. This model can effectively improve the accuracy of age estimation and reduce the MAE error value to about 3.64, which is better than the effect of today's age estimation model, to improve the accuracy of face recognition and identity authentication.

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.04975 [pdf, ps, other]

A Trustworthy and Consistent Blockchain Oracle Scheme for Industrial Internet of Things

Authors: Peng Liu, Youquan Xian, Chuanjian Yao, Peng Wang, Li-e Wang, Xianxian Li

Abstract: Blockchain provides decentralization and trustlessness features for the Industrial Internet of Things (IIoT), which expands the application scenarios of IIoT. To address the problem that the blockchain cannot actively obtain off-chain data, the blockchain oracle is proposed as a bridge between the blockchain and external data. However, the existing oracle schemes are difficult to solve the problem of low quality of service caused by frequent data changes and heterogeneous devices in IIoT, and the current oracle node selection schemes are difficult to balance security and quality of service. To tackle these problems, this paper proposes a secure and reliable oracle scheme that can obtain high-quality off-chain data. Specifically, we first design an oracle node selection algorithm based on Verifiable Random Function (VRF) and reputation mechanism to securely select high-quality nodes. Second, we propose a data filtering algorithm based on a sliding window to further improve the consistency of the collected data. We verify the security of the proposed scheme through security analysis. The experimental results show that the proposed scheme can effectively improve the service quality of the oracle.

Submitted 7 October, 2023; originally announced October 2023.

Comments: Rejected after the third round of review of IEEE Internet of Things Journal

arXiv:2310.00890 [pdf]

doi 10.1002/adma.202313742

Femtosecond electron diffraction reveals local disorder and local anharmonicity in thermoelectric SnSe

Authors: Jingjun Li, Yingpeng Qi, Qing Yang, Luye Yue, Changyuan Yao, Zijing Chen, Sheng Meng, Dao Xiang, Jianming Cao

Abstract: The microscopic arrangement of atoms and molecules is the determining factor in how materials behave and perform. Beyond the long-range periodicity, the local disorder with local structures deviating from the average lattice structure plays a vital role in determining the physical properties of the phonon, electron and spin subsystems in crystalline functional materials. Experimentally characterizing the 3D atomic configuration of such local disorder and correlating it with the advanced functions remain a big challenge. Time-domain evolution of the local disorder, either static or dynamical, is lost due to the characterization at equilibrium state with conventional probing techniques. With the combination of femtosecond electron diffraction, structure factor calculation and TDDFT-MD simulation, we exclusively identify the static local disorder and the local anharmonicity of it in thermoelectric SnSe. The ultrafast structural dynamics in time domain reveal a dominant static off-symmetry displacement of Sn (~0.4 angstrom) and the anharmonicity of this local disorder induces an ultrafast atomic displacement within 100 fs after photoexcitation. The microscopic picture of the local anharmonicity indicates a direct and first signature of the THz Einstein oscillators in real space. Therefore, a glass-like thermal transport channel with the local disorder, the Einstein oscillators and the local anharmonicity, updates the fundamental insight into the long-debated ultralow thermal conductivity in SnSe.
The local disorder over one to a few unit cells is pervasive and indispensable in thermoelectric materials, multiferroic materials and correlated electronic materials. Our method of revealing the 3D local disorder and the local correlated interactions by ultrafast structural dynamics will inspire broad interest in construction of the structure-property relationship in material science.

Submitted 2 October, 2023; originally announced October 2023.

Report number: 2313742

Journal ref: Adv. Mater. 2313742 (2024)

Showing 1–50 of 320 results for author: Yao, C