-
Assessing the Utility of Large Language Models for Phenotype-Driven Gene Prioritization in Rare Genetic Disorder Diagnosis
Authors:
Junyoung Kim,
Jingye Yang,
Kai Wang,
Chunhua Weng,
Cong Liu
Abstract:
Phenotype-driven gene prioritization is a critical process in the diagnosis of rare genetic disorders for identifying and ranking potential disease-causing genes based on observed physical traits or phenotypes. While traditional approaches rely on curated knowledge graphs with phenotype-gene relations, recent advancements in large language models (LLMs) have opened doors to the potential of AI predictions through extensive training on diverse corpora and complex models. This study conducted a comprehensive evaluation of five large language models, two from the Generative Pre-trained Transformer (GPT) series and three from the Llama2 series, assessing their performance across three key metrics: task completeness, gene prediction accuracy, and adherence to required output structures. Various experiments explored combinations of models, prompts, input types, and task difficulty levels. Our findings reveal that even the best-performing LLM, GPT-4, achieved an accuracy of only 16.0%, which still lags behind traditional bioinformatics tools. Prediction accuracy increased with the parameter/model size. A similar increasing trend was observed for the task completion rate, with complicated prompts more likely to increase task completeness in models smaller than GPT-4. However, complicated prompts were more likely to decrease the structure compliance rate, although no prompt effects were observed for GPT-4. With free-text input, the LLMs also achieved better-than-random prediction accuracy, though slightly lower than with HPO term-based input. Bias analysis showed that certain genes, such as MECP2, CDKL5, and SCN1A, are more likely to be top-ranked, potentially explaining the variances observed across different datasets. This study provides valuable insights into the integration of LLMs within genomic analysis, contributing to the ongoing discussion on the utilization of advanced LLMs in clinical workflows.
Submitted 2 April, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
Deep Generative Domain Adaptation with Temporal Relation Knowledge for Cross-User Activity Recognition
Authors:
Xiaozhou Ye,
Kevin I-Kai Wang
Abstract:
In human activity recognition (HAR), the assumption that training and testing data are independent and identically distributed (i.i.d.) often fails, particularly in cross-user scenarios where data distributions vary significantly. This discrepancy highlights the limitations of conventional domain adaptation methods in HAR, which typically overlook the inherent temporal relations in time-series data. To bridge this gap, our study introduces a Conditional Variational Autoencoder with Universal Sequence Mapping (CVAE-USM) approach, which addresses the unique challenges of time-series domain adaptation in HAR by relaxing the i.i.d. assumption and leveraging temporal relations to align data distributions effectively across different users. This method combines the strengths of Variational Autoencoder (VAE) and Universal Sequence Mapping (USM) to capture and utilize common temporal patterns between users for improved activity recognition. Our results, evaluated on two public HAR datasets (OPPT and PAMAP2), demonstrate that CVAE-USM outperforms existing state-of-the-art methods, offering a more accurate and generalizable solution for cross-user activity recognition.
Submitted 12 March, 2024;
originally announced March 2024.
-
Structure-preserving, weighted implicit-explicit schemes for multi-phase incompressible Navier-Stokes/Darcy coupled nonlocal Allen-Cahn model
Authors:
Meng Li,
Ke Wang,
Nan Wang
Abstract:
A multitude of substances exist as mixtures comprising multiple chemical components in the natural world. These substances undergo morphological changes under external influences. In the phase field model coupled with fluid flow, the dynamic movement and evolution of the phase interface interact intricately with the fluid motion. This article focuses on the N-component models that couple the conservative Allen-Cahn equation with two types of incompressible fluid flow systems: the Navier-Stokes equation and the Darcy equation. By utilizing the scalar auxiliary variable method and the projection method, we construct two types of structure-preserving weighted implicit-explicit schemes for the coupled models, resulting in fully decoupled linear systems and second-order accuracy in time. The schemes are proved to be mass-conservative. In addition, by applying the $G$-norm inspired by the idea of $G$-stability, we rigorously establish their unconditional energy stability. Finally, the performance of the proposed schemes is verified by numerical simulations.
Submitted 20 March, 2024;
originally announced March 2024.
-
Spatio-Temporal Fluid Dynamics Modeling via Physical-Awareness and Parameter Diffusion Guidance
Authors:
Hao Wu,
Fan Xu,
Yifan Duan,
Ziwei Niu,
Weiyan Wang,
Gaofeng Lu,
Kun Wang,
Yuxuan Liang,
Yang Wang
Abstract:
This paper proposes a two-stage framework named ST-PAD for spatio-temporal fluid dynamics modeling in the field of earth sciences, aiming to achieve high-precision simulation and prediction of fluid dynamics through spatio-temporal physics awareness and parameter diffusion guidance. In the upstream stage, we design a vector quantization reconstruction module with temporal evolution characteristics, ensuring balanced and resilient parameter distribution by introducing general physical constraints. In the downstream stage, a diffusion probability network involving parameters is utilized to generate high-quality future states of fluids, while enhancing the model's generalization ability by perceiving parameters in various physical setups. Extensive experiments on multiple benchmark datasets have verified the effectiveness and robustness of the ST-PAD framework, showing that ST-PAD outperforms current mainstream models in fluid dynamics modeling and prediction, especially in effectively capturing local representations and maintaining significant advantages in OOD generations.
Submitted 18 March, 2024;
originally announced March 2024.
-
Search for $ΔS=2$ nonleptonic hyperon decays $Ω^-\toΣ^{0}π^{-}$ and $Ω^-\to nK^{-}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using $(27.12 \pm 0.14) \times 10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the center-of-mass energy of $\sqrt{s} = 3.686$ GeV, we search for the first time for two nonleptonic hyperon decays that change strangeness by two units, $Ω^-\toΣ^{0}π^-$ and $Ω^-\to nK^{-}$. No significant signal is observed. The upper limits on their decay branching fractions are determined to be $\mathcal{B}(Ω^-\toΣ^{0}π^-) < 5.4\times 10^{-4}$ and $\mathcal{B}(Ω^-\to nK^{-}) < 2.4\times 10^{-4}$ at the $90\%$ confidence level.
Submitted 14 April, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
Self-Supervised Class-Agnostic Motion Prediction with Spatial and Temporal Consistency Regularizations
Authors:
Kewei Wang,
Yizheng Wu,
Jun Cen,
Zhiyu Pan,
Xingyi Li,
Zhe Wang,
Zhiguo Cao,
Guosheng Lin
Abstract:
The perception of motion behavior in a dynamic environment holds significant importance for autonomous driving systems, wherein class-agnostic motion prediction methods directly predict the motion of the entire point cloud. While most existing methods rely on fully-supervised learning, the manual labeling of point cloud data is laborious and time-consuming. Therefore, several annotation-efficient methods have been proposed to address this challenge. Although effective, these methods rely on weak annotations or additional multi-modal data like images, and the potential benefits inherent in the point cloud sequence are still underexplored. To this end, we explore the feasibility of self-supervised motion prediction with only unlabeled LiDAR point clouds. Initially, we employ an optimal transport solver to establish coarse correspondences between current and future point clouds as the coarse pseudo motion labels. Training models directly using such coarse labels leads to noticeable spatial and temporal prediction inconsistencies. To mitigate these issues, we introduce three simple spatial and temporal regularization losses, which facilitate the self-supervised training process effectively. Experimental results demonstrate the significant superiority of our approach over the state-of-the-art self-supervised methods.
Submitted 21 March, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
Authors:
Alexander Khazatsky,
Karl Pertsch,
Suraj Nair,
Ashwin Balakrishna,
Sudeep Dasari,
Siddharth Karamcheti,
Soroush Nasiriany,
Mohan Kumar Srirama,
Lawrence Yunliang Chen,
Kirsty Ellis,
Peter David Fagan,
Joey Hejna,
Masha Itkina,
Marion Lepert,
Yecheng Jason Ma,
Patrick Tree Miller,
Jimmy Wu,
Suneel Belkhale,
Shivin Dass,
Huy Ha,
Arhan Jain,
Abraham Lee,
Youngwoon Lee,
Marius Memmel,
Sungjae Park
, et al. (74 additional authors not shown)
Abstract:
The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories or 350 hours of interaction data, collected across 564 scenes and 84 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability. We open source the full dataset, policy learning code, and a detailed guide for reproducing our robot hardware setup.
Submitted 19 March, 2024;
originally announced March 2024.
-
Knowledge and Data Dual-Driven Channel Estimation and Feedback for Ultra-Massive MIMO Systems under Hybrid Field Beam Squint Effect
Authors:
Kuiyu Wang,
Zhen Gao,
Sheng Chen,
Boyu Ning,
Gaojie Chen,
Yu Su,
Zhaocheng Wang,
H. Vincent Poor
Abstract:
Acquiring accurate channel state information (CSI) at an access point (AP) is challenging for wideband millimeter wave (mmWave) ultra-massive multiple-input and multiple-output (UMMIMO) systems, due to the high-dimensional channel matrices, hybrid near- and far-field channel features, beam squint effects, and imperfect hardware constraints, such as low-resolution analog-to-digital converters and in-phase and quadrature imbalance. To overcome these challenges, this paper proposes an efficient downlink channel estimation (CE) and CSI feedback approach based on knowledge and data dual-driven deep learning (DL) networks. Specifically, we first propose a data-driven residual neural network de-quantizer (ResNet-DQ) to pre-process the received pilot signals at the user equipments (UEs), where the noise and distortion brought by imperfect hardware can be mitigated. A knowledge-driven generalized multiple measurement vector learned approximate message passing (GMMV-LAMP) network is then developed to jointly estimate the channels by exploiting the approximately same physical angle shared by different subcarriers. In particular, two wideband redundant dictionaries (WRDs) are proposed such that the measurement matrices of the GMMV-LAMP network can accommodate the far-field and near-field beam squint effect, respectively. Finally, we propose an encoder at the UEs and a decoder at the AP by a data-driven CSI residual network (CSI-ResNet) to compress the CSI matrix into a low-dimensional quantized bit vector for feedback, thereby reducing the feedback overhead substantially. Simulation results show that the proposed knowledge and data dual-driven approach outperforms conventional downlink CE and CSI feedback methods, especially in the case of low signal-to-noise ratios.
Submitted 19 March, 2024;
originally announced March 2024.
-
DDSB: An Unsupervised and Training-free Method for Phase Detection in Echocardiography
Authors:
Zhenyu Bu,
Yang Liu,
Jiayu Huo,
Jingjing Peng,
Kaini Wang,
Guangquan Zhou,
Rachel Sparks,
Prokar Dasgupta,
Alejandro Granados,
Sebastien Ourselin
Abstract:
Accurate identification of End-Diastolic (ED) and End-Systolic (ES) frames is key for cardiac function assessment through echocardiography. However, traditional methods face several limitations: they require extensive amounts of data, extensive annotations by medical experts, significant training resources, and often lack robustness. Addressing these challenges, we propose an unsupervised and training-free method. Our novel approach leverages unsupervised segmentation to enhance fault tolerance against segmentation inaccuracies. By identifying anchor points and analyzing directional deformation, we effectively reduce dependence on the accuracy of initial segmentation images and enhance fault tolerance, all while improving robustness. Tested on Echo-dynamic and CAMUS datasets, our method achieves comparable accuracy to learning-based models without their associated drawbacks. The code is available at https://github.com/MRUIL/DDSB
Submitted 19 March, 2024;
originally announced March 2024.
-
Offline Imitation of Badminton Player Behavior via Experiential Contexts and Brownian Motion
Authors:
Kuang-Da Wang,
Wei-Yao Wang,
Ping-Chun Hsieh,
Wen-Chih Peng
Abstract:
In the dynamic and rapid tactic involvements of turn-based sports, badminton stands out as an intrinsic paradigm that requires alter-dependent decision-making of players. While the advancement of learning from offline expert data in sequential decision-making has been witnessed in various domains, how to imitate the behaviors of human players rally-wise from offline badminton matches has remained underexplored. Replicating opponents' behavior benefits players by allowing them to undergo strategic development with direction before matches. However, directly applying existing methods suffers from the inherent hierarchy of the match and the compounding effect due to the turn-based nature of players alternately taking actions. In this paper, we propose RallyNet, a novel hierarchical offline imitation learning model for badminton player behaviors: (i) RallyNet captures players' decision dependencies by modeling decision-making processes as a contextual Markov decision process. (ii) RallyNet leverages the experience to generate context as the agent's intent in the rally. (iii) To generate more realistic behavior, RallyNet leverages Geometric Brownian Motion (GBM) to model the interactions between players by introducing a valuable inductive bias for learning player behaviors. In this manner, RallyNet links player intents with interaction models with GBM, providing an understanding of interactions for sports analytics. We extensively validate RallyNet with the largest available real-world badminton dataset consisting of men's and women's singles, demonstrating its ability to imitate player behaviors. Results reveal RallyNet's superiority over offline imitation learning methods and state-of-the-art turn-based approaches, outperforming them by at least 16% in mean rule-based agent normalization score. Furthermore, we discuss various practical use cases to highlight RallyNet's applicability.
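For readers unfamiliar with Geometric Brownian Motion, the sketch below simulates one generic GBM path, dS = mu*S dt + sigma*S dW, using its exact log-normal update. It is only a textbook illustration of the stochastic process named in the abstract; the abstract does not describe how RallyNet parameterizes or uses GBM, and the function name `simulate_gbm` and all parameter values are hypothetical.

```python
import math
import random


def simulate_gbm(s0, mu, sigma, dt, n_steps, seed=None):
    """Simulate one path of generic GBM: dS = mu*S dt + sigma*S dW.

    Illustrative only; not RallyNet's model. All names are hypothetical.
    """
    rng = random.Random(seed)
    path = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)  # standard normal increment
        # Exact log-normal update over one step of length dt.
        growth = math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        path.append(path[-1] * growth)
    return path


if __name__ == "__main__":
    path = simulate_gbm(s0=1.0, mu=0.1, sigma=0.2, dt=0.01, n_steps=100, seed=0)
    print(len(path), path[-1])
```

Because each step multiplies the previous value by a positive factor, a path started at a positive value stays positive, which is one reason GBM is a common choice for modeling multiplicative dynamics.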
Submitted 18 March, 2024;
originally announced March 2024.
-
Fractionalization Signatures in the Dynamics of Quantum Spin Liquids
Authors:
Kang Wang,
Shi Feng,
Penghao Zhu,
Runze Chi,
Hai-Jun Liao,
Nandini Trivedi,
Tao Xiang
Abstract:
We investigate the signatures of fractionalization in quantum spin liquids by studying different phases of the Kitaev honeycomb model in the presence of an out-of-plane magnetic field through which the model becomes non-integrable. Using the infinite Projected Entangled Pair States (iPEPS) ansatz, along with analytical calculations and exact diagonalization, we calculate dynamical signatures of fractionalized particles through spin-spin and dimer-dimer correlations. Our analysis demonstrates the ability of these correlations to discern distinct fractionalized quantum sectors, namely Majorana fermions and the emergent $Z_2$ fluxes, in both the chiral spin liquid (CSL) phase under weak field and the emergent intermediate gapless phase (IGP) under moderate field. Importantly, our calculation reveals the nature of IGP observed at moderate fields, a region of ongoing debate, indicating that this phase is a Majorana metal induced by strong flux fluctuations.
Submitted 20 March, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image
Authors:
Xiao Fu,
Wei Yin,
Mu Hu,
Kaixuan Wang,
Yuexin Ma,
Ping Tan,
Shaojie Shen,
Dahua Lin,
Xiaoxiao Long
Abstract:
We introduce GeoWizard, a new generative foundation model designed for estimating geometric attributes, e.g., depth and normals, from single images. While significant research has already been conducted in this area, the progress has been substantially limited by the low diversity and poor quality of publicly available datasets. As a result, the prior works either are constrained to limited scenarios or suffer from the inability to capture geometric details. In this paper, we demonstrate that generative models, as opposed to traditional discriminative models (e.g., CNNs and Transformers), can effectively address the inherently ill-posed problem. We further show that leveraging diffusion priors can markedly improve generalization, detail preservation, and efficiency in resource usage. Specifically, we extend the original stable diffusion model to jointly predict depth and normal, allowing mutual information exchange and high consistency between the two representations. More importantly, we propose a simple yet effective strategy to segregate the complex data distribution of various scenes into distinct sub-distributions. This strategy enables our model to recognize different scene layouts, capturing 3D geometry with remarkable fidelity. GeoWizard sets new benchmarks for zero-shot depth and normal prediction, significantly enhancing many downstream applications such as 3D reconstruction, 2D content creation, and novel viewpoint synthesis.
Submitted 18 March, 2024;
originally announced March 2024.
-
Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation
Authors:
Wangbo Zhao,
Jiasheng Tang,
Yizeng Han,
Yibing Song,
Kai Wang,
Gao Huang,
Fan Wang,
Yang You
Abstract:
Existing parameter-efficient fine-tuning (PEFT) methods have achieved significant success in adapting vision transformers (ViTs) by improving parameter efficiency. However, enhancing inference efficiency during adaptation remains underexplored. This limits the broader application of pre-trained ViT models, especially when the model is computationally expensive. In this paper, we propose Dynamic Tuning (DyT), a novel approach to improve both parameter and inference efficiency for ViT adaptation. Specifically, besides using the lightweight adapter modules, we propose a token dispatcher to distinguish informative tokens from less important ones, allowing the latter to dynamically skip the original block, thereby reducing the redundant computation during inference. Additionally, we explore multiple design variants to find the best practice of DyT. Finally, inspired by the mixture-of-experts (MoE) mechanism, we introduce an enhanced adapter to further boost the adaptation performance. We validate DyT across various tasks, including image/video recognition and semantic segmentation. For instance, DyT achieves comparable or even superior performance compared to existing PEFT methods while using only 71%-85% of their FLOPs on the VTAB-1K benchmark.
Submitted 18 March, 2024;
originally announced March 2024.
-
Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework
Authors:
Kaiyan Chang,
Kun Wang,
Nan Yang,
Ying Wang,
Dantong Jin,
Wenlong Zhu,
Zhirong Chen,
Cangyuan Li,
Hao Yan,
Yunhao Zhou,
Zhuoliang Zhao,
Yuan Cheng,
Yudong Pan,
Yiqi Liu,
Mengdi Wang,
Shengwen Liang,
Yinhe Han,
Huawei Li,
Xiaowei Li
Abstract:
Recent advances in large language models have demonstrated their potential for automated generation of hardware description language (HDL) code from high-level prompts. Researchers have utilized fine-tuning to enhance the ability of these large language models (LLMs) in the field of chip design. However, the lack of Verilog data hinders further improvement in the quality of Verilog generation by LLMs. Additionally, the absence of a Verilog and Electronic Design Automation (EDA) script data augmentation framework significantly increases the time required to prepare the training dataset for LLM trainers. This paper proposes an automated design-data augmentation framework, which generates high-volume and high-quality natural language aligned with Verilog and EDA scripts. For Verilog generation, it translates Verilog files to an abstract syntax tree and then maps nodes to natural language with a predefined template. For Verilog repair, it uses predefined rules to generate incorrect Verilog files and then pairs EDA tool feedback with the correct and incorrect Verilog files. For EDA script generation, it uses an existing LLM (GPT-3.5) to obtain a description of the script. To evaluate the effectiveness of our data augmentation method, we finetune Llama2-13B and Llama2-7B models using the dataset generated by our augmentation framework. The results demonstrate a significant improvement in the Verilog generation tasks with LLMs. Moreover, the accuracy of Verilog generation surpasses that of the current state-of-the-art open-source Verilog generation model, increasing from 58.8% to 70.6% on the same benchmark. Our 13B model (ChipGPT-FT) improves the pass rate over GPT-3.5 in Verilog generation and outperforms it in EDA script (i.e., SiliconCompiler) generation with only 200 EDA script data samples.
Submitted 10 July, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
Test of lepton universality and measurement of the form factors of $D^0\to K^{*}(892)^-μ^+ν_μ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (637 additional authors not shown)
Abstract:
We report a first study of the semileptonic decay $D^0\rightarrow K^-π^0μ^{+}ν_μ$ by analyzing an $e^+e^-$ annihilation data sample of $7.9~\mathrm{fb}^{-1}$ collected at the center-of-mass energy of 3.773 GeV with the BESIII detector. The absolute branching fraction of $D^0\to K^-π^0μ^{+}ν_μ$ is measured for the first time to be $(0.729 \pm 0.014_{\rm stat} \pm 0.011_{\rm syst})\%$. Based on an amplitude analysis, the $S\text{-}{\rm wave}$ contribution is determined to be $(5.76 \pm 0.35_{\rm stat} \pm 0.29_{\rm syst})\%$ of the total decay rate in addition to the dominant $K^{*}(892)^-$ component. The branching fraction of $D^0\to K^{*}(892)^-μ^+ν_μ$ is given to be $(2.062 \pm 0.039_{\rm stat} \pm 0.032_{\rm syst})\%$, which improves the precision of the world average by a factor of 5. Combining with the world average of ${\mathcal B}(D^0\to K^{*}(892)^-e^+ν_e)$, the ratio of the branching fractions obtained is $\frac{{\mathcal B}(D^0\to K^{*}(892)^-μ^+ν_μ)}{{\mathcal B}(D^0\to K^{*}(892)^-e^+ν_e)} = 0.96\pm0.08$, in agreement with lepton flavor universality. Furthermore, assuming single-pole dominance parameterization, the most precise hadronic form factor ratios for $D^0\to K^{*}(892)^{-} μ^+ν_μ$ are extracted to be $r_{V}=V(0)/A_1(0)=1.37 \pm 0.09_{\rm stat} \pm 0.03_{\rm syst}$ and $r_{2}=A_2(0)/A_1(0)=0.76 \pm 0.06_{\rm stat} \pm 0.02_{\rm syst}$.
Submitted 16 March, 2024;
originally announced March 2024.
-
Real-World Computational Aberration Correction via Quantized Domain-Mixing Representation
Authors:
Qi Jiang,
Zhonghua Yi,
Shaohua Gao,
Yao Gao,
Xiaolong Qian,
Hao Shi,
Lei Sun,
Zhijie Xu,
Kailun Yang,
Kaiwei Wang
Abstract:
Relying on paired synthetic data, existing learning-based Computational Aberration Correction (CAC) methods are confronted with the intricate and multifaceted synthetic-to-real domain gap, which leads to suboptimal performance in real-world applications. In this paper, in contrast to improving the simulation pipeline, we deliver a novel insight into real-world CAC from the perspective of Unsupervised Domain Adaptation (UDA). By incorporating readily accessible unpaired real-world data into training, we formalize the Domain Adaptive CAC (DACAC) task, and then introduce a comprehensive Real-world aberrated images (Realab) dataset to benchmark it. The setup task presents a formidable challenge due to the intricacy of understanding the target aberration domain. To this end, we propose a novel Quantized Domain-Mixing Representation (QDMR) framework as a potent solution to the issue. QDMR adapts the CAC model to the target domain from three key aspects: (1) reconstructing aberrated images of both domains by a VQGAN to learn a Domain-Mixing Codebook (DMC) which characterizes the degradation-aware priors; (2) modulating the deep features in the CAC model with DMC to transfer the target domain knowledge; and (3) leveraging the trained VQGAN to generate pseudo target aberrated images from the source ones for convincing target domain supervision. Extensive experiments on both synthetic and real-world benchmarks reveal that the models with QDMR consistently surpass the competitive methods in mitigating the synthetic-to-real gap, producing visually pleasant real-world CAC results with fewer artifacts. Codes and datasets will be made publicly available.
Submitted 15 March, 2024;
originally announced March 2024.
-
Measurements of All-Particle Energy Spectrum and Mean Logarithmic Mass of Cosmic Rays from 0.3 to 30 PeV with LHAASO-KM2A
Authors:
The LHAASO Collaboration,
Zhen Cao,
F. Aharonian,
Q. An,
A. Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen
, et al. (256 additional authors not shown)
Abstract:
We present measurements of the all-particle energy spectrum and mean logarithmic mass of cosmic rays in the energy range of 0.3-30 PeV using data collected by LHAASO-KM2A between September 2021 and December 2022. The analysis is based on a nearly composition-independent energy reconstruction method and achieves unprecedented accuracy. Our analysis reveals the position of the knee at $3.67 \pm 0.05 \pm 0.15$ PeV. Below the knee, the spectral index is found to be -$2.7413 \pm 0.0004 \pm 0.0050$, while above the knee, it is -$3.128 \pm 0.005 \pm 0.027$, with the sharpness of the transition measured with a statistical error of 2%. The mean logarithmic mass of cosmic rays is heavier than that of helium over almost the whole measured energy range. It decreases from 1.7 at 0.3 PeV to 1.3 at 3 PeV, representing a 24% decline following a power law with an index of -$0.1200 \pm 0.0003 \pm 0.0341$. This is equivalent to an increase in the abundance of light components. Above the knee, the mean logarithmic mass exhibits a power-law trend towards heavier components, which is the reverse of the behavior observed in the all-particle energy spectrum. Additionally, the knee position and the change in power-law index are approximately the same. These findings suggest that the knee observed in the all-particle spectrum corresponds to the knee of the light component, rather than the medium-heavy components.
Submitted 26 March, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
High-energy Neutrinos from Outflows Powered by Kicked Remnants of Binary Black Hole Mergers in AGN Accretion Disks
Authors:
Zhi-Peng Ma,
Kai Wang
Abstract:
Mergers of stellar-mass binary black holes (BBH) could take place within the accretion disk of active galactic nuclei (AGN). The resulting BH remnant is likely to accrete the disk gas at a super-Eddington rate, launching a fast, quasi-spherical outflow (wind). Particles will be accelerated by shocks driven by the wind, subsequently interacting with the shocked disk gas or radiation field through hadronic processes and resulting in the production of high-energy neutrinos and potential electromagnetic (EM) emissions. This study examines the evolution of the shock driven by the remnant BH wind within AGN disks. We then calculate the production of neutrinos and the expected detection numbers for a single event, along with their contribution to the overall diffuse neutrino background. Our analysis, considering various scenarios, reveals considerable neutrino production and possible detection by IceCube for nearby events. The contribution of the remnant BH winds to the diffuse neutrino background is minor due to the low event rate density, but it can be increased to some extent for some optimistic parameters. We also propose that there could be two neutrino/EM bursts, one originating from the premerger BBH wind and the other from the remnant BH wind, with the latter typically lagging the GW event by around tens of days. When combined with the anticipated gravitational waves (GW) emitted during the BBH merger, such a system emerges as a promising candidate for joint observations involving neutrinos, GWs, and EM signals.
Submitted 14 March, 2024;
originally announced March 2024.
-
High precision proton beam monitor system concept design on CSNS based on SiC
Authors:
Ye He,
Xingchen Li,
Zijun Xu,
Ming Qi,
Congcong Wang,
Chenwei Wang,
Hai Lu,
Xiaojun Nie,
Ruirui Fan,
Hantao Jing,
Weiming Song,
Keqi Wang,
Kai Liu,
Peilian Liu,
Hui Li,
Zaiyi Li,
Chenxi Fu,
Xiyuan Zhang,
Xiaoshen Kang,
Zhan Li,
Weiguo Lu,
Suyu Xiao,
Xin Shi
Abstract:
A high-precision beam monitor system based on silicon carbide PIN sensors is designed for the China Spallation Neutron Source 1.6 GeV proton beam to monitor the proton beam fluence. The concept design of the beam monitor system is finished, together with the front-end electronics with silicon carbide PIN sensors, the readout system and the mechanical system. Several tests are performed to study the performance of each component of the system. The charge collection of the SiC PIN sensors after proton irradiation is studied with an 80 MeV proton beam for continuous running. Research on the performance of the front-end electronics and readout system is finished for better data acquisition. The uncertainty of the proton beam fluence is below 1% in the beam monitor system.
Submitted 14 March, 2024;
originally announced March 2024.
-
Analysis of singular subspaces under random perturbations
Authors:
Ke Wang
Abstract:
We present a comprehensive analysis of singular vector and singular subspace perturbations in the context of the signal plus random Gaussian noise matrix model. Assuming a low-rank signal matrix, we extend the Davis-Kahan-Wedin theorem in a fully generalized manner, applicable to any unitarily invariant matrix norm, extending previous results of O'Rourke, Vu and the author. We also obtain the fine-grained results, which encompass the $\ell_\infty$ analysis of singular vectors, the $\ell_{2, \infty}$ analysis of singular subspaces, as well as the exploration of linear and bilinear functions related to the singular vectors. Moreover, we explore the practical implications of these findings, in the context of the Gaussian mixture model and the submatrix localization problem.
Submitted 19 March, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
Offboard Occupancy Refinement with Hybrid Propagation for Autonomous Driving
Authors:
Hao Shi,
Song Wang,
Jiaming Zhang,
Xiaoting Yin,
Zhongdao Wang,
Guangming Wang,
Jianke Zhu,
Kailun Yang,
Kaiwei Wang
Abstract:
Vision-based occupancy prediction, also known as 3D Semantic Scene Completion (SSC), presents a significant challenge in computer vision. Previous methods, confined to onboard processing, struggle with simultaneous geometric and semantic estimation, continuity across varying viewpoints, and single-view occlusion. Our paper introduces OccFiner, a novel offboard framework designed to enhance the accuracy of vision-based occupancy predictions. OccFiner operates in two hybrid phases: 1) a multi-to-multi local propagation network that implicitly aligns and processes multiple local frames to correct onboard model errors and consistently enhance occupancy accuracy across all distances; and 2) a region-centric global propagation that refines labels using explicit multi-view geometry and integrates sensor bias, especially to increase the accuracy of distant occupied voxels. Extensive experiments demonstrate that OccFiner improves both geometric and semantic accuracy across various types of coarse occupancy, setting a new state of the art on the SemanticKITTI dataset. Notably, OccFiner elevates vision-based SSC models to a level even surpassing that of LiDAR-based onboard SSC models. Furthermore, OccFiner is the first to achieve automatic annotation of SSC in a purely vision-based approach. Quantitative experiments show that OccFiner successfully facilitates occupancy data loop-closure in autonomous driving. Additionally, we quantitatively and qualitatively validate the superiority of the offboard approach on city-level SSC static maps. The source code will be made publicly available at https://github.com/MasterHow/OccFiner.
Submitted 7 July, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production
Authors:
Jiuniu Wang,
Zehua Du,
Yuyuan Zhao,
Bo Yuan,
Kexiang Wang,
Jian Liang,
Yaxi Zhao,
Yihen Lu,
Gengliang Li,
Junlong Gao,
Xin Tu,
Zhenyu Guo
Abstract:
Agent and AIGC (Artificial Intelligence Generated Content) technologies have recently made significant progress. We propose AesopAgent, an Agent-driven Evolutionary System on Story-to-Video Production. AesopAgent is a practical application of agent technology for multimodal content generation. The system integrates multiple generative capabilities within a unified framework, so that individual users can leverage these modules easily. The system converts user story proposals into scripts, images, and audio, and then integrates these multimodal contents into videos. Additionally, animating units (e.g., Gen-2 and Sora) could make the videos more engaging. The AesopAgent system orchestrates the task workflow for video generation, ensuring that the generated video is both rich in content and coherent. The system mainly contains two layers, i.e., the Horizontal Layer and the Utility Layer. In the Horizontal Layer, we introduce a novel RAG-based evolutionary system that optimizes the whole video generation workflow and the steps within the workflow. It continuously evolves and iteratively optimizes the workflow by accumulating expert experience and professional knowledge, including optimizing the LLM prompts and utility usage. The Utility Layer provides multiple utilities, leading to consistent image generation that is visually coherent in terms of composition, characters, and style. Meanwhile, it provides audio and special effects, integrating them into expressive and logically arranged videos. Overall, our AesopAgent achieves state-of-the-art performance compared with many previous works in visual storytelling. AesopAgent is designed as a convenient service for individual users and is available at the following page: https://aesopai.github.io/.
Submitted 11 March, 2024;
originally announced March 2024.
-
Unraveling the nature of quasi van der Waals Epitaxy of magnetic topological insulators Cr: (BixSb1-x)2Te3 on a GaAs (111) substrate through coherently strained interface
Authors:
Yuxing Ren,
Lixuan Tai,
Kaicheng Pan,
Yueyun Chen,
Benjamin Z. Gregory,
Jin Ho Kang,
Malcolm Jackson,
Michael Liao,
Yifei Sun,
Noah Bodzin,
Kin Wong,
Suchismita Sarker,
B. C. Regan,
Chee-Wei Wong,
Mark Goorsky,
Andrej Singer,
Kang L. Wang
Abstract:
Quasi van der Waals Epitaxy (qvdWE) has been realized for decades at the interfaces between 3D and 2D materials or van der Waals materials. The growth of magnetic topological insulators (MTI) Cr: (BixSb1-x)2Te3 (CBST) on GaAs (111) substrates for the Quantum Anomalous Hall Effect (QAH) is in fact an example of qvdWE, a point that has received little attention even though its advantages have been exploited in the growth of various MTI materials. This distinguishes it from the growth of MTIs on other substrates. Although the qvdWE mode has been used in the growth of many 2D materials on III-V substrates, its specific features and mechanisms have not yet been well demonstrated or summarized. In this work, we show for the first time both the features of coherent interfaces and the existence of strain originating from qvdWE at the same time.
Submitted 12 March, 2024;
originally announced March 2024.
-
Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving
Authors:
JunDa Cheng,
Wei Yin,
Kaixuan Wang,
Xiaozhi Chen,
Shijie Wang,
Xin Yang
Abstract:
Multi-view depth estimation has achieved impressive performance over various benchmarks. However, almost all current multi-view systems rely on given ideal camera poses, which are unavailable in many real-world scenarios, such as autonomous driving. In this work, we propose a new robustness benchmark to evaluate the depth estimation system under various noisy pose settings. Surprisingly, we find that current multi-view depth estimation methods and single-view and multi-view fusion methods fail when given noisy pose settings. To address this challenge, we propose a single-view and multi-view fused depth estimation system, which adaptively integrates high-confidence multi-view and single-view results for both robust and accurate depth estimation. The adaptive fusion module performs fusion by dynamically selecting high-confidence regions between two branches based on a wrapping confidence map. Thus, the system tends to choose the more reliable branch when facing textureless scenes, inaccurate calibration, dynamic objects, and other degraded or challenging conditions. Our method outperforms state-of-the-art multi-view and fusion methods under robustness testing. Furthermore, we achieve state-of-the-art performance on challenging benchmarks (KITTI and DDAD) when given accurate pose estimations. Project website: https://github.com/Junda24/AFNet/.
Submitted 12 March, 2024;
originally announced March 2024.
-
SGE: Structured Light System Based on Gray Code with an Event Camera
Authors:
Xingyu Lu,
Lei Sun,
Diyang Gu,
Zhijie Xu,
Kaiwei Wang
Abstract:
Fast and accurate depth sensing has long been a significant research challenge. The event camera, a device that quickly responds to intensity changes, provides a new solution for structured light (SL) systems. In this paper, we introduce Gray code into event-based SL systems for the first time. Our setup includes an event camera and a Digital Light Processing (DLP) projector, enabling depth estimation through high-speed projection and decoding of Gray code patterns. By employing spatio-temporal encoding for point matching, our method is immune to timestamp noise, realizing high-speed depth estimation without loss of accuracy. The binary nature of events and Gray code minimizes data redundancy, enabling us to utilize 100% of the sensor bandwidth. Experimental results show that our approach achieves accuracy comparable to state-of-the-art scanning methods while surpassing them in data acquisition speed (up to 41 times improvement). Our proposed approach offers a highly promising solution for ultra-fast, real-time, and high-precision dense depth estimation. Code and dataset will be publicly available.
Submitted 12 March, 2024;
originally announced March 2024.
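The Gray code named in the SGE abstract above can be illustrated with a minimal Python sketch. It shows the standard binary-reflected Gray code, its inverse, and how projector columns could be turned into one binary pattern per bit. This is not the authors' pipeline: the `bit_plane` helper, the 8-column width and the bit ordering are assumptions made purely for illustration.

```python
def to_gray(n: int) -> int:
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Invert to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bit_plane(width: int, k: int) -> list[int]:
    """Hypothetical helper: the k-th binary pattern, one 0/1 value per projector column."""
    return [(to_gray(x) >> k) & 1 for x in range(width)]


if __name__ == "__main__":
    width = 8  # assumed toy projector width (3 bits)
    for k in reversed(range(3)):
        print(k, bit_plane(width, k))
```

Adjacent columns differ in exactly one bit of their code, so a decoding error at a pattern boundary shifts the recovered column index by at most one, which is the usual motivation for Gray code in structured light.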
-
Determination of the number of $ψ(3686)$ events taken at BESIII
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
The number of $ψ(3686)$ events collected by the BESIII detector during the 2021 run period is determined to be $(2259.3\pm 11.1)\times 10^6$ by counting inclusive $ψ(3686)$ hadronic events. The uncertainty is systematic and the statistical uncertainty is negligible. Meanwhile, the numbers of $ψ(3686)$ events collected during the 2009 and 2012 run periods are updated to be $(107.7\pm0.6)\times 10^6$ and $(345.4\pm 2.6)\times 10^6$, respectively. Both numbers are consistent with the previous measurements within one standard deviation. The total number of $ψ(3686)$ events in the three data samples is $(2712.4\pm14.3)\times10^6$.
Submitted 28 May, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
Neutrinoless double-$β$ decay and double Gamow-Teller transitions
Authors:
Y. K. Wang,
P. W. Zhao,
J. Meng
Abstract:
The neutrinoless double-$β$ ($0νββ$) decay and the double Gamow-Teller (DGT) transition are investigated with the state-of-the-art Relativistic Configuration-interaction Density functional theory. A strong linear correlation between the nuclear matrix elements (NMEs) of the $0νββ$ decay and the DGT transition is demonstrated. This linear correlation is found to originate from the similarity between the leading-order term of the $0νββ$-decay operator and the DGT-transition operator, as revealed by expanding the $0νββ$-decay operator in terms of spherical harmonics. The present results provide strong support for constraining the $0νββ$-decay NMEs through double charge-exchange reactions.
Submitted 11 March, 2024;
originally announced March 2024.
-
Temporal-Mapping Photography for Event Cameras
Authors:
Yuhan Bao,
Lei Sun,
Yuqin Ma,
Kaiwei Wang
Abstract:
Event cameras, or Dynamic Vision Sensors (DVS), are novel neuromorphic sensors that capture brightness changes as a continuous stream of "events" rather than traditional intensity frames. Faithfully converting sparse events to dense intensity frames has long been an ill-posed problem. Previous methods have primarily focused on converting events to video in dynamic scenes or with a moving camera. In this paper, for the first time, we realize the conversion of events to a dense intensity image using a stationary event camera in static scenes. Different from traditional methods that mainly rely on event integration, the proposed Event-Based Temporal Mapping Photography (EvTemMap) measures the time at which each pixel emits an event. Then, the resulting Temporal Matrix is converted to an intensity frame with a temporal mapping neural network. At the hardware level, the proposed EvTemMap is implemented by combining a transmittance adjustment device with a DVS, named the Adjustable Transmittance Dynamic Vision Sensor. Additionally, we collected the TemMat dataset under various conditions, including low-light and high dynamic range scenes. The experimental results showcase the high dynamic range, fine-grained details, and high grayscale resolution of the proposed EvTemMap, as well as the enhanced performance on downstream computer vision tasks compared to other methods. The code and TemMat dataset will be made publicly available.
Submitted 11 March, 2024;
originally announced March 2024.
-
Risk-Sensitive RL with Optimized Certainty Equivalents via Reduction to Standard RL
Authors:
Kaiwen Wang,
Dawen Liang,
Nathan Kallus,
Wen Sun
Abstract:
We study Risk-Sensitive Reinforcement Learning (RSRL) with the Optimized Certainty Equivalent (OCE) risk, which generalizes Conditional Value-at-risk (CVaR), entropic risk and Markowitz's mean-variance. Using an augmented Markov Decision Process (MDP), we propose two general meta-algorithms via reductions to standard RL: one based on optimistic algorithms and another based on policy optimization. Our optimistic meta-algorithm generalizes almost all prior RSRL theory with entropic risk or CVaR. Under discrete rewards, our optimistic theory also certifies the first RSRL regret bounds for MDPs with bounded coverability, e.g., exogenous block MDPs. Under discrete rewards, our policy optimization meta-algorithm enjoys both global convergence and local improvement guarantees in a novel metric that lower bounds the true OCE risk. Finally, we instantiate our framework with PPO, construct an MDP, and show that it learns the optimal risk-sensitive policy while prior algorithms provably fail.
Submitted 10 March, 2024;
originally announced March 2024.
-
S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes
Authors:
Xingyi Li,
Zhiguo Cao,
Yizheng Wu,
Kewei Wang,
Ke Xian,
Zhe Wang,
Guosheng Lin
Abstract:
Current 3D stylization methods often assume static scenes, which violates the dynamic nature of our real world. To address this limitation, we present S-DyRF, a reference-based spatio-temporal stylization method for dynamic neural radiance fields. However, stylizing dynamic 3D scenes is inherently challenging due to the limited availability of stylized reference images along the temporal axis. Our key insight lies in introducing additional temporal cues besides the provided reference. To this end, we generate temporal pseudo-references from the given stylized reference. These pseudo-references facilitate the propagation of style information from the reference to the entire dynamic 3D scene. For coarse style transfer, we enforce novel views and times to mimic the style details present in pseudo-references at the feature level. To preserve high-frequency details, we create a collection of stylized temporal pseudo-rays from temporal pseudo-references. These pseudo-rays serve as detailed and explicit stylization guidance for achieving fine style transfer. Experiments on both synthetic and real-world datasets demonstrate that our method yields plausible stylized results of space-time view synthesis on dynamic 3D scenes.
Submitted 22 March, 2024; v1 submitted 10 March, 2024;
originally announced March 2024.
-
Large Language Models on Fine-grained Emotion Detection Dataset with Data Augmentation and Transfer Learning
Authors:
Kaipeng Wang,
Zhi Jing,
Yongye Su,
Yikun Han
Abstract:
This paper delves into enhancing the classification performance on the GoEmotions dataset, a large, manually annotated dataset for emotion detection in text. The primary goal of this paper is to address the challenges of detecting subtle emotions in text, a complex issue in Natural Language Processing (NLP) with significant practical applications. The findings offer valuable insights into addressing the challenges of emotion detection in text and suggest directions for future research, including the potential for a survey paper that synthesizes methods and performances across various datasets in this domain.
Submitted 9 April, 2024; v1 submitted 10 March, 2024;
originally announced March 2024.
-
Large Generative Model Assisted 3D Semantic Communication
Authors:
Feibo Jiang,
Yubo Peng,
Li Dong,
Kezhi Wang,
Kun Yang,
Cunhua Pan,
Xiaohu You
Abstract:
Semantic Communication (SC) is a novel paradigm for data transmission in 6G. However, there are several challenges posed when performing SC in 3D scenarios: 1) 3D semantic extraction; 2) Latent semantic redundancy; and 3) Uncertain channel estimation. To address these issues, we propose a Generative AI Model assisted 3D SC (GAM-3DSC) system. Firstly, we introduce a 3D Semantic Extractor (3DSE), which employs generative AI models, including Segment Anything Model (SAM) and Neural Radiance Field (NeRF), to extract key semantics from a 3D scenario based on user requirements. The extracted 3D semantics are represented as multi-perspective images of the goal-oriented 3D object. Then, we present an Adaptive Semantic Compression Model (ASCM) for encoding these multi-perspective images, in which we use a semantic encoder with two output heads to perform semantic encoding and mask redundant semantics in the latent semantic space, respectively. Next, we design a conditional Generative adversarial network and Diffusion model aided-Channel Estimation (GDCE) to estimate and refine the Channel State Information (CSI) of physical channels. Finally, simulation results demonstrate the advantages of the proposed GAM-3DSC system in effectively transmitting the goal-oriented 3D scenario.
Submitted 8 March, 2024;
originally announced March 2024.
-
Chlorine and zinc co-doping effects on the electronic structure and optical properties of γ-CuI
Authors:
Chao Li,
Meicong Li,
Zhuli Zhang,
Qiang Zhao,
Naixin Liu,
Kailei Wang,
Fan Zhang,
Xiaoping Ouyang
Abstract:
The effects of chlorine (Cl) and zinc (Zn) co-doping on the electronic structure and optical properties of the zinc blende (γ) phase of copper iodide (γ-CuI) scintillator material are investigated by using first-principles density functional theory calculations. The band structure, density of states, dielectric function, absorption coefficients, and reflectivity were analyzed before and after doping. Results show co-doping significantly modifies the band structure, reduces the band gap, and generates impurity energy levels. Cl doping enhances absorption in the high energy region while reducing visible light absorption. Zn doping induces a redshift in absorption and n-type conductivity at high concentrations. With suitable co-doping ratios, the absorption coefficient and reflectivity of γ-CuI can be optimized in the visible range to improve scintillation light yield. The calculations provide guidance for co-doping γ-CuI scintillators to achieve superior detection performance. The n-type conductivity also makes doped γ-CuI promising for optoelectronic applications.
Submitted 8 March, 2024;
originally announced March 2024.
-
Switching the Loss Reduces the Cost in Batch Reinforcement Learning
Authors:
Alex Ayoub,
Kaiwen Wang,
Vincent Liu,
Samuel Robertson,
James McInerney,
Dawen Liang,
Nathan Kallus,
Csaba Szepesvári
Abstract:
We propose training fitted Q-iteration with log-loss (FQI-LOG) for batch reinforcement learning (RL). We show that the number of samples needed to learn a near-optimal policy with FQI-LOG scales with the accumulated cost of the optimal policy, which is zero in problems where acting optimally achieves the goal and incurs no cost. In doing so, we provide a general framework for proving $\textit{small-cost}$ bounds, i.e. bounds that scale with the optimal achievable cost, in batch RL. Moreover, we empirically verify that FQI-LOG uses fewer samples than FQI trained with squared loss on problems where the optimal policy reliably achieves the goal.
Submitted 12 March, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
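The contrast between the two regression losses named in this abstract can be illustrated with a small sketch. It assumes costs-to-go normalized to [0, 1] and a binary cross-entropy form of the log-loss; the function names, the clipping constant, and the undiscounted target are illustrative assumptions, not details taken from the paper.

```python
import math


def squared_loss(pred: float, target: float) -> float:
    """Squared-error regression loss, as used in standard FQI."""
    return (pred - target) ** 2


def log_loss(pred: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy between a prediction and a target in [0, 1].

    The clipping constant eps is an assumption added to avoid log(0).
    """
    p = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def bellman_target(cost: float, next_q_values: list[float]) -> float:
    """Cost-minimizing regression target: immediate cost plus the best
    (smallest) next-state value, clipped to [0, 1] (an assumption)."""
    return min(1.0, cost + min(next_q_values))


# Hypothetical transition: zero immediate cost, two next-state action values.
target = bellman_target(0.0, [0.2, 0.1])
sq = squared_loss(0.5, target)
lg = log_loss(0.5, target)
```

In a fitted Q-iteration loop, either loss would be minimized over a batch of such targets; only the choice of loss differs between the two variants compared in the abstract.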
-
Superconductivity of the New Medium-Entropy Alloy V4Ti2W with a Body-Centered Cubic Structure
Authors:
Kuan Li,
Weijie Lin,
Ruixin Guo,
Shu Guo,
Lingyong Zeng,
Longfu Li,
Peifeng Yu,
Kangwang Wang,
Chao Zhang,
Huixia Luo
Abstract:
Medium- and high-entropy alloy (MEA and HEA) superconductors have attracted considerable interest since their discovery. This paper reports the superconducting properties of the ternary tungsten-containing MEA V4Ti2W for the first time. V4Ti2W is a type II superconductor with a body-centered cubic (BCC) structure. Resistivity, magnetization, and heat capacity measurements indicate that the superconducting transition temperature of V4Ti2W is roughly 5.0 K. The upper and lower critical magnetic fields are 9.93(2) T and 40.7(3) mT, respectively. Interestingly, few BCC MEA superconductors with VEC greater than 4.8 have been found. The addition of tungsten leads to a VEC of 4.83 e/a for V4Ti2W, placing it among the rare cases above 4.8. Adding tungsten expands the variety of MEAs and may improve the microstructure, mechanical properties, and even superconducting properties of these materials. This material could potentially offer a new platform for the investigation of innovative MEA and HEA superconductors.
Submitted 8 March, 2024;
originally announced March 2024.
-
MUC: Mixture of Uncalibrated Cameras for Robust 3D Human Body Reconstruction
Authors:
Yitao Zhu,
Sheng Wang,
Mengjie Xu,
Zixu Zhuang,
Zhixin Wang,
Kaidong Wang,
Han Zhang,
Qian Wang
Abstract:
Multiple cameras can provide multi-view video coverage of a person. It is necessary to fuse multi-view data, e.g., for subsequent behavioral analysis, while such fusion often relies on calibration of cameras in traditional solutions. However, it is non-trivial to calibrate multiple cameras. In this work, we propose a method to reconstruct a 3D human body from multiple uncalibrated camera views. First, we adopt a pre-trained human body encoder to process each individual camera view, such that human body models and parameters can be reconstructed for each view. Next, instead of simply averaging models across views, we train a network to determine the weights of individual views for their fusion, based on the parameters estimated for the joints and hands of the human body as well as camera positions. Further, we turn to the mesh surface of the human body for dynamic fusion, such that facial expression can be seamlessly integrated into the human body model. Our method has demonstrated superior performance in reconstructing the human body on two public datasets. More importantly, our method can flexibly support ad-hoc deployment of an arbitrary number of cameras, which has significant potential in related applications. We will release source code upon acceptance of the paper.
Submitted 8 March, 2024;
originally announced March 2024.
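The difference between simple averaging and weighted fusion of per-view estimates, as described in this abstract, can be sketched as follows. In the paper the weights come from a trained network; here the per-view scores are hypothetical placeholders, and linearly blending parameter vectors is a simplification assumed only for illustration.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn arbitrary per-view scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(per_view_params: list[list[float]], scores: list[float]) -> list[float]:
    """Weighted sum of per-view parameter vectors.

    `scores` stands in for the output of a weighting network (hypothetical).
    Equal scores reduce this to a plain average across views.
    """
    weights = softmax(scores)
    dim = len(per_view_params[0])
    return [
        sum(w * params[i] for w, params in zip(weights, per_view_params))
        for i in range(dim)
    ]


# Two hypothetical views, each with a two-dimensional parameter vector.
views = [[0.0, 2.0], [2.0, 4.0]]
averaged = fuse_views(views, [0.0, 0.0])      # equal weights: plain average
view_one_heavy = fuse_views(views, [5.0, 0.0])  # first view dominates
```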
-
Relative alignment between gas structures and magnetic field in Orion A at different scales using different molecular gas tracers
Authors:
Wenyu Jiao,
Ke Wang,
Fengwei Xu,
Chao Wang,
Henrik Beuther
Abstract:
Context: Magnetic fields can play crucial roles in high-mass star formation. Nonetheless, the significance of magnetic fields at various scales and their relationship with gas structures is largely overlooked. Aims: Our goal is to examine the relationship between the magnetic field and molecular gas structures within the Orion A giant molecular cloud at different scales and density regimes. Methods: We assess the gas intensity structures and column densities in Orion A by utilizing $^{12}$CO, $^{13}$CO, and C$^{18}$O from Nobeyama observations. Through comparing Nobeyama observations with {\it{Planck}} polarization observations on large scales ($\sim0.6$ pc) and JCMT polarization observations on small scales ($\sim0.04$ pc), we investigate how the role of magnetic fields changes with scale and density. Results: We find a similar trend from parallel to perpendicular alignment with increasing column densities in Orion A at both large and small spatial scales. In addition, when changing from low-density to high-density tracers, the relative orientation preference changes from random to perpendicular. The self-similar results at different scales indicate that magnetic fields are dynamically important in both cloud formation and filament formation. However, magnetic field properties at small scales are relatively complicated, and the interplay between magnetic fields and star-forming activities needs to be discussed case-by-case.
Submitted 19 June, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Secure MIMO Communication Relying on Movable Antennas
Authors:
Jun Tang,
Cunhua Pan,
Yang Zhang,
Hong Ren,
Kezhi Wang
Abstract:
This paper considers a movable antenna (MA)-aided secure multiple-input multiple-output (MIMO) communication system consisting of a base station (BS), a legitimate information receiver (IR) and an eavesdropper (Eve), where the BS is equipped with MAs to enhance the system's physical layer security (PLS). Specifically, we aim to maximize the secrecy rate (SR) by jointly optimizing the transmit precoding (TPC) matrix, the artificial noise (AN) covariance matrix and the MAs' positions under the constraints of the maximum transmit power and the minimum distance between MAs. To solve this non-convex problem with highly coupled optimization variables, the block coordinate descent (BCD) method is applied to alternately update the variables. Specifically, we first reformulate the SR into a tractable form by utilizing the minimum mean square error (MMSE) method, and derive the optimal TPC matrix and the AN covariance matrix in semi-closed forms, with the MAs' positions fixed, by applying the Lagrangian multiplier method. Then, the majorization-minimization (MM) algorithm is employed to iteratively optimize each MA's position while keeping the others fixed. Finally, simulation results are provided to demonstrate the effectiveness of the proposed algorithms and the significant advantages of the MA-aided system over the conventional fixed-position antenna (FPA)-based system in enhancing the system's security.
Submitted 7 March, 2024;
originally announced March 2024.
-
Observation of the decay $h_{c}\to3(π^{+}π^{-})π^{0}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
Based on $(2712.4\pm14.1)\times10^{6}$ $ψ(3686)$ events collected with the BESIII detector, we study the decays $h_{c}\to3(π^{+}π^{-})π^{0}$, $h_{c}\to2(π^{+}π^{-})ω$, $h_{c}\to2(π^{+}π^{-})π^{0}η$, $h_{c}\to2(π^{+}π^{-})η$, and $h_{c}\to p\bar{p}$ via $ψ(3686)\toπ^{0}h_{c}$. The decay channel $h_{c}\to3(π^{+}π^{-})π^{0}$ is observed for the first time, and its branching fraction is determined to be $\left( {9.28\pm 1.14 \pm 0.77} \right) \times {10^{ - 3}}$, where the first uncertainty is statistical and the second is systematic. In addition, first evidence is found for the modes $h_{c} \to 2(π^{+}π^{-})π^{0}η$ and $h_{c}\to2(π^{+}π^{-})ω$ with significances of 4.8$σ$ and 4.7$σ$, and their branching fractions are determined to be $(7.55\pm1.51\pm0.77)\times10^{-3}$ and $\left( {4.00 \pm 0.86 \pm 0.35}\right) \times {10^{ - 3}}$, respectively. No significant signals of $h_c\to 2(π^+π^-)η$ and $h_{c}\to p\bar{p}$ are observed, and the upper limits of the branching fractions of these decays are determined to be $<6.19\times10^{-4}$ and $<4.40\times10^{-5}$ at the 90% confidence level, respectively.
Submitted 6 March, 2024;
originally announced March 2024.
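The branching fractions in this abstract are quoted with separate statistical and systematic uncertainties. A minimal sketch of one common way to summarize them as a single number follows; it assumes the two uncertainties are independent and can be added in quadrature, which the abstract itself does not state.

```python
import math


def total_uncertainty(stat: float, syst: float) -> float:
    """Combine two uncertainties in quadrature (independence is an assumption)."""
    return math.hypot(stat, syst)


# Values quoted in the abstract for h_c -> 3(pi+ pi-) pi0, in units of 1e-3.
central, stat, syst = 9.28, 1.14, 0.77

total = total_uncertainty(stat, syst)  # roughly 1.38 in the same units
relative = total / central             # roughly 15 percent
```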
-
Dark Dragon Breaks Magnetic Chain: Dynamical Substructures of IRDC G28.34 Form in Supported Environments
Authors:
Junhao Liu,
Qizhou Zhang,
Yuxin Lin,
Keping Qiu,
Patrick M. Koch,
Hauyu Baobab Liu,
Zhi-Yun Li,
Josep Miquel Girart,
Thushara G. S. Pillai,
Shanghuo Li,
Huei-Ru Vivien Chen,
Tao-Chung Ching,
Paul T. P. Ho,
Shih-Ping Lai,
Ramprasad Rao,
Ya-Wen Tang,
Ke Wang
Abstract:
We have comprehensively studied the multi-scale physical properties of the infrared dark cloud (IRDC) G28.34 (the Dragon cloud) with dust polarization and molecular line data from Planck, FCRAO-14m, JCMT, and ALMA. We find that the averaged magnetic fields of clumps tend to be either parallel with or perpendicular to the cloud-scale magnetic fields, while the cores in clump MM4 tend to have magnetic fields aligned with the clump fields. Implementing the relative orientation analysis (for magnetic fields, column density gradients, and local gravity), Velocity Gradient Technique (VGT), and modified Davis-Chandrasekhar-Fermi (DCF) analysis, we find that: G28.34 is located in a trans-to-sub-Alfvénic environment ($\mathcal{M}_{A}=0.74$ within $r=15$ pc); the magnetic field is effectively resisting gravitational collapse in large-scale diffuse gas, but is distorted by gravity within the cloud and affected by star formation activities in high-density regions; and the normalized mass-to-flux ratio tends to increase with increasing density and decreasing radius. Considering the thermal, turbulent, and magnetic supports, we find that the environmental gas of G28.34 is in a super-virial (supported) state, the infrared dark clumps may be in a near-equilibrium state, and core MM4-core4 is in a sub-virial (gravity-dominant) state. In summary, we suggest that magnetic fields dominate gravity and turbulence in the cloud environment at large scales, resulting in relatively slow cloud formation and evolution processes. Within the cloud, gravity could overwhelm magnetic fields and turbulence, allowing local dynamical star formation to happen.
Submitted 18 March, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Rethinking Clustered Federated Learning in NOMA Enhanced Wireless Networks
Authors:
Yushen Lin,
Kaidi Wang,
Zhiguo Ding
Abstract:
This study explores the benefits of integrating the novel clustered federated learning (CFL) approach with non-orthogonal multiple access (NOMA) under non-independent and identically distributed (non-IID) datasets, where multiple devices participate in the aggregation with time limitations and a finite number of sub-channels. A detailed theoretical analysis of the generalization gap that measures the degree of non-IID in the data distribution is presented. Following that, solutions to address the challenges posed by non-IID conditions are proposed with the analysis of the properties. Specifically, users' data distributions are parameterized as concentration parameters and grouped using spectral clustering, with Dirichlet distribution serving as the prior. The investigation into the generalization gap and convergence rate guides the design of sub-channel assignments through the matching-based algorithm, and the power allocation is achieved by Karush-Kuhn-Tucker (KKT) conditions with the derived closed-form solution. The extensive simulation results show that the proposed cluster-based FL framework can outperform FL baselines in terms of both test accuracy and convergence rate. Moreover, jointly optimizing sub-channel and power allocation in NOMA-enhanced networks can lead to a significant improvement.
Submitted 5 March, 2024;
originally announced March 2024.
-
DynST: Dynamic Sparse Training for Resource-Constrained Spatio-Temporal Forecasting
Authors:
Hao Wu,
Haomin Wen,
Guibin Zhang,
Yutong Xia,
Kai Wang,
Yuxuan Liang,
Yu Zheng,
Kun Wang
Abstract:
The ever-increasing sensor service, though opening a precious path and providing a deluge of earth system data for deep-learning-oriented earth science, introduces a daunting obstacle to its industrial-level deployment. Concretely, earth science systems rely heavily on the extensive deployment of sensors; however, data collection from sensors is constrained by complex geographical and social factors, making it challenging to achieve comprehensive coverage and uniform deployment. To alleviate this obstacle, traditional approaches to sensor deployment use specific algorithms to design and deploy sensors. These methods dynamically adjust the activation times of sensors to optimize the detection process across each sub-region. Regrettably, the activation strategy is generally formulated from historical observations and geographic characteristics, which makes the methods and resultant models neither simple nor practical. Worse still, the complex technical design may ultimately lead to a model with weak generalizability. In this paper, we introduce for the first time the concept of dynamic sparse training for spatio-temporal data, aiming to adaptively and dynamically filter important sensor distributions. To our knowledge, this is the first proposal (termed DynST) of an industry-level deployment optimization concept at the data level. However, due to the temporal dimension, pruning spatio-temporal data may lead to conflicts at different timestamps. To resolve this, we employ dynamic merge technology along with dimensional mapping to mitigate potential impacts caused by the temporal aspect. During training, DynST utilizes iterative pruning and sparse training, repeatedly identifying and dynamically removing the sensor perception areas that contribute the least to future predictions.
Submitted 5 March, 2024;
originally announced March 2024.
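The iterative pruning loop described at the end of this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the importance scores are fixed, hypothetical numbers, the linear pruning schedule is invented for the example, and the sparse-training and re-scoring steps that the abstract mentions are left as a comment.

```python
def iterative_prune(importance: dict[str, float], keep_ratio: float, rounds: int) -> set[str]:
    """Repeatedly drop the least important sensors until keep_ratio remain.

    `importance` maps a sensor id to a contribution score (hypothetical).
    A linear schedule removes an equal share of sensors each round.
    """
    n_total = len(importance)
    n_target = max(1, round(n_total * keep_ratio))
    active = set(importance)
    for step in range(1, rounds + 1):
        # Number of sensors to keep after this round.
        n_keep = n_total - round((n_total - n_target) * step / rounds)
        ranked = sorted(active, key=lambda s: importance[s], reverse=True)
        active = set(ranked[:n_keep])
        # A full pipeline would run sparse training here and re-estimate
        # the importance of the remaining sensors before the next round.
    return active


# Four hypothetical sensors, pruned to half over two rounds.
scores = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.3}
kept = iterative_prune(scores, keep_ratio=0.5, rounds=2)
```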
-
Observation of $ψ(3686)\to 3φ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (645 additional authors not shown)
Abstract:
Using $(2.712\pm0.014)\times 10^9$ $ψ(3686)$ events collected by the BESIII detector operating at the BEPCII collider, we report the first observation of $ψ(3686)\to 3φ$ decay with a significance larger than 10$σ$. The branching fraction of this decay is determined to be $(1.46\pm0.05\pm0.17)\times10^{-5}$, where the first uncertainty is statistical and the second is systematic. No significant structure is observed in the $φφ$ invariant mass spectra.
Submitted 4 March, 2024;
originally announced March 2024.
-
NeRF-VPT: Learning Novel View Representations with Neural Radiance Fields via View Prompt Tuning
Authors:
Linsheng Chen,
Guangrun Wang,
Liuchun Yuan,
Keze Wang,
Ken Deng,
Philip H. S. Torr
Abstract:
Neural Radiance Fields (NeRF) have garnered remarkable success in novel view synthesis. Nonetheless, the task of generating high-quality images for novel views persists as a critical challenge. While the existing efforts have exhibited commendable progress, capturing intricate details, enhancing textures, and achieving superior Peak Signal-to-Noise Ratio (PSNR) metrics warrant further focused attention and advancement. In this work, we propose NeRF-VPT, an innovative method for novel view synthesis to address these challenges. Our proposed NeRF-VPT employs a cascading view prompt tuning paradigm, wherein RGB information gained from preceding rendering outcomes serves as instructive visual prompts for subsequent rendering stages, with the aspiration that the prior knowledge embedded in the prompts can facilitate the gradual enhancement of rendered image quality. NeRF-VPT only requires sampling RGB data from previous stage renderings as priors at each training stage, without relying on extra guidance or complex techniques. Thus, our NeRF-VPT is plug-and-play and can be readily integrated into existing methods. By conducting comparative analyses of our NeRF-VPT against several NeRF-based approaches on demanding real-scene benchmarks, such as Realistic Synthetic 360, Real Forward-Facing, Replica dataset, and a user-captured dataset, we substantiate that our NeRF-VPT significantly elevates baseline performance and proficiently generates higher-quality novel view images than all the compared state-of-the-art methods. Furthermore, the cascading learning of NeRF-VPT introduces adaptability to scenarios with sparse inputs, resulting in a significant enhancement of accuracy for sparse-view novel view synthesis. The source code and dataset are available at https://github.com/Freedomcls/NeRF-VPT.
Submitted 2 March, 2024;
originally announced March 2024.
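For readers unfamiliar with the PSNR metric named in this abstract, the standard definition is 10·log10(MAX² / MSE), where MAX is the largest possible pixel value. The sketch below applies it to flat lists of pixel intensities; the function name, the [0, 1] intensity range, and the sample values are assumptions made for illustration and are not taken from the paper's evaluation code.

```python
import math


def psnr(reference: list[float], rendered: list[float], max_value: float = 1.0) -> float:
    """Peak Signal-to-Noise Ratio in decibels for two equal-length pixel lists."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Two hypothetical pixels, each off by 0.1 on a [0, 1] scale.
score = psnr([0.0, 1.0], [0.1, 0.9])  # close to 20 dB
```

Higher values indicate a rendering closer to the reference image.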
-
Small, Versatile and Mighty: A Range-View Perception Framework
Authors:
Qiang Meng,
Xiao Wang,
JiaBao Wang,
Liujiang Yan,
Ke Wang
Abstract:
Despite its compactness and information integrity, the range view representation of LiDAR data is rarely the first choice for 3D perception tasks. In this work, we further push the envelope of the range-view representation with a novel multi-task framework, achieving unprecedented 3D detection performance. Our proposed Small, Versatile, and Mighty (SVM) network utilizes a pure convolutional architecture to fully unleash the efficiency and multi-tasking potential of the range view representation. To boost detection performance, we first propose a range-view specific Perspective Centric Label Assignment (PCLA) strategy, and a novel View Adaptive Regression (VAR) module to further refine hard-to-predict box properties. In addition, our framework seamlessly integrates semantic segmentation and panoptic segmentation tasks for the LiDAR point cloud, without extra modules. Among range-view-based methods, our model achieves new state-of-the-art detection performance on the Waymo Open Dataset. In particular, an improvement of over 10 mAP over convolutional counterparts is obtained on the vehicle class. Our presented results for other tasks further reveal the multi-task capabilities of the proposed small but mighty framework.
Submitted 1 March, 2024;
originally announced March 2024.
-
Symmetry-breaking normal state response and surface superconductivity in topological semimetal YPtBi
Authors:
Hyunsoo Kim,
Tristin Metz,
Halyna Hodovanets,
Daniel Kraft,
Kefeng Wang,
Yun Suk Eo,
Johnpierre Paglione
Abstract:
Most of the half-Heusler RPtBi compounds (R=rare earth) host various surface states due to spin-orbit coupling driven topological band structure. While recent ARPES measurements ubiquitously reported the existence of surface states in RPtBi, their evidence by other experimental techniques remains elusive. Here we report the angle-dependent magnetic field response of electrical transport properties of YPtBi in both the normal and superconducting states. The angle dependence of both magnetoresistance and the superconducting upper critical field breaks the rotational symmetry of the cubic crystal structure, and the angle between the applied magnetic field and the measurement plane of a plate-like sample prevails. Furthermore, the measured upper critical field is notably higher than the bulk response for an in-plane magnetic field configuration, suggesting the presence of quasi-2D superconductivity. Our work suggests the transport properties cannot be explained solely by the bulk carrier response, requiring robust normal and superconducting surface states to flourish in YPtBi.
Submitted 28 February, 2024;
originally announced February 2024.
-
FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes
Authors:
Ziying Pan,
Kun Wang,
Gang Li,
Feihong He,
Yongxuan Lai
Abstract:
Class-conditional image generation based on diffusion models is renowned for generating high-quality and diverse images. However, most prior efforts focus on generating images for general categories, e.g., 1000 classes in ImageNet-1k. A more challenging task, large-scale fine-grained image generation, remains a frontier to explore. In this work, we present a parameter-efficient strategy, called FineDiffusion, to fine-tune large pre-trained diffusion models, scaling them to large-scale fine-grained image generation with 10,000 categories. FineDiffusion significantly accelerates training and reduces storage overhead by fine-tuning only the tiered class embedder, bias terms, and normalization layers' parameters. To further improve the image generation quality of fine-grained categories, we propose a novel sampling method for fine-grained image generation, which utilizes superclass-conditioned guidance, specifically tailored for fine-grained categories, to replace the conventional classifier-free guidance sampling. Compared to full fine-tuning, FineDiffusion achieves a remarkable 1.56x training speed-up and requires storing merely 1.77% of the total model parameters, while achieving a state-of-the-art FID of 9.776 on image generation of 10,000 classes. Extensive qualitative and quantitative experiments demonstrate the superiority of our method compared to other parameter-efficient fine-tuning methods. The code and more generated results are available at our project website: https://finediffusion.github.io/.
Submitted 3 June, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
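Conventional classifier-free guidance extrapolates from an unconditional noise prediction toward a class-conditional one: eps = eps_uncond + w · (eps_cond − eps_uncond). The abstract states that superclass-conditioned guidance replaces this scheme. One plausible reading, which is an assumption here and not a formula confirmed by the abstract, is that the superclass-conditioned prediction takes the place of the unconditional one. The sketch below shows that combination on plain lists; all names and values are hypothetical.

```python
def guided_noise(eps_base: list[float], eps_fine: list[float], scale: float) -> list[float]:
    """Extrapolate from a base prediction toward the fine-grained prediction.

    With eps_base unconditional, this is classifier-free guidance.
    With eps_base conditioned on the superclass, it is the assumed
    superclass-conditioned variant discussed above.
    """
    return [b + scale * (f - b) for b, f in zip(eps_base, eps_fine)]


# Hypothetical noise predictions for a two-element latent.
eps_superclass = [0.0, 1.0]
eps_fine_class = [1.0, 3.0]

guided = guided_noise(eps_superclass, eps_fine_class, scale=2.0)
unchanged = guided_noise(eps_superclass, eps_fine_class, scale=1.0)
```

A scale of 1 returns the fine-grained prediction unchanged, while larger scales push the sample further away from what is merely typical of the superclass.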
-
MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs
Authors:
Zimu Lu,
Aojun Zhou,
Houxing Ren,
Ke Wang,
Weikang Shi,
Junting Pan,
Mingjie Zhan,
Hongsheng Li
Abstract:
Large language models (LLMs) have exhibited great potential in mathematical reasoning. However, there remains a performance gap in this area between existing open-source models and closed-source models such as GPT-4. In this paper, we introduce MathGenie, a novel method for generating diverse and reliable math problems from a small-scale problem-solution dataset (denoted as seed data). We augment the ground-truth solutions of our seed data and train a back-translation model to translate the augmented solutions back into new questions. Subsequently, we generate code-integrated solutions for the new questions. To ensure the correctness of the code-integrated solutions, we employ a rationale-based strategy for solution verification. Various pretrained models, ranging from 7B to 70B, are trained on the newly curated data to test the effectiveness of the proposed augmentation technique, resulting in a family of models known as MathGenieLM. These models consistently outperform previous open-source models across five representative mathematical reasoning datasets, achieving state-of-the-art performance. In particular, MathGenieLM-InternLM2 achieves an accuracy of 87.7% on GSM8K and 55.7% on MATH, securing the best overall score among open-source language models.
Submitted 26 February, 2024;
originally announced February 2024.
-
IR2: Information Regularization for Information Retrieval
Authors:
Jianyou Wang,
Kaicheng Wang,
Xiaoyue Wang,
Weili Cao,
Ramamohan Paturi,
Leon Bergen
Abstract:
Effective information retrieval (IR) in settings with limited training data, particularly for complex queries, remains a challenging task. This paper introduces IR2, Information Regularization for Information Retrieval, a technique for reducing overfitting during synthetic data generation. This approach, representing a novel application of regularization techniques in synthetic data creation for IR, is tested on three recent IR tasks characterized by complex queries: DORIS-MAE, ArguAna, and WhatsThatBook. Experimental results indicate that our regularization techniques not only outperform previous synthetic query generation methods on the tasks considered but also reduce cost by up to 50%. Furthermore, this paper categorizes and explores three regularization methods at different stages of the query synthesis pipeline (input, prompt, and output), each offering varying degrees of performance improvement compared to models where no regularization is applied. This provides a systematic approach for optimizing synthetic data generation in data-limited, complex-query IR scenarios. All code, prompts and synthetic data are available at https://github.com/Info-Regularization/Information-Regularization.
Submitted 25 February, 2024;
originally announced February 2024.
-
AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation
Authors:
Yasheng Sun,
Wenqing Chu,
Hang Zhou,
Kaisiyuan Wang,
Hideki Koike
Abstract:
While considerable progress has been made in achieving accurate lip synchronization for 3D speech-driven talking face generation, the task of incorporating expressive facial detail synthesis aligned with the speaker's speaking status remains challenging. Our goal is to directly leverage the inherent style information conveyed by human speech for generating an expressive talking face that aligns with the speaking status. In this paper, we propose AVI-Talking, an Audio-Visual Instruction system for expressive Talking face generation. This system harnesses the robust contextual reasoning and hallucination capability offered by Large Language Models (LLMs) to instruct the realistic synthesis of 3D talking faces. Instead of directly learning facial movements from human speech, our two-stage strategy involves the LLMs first comprehending audio information and generating instructions implying expressive facial details seamlessly corresponding to the speech. Subsequently, a diffusion-based generative network executes these instructions. This two-stage process, coupled with the incorporation of LLMs, enhances model interpretability and provides users with flexibility to comprehend instructions and specify desired operations or modifications. Extensive experiments showcase the effectiveness of our approach in producing vivid talking faces with expressive facial movements and consistent emotional status.
Submitted 25 February, 2024;
originally announced February 2024.