
Discovery of a Relativistic Stripped Envelope Type Ic-BL Supernova at z = 2.83 with JWST

Authors: M. R. Siebert, C. Decoursey, D. A. Coulter, M. Engesser, J. D. R. Pierel, A. Rest, E. Egami, M. Shahbandeh, W. Chen, O. D. Fox, Y. Zenati, T. J. Moriya, A. J. Bunker, P. A. Cargile, M. Curti, D. J. Eisenstein, S. Gezari, S. Gomez, M. Guolo, B. D. Johnson, B. A. Joshi, M. Karmen, R. Maiolino, R. M. Quimby, B. Robertson, et al. (4 additional authors not shown)

Abstract: We present JWST NIRCam and NIRSpec observations of a Type Ic supernova (SN Ic) and its host galaxy (JADES-GS+53.13533-27.81457) at $z = 2.83$. This SN (named SN 2023adta) was identified in deep James Webb Space Telescope (JWST)/NIRCam imaging from the JWST Advanced Deep Extragalactic Survey (JADES) Program. Follow-up observations with JWST/NIRSpec provided a spectroscopic redshift of $z = 2.83$ and the classification as a SN Ic-BL. The light curve of SN 2023adta matches well with other stripped envelope supernovae and we find a high peak luminosity, $M_V = -19.0 \pm 0.2$ mag, based on the distribution of best-fit SNe. The broad absorption features in its spectrum are consistent with other SNe Ic-BL 1-3 weeks after peak brightness. We measure a Ca II NIR triplet expansion velocity of $29{,}000 \pm 2{,}000$ km s$^{-1}$. The host galaxy of SN 2023adta is irregular, and modeling of its spectral energy distribution (SED) indicates a metallicity of $Z = 0.35^{+0.16}_{-0.08} Z_{\odot}$. This environment is consistent with the population of low-$z$ SNe Ic-BL which prefer lower metallicities relative to other stripped envelope supernovae, and track long duration $γ$-ray burst (LGRB) environments. We do not identify any GRBs that are coincident with SN 2023adta. Given the rarity of SNe Ic-BL in the local universe, the detection of a SN Ic-BL at $z = 2.83$ could indicate that their rates are enhanced at high redshift.

Submitted 10 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

Comments: 16 pages, 7 figures, submitted to ApJL

arXiv:2406.05060 [pdf, other]

The JADES Transient Survey: Discovery and Classification of Supernovae in the JADES Deep Field

Authors: Christa DeCoursey, Eiichi Egami, Justin D. R. Pierel, Fengwu Sun, Armin Rest, David A. Coulter, Michael Engesser, Matthew R. Siebert, Kevin N. Hainline, Benjamin D. Johnson, Andrew J. Bunker, Phillip A. Cargile, Stephane Charlot, Wenlei Chen, Mirko Curti, Shea DeFour-Remy, Daniel J. Eisenstein, Ori D. Fox, Suvi Gezari, Sebastian Gomez, Jacob Jencson, Bhavin A. Joshi, Sanvi Khairnar, Jianwei Lyu, Roberto Maiolino, et al. (13 additional authors not shown)

Abstract: The JWST Advanced Deep Extragalactic Survey (JADES) is a multi-cycle JWST program that has taken among the deepest near-infrared images to date (down to $\sim$30.5 ABmag) over $\sim$25 arcmin$^2$ in the GOODS-S field in two sets of observations with one year of separation. This presented the first opportunity to systematically search for transients, mostly supernovae (SNe), out to $z>$2. We found 79 SNe: 38 at $z<$2, 23 at 2$<z<$3, 8 at 3$<z<$4, 7 at 4$<z<$5, and 3 with undetermined redshifts, where the redshifts are predominantly based on spectroscopic or highly reliable JADES photometric redshifts of the host galaxies. At this depth, the detection rate is $\sim$1-2 per arcmin$^2$ per year, demonstrating the power of JWST as a supernova discovery machine. We also conducted multi-band follow-up NIRCam observations of a subset of the SNe to better constrain their light curves and classify their types. Here, we present the survey, sample, search parameters, spectral energy distributions (SEDs), light curves, and classifications. Even at $z\geq$2, the NIRCam data quality is such that we can perform multi-epoch light-curve fitting to classify supernovae with a reasonable degree of confidence. The multi-epoch SN sample includes a Type Ia SN at $z_{spec}= $ 2.91, Type IIP SN at $z_{spec}= $ 3.61, and a Type Ic-BL SN at $z_{spec}= $ 2.845. We also found that two $z\sim$16 galaxy candidates from the first imaging epoch were actually transients that faded in the second epoch, illustrating the possibility that moderate/high-redshift SNe could mimic high-redshift dropout galaxies.

Submitted 10 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

Comments: 43 pages, 15 figures, 15 tables. Submitted to ApJ. Appendix A (64 MB) is available at https://drive.google.com/file/d/1xs5jXUVOvdDPgdghK72KR1FMGvPcK7dv/view?usp=sharing . Appendix B (81 MB) is available at https://drive.google.com/file/d/18ImLT80pQdPzXCZA-KEy21DaE2CQiGz1/view?usp=sharing . References updated

arXiv:2406.04999 [pdf, other]

ProMotion: Prototypes As Motion Learners

Authors: Yawen Lu, Dongfang Liu, Qifan Wang, Cheng Han, Yiming Cui, Zhiwen Cao, Xueling Zhang, Yingjie Victor Chen, Heng Fan

Abstract: In this work, we introduce ProMotion, a unified prototypical framework engineered to model fundamental motion tasks. ProMotion offers a range of compelling attributes that set it apart from current task-specific paradigms. We adopt a prototypical perspective, establishing a unified paradigm that harmonizes disparate motion learning approaches. This novel paradigm streamlines the architectural design, enabling the simultaneous assimilation of diverse motion information. We capitalize on a dual mechanism involving the feature denoiser and the prototypical learner to decipher the intricacies of motion. This approach effectively circumvents the pitfalls of ambiguity in pixel-wise feature matching, significantly bolstering the robustness of motion representation. We demonstrate a profound degree of transferability across distinct motion patterns. This inherent versatility reverberates robustly across a comprehensive spectrum of both 2D and 3D downstream tasks. Empirical results demonstrate that ProMotion outperforms various well-known specialized architectures, achieving 0.54 and 0.054 Abs Rel error on the Sintel and KITTI depth datasets, 1.04 and 2.01 average endpoint error on the clean and final pass of Sintel flow benchmark, and 4.30 F1-all error on the KITTI flow benchmark. For its efficacy, we hope our work can catalyze a paradigm shift in universal models in computer vision.

Submitted 7 June, 2024; originally announced June 2024.

Comments: 11 pages

arXiv:2406.04821 [pdf, other]

Deep Learning Powered Estimate of The Extrinsic Parameters on Unmanned Surface Vehicles

Authors: Yi Shen, Hao Liu, Chang Zhou, Wentao Wang, Zijun Gao, Qi Wang

Abstract: Unmanned Surface Vehicles (USVs) are pivotal in marine exploration, but their sensors' accuracy is compromised by the dynamic marine environment. Traditional calibration methods fall short in these conditions. This paper introduces a deep learning architecture that predicts changes in the USV's dynamic metacenter and refines sensors' extrinsic parameters in real time using a Time-Sequence General Regression Neural Network (GRNN) with Euler angles as input. Simulation data from Unity3D ensures robust training and testing. Experimental results show that the Time-Sequence GRNN achieves the lowest mean squared error (MSE) loss, outperforming traditional neural networks. This method significantly enhances sensor calibration for USVs, promising improved data accuracy in challenging maritime conditions. Future work will refine the network and validate results with real-world data.

Submitted 7 June, 2024; originally announced June 2024.

Comments: Accepted by The 9th Asia-Pacific Conference on Intelligent Robot Systems (ACIRS 2024)

arXiv:2406.03853 [pdf, other]

Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism

Authors: Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai

Abstract: The recent advancements in large language models (LLMs) have been extraordinary, yet the escalating inference costs associated with them present challenges in real-world applications. To address these challenges, we propose a novel approach called Early-exiting Speculative Decoding (EESD) with lossless acceleration. Specifically, EESD utilizes a segment of the LLM to generate draft tokens, incorporating Early-exiting structures after the first N layers. To enhance the quality of draft tokens, a self-distillation method is integrated. This early-exiting design not only reduces deployment and training costs but also significantly accelerates the token generation speed. Moreover, we introduce a novel sampling mechanism that leverages Thompson Sampling to regulate the generation processes, automatically determining the quantity of draft tokens in each round. The original LLM is then employed to validate these draft tokens through a single forward pass, and thus guarantees that the final output text maintains a distribution consistent with vanilla auto-regressive decoding. The experimental results on both 13B and 70B models demonstrate that our approach decodes tokens at a markedly accelerated rate compared to prior methods, showing the effectiveness of our approach.

Submitted 6 June, 2024; originally announced June 2024.

Comments: Accepted by ACL 2024 (Findings)

arXiv:2406.03803 [pdf, ps, other]

Determining the Weight Spectrum of the Reed--Muller Codes RM(m-6,m)

Authors: Yueying Lou, Qichun Wang

Abstract: The weight spectra of the Reed-Muller codes $RM(r,m)$ were unknown for $r=3,...,m-5$. In IEEE Trans. Inform. Theory 2024, Carlet determined the weight spectrum of $RM(m-5,m)$ for $m\ge10$ using the Maiorana-McFarland construction; an attempt to extend the result to $RM(m-6,m)$ ran into many problems and left much work to be done. In this paper, we propose a novel way of constructing Reed--Muller codewords and determine the weight spectrum of $RM(m-6,m)$ for $m\ge12$, which gives a positive answer to an open question on the weight spectrum of $RM(m-c,m)$ for $c=6$. Moreover, we put forward a conjecture and verify it for some cases. If the conjecture is true, then that open question can be completely solved.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03250 [pdf, other]

Prompt-based Visual Alignment for Zero-shot Policy Transfer

Authors: Haihan Gao, Rui Zhang, Qi Yi, Hantao Yao, Haochen Li, Jiaming Guo, Shaohui Peng, Yunkai Gao, QiCheng Wang, Xing Hu, Yuanbo Wen, Zihao Zhang, Zidong Du, Ling Li, Qi Guo, Yunji Chen

Abstract: Overfitting has become one of the main obstacles to applications of reinforcement learning (RL). Existing methods do not provide an explicit semantic constraint for the feature extractor, hindering the agent from learning a unified cross-domain representation and resulting in performance degradation on unseen domains. Besides, abundant data from multiple domains are needed. To address these issues, in this work, we propose prompt-based visual alignment (PVA), a robust framework to mitigate the detrimental domain bias in the image for zero-shot policy transfer. Inspired by the observation that a Visual-Language Model (VLM) can serve as a bridge connecting text space and image space, we leverage the semantic information contained in a text sequence as an explicit constraint to train a visual aligner. Thus, the visual aligner can map images from multiple domains to a unified domain and achieve good generalization performance. To better depict semantic information, prompt tuning is applied to learn a sequence of learnable tokens. With explicit constraints of semantic information, PVA can learn a unified cross-domain representation under limited access to cross-domain data and achieves great zero-shot generalization ability in unseen domains. We verify PVA on a vision-based autonomous driving task with the CARLA simulator. Experiments show that the agent generalizes well on unseen domains under limited access to multi-domain data.

Submitted 5 June, 2024; originally announced June 2024.

Comments: This paper has been accepted by ICML2024

arXiv:2406.03081 [pdf, other]

A Quantum Neural Network-Based Approach to Power Quality Disturbances Detection and Recognition

Authors: Guo-Dong Li, Hai-Yan He, Yue Li, Xin-Hao Li, Hao Liu, Qing-Le Wang, Long Cheng

Abstract: Power quality disturbances (PQDs) significantly impact the stability and reliability of power systems, necessitating accurate and efficient detection and recognition methods. While numerous classical algorithms for PQD detection and recognition have been extensively studied and applied, related work in the quantum domain is still in its infancy. In this paper, an improved quantum neural network (QNN) model for PQD detection and recognition is proposed. Specifically, the model constructs a quantum circuit comprising data qubits and ancilla qubits. Classical data is transformed into quantum data by embedding it into data qubits via the encoding layer. Subsequently, parametric quantum gates are utilized to form the variational layer, which facilitates qubit information transformation, thereby extracting essential feature information for detection and recognition. The expected value is obtained by measuring ancilla qubits, enabling the completion of disturbance classification based on this expected value. An analysis reveals that the runtime and space complexities of the QNN are $O\left ( poly\left ( N \right ) \right )$ and $O\left ( N \right )$, respectively. Extensive experiments validate the feasibility and superiority of the proposed model in PQD detection and recognition. The model achieves accuracies of 99.75\%, 97.85\% and 95.5\% in experiments involving the detection of disturbances, recognition of seven single disturbances, and recognition of ten mixed disturbances, respectively. Additionally, noise simulation and comparative experiments demonstrate that the proposed model exhibits robust anti-noise capabilities, requires few training parameters, and maintains high accuracy.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.02931 [pdf, other]

Measurements of the branching fractions of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^-π^0/η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (643 additional authors not shown)

Abstract: Based on $(2712.4\pm 14.3)\times10^{6}$ $ψ(3686)$ events, we investigate four hadronic decay modes of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^- π^0/η$ ($h=π$ or $K$) via the process $ψ(3686) \to π^{0}h_c$ at BESIII. The $h_c \to π^+ π^- π^0$ decay is observed with a significance of 9.6$σ$ after taking into account systematic uncertainties. Evidence for $h_c \to K^+ K^- π^0$ and $h_c \to K^+ K^- η$ is found with significances of $3.5σ$ and $3.3σ$, respectively, after considering the systematic uncertainties. The branching fractions of these decays are measured to be $\mathcal{B}(h_c \to π^+ π^- π^0)=(1.36\pm0.16\pm0.14)\times10^{-3}$, $\mathcal{B}(h_c \to K^+ K^- π^0)=(3.26\pm0.84\pm0.36)\times10^{-4}$, and $\mathcal{B}(h_c \to K^+ K^- η)=(3.13\pm1.08\pm0.38)\times10^{-4}$, where the first uncertainties are statistical and the second are systematic. No significant signal of $h_c\toπ^+π^-η$ is found, and the upper limit of its decay branching fraction is determined to be $\mathcal{B}(h_c\toπ^+π^-η) < 4.0 \times 10^{-4}$ at 90% confidence level.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 9 pages, 7 figures

arXiv:2406.02924 [pdf, other]

Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for Large Language Models

Authors: Peijie Dong, Lujun Li, Zhenheng Tang, Xiang Liu, Xinglin Pan, Qiang Wang, Xiaowen Chu

Abstract: Despite their remarkable capabilities, Large Language Models (LLMs) face deployment challenges due to their extensive size. Pruning methods drop a subset of weights to accelerate inference, but many of them require retraining, which is prohibitively expensive and computationally demanding. Recently, post-training pruning approaches introduced novel metrics, enabling the pruning of LLMs without retraining. However, these metrics require the involvement of human experts and tedious trial and error. To efficiently identify superior pruning metrics, we develop an automatic framework for searching symbolic pruning metrics using genetic programming. In particular, we devise an elaborate search space encompassing the existing pruning metrics to discover the potential symbolic pruning metric. We propose an opposing operation simplification strategy to increase the diversity of the population. In this way, Pruner-Zero allows auto-generation of symbolic pruning metrics. Based on the searched results, we explore the correlation between pruning metrics and performance after pruning and summarize some principles. Extensive experiments on LLaMA and LLaMA-2 on language modeling and zero-shot tasks demonstrate that our Pruner-Zero achieves performance superior to SOTA post-training pruning methods. Code at: https://github.com/pprp/Pruner-Zero.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted by ICML2024, 29 pages, 4 figures

arXiv:2406.02911 [pdf, other]

Improving In-Context Learning with Prediction Feedback for Sentiment Analysis

Authors: Hongling Xu, Qianlong Wang, Yice Zhang, Min Yang, Xi Zeng, Bing Qin, Ruifeng Xu

Abstract: Large language models (LLMs) have achieved promising results in sentiment analysis through the in-context learning (ICL) paradigm. However, their ability to distinguish subtle sentiments remains a challenge. Inspired by the human ability to adjust understanding via feedback, this paper enhances ICL by incorporating prior predictions and feedback, aiming to rectify sentiment misinterpretation of LLMs. Specifically, the proposed framework consists of three steps: (1) acquiring prior predictions of LLMs, (2) devising predictive feedback based on correctness, and (3) leveraging a feedback-driven prompt to refine sentiment understanding. Experimental results across nine sentiment analysis datasets demonstrate the superiority of our framework over conventional ICL methods, with an average F1 improvement of 5.95%.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted by ACL 2024 (Findings)

arXiv:2406.02888 [pdf, other]

HYDRA: Model Factorization Framework for Black-Box LLM Personalization

Authors: Yuchen Zhuang, Haotian Sun, Yue Yu, Rushi Qiang, Qifan Wang, Chao Zhang, Bo Dai

Abstract: Personalization has emerged as a critical research area in modern intelligent systems, focusing on mining users' behavioral history and adapting to their preferences for delivering tailored experiences. Despite the remarkable few-shot capabilities exhibited by black-box large language models (LLMs), the inherent opacity of their model parameters presents significant challenges in aligning the generated output with individual expectations. Existing solutions have primarily focused on prompt design to incorporate user-specific profiles and behaviors; however, such approaches often struggle to generalize effectively due to their inability to capture shared knowledge among all users. To address these challenges, we propose HYDRA, a model factorization framework that captures both user-specific behavior patterns from historical data and shared general knowledge among all users to deliver personalized generation. In order to capture user-specific behavior patterns, we first train a reranker to prioritize the most useful information from top-retrieved relevant historical records. By combining the prioritized history with the corresponding query, we train an adapter to align the output with individual user-specific preferences, eliminating the reliance on access to inherent model parameters of black-box LLMs. Both the reranker and the adapter can be decomposed into a base model with multiple user-specific heads, resembling a hydra. The base model maintains shared knowledge across users, while the multiple personal heads capture user-specific preferences. Experimental results demonstrate that HYDRA outperforms existing state-of-the-art prompt-based methods by an average relative improvement of 9.01% across five diverse personalization tasks in the LaMP benchmark. Our implementation is available at https://github.com/night-chen/HYDRA.

Submitted 10 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: 24 pages, 6 figures, work in progress

arXiv:2406.02856 [pdf, other]

Xmodel-LM Technical Report

Authors: Yichuan Wang, Yang Liu, Yu Yan, Qun Wang, Xucheng Huang, Ling Jiang

Abstract: We introduce Xmodel-LM, a compact and efficient 1.1B language model pre-trained on around 2 trillion tokens. Trained on our self-built dataset (Xdata), which balances Chinese and English corpora based on downstream task optimization, Xmodel-LM exhibits remarkable performance despite its smaller size. It notably surpasses existing open-source language models of similar scale. Our model checkpoints and code are publicly accessible on GitHub at https://github.com/XiaoduoAILab/XmodelLM.

Submitted 26 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.02578 [pdf, other]

Pretrained Mobility Transformer: A Foundation Model for Human Mobility

Authors: Xinhua Wu, Haoyu He, Yanchao Wang, Qi Wang

Abstract: Ubiquitous mobile devices are generating vast amounts of location-based service data that reveal how individuals navigate and utilize urban spaces in detail. In this study, we utilize these extensive, unlabeled sequences of user trajectories to develop a foundation model for understanding urban space and human mobility. We introduce the Pretrained Mobility Transformer (PMT), which leverages the transformer architecture to process user trajectories in an autoregressive manner, converting geographical areas into tokens and embedding spatial and temporal information within these representations. Experiments conducted in three U.S. metropolitan areas over a two-month period demonstrate PMT's ability to capture underlying geographic and socio-demographic characteristics of regions. The proposed PMT excels across various downstream tasks, including next-location prediction, trajectory imputation, and trajectory generation. These results support PMT's capability and effectiveness in decoding complex patterns of human mobility, offering new insights into urban spatial functionality and individual mobility preferences.

Submitted 28 May, 2024; originally announced June 2024.

arXiv:2406.02461 [pdf, other]

RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting

Authors: Qi Wang, Ruijie Lu, Xudong Xu, Jingbo Wang, Michael Yu Wang, Bo Dai, Gang Zeng, Dan Xu

Abstract: The advancement of diffusion models has pushed the boundary of text-to-3D object generation. While it is straightforward to composite objects into a scene with reasonable geometry, it is nontrivial to texture such a scene perfectly due to style inconsistency and occlusions between objects. To tackle these problems, we propose a coarse-to-fine 3D scene texturing framework, referred to as RoomTex, to generate high-fidelity and style-consistent textures for untextured compositional scene meshes. In the coarse stage, RoomTex first unwraps the scene mesh to a panoramic depth map and leverages ControlNet to generate a room panorama, which is regarded as the coarse reference to ensure the global texture consistency. In the fine stage, based on the panoramic image and perspective depth maps, RoomTex will refine and texture every single object in the room iteratively along a series of selected camera views, until this object is completely painted. Moreover, we propose to maintain superior alignment between RGB and depth spaces via subtle edge detection methods. Extensive experiments show our method is capable of generating high-quality and diverse room textures, and more importantly, supporting interactive fine-grained texture control and flexible scene editing thanks to our inpainting-based framework and compositional mesh input. Our project page is available at https://qwang666.github.io/RoomTex/.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.02234 [pdf, other]

On the Limitations of Fractal Dimension as a Measure of Generalization

Authors: Charlie Tan, Inés García-Redondo, Qiquan Wang, Michael M. Bronstein, Anthea Monod

Abstract: Bounding and predicting the generalization gap of overparameterized neural networks remains a central open problem in theoretical machine learning. Neural network optimization trajectories have been proposed to possess fractal structure, leading to bounds and generalization measures based on notions of fractal dimension on these trajectories. Prominently, both the Hausdorff dimension and the persistent homology dimension have been proposed to correlate with generalization gap, thus serving as a measure of generalization. This work performs an extended evaluation of these topological generalization measures. We demonstrate that fractal dimension fails to predict generalization of models trained from poor initializations. We further identify that the $\ell^2$ norm of the final parameter iterate, one of the simplest complexity measures in learning theory, correlates more strongly with the generalization gap than these notions of fractal dimension. Finally, our study reveals the intriguing manifestation of model-wise double descent in persistent homology-based generalization measures. This work lays the ground for a deeper investigation of the causal relationships between fractal geometry, topological data analysis, and neural network optimization.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 17 pages, 6 figures

arXiv:2406.02092 [pdf, other]

MaskSR: Masked Language Model for Full-band Speech Restoration

Authors: Xu Li, Qirui Wang, Xiaoyu Liu

Abstract: Speech restoration aims at restoring high quality speech in the presence of a diverse set of distortions. Although several deep learning paradigms have been studied for this task, the power of the recently emerging language models has not been fully explored. In this paper, we propose MaskSR, a masked language model capable of restoring full-band 44.1 kHz speech jointly considering noise, reverb, clipping, and low bandwidth. MaskSR works with discrete acoustic tokens extracted using a pre-trained neural codec. During training, MaskSR is optimized to predict randomly masked tokens extracted from the high quality target speech, conditioned on the corrupted speech with various distortions. During inference, MaskSR reconstructs the target speech tokens with efficient iterative sampling. Extensive experiments show that MaskSR obtains competitive results on both the full-band speech restoration task and also on sub-tasks compared with a wide range of models.

Submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted by INTERSPEECH 2024. Demo page: https://masksr.github.io/MaskSR/

arXiv:2406.01830 [pdf, ps, other]

Affine vertex operator superalgebra $L_{\widehat{osp(1|2)}}(\mathcal{l},0)$ at admissible level

Authors: Huaimin Li, Qing Wang

Abstract: Let $L_{\widehat{osp(1|2)}}(\mathcal{l},0)$ be the simple affine vertex operator superalgebra with admissible level $\mathcal{l}$. We prove that the category of weak $L_{\widehat{osp(1|2)}}(\mathcal{l},0)$-modules on which the positive part of $\widehat{osp(1|2)}$ acts locally nilpotently is semisimple. Then we prove that $\mathbb{Q}$-graded vertex operator superalgebras $(L_{\widehat{osp(1|2)}}(\mathcal{l},0),ω_ξ)$ with new Virasoro elements $ω_ξ$ are rational and the irreducible modules are exactly the admissible modules for $\widehat{osp(1|2)}$, where $0<ξ<1$ is a rational number. Furthermore, we determine the Zhu algebras $A(L_{\widehat{osp(1|2)}}(\mathcal{l},0))$ and their bimodules $A(L(\mathcal{l},\mathcal{j}))$ for $(L_{\widehat{osp(1|2)}}(\mathcal{l},0),ω_ξ)$, where $\mathcal{j}$ is the admissible weight. As an application, we calculate the fusion rules among the irreducible ordinary modules of $(L_{\widehat{osp(1|2)}}(\mathcal{l},0),ω_ξ)$.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 25 pages

arXiv:2406.01605 [pdf, other]

An Enhanced Encoder-Decoder Network Architecture for Reducing Information Loss in Image Semantic Segmentation

Authors: Zijun Gao, Qi Wang, Taiyuan Mei, Xiaohan Cheng, Yun Zi, Haowei Yang

Abstract: The traditional SegNet architecture commonly encounters significant information loss during the sampling process, which detrimentally affects its accuracy in image semantic segmentation tasks. To counter this challenge, we introduce an innovative encoder-decoder network structure enhanced with residual connections. Our approach employs a multi-residual connection strategy designed to preserve the intricate details across various image scales more effectively, thus minimizing the information loss inherent to down-sampling procedures. Additionally, to enhance the convergence rate of network training and mitigate sample imbalance issues, we have devised a modified cross-entropy loss function incorporating a balancing factor. This modification optimizes the distribution between positive and negative samples, thus improving the efficiency of model training. Experimental evaluations of our model demonstrate a substantial reduction in information loss and improved accuracy in semantic segmentation. Notably, our proposed network architecture demonstrates a substantial improvement in the finely annotated mean Intersection over Union (mIoU) on the dataset compared to the conventional SegNet. The proposed network structure not only reduces operational costs by decreasing manual inspection needs but also scales up the deployment of AI-driven image analysis across different sectors.

Submitted 26 May, 2024; originally announced June 2024.
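The abstract above mentions a cross-entropy loss modified with a balancing factor but does not give its form. As a rough illustration only, the sketch below shows one common way such a factor can enter a binary cross-entropy; the function name `balanced_bce`, the parameter `beta`, and the weighting scheme are assumptions for illustration, not the authors' formulation.

```python
import math

def balanced_bce(y_true, p_pred, beta, eps=1e-12):
    """Mean binary cross-entropy with a balancing factor (illustrative assumption).

    Positive samples (y = 1) are weighted by `beta` and negative samples
    (y = 0) by `1 - beta`, so beta > 0.5 emphasises the positive class.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clamp probabilities away from 0 and 1 to keep the logarithms finite.
        p = min(max(p, eps), 1.0 - eps)
        total += -(beta * y * math.log(p) + (1.0 - beta) * (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Hypothetical labels and predicted foreground probabilities.
labels = [1, 0, 0, 0]
probs = [0.6, 0.2, 0.1, 0.3]
loss = balanced_bce(labels, probs, beta=0.75)
```

With `beta = 0.5` this reduces to half the ordinary binary cross-entropy, which gives a quick sanity check on the weighting.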

arXiv:2406.01559 [pdf, other]

Prototypical Transformer as Unified Motion Learners

Authors: Cheng Han, Yawen Lu, Guohao Sun, James C. Liang, Zhiwen Cao, Qifan Wang, Qiang Guan, Sohail A. Dianat, Raghuveer M. Rao, Tong Geng, Zhiqiang Tao, Dongfang Liu

Abstract: In this work, we introduce the Prototypical Transformer (ProtoFormer), a general and unified framework that approaches various motion tasks from a prototype perspective. ProtoFormer seamlessly integrates prototype learning with Transformer by thoughtfully considering motion dynamics, introducing two innovative designs. First, Cross-Attention Prototyping discovers prototypes based on signature motion patterns, providing transparency in understanding motion scenes. Second, Latent Synchronization guides feature representation learning via prototypes, effectively mitigating the problem of motion uncertainty. Empirical results demonstrate that our approach achieves competitive performance on popular motion tasks such as optical flow and scene depth. Furthermore, it exhibits generality across various downstream tasks, including object tracking and video stabilization.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 21 pages, 10 figures

arXiv:2406.01332 [pdf, ps, other]

Measurements of the branching fractions of semileptonic $D^{+}_s$ decays via $e^+e^-\to D_s^{*+}D_s^{*-}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (638 additional authors not shown)

Abstract: We measure the absolute branching fractions of semileptonic $D^+_s$ decays via the $e^+e^-\to D_s^{*+}D_s^{*-}$ process using $e^+e^-$ collision data corresponding to an integrated luminosity of $10.64~\mathrm{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies between 4.237 and 4.699 GeV. The branching fractions are ${\mathcal B}(D_s^+\to ηe^+ν_e)=(2.35\pm0.11_{\rm stat}\pm 0.10_{\rm syst})\%,$ ${\mathcal B}(D_s^+\to η^\prime e^+ν_e)=(0.82\pm0.09_{\rm stat}\pm 0.04_{\rm syst})\%,$ ${\mathcal B}(D_s^+\to φe^+ν_e)=(2.21\pm0.16_{\rm stat}\pm 0.11_{\rm syst})\%,$ ${\mathcal B}(D_s^+\to f_0(980) e^+ν_e,f_0(980)\toπ^+π^-)=(0.15\pm0.02_{\rm stat}\pm 0.01_{\rm syst})\%,$ ${\mathcal B}(D_s^+\to K^0 e^+ν_e)=(0.24\pm0.04_{\rm stat}\pm 0.01_{\rm syst})\%,$ and ${\mathcal B}(D_s^+\to K^{*0} e^+ν_e)=(0.19\pm0.03_{\rm stat}\pm 0.01_{\rm syst})\%.$ These results are consistent with those measured via the $e^+e^-\to D_s^{*\pm}D_s^{\mp}$ process by BESIII and CLEO. The hadronic transition form factors $D^+_s\to ηe^+ν_e$, $D^+_s\to η^\prime e^+ν_e$, and $D^+_s\to K^0 e^+ν_e$ at four-momentum transfer squared $q^2$ = 0 are determined to be $f^η_+(0) = 0.482 \pm 0.011_{\rm stat} \pm 0.009_{\rm syst}\pm0.004_{\rm input},$ $f^{η^{\prime}}_+(0) = 0.562 \pm 0.031_{\rm stat} \pm 0.014_{\rm syst}\pm0.003_{\rm input},$ and $f^{K^0}_+(0) = 0.624 \pm 0.052_{\rm stat} \pm 0.013_{\rm syst}\pm0.002_{\rm input}.$

Submitted 4 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: 14 pages, 3 figures

arXiv:2406.01304 [pdf, other]

CodeR: Issue Resolving with Multi-Agent and Task Graphs

Authors: Dong Chen, Shaoxin Lin, Muhan Zeng, Daoguang Zan, Jian-Gang Wang, Anton Cheshkov, Jun Sun, Hao Yu, Guoliang Dong, Artem Aliev, Jie Wang, Xiao Cheng, Guangtai Liang, Yuchi Ma, Pan Bian, Tao Xie, Qianxiang Wang

Abstract: GitHub issue resolving recently has attracted significant attention from academia and industry. SWE-bench is proposed to measure the performance in resolving issues. In this paper, we propose CodeR, which adopts a multi-agent framework and pre-defined task graphs to Repair & Resolve reported bugs and add new features within code Repository. On SWE-bench lite, CodeR is able to solve 28.33% of issues, when submitting only once for each issue. We examine the performance impact of each design of CodeR and offer insights to advance this research direction.

Submitted 10 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: https://github.com/NL2Code/CodeR

arXiv:2406.00813 [pdf, other]

A Thermodynamically Consistent Model for Yield Stress Fluids

Authors: Nan Jiang, Qi Wang

Abstract: In this study, we formulate a thermodynamically consistent rheological model for yield stress fluids by introducing an internal dynamic variable and extending the framework established by Kamani et al (2021) and the classical Oldroyd-B model. The dynamics of the internal variable capture the material's transient response to changes in deformation, characterized by an effective relaxation time, elastic modulus, and viscosity. To assess the model's validity and range of applicability, we compare it with the recently developed Kamani-Donley-Rogers (KDR) model in terms of various material and rheometric functions, highlighting both divergences and parallels between the two models. Our numerical results on a host of material functions and rheological parameters illustrate the practical applicability and advantages of the new thermodynamically consistent model over the KDR model. Specifically, the new model complies with the second law of thermodynamics and can describe a broader range of rheological properties of yield stress fluids.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2406.00726 [pdf, other]

Versatile Braiding of Non-Hermitian Topological Edge States

Authors: Bofeng Zhu, Qiang Wang, You Wang, Qi Jie Wang, Y. D. Chong

Abstract: Among the most intriguing features of non-Hermitian (NH) systems is the capability of their complex energies to form braids under parametric variation. Several braiding behaviors, including link and knot formation, have been observed in experiments on synthetic NH systems such as looped optical fibers. Though the exact conditions for these phenomena remain unsettled, existing demonstrations have involved long-range nonreciprocal hoppings, which are hard to implement on many experimental platforms. Here, we introduce a route to realize complex energy braids using 1D NH Aubry-Andre-Harper lattices. Under purely local gain and loss modulation, the eigenstates exhibit a variety of different braiding behaviors, including unknots, Hopf links, trefoil knots, Solomon links and catenanes. We show how these are created by the interplay between non-Hermiticity and the lattice bulk states and topological edge states. The transitions between different braiding configurations are marked by changes in the global Berry phase of the NH lattice. These findings reveal interesting connections between the braiding of complex energies and NH band topology.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2406.00644 [pdf, other]

Ultrasound Report Generation with Cross-Modality Feature Alignment via Unsupervised Guidance

Authors: Jun Li, Tongkun Su, Baoliang Zhao, Faqin Lv, Qiong Wang, Nassir Navab, Ying Hu, Zhongliang Jiang

Abstract: Automatic report generation has arisen as a significant research area in computer-aided diagnosis, aiming to alleviate the burden on clinicians by generating reports automatically based on medical images. In this work, we propose a novel framework for automatic ultrasound report generation, leveraging a combination of unsupervised and supervised learning methods to aid the report generation process. Our framework incorporates unsupervised learning methods to extract potential knowledge from ultrasound text reports, serving as the prior information to guide the model in aligning visual and textual features, thereby addressing the challenge of feature discrepancy. Additionally, we design a global semantic comparison mechanism to enhance the performance of generating more comprehensive and accurate medical reports. To enable the implementation of ultrasound report generation, we constructed three large-scale ultrasound image-text datasets from different organs for training and validation purposes. Extensive evaluations with other state-of-the-art approaches exhibit its superior performance across all three datasets. Code and dataset are available at this link.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2406.00417 [pdf, other]

Construct ideal cotorsion pairs by recollement of triangulated categories

Authors: Qikai Wang, Haiyan Zhu

Abstract: Let $(\mathcal{T}',\mathcal{T},\mathcal{T}'')$ be a recollement of triangulated categories. Two complete ideal cotorsion pairs in $\mathcal{T}'$ and $\mathcal{T}''$ can be induced by a complete ideal cotorsion pair in $\mathcal{T}$. If $(\mathcal{I},\mathcal{I}^\perp )$ and $(\mathcal{J},\mathcal{J}^\perp)$ are two complete ideal cotorsion pairs in a triangulated category, then $(\mathcal{I}\cap\mathcal{J},\langle\mathcal{I}^\perp,\mathcal{J}^\perp\rangle)$ is also a complete ideal cotorsion pair. In this way, a series of ideal cotorsion pairs in $\mathcal{T}$ can be induced by two ideal cotorsion pairs in $\mathcal{T}'$ and $\mathcal{T}''$.

Submitted 1 June, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:1501.06810 by other authors

MSC Class: 18E40; 18G80

arXiv:2406.00330 [pdf, other]

Magnetic ground state of monolayer CeI$_{2}$: occupation matrix control and DFT+U calculations

Authors: Yue-Fei Hou, Shujing Li, Xinlong Yang, Wei Jiang, Qiuhao Wang, Fawei Zheng, Zhen-Guo Fu, Ping Zhang

Abstract: The magnetic ground state is crucial for the applications of two-dimensional magnets as it decides fundamental magnetic properties of the material, such as magnetic order, magnetic transition temperature, and low-energy excitation of the spin waves. However, the simulations for magnetism of local-electron systems are challenging due to the existence of metastable states. In this study, occupation matrix control (OMC) and density functional theory plus Hubbard $U$ calculations are applied to investigate the magnetic ground state of monolayer CeI$_{2}$. Following the predicted ferrimagnetic (FM) order, the FM ground state and the FM metastable states are identified and found to have different values of the magnetic parameters. Based on the calculated magnetic parameters of the FM ground state, the Curie temperature is estimated to be $128$ K for monolayer CeI$_{2}$. When spin-orbit coupling (SOC) is considered, the FM ground state is further confirmed to contain both off-plane and in-plane components of magnetization. SOC is shown to be essential for reasonably describing not only magnetic anisotropy but also local electronic orbital state of monolayer CeI$_{2}$.

Submitted 1 June, 2024; originally announced June 2024.

Comments: 4 figures. Comments are welcome

arXiv:2406.00242 [pdf, other]

Observational test for $f(Q)$ gravity with weak gravitational lensing

Authors: Qingqing Wang, Xin Ren, Yi-Fu Cai, Wentao Luo, Emmanuel N. Saridakis

Abstract: In this article we confront a class of $f(Q)$ gravity models with observational data of galaxy-galaxy lensing. Specifically, we consider the $f(Q)$ gravity models containing a small quadratic correction when compared with General Relativity (GR), and quantify this correction by a model parameter $α$. To derive the observational constraints, we start by extracting the spherically symmetric solutions which correspond to the deviations from the Schwarzschild solution that depends on the model parameter in a two-fold way, i.e., a renormalized mass and a new term proportional to $r^{-2}$. Then, we calculate the effective lensing potential, the deflection angle, the shear component, and the effective Excess Surface Density (ESD) profile. After that, we employ the group catalog and shape catalog from the SDSS DR7 for the lens and source samples respectively. Moreover, we handle the off-center radius as a free parameter and constrain it using the MCMC. Concerning the deviation parameter from GR we derive $α=1.202^{+0.277}_{-0.179}\times 10^{-6} {\rm Mpc}^{-2}$ at 1 $σ$ confidence level, and then compare the fitting efficiency with the standard $Λ$CDM paradigm by applying the AIC and BIC information criteria. Our results indicate that the $f(Q)$ corrections alongside off-center effects yield a scenario that is slightly favored.

Submitted 31 May, 2024; originally announced June 2024.

Comments: 12 pages, 2 figures
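The abstract above compares fits using the AIC and BIC information criteria. A minimal sketch of the standard definitions follows, assuming a Gaussian likelihood so that $-2\ln L_{\max}$ equals the minimum $χ^2$ up to an additive constant; the function names and all numbers are hypothetical and are not taken from the paper.

```python
import math

def aic(chi2_min, k):
    """Akaike information criterion: chi2_min + 2k, for k free parameters."""
    return chi2_min + 2 * k

def bic(chi2_min, k, n):
    """Bayesian information criterion: chi2_min + k ln(n), for n data points."""
    return chi2_min + k * math.log(n)

# Hypothetical comparison of two models fitted to the same n data points.
n_points = 50
model_a = {"chi2_min": 48.0, "k": 2}  # fewer parameters, worse fit
model_b = {"chi2_min": 41.0, "k": 4}  # more parameters, better fit

delta_aic = aic(model_b["chi2_min"], model_b["k"]) - aic(model_a["chi2_min"], model_a["k"])
delta_bic = (bic(model_b["chi2_min"], model_b["k"], n_points)
             - bic(model_a["chi2_min"], model_a["k"], n_points))
```

A negative difference favours the second model. Because BIC penalises each extra parameter by $\ln n$ rather than 2, it is the stricter criterion whenever $n > e^2 \approx 7.4$.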

arXiv:2406.00085 [pdf, other]

Augmentation-based Unsupervised Cross-Domain Functional MRI Adaptation for Major Depressive Disorder Identification

Authors: Yunling Ma, Chaojun Zhang, Xiaochuan Wang, Qianqian Wang, Liang Cao, Limei Zhang, Mingxia Liu

Abstract: Major depressive disorder (MDD) is a common mental disorder that typically affects a person's mood, cognition, behavior, and physical health. Resting-state functional magnetic resonance imaging (rs-fMRI) data are widely used for computer-aided diagnosis of MDD. While multi-site fMRI data can provide more data for training reliable diagnostic models, significant cross-site data heterogeneity would result in poor model generalizability. Many domain adaptation methods are designed to reduce the distributional differences between sites to some extent, but usually ignore the overfitting problem of the model on the source domain. Intuitively, target data augmentation can alleviate the overfitting problem by forcing the model to learn more generalized features and reduce the dependence on source domain data. In this work, we propose a new augmentation-based unsupervised cross-domain fMRI adaptation (AUFA) framework for automatic diagnosis of MDD. The AUFA consists of 1) a graph representation learning module for extracting rs-fMRI features with spatial attention, 2) a domain adaptation module for feature alignment between source and target data, 3) an augmentation-based self-optimization module for alleviating model overfitting on the source domain, and 4) a classification module. Experimental results on 1,089 subjects suggest that AUFA outperforms several state-of-the-art methods in MDD identification. Our approach not only reduces data heterogeneity between different sites, but also localizes disease-related functional connectivity abnormalities and provides interpretability for the model.

Submitted 6 June, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

arXiv:2405.20883 [pdf, other]

Scalable Distance-based Multi-Agent Relative State Estimation via Block Multiconvex Optimization

Authors: Tianyue Wu, Gongye Zaitian, Qianhao Wang, Fei Gao

Abstract: This paper explores the distance-based relative state estimation problem in large-scale systems, which is hard to solve effectively due to its high-dimensionality and non-convexity. In this paper, we alleviate this inherent hardness to simultaneously achieve scalability and robustness of inference on this problem. Our idea is launched from a universal geometric formulation, called \emph{generalized graph realization}, for the distance-based relative state estimation problem. Based on this formulation, we introduce two collaborative optimization models, one of which is convex and thus globally solvable, and the other enables fast searching on non-convex landscapes to refine the solution offered by the convex one. Importantly, both models enjoy \emph{multiconvex} and \emph{decomposable} structures, allowing efficient and safe solutions using \emph{block coordinate descent} that enjoys scalability and a distributed nature. The proposed algorithms collaborate to demonstrate superior or comparable solution precision to the current centralized convex relaxation-based methods, which are known for their high optimality. Distinctly, the proposed methods demonstrate scalability beyond the reach of previous convex relaxation-based methods. We also demonstrate that the combination of the two proposed algorithms achieves a more robust pipeline than deploying the local search method alone in a continuous-time scenario.

Submitted 31 May, 2024; originally announced May 2024.

Comments: To appear in Robotics: Science and Systems 2024

arXiv:2405.20676 [pdf, other]

Search for $e^{+}e^{-}\toη'ψ(2S)$ at center-of-mass energies from 4.66 to 4.95 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (638 additional authors not shown)

Abstract: Using data samples with an integrated luminosity of $4.67~\mathrm{fb}^{-1}$ collected by the BESIII detector operating at the BEPCII collider, we search for the process $e^+e^- \rightarrow η' ψ(2S)$ at center-of-mass energies from $4.66$ to $4.95~\mathrm{GeV}$. No significant signal is observed, and upper limits for the Born cross sections $σ^B(e^+e^-\rightarrowη'ψ(2S))$ at the 90\% confidence level are determined.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.20638 [pdf, other]

Study of the decays $χ_{cJ} \rightarrow Λ\barΛφ$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (637 additional authors not shown)

Abstract: Based on $(2712.4 \pm 14.3) \times 10^{6}$ $ e^{+}e^{-}\toψ(3686)$ events collected with the BESIII detector operating at the BEPCII collider, we report the first evidence of $χ_{c0}\to Λ\bar Λφ$ decays and the first observation of $χ_{c1,2}\to Λ\bar Λφ$ decays, with significances of $4.5σ$, $11.3σ$ and $13.0σ$, respectively. The decay branching fractions of $χ_{c0,1,2}\to Λ\bar Λφ$ are measured to be $( 2.99\pm1.24\pm0.19) \times 10^{-5}$, $(6.01\pm0.90\pm0.40 )\times 10^{-5}$, and $(7.13\pm0.81\pm0.36) \times 10^{-5}$, where the first uncertainties are statistical and the second systematic. No obvious enhancement near the $Λ\barΛ$ production threshold or excited $Λ$ state is found in the $Λφ$ (or $\barΛφ$) system.

Submitted 31 May, 2024; originally announced May 2024.

Comments: 10 pages, 9 figures

arXiv:2405.20448 [pdf, other]

Knockout: A simple way to handle missing inputs

Authors: Minh Nguyen, Batuhan K. Karaman, Heejong Kim, Alan Q. Wang, Fengbei Liu, Mert R. Sabuncu

Abstract: Deep learning models can extract predictive and actionable information from complex inputs. The richer the inputs, the better these models usually perform. However, models that leverage rich inputs (e.g., multi-modality) can be difficult to deploy widely, because some inputs may be missing at inference. Current popular solutions to this problem include marginalization, imputation, and training multiple models. Marginalization can obtain calibrated predictions but it is computationally costly and therefore only feasible for low dimensional inputs. Imputation may result in inaccurate predictions because it employs point estimates for missing variables and does not work well for high dimensional inputs (e.g., images). Training multiple models whereby each model takes different subsets of inputs can work well but requires knowing missing input patterns in advance. Furthermore, training and retaining multiple models can be costly. We propose an efficient way to learn both the conditional distribution using full inputs and the marginal distributions. Our method, Knockout, randomly replaces input features with appropriate placeholder values during training. We provide a theoretical justification of Knockout and show that it can be viewed as an implicit marginalization strategy. We evaluate Knockout in a wide range of simulations and real-world datasets and show that it can offer strong empirical performance.

Submitted 3 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.
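The Knockout abstract above describes randomly replacing input features with placeholder values during training. The sketch below illustrates that idea in its simplest form, assuming an independent per-feature replacement probability and a single scalar placeholder; the name `knockout`, the constant `PLACEHOLDER`, and both choices are hypothetical and may differ from the paper's actual scheme.

```python
import random

# Hypothetical placeholder, assumed to lie outside the valid range of real features.
PLACEHOLDER = -1.0

def knockout(features, p_drop, rng, placeholder=PLACEHOLDER):
    """Return a copy of `features` in which each entry is independently
    replaced by `placeholder` with probability `p_drop`."""
    return [placeholder if rng.random() < p_drop else x for x in features]

rng = random.Random(0)             # seeded for reproducibility
sample = [0.2, 0.7, 0.5, 0.9]      # hypothetical features in [0, 1]
augmented = knockout(sample, p_drop=0.5, rng=rng)
```

At inference time, a genuinely missing feature would be filled with the same placeholder, so the model sees a pattern it already encountered during training.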

arXiv:2405.20330 [pdf, other]

4DHands: Reconstructing Interactive Hands in 4D with Transformers

Authors: Dixuan Lin, Yuxiang Zhang, Mengcheng Li, Yebin Liu, Wei Jing, Qi Yan, Qianying Wang, Hongwen Zhang

Abstract: In this paper, we introduce 4DHands, a robust approach to recovering interactive hand meshes and their relative movement from monocular inputs. Our approach addresses two major limitations of previous methods: lacking a unified solution for handling various hand image inputs and neglecting the positional relationship of two hands within images. To overcome these challenges, we develop a transformer-based architecture with novel tokenization and feature fusion strategies. Specifically, we propose a Relation-aware Two-Hand Tokenization (RAT) method to embed positional relation information into the hand tokens. In this way, our network can handle both single-hand and two-hand inputs and explicitly leverage relative hand positions, facilitating the reconstruction of intricate hand interactions in real-world scenarios. As such tokenization indicates the relative relationship of two hands, it also supports more effective feature fusion. To this end, we further develop a Spatio-temporal Interaction Reasoning (SIR) module to fuse hand tokens in 4D with attention and decode them into 3D hand meshes and relative temporal movements. The efficacy of our approach is validated on several benchmark datasets. The results on in-the-wild videos and real-world scenarios demonstrate the superior performances of our approach for interactive hand reconstruction. More video results can be found on the project page: https://4dhands.github.io.

Submitted 31 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

Comments: More demo videos can be seen at our project page: https://4dhands.github.io

arXiv:2405.20015 [pdf, other]

Efficient LLM-Jailbreaking by Introducing Visual Modality

Authors: Zhenxing Niu, Yuyao Sun, Haodong Ren, Haoxuan Ji, Quan Wang, Xiaoke Ma, Gang Hua, Rong Jin

Abstract: This paper focuses on jailbreaking attacks against large language models (LLMs), eliciting them to generate objectionable content in response to harmful user queries. Unlike previous LLM-jailbreaks that directly orient to LLMs, our approach begins by constructing a multimodal large language model (MLLM) through the incorporation of a visual module into the target LLM. Subsequently, we conduct an efficient MLLM-jailbreak to generate jailbreaking embeddings embJS. Finally, we convert the embJS into text space to facilitate the jailbreaking of the target LLM. Compared to direct LLM-jailbreaking, our approach is more efficient, as MLLMs are more vulnerable to jailbreaking than pure LLM. Additionally, to improve the attack success rate (ASR) of jailbreaking, we propose an image-text semantic matching scheme to identify a suitable initial input. Extensive experiments demonstrate that our approach surpasses current state-of-the-art methods in terms of both efficiency and effectiveness. Moreover, our approach exhibits superior cross-class jailbreaking capabilities.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19695 [pdf, other]

Distribution Aligned Semantics Adaption for Lifelong Person Re-Identification

Authors: Qizao Wang, Xuelin Qian, Bin Li, Xiangyang Xue

Abstract: In real-world scenarios, person Re-IDentification (Re-ID) systems need to be adaptable to changes in space and time. Therefore, the adaptation of Re-ID models to new domains while preserving previously acquired knowledge is crucial, known as Lifelong person Re-IDentification (LReID). Advanced LReID methods rely on replaying exemplars from old domains and applying knowledge distillation in logits with old models. However, due to privacy concerns, retaining previous data is inappropriate. Additionally, the fine-grained and open-set characteristics of Re-ID limit the effectiveness of the distillation paradigm for accumulating knowledge. We argue that a Re-ID model trained on diverse and challenging pedestrian images at a large scale can acquire robust and general human semantic knowledge. These semantics can be readily utilized as shared knowledge for lifelong applications. In this paper, we identify the challenges and discrepancies associated with adapting a pre-trained model to each application domain, and introduce the Distribution Aligned Semantics Adaption (DASA) framework. It efficiently adjusts Batch Normalization (BN) to mitigate interference from data distribution discrepancy and freezes the pre-trained convolutional layers to preserve shared knowledge. Additionally, we propose the lightweight Semantics Adaption (SA) module, which effectively adapts learned semantics to enhance pedestrian representations.
Extensive experiments demonstrate the remarkable superiority of our proposed framework over advanced LReID methods, and it exhibits significantly reduced storage consumption. DASA presents a novel and cost-effective perspective on effectively adapting pre-trained models for LReID.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19596 [pdf, ps, other]

The weight hierarchies of three classes of linear codes

Authors: Wei Lu, Qingyao Wang, Xiaoqiang Wang, Dabin Zheng

Abstract: Studying the generalized Hamming weights of linear codes is a significant research area within coding theory, as it provides valuable structural information about the codes and plays a crucial role in determining their performance in various applications. However, determining the generalized Hamming weights of linear codes, particularly their weight hierarchy, is generally a challenging task. In this paper, we focus on investigating the generalized Hamming weights of three classes of linear codes over finite fields. These codes are constructed by different defining sets. By analysing the intersections between the definition sets and the duals of all $r$-dimensional subspaces, we get the inequalities on the sizes of these intersections. Then constructing subspaces that reach the upper bounds of these inequalities, we successfully determine the complete weight hierarchies of these codes.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.19419 [pdf, other]

Supernova Electron-Neutrino Interactions with Xenon in the nEXO Detector

Authors: nEXO Collaboration, S. Hedges, S. Al Kharusi, E. Angelico, J. P. Brodsky, G. Richardson, S. Wilde, A. Amy, A. Anker, I. J. Arnquist, P. Arsenault, A. Atencio, I. Badhrees, J. Bane, V. Belov, E. P. Bernard, T. Bhatta, A. Bolotnikov, J. Breslin, P. A. Breur, E. Brown, T. Brunner, E. Caden, G. F. Cao, L. Q. Cao, et al. (121 additional authors not shown)

Abstract: Electron-neutrino charged-current interactions with xenon nuclei were modeled in the nEXO neutrinoless double-beta decay detector (~5-tonne, 90% ${}^{136}$Xe, 10% ${}^{134}$Xe) to evaluate its sensitivity to supernova neutrinos. Predictions for event rates and detectable signatures were modeled using the MARLEY event generator. We find good agreement between MARLEY's predictions and existing theoretical calculations of the inclusive cross sections at supernova neutrino energies. The interactions modeled by MARLEY were simulated within the nEXO simulation framework and were run through an example reconstruction algorithm to determine the detector's efficiency for reconstructing these events. The simulated data, incorporating the detector response, were used to study the ability of nEXO to reconstruct the incident electron-neutrino spectrum and these results were extended to a larger xenon detector of the same isotope enrichment. We estimate that nEXO will be able to observe electron-neutrino interactions with xenon from supernovae as far as 5 to 8 kpc from earth, while the ability to reconstruct incident electron-neutrino spectrum parameters from observed interactions in nEXO is limited to closer supernovae.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 17 pages, 16 figures

Report number: LLNL-JRNL-864783-DRAFT

arXiv:2405.19007 [pdf, other]

Non-Hermitian theory of valley excitons in two-dimensional semiconductors

Authors: Qiutong Wang, Ci Li, Qingjun Tong

Abstract: Electron-hole exchange interaction in two-dimensional transition metal dichalcogenides is extremely strong due to the dimension reduction, which promises valley-superposed excitonic states with linearly polarized optical emissions. However, strong circular polarization reflecting valley-polarized excitonic states is commonly observed in helicity-resolved optical experiments. Here we present a non-Hermitian theory of valley excitons by incorporating optical pumping and intrinsic decay, which unveils an anomalous valley-polarized excitonic state with elliptically polarized optical emission. This novel state arises from the non-Hermiticity induced parity-time ($\mathcal{PT}$)-symmetry breaking, which impedes the experimental observation of intervalley excitonic coherence effect. At large excitonic center-of-mass momenta, the $\mathcal{PT}$-symmetry is restored and the excitonic states recover their valley coherence. Interestingly, the linear polarization directions in optical emissions from these valley-superposed excitonic states are non-orthogonal and even become parallel at exceptional points. Our non-Hermitian theory also predicts a non-zero Berry curvature for valley excitons, which admits a topological excitonic Hall transport beyond the Hermitian predictions.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 6 pages, 4 figures

arXiv:2405.18801 [pdf, other]

SketchTriplet: Self-Supervised Scenarized Sketch-Text-Image Triplet Generation

Authors: Zhenbei Wu, Qiang Wang, Jie Yang

Abstract: The scarcity of free-hand sketch presents a challenging problem. Despite the emergence of some large-scale sketch datasets, these datasets primarily consist of sketches at the single-object level. There continues to be a lack of large-scale paired datasets for scene sketches. In this paper, we propose a self-supervised method for scene sketch generation that does not rely on any existing scene sketch, enabling the transformation of single-object sketches into scene sketches. To accomplish this, we introduce a method for vector sketch captioning and sketch semantic expansion. Additionally, we design a sketch generation network that incorporates a fusion of multi-modal perceptual constraints, suitable for application in zero-shot image-to-sketch downstream task, demonstrating state-of-the-art performance through experimental validation. Finally, leveraging our proposed sketch-to-sketch generation method, we contribute a large-scale dataset centered around scene sketches, comprising highly semantically consistent "text-sketch-image" triplets. Our research confirms that this dataset can significantly enhance the capabilities of existing models in sketch-based image retrieval and sketch-controlled image synthesis tasks. We will make our dataset and code publicly available.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18258 [pdf, other]

Text-only Synthesis for Image Captioning

Authors: Qing Zhou, Junlin Huang, Qiang Li, Junyu Gao, Qi Wang

Abstract: From paired image-text training to text-only training for image captioning, the pursuit of relaxing the requirements for high-cost and large-scale annotation of good quality data remains consistent. In this paper, we propose Text-only Synthesis for Image Captioning (ToCa), which further advances this relaxation with less human labor and less computing time. Specifically, we deconstruct caption text into structures and lexical words, which serve as the fundamental components of the caption. By combining different structures and lexical words as inputs to the large language model, massive captions that contain various patterns of lexical words are generated. This method not only approaches the target domain but also surpasses it by generating new captions, thereby enhancing the zero-shot generalization ability of the model. Considering the different levels of data access in the real world, we define three synthesis scenarios: cross-domain synthesis, in-domain synthesis, and data-efficient synthesis. Experiments in these scenarios demonstrate the generalizability, transferability and practicability of ToCa with a nearly 5 CIDEr improvement for zero-shot cross-domain captioning and a maximum increase of over 20 CIDEr for data-efficient captioning.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.18156 [pdf, other]

VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation

Authors: Qilin Wang, Zhengkai Jiang, Chengming Xu, Jiangning Zhang, Yabiao Wang, Xinyi Zhang, Yun Cao, Weijian Cao, Chengjie Wang, Yanwei Fu

Abstract: Human image animation involves generating a video from a static image by following a specified pose sequence. Current approaches typically adopt a multi-stage pipeline that separately learns appearance and motion, which often leads to appearance degradation and temporal inconsistencies. To address these issues, we propose VividPose, an innovative end-to-end pipeline based on Stable Video Diffusion (SVD) that ensures superior temporal stability. To enhance the retention of human identity, we propose an identity-aware appearance controller that integrates additional facial information without compromising other appearance details such as clothing texture and background. This approach ensures that the generated videos maintain high fidelity to the identity of human subject, preserving key facial features across various poses. To accommodate diverse human body shapes and hand movements, we introduce a geometry-aware pose controller that utilizes both dense rendering maps from SMPL-X and sparse skeleton maps. This enables accurate alignment of pose and shape in the generated videos, providing a robust framework capable of handling a wide range of body shapes and dynamic hand movements. Extensive qualitative and quantitative experiments on the UBCFashion and TikTok benchmarks demonstrate that our method achieves state-of-the-art performance. Furthermore, VividPose exhibits superior generalization capabilities on our proposed in-the-wild dataset. Codes and models will be available.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17792 [pdf, other]

JUNO Sensitivity to Invisible Decay Modes of Neutrons

Authors: JUNO Collaboration, Angel Abusleme, Thomas Adam, Kai Adamowicz, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, Marco Beretta, Antonio Bergnoli, Daniel Bick, et al. (635 additional authors not shown)

Abstract: We explore the decay of bound neutrons into invisible particles (e.g., $n\rightarrow 3 ν$ or $nn \rightarrow 2 ν$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $ n \rightarrow { inv} $ and $ nn \rightarrow { inv} $. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation modes of the excited residual nuclei can produce a time- and space-correlated triple coincidence signal in the JUNO detector. Based on a full Monte Carlo simulation informed with the latest available data, we estimate all backgrounds, including inverse beta decay events of the reactor antineutrino $\barν_e$, natural radioactivity, cosmogenic isotopes and neutral current interactions of atmospheric neutrinos. Pulse shape discrimination and multivariate analysis techniques are employed to further suppress backgrounds. With two years of exposure, JUNO is expected to give an order of magnitude improvement compared to the current best limits. After 10 years of data taking, the JUNO expected sensitivities at a 90% confidence level are $τ/B( n \rightarrow { inv} ) > 5.0 \times 10^{31} \, {\rm yr}$ and $τ/B( nn \rightarrow { inv} ) > 1.4 \times 10^{32} \, {\rm yr}$.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 28 pages, 7 figures, 4 tables

arXiv:2405.17778 [pdf, other]

Synthetic non-Abelian charges in degenerate Fermi gases

Authors: Qi-Dong Wang, Yan-Qing Zhu, Shi-Liang Zhu, Zhen Zheng

Abstract: Topological phases associated with non-Abelian charges can exhibit a distinguished bulk-edge correspondence compared to Abelian phases, although elucidating this relationship remains challenging in traditional solid-state systems. In this paper, we propose a theoretical framework for synthesizing non-Abelian quaternion charges in degenerate Fermi gases. By designing artificial spin-orbit coupling patterns, the topological edge modes demonstrate a clear correspondence with the band topology determined by various quaternion charges. This paves the way for observing the interface modes whose existence is attributed to the non-conservation multiplication relation, which is fundamental to non-Abelian charges. This scheme can be readily implemented using current ultracold atom techniques, offering a promising approach to explore the intriguing non-Abelian characteristics of the system.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 8 pages, 6 figures

arXiv:2405.17306 [pdf, other]

Controllable Longer Image Animation with Diffusion Models

Authors: Qiang Wang, Minghua Liu, Junjun Hu, Fan Jiang, Mu Xu

Abstract: Generating realistic animated videos from static images is an important area of research in computer vision. Methods based on physical simulation and motion prediction have achieved notable advances, but they are often limited to specific object textures and motion trajectories, failing to exhibit highly complex environments and physical dynamics. In this paper, we introduce an open-domain controllable image animation method using motion priors with video diffusion models. Our method achieves precise control over the direction and speed of motion in the movable region by extracting the motion field information from videos and learning moving trajectories and strengths. Current pretrained video generation models are typically limited to producing very short videos, typically less than 30 frames. In contrast, we propose an efficient long-duration video generation method based on noise reschedule specifically tailored for image animation tasks, facilitating the creation of videos over 100 frames in length while maintaining consistency in content scenery and motion coordination. Specifically, we decompose the denoise process into two distinct phases: the shaping of scene contours and the refining of motion details. Then we reschedule the noise to control the generated frame sequences maintaining long-distance noise correlation. We conducted extensive experiments with 10 baselines, encompassing both commercial tools and academic methodologies, which demonstrate the superiority of our method.
Our project page: https://wangqiang9.github.io/Controllable.github.io/

Submitted 27 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

Comments: https://wangqiang9.github.io/Controllable.github.io/

arXiv:2405.17279 [pdf]

Socially-Aware Shared Control Navigation for Assistive Mobile Robots in the Built Environment

Authors: Yifan Xu, Qianwei Wang, Vineet Kamat, Carol Menassa

Abstract: As the number of Persons with Disabilities (PWD), particularly those with one or more physical impairments, increases, there is an increasing demand for assistive robotic technologies that can support independent mobility in the built environment and reduce the burden on caregivers. Current assistive mobility platforms (e.g., robotic wheelchairs) often fail to incorporate user preferences and control, leading to reduced trust and efficiency. Existing shared control algorithms do not allow the incorporation of the user control preferences inside the navigation framework or the path planning algorithm. In addition, existing dynamic local planner algorithms for robotic wheelchairs do not take into account the social spaces of people, potentially leading such platforms to infringe upon these areas and cause discomfort. To address these concerns, this work introduces a novel socially-aware shared autonomy-based navigation system for assistive mobile robotic platforms. Our navigation framework comprises a Global Planner and a Local Planner. To implement the Global Planner, the proposed approach introduces a novel User Preference Field (UPF) theory within its global planning framework, explicitly acknowledging user preferences to adeptly navigate away from congested areas. For the Local Planner, we propose a Socially-aware Shared Control-based Model Predictive Control with Dynamic Control Barrier Function (SS-MPC-DCBF) to adjust movements in real-time, integrating user preferences for safer, more autonomous navigation.
Evaluation results show that our Global Planner aligns closely with user preferences compared to baselines, and our Local Planner demonstrates enhanced safety and efficiency in dynamic and static scenarios. This integrated approach fosters trust and autonomy, crucial for the acceptance of assistive mobility technologies in the built environment.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 42 pages, 14 figures

arXiv:2405.16952 [pdf, other]

A Variance-Preserving Interpolation Approach for Diffusion Models with Applications to Single Channel Speech Enhancement and Recognition

Authors: Zilu Guo, Qing Wang, Jun Du, Jia Pan, Qing-Feng Liu, Chin-Hui

Abstract: In this paper, we propose a variance-preserving interpolation framework to improve diffusion models for single-channel speech enhancement (SE) and automatic speech recognition (ASR). This new variance-preserving interpolation diffusion model (VPIDM) approach requires only 25 iterative steps and obviates the need for a corrector, an essential element in the existing variance-exploding interpolation diffusion model (VEIDM). Two notable distinctions between VPIDM and VEIDM are the scaling function of the mean of state variables and the constraint imposed on the variance relative to the mean's scale. We conduct a systematic exploration of the theoretical mechanism underlying VPIDM and develop insights regarding VPIDM's applications in SE and ASR using VPIDM as a frontend. Our proposed approach, evaluated on two distinct data sets, demonstrates VPIDM's superior performances over conventional discriminative SE algorithms. Furthermore, we assess the performance of the proposed model under varying signal-to-noise ratio (SNR) levels. The investigation reveals VPIDM's improved robustness in target noise elimination when compared to VEIDM. Furthermore, utilizing the mid-outputs of both VPIDM and VEIDM results in enhanced ASR accuracies, thereby highlighting the practical efficacy of our proposed approach.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16947 [pdf, other]

Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models

Authors: Qian Wang, Abdelrahman Eldesokey, Mohit Mendiratta, Fangneng Zhan, Adam Kortylewski, Christian Theobalt, Peter Wonka

Abstract: We introduce the first zero-shot approach for Video Semantic Segmentation (VSS) based on pre-trained diffusion models. A growing research direction attempts to employ diffusion models to perform downstream vision tasks by exploiting their deep understanding of image semantics. Yet, the majority of these approaches have focused on image-related tasks like semantic correspondence and segmentation, with less emphasis on video tasks such as VSS. Ideally, diffusion-based image semantic segmentation approaches can be applied to videos in a frame-by-frame manner. However, we find their performance on videos to be subpar due to the absence of any modeling of temporal information inherent in the video data. To this end, we tackle this problem and introduce a framework tailored for VSS based on pre-trained image and video diffusion models. We propose building a scene context model based on the diffusion features, where the model is autoregressively updated to adapt to scene changes. This context model predicts per-frame coarse segmentation maps that are temporally consistent. To refine these maps further, we propose a correspondence-based refinement strategy that aggregates predictions temporally, resulting in more confident predictions. Finally, we introduce a masked modulation approach to upsample the coarse maps to the full resolution at a high quality. Experiments show that our proposed approach outperforms existing zero-shot image semantic segmentation approaches significantly on various VSS benchmarks without any training or fine-tuning.
Moreover, it rivals supervised VSS approaches on the VSPW dataset despite not being explicitly trained for VSS.

Submitted 27 May, 2024; originally announced May 2024.

Comments: Project webpage: https://qianwangx.github.io/VidSeg_diffusion/

arXiv:2405.16600 [pdf, other]

Image-Text-Image Knowledge Transferring for Lifelong Person Re-Identification with Hybrid Clothing States

Authors: Qizao Wang, Xuelin Qian, Bin Li, Yanwei Fu, Xiangyang Xue

Abstract: With the continuous expansion of intelligent surveillance networks, lifelong person re-identification (LReID) has received widespread attention, pursuing the need of self-evolution across different domains. However, existing LReID studies accumulate knowledge with the assumption that people would not change their clothes. In this paper, we propose a more practical task, namely lifelong person re-identification with hybrid clothing states (LReID-Hybrid), which takes a series of cloth-changing and cloth-consistent domains into account during lifelong learning. To tackle the challenges of knowledge granularity mismatch and knowledge presentation mismatch that occurred in LReID-Hybrid, we take advantage of the consistency and generalization of the text space, and propose a novel framework, dubbed $Teata$, to effectively align, transfer and accumulate knowledge in an "image-text-image" closed loop. Concretely, to achieve effective knowledge transfer, we design a Structured Semantic Prompt (SSP) learning to decompose the text prompt into several structured pairs to distill knowledge from the image space with a unified granularity of text description. Then, we introduce a Knowledge Adaptation and Projection strategy (KAP), which tunes text knowledge via a slow-paced learner to adapt to different tasks without catastrophic forgetting. Extensive experiments demonstrate the superiority of our proposed $Teata$ for LReID-Hybrid as well as on conventional LReID benchmarks over advanced methods.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.16597 [pdf, other]

Content and Salient Semantics Collaboration for Cloth-Changing Person Re-Identification

Authors: Qizao Wang, Xuelin Qian, Bin Li, Lifeng Chen, Yanwei Fu, Xiangyang Xue

Abstract: Cloth-changing person Re-IDentification (Re-ID) aims at recognizing the same person with clothing changes across non-overlapping cameras. Conventional person Re-ID methods usually bias the model's focus on cloth-related appearance features rather than identity-sensitive features associated with biological traits. Recently, advanced cloth-changing person Re-ID methods either resort to identity-related auxiliary modalities (e.g., sketches, silhouettes, keypoints and 3D shapes) or clothing labels to mitigate the impact of clothes. However, relying on impractical and inflexible auxiliary modalities or annotations limits their real-world applicability. In this paper, we promote cloth-changing person Re-ID by effectively leveraging abundant semantics present within pedestrian images without the need for any auxiliaries. Specifically, we propose the Content and Salient Semantics Collaboration (CSSC) framework, facilitating cross-parallel semantics interaction and refinement. Our framework is simple yet effective, and the vital design is the Semantics Mining and Refinement (SMR) module. It extracts robust identity features about content and salient semantics, while mitigating interference from clothing appearances effectively. By capitalizing on the mined abundant semantic features, our proposed approach achieves state-of-the-art performance on three cloth-changing benchmarks as well as conventional benchmarks, demonstrating its superiority over advanced competitors.

Submitted 26 May, 2024; originally announced May 2024.

Showing 101–150 of 5,068 results for author: Wang, Q