GraphMamba: An Efficient Graph Structure Learning Vision Mamba for Hyperspectral Image Classification

Authors: Aitao Yang, Min Li, Yao Ding, Leyuan Fang, Yaoming Cai, Yujie He

Abstract: Efficient extraction of spectral sequences and geospatial information has always been a hot topic in hyperspectral image classification. In terms of spectral sequence feature capture, RNN and Transformer have become mainstream classification frameworks due to their long-range feature capture capabilities. In terms of spatial information aggregation, CNN enhances the receptive field to retain integrated spatial information as much as possible. However, the spectral feature-capturing architectures exhibit low computational efficiency, and CNNs lack the flexibility to perceive spatial contextual information. To address these issues, this paper proposes GraphMamba, an efficient graph structure learning vision Mamba classification framework that fully considers HSI characteristics to achieve deep spatial-spectral information mining. Specifically, we propose a novel hyperspectral visual GraphMamba processing paradigm (HVGM) that preserves spatial-spectral features by constructing spatial-spectral cubes and utilizes linear spectral encoding to enhance the operability of subsequent tasks. The core components of GraphMamba include the HyperMamba module for improving computational efficiency and the SpatialGCN module for adaptive spatial context awareness. The HyperMamba mitigates clutter interference by employing the global mask (GM) and introduces a parallel training inference architecture to alleviate computational bottlenecks. The SpatialGCN incorporates weighted multi-hop aggregation (WMA) spatial encoding to focus on highly correlated spatial structural features, thus flexibly aggregating contextual information while mitigating spatial noise interference. Extensive experiments were conducted on three different scales of real HSI datasets, and compared with the state-of-the-art classification frameworks, GraphMamba achieved optimal performance.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 13 pages, 10 figures

arXiv:2407.06116 [pdf]

Data-driven Nucleus Subclassification on Colon H&E using Style-transferred Digital Pathology

Authors: Lucas W. Remedios, Shunxing Bao, Samuel W. Remedios, Ho Hin Lee, Leon Y. Cai, Thomas Li, Ruining Deng, Nancy R. Newlin, Adam M. Saunders, Can Cui, Jia Li, Qi Liu, Ken S. Lau, Joseph T. Roland, Mary K Washington, Lori A. Coburn, Keith T. Wilson, Yuankai Huo, Bennett A. Landman

Abstract: Understanding the way cells communicate, co-locate, and interrelate is essential to furthering our understanding of how the body functions. H&E is widely available; however, cell subtyping often requires expert knowledge and the use of specialized stains. To reduce the annotation burden, AI has been proposed for the classification of cells on H&E. For example, the recent Colon Nucleus Identification and Classification (CoNIC) Challenge focused on labeling 6 cell types on H&E of the colon. However, the CoNIC Challenge was unable to classify epithelial subtypes (progenitor, enteroendocrine, goblet), lymphocyte subtypes (B, helper T, cytotoxic T), and connective subtypes (fibroblasts). We use inter-modality learning to label previously un-labelable cell types on H&E. We take advantage of multiplexed immunofluorescence (MxIF) histology to label 14 cell subclasses. We performed style transfer on the same MxIF tissues to synthesize realistic virtual H&E, which we paired with the MxIF-derived cell subclassification labels. We evaluated the efficacy of using a supervised learning scheme where the input was realistic-quality virtual H&E and the labels were MxIF-derived cell subclasses. We assessed our model on private virtual H&E and public real H&E. On virtual H&E, we were able to classify helper T cells and epithelial progenitors with positive predictive values of $0.34 \pm 0.15$ (prevalence $0.03 \pm 0.01$) and $0.47 \pm 0.1$ (prevalence $0.07 \pm 0.02$) respectively, when using ground truth centroid information. On real H&E we could classify helper T cells and epithelial progenitors with upper bound positive predictive values of $0.43 \pm 0.03$ (parent class prevalence 0.21) and $0.94 \pm 0.02$ (parent class prevalence 0.49) when using ground truth centroid information. This is the first work to provide cell type classification for helper T and epithelial progenitor nuclei on H&E.

Submitted 15 May, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2401.05602

arXiv:2407.04016 [pdf, other]

Mitigating Low-Frequency Bias: Feature Recalibration and Frequency Attention Regularization for Adversarial Robustness

Authors: Kejia Zhang, Juanjuan Weng, Yuanzheng Cai, Zhiming Luo, Shaozi Li

Abstract: Ensuring the robustness of computer vision models against adversarial attacks is a significant and long-lasting objective. Motivated by adversarial attacks, researchers have devoted considerable efforts to enhancing model robustness by adversarial training (AT). However, we observe that while AT improves the models' robustness against adversarial perturbations, it fails to improve their ability to effectively extract features across all frequency components. Each frequency component contains distinct types of crucial information: low-frequency features provide fundamental structural insights, while high-frequency features capture intricate details and textures. In particular, AT tends to neglect the reliance on susceptible high-frequency features. This low-frequency bias impedes the model's ability to effectively leverage the potentially meaningful semantic information present in high-frequency features. This paper proposes a novel module called High-Frequency Feature Disentanglement and Recalibration (HFDR), which separates features into high-frequency and low-frequency components and recalibrates the high-frequency feature to capture latent useful semantics. Additionally, we introduce frequency attention regularization to magnitude the model's extraction of different frequency features and mitigate low-frequency bias during AT. Extensive experiments showcase the immense potential and superiority of our approach in resisting various white-box attacks, transfer attacks, and showcasing strong generalization capabilities.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.02745 [pdf, other]

PWTO: A Heuristic Approach for Trajectory Optimization in Complex Terrains

Authors: Yilin Cai, Zhongqiang Ren

Abstract: This paper considers a trajectory planning problem for a robot navigating complex terrains, which arises in applications ranging from autonomous mining vehicles to planetary rovers. The problem seeks to find a low-cost dynamically feasible trajectory for the robot. The problem is challenging as it requires solving a non-linear optimization problem that often has many local minima due to the complex terrain. To address the challenge, we propose a method called Pareto-optimal Warm-started Trajectory Optimization (PWTO) that attempts to combine the benefits of graph search and trajectory optimization, two very different approaches to planning. PWTO first creates a state lattice using simplified dynamics of the robot and leverages a multi-objective graph search method to obtain a set of paths. Each of the paths is then used to warm-start a local trajectory optimization process, so that different local minima are explored to find a globally low-cost solution. In our tests, the solution cost computed by PWTO is often less than half of the costs computed by the baselines. In addition, we verify the trajectories generated by PWTO in Gazebo simulation in complex terrains with both wheeled and quadruped robots. The code of this paper is open sourced and can be found at https://github.com/rap-lab-org/public_pwto.

Submitted 2 July, 2024; originally announced July 2024.
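
Illustrative note (not from the paper): the warm-starting idea in the abstract above, running a local optimizer from several different initial guesses and keeping the cheapest result, can be sketched on a toy problem. Everything below is an assumption made for illustration: the one-dimensional cost function, the plain gradient descent, and the names `cost`, `local_descent`, and `warm_started_search` are hypothetical and are not PWTO's lattice search or trajectory optimizer.

```python
# Toy sketch of multi-start ("warm-started") local optimization.
# The cost function is a made-up 1-D example with two local minima;
# it stands in for a non-convex trajectory cost and is NOT from PWTO.

def cost(x):
    return x**4 - 3.0 * x**2 + x

def grad(x):
    # Analytic derivative of the toy cost above.
    return 4.0 * x**3 - 6.0 * x + 1.0

def local_descent(x0, lr=0.01, steps=2000):
    """Plain gradient descent; converges to the local minimum near x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def warm_started_search(initial_guesses):
    """Run one local optimization per initial guess and keep the cheapest."""
    candidates = [local_descent(x0) for x0 in initial_guesses]
    return min(candidates, key=cost)

# Two initial guesses land in different basins; the search keeps the better one.
best = warm_started_search([-2.0, 2.0])
```

In this toy setting a single start at `2.0` settles in the shallower basin, while the search over both starts returns the lower-cost minimum on the negative side.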

arXiv:2407.02395 [pdf, other]

Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval

Authors: Jiexin Wang, Xitong Luo, Liuwen Cao, Hongkui He, Hailin Huang, Jiayuan Xie, Adam Jatowt, Yi Cai

Abstract: Large language models (LLMs) have brought significant advancements to code generation and code repair, benefiting both novice and experienced developers. However, their training using unsanitized data from open-source repositories, like GitHub, raises the risk of inadvertently propagating security vulnerabilities. Despite numerous studies investigating the safety of code LLMs, there remains a gap in comprehensively addressing their security features. In this work, we present a comprehensive study aimed at precisely evaluating and enhancing the security aspects of code LLMs. To support our research, we introduce CodeSecEval, a meticulously curated dataset designed to address 44 critical vulnerability types with 180 distinct samples. CodeSecEval serves as the foundation for the automatic evaluation of code models in two crucial tasks: code generation and code repair, with a strong emphasis on security. Our experimental results reveal that current models frequently overlook security issues during both code generation and repair processes, resulting in the creation of vulnerable code. In response, we propose different strategies that leverage vulnerability-aware information and insecure code explanations to mitigate these security vulnerabilities. Furthermore, our findings highlight that certain vulnerability types particularly challenge model performance, influencing their effectiveness in real-world applications. Based on these findings, we believe our study will have a positive impact on the software engineering community, inspiring the development of improved methods for training and utilizing LLMs, thereby leading to safer and more trustworthy model deployment.

Submitted 4 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2310.16263

arXiv:2407.02353 [pdf, other]

Roadmap to Neuromorphic Computing with Emerging Technologies

Authors: Adnan Mehonic, Daniele Ielmini, Kaushik Roy, Onur Mutlu, Shahar Kvatinsky, Teresa Serrano-Gotarredona, Bernabe Linares-Barranco, Sabina Spiga, Sergey Savelev, Alexander G Balanov, Nitin Chawla, Giuseppe Desoli, Gerardo Malavena, Christian Monzio Compagnoni, Zhongrui Wang, J Joshua Yang, Ghazi Sarwat Syed, Abu Sebastian, Thomas Mikolajick, Beatriz Noheda, Stefan Slesazeck, Bernard Dieny, Tuo-Hung Hou, Akhil Varri, et al. (28 additional authors not shown)

Abstract: The roadmap is organized into several thematic sections, outlining current computing challenges, discussing the neuromorphic computing approach, analyzing mature and currently utilized technologies, providing an overview of emerging technologies, addressing material challenges, exploring novel computing concepts, and finally examining the maturity level of emerging technologies while determining the next essential steps for their advancement.

Submitted 5 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

Comments: 90 pages, 22 figures, roadmap, neuromorphic

arXiv:2407.02210 [pdf, other]

Baryon Acoustic Oscillations analyses with Density-Split Statistics

Authors: Tengpeng Xu, Yan-Chuan Cai, Yun Chen, Mark Neyrinck, Liang Gao, Qiao Wang

Abstract: Accurate modeling for the evolution of the Baryon Acoustic Oscillations (BAO) is essential for using it as a standard ruler to probe cosmology. We explore the non-linearity of the BAO in different environments using the density-split statistics and compare them to the case of the conventional two-point correlation function (2PCF). We detect density-dependent shifts for the position of the BAO with respect to its linear version using halos from N-body simulations. Around low/high-densities, the scale of the BAO expands/contracts due to non-linear peculiar velocities. As the simulation evolves from redshift 1 to 0, the difference in the magnitude of the shifts between high- and low-density regions increases from the sub-percent to the percent level. In contrast, the scale of the BAO does not evolve in the total 2PCF in the same redshift range. The width of the BAO around high density regions increases as the universe evolves, similar to the known broadening of the BAO in the 2PCF due to non-linear evolution. In contrast, the width is smaller and stable for low density regions. We discuss possible implications for the reconstructions of the BAO in light of our results.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 16 pages, 10 figures

arXiv:2407.01896 [pdf, other]

LogEval: A Comprehensive Benchmark Suite for Large Language Models In Log Analysis

Authors: Tianyu Cui, Shiyu Ma, Ziang Chen, Tong Xiao, Shimin Tao, Yilun Liu, Shenglin Zhang, Duoming Lin, Changchang Liu, Yuzhe Cai, Weibin Meng, Yongqian Sun, Dan Pei

Abstract: Log analysis is crucial for ensuring the orderly and stable operation of information systems, particularly in the field of Artificial Intelligence for IT Operations (AIOps). Large Language Models (LLMs) have demonstrated significant potential in natural language processing tasks. In the AIOps domain, they excel in tasks such as anomaly detection, root cause analysis of faults, operations and maintenance script generation, and alert information summarization. However, the performance of current LLMs in log analysis tasks remains inadequately validated. To address this gap, we introduce LogEval, a comprehensive benchmark suite designed to evaluate the capabilities of LLMs in various log analysis tasks for the first time. This benchmark covers tasks such as log parsing, log anomaly detection, log fault diagnosis, and log summarization. LogEval evaluates each task using 4,000 publicly available log data entries and employs 15 different prompts for each task to ensure a thorough and fair assessment. By rigorously evaluating leading LLMs, we demonstrate the impact of various LLM technologies on log analysis performance, focusing on aspects such as self-consistency and few-shot contextual learning. We also discuss findings related to model quantification, Chinese-English question-answering evaluation, and prompt engineering. These findings provide insights into the strengths and weaknesses of LLMs in multilingual environments and the effectiveness of different prompt strategies. Various evaluation methods are employed for different tasks to accurately measure the performance of LLMs in log analysis, ensuring a comprehensive assessment. The insights gained from LogEval's evaluation reveal the strengths and limitations of LLMs in log analysis tasks, providing valuable guidance for researchers and practitioners.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.01033 [pdf, other]

Neural Networks Trained by Weight Permutation are Universal Approximators

Authors: Yongqiang Cai, Gaohang Chen, Zhonghua Qiao

Abstract: The universal approximation property is fundamental to the success of neural networks, and has traditionally been achieved by training networks without any constraints on their parameters. However, recent experimental research proposed a novel permutation-based training method, which exhibited a desired classification performance without modifying the exact weight values. In this paper, we provide a theoretical guarantee of this permutation training method by proving its ability to guide a ReLU network to approximate one-dimensional continuous functions. Our numerical results further validate this method's efficiency in regression tasks with various initializations. The notable observations during weight permutation suggest that permutation training can provide an innovative tool for describing network learning behavior.

Submitted 1 July, 2024; originally announced July 2024.

MSC Class: 41A30; 68T05; 68T07

arXiv:2406.18616 [pdf, other]

Towards Large Language Model Aided Program Refinement

Authors: Yufan Cai, Zhe Hou, Xiaokun Luan, David Miguel Sanan Baena, Yun Lin, Jun Sun, Jin Song Dong

Abstract: Program refinement involves correctness-preserving transformations from formal high-level specification statements into executable programs. Traditional verification tool support for program refinement is highly interactive and lacks automation. On the other hand, the emergence of large language models (LLMs) enables automatic code generation from informal natural language specifications. However, code generated by LLMs is often unreliable. Moreover, the opaque procedure from specification to code provided by an LLM is an uncontrolled black box. We propose LLM4PR, a tool that combines formal program refinement techniques with informal LLM-based methods to (1) transform the specification to preconditions and postconditions, (2) automatically build prompts based on refinement calculus, (3) interact with the LLM to generate code, and finally, (4) verify that the generated code satisfies the conditions of refinement calculus, thus guaranteeing the correctness of the code. We have implemented our tool using GPT4, Coq, and Coqhammer, and evaluated it on the HumanEval and EvalPlus datasets.

Submitted 26 June, 2024; originally announced June 2024.

ACM Class: K.6.3

arXiv:2406.18050 [pdf, other]

A Multi-Stage Goal-Driven Network for Pedestrian Trajectory Prediction

Authors: Xiuen Wu, Tao Wang, Yuanzheng Cai, Lingyu Liang, George Papageorgiou

Abstract: Pedestrian trajectory prediction plays a pivotal role in ensuring the safety and efficiency of various applications, including autonomous vehicles and traffic management systems. This paper proposes a novel method for pedestrian trajectory prediction, called multi-stage goal-driven network (MGNet). Diverging from prior approaches relying on stepwise recursive prediction and the singular forecasting of a long-term goal, MGNet directs trajectory generation by forecasting intermediate stage goals, thereby reducing prediction errors. The network comprises three main components: a conditional variational autoencoder (CVAE), an attention module, and a multi-stage goal evaluator. Trajectories are encoded using conditional variational autoencoders to acquire knowledge about the approximate distribution of pedestrians' future trajectories, and combined with an attention mechanism to capture the temporal dependency between trajectory sequences. The pivotal module is the multi-stage goal evaluator, which utilizes the encoded feature vectors to predict intermediate goals, effectively minimizing cumulative errors in the recursive inference process. The effectiveness of MGNet is demonstrated through comprehensive experiments on the JAAD and PIE datasets. Comparative evaluations against state-of-the-art algorithms reveal significant performance improvements achieved by our proposed method.

Submitted 25 June, 2024; originally announced June 2024.

Comments: Paper accepted by 5th International Conference on Computer Vision, Image and Deep Learning (CVIDL 2024)

arXiv:2406.16549 [pdf, other]

Parity-violating primordial gravitational waves from null energy condition violation

Authors: Zi-Wei Jiang, Yong Cai, Fei Wang, Yun-Song Piao

Abstract: We investigate the parity-violating effects in primordial gravitational waves (GWs) due to null energy condition (NEC) violation in two very early universe scenarios: bounce-inflation and intermediate NEC violation during inflation. In both scenarios, we numerically solve the power spectra of parity-violating primordial GWs generated by coupling the background field and the spectator field with the Nieh-Yan term, respectively. We find that the background field can significantly enhance parity-violating effects at scales corresponding to the maximum of the GW power spectra. In contrast, the parity-violating effects produced by the spectator show significantly weaker observability even if the coupling constant is large. Therefore, in NEC-violating scenarios, the significant observable parity-violating effects in primordial GWs primarily arise from the physics directly related to NEC violation. This result highlights the potential of primordial GWs as crucial tools for exploring NEC-violating and parity-violating physics.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 31 pages

arXiv:2406.16170 [pdf, other]

SimCE: Simplifying Cross-Entropy Loss for Collaborative Filtering

Authors: Xiaodong Yang, Huiyuan Chen, Yuchen Yan, Yuxin Tang, Yuying Zhao, Eric Xu, Yiwei Cai, Hanghang Tong

Abstract: The learning objective is integral to collaborative filtering systems, where the Bayesian Personalized Ranking (BPR) loss is widely used for learning informative backbones. However, BPR often experiences slow convergence and suboptimal local optima, partially because it only considers one negative item for each positive item, neglecting the potential impacts of other unobserved items. To address this issue, the recently proposed Sampled Softmax Cross-Entropy (SSM) compares one positive sample with multiple negative samples, leading to better performance. Our comprehensive experiments confirm that recommender systems consistently benefit from multiple negative samples during training. Furthermore, we introduce a Simplified Sampled Softmax Cross-Entropy Loss (SimCE), which simplifies the SSM using its upper bound. Our validation on 12 benchmark datasets, using both MF and LightGCN backbones, shows that SimCE significantly outperforms both BPR and SSM.

Submitted 23 June, 2024; originally announced June 2024.
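
Illustrative note (not from the paper): the contrast the abstract above draws between BPR (one negative per positive) and sampled softmax (many negatives per positive) can be made concrete with the commonly used textbook forms of the two losses. The sketch below assumes those standard forms for a single user; the function names and the example scores are hypothetical, and SimCE itself is not implemented here because the abstract does not give its formula.

```python
import math

def bpr_loss(pos_score, neg_score):
    """BPR for one (positive, negative) pair: -log(sigmoid(pos - neg))."""
    # -log(sigmoid(d)) equals log(1 + exp(-d)).
    return math.log1p(math.exp(-(pos_score - neg_score)))

def sampled_softmax_loss(pos_score, neg_scores):
    """Sampled softmax cross-entropy: the positive competes with all sampled negatives."""
    scores = [pos_score] + list(neg_scores)
    m = max(scores)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - pos_score

# Hypothetical scores for one positive item and three sampled negatives.
pos = 2.0
negs = [0.5, 1.0, -1.0]

single_negative = sampled_softmax_loss(pos, negs[:1])
many_negatives = sampled_softmax_loss(pos, negs)
pairwise = bpr_loss(pos, negs[0])
```

With exactly one negative, the sampled softmax loss reduces to the BPR loss for that pair, which is one way to read BPR as the single-negative special case; adding more negatives can only increase the loss for fixed scores.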

arXiv:2406.15819 [pdf, other]

Automatic AI Model Selection for Wireless Systems: Online Learning via Digital Twinning

Authors: Qiushuo Hou, Matteo Zecchin, Sangwoo Park, Yunlong Cai, Guanding Yu, Kaushik Chowdhury, Osvaldo Simeone

Abstract: In modern wireless network architectures, such as O-RAN, artificial intelligence (AI)-based applications are deployed at intelligent controllers to carry out functionalities like scheduling or power control. The AI "apps" are selected on the basis of contextual information such as network conditions, topology, traffic statistics, and design goals. The mapping between context and AI model parameters is ideally done in a zero-shot fashion via an automatic model selection (AMS) mapping that leverages only contextual information without requiring any current data. This paper introduces a general methodology for the online optimization of AMS mappings. Optimizing an AMS mapping is challenging, as it requires exposure to data collected from many different contexts. Therefore, if carried out online, this initial optimization phase would be extremely time consuming. A possible solution is to leverage a digital twin of the physical system to generate synthetic data from multiple simulated contexts. However, given that the simulator at the digital twin is imperfect, a direct use of simulated data for the optimization of the AMS mapping would yield poor performance when tested in the real system. This paper proposes a novel method for the online optimization of AMS mapping that corrects for the bias of the simulator by means of limited real data collected from the physical system. Experimental results for a graph neural network-based power control app demonstrate the significant advantages of the proposed approach.

Submitted 22 June, 2024; originally announced June 2024.

Comments: submitted for a journal publication

arXiv:2406.14086 [pdf]

Seg-LSTM: Performance of xLSTM for Semantic Segmentation of Remotely Sensed Images

Authors: Qinfeng Zhu, Yuanzhi Cai, Lei Fan

Abstract: Recent advancements in autoregressive networks with linear complexity have driven significant research progress, demonstrating exceptional performance in large language models. A representative model is the Extended Long Short-Term Memory (xLSTM), which incorporates gating mechanisms and memory structures, performing comparably to Transformer architectures in long-sequence language tasks. Autoregressive networks such as xLSTM can utilize image serialization to extend their application to visual tasks such as classification and segmentation. Although existing studies have demonstrated Vision-LSTM's impressive results in image classification, its performance in image semantic segmentation remains unverified. Our study represents the first attempt to evaluate the effectiveness of Vision-LSTM in the semantic segmentation of remotely sensed images. This evaluation is based on a specifically designed encoder-decoder architecture named Seg-LSTM, and comparisons with state-of-the-art segmentation networks. Our study found that Vision-LSTM's performance in semantic segmentation was limited and generally inferior to Vision-Transformers-based and Vision-Mamba-based models in most comparative tests. Future research directions for enhancing Vision-LSTM are recommended. The source code is available from https://github.com/zhuqinfeng1999/Seg-LSTM.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.12425 [pdf, other]

Accessing the stringy structure of proton in the framework of Color Glass Condensate

Authors: Wenchang Xiang, Yanbing Cai, Mengliang Wang, Daicui Zhou

Abstract: To investigate the possible geometric structure of the proton, an improved stringy proton model is constructed beyond the smallest distance approximation, where the constituent quarks are connected by gluon tubes which merge at the Fermat point of the quark triangle. The exclusive diffractive vector meson production process in electron-proton deep inelastic scattering is used to test the stringy structure of the proton. We calculate the coherent and incoherent differential cross sections of the exclusive diffractive $J/\Psi$ photoproduction in the framework of Color Glass Condensate. The results show that our calculations are in good agreement with HERA data. In particular, our results give a better description of the HERA data at small $t$ as compared to the ones from the hot spot model, where the constituent quarks are distributed in the proton without correlation. Meanwhile, the radius of the proton resulting from the improved stringy proton model is coincident with the one from fitting to the data from the GlueX Collaboration at Jefferson Lab, which indicates that the predictive power of the stringy proton model is significantly improved once it goes beyond the smallest distance approximation. Moreover, we assume that the transverse shape of the gluon tube satisfies a Gaussian distribution, and explore the distribution width of the individual gluon tubes. We find an interesting result that the up quark induced gluon tube seems to have a larger distribution width than the down quark induced gluon tube, which is favored by the HERA data.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 13 pages, 4 figures

arXiv:2406.11572 [pdf, other]

Propagative Distance Optimization for Constrained Inverse Kinematics

Authors: Yu Chen, Yilin Cai, Jinyun Xu, Zhongqiang Ren, Guanya Shi, Howie Choset

Abstract: This paper investigates a constrained inverse kinematic (IK) problem that seeks a feasible configuration of an articulated robot under various constraints such as joint limits and obstacle collision avoidance. Due to the high-dimensionality and complex constraints, this problem is often solved numerically via iterative local optimization. Classic local optimization methods take joint angles as the decision variable, which suffers from non-linearity caused by the trigonometric constraints. Recently, distance-based IK methods have been developed as an alternative approach that formulates IK as an optimization over the distances among points attached to the robot and the obstacles. Although distance-based methods have demonstrated unique advantages, they still suffer from low computational efficiency, since these approaches usually ignore the chain structure in the kinematics of serial robots. This paper proposes a new method called propagative distance optimization for constrained inverse kinematics (PDO-IK), which captures and leverages the chain structure in the distance-based formulation and expedites the optimization by computing forward kinematics and the Jacobian propagatively along the kinematic chain. Test results show that PDO-IK runs up to two orders of magnitude faster than the existing distance-based methods under joint limits constraints and obstacle avoidance constraints. It also achieves up to three times higher success rates than the conventional joint-angle-based optimization methods for IK problems. The high runtime efficiency of PDO-IK allows real-time computation (10-1500 Hz) and enables a simulated humanoid robot with 19 degrees of freedom (DoFs) to avoid moving obstacles, which is otherwise hard to achieve with the baselines.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.10838 [pdf, other]

Digital Wireless Image Transmission via Distribution Matching

Authors: Pujing Yang, Guangyi Zhang, Yunlong Cai

Abstract: Deep learning-based joint source-channel coding (JSCC) is emerging as a potential technology to meet the demand for effective data transmission, particularly for image transmission. Nevertheless, most existing advancements only consider analog transmission, where the channel symbols are continuous, making them incompatible with practical digital communication systems. In this work, we address this by involving the modulation process and consider mapping the continuous channel symbols into discrete space. Recognizing the non-uniform distribution of the output channel symbols in existing methods, we propose two effective methods to improve the performance. Firstly, we introduce a uniform modulation scheme, where the distance between two constellations is adjustable to match the non-uniform nature of the distribution. In addition, we further design a non-uniform modulation scheme according to the output distribution. To this end, we first generate the constellations by performing feature clustering on an analog image transmission system, then the generated constellations are employed to modulate the continuous channel symbols. For both schemes, we fine-tune the digital system to alleviate the performance loss caused by modulation. Here, the straight-through estimator (STE) is considered to overcome the non-differentiable nature. Our experimental results demonstrate that the proposed schemes significantly outperform existing digital image transmission systems.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.10631 [pdf, other]

Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms

Authors: Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng

Abstract: Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games, both in theory and practice. Particularly popular algorithms include optimistic multiplicative weights update (OMWU) and optimistic gradient-descent-ascent (OGDA). While both algorithms enjoy $O(1/T)$ ergodic convergence to Nash equilibrium in two-player zero-sum games, OMWU offers several advantages including logarithmic dependence on the size of the payoff matrix and $\widetilde{O}(1/T)$ convergence to coarse correlated equilibria even in general-sum games. However, in terms of last-iterate convergence in two-player zero-sum games, an increasingly popular topic in this area, OGDA guarantees that the duality gap shrinks at a rate of $O(1/\sqrt{T})$, while the best existing last-iterate convergence for OMWU depends on some game-dependent constant that could be arbitrarily large. This begs the question: is this potentially slow last-iterate convergence an inherent disadvantage of OMWU, or is the current analysis too loose? Somewhat surprisingly, we show that the former is true. More generally, we prove that a broad class of algorithms that do not forget the past quickly all suffer the same issue: for any arbitrarily small $δ>0$, there exists a $2\times 2$ matrix game such that the algorithm admits a constant duality gap even after $1/δ$ rounds. This class of algorithms includes OMWU and other standard optimistic follow-the-regularized-leader algorithms.

Submitted 15 June, 2024; originally announced June 2024.

Comments: 27 pages, 4 figures

arXiv:2406.10391 [pdf, other]

BEACON: Benchmark for Comprehensive RNA Tasks and Language Models

Authors: Yuchen Ren, Zhiyuan Chen, Lifeng Qiao, Hongtai Jing, Yuchen Cai, Sheng Xu, Peng Ye, Xinzhu Ma, Siqi Sun, Hongliang Yan, Dong Yuan, Wanli Ouyang, Xihui Liu

Abstract: RNA plays a pivotal role in translating genetic instructions into functional outcomes, underscoring its importance in biological processes and disease mechanisms. Despite the emergence of numerous deep learning approaches for RNA, particularly universal RNA language models, there remains a significant lack of standardized benchmarks to assess the effectiveness of these methods. In this study, we introduce the first comprehensive RNA benchmark BEACON (BEnchmArk for COmprehensive RNA Task and Language Models). First, BEACON comprises 13 distinct tasks derived from extensive previous work covering structural analysis, functional studies, and engineering applications, enabling a comprehensive assessment of the performance of methods on various RNA understanding tasks. Second, we examine a range of models, including traditional approaches like CNNs, as well as advanced RNA foundation models based on language models, offering valuable insights into the task-specific performances of these models. Third, we investigate the vital RNA language model components from the tokenizer and positional encoding aspects. Notably, our findings emphasize the superiority of single nucleotide tokenization and the effectiveness of Attention with Linear Biases (ALiBi) over traditional positional encoding methods. Based on these insights, a simple yet strong baseline called BEACON-B is proposed, which can achieve outstanding performance with limited data and computational resources. The datasets and source code of our benchmark are available at https://github.com/terry-r123/RNABenchmark.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.10193 [pdf]

Three-dimensional quantum Griffiths singularity in bulk iron-pnictide superconductors

Authors: Shao-Bo Liu, Congkuan Tian, Yongqing Cai, Hang Cui, Xinjian Wei, Mantang Chen, Yang Zhao, Yuan Sui, Shuyue Guan, Shuang Jia, Yu Zhang, Ya Feng, Jiankun Li, Jian Cui, Yuanjun Song, Tingting Hao, Chaoyu Chen, Jian-Hao Chen

Abstract: The quantum Griffiths singularity (QGS) is a phenomenon driven by quenched disorders that break conventional scaling invariance and result in a divergent dynamical critical exponent during quantum phase transitions (QPT). While this phenomenon has been well-documented in low-dimensional conventional superconductors and in three-dimensional (3D) magnetic metal systems, its presence in 3D superconducting systems and in unconventional high-temperature superconductors (high-Tc SCs) remains unclear. In this study, we report the observation of robust QGS in the superconductor-metal transition (SMT) of both quasi-2D and 3D anisotropic unconventional high-Tc superconductor CaFe1-xNixAsF (x < 5%) bulk single crystals, where the QGS states persist up to 5.3 K. A comprehensive quantum phase diagram is established that delineates the 3D anisotropic QGS of SMT induced by perpendicular and parallel magnetic fields. Our findings reveal the universality of QGS in 3D superconducting systems and unconventional high-Tc SCs, thereby substantially expanding the range of applicability of QGS.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 17 pages, 4 figures

arXiv:2406.08654 [pdf, other]

Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization

Authors: Yuhang Cai, Jingfeng Wu, Song Mei, Michael Lindsey, Peter L. Bartlett

Abstract: The typical training of neural networks using large stepsize gradient descent (GD) under the logistic loss often involves two distinct phases, where the empirical risk oscillates in the first phase but decreases monotonically in the second phase. We investigate this phenomenon in two-layer networks that satisfy a near-homogeneity condition. We show that the second phase begins once the empirical risk falls below a certain threshold, dependent on the stepsize. Additionally, we show that the normalized margin grows nearly monotonically in the second phase, demonstrating an implicit bias of GD in training non-homogeneous predictors. If the dataset is linearly separable and the derivative of the activation function is bounded away from zero, we show that the average empirical risk decreases, implying that the first phase must stop in finite steps. Finally, we demonstrate that by choosing a suitably large stepsize, GD that undergoes this phase transition is more efficient than GD that monotonically decreases the risk. Our analysis applies to networks of any width, beyond the well-known neural tangent kernel and mean-field regimes.

Submitted 26 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: Clarify our results on sigmoid neural networks

arXiv:2406.07961 [pdf, other]

Accurate Explanation Model for Image Classifiers using Class Association Embedding

Authors: Ruitao Xie, Jingbang Chen, Limai Jiang, Rui Xiao, Yi Pan, Yunpeng Cai

Abstract: Image classification is a primary task in data analysis where explainable models are crucially demanded in various applications. Although many methods have been proposed to obtain explainable knowledge from black-box classifiers, these approaches are inefficient at extracting global knowledge regarding the classification task, and are thus vulnerable to local traps and often lead to poor accuracy. In this study, we propose a generative explanation model that combines the advantages of global and local knowledge for explaining image classifiers. We develop a representation learning method called class association embedding (CAE), which encodes each sample into a pair of separated class-associated and individual codes. Recombining the individual code of a given sample with an altered class-associated code leads to a synthetic real-looking sample with preserved individual characteristics but modified class-associated features and possibly flipped class assignments. A building-block coherency feature extraction algorithm is proposed that efficiently separates class-associated features from individual ones. The extracted feature space forms a low-dimensional manifold that visualizes the classification decision patterns. Explanation of each individual sample can then be achieved in a counter-factual generation manner which continuously modifies the sample in one direction, by shifting its class-associated code along a guided path, until its classification outcome is changed. We compare our method with state-of-the-art ones on explaining image classification tasks in the form of saliency maps, demonstrating that our method achieves higher accuracies. The code is available at https://github.com/xrt11/XAI-CODE.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 40th IEEE International Conference on Data Engineering

arXiv:2406.07806 [pdf, other]

Probing the Shock Breakout Signal of SN 2024ggi from the Transformation of Early Flash Spectroscopy

Authors: Jujia Zhang, Luc Dessart, Xiaofeng Wang, Qian Zhai, Yi Yang, Liping Li, Han Lin, Giorgio Valerin, Yongzhi Cai, Zhen Guo, Lingzhi Wang, Zeyi Zhao, Zhenyu Wang, Shengyu Yan

Abstract: We present early-time, hour-to-day cadence spectroscopy of the nearby type II supernova (SN II) 2024ggi, which was discovered at a phase when the SN shock just emerged from the red-supergiant (RSG) progenitor star. Over the first few days after the first light, SN 2024ggi exhibited prominent narrow emission lines formed through intense and persistent photoionization of the nearby circumstellar material (CSM). In the first 63 hours, spectral lines of He, C, N, and O revealed a rapid rise in ionization, as a result of the progressive sweeping-up of the CSM by the shock. The duration of the IIn-like spectra indicates a dense and relatively confined CSM distribution extending up to $\sim 4 \times 10^{14}$ cm. Spectral modeling reveals that a CSM mass loss rate in this region exceeding $5 \times 10^{-3}\,{\rm M}_{\odot}$ yr$^{-1}$ is required to reproduce low-ionization emissions, which dramatically exceeds that of an RSG. Analyzing the H$α$ emission shift implies the velocity of the unshocked outer CSM to be between 20 and 40 km s$^{-1}$, matching the typical wind velocity of an RSG. The differences between the inner and outer layers of the CSM and an RSG progenitor highlight a complex mass loss history before the explosion of SN 2024ggi.

Submitted 29 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: 10 pages and 5 figures in the main text (16 pages and 9 figures in total). Accepted for publication in ApJL

arXiv:2406.05762 [pdf, ps, other]

Large data global existence for coupled massive-massless wave-type systems

Authors: Yuan Cai, Shijie Dong, Kuijie Li, Jingya Zhao

Abstract: We consider 3D Klein-Gordon-Zakharov (KGZ) and Dirac-Klein-Gordon (DKG) systems, where a common feature is that there exist both massless and massive fields in each system. We establish global existence and asymptotic behavior for both systems with a class of large data. More precisely, in the KGZ system, we allow the massless field to be large, while in the DKG system we allow the massive field to be large.

Submitted 10 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

Comments: All comments are welcome. 58 pages

arXiv:2406.05437 [pdf, ps, other]

From Analog to Digital: Multi-Order Digital Joint Coding-Modulation for Semantic Communication

Authors: Guangyi Zhang, Pujing Yang, Yunlong Cai, Qiyu Hu, Guanding Yu

Abstract: Recent studies in joint source-channel coding (JSCC) have fostered a fresh paradigm in end-to-end semantic communication. Despite notable performance achievements, present initiatives in building semantic communication systems primarily hinge on the transmission of continuous channel symbols, thus presenting challenges in compatibility with established digital systems. In this paper, we introduce a novel approach to address this challenge by developing a multi-order digital joint coding-modulation (MDJCM) scheme for semantic communications. Initially, we construct a digital semantic communication system by integrating a multi-order modulation/demodulation module into a nonlinear transform source-channel coding (NTSCC) framework. Recognizing the non-differentiable nature of modulation/demodulation, we propose a novel substitution training strategy. Herein, we treat modulation/demodulation as a constrained quantization process and introduce scaling operations alongside manually crafted noise to approximate this process. As a result, semantic communication systems trained with this approximation can be deployed in practical modulation/demodulation scenarios with superior performance. Additionally, we demonstrate the equivalence by analyzing the involved probability distribution. Moreover, to further upgrade the performance, we develop a hierarchical dimension-reduction strategy to provide a gradual information extraction process. Extensive experimental evaluations demonstrate the superiority of our proposed method over existing digital and non-digital JSCC techniques.

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.03262 [pdf, other]

ADer: A Comprehensive Benchmark for Multi-class Visual Anomaly Detection

Authors: Jiangning Zhang, Haoyang He, Zhenye Gan, Qingdong He, Yuxuan Cai, Zhucun Xue, Yabiao Wang, Chengjie Wang, Lei Xie, Yong Liu

Abstract: Visual anomaly detection aims to identify anomalous regions in images through unsupervised learning paradigms, with increasing application demand and value in fields such as industrial inspection and medical lesion detection. Despite significant progress in recent years, there is a lack of comprehensive benchmarks to adequately evaluate the performance of various mainstream methods across different datasets under the practical multi-class setting. The absence of standardized experimental setups can lead to potential biases in training epochs, resolution, and metric results, resulting in erroneous conclusions. This paper addresses this issue by proposing a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework that is highly extensible for new methods. The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics. Additionally, we have open-sourced the GPU-assisted ADEval package (https://pypi.org/project/ADEval) to address the slow evaluation problem of metrics like time-consuming mAU-PRO on large-scale data, significantly reducing evaluation time by more than 1000-fold. Through extensive experimental results, we objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection. We hope that ADer will become a valuable resource for researchers and practitioners in the field, promoting the development of more robust and generalizable anomaly detection systems. Full codes have been attached in Appendix and open-sourced at https://github.com/zhangzjn/ader.

Submitted 6 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.02816 [pdf, other]

Red eminence: The intermediate-luminosity red transient AT 2022fnm

Authors: S. Moran, R. Kotak, M. Fraser, A. Pastorello, Y. -Z. Cai, G. Valerin, S. Mattila, E. Cappellaro, T. Kravtsov, C. P. Gutiérrez, N. Elias-Rosa, A. Reguitti, P. Lundqvist, T. G. Brink, A. V. Filippenko, X. -F. Wang

Abstract: We present results from a five-month-long observing campaign of the unusual transient AT 2022fnm, which displays properties common to both luminous red novae (LRNe) and intermediate-luminosity red transients (ILRTs). Although its photometric evolution is broadly consistent with that of LRNe, no second peak is apparent in its light curve, and its spectral properties are more reminiscent of ILRTs. It has a fairly rapid rise time of $5.3\pm1.5$ d, reaching a peak absolute magnitude of $-12.7\pm0.1$ (in the ATLAS $o$ band). We find some evidence for circumstellar interaction, and a near-infrared excess becomes apparent at approximately +100 d after discovery. We attribute this to a dust echo. Finally, from an analytical diffusion toy model, we attempted to reproduce the pseudo-bolometric light curve and find that a mass of $\sim$4 M$_\odot$ is needed. Overall, the characteristics of AT 2022fnm are consistent with a weak stellar eruption or an explosion reminiscent of low-energy type IIP supernovae, which is compatible with expectations for ILRTs.

Submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted to A&A

arXiv:2406.02222 [pdf, other]

Towards an Extensible Model-Based Digital Twin Framework for Space Launch Vehicles

Authors: Ran Wei, Ruizhe Yang, Shijun Liu, Chongsheng Fan, Rong Zhou, Zekun Wu, Haochi Wang, Yifan Cai, Zhe Jiang

Abstract: The concept of Digital Twin (DT) is increasingly applied to systems on different levels of abstraction across domains, to support monitoring, analysis, diagnosis, decision making and automated control. Whilst the interest in applying DT is growing, the definition of DT is unclear, nor is there a clear pathway to develop DT to fully realise its capacities. In this paper, we revise the concept of DT and its categorisation. We propose a DT maturity matrix, based on which we propose a model-based DT development methodology. We also discuss how model-based tools can be used to support the methodology and present our own supporting tool. We report our preliminary findings with a discussion on a case study, in which we use our proposed methodology and our supporting tool to develop an extensible DT platform for the assurance of Electrical and Electronics systems of space launch vehicles.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.02079 [pdf, ps, other]

Assessing the Performance of Chinese Open Source Large Language Models in Information Extraction Tasks

Authors: Yida Cai, Hao Sun, Hsiu-Yuan Huang, Yunfang Wu

Abstract: Information Extraction (IE) plays a crucial role in Natural Language Processing (NLP) by extracting structured information from unstructured text, thereby facilitating seamless integration with various real-world applications that rely on structured data. Despite its significance, recent experiments focusing on English IE tasks have shed light on the challenges faced by Large Language Models (LLMs) in achieving optimal performance, particularly in sub-tasks like Named Entity Recognition (NER). In this paper, we delve into a comprehensive investigation of the performance of mainstream Chinese open-source LLMs in tackling IE tasks, specifically under zero-shot conditions where the models are not fine-tuned for specific tasks. Additionally, we present the outcomes of several few-shot experiments to further gauge the capability of these models. Moreover, our study includes a comparative analysis between these open-source LLMs and ChatGPT, a widely recognized language model, on IE performance. Through meticulous experimentation and analysis, we aim to provide insights into the strengths, limitations, and potential enhancements of existing Chinese open-source LLMs in the domain of Information Extraction within the context of NLP.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.01863 [pdf, other]

Towards Effective Time-Aware Language Representation: Exploring Enhanced Temporal Understanding in Language Models

Authors: Jiexin Wang, Adam Jatowt, Yi Cai

Abstract: In the evolving field of Natural Language Processing, understanding the temporal context of text is increasingly crucial. This study investigates methods to incorporate temporal information during pre-training, aiming to achieve effective time-aware language representation for improved performance on time-related tasks. In contrast to common pre-trained models like BERT, which rely on synchronic document collections such as BookCorpus and Wikipedia, our research introduces BiTimeBERT 2.0, a novel language model pre-trained on a temporal news article collection. BiTimeBERT 2.0 utilizes this temporal news collection, focusing on three innovative pre-training objectives: Time-Aware Masked Language Modeling (TAMLM), Document Dating (DD), and Time-Sensitive Entity Replacement (TSER). Each objective targets a unique aspect of temporal information. TAMLM is designed to enhance the understanding of temporal contexts and relations, DD integrates document timestamps as chronological markers, and TSER focuses on the temporal dynamics of "Person" entities, recognizing their inherent temporal significance. The experimental results consistently demonstrate that BiTimeBERT 2.0 outperforms models like BERT and other existing pre-trained models, achieving substantial gains across a variety of downstream NLP tasks and applications where time plays a pivotal role.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.00965 [pdf, other]

Efficient Behavior Tree Planning with Commonsense Pruning and Heuristic

Authors: Xinglin Chen, Yishuai Cai, Yunxin Mao, Minglong Li, Zhou Yang, Wen Shanghua, Wenjing Yang, Weixia Xu, Ji Wang

Abstract: Behavior Tree (BT) planning is crucial for autonomous robot behavior control, yet its application in complex scenarios is hampered by long planning times. Pruning and heuristics are common techniques to accelerate planning, but it is difficult to design general pruning strategies and heuristic functions for BT planning problems. This paper proposes improving BT planning efficiency for everyday service robots leveraging commonsense reasoning provided by Large Language Models (LLMs), leading to model-free pre-planning action space pruning and heuristic generation. This approach takes advantage of the modularity and interpretability of BT nodes, represented by predicate logic, to enable LLMs to predict the task-relevant action predicates and objects, and even the optimal path, without an explicit action model. We propose the Heuristic Optimal Behavior Tree Expansion Algorithm (HOBTEA) with two heuristic variants and provide a formal comparison and discussion of their efficiency and optimality. We introduce a learnable and transferable commonsense library to enhance the LLM's reasoning performance without fine-tuning. The action space expansion based on the commonsense library can further increase the success rate of planning. Experiments show the theoretical bounds of commonsense pruning and heuristic, and demonstrate the actual performance of LLM learning and reasoning with the commonsense library. Results in four datasets showcase the practical effectiveness of our approach in everyday service robot applications.

Submitted 3 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

arXiv:2406.00242 [pdf, other]

Observational test for $f(Q)$ gravity with weak gravitational lensing

Authors: Qingqing Wang, Xin Ren, Yi-Fu Cai, Wentao Luo, Emmanuel N. Saridakis

Abstract: In this article we confront a class of $f(Q)$ gravity models with observational data of galaxy-galaxy lensing. Specifically, we consider the $f(Q)$ gravity models containing a small quadratic correction when compared with General Relativity (GR), and quantify this correction by a model parameter $α$. To derive the observational constraints, we start by extracting the spherically symmetric solutions which correspond to the deviations from the Schwarzschild solution that depends on the model parameter in a two-fold way, i.e., a renormalized mass and a new term proportional to $r^{-2}$. Then, we calculate the effective lensing potential, the deflection angle, the shear component, and the effective Excess Surface Density (ESD) profile. After that, we employ the group catalog and shape catalog from the SDSS DR7 for the lens and source samples respectively. Moreover, we handle the off-center radius as a free parameter and constrain it using the MCMC. Concerning the deviation parameter from GR we derive $α=1.202^{+0.277}_{-0.179}\times 10^{-6} {\rm Mpc}^{-2}$ at 1 $σ$ confidence level, and then compare the fitting efficiency with the standard $Λ$CDM paradigm by applying the AIC and BIC information criteria. Our results indicate that the $f(Q)$ corrections alongside off-center effects yield a scenario that is slightly favored.

Submitted 31 May, 2024; originally announced June 2024.

Comments: 12 pages, 2 figures

arXiv:2405.20693 [pdf, other]

R$^2$-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction

Authors: Ruyi Zha, Tao Jun Lin, Yuanhao Cai, Jiwen Cao, Yanhao Zhang, Hongdong Li

Abstract: 3D Gaussian splatting (3DGS) has shown promising results in image rendering and surface reconstruction. However, its potential in volumetric reconstruction tasks, such as X-ray computed tomography, remains under-explored. This paper introduces R2-Gaussian, the first 3DGS-based framework for sparse-view tomographic reconstruction. By carefully deriving X-ray rasterization functions, we discover a previously unknown integration bias in the standard 3DGS formulation, which hampers accurate volume retrieval. To address this issue, we propose a novel rectification technique via refactoring the projection from 3D to 2D Gaussians. Our new method presents three key innovations: (1) introducing tailored Gaussian kernels, (2) extending rasterization to X-ray imaging, and (3) developing a CUDA-based differentiable voxelizer. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches by 0.93 dB in PSNR and 0.014 in SSIM. Crucially, it delivers high-quality results in 3 minutes, which is 12x faster than NeRF-based methods and on par with traditional algorithms. The superior performance and rapid convergence of our method highlight its practical value.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.19804 [pdf]

Exploring Key Factors for Long-Term Vessel Incident Risk Prediction

Authors: Tianyi Chen, Hua Wang, Yutong Cai, Maohan Liang, Qiang Meng

Abstract: Factor analysis plays a pivotal role in enhancing maritime safety. Most previous studies conduct factor analysis within the framework of incident-related label prediction, where the developed models can be categorized into short-term and long-term prediction models. The long-term models offer a more strategic approach, enabling more proactive risk management, compared to the short-term ones. Nevertheless, few studies have been devoted to rigorously identifying the key factors for long-term prediction and undertaking comprehensive factor analysis. Hence, this study aims to delve into the key factors for predicting the incident risk levels in the subsequent year given a specific datestamp. The majority of candidate factors potentially contributing to the incident risk are collected from vessels' historical safety performance data spanning up to five years. An improved embedded feature selection method, which integrates a Random Forest classifier with a feature filtering process, is proposed to identify key risk-contributing factors from the candidate pool. The results demonstrate superior performance of the proposed method in incident prediction and factor interpretability. Comprehensive analysis is conducted upon the key factors, which could help maritime stakeholders formulate management strategies for incident prevention.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.17792 [pdf, other]

JUNO Sensitivity to Invisible Decay Modes of Neutrons

Authors: JUNO Collaboration, Angel Abusleme, Thomas Adam, Kai Adamowicz, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, Marco Beretta, Antonio Bergnoli, Daniel Bick, et al. (635 additional authors not shown)

Abstract: We explore the decay of bound neutrons into invisible particles (e.g., $n\rightarrow 3 ν$ or $nn \rightarrow 2 ν$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $ n \rightarrow { inv} $ and $ nn \rightarrow { inv} $. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation modes of the excited residual nuclei can produce a time- and space-correlated triple coincidence signal in the JUNO detector. Based on a full Monte Carlo simulation informed with the latest available data, we estimate all backgrounds, including inverse beta decay events of the reactor antineutrino $\barν_e$, natural radioactivity, cosmogenic isotopes and neutral current interactions of atmospheric neutrinos. Pulse shape discrimination and multivariate analysis techniques are employed to further suppress backgrounds. With two years of exposure, JUNO is expected to give an order of magnitude improvement compared to the current best limits. After 10 years of data taking, the JUNO expected sensitivities at a 90% confidence level are $τ/B( n \rightarrow { inv} ) > 5.0 \times 10^{31} \, {\rm yr}$ and $τ/B( nn \rightarrow { inv} ) > 1.4 \times 10^{32} \, {\rm yr}$.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 28 pages, 7 figures, 4 tables

arXiv:2405.16466 [pdf, other]

High-Performance Temporal Reversible Spiking Neural Networks with $O(L)$ Training Memory and $O(1)$ Inference Cost

Authors: JiaKui Hu, Man Yao, Xuerui Qiu, Yuhong Chou, Yuxuan Cai, Ning Qiao, Yonghong Tian, Bo XU, Guoqi Li

Abstract: Multi-timestep simulation of brain-inspired Spiking Neural Networks (SNNs) boosts memory requirements during training and increases inference energy cost. Current training methods cannot simultaneously solve both training and inference dilemmas. This work proposes a novel Temporal Reversible architecture for SNNs (T-RevSNN) to jointly address the training and inference challenges by altering the forward propagation of SNNs. We turn off the temporal dynamics of most spiking neurons and design multi-level temporal reversible interactions at temporal turn-on spiking neurons, resulting in an $O(L)$ training memory. Combined with the temporal reversible nature, we redesign the input encoding and network organization of SNNs to achieve $O(1)$ inference energy cost. Then, we finely adjust the internal units and residual connections of the basic SNN block to ensure the effectiveness of sparse temporal information interaction. T-RevSNN achieves excellent accuracy on ImageNet, while the memory efficiency, training time acceleration, and inference energy efficiency can be significantly improved by $8.6 \times$, $2.0 \times$, and $1.6 \times$, respectively. This work is expected to break the technical bottleneck of significantly increasing memory cost and training time for large-scale SNNs while maintaining high performance and low inference energy cost. Source code and models are available at: https://github.com/BICLab/T-RevSNN.

Submitted 26 May, 2024; originally announced May 2024.

Comments: Accepted by ICML2024

arXiv:2405.15426 [pdf, other]

AuthNet: Neural Network with Integrated Authentication Logic

Authors: Yuling Cai, Fan Xiang, Guozhu Meng, Yinzhi Cao, Kai Chen

Abstract: Model stealing, i.e., unauthorized access and exfiltration of deep learning models, has become one of the major threats. Proprietary models may be protected by access controls and encryption. However, in reality, these measures can be compromised due to system breaches, query-based model extraction or a disgruntled insider. Security hardening of neural networks also has limits; for example, model watermarking is passive, cannot prevent the occurrence of piracy and is not robust against transformations. To this end, we propose a native authentication mechanism, called AuthNet, which integrates authentication logic as part of the model without any additional structures. Our key insight is to reuse redundant neurons with low activation and embed authentication bits in an intermediate layer, called a gate layer. Then, AuthNet fine-tunes the layers after the gate layer to embed authentication logic so that only inputs with a special secret key can trigger the correct logic of AuthNet. It exhibits two intuitive advantages. It provides the last line of defense, i.e., even if exfiltrated, the model is not usable as the adversary cannot generate valid inputs without the key. Moreover, the authentication logic is difficult to inspect and identify given millions or billions of neurons in the model. We theoretically demonstrate the high sensitivity of AuthNet to the secret key and its high confusion for unauthorized samples. AuthNet is compatible with any convolutional neural network, where our extensive evaluations show that AuthNet successfully achieves the goal in rejecting unauthenticated users (whose average accuracy drops to 22.03%) with a trivial accuracy decrease (1.18% on average) for legitimate users, and is robust against model transformation and adaptive attacks.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.15267 [pdf, other]

Off-the-shelf ChatGPT is a Good Few-shot Human Motion Predictor

Authors: Haoxuan Qu, Zhaoyang He, Zeyu Hu, Yujun Cai, Jun Liu

Abstract: To facilitate the application of motion prediction in practice, recently, the few-shot motion prediction task has attracted increasing research attention. Yet, in existing few-shot motion prediction works, a specific model that is dedicatedly trained over human motions is generally required. In this work, rather than tackling this task through training a specific human motion prediction model, we instead propose a novel FMP-OC framework. In FMP-OC, in a totally training-free manner, we enable Few-shot Motion Prediction, which is a non-language task, to be performed directly via utilizing the Off-the-shelf language model ChatGPT. Specifically, to lead ChatGPT as a language model to become an accurate motion predictor, in FMP-OC, we first introduce several novel designs to facilitate extracting implicit knowledge from ChatGPT. Moreover, we also incorporate our framework with a motion-in-context learning mechanism. Extensive experiments demonstrate the efficacy of our proposed framework.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.15196 [pdf, other]

DisC-GS: Discontinuity-aware Gaussian Splatting

Authors: Haoxuan Qu, Zhuoling Li, Hossein Rahmani, Yujun Cai, Jun Liu

Abstract: Recently, Gaussian Splatting, a method that represents a 3D scene as a collection of Gaussian distributions, has gained significant attention in addressing the task of novel view synthesis. In this paper, we highlight a fundamental limitation of Gaussian Splatting: its inability to accurately render discontinuities and boundaries in images due to the continuous nature of Gaussian distributions. To address this issue, we propose a novel framework enabling Gaussian Splatting to perform discontinuity-aware image rendering. Additionally, we introduce a Bézier-boundary gradient approximation strategy within our framework to keep the "differentiability" of the proposed discontinuity-aware rendering process. Extensive experiments demonstrate the efficacy of our framework.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.15193 [pdf, other]

CuckooGraph: A Scalable and Space-Time Efficient Data Structure for Large-Scale Dynamic Graphs

Authors: Zhuochen Fan, Yalun Cai, Zirui Liu, Jiarui Guo, Xin Fan, Tong Yang, Bin Cui

Abstract: Graphs play an increasingly important role in various big data applications. However, existing graph data structures cannot simultaneously address the performance bottlenecks caused by the dynamic updates, large scale, and high query complexity of current graphs. This paper proposes a novel data structure for large-scale dynamic graphs called CuckooGraph. It does not need to know the amount of graph data in advance, and can adaptively resize to the most memory-efficient form according to the data scale, realizing multiple graph analytic tasks faster. The key techniques of CuckooGraph include TRANSFORMATION and DENYLIST. TRANSFORMATION fully utilizes the limited memory by designing related data structures that allow flexible space transformations to smoothly expand/tighten the required space depending on the number of incoming items. DENYLIST efficiently handles item insertion failures and further improves processing speed. We conduct extensive experiments, and the results show that CuckooGraph significantly reduces query time by four orders of magnitude on 1-hop successor and precursor queries compared to the state-of-the-art.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.15125 [pdf, other]

HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting

Authors: Yuanhao Cai, Zihao Xiao, Yixun Liang, Minghan Qin, Yulun Zhang, Xiaokang Yang, Yaoyao Liu, Alan Yuille

Abstract: High dynamic range (HDR) novel view synthesis (NVS) aims to create photorealistic images from novel viewpoints using HDR imaging techniques. The rendered HDR images capture a wider range of brightness levels containing more details of the scene than normal low dynamic range (LDR) images. Existing HDR NVS methods are mainly based on NeRF. They suffer from long training time and slow inference speed. In this paper, we propose a new framework, High Dynamic Range Gaussian Splatting (HDR-GS), which can efficiently render novel HDR views and reconstruct LDR images with a user input exposure time. Specifically, we design a Dual Dynamic Range (DDR) Gaussian point cloud model that uses spherical harmonics to fit HDR color and employs an MLP-based tone-mapper to render LDR color. The HDR and LDR colors are then fed into two Parallel Differentiable Rasterization (PDR) processes to reconstruct HDR and LDR views. To establish the data foundation for the research of 3D Gaussian splatting-based methods in HDR NVS, we recalibrate the camera parameters and compute the initial positions for Gaussian point clouds. Experiments demonstrate that our HDR-GS surpasses the state-of-the-art NeRF-based method by 3.84 and 1.91 dB on LDR and HDR NVS while enjoying 1000x inference speed and only requiring 6.3% training time. Code, models, and recalibrated data will be publicly available at https://github.com/caiyuanhao1998/HDR-GS

Submitted 27 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: The first 3D Gaussian Splatting-based method for HDR imaging

arXiv:2405.13084 [pdf, other]

The 2nd FutureDial Challenge: Dialog Systems with Retrieval Augmented Generation (FutureDial-RAG)

Authors: Yucheng Cai, Si Chen, Yi Huang, Junlan Feng, Zhijian Ou

Abstract: The 2nd FutureDial Challenge: Dialog Systems with Retrieval Augmented Generation (FutureDial-RAG), Co-located with SLT 2024

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.13014 [pdf, other]

QCRD: Quality-guided Contrastive Rationale Distillation for Large Language Models

Authors: Wei Wang, Zhaowei Li, Qi Xu, Yiqing Cai, Hang Song, Qi Qi, Ran Zhou, Zhida Huang, Tao Wang, Li Xiao

Abstract: Deploying large language models (LLMs) poses challenges in terms of resource limitations and inference efficiency. To address these challenges, recent research has focused on using smaller task-specific language models, which are enhanced by distilling the knowledge rationales generated by LLMs. However, previous works mostly emphasize the effectiveness of positive knowledge, while overlooking the knowledge noise and the exploration of negative knowledge. In this paper, we first propose a general approach called quality-guided contrastive rationale distillation for reasoning capacity learning, considering contrastive learning perspectives. For the learning of positive knowledge, we collect positive rationales through self-consistency to denoise the LLM rationales generated by temperature sampling. For the negative knowledge distillation, we generate negative rationales using temperature sampling for the iteration-before smaller language models themselves. Finally, a contrastive loss is designed to better distill the positive and negative rationales into the smaller language model, where an online-update discriminator is used to judge the qualities of rationales and assign weights for better optimizing the training process. Through extensive experiments on multiple reasoning tasks, we demonstrate that our method consistently outperforms the previous distillation methods and produces higher-quality rationales.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.12725 [pdf, other]

Nearest is Not Dearest: Towards Practical Defense against Quantization-conditioned Backdoor Attacks

Authors: Boheng Li, Yishuo Cai, Haowei Li, Feng Xue, Zhifeng Li, Yiming Li

Abstract: Model quantization is widely used to compress and accelerate deep neural networks. However, recent studies have revealed the feasibility of weaponizing model quantization via implanting quantization-conditioned backdoors (QCBs). These special backdoors stay dormant on released full-precision models but will come into effect after standard quantization. Due to the peculiarity of QCBs, existing defenses have minor effects on reducing their threats or are even infeasible. In this paper, we conduct the first in-depth analysis of QCBs. We reveal that the activation of existing QCBs primarily stems from the nearest rounding operation and is closely related to the norms of neuron-wise truncation errors (i.e., the difference between the continuous full-precision weights and their quantized version). Motivated by these insights, we propose Error-guided Flipped Rounding with Activation Preservation (EFRAP), an effective and practical defense against QCBs. Specifically, EFRAP learns a non-nearest rounding strategy with neuron-wise error norm and layer-wise activation preservation guidance, flipping the rounding strategies of neurons crucial for backdoor effects but with minimal impact on clean accuracy. Extensive evaluations on benchmark datasets demonstrate that our EFRAP can defeat state-of-the-art QCB attacks under various settings. Code is available at https://github.com/AntigoneRandy/QuantBackdoor_EFRAP.

Submitted 21 May, 2024; originally announced May 2024.

Comments: Accepted to CVPR 2024. 19 pages, 9 figures

arXiv:2405.08493 [pdf]

Rethinking Scanning Strategies with Vision Mamba in Semantic Segmentation of Remote Sensing Imagery: An Experimental Study

Authors: Qinfeng Zhu, Yuan Fang, Yuanzhi Cai, Cheng Chen, Lei Fan

Abstract: Deep learning methods, especially Convolutional Neural Networks (CNN) and Vision Transformer (ViT), are frequently employed to perform semantic segmentation of high-resolution remotely sensed images. However, CNNs are constrained by their restricted receptive fields, while ViTs face challenges due to their quadratic complexity. Recently, the Mamba model, featuring linear complexity and a global receptive field, has gained extensive attention for vision tasks. In such tasks, images need to be serialized to form sequences compatible with the Mamba model. Numerous research efforts have explored scanning strategies to serialize images, aiming to enhance the Mamba model's understanding of images. However, the effectiveness of these scanning strategies remains uncertain. In this research, we conduct a comprehensive experimental investigation on the impact of mainstream scanning directions and their combinations on semantic segmentation of remotely sensed images. Through extensive experiments on the LoveDA, ISPRS Potsdam, and ISPRS Vaihingen datasets, we demonstrate that no single scanning strategy outperforms others, regardless of their complexity or the number of scanning directions involved. A simple, single scanning direction is deemed sufficient for semantic segmentation of high-resolution remotely sensed images. Relevant directions for future research are also recommended.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.08327 [pdf, other]

Multiband Simultaneous Photometry of Type II SN 2023ixf with Mephisto and the Twin 50-cm Telescopes

Authors: Yuan-Pei Yang, Xiangkun Liu, Yu Pan, Xinzhong Er, Dezi Liu, Yuan Fang, Guowang Du, Yongzhi Cai, Xian Xu, Xinlei Chen, Xingzhu Zou, Helong Guo, Chenxu Liu, Yehao Cheng, Brajesh Kumar, Xiaowei Liu

Abstract: SN 2023ixf, recently reported in the nearby galaxy M101 at a distance of $6.85~{\rm Mpc}$, was one of the closest and brightest core-collapse supernovae (CCSNe) in the last decade. In this work, we present multi-wavelength photometric observation of SN 2023ixf with the Multi-channel Photometric Survey Telescope (Mephisto) in $uvgr$ bands and with the twin 50-cm telescopes in $griz$ bands. We find that the bolometric luminosity reached the maximum value of $3\times10^{43}~{\rm erg~s^{-1}}$ at 3.9 days after the explosion and fully settled onto the radioactive tail at $\sim90$ days. The effective temperature decreased from $3.2\times10^4~{\rm K}$ at the first observation and approached a constant of $\sim(3000-4000)~{\rm K}$ after the first two months. The evolution of the photospheric radius is consistent with a homologous expansion with a velocity of $8700~{\rm km~s^{-1}}$ in the first two months, and it shrank subsequently. Based on the radioactive tail, the initial nickel mass is about $M_{\rm Ni}\sim 0.098M_\odot$. The explosion energy and the ejecta mass are estimated to be $E\simeq(1.0-5.7)\times10^{51}~{\rm erg}$ and $M_{\rm ej}\simeq(3.8-16)M_\odot$, respectively. The peak bolometric luminosity is proposed to be contributed by the interaction between the ejecta and the circumstellar medium (CSM). We find a shocked CSM mass of $M_{\rm CSM}\sim0.013M_\odot$, a CSM density of $ρ_{\rm CSM}\sim2.5\times10^{-13}~{\rm g~cm^{-3}}$ and a mass loss rate of the progenitor of $\dot M\sim0.022M_\odot~{\rm yr^{-1}}$.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 15 pages, 7 figures, 3 tables. Accepted for publication in ApJ. Comments welcome!

arXiv:2405.07964 [pdf, other]

Early phase simultaneous multi-band observations of Type II supernova SN 2024ggi with Mephisto

Authors: Xinlei Chen, Brajesh Kumar, Xinzhong Er, Helong Guo, Yuan-Pei Yang, Weikang Lin, Yuan Fang, Guowang Du, Chenxu Liu, Jiewei Zhao, Tianyu Zhang, Yuxi Bao, Xingzhu Zou, Yu Pan, Yu Wang, Xufeng Zhu, Kaushik Chatterjee, Xiangkun Liu, Dezi Liu, Edoardo P. Lagioia, Geeta Rangwal, Shiyan Zhong, Jinghua Zhang, Jianhui Lian, Yongzhi Cai , et al. (2 additional authors not shown)

Abstract: We present early-phase good cadence simultaneous multi-band ($ugi$, $vrz$--bands) imaging of nearby supernova SN 2024ggi, which exploded in the nearby galaxy, NGC 3621. A quick follow-up was conducted within less than a day after the explosion and continued for $\sim$23 days. The $uvg$-band light curves display a rapid rise ($\sim$1.4 mag day$^{-1}$) to maximum in $\sim$4 days and absolute magnitude $M_{g}\sim$--17.75 mag. The post-peak decay rate in redder bands is $\sim$0.01 mag day$^{-1}$. Different colors (e.g., $u-g$ and $v-r$) of SN 2024ggi are slightly redder than SN 2023ixf. A significant rise ($\sim$12.5 kK) in black-body temperature (optical) was noticed within $\sim$2 days after the explosion, which successively decreased, indicating shock break out inside a dense circumstellar medium (CSM) surrounding the progenitor. Using semi-analytical modeling, the ejecta mass and progenitor radius were estimated as 1.2 M$_{\odot}$ and $\sim$550 R$_{\odot}$, respectively. The archival deep images ($g,r,i,z$-bands) from the Dark Energy Camera Legacy Survey (DECaLS) were examined, and a possible progenitor was detected in each band ($\sim$22--22.5 mag) and had a mass range of 14--17 M$_{\odot}$.

Submitted 13 May, 2024; originally announced May 2024.

Comments: Pages 9, Table 1, Figures 7

arXiv:2405.07474 [pdf, other]

Integrating Intent Understanding and Optimal Behavior Planning for Behavior Tree Generation from Human Instructions

Authors: Xinglin Chen, Yishuai Cai, Yunxin Mao, Minglong Li, Wenjing Yang, Weixia Xu, Ji Wang

Abstract: Robots executing tasks following human instructions in domestic or industrial environments essentially require both adaptability and reliability. Behavior Tree (BT) emerges as an appropriate control architecture for these scenarios due to its modularity and reactivity. Existing BT generation methods, however, either do not involve interpreting natural language or cannot theoretically guarantee the BTs' success. This paper proposes a two-stage framework for BT generation, which first employs large language models (LLMs) to interpret goals from high-level instructions, then constructs an efficient goal-specific BT through the Optimal Behavior Tree Expansion Algorithm (OBTEA). We represent goals as well-formed formulas in first-order logic, effectively bridging intent understanding and optimal behavior planning. Experiments in the service robot validate the proficiency of LLMs in producing grammatically correct and accurately interpreted goals, demonstrate OBTEA's superiority over the baseline BT Expansion algorithm in various metrics, and finally confirm the practical deployability of our framework. The project website is https://dids-ei.github.io/Project/LLM-OBTEA/.

Submitted 27 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.06885 [pdf, ps, other]

Light curves of the explosion of ONe WD+CO WD merger remnant and type Icn supernovae

Authors: Chengyuan Wu, Shuai Zha, Yongzhi Cai, Zhengyang Zhang, Yi Yang, Danfeng Xiang, Weili Lin, Xiaofeng Wang, Bo Wang

Abstract: Type Icn supernovae (SNe Icn) are a newly detected rare subtype of interacting stripped-envelope supernovae which show narrow P-Cygni lines of highly ionized carbon, oxygen, and neon in their early spectra due to the interactions of the SNe ejecta with dense hydrogen- and helium-deficient circumstellar material (CSM). It has been suggested that SNe Icn may have multiple progenitor channels, such as the explosion of carbon-rich Wolf-Rayet stars, or the explosion of stripped-envelope SNe which undergo binary interactions. Among the SNe Icn, SN 2019jc shows unique properties, and previous work inferred that it may stem from the ultra-stripped supernova, but other possibilities still exist. In this work, we aim to simulate the light curves from the explosions of oxygen-neon and carbon-oxygen double white dwarf (WD) merger remnants, and to further investigate whether the corresponding explosions can appear as some particular SNe Icn. We generate the light curves from the explosive remnants and analyse the influence of different parameters on the light curves, such as the ejecta mass, explosion energy, mass of Ni56 and CSM properties. Comparing our results with some SNe Icn, we found that the light curves from the explosions of double WD merger remnants can explain the observable properties of SN 2019jc, which inferred that this special SN Icn may have a different progenitor. Our results indicated that double WD merger may be an alternative model in producing at least one of the SNe Icn.

Submitted 10 May, 2024; originally announced May 2024.

Comments: 11 pages, 5 figures, accepted for publication in ApJL

Showing 1–50 of 1,452 results for author: Cai, Y