Determining the viewing angle from TeV light curve of GRB 221009A

Abstract: Gamma-ray bursts (GRBs) are among the most powerful explosive events in the universe. LHAASO recently observed the most luminous one, GRB 221009A, and unveiled its TeV light curve. The light curve exhibits a distinct jet break at around 670 seconds, enabling the derivation of the viewing angle based on the smoothness of the jet break. We constructed two models, with and without high-latitude radiation, in which the viewing angle was treated as a free parameter, to fit the TeV light curve. We obtained viewing angles of 9.4 $\times 10^{-4}$ radians and 5.9 $\times 10^{-3}$ radians, respectively. These values closely resemble an on-axis scenario, given that the opening angle is 1.4 $\times 10^{-2}$ radians.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.01364 [pdf]

Co-benefits of Agricultural Diversification and Technology for Food and Nutrition Security in China

Authors: Thomas Cherico Wanger, Estelle Raveloaritiana, Siyan Zeng, Haixiu Gao, Xueqing He, Yiwen Shao, Panlong Wu, Kris A. G. Wyckhuys, Wenwu Zhou, Yi Zou, Zengrong Zhu, Ling Li, Haiyan Cen, Yunhui Liu, Shenggen Fan

Abstract: China is the leading crop producer and has successfully implemented sustainable development programs related to agriculture. Sustainable agriculture has been promoted to achieve national food security targets such as food self-sufficiency through the well-facilitated farmland construction (WFFC) approach. The WFFC is introduced in China's current national 10-year plan to consolidate farmlands into large and simplified production areas to maximise automation, and improve soil fertility and productivity. However, research suggests that diversified and smaller farms facilitate ecosystem services, can improve yield resilience, defuse human health threats, and increase farm profitability. Currently, WFFC has not considered ecological farmland improvements and it may miss long-term environmental benefits including ecosystem service preservation conducive to yields. Moreover, the nutritional status in China has changed in recent decades with undernutrition being dramatically reduced, but the prevalence of overweight, obesity, and chronic diseases being increased. While a strategic choice and management of crop and livestock species can improve nutrition, the environmental and production benefits of agricultural diversification are currently not well interlinked with China's food and nutrition security discussions. Lastly, the role of agricultural technology for socioeconomic benefits and the link with diversified agricultural production may provide vast benefits for food security.
Here, we focus on the opportunities and co-benefits of agricultural diversification and technology innovations to advance food and nutrition security in China through ecosystem service and yield benefits. Our applied five-point research agenda can provide evidence-based opportunities to support China in reaching its ambitious food security targets through agricultural diversification with global ramifications.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.00700 [pdf, other]

Study of $τ^- \to ωπ^- ν_τ$ decay in resonance chiral theory with tensor sources

Authors: Feng-Zhi Chen, Xin-Qiang Li, Shi-Can Peng, Ya-Dong Yang, Yuan-He Zou

Abstract: In this work, we make a study of the $τ^- \to ωπ^-ν_τ$ decay in the framework of low-energy effective field theory. The $J^{\mathcal{P}G}$ decompositions of the quark currents and the $ωπ$ final state show that, besides the Standard Model vector interaction, only the non-standard tensor interaction can have a non-zero contribution to the decay. To discuss its effect, a reliable calculation of the $ωπ$ tensor form factors is necessary. After constructing the Lagrangian of resonance chiral theory with external tensor sources, we calculate both the vector and tensor form factors with the relevant resonance couplings determined by combining the QCD short-distance constraints, the fit to the spectral function of $τ^- \to ωπ^-ν_τ$ decay, as well as the matching between the $\mathcal{O}(p^4)$ odd-intrinsic-parity operators after integrating out the vector resonances and the $\mathcal{O}(p^6)$ operators of chiral perturbation theory. The new physics effect is then investigated in the distributions of the spectral function and the forward-backward asymmetry of $τ^- \to ωπ^-ν_τ$ decay. We find that the spectral function is dominated by the Standard Model, and the non-standard tensor contribution is negligible. However, since the forward-backward asymmetry can be only generated with a non-zero tensor interaction, the observable is quite sensitive to this kind of new physics.
A future measurement of the observable at the Belle II experiment as well as at the proposed Tera-Z and STCF facilities is, therefore, strongly called for to check the existence of such a non-standard tensor interaction.

Submitted 30 June, 2024; originally announced July 2024.

Comments: 27 pages, 4 tables, and 2 figures

arXiv:2406.17841 [pdf, other]

Probing many-body Bell correlation depth with superconducting qubits

Authors: Ke Wang, Weikang Li, Shibo Xu, Mengyao Hu, Jiachen Chen, Yaozu Wu, Chuanyu Zhang, Feitong Jin, Xuhao Zhu, Yu Gao, Ziqi Tan, Aosai Zhang, Ning Wang, Yiren Zou, Tingting Li, Fanhao Shen, Jiarun Zhong, Zehang Bao, Zitian Zhu, Zixuan Song, Jinfeng Deng, Hang Dong, Xu Zhang, Pengfei Zhang, Wenjie Jiang , et al. (10 additional authors not shown)

Abstract: Quantum nonlocality describes a stronger form of quantum correlation than that of entanglement. It refutes Einstein's belief of local realism and is among the most distinctive and enigmatic features of quantum mechanics. It is a crucial resource for achieving quantum advantages in a variety of practical applications, ranging from cryptography and certified random number generation via self-testing to machine learning. Nevertheless, the detection of nonlocality, especially in quantum many-body systems, is notoriously challenging. Here, we report an experimental certification of genuine multipartite Bell correlations, which signal nonlocality in quantum many-body systems, up to 24 qubits with a fully programmable superconducting quantum processor. In particular, we employ energy as a Bell correlation witness and variationally decrease the energy of a many-body system across a hierarchy of thresholds, below which an increasing Bell correlation depth can be certified from experimental data. As an illustrating example, we variationally prepare the low-energy state of a two-dimensional honeycomb model with 73 qubits and certify its Bell correlations by measuring an energy that surpasses the corresponding classical bound with up to 48 standard deviations. In addition, we variationally prepare a sequence of low-energy states and certify their genuine multipartite Bell correlations up to 24 qubits via energies measured efficiently by parity oscillation and multiple quantum coherence techniques.
Our results establish a viable approach for preparing and certifying multipartite Bell correlations, which provide not only a finer benchmark beyond entanglement for quantum devices, but also a valuable guide towards exploiting multipartite Bell correlation in a wide spectrum of practical applications.

Submitted 25 June, 2024; originally announced June 2024.

Comments: 11 pages, 6 figures + 14 pages, 6 figures

arXiv:2406.16722 [pdf, other]

Venturing into Uncharted Waters: The Navigation Compass from Transformer to Mamba

Authors: Yuchen Zou, Yineng Chen, Zuchao Li, Lefei Zhang, Hai Zhao

Abstract: Transformer, a deep neural network architecture, has long dominated the field of natural language processing and beyond. Nevertheless, the recent introduction of Mamba challenges its supremacy, sparks considerable interest among researchers, and gives rise to a series of Mamba-based models that have exhibited notable potential. This survey paper orchestrates a comprehensive discussion, diving into essential research dimensions, covering: (i) the functioning of the Mamba mechanism and its foundation on the principles of structured state space models; (ii) the proposed improvements and the integration of Mamba with various networks, exploring its potential as a substitute for Transformers; (iii) the combination of Transformers and Mamba to compensate for each other's shortcomings. We have also made efforts to interpret Mamba and Transformer in the framework of kernel functions, allowing for a comparison of their mathematical nature within a unified context. Our paper encompasses the vast majority of improvements related to Mamba to date.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16487 [pdf, other]

Decomposing God Header File via Multi-View Graph Clustering

Authors: Yue Wang, Wenhui Chang, Yanzhen Zou, Tongwei Deng, Bing Xie

Abstract: God Header File refers to a header file with large code size and wide file impact. Such files pose difficulties in code comprehension and slow down compilation, ultimately increasing the maintenance cost during software evolution. Although this concept is similar to God Class, existing refactoring methods for God Classes are inappropriate for God Header Files. The reason lies in the fact that the code elements in header files are mostly short declaration types, and build dependencies of the entire system should be considered with the aim of improving compilation efficiency. Meanwhile, these methods overlook the concern of cyclic dependencies, which holds immense importance in the God Header File decomposition. To address these challenges, this paper proposes a God Header File decomposing approach based on multi-view graph clustering. It first constructs a code element graph with multiple relationships. Then after coarsening the graph, a novel multi-view graph clustering algorithm is applied to identify clusters of closely related code elements, and a heuristic algorithm is introduced to address the cyclic dependencies in the clustering result. We evaluate our approach on a synthetic dataset as well as six real-world God Header Files from different projects. The results show that our approach could achieve 11.5% higher accuracy in comparison to existing God Class refactoring methods. Moreover, our decomposition results attain better modularity on all the real-world God Header Files and reduce recompilation time for historical commits by 15% to 60%.

Submitted 24 June, 2024; originally announced June 2024.

Comments: Accepted by ICSME 2024

arXiv:2406.15339 [pdf, other]

Image Conductor: Precision Control for Interactive Video Synthesis

Authors: Yaowei Li, Xintao Wang, Zhaoyang Zhang, Zhouxia Wang, Ziyang Yuan, Liangbin Xie, Yuexian Zou, Ying Shan

Abstract: Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements, typically involving labor-intensive real-world capturing. Despite advancements in generative AI for video creation, achieving precise control over motion for interactive video asset generation remains challenging. To this end, we propose Image Conductor, a method for precise control of camera transitions and object movements to generate video assets from a single image. A well-cultivated training strategy is proposed to separate distinct camera and object motion by camera LoRA weights and object LoRA weights. To further address cinematographic variations from ill-posed trajectories, we introduce a camera-free guidance technique during inference, enhancing object movements while eliminating camera transitions. Additionally, we develop a trajectory-oriented video motion data curation pipeline for training. Quantitative and qualitative experiments demonstrate our method's precision and fine-grained control in generating motion-controllable videos from images, advancing the practical application of interactive video synthesis. Project webpage available at https://liyaowei-stu.github.io/project/ImageConductor/

Submitted 21 June, 2024; originally announced June 2024.

Comments: Project webpage available at https://liyaowei-stu.github.io/project/ImageConductor/

arXiv:2406.14232 [pdf, other]

Enhancing robustness of data-driven SHM models: adversarial training with circle loss

Authors: Xiangli Yang, Xijie Deng, Hanwei Zhang, Yang Zou, Jianxi Yang

Abstract: Structural health monitoring (SHM) is critical to safeguarding the safety and reliability of aerospace, civil, and mechanical infrastructure. Machine learning-based data-driven approaches have gained popularity in SHM due to advancements in sensors and computational power. However, machine learning models used in SHM are vulnerable to adversarial examples -- even small changes in input can lead to different model outputs. This paper aims to address this problem by discussing adversarial defenses in SHM. In this paper, we propose an adversarial training method for defense, which uses circle loss to optimize the distance between features in training to keep examples away from the decision boundary. Through this simple yet effective constraint, our method demonstrates substantial improvements in model robustness, surpassing existing defense mechanisms.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 12 pages, 9 figures

arXiv:2406.13626 [pdf, other]

Fine-Tuning Gemma-7B for Enhanced Sentiment Analysis of Financial News Headlines

Authors: Kangtong Mo, Wenyan Liu, Xuanzhen Xu, Chang Yu, Yuelin Zou, Fangqing Xia

Abstract: In this study, we explore the application of sentiment analysis on financial news headlines to understand investor sentiment. By leveraging Natural Language Processing (NLP) and Large Language Models (LLM), we analyze sentiment from the perspective of retail investors. The FinancialPhraseBank dataset, which contains categorized sentiments of financial news headlines, serves as the basis for our analysis. We fine-tuned several models, including distilbert-base-uncased, Llama, and gemma-7b, to evaluate their effectiveness in sentiment classification. Our experiments demonstrate that the fine-tuned gemma-7b model outperforms others, achieving the highest precision, recall, and F1 score. Specifically, the gemma-7b model showed significant improvements in accuracy after fine-tuning, indicating its robustness in capturing the nuances of financial sentiment. This model can be instrumental in providing market insights, risk management, and aiding investment decisions by accurately predicting the sentiment of financial news. The results highlight the potential of advanced LLMs in transforming how we analyze and interpret financial information, offering a powerful tool for stakeholders in the financial industry.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13450 [pdf, other]

Federating to Grow Transformers with Constrained Resources without Model Sharing

Authors: Shikun Shen, Yifei Zou, Yuan Yuan, Yanwei Zheng, Peng Li, Xiuzhen Cheng, Dongxiao Yu

Abstract: The high resource consumption of large-scale models discourages resource-constrained users from developing their customized transformers. To this end, this paper considers a federated framework named Fed-Grow for multiple participants to cooperatively scale a transformer from their pre-trained small models. Under the Fed-Grow, a Dual-LiGO (Dual Linear Growth Operator) architecture is designed to help participants expand their pre-trained small models to a transformer. In Dual-LiGO, the Local-LiGO part is used to address the heterogeneity problem caused by the various pre-trained models, and the Global-LiGO part is shared to exchange the implicit knowledge from the pre-trained models, local data, and training process of participants. Instead of model sharing, only sharing the Global-LiGO strengthens the privacy of our approach. Compared with several state-of-the-art methods in simulation, our approach has higher accuracy, better precision, and lower resource consumption on computations and communications. To the best of our knowledge, most of the previous model-scaling works are centralized, and our work is the first one that cooperatively grows a transformer from multiple pre-trained heterogeneous models with the user privacy protected in terms of local data and models. We hope that our approach can extend the transformers to the broadly distributed scenarios and encourage more resource-constrained users to enjoy the bonus taken by the large-scale transformers.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13351 [pdf, other]

A Resource-Adaptive Approach for Federated Learning under Resource-Constrained Environments

Authors: Ruirui Zhang, Xingze Wu, Yifei Zou, Zhenzhen Xie, Peng Li, Xiuzhen Cheng, Dongxiao Yu

Abstract: The paper studies a fundamental federated learning (FL) problem involving multiple clients with heterogeneous constrained resources. Compared with the numerous training parameters, the computing and communication resources of clients are insufficient for fast local training and real-time knowledge sharing. Besides, training on clients with heterogeneous resources may result in the straggler problem. To address these issues, we propose Fed-RAA: a Resource-Adaptive Asynchronous Federated learning algorithm. Different from vanilla FL methods, where all parameters are trained by each participating client regardless of resource diversity, Fed-RAA adaptively allocates fragments of the global model to clients based on their computing and communication capabilities. Each client then individually trains its assigned model fragment and asynchronously uploads the updated result. Theoretical analysis confirms the convergence of our approach. Additionally, we design an online greedy-based algorithm for fragment allocation in Fed-RAA, achieving fairness comparable to an offline strategy. We present numerical results on MNIST, CIFAR-10, and CIFAR-100, along with necessary comparisons and ablation studies, demonstrating the advantages of our work. To the best of our knowledge, this paper represents the first resource-adaptive asynchronous method for fragment-based FL with guaranteed theoretical convergence.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.10744 [pdf, other]

Technique Report of CVPR 2024 PBDL Challenges

Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu , et al. (75 additional authors not shown)

Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, and medium properties from images. In recent years, deep learning has shown promising improvements for various vision tasks, and when combined with physics-based vision, these approaches can enhance the robustness and accuracy of vision systems. This technical report summarizes the outcomes of the Physics-Based Vision Meets Deep Learning (PBDL) 2024 challenge, held in CVPR 2024 workshop. The challenge consisted of eight tracks, focusing on Low-Light Enhancement and Detection as well as High Dynamic Range (HDR) Imaging. This report details the objectives, methodologies, and results of each track, highlighting the top-performing solutions and their innovative approaches.

Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

arXiv:2406.10534 [pdf, other]

A Finite Difference Informed Graph Network for Solving Steady-State Incompressible Flows on Block-Structured Grids

Authors: Yiye Zou, Tianyu Li, Shufan Zou, Jingyu Wang, Laiping Zhang, Xiaogang Deng

Abstract: Recently, advancements in deep learning have enabled physics-informed neural networks (PINNs) to solve partial differential equations (PDEs). Numerical differentiation (ND) using the finite difference (FD) method is efficient in physics-constrained designs, even in parameterized settings, often employing body-fitted block-structured grids for complex flow cases. However, convolution operators in CNNs for finite differences are typically limited to single-block grids. To address this, we use graphs and graph networks (GNs) to learn flow representations across multi-block structured grids. We propose a graph convolution-based finite difference method (GC-FDM) to train GNs in a physics-constrained manner, enabling differentiable finite difference operations on graph unstructured outputs. Our goal is to solve parametric steady incompressible Navier-Stokes equations for flows around a backward-facing step, a circular cylinder, and double cylinders, using multi-block structured grids. Comparing our method to a CFD solver under various boundary conditions, we demonstrate improved training efficiency and accuracy, achieving a minimum relative error of $10^{-3}$ in velocity field prediction and a 20\% reduction in training cost compared to PINNs.

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.10248 [pdf, other]

On the Worst Prompt Performance of Large Language Models

Authors: Bowen Cao, Deng Cai, Zhisong Zhang, Yuexian Zou, Wai Lam

Abstract: The performance of large language models (LLMs) is acutely sensitive to the phrasing of prompts, which raises significant concerns about their reliability in real-world scenarios. Existing studies often divide prompts into task-level instructions and case-level inputs and primarily focus on evaluating and improving robustness against variations in task-level instructions. However, this setup fails to fully address the diversity of real-world user queries and assumes the existence of task-specific datasets. To address these limitations, we introduce RobustAlpacaEval, a new benchmark that consists of semantically equivalent case-level queries and emphasizes the importance of using the worst prompt performance to gauge the lower bound of model performance. Extensive experiments on RobustAlpacaEval with ChatGPT and six open-source LLMs from the Llama, Mistral, and Gemma families uncover substantial variability in model performance; for instance, a difference of 45.48% between the worst and best performance for the Llama-2-70B-chat model, with its worst performance dipping as low as 9.38%. We further illustrate the difficulty in identifying the worst prompt from both model-agnostic and model-dependent perspectives, emphasizing the absence of a shortcut to characterize the worst prompt. We also attempt to enhance the worst prompt performance using existing prompt engineering and prompt consistency methods, but find that their impact is limited.
These findings underscore the need to create more resilient LLMs that can maintain high performance across diverse prompts. Data and code are available at https://github.com/cbwbuaa/On-the-Worst-Prompt-Performance-of-LLMs.

Submitted 21 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.10239 [pdf]

Predict Click-Through Rates with Deep Interest Network Model in E-commerce Advertising

Authors: Chang Zhou, Yang Zhao, Yuelin Zou, Jin Cao, Wenhan Fan, Yi Zhao, Chiyu Cheng

Abstract: This paper proposes new methods to enhance click-through rate (CTR) prediction models using the Deep Interest Network (DIN) model, specifically applied to the advertising system of Alibaba's Taobao platform. Unlike traditional deep learning approaches, this research focuses on localized user behavior activation for tailored ad targeting by leveraging extensive user behavior data. Compared to traditional models, this method demonstrates superior ability to handle diverse and dynamic user data, thereby improving the efficiency of ad systems and increasing revenue.

Submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted by the 5th International Conference on Information Science, Parallel and Distributed Systems (ISPDS 2024), 2024 IEEE

arXiv:2406.09683 [pdf, other]

Interstellar Nitrogen Isotope Ratios: Measurements on tracers of C$^{14}$N and C$^{15}$N

Authors: J. L. Chen, J. S. Zhang, C. Henkel, Y. T. Yan, H. Z. Yu, Y. X. Wang, Y. P. Zou, J. Y. Zhao, X. Y. Wang

Abstract: The nitrogen isotope ratio 14N/15N is a powerful tool to trace Galactic stellar nucleosynthesis and constrain Galactic chemical evolution. Previous observations have found lower 14N/15N ratios in the Galactic center and higher values in the Galactic disk. This is consistent with the inside-out formation scenario of our Milky Way. However, previous studies mostly utilized double isotope ratios also including 12C/13C, which introduces additional uncertainties. Here we therefore present observations of C14N and its rare isotopologue, C15N, toward a sample of star forming regions, measured by the IRAM 30 m and/or the ARO 12 m telescope at $λ$ ~3 mm wavelength. For those 35 sources detected in both isotopologues, physical parameters are determined. Furthermore we have obtained nitrogen isotope ratios using the strongest hyperfine components of CN and C15N. For those sources showing small deviations from Local Thermodynamical Equilibrium and/or self-absorption, the weakest hyperfine component, likely free of the latter effect, was used to obtain reliable 14N/15N values. Our measured 14N/15N isotope ratios from C14N and C15N measurements are compatible with those from our earlier measurements of NH3 and 15NH3 (Paper I), i.e., increasing ratios to a Galactocentric distance of ~9 kpc. The unweighted second order polynomial fit yields $\frac{{\rm C^{14}N}}{{\rm C^{15}N}} = (-4.85 \pm 1.89)\;{\rm kpc^{-2}} \times R_{\rm GC}^{2} + (82.11 \pm 31.93) \;{\rm kpc^{-1}} \times R_{\rm GC} - (28.12 \pm 126.62)$.
Toward the outer Galaxy, the isotope ratio tends to decrease, supporting an earlier finding by H13CN/HC15N. Galactic chemical evolution models are consistent with our measurements of the 14N/15N isotope ratio, i.e. a rising trend from the Galactic center region to approximately 9 kpc, followed by a decreasing trend with increasing $R_{\rm GC}$ toward the outer Galaxy.
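As a quick worked illustration of the fit quoted in this abstract, the Python sketch below evaluates the second-order polynomial at one Galactocentric distance and locates its maximum. It uses only the central coefficient values from the abstract and ignores the quoted uncertainties; the function name and the sample distance are illustrative assumptions, not taken from the paper.

```python
# Central values of the unweighted second-order polynomial fit quoted in the
# abstract (the quoted uncertainties are deliberately ignored in this sketch).
A = -4.85   # coefficient of R_GC^2, in kpc^-2
B = 82.11   # coefficient of R_GC, in kpc^-1
C = -28.12  # constant term


def cn_ratio(r_gc_kpc: float) -> float:
    """Evaluate the fitted C14N/C15N ratio at Galactocentric distance r_gc_kpc (kpc)."""
    return A * r_gc_kpc ** 2 + B * r_gc_kpc + C


# A downward-opening parabola a*R^2 + b*R + c peaks at R = -b / (2a).
r_peak = -B / (2 * A)

if __name__ == "__main__":
    print(f"ratio at 8 kpc: {cn_ratio(8.0):.2f}")
    print(f"fit maximum at R_GC = {r_peak:.2f} kpc")
```

With these central values the parabola peaks near 8.5 kpc, in line with the abstract's statement that the ratio rises out to roughly 9 kpc and then declines. Given the large uncertainties on the coefficients, the peak position should be read as indicative only.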

Submitted 13 June, 2024; originally announced June 2024.

Comments: 34 pages, 9 figures, 6 tables

Journal ref: The Astrophysical Journal (2004)

arXiv:2406.09555 [pdf, other]

Approximate quantum error correcting codes from conformal field theory

Authors: Shengqi Sang, Timothy H. Hsieh, Yijian Zou

Abstract: The low-energy subspace of a conformal field theory (CFT) can serve as a quantum error correcting code, with important consequences in holography and quantum gravity. We consider generic 1+1D CFT codes under extensive local dephasing channels and analyze their error correctability in the thermodynamic limit. We show that (i) there is a finite decoding threshold if and only if the minimal nonzero scaling dimension in the fusion algebra generated by the jump operator of the channel is larger than $1/2$ and (ii) the number of protected logical qubits $k \geq Ω( \log \log n)$, where $n$ is the number of physical qubits. As an application, we show that the one-dimensional quantum critical Ising model has a finite threshold for certain types of dephasing noise. Our general results also imply that a CFT code with continuous symmetry saturates a bound on the recovery fidelity for covariant codes.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 5+12 pages, 7 figures

arXiv:2406.08431 [pdf, other]

Diffusion Soup: Model Merging for Text-to-Image Diffusion Models

Authors: Benjamin Biggs, Arjun Seshadri, Yang Zou, Achin Jain, Aditya Golatkar, Yusheng Xie, Alessandro Achille, Ashwin Swaminathan, Stefano Soatto

Abstract: We present Diffusion Soup, a compartmentalization method for Text-to-Image Generation that averages the weights of diffusion models trained on sharded data. By construction, our approach enables training-free continual learning and unlearning with no additional memory or inference costs, since models corresponding to data shards can be added or removed by re-averaging. We show that Diffusion Soup samples from a point in weight space that approximates the geometric mean of the distributions of constituent datasets, which offers anti-memorization guarantees and enables zero-shot style mixing. Empirically, Diffusion Soup outperforms a paragon model trained on the union of all data shards and achieves a 30% improvement in Image Reward (.34 $\to$ .44) on domain sharded data, and a 59% improvement in IR (.37 $\to$ .59) on aesthetic data. In both cases, souping also prevails in TIFA score (respectively, 85.5 $\to$ 86.5 and 85.6 $\to$ 86.8). We demonstrate robust unlearning -- removing any individual domain shard only lowers performance by 1% in IR (.45 $\to$ .44) -- and validate our theoretical insights on anti-memorization using real data. Finally, we showcase Diffusion Soup's ability to blend the distinct styles of models finetuned on different shards, resulting in the zero-shot generation of hybrid styles.
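The weight-averaging idea described in this abstract can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: each "model" is a plain dictionary of parameter lists rather than a real diffusion network, the average is assumed to be uniform, and the names `soup`, `shard_a`, `shard_b`, and `shard_c` are hypothetical.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def soup(models: List[Weights]) -> Weights:
    """Uniformly average the parameters of models that share the same layout."""
    if not models:
        raise ValueError("need at least one model to average")
    keys = models[0].keys()
    for m in models:
        if m.keys() != keys:
            raise ValueError("all models must have the same parameter names")
    n = len(models)
    # Average element-wise across models, one parameter tensor at a time.
    return {
        k: [sum(vals) / n for vals in zip(*(m[k] for m in models))]
        for k in keys
    }


# Hypothetical per-shard checkpoints (assumed values, for illustration only).
shard_a: Weights = {"w": [1.0, 2.0]}
shard_b: Weights = {"w": [3.0, 4.0]}
shard_c: Weights = {"w": [5.0, 6.0]}

full_soup = soup([shard_a, shard_b, shard_c])
# "Unlearning" shard C in this sketch is just re-averaging without it.
without_c = soup([shard_a, shard_b])

if __name__ == "__main__":
    print(full_soup)
    print(without_c)
```

Because adding or removing a shard only changes which checkpoints enter the average, no retraining step appears in the sketch, which mirrors the abstract's point that shards can be added or removed by re-averaging.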

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.05685 [pdf, other]

Understanding Open Source Contributor Profiles in Popular Machine Learning Libraries

Authors: Jiawen Liu, Haoxiang Zhang, Ying Zou

Abstract: With the increasing popularity of machine learning (ML), many open-source software (OSS) contributors are attracted to developing and adopting ML approaches. Comprehensive understanding of ML contributors is crucial for successful ML OSS development and maintenance. Without such knowledge, there is a risk of inefficient resource allocation and hindered collaboration in ML OSS projects. Existing research focuses on understanding the difficulties and challenges perceived by ML contributors by user surveys. There is a lack of understanding of ML contributors based on their activities tracked from software repositories. In this paper, we aim to understand ML contributors by identifying contributor profiles in ML libraries. We further study contributors' OSS engagement from three aspects: workload composition, work preferences, and technical importance. By investigating 7,640 contributors from 6 popular ML libraries (TensorFlow, PyTorch, Keras, MXNet, Theano, and ONNX), we identify four contributor profiles: Core-Afterhour, Core-Workhour, Peripheral-Afterhour, and Peripheral-Workhour. We find that: 1) project experience, authored files, collaborations, and geographical location are significant features of all profiles; 2) contributors in Core profiles exhibit significantly different OSS engagement compared to Peripheral profiles; 3) contributors' work preferences and workload compositions significantly impact project popularity; 4) long-term contributors evolve towards making fewer, constant, balanced and less technical contributions.

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.04151 [pdf, other]

AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

Authors: Zhiheng Xi, Yiwen Ding, Wenxiang Chen, Boyang Hong, Honglin Guo, Junzhe Wang, Dingwen Yang, Chenyang Liao, Xin Guo, Wei He, Songyang Gao, Lu Chen, Rui Zheng, Yicheng Zou, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang

Abstract: Building generalist agents that can handle diverse tasks and evolve themselves across different environments is a long-term goal in the AI community. Large language models (LLMs) are considered a promising foundation to build such agents due to their generalized capabilities. Current approaches either have LLM-based agents imitate expert-provided trajectories step-by-step, requiring human supervision, which is hard to scale and limits environmental exploration; or they let agents explore and learn in isolated environments, resulting in specialist agents with limited generalization. In this paper, we take the first step towards building generally-capable LLM-based agents with self-evolution ability. We identify a trinity of ingredients: 1) diverse environments for agent exploration and learning, 2) a trajectory set to equip agents with basic capabilities and prior knowledge, and 3) an effective and scalable evolution method. We propose AgentGym, a new framework featuring a variety of environments and tasks for broad, real-time, uni-format, and concurrent agent exploration. AgentGym also includes a database with expanded instructions, a benchmark suite, and high-quality trajectories across environments. Next, we propose a novel method, AgentEvol, to investigate the potential of agent self-evolution beyond previously seen data across tasks and environments. Experimental results show that the evolved agents can achieve results comparable to SOTA models.
We release the AgentGym suite, including the platform, dataset, benchmark, checkpoints, and algorithm implementations. The AgentGym suite is available on https://github.com/WooooDyy/AgentGym.

Submitted 6 June, 2024; originally announced June 2024.

Comments: Project site: https://agentgym.github.io

arXiv:2406.00432 [pdf, other]

Localize, Understand, Collaborate: Semantic-Aware Dragging via Intention Reasoner

Authors: Xing Cui, Peipei Li, Zekun Li, Xuannan Liu, Yueying Zou, Zhaofeng He

Abstract: Flexible and accurate drag-based editing is a challenging task that has recently garnered significant attention. Current methods typically model this problem as automatically learning ``how to drag'' through point dragging and often produce one deterministic estimation, which presents two key limitations: 1) Overlooking the inherently ill-posed nature of drag-based editing, where multiple results may correspond to a given input, as illustrated in Fig.1; 2) Ignoring the constraint of image quality, which may lead to unexpected distortion. To alleviate this, we propose LucidDrag, which shifts the focus from ``how to drag'' to a paradigm of ``what-then-how''. LucidDrag comprises an intention reasoner and a collaborative guidance sampling mechanism. The former infers several optimal editing strategies, identifying what content and what semantic direction to be edited. Based on the former, the latter addresses "how to drag" by collaboratively integrating existing editing guidance with the newly proposed semantic guidance and quality guidance. Specifically, semantic guidance is derived by establishing a semantic editing direction based on reasoned intentions, while quality guidance is achieved through classifier guidance using an image fidelity discriminator. Both qualitative and quantitative comparisons demonstrate the superiority of LucidDrag over previous methods. The code will be released.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.20852 [pdf, other]

Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning

Authors: Xuxin Cheng, Wanshi Xu, Zhihong Zhu, Hongxiang Li, Yuexian Zou

Abstract: Spoken language understanding (SLU) is a core task in task-oriented dialogue systems, which aims at understanding the user's current goal through constructing semantic frames. SLU usually consists of two subtasks, including intent detection and slot filling. Although there are some SLU frameworks joint modeling the two subtasks and achieving high performance, most of them still overlook the inherent relationships between intents and slots and fail to achieve mutual guidance between the two subtasks. To solve the problem, we propose a multi-level multi-grained SLU framework MMCL to apply contrastive learning at three levels, including utterance level, slot level, and word level to enable intent and slot to mutually guide each other. For the utterance level, our framework implements coarse granularity contrastive learning and fine granularity contrastive learning simultaneously. Besides, we also apply the self-distillation method to improve the robustness of the model. Experimental results and further analysis demonstrate that our proposed model achieves new state-of-the-art results on two public multi-intent SLU datasets, obtaining a 2.6 overall accuracy improvement on the MixATIS dataset compared to previous best models.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.17022 [pdf, other]

Compositional Few-Shot Class-Incremental Learning

Authors: Yixiong Zou, Shanghang Zhang, Haichen Zhou, Yuhua Li, Ruixuan Li

Abstract: Few-shot class-incremental learning (FSCIL) is proposed to continually learn from novel classes with only a few samples after the (pre-)training on base classes with sufficient data. However, this remains a challenge. In contrast, humans can easily recognize novel classes with a few samples. Cognitive science demonstrates that an important component of such human capability is compositional learning. This involves identifying visual primitives from learned knowledge and then composing new concepts using these transferred primitives, making incremental learning both effective and interpretable. To imitate human compositional learning, we propose a cognitive-inspired method for the FSCIL task. We define and build a compositional model based on set similarities, and then equip it with a primitive composition module and a primitive reuse module. In the primitive composition module, we propose to utilize the Centered Kernel Alignment (CKA) similarity to approximate the similarity between primitive sets, allowing the training and evaluation based on primitive compositions. In the primitive reuse module, we enhance primitive reusability by classifying inputs based on primitives replaced with the closest primitives from other classes. Experiments on three datasets validate our method, showing it outperforms current state-of-the-art methods with improved interpretability. Our code is available at https://github.com/Zoilsen/Comp-FSCIL.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16437 [pdf, other]

Incremental Pseudo-Labeling for Black-Box Unsupervised Domain Adaptation

Authors: Yawen Zou, Chunzhi Gu, Jun Yu, Shangce Gao, Chao Zhang

Abstract: Black-Box unsupervised domain adaptation (BBUDA) learns knowledge only with the prediction of target data from the source model without access to the source data and source model, which attempts to alleviate concerns about the privacy and security of data. However, incorrect pseudo-labels are prevalent in the prediction generated by the source model due to the cross-domain discrepancy, which may substantially degrade the performance of the target model. To address this problem, we propose a novel approach that incrementally selects high-confidence pseudo-labels to improve the generalization ability of the target model. Specifically, we first generate pseudo-labels using a source model and train a crude target model by a vanilla BBUDA method. Second, we iteratively select high-confidence data from the low-confidence data pool by thresholding the softmax probabilities, prototype labels, and intra-class similarity. Then, we iteratively train a stronger target network based on the crude target model to correct the wrongly labeled samples to improve the accuracy of the pseudo-label. Experimental results demonstrate that the proposed method achieves state-of-the-art black-box unsupervised domain adaptation performance on three benchmark datasets.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.16148 [pdf, other]

Accelerating Transformers with Spectrum-Preserving Token Merging

Authors: Hoai-Chau Tran, Duy M. H. Nguyen, Duy M. Nguyen, Trung-Tin Nguyen, Ngan Le, Pengtao Xie, Daniel Sonntag, James Y. Zou, Binh T. Nguyen, Mathias Niepert

Abstract: Increasing the throughput of the Transformer architecture, a foundational component used in numerous state-of-the-art models for vision and language tasks (e.g., GPT, LLaVa), is an important problem in machine learning. One recent and effective strategy is to merge token representations within Transformer models, aiming to reduce computational and memory requirements while maintaining accuracy. Prior works have proposed algorithms based on Bipartite Soft Matching (BSM), which divides tokens into distinct sets and merges the top k similar tokens. However, these methods have significant drawbacks, such as sensitivity to token-splitting strategies and damage to informative tokens in later layers. This paper presents a novel paradigm called PiToMe, which prioritizes the preservation of informative tokens using an additional metric termed the energy score. This score identifies large clusters of similar tokens as high-energy, indicating potential candidates for merging, while smaller (unique and isolated) clusters are considered as low-energy and preserved. Experimental findings demonstrate that PiToMe saved from 40-60\% FLOPs of the base models while exhibiting superior off-the-shelf performance on image classification (0.5\% average performance drop of ViT-MAE-H compared to 2.6\% as baselines), image-text retrieval (0.3\% average performance drop of CLIP on Flickr30k compared to 4.5\% as others), and analogously in visual questions answering with LLaVa-7B.
Furthermore, PiToMe is theoretically shown to preserve intrinsic spectral properties of the original token space under mild conditions

Submitted 25 May, 2024; originally announced May 2024.

Comments: Version 1

arXiv:2405.15460 [pdf]

TD3 Based Collision Free Motion Planning for Robot Navigation

Authors: Hao Liu, Yi Shen, Chang Zhou, Yuelin Zou, Zijun Gao, Qi Wang

Abstract: This paper addresses the challenge of collision-free motion planning in automated navigation within complex environments. Utilizing advancements in Deep Reinforcement Learning (DRL) and sensor technologies like LiDAR, we propose the TD3-DWA algorithm, an innovative fusion of the traditional Dynamic Window Approach (DWA) with the Twin Delayed Deep Deterministic Policy Gradient (TD3). This hybrid algorithm enhances the efficiency of robotic path planning by optimizing the sampling interval parameters of DWA to effectively navigate around both static and dynamic obstacles. The performance of the TD3-DWA algorithm is validated through various simulation experiments, demonstrating its potential to significantly improve the reliability and safety of autonomous navigation systems.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.15308 [pdf, other]

Nudging Users to Change Breached Passwords Using the Protection Motivation Theory

Authors: Yixin Zou, Khue Le, Peter Mayer, Alessandro Acquisti, Adam J. Aviv, Florian Schaub

Abstract: We draw on the Protection Motivation Theory (PMT) to design nudges that encourage users to change breached passwords. Our online experiment ($n$=$1,386$) compared the effectiveness of a threat appeal (highlighting negative consequences of breached passwords) and a coping appeal (providing instructions on how to change the breached password) in a 2x2 factorial design. Compared to the control condition, participants receiving the threat appeal were more likely to intend to change their passwords, and participants receiving both appeals were more likely to end up changing their passwords; both comparisons have a small effect size. Participants' password change behaviors are further associated with other factors such as their security attitudes (SA-6) and time passed since the breach, suggesting that PMT-based nudges are useful but insufficient to fully motivate users to change their passwords. Our study contributes to PMT's application in security research and provides concrete design implications for improving compromised credential notifications.

Submitted 24 May, 2024; originally announced May 2024.

Comments: Manuscript under review at ACM Transactions on Computer-Human Interaction

arXiv:2405.15245 [pdf, other]

Cooperative Backdoor Attack in Decentralized Reinforcement Learning with Theoretical Guarantee

Authors: Mengtong Gao, Yifei Zou, Zuyuan Zhang, Xiuzhen Cheng, Dongxiao Yu

Abstract: The safety of decentralized reinforcement learning (RL) is a challenging problem since malicious agents can share their poisoned policies with benign agents. The paper investigates a cooperative backdoor attack in a decentralized reinforcement learning scenario. Differing from the existing methods that hide a whole backdoor attack behind their shared policies, our method decomposes the backdoor behavior into multiple components according to the state space of RL. Each malicious agent hides one component in its policy and shares its policy with the benign agents. When a benign agent learns all the poisoned policies, the backdoor attack is assembled in its policy. The theoretical proof is given to show that our cooperative method can successfully inject the backdoor into the RL policies of benign agents. Compared with the existing backdoor attacks, our cooperative method is more covert since the policy from each attacker only contains a component of the backdoor attack and is harder to detect. Extensive simulations are conducted based on Atari environments to demonstrate the efficiency and covertness of our method. To the best of our knowledge, this is the first paper presenting a provable cooperative backdoor attack in decentralized reinforcement learning.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.15222 [pdf, other]

Leveraging Unknown Objects to Construct Labeled-Unlabeled Meta-Relationships for Zero-Shot Object Navigation

Authors: Yanwei Zheng, Changrui Li, Chuanlin Lan, Yaling Li, Xiao Zhang, Yifei Zou, Dongxiao Yu, Zhipeng Cai

Abstract: Zero-shot object navigation (ZSON) addresses the situation where an agent navigates to an unseen object that is not present in the training set. Previous works mainly train the agent using seen objects with known labels, and ignore the seen objects without labels. In this paper, we introduce seen objects without labels, herein termed ``unknown objects'', into the training procedure to enrich the agent's knowledge base with distinguishable but previously overlooked information. Furthermore, we propose the label-wise meta-correlation module (LWMCM) to harness relationships among objects with and without labels, and obtain enhanced object information. Specifically, we propose a target feature generator (TFG) to generate the feature representation of the unlabeled target objects. Subsequently, the unlabeled object identifier (UOI) module assesses whether the unlabeled target object appears in the current observation frame captured by the camera and produces an adapted target feature representation specific to the observed context. In the meta contrastive feature modifier (MCFM), the target features are modified by approaching the features of objects within the observation frame while distancing themselves from features of unobserved objects. Finally, the meta object-graph learner (MOGL) module is utilized to calculate the relationships among objects based on the features. Experiments conducted on AI2THOR and RoboTHOR platforms demonstrate the effectiveness of our proposed method.

Submitted 26 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.11826 [pdf, other]

Data quality control system and long-term performance monitor of the LHAASO-KM2A

Authors: Zhen Cao, F. Aharonian, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, H. X. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen, et al. (263 additional authors not shown)

Abstract: The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.

Submitted 13 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 15 pages, 9 figures

arXiv:2405.08586 [pdf, other]

Cross-Domain Feature Augmentation for Domain Generalization

Authors: Yingnan Liu, Yingtian Zou, Rui Qiao, Fusheng Liu, Mong Li Lee, Wynne Hsu

Abstract: Domain generalization aims to develop models that are robust to distribution shifts. Existing methods focus on learning invariance across domains to enhance model robustness, and data augmentation has been widely used to learn invariant predictors, with most methods performing augmentation in the input space. However, augmentation in the input space has limited diversity whereas in the feature space is more versatile and has shown promising results. Nonetheless, feature semantics is seldom considered and existing feature augmentation methods suffer from a limited variety of augmented features. We decompose features into class-generic, class-specific, domain-generic, and domain-specific components. We propose a cross-domain feature augmentation method named XDomainMix that enables us to increase sample diversity while emphasizing the learning of invariant representations to achieve domain generalization. Experiments on widely used benchmark datasets demonstrate that our proposed method is able to achieve state-of-the-art performance. Quantitative analysis indicates that our feature augmentation approach facilitates the learning of effective models that are invariant across different domains.

Submitted 14 May, 2024; originally announced May 2024.

Comments: Accepted to the 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024); Code is available at https://github.com/NancyQuris/XDomainMix

arXiv:2405.07691 [pdf, other]

Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

Abstract: The first source catalog of the Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma-ray source, 1LHAASO J1219+2915. In this paper, a further detailed study of the spectral and temporal behavior of this point-like source has been carried out. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) is compatible with NGC 4278 within $\sim0.03$ degree. Variation analysis shows an indication of variability at a few months level in the TeV band, which is consistent with low frequency observations. Based on these observations, we report the detection of TeV $γ$-ray emissions from this low-luminosity AGN NGC 4278. The observations by LHAASO-WCDA during the active period have a significance level of 8.8\,$σ$ with best-fit photon spectral index $\varGamma=2.56\pm0.14$ and a flux $f_{1-10\,\rm{TeV}}=(7.0\pm1.1_{\rm{sta}}\pm0.35_{\rm{syst}})\times10^{-13}\,\rm{photons\,cm^{-2}\,s^{-1}}$, or approximately $5\%$ of the Crab Nebula. The discovery of VHE emission from NGC 4278 indicates that the compact, weak radio jet can efficiently accelerate particles and emit TeV photons.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 11 pages, 5 figures

arXiv:2405.06600 [pdf, other]

Multi-Object Tracking in the Dark

Authors: Xinzhe Wang, Kang Ma, Qiankun Liu, Yunhao Zou, Ying Fu

Abstract: Low-light scenes are prevalent in real-world applications (e.g. autonomous driving and surveillance at night). Recently, multi-object tracking in various practical use cases has received much attention, but multi-object tracking in dark scenes is rarely considered. In this paper, we focus on multi-object tracking in dark scenes. To address the lack of datasets, we first build a Low-light Multi-Object Tracking (LMOT) dataset. LMOT provides well-aligned low-light video pairs captured by our dual-camera system, and high-quality multi-object tracking annotations for all videos. Then, we propose a low-light multi-object tracking method, termed LTrack. We introduce the adaptive low-pass downsample module to enhance low-frequency components of images outside the sensor noises. The degradation suppression learning strategy enables the model to learn invariant information under noise disturbance and image quality degradation. These components improve the robustness of multi-object tracking in dark scenes. We conducted a comprehensive analysis of our LMOT dataset and proposed LTrack. Experimental results demonstrate the superiority of the proposed method and its competitiveness in real night low-light scenes. Dataset and Code: https://github.com/ying-fu/LMOT

Submitted 10 May, 2024; originally announced May 2024.

Comments: Accepted by CVPR2024

arXiv:2405.06007 [pdf]

Anomalous properties of spark plasma sintered boron nitride solids

Authors: Abhijit Biswas, Peter Serles, Gustavo A. Alvarez, Jesse Schimpf, Michel Hache, Jonathan Kong, Pedro Guerra Demingos, Bo Yuan, Tymofii S. Pieshkov, Chenxi Li, Anand B. Puthirath, Bin Gao, Tia Gray, Xiang Zhang, Jishnu Murukeshan, Robert Vajtai, Pengcheng Dai, Chandra Veer Singh, Jane Howe, Yu Zou, Lane W. Martin, James Patrick Clancy, Zhiting Tian, Tobin Filleter, Pulickel M. Ajayan

Abstract: Hexagonal boron nitride (h-BN) is brittle; however, its atomic-scale structural engineering can lead to unprecedented physical properties. Here we report the bulk synthesis of high-density crystalline h-BN solids by using high-temperature spark plasma sintering (SPS) of micron-size h-BN powders. In addition to the high mechanical strength and ductile response of such materials, we have obtained anomalous values of dielectric constant beyond theoretical limits, high thermal conductivity, and exceptional neutron radiation shielding capability. Through exhaustive characterizations we reveal that SPS induces non-basal plane crystallinity, twisting of layers, and facilitates inter-grain fusion with a high degree of in-plane alignment across macroscale dimensions, resulting in near-theoretical density and anomalous properties. Our findings highlight the importance of material design, via new approaches such as twisting and interconnections between atomically thin layers, to create novel ceramics with properties that could go beyond their intrinsic theoretical predictions.

Submitted 10 July, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: Authors revised version, 46 pages, 4 figures

arXiv:2405.04918 [pdf, other]

Delve into Base-Novel Confusion: Redundancy Exploration for Few-Shot Class-Incremental Learning

Authors: Haichen Zhou, Yixiong Zou, Ruixuan Li, Yuhua Li, Kui Xiao

Abstract: Few-shot class-incremental learning (FSCIL) aims to acquire knowledge from novel classes with limited samples while retaining information about base classes. Existing methods address catastrophic forgetting and overfitting by freezing the feature extractor during novel-class learning. However, these methods usually tend to cause confusion between base and novel classes, i.e., classifying novel-class samples into base classes. In this paper, we delve into this phenomenon to study its cause and solution. We first interpret the confusion as the collision between the novel-class and the base-class region in the feature space. Then, we find the collision is caused by the label-irrelevant redundancies within the base-class feature and pixel space. Through qualitative and quantitative experiments, we identify this redundancy as the shortcut in the base-class training, which can be decoupled to alleviate the collision. Based on this analysis, to alleviate the collision between base and novel classes, we propose a method for FSCIL named Redundancy Decoupling and Integration (RDI). RDI first decouples redundancies from base-class space to shrink the intra-base-class feature space. Then, it integrates the redundancies as a dummy class to enlarge the inter-base-class feature space. This process effectively compresses the base-class feature space, creating buffer space for novel classes and alleviating the model's confusion between the base and novel classes. Extensive experiments across benchmark datasets, including CIFAR-100, miniImageNet, and CUB-200-2011, demonstrate that our method achieves state-of-the-art performance.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.04867 [pdf, other]

MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng , et al. (24 additional authors not shown)

Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.

Submitted 8 May, 2024; originally announced May 2024.

Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

arXiv:2405.04466 [pdf, other]

A fully differentiable GNN-based PDE Solver: With Applications to Poisson and Navier-Stokes Equations

Authors: Tianyu Li, Yiye Zou, Shufan Zou, Xinghua Chang, Laiping Zhang, Xiaogang Deng

Abstract: In this study, we present a novel computational framework that integrates the finite volume method with graph neural networks to address the challenges in Physics-Informed Neural Networks (PINNs). Our approach leverages the flexibility of graph neural networks to adapt to various types of two-dimensional unstructured grids, enhancing the model's applicability across different physical equations and boundary conditions. The core innovation lies in the development of an unsupervised training algorithm that utilizes GPU parallel computing to implement a fully differentiable finite volume method discretization process. This method includes differentiable integral and gradient reconstruction algorithms, enabling the model to directly solve partial differential equations (PDEs) during training without the need for pre-computed data. Our results demonstrate the model's superior mesh generalization and its capability to handle multiple boundary conditions simultaneously, significantly boosting its generalization capabilities. The proposed method not only shows potential for extensive applications in CFD but also establishes a new paradigm for integrating traditional numerical methods with deep learning technologies, offering a robust platform for solving complex physical problems.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.04434 [pdf, other]

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding , et al. (132 additional authors not shown)

Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.

Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.02521 [pdf, other]

The hyperbolic X-ray transform: new range characterizations, mapping properties and functional relations

Authors: Nikolas Eptaminitakis, François Monard, Yuzhou Zou

Abstract: We derive new singular value decompositions and range characterizations for the X-ray transform on the Poincaré disk, a surjectivity result for the backprojection operator, as well as new functional settings and intertwining relations with appropriately defined differential operators. The approach mainly exploits analogous results recently obtained for the Euclidean disk, together with the projective equivalence between the two models.

Submitted 3 May, 2024; originally announced May 2024.

Comments: 43 pages, 1 figure

arXiv:2405.02062 [pdf, other]

Dyna-Style Learning with A Macroscopic Model for Vehicle Platooning in Mixed-Autonomy Traffic

Authors: Yichuan Zou, Li Jin, Xi Xiong

Abstract: Platooning of connected and autonomous vehicles (CAVs) plays a vital role in modernizing highways, ushering in enhanced efficiency and safety. This paper explores the significance of platooning in smart highways, employing a coupled partial differential equation (PDE) and ordinary differential equation (ODE) model to elucidate the complex interaction between bulk traffic flow and CAV platoons. Our study focuses on developing a Dyna-style planning and learning framework tailored for platoon control, with a specific goal of reducing fuel consumption. By harnessing the coupled PDE-ODE model, we improve data efficiency in Dyna-style learning through virtual experiences. Simulation results validate the effectiveness of our macroscopic model in modeling platoons within mixed-autonomy settings, demonstrating a notable $10.11\%$ reduction in vehicular fuel consumption compared to conventional approaches.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2405.02004 [pdf, other]

M${^2}$Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation

Authors: Yingshuang Zou, Yikang Ding, Xi Qiu, Haoqian Wang, Haotian Zhang

Abstract: This paper presents a novel self-supervised two-frame multi-camera metric depth estimation network, termed M${^2}$Depth, which is designed to predict reliable scale-aware surrounding depth in autonomous driving. Unlike previous works that use multi-view images from a single time-step or multiple time-step images from a single camera, M${^2}$Depth takes temporally adjacent two-frame images from multiple cameras as inputs and produces high-quality surrounding depth. We first construct cost volumes in spatial and temporal domains individually and propose a spatial-temporal fusion module that integrates the spatial-temporal information to yield a strong volume presentation. We additionally combine the neural prior from SAM features with internal features to reduce the ambiguity between foreground and background and strengthen the depth edges. Extensive experimental results on nuScenes and DDAD benchmarks show M${^2}$Depth achieves state-of-the-art performance. More results can be found at https://heiheishuang.xyz/M2Depth.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2405.01927 [pdf, other]

SlotGAT: Slot-based Message Passing for Heterogeneous Graph Neural Network

Authors: Ziang Zhou, Jieming Shi, Renchi Yang, Yuanhang Zou, Qing Li

Abstract: Heterogeneous graphs are ubiquitous for modeling complex data. There is an urgent need for powerful heterogeneous graph neural networks to effectively support important applications. We identify a potential semantic mixing issue in existing message passing processes, where the representations of the neighbors of a node $v$ are forced to be transformed to the feature space of $v$ for aggregation, though the neighbors are of different types. That is, the semantics in different node types are entangled together into node $v$'s representation. To address the issue, we propose SlotGAT with separate message passing processes in slots, one for each node type, to maintain the representations in their own node-type feature spaces. Moreover, in a slot-based message passing layer, we design an attention mechanism for effective slot-wise message aggregation. Further, we develop a slot attention technique after the last layer of SlotGAT, to learn the importance of different slots in downstream tasks. Our analysis indicates that the slots in SlotGAT can preserve different semantics in various feature spaces. The superiority of SlotGAT is evaluated against 13 baselines on 6 datasets for node classification and link prediction. Our code is at https://github.com/scottjiao/SlotGAT_ICML23/.

Submitted 3 May, 2024; originally announced May 2024.

Comments: Published as a conference paper at ICML 2023

arXiv:2405.00311 [pdf]

Three-layer deep learning network random trees for fault detection in chemical production process

Authors: Ming Lu, Zhen Gao, Ying Zou, Zuguo Chen, Pei Li

Abstract: With the development of technology, the chemical production process is becoming increasingly complex and large-scale, making fault detection particularly important. However, current detection methods struggle to address the complexities of large-scale production processes. In this paper, we integrate the strengths of deep learning and machine learning technologies, combining the advantages of bidirectional long short-term memory neural networks, fully connected neural networks, and the extra trees algorithm to propose a novel fault detection model named three-layer deep learning network random trees (TDLN-trees). First, the deep learning component extracts temporal features from industrial data, combining and transforming them into a higher-level data representation. Second, the machine learning component processes and classifies the features extracted in the first step. An experimental analysis based on the Tennessee Eastman process verifies the superiority of the proposed method.

Submitted 11 July, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.17559 [pdf, other]

Decoherence in Neutrino Oscillation at the ESSnuSB Experiment

Authors: ESSnuSB, :, J. Aguilar, M. Anastasopoulos, E. Baussan, A. K. Bhattacharyya, A. Bignami, M. Blennow, M. Bogomilov, B. Bolling, E. Bouquerel, F. Bramati, A. Branca, G. Brunetti, I. Bustinduy, C. J. Carlile, J. Cederkall, T. W. Choi, S. Choubey, P. Christiansen, M. Collins, E. Cristaldo Morales, P. Cupiał, H. Danared, D. Dancila , et al. (72 additional authors not shown)

Abstract: Neutrino oscillation experiments provide a unique window for exploring several new physics scenarios beyond the standard three flavour. One such scenario is quantum decoherence in neutrino oscillation, which tends to destroy the interference pattern of neutrinos reaching the far detector from the source. In this work, we study the decoherence in neutrino oscillation in the context of the ESSnuSB experiment. We consider the energy-independent decoherence parameter and derive the analytical expressions for P$_{μe}$ and P$_{μμ}$ probabilities in vacuum. We have computed the capability of ESSnuSB to put bounds on the decoherence parameters, namely $Γ_{21}$ and $Γ_{32}$, and found that the constraints on $Γ_{21}$ are competitive compared to the DUNE bounds and better than the current T2K and MINOS ones. We have also investigated the impact of decoherence on the ESSnuSB measurement of the Dirac CP phase $δ_{\rm CP}$ and concluded that it remains robust in the presence of new physics.

Submitted 26 April, 2024; originally announced April 2024.

Comments: 30 pages, 9 figures, 2 tables

arXiv:2404.17379 [pdf]

Adaptive speed planning for Unmanned Vehicle Based on Deep Reinforcement Learning

Authors: Hao Liu, Yi Shen, Wenjing Zhou, Yuelin Zou, Chang Zhou, Shuyao He

Abstract: In order to solve the problem of frequent deceleration of unmanned vehicles when approaching obstacles, this article uses a Deep Q-Network (DQN) and its extension, the Double Deep Q-Network (DDQN), to develop a local navigation system that adapts to obstacles while maintaining optimal speed planning. By integrating improved reward functions and obstacle angle determination methods, the system demonstrates significant enhancements in maneuvering capabilities without frequent decelerations. Experiments conducted in simulated environments with varying obstacle densities confirm the effectiveness of the proposed method in achieving more stable and efficient path planning.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.17046 [pdf, other]

Unraveling Code Clone Dynamics in Deep Learning Frameworks

Authors: Maram Assi, Safwat Hassan, Ying Zou

Abstract: Deep Learning (DL) frameworks play a critical role in advancing artificial intelligence, and their rapid growth underscores the need for a comprehensive understanding of software quality and maintainability. DL frameworks, like other systems, are prone to code clones. Code clones refer to identical or highly similar source code fragments within the same project or even across different projects. Code cloning can have positive and negative implications for software development, influencing maintenance, readability, and bug propagation. In this paper, we aim to address the knowledge gap concerning the evolutionary dimension of code clones in DL frameworks and the extent of code reuse across these frameworks. We empirically analyze code clones in nine popular DL frameworks, i.e., TensorFlow, Paddle, PyTorch, Aesara, Ray, MXNet, Keras, Jax and BentoML, to investigate (1) the characteristics of the long-term code cloning evolution over releases in each framework, (2) the short-term, i.e., within-release, code cloning patterns and their influence on the long-term trends, and (3) the file-level code clones within the DL frameworks. Our findings reveal that DL frameworks adopt four distinct cloning trends and that these trends present some common and distinct characteristics. For instance, bug-fixing activities persistently happen in clones irrespective of the clone evolutionary trend but occur more in the "Serpentine" trend. Moreover, the within-release level investigation demonstrates that short-term code cloning practices impact long-term cloning trends. The cross-framework code clone investigation reveals the presence of functional and architectural adaptation file-level cross-framework code clones across the nine studied frameworks. We provide insights that foster robust clone practices and collaborative maintenance in the development of DL frameworks.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 37 pages

arXiv:2404.16484 [pdf, other]

Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu , et al. (50 additional authors not shown)

Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF codec, instead of JPEG. All the proposed methods improve PSNR fidelity over Lanczos interpolation, and process images in under 10 ms. Out of the 160 participants, 25 teams submitted their code and models. The solutions present novel designs tailored for memory-efficiency and runtime on edge devices. This survey describes the best solutions for real-time SR of compressed high-resolution images.

Submitted 25 April, 2024; originally announced April 2024.

Comments: CVPR 2024, AI for Streaming (AIS) Workshop

arXiv:2404.15878 [pdf, other]

Simulating unsteady fluid flows on a superconducting quantum processor

Authors: Zhaoyuan Meng, Jiarun Zhong, Shibo Xu, Ke Wang, Jiachen Chen, Feitong Jin, Xuhao Zhu, Yu Gao, Yaozu Wu, Chuanyu Zhang, Ning Wang, Yiren Zou, Aosai Zhang, Zhengyi Cui, Fanhao Shen, Zehang Bao, Zitian Zhu, Ziqi Tan, Tingting Li, Pengfei Zhang, Shiying Xiong, Hekang Li, Qiujiang Guo, Zhen Wang, Chao Song , et al. (2 additional authors not shown)

Abstract: Recent advancements in intermediate-scale quantum processors have triggered tremendous interest in the exploration of practical quantum advantage. The simulation of fluid dynamics, a highly challenging problem in classical physics but vital for practical applications, emerges as a good candidate for showing quantum utility. Here, we report an experiment on the digital simulation of unsteady flows, which consists of quantum encoding, evolution, and detection of flow states, with a superconducting quantum processor. The quantum algorithm is based on the Hamiltonian simulation using the hydrodynamic formulation of the Schrödinger equation. With the median fidelities of 99.97% and 99.67% for parallel single- and two-qubit gates respectively, we simulate the dynamics of a two-dimensional (2D) compressible diverging flow and a 2D decaying vortex with ten qubits. The experimental results capture the temporal evolution of averaged density and momentum profiles well, and qualitatively reproduce spatial flow fields with moderate noise. This work demonstrates the potential of quantum computing in simulating more complex flows, such as turbulence, for practical applications.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.13016 [pdf, other]

Optimizing Calibration by Gaining Aware of Prediction Correctness

Authors: Yuchi Liu, Lei Wang, Yuli Zou, James Zou, Liang Zheng

Abstract: Model calibration aims to align confidence with prediction correctness. The Cross-Entropy (CE) loss is widely used for calibrator training, which enforces the model to increase confidence on the ground truth class. However, we find the CE loss has intrinsic limitations. For example, for a narrow misclassification, a calibrator trained by the CE loss often produces high confidence on the wrongly predicted class (e.g., a test sample is wrongly classified and its softmax score on the ground truth class is around 0.4), which is undesirable. In this paper, we propose a new post-hoc calibration objective derived from the aim of calibration. Intuitively, the proposed objective function asks that the calibrator decrease model confidence on wrongly predicted samples and increase confidence on correctly predicted samples. Because a sample itself has insufficient ability to indicate correctness, we use its transformed versions (e.g., rotated, greyscaled and color-jittered) during calibrator training. Trained on an in-distribution validation set and tested with isolated, individual test samples, our method achieves competitive calibration performance on both in-distribution and out-of-distribution test sets compared with the state of the art. Further, our analysis points out the difference between our method and commonly used objectives such as CE loss and mean square error loss, where the latter sometimes deviate from the calibration aim.

Submitted 24 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.10343 [pdf, other]

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. They gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.

Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

Showing 1–50 of 807 results for author: Zou, Y