HRRPGraphNet: A Graph Neural Network Based Approach for HRRP Radar Target Recognition

Authors: Lingfeng Chen, Panhe Hu, Zhiliang Pan, Xiao Sun, Zehao Wang

Abstract: High Resolution Range Profiles (HRRP) have become a key area of focus in the domain of Radar Automatic Target Recognition (RATR). Despite the success of data-driven neural network-based HRRP recognition, challenges such as insufficient training samples persist in its real-world application. This letter introduces HRRPGraphNet, a novel Graph Neural Network (GNN) model designed specifically for HRRP target recognition that leverages new insights to address these challenges. A pivotal innovation is the transformation of HRRP data into a graph structure, utilizing a range cell amplitude-based node vector and a range-relative adjacency matrix. This graph-based approach facilitates both local feature extraction via one-dimensional convolution layers and global feature extraction through a graph convolution layer, capitalizing on the intrinsic relationships between range cells, which is a distinct advantage over existing sequence-based methods. Experiments on the aircraft electromagnetic simulation dataset and the measured dataset have confirmed HRRPGraphNet's superior accuracy and robustness, particularly in fewer training sample environments, underscoring the potential of graph-driven innovations in HRRP-based RATR.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 5 pages, 4 figures

arXiv:2407.06494 [pdf, other]

A Generative Approach to Control Complex Physical Systems

Authors: Long Wei, Peiyan Hu, Ruiqi Feng, Haodong Feng, Yixuan Du, Tao Zhang, Rui Wang, Yue Wang, Zhi-Ming Ma, Tailin Wu

Abstract: Controlling the evolution of complex physical systems is a fundamental task across science and engineering. Classical techniques suffer from limited applicability or huge computational costs. On the other hand, recent deep learning and reinforcement learning-based approaches often struggle to optimize long-term control sequences under the constraints of system dynamics. In this work, we introduce Diffusion Physical systems Control (DiffPhyCon), a new class of method to address the physical systems control problem. DiffPhyCon excels by simultaneously minimizing both the learned generative energy function and the predefined control objectives across the entire trajectory and control sequence. Thus, it can explore globally and identify near-optimal control sequences. Moreover, we enhance DiffPhyCon with prior reweighting, enabling the discovery of control sequences that significantly deviate from the training distribution. We test our method in 1D Burgers' equation and 2D jellyfish movement control in a fluid environment. Our method outperforms widely applied classical approaches and state-of-the-art deep learning and reinforcement learning methods. Notably, DiffPhyCon unveils an intriguing fast-close-slow-open pattern observed in the jellyfish, aligning with established findings in the field of fluid dynamics.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.03621 [pdf, other]

The Mysterious Case of Neuron 1512: Injectable Realignment Architectures Reveal Internal Characteristics of Meta's Llama 2 Model

Authors: Brenden Smith, Dallin Baker, Clayton Chase, Myles Barney, Kaden Parker, Makenna Allred, Peter Hu, Alex Evans, Nancy Fulda

Abstract: Large Language Models (LLMs) have an unrivaled and invaluable ability to "align" their output to a diverse range of human preferences, by mirroring them in the text they generate. The internal characteristics of such models, however, remain largely opaque. This work presents the Injectable Realignment Model (IRM) as a novel approach to language model interpretability and explainability. Inspired by earlier work on Neural Programming Interfaces, we construct and train a small network -- the IRM -- to induce emotion-based alignments within a 7B parameter LLM architecture. The IRM outputs are injected via layerwise addition at various points during the LLM's forward pass, thus modulating its behavior without changing the weights of the original model. This isolates the alignment behavior from the complex mechanisms of the transformer model. Analysis of the trained IRM's outputs reveals a curious pattern. Across more than 24 training runs and multiple alignment datasets, patterns of IRM activations align themselves in striations associated with a neuron's index within each transformer layer, rather than being associated with the layers themselves. Further, a single neuron index (1512) is strongly correlated with all tested alignments. This result, although initially counterintuitive, is directly attributable to design choices present within almost all commercially available transformer architectures, and highlights a potential weak point in Meta's pretrained Llama 2 models. It also demonstrates the value of the IRM architecture for language model analysis and interpretability. Our code and datasets are available at https://github.com/DRAGNLabs/injectable-alignment-model

Submitted 4 July, 2024; originally announced July 2024.

Comments: 21 pages, 17 figures

arXiv:2406.16655 [pdf, other]

Large Language Models Are Cross-Lingual Knowledge-Free Reasoners

Authors: Peng Hu, Sizhe Liu, Changjiang Gao, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Shujian Huang

Abstract: Large Language Models have demonstrated impressive reasoning capabilities across multiple languages. However, the relationship between capabilities in different languages is less explored. In this work, we decompose the process of reasoning tasks into two separated parts: knowledge retrieval and knowledge-free reasoning, and analyze the cross-lingual transferability of them. With adapted and constructed knowledge-free reasoning datasets, we show that the knowledge-free reasoning capability can be nearly perfectly transferred across various source-target language directions despite the secondary impact of resource in some specific target languages, while cross-lingual knowledge retrieval significantly hinders the transfer. Moreover, by analyzing the hidden states and feed-forward network neuron activation during the reasoning tasks, we show that higher similarity of hidden representations and larger overlap of activated neurons could explain the better cross-lingual transferability of knowledge-free reasoning than knowledge retrieval. Thus, we hypothesize that knowledge-free reasoning embeds in some language-shared mechanism, while knowledge is stored separately in different languages.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.15160 [pdf, other]

Exploring Audio-Visual Information Fusion for Sound Event Localization and Detection In Low-Resource Realistic Scenarios

Authors: Ya Jiang, Qing Wang, Jun Du, Maocheng Hu, Pengfei Hu, Zeyan Liu, Shi Cheng, Zhaoxu Nian, Yuxuan Dong, Mingqi Cai, Xin Fang, Chin-Hui Lee

Abstract: This study presents an audio-visual information fusion approach to sound event localization and detection (SELD) in low-resource scenarios. We aim at utilizing audio and video modality information through cross-modal learning and multi-modal fusion. First, we propose a cross-modal teacher-student learning (TSL) framework to transfer information from an audio-only teacher model, trained on a rich collection of audio data with multiple data augmentation techniques, to an audio-visual student model trained with only a limited set of multi-modal data. Next, we propose a two-stage audio-visual fusion strategy, consisting of an early feature fusion and a late video-guided decision fusion to exploit synergies between audio and video modalities. Finally, we introduce an innovative video pixel swapping (VPS) technique to extend an audio channel swapping (ACS) method to an audio-visual joint augmentation. Evaluation results on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 Challenge data set demonstrate significant improvements in SELD performances. Furthermore, our submission to the SELD task of the DCASE 2023 Challenge ranks first place by effectively integrating the proposed techniques into a model ensemble.

Submitted 21 June, 2024; originally announced June 2024.

Comments: accepted by icme2024

arXiv:2406.10928 [pdf, other]

doi 10.1145/3637528.3671708

Make Your Home Safe: Time-aware Unsupervised User Behavior Anomaly Detection in Smart Homes via Loss-guided Mask

Authors: Jingyu Xiao, Zhiyao Xu, Qingsong Zou, Qing Li, Dan Zhao, Dong Fang, Ruoyu Li, Wenxin Tang, Kang Li, Xudong Zuo, Penghui Hu, Yong Jiang, Zixuan Weng, Michael R. Lyv

Abstract: Smart homes, powered by the Internet of Things, offer great convenience but also pose security concerns due to abnormal behaviors, such as improper operations of users and potential attacks from malicious attackers. Several behavior modeling methods have been proposed to identify abnormal behaviors and mitigate potential risks. However, their performance often falls short because they do not effectively learn less frequent behaviors, consider temporal context, or account for the impact of noise in human behaviors. In this paper, we propose SmartGuard, an autoencoder-based unsupervised user behavior anomaly detection framework. First, we design a Loss-guided Dynamic Mask Strategy (LDMS) to encourage the model to learn less frequent behaviors, which are often overlooked during learning. Second, we propose a Three-level Time-aware Position Embedding (TTPE) to incorporate temporal information into positional embedding to detect temporal context anomaly. Third, we propose a Noise-aware Weighted Reconstruction Loss (NWRL) that assigns different weights for routine behaviors and noise behaviors to mitigate the interference of noise behaviors during inference. Comprehensive experiments on three datasets with ten types of anomaly behaviors demonstrate that SmartGuard consistently outperforms state-of-the-art baselines and also offers highly interpretable results.

Submitted 18 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

Comments: KDD 2024

arXiv:2406.08757 [pdf, other]

SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding

Authors: Jiefeng Ma, Yan Wang, Chenyu Liu, Jun Du, Yu Hu, Zhenrong Zhang, Pengfei Hu, Qing Wang, Jianshu Zhang

Abstract: Accurately identifying and organizing textual content is crucial for the automation of document processing in the field of form understanding. Existing datasets, such as FUNSD and XFUND, support entity classification and relationship prediction tasks but are typically limited to local and entity-level annotations. This limitation overlooks the hierarchically structured representation of documents, constraining comprehensive understanding of complex forms. To address this issue, we present the SRFUND, a hierarchically structured multi-task form understanding benchmark. SRFUND provides refined annotations on top of the original FUNSD and XFUND datasets, encompassing five tasks: (1) word to text-line merging, (2) text-line to entity merging, (3) entity category classification, (4) item table localization, and (5) entity-based full-document hierarchical structure recovery. We meticulously supplemented the original dataset with missing annotations at various levels of granularity and added detailed annotations for multi-item table regions within the forms. Additionally, we introduce global hierarchical structure dependencies for entity relation prediction tasks, surpassing traditional local key-value associations. The SRFUND dataset includes eight languages including English, Chinese, Japanese, German, French, Spanish, Italian, and Portuguese, making it a powerful tool for cross-lingual form understanding. Extensive experimental results demonstrate that the SRFUND dataset presents new challenges and significant opportunities in handling diverse layouts and global hierarchical structures of forms, thus providing deep insights into the field of form understanding. The original dataset and implementations of baseline methods are available at https://sprateam-ustc.github.io/SRFUND

Submitted 12 June, 2024; originally announced June 2024.

Comments: NeurIPS 2024 Track on Datasets and Benchmarks under review

arXiv:2406.08454 [pdf, other]

Towards Musically Informed Evaluation of Piano Transcription Models

Authors: Patricia Hu, Lukáš Samuel Marták, Carlos Cancino-Chacón, Gerhard Widmer

Abstract: Automatic piano transcription models are typically evaluated using simple frame- or note-wise information retrieval (IR) metrics. Such benchmark metrics do not provide insights into the transcription quality of specific musical aspects such as articulation, dynamics, or rhythmic precision of the output, which are essential in the context of expressive performance analysis. Furthermore, in recent years, MAESTRO has become the de-facto training and evaluation dataset for such models. However, inference performance has been observed to deteriorate substantially when applied on out-of-distribution data, thereby questioning the suitability and reliability of transcribed outputs from such models for specific MIR tasks. In this work, we investigate the performance of three state-of-the-art piano transcription models in two experiments. In the first one, we propose a variety of musically informed evaluation metrics which, in contrast to the IR metrics, offer more detailed insight into the musical quality of the transcriptions. In the second experiment, we compare inference performance on real-world and perturbed audio recordings, and highlight musical dimensions which our metrics can help explain. Our experimental results highlight the weaknesses of existing piano transcription metrics and contribute to a more musically sound error analysis of transcription outputs.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07393 [pdf, other]

Limited Out-of-Context Knowledge Reasoning in Large Language Models

Authors: Peng Hu, Changjiang Gao, Ruiqi Gao, Jiajun Chen, Shujian Huang

Abstract: Large Language Models (LLMs) have demonstrated strong capabilities as knowledge bases and significant in-context reasoning capabilities. However, previous work challenges their out-of-context reasoning ability, i.e., the ability to infer information from their training data, instead of from the context or prompt. This paper focuses on a significant facet of out-of-context reasoning: Out-of-Context Knowledge Reasoning (OCKR), which is to combine multiple pieces of knowledge to infer new knowledge. We designed a synthetic dataset with seven representative OCKR tasks to systematically assess the OCKR capabilities of LLMs. Using this dataset, we evaluated the LLaMA2-13B-chat model and discovered that its proficiency in this aspect is limited, regardless of whether the knowledge is trained in separate or adjacent training settings. Moreover, training the model to reason with complete reasoning data did not result in significant improvement. Training the model to perform explicit knowledge retrieval helps in only one of the tasks, indicating that the model's limited OCKR capabilities are due to difficulties in retrieving relevant knowledge. Furthermore, we treat cross-lingual knowledge transfer as a distinct form of OCKR, and evaluate this ability. Our results show that the evaluated model also exhibits limited ability in transferring knowledge across languages. The dataset used in this study is available at https://github.com/NJUNLP/ID-OCKR.

Submitted 24 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.02019 [pdf, other]

Uncorrelated estimations of $H_0$ redshift evolution from DESI baryon acoustic oscillation observations

Authors: X. D. Jia, J. P. Hu, F. Y. Wang

Abstract: The Dark Energy Spectroscopic Instrument (DESI) collaboration recently released the first year data of baryon acoustic oscillations (BAOs). Based on the five different tracers, the cosmological constraint shows a hint of deviation from the standard $Λ$CDM model. In this letter, we combine the DESI BAOs with other cosmic probes to constrain the evolution of the Hubble constant as a function of redshift in the flat $Λ$CDM model. The non-parametric method is used to estimate the value of the Hubble constant in different redshift bins. The correlations among different bins are removed by diagonalizing the covariance matrix. The joint data sample demonstrates a decreasing trend of the Hubble constant with a significance of $8.6 σ$, which can naturally resolve the Hubble tension. It may be due to dynamical dark energy or modified gravity.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 7 pages, 2 figures, 1 table, submitted to AAS journal
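
The decorrelation step named in this abstract (diagonalizing the covariance matrix) can be illustrated with a small sketch. This is not the authors' pipeline: the function name `decorrelate_2x2` and the 2x2 covariance values are hypothetical, and only two bins are used so the rotation can be written in closed form. The idea shown is the general one, namely that rotating into the eigenbasis of a symmetric covariance matrix makes the off-diagonal (correlation) terms vanish.

```python
import math

def decorrelate_2x2(cov):
    """Rotate a symmetric 2x2 covariance matrix into its eigenbasis.

    Returns (rot, diag) where the rows of rot are orthonormal eigenvectors
    and diag = rot @ cov @ rot^T has zero off-diagonal entries.
    """
    a, b = cov[0]
    _, c = cov[1]
    # Jacobi rotation angle that zeroes the off-diagonal term.
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    cs, sn = math.cos(theta), math.sin(theta)
    rot = [[cs, sn], [-sn, cs]]
    # rc = rot @ cov
    rc = [[sum(rot[i][k] * cov[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    # diag = rc @ rot^T
    diag = [[sum(rc[i][k] * rot[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]
    return rot, diag

# Hypothetical covariance between two redshift-bin estimates.
cov = [[4.0, 1.2], [1.2, 1.0]]
rot, diag = decorrelate_2x2(cov)
```

Applying `rot` to the vector of bin estimates gives linear combinations whose errors are uncorrelated; the variances of those combinations are the diagonal entries of `diag`. For more than two bins one would normally use a general symmetric eigensolver instead of a closed-form rotation.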

arXiv:2406.01953 [pdf, other]

On-Demand Routing in LEO Mega-Constellations with Dynamic Laser Inter-Satellite Links

Authors: Dhiraj Bhattacharjee, Pablo G. Madoery, Aizaz U. Chaudhry, Halim Yanikomeroglu, Gunes Karabulut Kurt, Peng Hu, Khaled Ahmed, Stephane Martel

Abstract: Low Earth orbit (LEO) satellite mega constellations are beginning to include laser inter-satellite links (LISLs) to extend the Internet to the most remote locations on Earth. Since the process of establishing these links incurs a setup delay on the order of seconds, a static network topology is generally established well in advance, which is then used for the routing calculations. However, this involves keeping links active even when they are not being used to forward traffic, leading to poor energy efficiency. Motivated by technological advances that are gradually decreasing the LISL setup delays, we foresee scenarios where it will be possible to compute routes and establish dynamic LISLs on demand. This will require considering setup delays as penalties that will affect the end-to-end latency. In this paper, we present a nonlinear optimization model that considers these penalties in the cost function and propose three heuristic algorithms that solve the problem in a tractable way. The algorithms establish different trade-offs in terms of performance and computational complexity. We extensively analyze metrics including average latency, route change rate, outage probability, and jitter in Starlink's Phase I version 2 constellation. The results show the benefit of adaptive routing schemes according to the link setup delay. In particular, more complex schemes can decrease the average end-to-end latency in exchange for an increase in execution time. On the other hand, depending on the maximum tolerated latency, it is possible to use less computationally complex schemes which will be more scalable for the satellite mega constellations of the future.

Submitted 4 June, 2024; originally announced June 2024.
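
The idea of treating link setup delays as latency penalties can be sketched with a plain shortest-path search. This is not one of the paper's three heuristics or its nonlinear optimization model: the function `cheapest_route`, the toy topology, and the simplification that each not-yet-established link adds a fixed, additive `setup_delay` are all assumptions made for illustration.

```python
import heapq

def cheapest_route(links, active, setup_delay, src, dst):
    """Dijkstra over an undirected graph with setup-delay penalties.

    links: dict mapping (u, v) -> propagation latency in ms.
    active: set of frozenset({u, v}) for links that are already established.
    setup_delay: penalty in ms added for each link that must be set up.
    Returns (path, total_cost), or (None, inf) if dst is unreachable.
    """
    graph = {}
    for (u, v), latency in links.items():
        penalty = 0.0 if frozenset((u, v)) in active else setup_delay
        graph.setdefault(u, []).append((v, latency + penalty))
        graph.setdefault(v, []).append((u, latency + penalty))

    best = {src: 0.0}
    prev = {}
    queue = [(0.0, src)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == dst:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(queue, (new_cost, nxt))

    if dst not in best:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], best[dst]

# Toy topology: the direct A-C link is shorter but not yet established.
links = {("A", "B"): 5.0, ("B", "C"): 5.0, ("A", "C"): 8.0}
active = {frozenset(("A", "B")), frozenset(("B", "C"))}
```

With `setup_delay=0.0` the direct A-C link wins (8 ms); with `setup_delay=10.0` the two already-active hops become cheaper (10 ms versus 18 ms), which is the kind of trade-off the abstract describes.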

arXiv:2405.17792 [pdf, other]

JUNO Sensitivity to Invisible Decay Modes of Neutrons

Authors: JUNO Collaboration, Angel Abusleme, Thomas Adam, Kai Adamowicz, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, Marco Beretta, Antonio Bergnoli, Daniel Bick, et al. (635 additional authors not shown)

Abstract: We explore the decay of bound neutrons into invisible particles (e.g., $n \rightarrow 3ν$ or $nn \rightarrow 2ν$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $n \rightarrow {\rm inv}$ and $nn \rightarrow {\rm inv}$. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation modes of the excited residual nuclei can produce a time- and space-correlated triple coincidence signal in the JUNO detector. Based on a full Monte Carlo simulation informed with the latest available data, we estimate all backgrounds, including inverse beta decay events of the reactor antineutrino $\bar{ν}_e$, natural radioactivity, cosmogenic isotopes and neutral current interactions of atmospheric neutrinos. Pulse shape discrimination and multivariate analysis techniques are employed to further suppress backgrounds. With two years of exposure, JUNO is expected to give an order of magnitude improvement compared to the current best limits. After 10 years of data taking, the JUNO expected sensitivities at a 90% confidence level are $τ/B(n \rightarrow {\rm inv}) > 5.0 \times 10^{31} \, {\rm yr}$ and $τ/B(nn \rightarrow {\rm inv}) > 1.4 \times 10^{32} \, {\rm yr}$.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 28 pages, 7 figures, 4 tables

arXiv:2405.15438 [pdf, other]

Comparing remote sensing-based forest biomass mapping approaches using new forest inventory plots in contrasting forests in northeastern and southwestern China

Authors: Wenquan Dong, Edward T. A. Mitchard, Yuwei Chen, Man Chen, Congfeng Cao, Peilun Hu, Cong Xu, Steven Hancock

Abstract: Large-scale high spatial resolution aboveground biomass (AGB) maps play a crucial role in determining forest carbon stocks and how they are changing, which is instrumental in understanding the global carbon cycle, and implementing policy to mitigate climate change. The advent of the new space-borne LiDAR sensor, NASA's GEDI instrument, provides unparalleled possibilities for the accurate and unbiased estimation of forest AGB at high resolution, particularly in dense and tall forests, where Synthetic Aperture Radar (SAR) and passive optical data exhibit saturation. However, GEDI is a sampling instrument, collecting dispersed footprints, and its data must be combined with that from other continuous cover satellites to create high-resolution maps, using local machine learning methods. In this study, we developed local models to estimate forest AGB from GEDI L2A data, as the models used to create GEDI L4 AGB data incorporated minimal field data from China. We then applied LightGBM and random forest regression to generate wall-to-wall AGB maps at 25 m resolution, using extensive GEDI footprints as well as Sentinel-1 data, ALOS-2 PALSAR-2 and Sentinel-2 optical data. Through a 5-fold cross-validation, LightGBM demonstrated a slightly better performance than Random Forest across two contrasting regions. However, in both regions, the computation speed of LightGBM is substantially faster than that of the random forest model, requiring roughly one-third of the time to compute on the same hardware. Through the validation against field data, the 25 m resolution AGB maps generated using the local models developed in this study exhibited higher accuracy compared to the GEDI L4B AGB data. We found in both regions an increase in error as slope increased. The trained models were tested on nearby but different regions and exhibited good performance.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.11862 [pdf, other]

SEMv3: A Fast and Robust Approach to Table Separation Line Detection

Authors: Chunxia Qin, Zhenrong Zhang, Pengfei Hu, Chenyu Liu, Jiefeng Ma, Jun Du

Abstract: Table structure recognition (TSR) aims to parse the inherent structure of a table from its input image. The "split-and-merge" paradigm is a pivotal approach to parse table structure, where the table separation line detection is crucial. However, challenges such as wireless and deformed tables make it demanding. In this paper, we adhere to the "split-and-merge" paradigm and propose SEMv3 (SEM: Split, Embed and Merge), a method that is both fast and robust for detecting table separation lines. During the split stage, we introduce a Keypoint Offset Regression (KOR) module, which effectively detects table separation lines by directly regressing the offset of each line relative to its keypoint proposals. Moreover, in the merge stage, we define a series of merge actions to efficiently describe the table structure based on table grids. Extensive ablation studies demonstrate that our proposed KOR module can detect table separation lines quickly and accurately. Furthermore, on public datasets (e.g. WTW, ICDAR-2019 cTDaR Historical and iFLYTAB), SEMv3 achieves state-of-the-art (SOTA) performance. The code is available at https://github.com/Chunchunwumu/SEMv3.

Submitted 20 May, 2024; originally announced May 2024.

Comments: 9 pages, 6 figures, 5 tables. Accepted by IJCAI2024 main track

arXiv:2404.17875 [pdf, other]

Noisy Node Classification by Bi-level Optimization based Multi-teacher Distillation

Authors: Yujing Liu, Zongqian Wu, Zhengyu Lu, Ci Nie, Guoqiu Wen, Ping Hu, Xiaofeng Zhu

Abstract: Previous graph neural networks (GNNs) usually assume that the graph data is with clean labels for representation learning, but it is not true in real applications. In this paper, we propose a new multi-teacher distillation method based on bi-level optimization (namely BO-NNC), to conduct noisy node classification on the graph data. Specifically, we first employ multiple self-supervised learning methods to train diverse teacher models, and then aggregate their predictions through a teacher weight matrix. Furthermore, we design a new bi-level optimization strategy to dynamically adjust the teacher weight matrix based on the training progress of the student model. Finally, we design a label improvement module to improve the label quality. Extensive experimental results on real datasets show that our method achieves the best results compared to state-of-the-art methods.

Submitted 8 May, 2024; v1 submitted 27 April, 2024; originally announced April 2024.
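
Aggregating teacher predictions through a weight matrix can be sketched as a per-node convex combination. The abstract does not specify the shape or normalization of the teacher weight matrix, so the layout below (one non-negative weight per node and teacher, normalized per node) and the function name `aggregate_teachers` are assumptions, and the bi-level optimization that updates the weights is not shown.

```python
def aggregate_teachers(teacher_probs, weights):
    """Mix teacher predictions into one soft label per node.

    teacher_probs[t][i] is teacher t's class distribution for node i.
    weights[i][t] is the non-negative weight of teacher t on node i
    (assumed layout; rows are normalized to sum to 1 here).
    """
    num_teachers = len(teacher_probs)
    soft_labels = []
    for i, row in enumerate(weights):
        total = sum(row)
        norm = [w / total for w in row]
        num_classes = len(teacher_probs[0][i])
        mixed = [
            sum(norm[t] * teacher_probs[t][i][k] for t in range(num_teachers))
            for k in range(num_classes)
        ]
        soft_labels.append(mixed)
    return soft_labels

# Two hypothetical teachers, one node, two classes.
teacher_probs = [
    [[0.8, 0.2]],  # teacher 0
    [[0.4, 0.6]],  # teacher 1
]
weights = [[3.0, 1.0]]  # node 0 trusts teacher 0 three times as much
soft = aggregate_teachers(teacher_probs, weights)
```

In this toy case the mixed label is 0.75 times teacher 0 plus 0.25 times teacher 1, so each output row remains a valid probability distribution that a student model could be trained against.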

arXiv:2404.17281 [pdf]

Topological polarization singularities induced by the non-Hermitian Dirac points

Authors: Jun Wang, Jie Liu, Peng Hu, Qiao Jiang, Dezhuan Han

Abstract: A Dirac point in the Hermitian photonic system will split into a pair of exceptional points (EPs) or even spawn a ring of EPs if non-Hermiticity is involved. Here, we present a new type of non-Hermitian Dirac point which is situated in the complex plane of eigenfrequency. When there is differential loss, the Dirac point exhibits a dual behavior: it not only splits into a pair of EPs with opposite chirality in the band structure but also induces a pair of circularly polarized states (C points) with opposite handedness in the far-field radiation. Furthermore, breaking the corresponding mirror symmetries enables independent control of these Dirac-point induced C points, facilitating the merging of two C points and generation of unidirectional guided resonances. Our results demonstrate an explicit relation between the band singularities and polarization singularities, and provide a new mechanism to generate unidirectional emission, which can be useful in the band engineering and polarization manipulation.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.11577 [pdf, other]

Towards Reliable Empirical Machine Unlearning Evaluation: A Game-Theoretic View

Authors: Yiwen Tu, Pingbang Hu, Jiaqi Ma

Abstract: Machine unlearning is the process of updating machine learning models to remove the information of specific training data samples, in order to comply with data protection regulations that allow individuals to request the removal of their personal data. Despite the recent development of numerous unlearning algorithms, reliable evaluation of these algorithms remains an open research question. In this work, we focus on membership inference attack (MIA) based evaluation, one of the most common approaches for evaluating unlearning algorithms, and address various pitfalls of existing evaluation metrics that lack reliability. Specifically, we propose a game-theoretic framework that formalizes the evaluation process as a game between unlearning algorithms and MIA adversaries, measuring the data removal efficacy of unlearning algorithms by the capability of the MIA adversaries. Through careful design of the game, we demonstrate that the natural evaluation metric induced from the game enjoys provable guarantees that the existing evaluation metrics fail to satisfy. Furthermore, we propose a practical and efficient algorithm to estimate the evaluation metric induced from the game, and demonstrate its effectiveness through both theoretical analysis and empirical experiments. This work presents a novel and reliable approach to empirically evaluating unlearning algorithms, paving the way for the development of more effective unlearning techniques.

Submitted 12 June, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.07689 [pdf, other]

Towards resolving bedload flux variability

Authors: Thomas Pähtz, Yulan Chen, Jiafeng Xie, Rémi Monthiller, Raphaël Maurin, Katharina Tholen, Yen-Cheng Lin, Hao-Che Ho, Peng Hu, Zhiguo He, Orencio Durán

Abstract: Bedload transport occurs when a bed composed of sedimentary grains becomes mobile in response to the shearing by a flow of liquid. It shapes the landscapes of Earth and other planetary bodies by promoting the formation and growth of various multiscale geological features. Estimating the rate at which such processes take place requires accurate bedload flux predictions. However, even for highly idealized conditions in the laboratory, study-to-study variability of reported bedload flux measurements borders an order of magnitude. This uncertainty stems from physically poorly supported, typically empirical methods of determining the transport-driving bed shear stress, especially for very narrow or shallow channel flows, and from study-to-study grain shape variations. Here, we derive a non-empirical method of bed shear stress determination and apply it to a number of independent grain-shape-controlled data sets, based on well-controlled experiments and CFD-DEM simulations, for a very diverse range of transport conditions. An existing physical bedload model, here generalized to account for grain shape variability, predicts almost all these data within a factor of 1.3, whereas a prominent alternative model (Deal et al., Nature 613, 298-302, 2023) seems falsified.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.04659 [pdf, other]

Multilingual Pretraining and Instruction Tuning Improve Cross-Lingual Knowledge Alignment, But Only Shallowly

Authors: Changjiang Gao, Hongda Hu, Peng Hu, Jiajun Chen, Jixing Li, Shujian Huang

Abstract: Despite their strong ability to retrieve knowledge in English, current large language models show imbalanced abilities in different languages. Two approaches are proposed to address this, i.e., multilingual pretraining and multilingual instruction tuning. However, whether and how such methods contribute to the cross-lingual knowledge alignment inside the models is unknown. In this paper, we propose CLiKA, a systematic framework to assess the cross-lingual knowledge alignment of LLMs in the Performance, Consistency and Conductivity levels, and explore the effect of multilingual pretraining and instruction tuning on the degree of alignment. Results show that while both multilingual pretraining and instruction tuning are beneficial for cross-lingual knowledge alignment, the training strategy needs to be carefully designed. Namely, continued pretraining improves the alignment of the target language at the cost of other languages, while mixed pretraining affects other languages less. Also, the overall cross-lingual knowledge alignment, especially in the conductivity level, is unsatisfactory for all tested LLMs, and neither multilingual pretraining nor instruction tuning can substantially improve the cross-lingual knowledge conductivity.

Submitted 6 April, 2024; originally announced April 2024.

arXiv:2404.04607 [pdf, ps, other]

Quantized perfect transmission in graphene nanoribbons with random hollow adsorbates

Authors: Jia-Le Yu, Zhe Hou, Irfan Hussain Bhat, Pei-Jia Hu, Jia-Wen Sun, Xiao-Feng Chen, Ai-Min Guo, Qing-Feng Sun

Abstract: Impurities exist inevitably in two-dimensional materials as they spontaneously adsorb onto the surface during fabrication, usually exerting detrimental effects on electronic transport. Here, we focus on a special type of impurities that preferentially adsorb onto the hollow regions of graphene nanoribbons (GNRs), and study how they affect the quantum transport in GNRs. Contrary to previous knowledge that random adatoms should localize electrons, the so-called Anderson localization, noteworthy quantized conductance peaks (QCPs) are observed at specific electron energies. These QCPs are remarkably robust against variations in system size, GNR edge, and adatom properties, and they can reappear at identical energies following an arithmetic sequence of device width. Further investigation of wavefunction reveals a unique transport mode at each QCP energy which transmits through disordered GNRs reflectionlessly, while all the others become fully Anderson localized, indicating the survival of quantum ballistic transport in the localized regime. Our findings highlight the potential utility of hollow adatoms as a powerful tool to manipulate the conductivity of GNRs, and deepen the understanding of the interplay between impurities and graphene.

Submitted 9 April, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

Comments: 8 pages, 4 figures, and 1 table; comments are welcome

arXiv:2404.04346 [pdf, other]

Koala: Key frame-conditioned long video-LLM

Authors: Reuben Tan, Ximeng Sun, Ping Hu, Jui-hsien Wang, Hanieh Deilamsalehy, Bryan A. Plummer, Bryan Russell, Kate Saenko

Abstract: Long video question answering is a challenging task that involves recognizing short-term activities and reasoning about their fine-grained relationships. State-of-the-art video Large Language Models (vLLMs) hold promise as a viable solution due to their demonstrated emergent capabilities on new tasks. However, despite being trained on millions of short seconds-long videos, vLLMs are unable to understand minutes-long videos and accurately answer questions about them. To address this limitation, we propose a lightweight and self-supervised approach, Key frame-conditioned long video-LLM (Koala), that introduces learnable spatiotemporal queries to adapt pretrained vLLMs for generalizing to longer videos. Our approach introduces two new tokenizers that condition on visual tokens computed from sparse video key frames for understanding short and long video moments. We train our proposed approach on HowTo100M and demonstrate its effectiveness on zero-shot long video understanding benchmarks, where it outperforms state-of-the-art large models by 3 - 6% in absolute accuracy across all tasks. Surprisingly, we also empirically show that our approach not only helps a pretrained vLLM to understand long videos but also improves its accuracy on short-term action recognition.

Submitted 3 May, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

Comments: Accepted at CVPR 2024 as a poster highlight

arXiv:2404.00855 [pdf, other]

TSOM: Small Object Motion Detection Neural Network Inspired by Avian Visual Circuit

Authors: Pignge Hu, Xiaoteng Zhang, Mengmeng Li, Yingjie Zhu, Li Shi

Abstract: Detecting small moving objects in complex backgrounds from an overhead perspective is a highly challenging task for machine vision systems. As an inspiration from nature, the avian visual system is capable of processing motion information in various complex aerial scenes, and its Retina-OT-Rt visual circuit is highly sensitive to capturing the motion information of small objects from high altitudes. However, more needs to be done on small object motion detection algorithms based on the avian visual system. In this paper, we conducted mathematical modeling based on extensive studies of the biological mechanisms of the Retina-OT-Rt visual circuit. Based on this, we proposed a novel tectum small object motion detection neural network (TSOM). The neural network includes the retina, SGC dendritic, SGC Soma, and Rt layers, each layer corresponding to neurons in the visual pathway. The Retina layer is responsible for accurately projecting input content, the SGC dendritic layer perceives and encodes spatial-temporal information, the SGC Soma layer computes complex motion information and extracts small objects, and the Rt layer integrates and decodes motion information from multiple directions to determine the position of small objects. Extensive experiments on pigeon neurophysiological experiments and image sequence data showed that the TSOM is biologically interpretable and effective in extracting reliable small object motion features from complex high-altitude backgrounds.

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2403.19386 [pdf, other]

PointCloud-Text Matching: Benchmark Datasets and a Baseline

Authors: Yanglin Feng, Yang Qin, Dezhong Peng, Hongyuan Zhu, Xi Peng, Peng Hu

Abstract: In this paper, we present and study a new instance-level retrieval task: PointCloud-Text Matching (PTM), which aims to find the exact cross-modal instance that matches a given point-cloud query or text query. PTM could be applied to various scenarios, such as indoor/urban-canyon localization and scene retrieval. However, there exists no suitable and targeted dataset for PTM in practice. Therefore, we construct three new PTM benchmark datasets, namely 3D2T-SR, 3D2T-NR, and 3D2T-QA. We observe that the data is challenging and has noisy correspondence due to the sparsity, noise, or disorder of point clouds and the ambiguity, vagueness, or incompleteness of texts, which make existing cross-modal matching methods ineffective for PTM. To tackle these challenges, we propose a PTM baseline, named Robust PointCloud-Text Matching method (RoMa). RoMa consists of two modules: a Dual Attention Perception module (DAP) and a Robust Negative Contrastive Learning module (RNCL). Specifically, DAP leverages token-level and feature-level attention to adaptively focus on useful local and global features, and aggregate them into common representations, thereby reducing the adverse impact of noise and ambiguity. To handle noisy correspondence, RNCL divides negative pairs, which are much less error-prone than positive pairs, into clean and noisy subsets, and assigns them forward and reverse optimization directions respectively, thus enhancing robustness against noisy correspondence. We conduct extensive experiments on our benchmarks and demonstrate the superiority of our RoMa.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.16888 [pdf, other]

doi 10.1007/978-981-99-8432-9_11

Towards Balanced RGB-TSDF Fusion for Consistent Semantic Scene Completion by 3D RGB Feature Completion and a Classwise Entropy Loss Function

Authors: Laiyan Ding, Panwen Hu, Jie Li, Rui Huang

Abstract: Semantic Scene Completion (SSC) aims to jointly infer semantics and occupancies of 3D scenes. Truncated Signed Distance Function (TSDF), a 3D encoding of depth, has been a common input for SSC. Furthermore, RGB-TSDF fusion seems promising since these two modalities provide color and geometry information, respectively. Nevertheless, RGB-TSDF fusion has been considered nontrivial, and the commonly used naive addition will result in inconsistent results. We argue that the inconsistency comes from the sparsity of RGB features upon projecting into 3D space, while TSDF features are dense, leading to imbalanced feature maps when summed up. To address this RGB-TSDF distribution difference, we propose a two-stage network with a 3D RGB feature completion module that completes RGB features with meaningful values for occluded areas. Moreover, we propose an effective classwise entropy loss function to punish inconsistency. Extensive experiments on public datasets verify that our method achieves state-of-the-art performance among methods that do not adopt extra data.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.13936 [pdf, other]

Secure and Efficient Group Handover Protocol in 5G Non-Terrestrial Networks

Authors: Bohan Zhang, Peng Hu, Ahmad Akbari Azirani, Mohammad A. Salahuddin, Diogo Barradas, Noura Limam, Raouf Boutaba

Abstract: The growing low-Earth orbit (LEO) satellite constellations have become an essential part of the fifth-generation (5G) non-terrestrial network (NTN) market. These satellites can enable direct-to-cell connectivity for mobile devices and support various applications with ubiquitous coverage for 5G and beyond networks. However, satellite-based NTNs bring several challenges to the 5G handover protocol design. The high mobility of satellites can lead to signaling storms and security compromises during handovers. This paper addresses these challenges by proposing a secure and efficient group handover protocol. The protocol's effectiveness is evaluated on a custom discrete-event simulator and compared against the baseline 5G handover scheme. The simulator is made publicly available.

Submitted 20 March, 2024; originally announced March 2024.

Comments: Accepted by the 2024 IEEE International Conference on Communications (ICC), 9-13 June 2024, Denver, CO, USA

arXiv:2403.12131 [pdf, other]

The Surprising Effectiveness of Weyl Gravity in Probing Quantum Corrections to AdS Black Holes

Authors: Liang Ma, Peng-Ju Hu, Yi Pang, Hong Lu

Abstract: Computing leading higher curvature contributions to thermodynamic quantities of AdS black holes is drastically simplified once the higher curvature terms are expressed in terms of powers of the Weyl tensor by applying proper field redefinitions, avoiding the usual complications caused by the higher derivative Gibbons-Hawking-York (GHY) term or surface counterterms. We establish the method by computing the Euclidean action of general rotating AdS black holes in five dimensional quadratic curvature theories with or without supersymmetry and verifying the results numerically. Our result is the state of the art for charged rotating AdS black holes in five dimensional minimal gauged supergravity including corrections from all three supersymmetric curvature squared terms. Our approach facilitates precision tests in the AdS/CFT correspondence and should be applicable in diverse dimensions.

Submitted 26 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: 5 pages, 1 figure. Clarifications and references added. Version accepted in PRD Letters

arXiv:2403.11549 [pdf, other]

Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters

Authors: Jiazuo Yu, Yunzhi Zhuge, Lu Zhang, Ping Hu, Dong Wang, Huchuan Lu, You He

Abstract: Continual learning can empower vision-language models to continuously acquire new knowledge, without the need for access to the entire historical dataset. However, mitigating the performance degradation in large-scale models is non-trivial due to (i) parameter shifts throughout lifelong learning and (ii) significant computational burdens associated with full-model tuning. In this work, we present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models. Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters in response to new tasks. To preserve the zero-shot recognition capability of vision-language models, we further introduce a Distribution Discriminative Auto-Selector (DDAS) that automatically routes in-distribution and out-of-distribution inputs to the MoE Adapter and the original CLIP, respectively. Through extensive experiments across various settings, our proposed method consistently outperforms previous state-of-the-art approaches while concurrently reducing parameter training burdens by 60%. Our code is available at https://github.com/JiazuoYu/MoE-Adapters4CL

Submitted 3 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: This work is accepted by CVPR2024. More modifications may be performed

arXiv:2403.08292 [pdf, other]

Weak Collocation Regression for Inferring Stochastic Dynamics with Lévy Noise

Authors: Liya Guo, Liwei Lu, Zhijun Zeng, Pipi Hu, Yi Zhu

Abstract: With the rapid increase of observational, experimental and simulated data for stochastic systems, tremendous efforts have been devoted to identifying governing laws underlying the evolution of these systems. Despite the broad applications of non-Gaussian fluctuations in numerous physical phenomena, the data-driven approaches to extracting stochastic dynamics with Lévy noise are relatively few. In this work, we propose a Weak Collocation Regression (WCR) to explicitly reveal unknown stochastic dynamical systems, i.e., the Stochastic Differential Equation (SDE) with both $α$-stable Lévy noise and Gaussian noise, from discrete aggregate data. This method utilizes the evolution equation of the probability distribution function, i.e., the Fokker-Planck (FP) equation. With the weak form of the FP equation, the WCR constructs a linear system of unknown parameters where all integrals are evaluated by the Monte Carlo method with the observations. Then, the unknown parameters are obtained by a sparse linear regression. For an SDE with Lévy noise, the corresponding FP equation is a partial integro-differential equation (PIDE), which contains nonlocal terms and is difficult to deal with. The weak form can avoid complicated multiple integrals. Our approach can simultaneously distinguish mixed noise types, even in multi-dimensional problems. Numerical experiments demonstrate that our method is accurate and computationally efficient.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 19 pages, 5 figures, 10 tables

arXiv:2403.07153 [pdf, other]

2023 Low-Power Computer Vision Challenge (LPCVC) Summary

Authors: Leo Chen, Benjamin Boardley, Ping Hu, Yiru Wang, Yifan Pu, Xin Jin, Yongqiang Yao, Ruihao Gong, Bo Li, Gao Huang, Xianglong Liu, Zifu Wan, Xinwang Chen, Ning Liu, Ziyi Zhang, Dongping Liu, Ruijie Shan, Zhengping Che, Fachao Zhang, Xiaofeng Mou, Jian Tang, Maxim Chuprov, Ivan Malofeev, Alexander Goncharenko, Andrey Shcherbin , et al. (5 additional authors not shown)

Abstract: This article describes the 2023 IEEE Low-Power Computer Vision Challenge (LPCVC). Since 2015, LPCVC has been an international competition devoted to tackling the challenge of computer vision (CV) on edge devices. Most CV researchers focus on improving accuracy, at the expense of ever-growing sizes of machine models. LPCVC balances accuracy with resource requirements. Winners must achieve high accuracy with short execution time when their CV solutions run on an embedded device, such as Raspberry PI or Nvidia Jetson Nano. The vision problem for 2023 LPCVC is segmentation of images acquired by Unmanned Aerial Vehicles (UAVs, also called drones) after disasters. The 2023 LPCVC attracted 60 international teams that submitted 676 solutions during the submission window of one month. This article explains the setup of the competition and highlights the winners' methods that improve accuracy and shorten execution time.

Submitted 11 March, 2024; originally announced March 2024.

Comments: LPCVC 2023, website: https://lpcv.ai/

arXiv:2403.05002 [pdf, other]

LHMap-loc: Cross-Modal Monocular Localization Using LiDAR Point Cloud Heat Map

Authors: Xinrui Wu, Jianbo Xu, Puyuan Hu, Guangming Wang, Hesheng Wang

Abstract: Localization using a monocular camera in the pre-built LiDAR point cloud map has drawn increasing attention in the field of autonomous driving and mobile robotics. However, there are still many challenges (e.g. difficulties of map storage, poor localization robustness in large scenes) in accurately and efficiently implementing cross-modal localization. To solve these problems, a novel pipeline termed LHMap-loc is proposed, which achieves accurate and efficient monocular localization in LiDAR maps. Firstly, feature encoding is carried out on the original LiDAR point cloud map by generating offline heat point clouds, by which the size of the original LiDAR map is compressed. Then, an end-to-end online pose regression network is designed based on optical flow estimation and spatial attention to achieve real-time monocular visual localization in a pre-built map. In addition, a series of experiments have been conducted to prove the effectiveness of the proposed method. Our code is available at: https://github.com/IRMVLab/LHMap-loc.

Submitted 10 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

Comments: Accepted by 2024 IEEE International Conference on Robotics and Automation (ICRA 2024)

arXiv:2403.03004 [pdf, other]

Ultralight vector dark matter search using data from the KAGRA O3GK run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, H. Abe, I. Abouelfettouh, F. Acernese, K. Ackley, C. Adamcewicz, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi , et al. (1778 additional authors not shown)

Abstract: Among the various candidates for dark matter (DM), ultralight vector DM can be probed by laser interferometric gravitational wave detectors through the measurement of oscillating length changes in the arm cavities. In this context, KAGRA has a unique feature due to differing compositions of its mirrors, enhancing the signal of vector DM in the length change in the auxiliary channels. Here we present the result of a search for $U(1)_{B-L}$ gauge boson DM using the KAGRA data from auxiliary length channels during the first joint observation run together with GEO600. By applying our search pipeline, which takes into account the stochastic nature of ultralight DM, upper bounds on the coupling strength between the $U(1)_{B-L}$ gauge boson and ordinary matter are obtained for a range of DM masses. While our constraints are less stringent than those derived from previous experiments, this study demonstrates the applicability of our method to the lower-mass vector DM search, which is made difficult in this measurement by the short observation time compared to the auto-correlation time scale of DM.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 20 pages, 5 figures

Report number: LIGO-P2300250

arXiv:2403.01702 [pdf]

Hill Function-based Model of Transcriptional Response: Impact of Nonspecific Binding and RNAP Interactions

Authors: Wenjia Shi, Yao Ma, Peilin Hu, Mi Pang, Xiaona Huang, Yiting Dang, Yuxin Xie, Danni Wu

Abstract: The Hill function is one of the most widely used gene transcription regulation models. Because it is a fitting function, it may lack an underlying physical picture, yet the fitting parameters can provide information about biochemical reactions, such as the number of transcription factors (TFs) and the binding energy between regulatory elements. However, it remains unclear when and how much biochemical information the Hill function can provide in addition to fitting. Here, starting from the interactions between TFs and RNA polymerase during transcription regulation and both of their association-dissociation reactions at specific/nonspecific sites on DNA, the regulatory effect of TFs was deduced as fold change. We found that, for a weak promoter, fold change can degrade into the regulatory factor (Freg), which is closely correlated with the Hill function. By directly comparing and fitting with the Hill function, the fitting parameters and corresponding biochemical reaction parameters in Freg were analyzed and discussed, where a single TF and multiple TFs with cooperativity and basic logic effects were considered. We concluded that the strength of the promoter and the interactions between TFs determine whether the Hill function can reflect the corresponding biochemical information. Our findings highlight the role of the Hill function in modeling/fitting for transcriptional regulation, which also benefits the preparation of synthetic regulatory elements.

Submitted 3 March, 2024; originally announced March 2024.

arXiv:2403.01047 [pdf]

doi 10.1088/1361-6404/ad2393

Student Understanding of the Bloch Sphere

Authors: Peter Hu, Yangqiuting Li, Roger S. K. Mong, Chandralekha Singh

Abstract: Quantum information science is a rapidly growing interdisciplinary field that is attracting the attention of academics and industry experts alike. It requires talent from a wide variety of traditional fields, including physics, engineering, chemistry, and computer science, to name a few. To prepare students for such opportunities, it is important to give them a strong foundation in the basics of quantum information science, in which quantum computing plays a central role. In this study, we discuss the development, validation, and evaluation of a tutorial on the Bloch sphere, a useful visual tool for developing intuition about single quantum bits (qubits), which are the basic building block of any quantum computer. Students' understanding was evaluated after they received traditional lecture-based instruction on the requisite topics, and again after engaging with the tutorial. We observe, analyze, and discuss their improvement in performance on concepts covered in the tutorial.

Submitted 1 March, 2024; originally announced March 2024.

Comments: 19 pages, 1 figure

Journal ref: European Journal of Physics 45 025705 (2024)

arXiv:2402.18080 [pdf]

doi 10.1088/1361-6404/ac9ba3

Challenges in addressing student difficulties with measurement uncertainty of two-state quantum systems using a multiple-choice question sequence in online and in-person classes

Authors: Peter Hu, Yangqiuting Li, Chandralekha Singh

Abstract: Research-validated multiple-choice questions comprise an easy-to-implement instructional tool that serves to scaffold student learning and formatively assess students knowledge. We present findings from the implementation, in consecutive years, of research-validated multiple-choice question sequence on measurement uncertainty as it applies to two-state quantum systems. This study was conducted in an advanced undergraduate quantum mechanics course, in an online and in-person learning environments in consecutive years. Student learning was assessed after receiving traditional lecture-based instruction in relevant concepts, and their performance was compared with that on a similar assessment given after engaging with the multiple-choice question sequence. We analyze and discuss the similar and differing trends observed in the two modes of instruction.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 23 pages, 0 figures

Journal ref: European Journal of Physics 44 015702 (2022)

arXiv:2402.18075 [pdf]

doi 10.1088/1361-6404/ac49f4

Challenges in addressing student difficulties with time-development of two-state quantum systems using a multiple-choice question sequence in virtual and in-person classes

Authors: Peter Hu, Yangqiuting Li, Chandralekha Singh

Abstract: Research-validated clicker questions as instructional tools for formative assessment are relatively easy to implement and can provide effective scaffolding when developed and implemented in a sequence. We present findings from the implementation of a research-validated clicker question sequence (CQS) on student understanding of the time-development of two-state quantum systems. This study was conducted in an advanced undergraduate quantum mechanics course for two consecutive years in virtual and in-person classes. The effectiveness of the CQS discussed here in both modes of instruction was determined by evaluating students' performance after traditional lecture-based instruction and comparing it to their performance after engaging with the CQS.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 18 pages, 1 figure

Journal ref: European Journal of Physics 43 025704 (2022)

arXiv:2402.18072 [pdf]

doi 10.1103/PhysRevPhysEducRes.19.020130

Challenges in addressing student difficulties with quantum measurement of two-state quantum systems using a multiple-choice question sequence in online and in-person classes

Authors: Peter Hu, Yangqiuting Li, Chandralekha Singh

Abstract: Research-validated multiple-choice questions comprise an easy-to-implement instructional tool that serves to scaffold student learning and formatively assess students' knowledge. We present findings from the implementation, in consecutive years, of a research-validated multiple-choice question sequence [referred to in this study as a Clicker Question Sequence (CQS)] on quantum measurement as it applies to two-state quantum systems. This study was conducted in an advanced undergraduate quantum mechanics course, in both online and in-person learning environments across three years. Student learning was assessed after traditional lecture-based instruction in relevant concepts, and their performance was compared with that on a similar assessment given after engaging with the CQS. We analyze, compare, and discuss the trends observed in the three implementations.

Submitted 29 May, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: 30 pages, 3 figures

Journal ref: Physical Review Physics Education Research 19, 020130 (2023)

arXiv:2402.18069 [pdf]

doi 10.1088/1361-6404/acf5b3

Challenges in addressing student difficulties with basics and change of basis for two-state quantum systems using a multiple-choice question sequence in online and in-person classes

Authors: Peter Hu, Yangqiuting Li, Chandralekha Singh

Abstract: Research-validated multiple-choice questions comprise an easy-to-implement instructional tool for scaffolding student learning and providing formative assessment of students' knowledge. We present findings from the implementation of a research-validated multiple-choice question sequence on the basics of two-state quantum systems, including inner products, outer products, translation between Dirac notation and matrix representation in a particular basis, and change of basis. This study was conducted in an advanced undergraduate quantum mechanics course, in both online and in-person learning environments, across three years. For each cohort, students had their learning assessed after traditional lecture-based instruction in relevant concepts before engaging with the multiple-choice question sequence. Their performance was evaluated again afterwards with a similar assessment and compared to their earlier performance. We analyze, compare, and discuss the trends observed in the three implementations.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 25 pages, 0 figures

Journal ref: European Journal of Physics 44 065703 (2023)

arXiv:2402.17271 [pdf, other]

Capacitive coupling study of the HERD SCD prototype: preliminary results

Authors: Ruo-Si Lu, Rui Qiao, Ke Gong, Wen-Xi Peng, Wei-Shuai Zhang, Dong-Ya Guo, Jia-Ju Wei, Yi-Ming Hu, Jian-Hua Guo, Qi Wu, Peng Hu, Xuan Liu, Bing Lu, Yi-Rong Zhang

Abstract: The Silicon Charge Detector (SCD) is a subdetector of the High Energy Cosmic Radiation Detection payload. The dynamic range of the silicon microstrip detector can be extended by the capacitive coupling effect, which is related to the interstrip capacitance and the coupling capacitance. A detector prototype with several sets of parameters was designed and tested in the ion beams at the CERN Super Proton Synchrotron. The capacitive coupling fractions with readout strip and floating strip incidences were studied using the beam test data and SPICE simulation.

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.16297 [pdf, other]

A Poisson-Gamma Dynamic Factor Model with Time-Varying Transition Dynamics

Authors: Jiahao Wang, Sikun Yang, Heinz Koeppl, Xiuzhen Cheng, Pengfei Hu, Guoming Zhang

Abstract: Probabilistic approaches for handling count-valued time sequences have attracted amounts of research attentions because their ability to infer explainable latent structures and to estimate uncertainties, and thus are especially suitable for dealing with \emph{noisy} and \emph{incomplete} count data. Among these models, Poisson-Gamma Dynamical Systems (PGDSs) are proven to be effective in capturing the evolving dynamics underlying observed count sequences. However, the state-of-the-art PGDS still fails to capture the \emph{time-varying} transition dynamics that are commonly observed in real-world count time sequences. To mitigate this gap, a non-stationary PGDS is proposed to allow the underlying transition matrices to evolve over time, and the evolving transition matrices are modeled by sophisticatedly-designed Dirichlet Markov chains. Leveraging Dirichlet-Multinomial-Beta data augmentation techniques, a fully-conjugate and efficient Gibbs sampler is developed to perform posterior simulation. Experiments show that, in comparison with related models, the proposed non-stationary PGDS achieves improved predictive performance due to its capacity to learn non-stationary dependency structure captured by the time-evolving transition matrices.

Submitted 23 May, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.15141 [pdf, ps, other]

A note on the adjoint method for neural ordinary differential equation network

Authors: Pipi Hu

Abstract: Perturbation and operator adjoint method are used to give the right adjoint form rigourously. From the derivation, we can have following results: 1) The loss gradient is not an ODE, it is an integral and we shows the reason; 2) The traditional adjoint form is not equivalent with the back propagation results. 3) The adjoint operator analysis shows that if and only if the discrete adjoint has the same scheme with the discrete neural ODE, the adjoint form would give the same results as BP does.

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2401.11818 [pdf, other]

MInD: Improving Multimodal Sentiment Analysis via Multimodal Information Disentanglement

Authors: Weichen Dai, Xingyu Li, Pengbo Hu, Zeyu Wang, Ji Qi, Jianlin Peng, Yi Zhou

Abstract: Learning effective joint representations has been a central task in multimodal sentiment analysis. Previous methods focus on leveraging the correlations between different modalities and enhancing performance through sophisticated fusion techniques. However, challenges still exist due to the inherent heterogeneity of distinct modalities, which may lead to distributional gap, impeding the full exploitation of inter-modal information and resulting in redundancy and impurity in the information extracted from features. To address this problem, we introduce the Multimodal Information Disentanglement (MInD) approach. MInD decomposes the multimodal inputs into a modality-invariant component, a modality-specific component, and a remnant noise component for each modality through a shared encoder and multiple private encoders. The shared encoder aims to explore the shared information and commonality across modalities, while the private encoders are deployed to capture the distinctive information and characteristic features. These representations thus furnish a comprehensive perspective of the multimodal data, facilitating the fusion process instrumental for subsequent prediction tasks. Furthermore, MInD improves the learned representations by explicitly modeling the task-irrelevant noise in an adversarial manner. Experimental evaluations conducted on benchmark datasets, including CMU-MOSI, CMU-MOSEI, and UR-Funny, demonstrate MInD's superior performance over existing state-of-the-art methods in both multimodal emotion recognition and multimodal humor detection tasks.

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.11399 [pdf, other]

Prospects for Joint Detection of Gravitational Waves with Counterpart Gamma-Ray Bursts Detected by the HADAR Experiment

Authors: Pei-Jin Hu, Qi-Ling Chen, Tian-Lu Chen, Ming-Ming Kang, Yi-Qing Guo, Dan-Zeng Luo-Bu, You-Liang Feng, Qi Gao, Quan-Bu Gou, Hong-Bo Hu, Hai-Jin Li, Cheng Liu, Mao-Yuan Liu, Wei Liu, Xiang-Li Qian, Bing-Qiang Qiao, Jing-Jing Su, Hui-Ying Sun, Xu Wang, Zhen Wang, Guang-Guang Xin, Chao-Wen Yang, Yu-Hua Yao, Qiang Yuan, Yi Zhang

Abstract: The detection of GW170817/GRB170817A implied the strong association between short gamma-ray bursts (SGRBs) and binary neutron star (BNS) mergers which produce gravitational waves (GWs). More evidence is needed to confirm the association and reveal the physical processes of BNS mergers. The upcoming High Altitude Detection of Astronomical Radiation (HADAR) experiment, excelling in a wide field of view (FOV) and a large effective area above tens of GeV, is a hope for the prompt detection of very-high-energy (VHE; > 10 GeV) SGRBs. The aim of this paper is to simulate and analyse GW/SGRB joint detections by future GW detector networks in synergy with HADAR, including the second generation LIGO, Virgo and KAGRA and the third generation ET and CE. We provide a brief introduction of the HADAR experiment for SGRB simulations and its expected SGRB detections. For GW simulations, we adopt a phenomenological model to describe GWs produced by BNS mergers and introduce the signal-noise ratios (SNRs) as detector responses. Following a theoretical analysis we compute the redshift-dependent efficiency functions of GW detector networks. We then construct the simulation of GW detection by Monte Carlo sampling. We compare the simulated results of LIGO-Virgo O2 and O3 runs with their actual detections as a check. The combination of GW and SGRB models is then discussed for joint detection, including parameter correlations, triggered SNRs and efficiency skymaps. The estimated joint detection rates are 0.09-2.52 per year for LHVK network with HADAR under different possible configurations, and approximately 0.27-7.89 per year for ET+CE network with HADAR.

Submitted 20 January, 2024; originally announced January 2024.

arXiv:2401.10370 [pdf, other]

Deep Generative Modeling for Financial Time Series with Application in VaR: A Comparative Review

Authors: Lars Ericson, Xuejun Zhu, Xusi Han, Rao Fu, Shuang Li, Steve Guo, Ping Hu

Abstract: In the financial services industry, forecasting the risk factor distribution conditional on the history and the current market environment is the key to market risk modeling in general and value at risk (VaR) model in particular. As one of the most widely adopted VaR models in commercial banks, Historical simulation (HS) uses the empirical distribution of daily returns in a historical window as the forecast distribution of risk factor returns in the next day. The objectives for financial time series generation are to generate synthetic data paths with good variety, and similar distribution and dynamics to the original historical data. In this paper, we apply multiple existing deep generative methods (e.g., CGAN, CWGAN, Diffusion, and Signature WGAN) for conditional time series generation, and propose and test two new methods for conditional multi-step time series generation, namely Encoder-Decoder CGAN and Conditional TimeVAE. Furthermore, we introduce a comprehensive framework with a set of KPIs to measure the quality of the generated time series for financial modeling. The KPIs cover distribution distance, autocorrelation and backtesting. All models (HS, parametric and neural networks) are tested on both historical USD yield curve data and additional data simulated from GARCH and CIR processes. The study shows that top performing models are HS, GARCH and CWGAN models. Future research directions in this area are also discussed.

Submitted 18 January, 2024; originally announced January 2024.

arXiv:2401.07842 [pdf, ps, other]

Closing the Performance and Management Gaps with Satellite Internet: Challenges, Approaches, and Future Directions

Authors: Peng Hu

Abstract: Recent advancements in low-Earth orbit (LEO) satellites represented by large constellations and advanced payloads provide great promises for enabling beyond 5G and 6G telecommunications and high-quality and ubiquitous Internet connectivity to everyone anywhere on Earth. LEO satellite networks are envisioned to bridge the urban-rural connectivity gap for the digital divide. However, the digital divide can hardly be closed by only providing connectivity to rural and remote areas. Various unprecedented challenges brought by the emerging satellite Internet still need to be resolved, such as inconsistent end-to-end performance guarantees and a lack of efficient management and operations in these areas, which are referred to as "performance gap" and "management gap", respectively. This position paper will briefly discuss these gaps, approaches to addressing the gaps, and some research directions based on our recent works.

Submitted 15 January, 2024; originally announced January 2024.

Comments: Published at the IAB Workshop on Barriers to Internet Access of Services (BIAS) 2024. Available at: https://www.ietf.org/slides/slides-biasws-closing-the-performance-and-management-gaps-with-satellite-internet-challenges-approaches-and-future-directions-01.pdf

arXiv:2401.06786 [pdf, other]

CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation

Authors: Yifei Xu, Yuning Chen, Xumiao Zhang, Xianshang Lin, Pan Hu, Yunfei Ma, Songwu Lu, Wan Du, Zhuoqing Mao, Ennan Zhai, Dennis Cai

Abstract: Among the thriving ecosystem of cloud computing and the proliferation of Large Language Model (LLM)-based code generation tools, there is a lack of benchmarking for code generation in cloud-native applications. In response to this need, we present CloudEval-YAML, a practical benchmark for cloud configuration generation. CloudEval-YAML tackles the diversity challenge by focusing on YAML, the de facto standard of numerous cloud-native tools. We develop the CloudEval-YAML benchmark with practicality in mind: the dataset consists of hand-written problems with unit tests targeting practical scenarios. We further enhanced the dataset to meet practical needs by rephrasing questions in a concise, abbreviated, and bilingual manner. The dataset consists of 1011 problems that take more than 1200 human hours to complete. To improve practicality during evaluation, we build a scalable evaluation platform for CloudEval-YAML that achieves a 20 times speedup over a single machine. To the best of our knowledge, the CloudEval-YAML dataset is the first hand-written dataset targeting cloud-native applications. We present an in-depth evaluation of 12 LLMs, leading to a deeper understanding of the problems and LLMs, as well as effective methods to improve task performance and reduce cost.

Submitted 9 November, 2023; originally announced January 2024.

arXiv:2401.02869 [pdf, ps, other]

Practical Reasoning in DatalogMTL

Authors: Dingmin Wang, Przemysław A. Wałęga, Pan Hu, Bernardo Cuenca Grau

Abstract: DatalogMTL is an extension of Datalog with metric temporal operators that has found an increasing number of applications in recent years. Reasoning in DatalogMTL is, however, of high computational complexity, which makes reasoning in modern data-intensive applications challenging. In this paper we present a practical reasoning algorithm for the full DatalogMTL language, which we have implemented in a system called MeTeoR. Our approach effectively combines an optimised (but generally non-terminating) materialisation (a.k.a. forward chaining) procedure, which provides scalable behaviour, with an automata-based component that guarantees termination and completeness. To ensure favourable scalability of the materialisation component, we propose a novel seminaïve materialisation procedure for DatalogMTL enjoying the non-repetition property, which ensures that each specific rule application will be considered at most once throughout the entire execution of the algorithm. Moreover, our materialisation procedure is enhanced with additional optimisations which further reduce the number of redundant computations performed during materialisation by disregarding rules as soon as it is certain that they cannot derive new facts in subsequent materialisation steps. Our extensive evaluation supports the practicality of our approach.

Submitted 5 January, 2024; originally announced January 2024.

Comments: Under consideration in Theory and Practice of Logic Programming (TPLP). arXiv admin note: text overlap with arXiv:2208.07100

arXiv:2401.01077 [pdf, other]

Constrained Online Two-stage Stochastic Optimization: Algorithm with (and without) Predictions

Authors: Piao Hu, Jiashuo Jiang, Guodong Lyu, Hao Su

Abstract: We consider an online two-stage stochastic optimization with long-term constraints over a finite horizon of $T$ periods. At each period, we take the first-stage action, observe a model parameter realization and then take the second-stage action from a feasible set that depends both on the first-stage decision and the model parameter. We aim to minimize the cumulative objective value while guaranteeing that the long-term average second-stage decision belongs to a set. We develop online algorithms for the online two-stage problem from adversarial learning algorithms. Also, the regret bound of our algorithm can be reduced to the regret bound of embedded adversarial learning algorithms. Based on this framework, we obtain new results under various settings. When the model parameters are drawn from unknown non-stationary distributions and we are given machine-learned predictions of the distributions, we develop a new algorithm from our framework with a regret $O(W_T+\sqrt{T})$, where $W_T$ measures the total inaccuracy of the machine-learned predictions. We then develop another algorithm that works when no machine-learned predictions are given and show the performances.

Submitted 2 January, 2024; originally announced January 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2302.00997

arXiv:2401.00435 [pdf, other]

Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression Recognition

Authors: Hanbo Cheng, Chenyu Liu, Pengfei Hu, Zhenrong Zhang, Jiefeng Ma, Jun Du

Abstract: The Handwritten Mathematical Expression Recognition (HMER) task is a critical branch in the field of OCR. Recent studies have demonstrated that incorporating bidirectional context information significantly improves the performance of HMER models. However, existing methods fail to effectively utilize bidirectional context information during the inference stage. Furthermore, current bidirectional training methods are primarily designed for string decoders and cannot adequately generalize to tree decoders, which offer superior generalization capabilities and structural analysis capacity. In order to overcome these limitations, we propose the Mirror-Flipped Symbol Layout Tree (MF-SLT) and Bidirectional Asynchronous Training (BAT) structure. Our method extends the bidirectional training strategy to the tree decoder, allowing for more effective training by leveraging bidirectional information. Additionally, we analyze the impact of the visual and linguistic perception of the HMER model separately and introduce the Shared Language Modeling (SLM) mechanism. Through the SLM, we enhance the model's robustness and generalization when dealing with visual ambiguity, particularly in scenarios with abundant training data. Our approach has been validated through extensive experiments, demonstrating its ability to achieve new state-of-the-art results on the CROHME 2014, 2016, and 2019 datasets, as well as the HME100K dataset. The code used in our experiments will be publicly available.

Submitted 31 December, 2023; originally announced January 2024.

arXiv:2312.17460 [pdf, other]

Momentum and angular correlations in $Z/γ$-hadron production in relativistic heavy-ion collisions

Authors: Zhan Gao, Lin Chen, Peng-Hui Hu, Man Xie, Han-Zhong Zhang

Abstract: We carry out a detailed study of medium modifications on momentum and angular correlations between a large transverse momentum hadron and a $Z/γ$ trigger in relativistic heavy-ion collisions within a perturbative QCD parton model improved by the Sudakov resummation technique. The total energy loss of a hard parton propagating inside the medium is employed to modify the fragmentation function, while the medium-induced transverse momentum broadening is included in the resummation approach, and both of them are related to the jet transport parameter and obtained by the high-twist formalism. We obtain good agreements with the existing data on transverse momentum and azimuthal angular correlations for the $Z/γ$-hadron pairs in $pp$ and $AA$ collisions, and predict the correlations for the $γ$-hadron in central $PbPb$ collisions at 5.02 TeV. The numerical analyses for the $Z/γ$-hadron in central $PbPb$ collisions show that the normalized angular distribution is decorrelated due to the medium-induced transverse momentum broadening, however, the angular correlation is enhanced due to the parton energy loss, namely anti-broadening. The observed modification of the angular correlation is a result of the competition between the broadening and the anti-broadening. This work provides a reliable theoretical tool for a comprehensive and precise study of jet quenching in relativistic heavy-ion collisions.

Submitted 28 December, 2023; originally announced December 2023.

Comments: 19 pages, 20 figures

arXiv:2312.11610 [pdf, ps, other]

Improved Reall-Santos method for AdS black holes in general 4-derivative gravities

Authors: Peng-Ju Hu, Liang Ma, H. Lu, Yi Pang

Abstract: For asymptotically flat black holes, Reall-Santos method is a convenient tool to compute leading higher derivative corrections to the thermodynamic quantities without actually solving the modified field equations. However, there are subtleties in its generalization to asymptotically AdS black holes with general higher derivative corrections. First of all, it is necessary to know all the higher derivative holographic counterterms and the surface terms implementing the variational principle and subtracting the divergence. One then needs to solve for the modified AdS radius and rescale the time coordinate in an appropriate way such that the induced metric on the conformal boundary of AdS black hole is not modified. We observe that Reall-Santos method can be directly applied to a particular 4-derivative gravity model, known as the Einstein-Weyl gravity, which does not modify the AdS radius and requires only the Gibbons-Hawking-York term and holographic counterterms for the 2-derivative theory. We thus suggest that to compute the thermodynamic quantities of AdS black holes in general 4-derivative theories of gravity, one simply needs to transform it to a Einstein-Weyl gravity with identical thermodynamic variables by appropriate field redefinitions. We explicitly verify this proposal with spherically-symmetric and static charged black holes in Einstein-Maxwell theory extended with generic 4-derivative interactions.

Submitted 17 April, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: LaTeX; 33 pages; accepted by SCIENCE CHINA Physics, Mechanics & Astronomy

Showing 1–50 of 391 results for author: Hu, P