Neural Poisson Solver: A Universal and Continuous Framework for Natural Signal Blending

Authors: Delong Wu, Hao Zhu, Qi Zhang, You Li, Zhan Ma, Xun Cao

Abstract: Implicit Neural Representation (INR) has become a popular method for representing visual signals (e.g., 2D images and 3D scenes), demonstrating promising results in various downstream applications. Given its potential as a medium for visual signals, exploring the development of a neural blending method that utilizes INRs is a natural progression. Neural blending involves merging two INRs to create a new INR that encapsulates information from both original representations. A direct approach involves applying traditional image editing methods to the INR rendering process. However, this method often results in blending distortions, artifacts, and color shifts, primarily due to the discretization of the underlying pixel grid and the introduction of boundary conditions for solving variational problems. To tackle this issue, we introduce the Neural Poisson Solver, a plug-and-play and universally applicable framework across different signal dimensions for blending visual signals represented by INRs. Our Neural Poisson Solver offers a variational problem-solving approach based on the continuous Poisson equation, demonstrating exceptional performance across various domains. Specifically, we propose a gradient-guided neural solver to represent the solution process of the variational problem, refining the target signal to achieve natural blending results. We also develop a Poisson equation-based loss and optimization scheme to train our solver, ensuring it effectively blends the input INR scenes while preserving their inherent structure and semantic content. The lack of dependence on additional prior knowledge makes our method easily adaptable to various task categories, highlighting its versatility. Comprehensive experimental results validate the robustness of our approach across multiple dimensions and blending tasks.

Submitted 11 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

Comments: accepted by ECCV 2024

arXiv:2407.08127 [pdf, other]

Prediction Exposes Your Face: Black-box Model Inversion via Prediction Alignment

Authors: Yufan Liu, Wanqian Zhang, Dayan Wu, Zheng Lin, Jingzi Gu, Weiping Wang

Abstract: Model inversion (MI) attack reconstructs the private training data of a target model given its output, posing a significant threat to deep learning models and data privacy. On one hand, most existing MI methods focus on searching for latent codes to represent the target identity, yet this iterative optimization-based scheme consumes a huge number of queries to the target model, making it unrealistic, especially in the black-box scenario. On the other hand, some training-based methods launch an attack through a single forward inference, but fail to directly learn high-level mappings from prediction vectors to images. Addressing these limitations, we propose a novel Prediction-to-Image (P2I) method for black-box MI attack. Specifically, we introduce the Prediction Alignment Encoder to map the target model's output prediction into the latent code of StyleGAN. In this way, the prediction vector space can be well aligned with the more disentangled latent space, thus establishing a connection between prediction vectors and the semantic facial features. During the attack phase, we further design the Aligned Ensemble Attack scheme to integrate complementary facial attributes of the target identity for better reconstruction. Experimental results show that our method outperforms other SOTAs, e.g., compared with RLB-MI, our method improves attack accuracy by 8.5% and reduces query numbers by 99% on the CelebA dataset.

Submitted 10 July, 2024; originally announced July 2024.

Comments: Accepted by ECCV 2024

arXiv:2407.07416 [pdf, other]

Shadow of slowly rotating Kalb-Ramond black holes

Authors: Wentao Liu, Di Wu, Jieci Wang

Abstract: Real astronomical objects possess spin, yet deriving exact solutions for rotating black holes within gravitational theories is a formidable challenge. To understand the shadow of rotating black holes in Lorentz-violating spacetimes induced by antisymmetric tensor fields, known as Kalb-Ramond (KR) fields, we have focused on the slow-rotation approximation framework. Using this approach, we have obtained first-order rotation series solutions, which describe slowly rotating KR black holes. For these solutions, we have plotted the black hole shadow contours under various parameters using the numerical backward ray-tracing method. As the Lorentz-violating parameter increases, not only does the apparent size of the black hole shadow decrease, but the effects of rotation, such as the D-shaped structure and frame-dragging, are also amplified. Furthermore, the KR field also enhances gravitational lensing, causing the shadow to occupy a larger area within the photon ring. This distinctive feature can differentiate KR gravity from general relativity. Additionally, using the latest observational data from EHT on M87* and Sgr A*, we have provided constraints on the Lorentz-violating parameter of rotating KR black holes. We found that, compared to static black holes, rotating black holes allow for the presence of stronger Lorentz violation effects.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 14 pages, 30 figures

arXiv:2407.07026 [pdf, other]

Resolving Sentiment Discrepancy for Multimodal Sentiment Detection via Semantics Completion and Decomposition

Authors: Daiqing Wu, Dongbao Yang, Huawen Shen, Can Ma, Yu Zhou

Abstract: With the proliferation of social media posts in recent years, the need to detect sentiments in multimodal (image-text) content has grown rapidly. Since posts are user-generated, the image and text from the same post can express different or even contradictory sentiments, leading to potential sentiment discrepancy. However, existing works mainly adopt a single-branch fusion structure that primarily captures the consistent sentiment between image and text. Ignoring discrepant sentiment, or modeling it only implicitly, results in compromised unimodal encoding and limited performance. In this paper, we propose a semantics Completion and Decomposition (CoDe) network to resolve the above issue. In the semantics completion module, we complement image and text representations with the semantics of the OCR text embedded in the image, helping bridge the sentiment gap. In the semantics decomposition module, we decompose image and text representations with exclusive projection and contrastive learning, thereby explicitly capturing the discrepant sentiment between modalities. Finally, we fuse image and text representations by cross-attention and combine them with the learned discrepant sentiment for final classification. Extensive experiments conducted on four multimodal sentiment datasets demonstrate the superiority of CoDe against SOTA methods.

Submitted 9 July, 2024; originally announced July 2024.

Comments: 8 pages, 6 figures

arXiv:2407.05917 [pdf, other]

Unveiling nonmagnetic phase and many-body entanglement in two-dimensional random quantum magnets Sr$_2$CuTe$_{1-x}$W$_x$O$_6$

Authors: Dian Wu, Fan Yang, Giuseppe Carleo

Abstract: We apply a random-plaquette $J_1$-$J_2$ model on the square lattice to capture the physics of a series of spin-$1/2$ Heisenberg antiferromagnet compounds Sr$_2$CuTe$_{1-x}$W$_x$O$_6$. With the input of experimentally relevant coupling strengths, our exact diagonalization (ED) study probes the ground state properties beyond the previous linear spin-wave approach. An intermediate range of $x \in [0.08, 0.55]$ is identified for a nonmagnetic phase without the long-range Néel or stripe order. The absence of both valence-bond-glass order and spin-glass non-ergodic dynamics renders its nature intriguing. Deep inside this phase around $x = 0.3$, we observe signatures potentially linked to randomness-induced short-range spin-liquid-like (SLL) states, including a close-to-zero spin-freezing parameter, vanishing spin-spin correlation beyond nearest neighbors, an almost uniform static spin structure factor, as well as a broad tail in the dynamical spin structure factor. The nonmagnetic phase also features multipartite entanglement in the ground state witnessed by quantum Fisher information (QFI), which exhibits universal scaling behaviors at quantum critical points.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 14 pages, 10 figures

arXiv:2407.04162 [pdf, other]

Measurement Embedded Schrödinger Bridge for Inverse Problems

Authors: Yuang Wang, Pengfei Jin, Siyeop Yoon, Matthew Tivnan, Quanzheng Li, Li Zhang, Dufan Wu

Abstract: Score-based diffusion models are frequently employed as structural priors in inverse problems. However, their iterative denoising process, initiated from Gaussian noise, often results in slow inference speeds. The Image-to-Image Schrödinger Bridge (I$^2$SB), which begins with the corrupted image, presents a promising alternative as a prior for addressing inverse problems. In this work, we introduce the Measurement Embedded Schrödinger Bridge (MESB). MESB establishes Schrödinger Bridges between the distribution of corrupted images and the distribution of clean images given observed measurements. Based on optimal transport theory, we derive the forward and backward processes of MESB. Through validation on diverse inverse problems, our proposed approach exhibits superior performance compared to existing Schrödinger Bridge-based inverse problem solvers in both visual quality and quantitative metrics.

Submitted 22 May, 2024; originally announced July 2024.

Comments: 14 pages, 2 figures, NeurIPS preprint

arXiv:2407.03871 [pdf, other]

Interior transit orbits in the planar bicircular restricted four-body problem: classification and application

Authors: Shuyue Fu, Di Wu, Shengping Gong

Abstract: Low-energy transfers are advantageous for lunar exploration missions due to low fuel consumption and extended launch periods. This paper is devoted to the classification of interior transit orbits and their application to low-energy transfer in the Sun-Earth/Moon planar bicircular restricted four-body problem (PBCR4BP). First, the Lagrangian coherent structures (LCSs) are introduced to generate the interior transit orbits. The number of periapses about the Moon is selected as the classification parameter and mapped into the LCSs, achieving clear classification boundaries. Then, the evolution laws of the classifications with respect to energy and solar gravity perturbation are discussed and summarized. Construction strategies for low-energy transfer are proposed based on the classifications and their evolution laws. Numerical simulation of the transfer trajectories verifies the effectiveness of the proposed strategies. The dynamical behaviors and transfer characteristics of transit orbits and their families are revealed, and a direct link between transit orbit families and low-energy transfers is finally established.

Submitted 4 July, 2024; originally announced July 2024.

Comments: 40 pages, 30 figures

arXiv:2407.02208 [pdf, other]

How to Learn in a Noisy World? Self-Correcting the Real-World Data Noise on Machine Translation

Authors: Yan Meng, Di Wu, Christof Monz

Abstract: The massive amounts of web-mined parallel data contain large amounts of noise. Semantic misalignment, as the primary source of the noise, poses a challenge for training machine translation systems. In this paper, we first study the impact of real-world hard-to-detect misalignment noise by proposing a process to simulate the realistic misalignment controlled by semantic similarity. After quantitatively analyzing the impact of simulated misalignment on machine translation, we show the limited effectiveness of widely used pre-filters to improve the translation performance, underscoring the necessity of more fine-grained ways to handle data noise. By observing the increasing reliability of the model's self-knowledge for distinguishing misaligned and clean data at the token-level, we propose a self-correction approach which leverages the model's prediction distribution to revise the training supervision from the ground-truth data over training time. Through comprehensive experiments, we show that our self-correction method not only improves translation performance in the presence of simulated misalignment noise but also proves effective for real-world noisy web-mined datasets across eight translation tasks.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.01511 [pdf, other]

CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents

Authors: Tianqi Xu, Linyao Chen, Dai-Jie Wu, Yanjun Chen, Zecheng Zhang, Xiang Yao, Zhiqiang Xie, Yongchao Chen, Shilong Liu, Bochen Qian, Philip Torr, Bernard Ghanem, Guohao Li

Abstract: The development of autonomous agents increasingly relies on Multimodal Language Models (MLMs) to perform tasks described in natural language with GUI environments, such as websites, desktop computers, or mobile phones. Existing benchmarks for MLM agents in interactive environments are limited by their focus on a single environment, lack of detailed and generalized evaluation methods, and the complexities of constructing tasks and evaluators. To overcome these limitations, we introduce Crab, the first agent benchmark framework designed to support cross-environment tasks, incorporating a graph-based fine-grained evaluation method and an efficient mechanism for task and evaluator construction. Our framework supports multiple devices and can be easily extended to any environment with a Python interface. Leveraging Crab, we developed a cross-platform Crab Benchmark-v0 comprising 100 tasks in computer desktop and mobile phone environments. We evaluated four advanced MLMs using different single and multi-agent system configurations on this benchmark. The experimental results demonstrate that the single agent with GPT-4o achieves the best completion ratio of 35.26%. All framework code, agent code, and task datasets are publicly available at https://github.com/camel-ai/crab.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.00610 [pdf, other]

Diff-BBO: Diffusion-Based Inverse Modeling for Black-Box Optimization

Authors: Dongxia Wu, Nikki Lijing Kuang, Ruijia Niu, Yi-An Ma, Rose Yu

Abstract: Black-box optimization (BBO) aims to optimize an objective function by iteratively querying a black-box oracle. This process demands sample-efficient optimization due to the high computational cost of function evaluations. While prior studies focus on forward approaches to learn surrogates for the unknown objective function, they struggle with high-dimensional inputs where valid inputs form a small subspace (e.g., valid protein sequences), which is common in real-world tasks. Recently, diffusion models have demonstrated impressive capability in learning the high-dimensional data manifold. They have shown promising performance in black-box optimization tasks but only in offline settings. In this work, we propose diffusion-based inverse modeling for black-box optimization (Diff-BBO), the first inverse approach leveraging diffusion models for the online BBO problem. Diff-BBO distinguishes itself from forward approaches through the design of its acquisition function. Instead of proposing candidates in the design space, Diff-BBO employs a novel acquisition function, Uncertainty-aware Exploration (UaE), to propose objective function values, which leverages the uncertainty of a conditional diffusion model to generate samples in the design space. Theoretically, we prove that using UaE leads to optimal optimization outcomes. Empirically, we redesign experiments on the Design-Bench benchmark for online settings and show that Diff-BBO achieves state-of-the-art performance.

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2407.00377 [pdf, other]

The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention

Authors: Yixin Wan, Di Wu, Haoran Wang, Kai-Wei Chang

Abstract: Prompt-based "diversity interventions" are commonly adopted to improve the diversity of Text-to-Image (T2I) models depicting individuals with various racial or gender traits. However, will this strategy result in nonfactual demographic distribution, especially when generating real historical figures? In this work, we propose DemOgraphic FActualIty Representation (DoFaiR), a benchmark to systematically quantify the trade-off between using diversity interventions and preserving demographic factuality in T2I models. DoFaiR consists of 756 meticulously fact-checked test instances to reveal the factuality tax of various diversity prompts through an automated evidence-supported evaluation pipeline. Experiments on DoFaiR unveil that diversity-oriented instructions increase the number of different gender and racial groups in DALLE-3's generations at the cost of historically inaccurate demographic distributions. To resolve this issue, we propose Fact-Augmented Intervention (FAI), which instructs a Large Language Model (LLM) to reflect on verbalized or retrieved factual information about gender and racial compositions of generation subjects in history, and incorporate it into the generation context of T2I models. By orienting model generations using the reflected historical truths, FAI significantly improves the demographic factuality under diversity interventions while preserving diversity.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2407.00191 [pdf, other]

MetaKP: On-Demand Keyphrase Generation

Authors: Di Wu, Xiaoxian Shen, Kai-Wei Chang

Abstract: Traditional keyphrase prediction methods predict a single set of keyphrases per document, failing to cater to the diverse needs of users and downstream applications. To bridge the gap, we introduce on-demand keyphrase generation, a novel paradigm that requires keyphrases that conform to specific high-level goals or intents. For this task, we present MetaKP, a large-scale benchmark comprising four datasets, 7500 documents, and 3760 goals across news and biomedical domains with human-annotated keyphrases. Leveraging MetaKP, we design both supervised and unsupervised methods, including a multi-task fine-tuning approach and a self-consistency prompting method with large language models. The results highlight the challenges of supervised fine-tuning, whose performance is not robust to distribution shifts. By contrast, the proposed self-consistency prompting approach greatly improves the performance of large language models, enabling GPT-4o to achieve 0.548 SemF1, surpassing the performance of a fully fine-tuned BART-base model. Finally, we demonstrate the potential of our method to serve as a general NLP infrastructure, exemplified by its application in epidemic event detection from social media.

Submitted 28 June, 2024; originally announced July 2024.

arXiv:2407.00167 [pdf, other]

Can GPT-4 Help Detect Quit Vaping Intentions? An Exploration of Automatic Data Annotation Approach

Authors: Sai Krishna Revanth Vuruma, Dezhi Wu, Saborny Sen Gupta, Lucas Aust, Valerie Lookingbill, Wyatt Bellamy, Yang Ren, Erin Kasson, Li-Shiun Chen, Patricia Cavazos-Rehg, Dian Hu, Ming Huang

Abstract: In recent years, the United States has witnessed a significant surge in the popularity of vaping or e-cigarette use, leading to a notable rise in cases of e-cigarette and vaping use-associated lung injury (EVALI) that caused hospitalizations and fatalities during the EVALI outbreak in 2019, highlighting the urgency to comprehend vaping behaviors and develop effective strategies for cessation. Due to the ubiquity of social media platforms, over 4.7 billion users worldwide use them for connectivity, communications, news, and entertainment with a significant portion of the discourse related to health, thereby establishing social media data as an invaluable organic data resource for public health research. In this study, we extracted a sample dataset from one vaping sub-community on Reddit to analyze users' quit-vaping intentions. Leveraging OpenAI's latest large language model GPT-4 for sentence-level quit vaping intention detection, this study compares the outcomes of this model against layman and clinical expert annotations. Using different prompting strategies such as zero-shot, one-shot, few-shot and chain-of-thought prompting, we developed 8 prompts with varying levels of detail to explain the task to GPT-4 and also evaluated the performance of the strategies against each other. These preliminary findings emphasize the potential of GPT-4 in social media data analysis, especially in identifying users' subtle intentions that may elude human detection.

Submitted 28 June, 2024; originally announced July 2024.

Comments: Accepted for the AI Applications in Public Health and Social Services workshop at the 22nd International Conference on Artificial Intelligence in Medicine (AIME 2024)

arXiv:2406.18137 [pdf, ps, other]

Sparse deep neural networks for nonparametric estimation in high-dimensional sparse regression

Authors: Dongya Wu, Xin Li

Abstract: Generalization theory has been established for sparse deep neural networks under the high-dimensional regime. Beyond generalization, parameter estimation is also important since it is crucial for variable selection and interpretability of deep neural networks. Current theoretical studies concerning parameter estimation mainly focus on two-layer neural networks, which is due to the fact that the convergence of parameter estimation heavily relies on the regularity of the Hessian matrix, while the Hessian matrix of deep neural networks is highly singular. To avoid the unidentifiability of deep neural networks in parameter estimation, we propose to conduct nonparametric estimation of partial derivatives with respect to inputs. We first show that model convergence of sparse deep neural networks is guaranteed in that the sample complexity only grows with the logarithm of the number of parameters or the input dimension when the $\ell_{1}$-norm of parameters is well constrained. Then by bounding the norm and the divergence of partial derivatives, we establish that the convergence rate of nonparametric estimation of partial derivatives scales as $\mathcal{O}(n^{-1/4})$, a rate which is slower than the model convergence rate $\mathcal{O}(n^{-1/2})$. To the best of our knowledge, this study combines nonparametric estimation and parametric sparse deep neural networks for the first time. As nonparametric estimation of partial derivatives is of great significance for nonlinear variable selection, the current results show a promising future for the interpretability of deep neural networks.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.17456 [pdf, other]

Improving Grammatical Error Correction via Contextual Data Augmentation

Authors: Yixuan Wang, Baoxin Wang, Yijun Liu, Qingfu Zhu, Dayong Wu, Wanxiang Che

Abstract: Nowadays, data augmentation through synthetic data has been widely used in the field of Grammatical Error Correction (GEC) to alleviate the problem of data scarcity. However, these synthetic data are mainly used in the pre-training phase rather than the data-limited fine-tuning phase due to inconsistent error distribution and noisy labels. In this paper, we propose a synthetic data construction method based on contextual augmentation, which can ensure an efficient augmentation of the original data with a more consistent error distribution. Specifically, we combine rule-based substitution with model-based generation, using the generative model to generate a richer context for the extracted error patterns. Besides, we also propose a relabeling-based data cleaning method to mitigate the effects of noisy labels in synthetic data. Experiments on CoNLL14 and BEA19-Test show that our proposed augmentation method consistently and substantially outperforms strong baselines and achieves the state-of-the-art level with only a few synthetic data.

Submitted 25 June, 2024; originally announced June 2024.

Comments: Accepted as Findings of ACL 2024

arXiv:2406.16425 [pdf, other]

Spin order and dynamics in the topological rare-earth germanide semimetals

Authors: Yuhao Wang, Zhixuan Zhen, Jing Meng, Igor Plokhikh, Delong Wu, Dariusz J. Gawryluk, Yang Xu, Qingfeng Zhan, Ming Shi, Ekaterina Pomjakushina, Toni Shiroka, Tian Shang

Abstract: The $RE$Al(Si,Ge) ($RE$ = rare earth) family, known to break both the inversion- and time-reversal symmetries, represents one of the most suitable platforms for investigating the interplay between correlated-electron phenomena and topologically nontrivial bands. Here, we report on systematic magnetic, transport, and muon-spin rotation and relaxation ($μ$SR) measurements on (Nd,Sm)AlGe single crystals, which exhibit antiferromagnetic (AFM) transitions at $T_\mathrm{N} = 6.1$ and 5.9 K, respectively. In addition, NdAlGe also undergoes an incommensurate-to-commensurate ferrimagnetic transition at 4.5 K. Weak transverse-field $μ$SR measurements confirm the AFM transitions, featuring a $\sim$90 % magnetic volume fraction. In both cases, zero-field (ZF) $μ$SR measurements reveal a more disordered internal field distribution in NdAlGe than in SmAlGe, reflected in a larger transverse muon-spin relaxation rate $λ^\mathrm{T}$ at $T \ll T_\mathrm{N}$. This may be due to the complex magnetic structure of NdAlGe, which undergoes a series of metamagnetic transitions in an external magnetic field, while SmAlGe shows only a robust AFM order. In NdAlGe, the topological Hall effect (THE) appears between the first and the second metamagnetic transitions for $H \parallel c$, while it is absent in SmAlGe. Such THE in NdAlGe is most likely attributed to the field-induced topological spin textures. The longitudinal muon-spin relaxation rate $λ^\mathrm{L}(T)$ diverges near the AFM order, followed by a clear drop at $T < T_\mathrm{N}$. In the magnetically ordered state, spin fluctuations are significantly stronger in NdAlGe than in SmAlGe. In general, our longitudinal-field $μ$SR data indicate vigorous spin fluctuations in NdAlGe, thus providing valuable insights into the origin of THE and of the possible topological spin textures in $RE$Al(Si,Ge) Weyl semimetals.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 13 pages, 14 figures

arXiv:2406.13692 [pdf, other]

Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation

Authors: Di Wu, Jia-Chen Gu, Fan Yin, Nanyun Peng, Kai-Wei Chang

Abstract: Retrieval-augmented language models (RALMs) have shown strong performance and wide applicability in knowledge-intensive tasks. However, there are significant trustworthiness concerns as RALMs are prone to generating unfaithful outputs, including baseless information or contradictions with the retrieved context. This paper proposes SynCheck, a lightweight monitor that leverages fine-grained decoding dynamics including sequence likelihood, uncertainty quantification, context influence, and semantic alignment to synchronously detect unfaithful sentences. By integrating efficiently measurable and complementary signals, SynCheck enables accurate and immediate feedback and intervention, achieving 0.85 AUROC in detecting faithfulness errors across six long-form retrieval-augmented generation tasks, improving prior best method by 4%. Leveraging SynCheck, we further introduce FOD, a faithfulness-oriented decoding algorithm guided by beam search for long-form retrieval-augmented generation. Empirical results demonstrate that FOD outperforms traditional strategies such as abstention, reranking, or contrastive decoding significantly in terms of faithfulness, achieving over 10% improvement across six datasets.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13461 [pdf, other]

Static neutral black holes in Kalb-Ramond gravity

Authors: Wentao Liu, Di Wu, Jieci Wang

Abstract: The Kalb-Ramond (KR) gravity theory, a modified gravity theory that nonminimally couples a KR field with a nonzero vacuum expectation value for the gravitational field, can spontaneously break the Lorentz symmetry of gravity. In a recent work, Yang et al. [Phys. Rev. D 108, 124004 (2023)] successfully derived Schwarzschild-like black hole solutions both with and without a nonzero cosmological constant within the framework of KR gravity. However, their analysis did not address the more general case of static, neutral, spherically symmetric black holes. In this paper, we fill this gap by resolving the field equations to construct more general static, neutral, spherically symmetric black hole solutions both with and without a nonzero cosmological constant. Our black hole solutions are shown to obey the first law and the Bekenstein-Smarr mass formulas of black hole thermodynamics. Moreover, we demonstrate that our static neutral spherically symmetric AdS black hole does not always satisfy the reverse isoperimetric inequality (RII), as the isoperimetric ratio can be larger or smaller than unity depending on the placement of the solution parameters within the parameter space. This behavior contrasts with the above-mentioned Schwarzschild-like AdS black hole in the KR gravity theory, which always obeys the RII. Significantly, the present more general static, neutral, spherically symmetric AdS black hole is the first example of a static AdS black hole that can violate the RII.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 22 pages, 2 figures, 1 table, JHEP3.cls

arXiv:2406.12783 [pdf, ps, other]

Zeroing neural dynamics solving time-variant complex conjugate matrix equation

Authors: Jiakuang He, Dongqing Wu

Abstract: Complex conjugate matrix equations (CCME) have aroused the interest of many researchers because of computations and antilinear systems. Existing research is dominated by its time-invariant solving methods, but lacks proposed theories for solving its time-variant version. Moreover, artificial neural networks are rarely studied for solving CCME. In this paper, starting with the earliest CCME, zeroing neural dynamics (ZND) is applied to solve its time-variant version. Firstly, the vectorization and Kronecker product in the complex field are defined uniformly. Secondly, Con-CZND1 model and Con-CZND2 model are proposed and theoretically prove convergence and effectiveness. Thirdly, three numerical experiments are designed to illustrate the effectiveness of the two models, compare their differences, highlight the significance of neural dynamics in the complex field, and refine the theory related to ZND.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.11828 [pdf, other]

Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations

Authors: Kazusato Oko, Yujin Song, Taiji Suzuki, Denny Wu

Abstract: We study the computational and sample complexity of learning a target function $f_*:\mathbb{R}^d\to\mathbb{R}$ with additive structure, that is, $f_*(x) = \frac{1}{\sqrt{M}}\sum_{m=1}^M f_m(\langle x, v_m\rangle)$, where $f_1,f_2,...,f_M:\mathbb{R}\to\mathbb{R}$ are nonlinear link functions of single-index models (ridge functions) with diverse and near-orthogonal index features $\{v_m\}_{m=1}^M$, and the number of additive tasks $M$ grows with the dimensionality $M\asymp d^γ$ for $γ\ge 0$. This problem setting is motivated by the classical additive model literature, the recent representation learning theory of two-layer neural network, and large-scale pretraining where the model simultaneously acquires a large number of "skills" that are often localized in distinct parts of the trained network. We prove that a large subset of polynomial $f_*$ can be efficiently learned by gradient descent training of a two-layer neural network, with a polynomial statistical and computational complexity that depends on the number of tasks $M$ and the information exponent of $f_m$, despite the unknown link function and $M$ growing with the dimensionality. We complement this learnability guarantee with computational hardness result by establishing statistical query (SQ) lower bounds for both the correlational SQ and full SQ algorithms.

Submitted 17 June, 2024; originally announced June 2024.

Comments: COLT 2024

arXiv:2406.11551 [pdf, other]

Simple Yet Efficient: Towards Self-Supervised FG-SBIR with Unified Sample Feature Alignment

Authors: Jianan Jiang, Di Wu, Zhilin Jiang, Weiren Yu

Abstract: Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) aims to minimize the distance between sketches and corresponding images in the embedding space. However, scalability is hindered by the growing complexity of solutions, mainly due to the abstract nature of fine-grained sketches. In this paper, we propose a simple yet efficient approach to narrow the gap between the two modes. It mainly facilitates unified mutual information sharing both intra- and inter-samples, rather than treating them as a single feature alignment problem between modalities. Specifically, our approach includes: (i) Employing dual weight-sharing networks to optimize alignment within sketch and image domain, which also effectively mitigates model learning saturation issues. (ii) Introducing an objective optimization function based on contrastive loss to enhance the model's ability to align features intra- and inter-samples. (iii) Presenting a learnable TRSM combined of self-attention and cross-attention to promote feature representations among tokens, further enhancing sample alignment in the embedding space. Our framework achieves excellent results on CNN- and ViT-based backbones. Extensive experiments demonstrate its superiority over existing methods. We also introduce Cloths-V1, the first professional fashion sketches and images dataset, utilized to validate our method and will be beneficial for other applications.

Submitted 22 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: 10 pages, 8 figures, 4 tables

arXiv:2406.11449 [pdf, ps, other]

Hermitian-Einstein equations on noncompact manifolds

Authors: Di Wu, Xi Zhang

Abstract: This paper first investigates solvability of Hermitian-Einstein equation on a Hermitian holomorphic vector bundle on the complement of an arbitrary closed subset in a compact complex manifold. The uniqueness of Hermitian-Einstein metrics on a Zariski open subset in a compact Kähler manifold was only figured out by Takuro Mochizuki recently, the second part of this paper gives an affirmative answer to a question proposed by Takuro Mochizuki and based on which it leads to an alternative approach to the unique issue. We also prove stability from solvability of Hermitian-Einstein equation, which together with the classical existence result of Carlos Simpson in particular establish a Kobayashi-Hitchin bijective correspondence. The argument is also effective for uniqueness result and Kobayashi-Hitchin bijective correspondence on $\mathbb{C}$, as well as on non-Kähler and semi-stable contexts.

Submitted 17 June, 2024; originally announced June 2024.

Comments: All comments are welcome

MSC Class: 53C07; 14J60

arXiv:2406.09829 [pdf, other]

Open-Vocabulary Semantic Segmentation with Image Embedding Balancing

Authors: Xiangheng Shan, Dongyue Wu, Guilin Zhu, Yuanjie Shao, Nong Sang, Changxin Gao

Abstract: Open-vocabulary semantic segmentation is a challenging task, which requires the model to output semantic masks of an image beyond a close-set vocabulary. Although many efforts have been made to utilize powerful CLIP models to accomplish this task, they are still easily overfitting to training classes due to the natural gaps in semantic information between training and new classes. To overcome this challenge, we propose a novel framework for open-vocabulary semantic segmentation called EBSeg, incorporating an Adaptively Balanced Decoder (AdaB Decoder) and a Semantic Structure Consistency loss (SSC Loss). The AdaB Decoder is designed to generate different image embeddings for both training and new classes. Subsequently, these two types of embeddings are adaptively balanced to fully exploit their ability to recognize training classes and generalization ability for new classes. To learn a consistent semantic structure from CLIP, the SSC Loss aligns the inter-classes affinity in the image feature space with that in the text feature space of CLIP, thereby improving the generalization ability of our model. Furthermore, we employ a frozen SAM image encoder to complement the spatial information that CLIP features lack due to the low training image resolution and image-level supervision inherent in CLIP. Extensive experiments conducted across various benchmarks demonstrate that the proposed EBSeg outperforms the state-of-the-art methods. Our code and trained models will be here: https://github.com/slonetime/EBSeg.

Submitted 14 June, 2024; originally announced June 2024.

Comments: CVPR2024

arXiv:2406.07880 [pdf, other]

A Comprehensive Survey on Machine Learning Driven Material Defect Detection: Challenges, Solutions, and Future Prospects

Authors: Jun Bai, Di Wu, Tristan Shelley, Peter Schubel, David Twine, John Russell, Xuesen Zeng, Ji Zhang

Abstract: Material defects (MD) represent a primary challenge affecting product performance and giving rise to safety issues in related products. The rapid and accurate identification and localization of MD constitute crucial research endeavours in addressing contemporary challenges associated with MD. Although conventional non-destructive testing methods such as ultrasonic and X-ray approaches have mitigated issues related to low efficiency in manual inspections, they struggle to meet the diverse requirements of high precision, real-time speed, automation, and intelligence. In recent years, propelled by the swift advancement of machine learning (ML) technologies, particularly exemplified by deep learning, ML has swiftly emerged as the core technology and a prominent research direction for material defect detection (MDD). Through a comprehensive review of the latest literature, we systematically survey the ML techniques applied in MDD into five categories: unsupervised learning, supervised learning, semi-supervised learning, reinforcement learning, and generative learning. We provide a detailed analysis of the main principles and techniques used, together with the advantages and potential challenges associated with these techniques. Furthermore, the survey focuses on the techniques for defect detection in composite materials, which are important types of materials enjoying increasingly wide application in various industries such as aerospace, automotive, construction, and renewable energy.
Finally, the survey explores potential future directions in MDD utilizing ML technologies. This comprehensive survey not only consolidates existing literature on ML-based MDD technologies but also serves as a foundational reference for future researchers and industrial practitioners, providing valuable insights and guidance in developing advanced and efficient MDD systems.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.05498 [pdf, other]

SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner

Authors: Xunguang Wang, Daoyuan Wu, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Shuai Wang, Yingjiu Li, Yang Liu, Ning Liu, Juergen Rahmel

Abstract: Jailbreaking is an emerging adversarial attack that bypasses the safety alignment deployed in off-the-shelf large language models (LLMs) and has evolved into four major categories: optimization-based attacks such as Greedy Coordinate Gradient (GCG), jailbreak template-based attacks such as "Do-Anything-Now", advanced indirect attacks like DrAttack, and multilingual jailbreaks. However, delivering a practical jailbreak defense is challenging because it needs to not only handle all the above jailbreak attacks but also incur negligible delay to user prompts, as well as be compatible with both open-source and closed-source LLMs. Inspired by how the traditional security concept of shadow stacks defends against memory overflow attacks, this paper introduces a generic LLM jailbreak defense framework called SelfDefend, which establishes a shadow LLM defense instance to concurrently protect the target LLM instance in the normal stack and collaborate with it for checkpoint-based access control. The effectiveness of SelfDefend builds upon our observation that existing LLMs (both target and defense LLMs) have the capability to identify harmful prompts or intentions in user queries, which we empirically validate using the commonly used GPT-3.5/4 models across all major jailbreak attacks. Our measurements show that SelfDefend enables GPT-3.5 to suppress the attack success rate (ASR) by 8.97-95.74% (average: 60%) and GPT-4 by even 36.36-100% (average: 83%), while incurring negligible effects on normal queries.
To further improve the defense's robustness and minimize costs, we employ a data distillation approach to tune dedicated open-source defense models. These models outperform four SOTA defenses and match the performance of GPT-4-based SelfDefend, with significantly lower extra delays. We also empirically show that the tuned models are robust to targeted GCG and prompt injection attacks.

Submitted 8 June, 2024; originally announced June 2024.

Comments: This paper completes its earlier vision paper, available at arXiv:2402.15727

arXiv:2406.05039 [pdf, other]

Bootstrapping Referring Multi-Object Tracking

Authors: Yani Zhang, Dongming Wu, Wencheng Han, Xingping Dong

Abstract: Referring multi-object tracking (RMOT) aims at detecting and tracking multiple objects following human instruction represented by a natural language expression. Existing RMOT benchmarks are usually formulated through manual annotations, integrated with static regulations. This approach results in a dearth of notable diversity and a constrained scope of implementation. In this work, our key idea is to bootstrap the task of referring multi-object tracking by introducing discriminative language words as much as possible. In specific, we first develop Refer-KITTI into a large-scale dataset, named Refer-KITTI-V2. It starts with 2,719 manual annotations, addressing the issue of class imbalance and introducing more keywords to make it closer to real-world scenarios compared to Refer-KITTI. They are further expanded to a total of 9,758 annotations by prompting large language models, which create 617 different words, surpassing previous RMOT benchmarks. In addition, the end-to-end framework in RMOT is also bootstrapped by a simple yet elegant temporal advancement strategy, which achieves better performance than previous approaches. The source code and dataset is available at https://github.com/zyn213/TempRMOT.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.02059 [pdf, other]

Graph Adversarial Diffusion Convolution

Authors: Songtao Liu, Jinghui Chen, Tianfan Fu, Lu Lin, Marinka Zitnik, Dinghao Wu

Abstract: This paper introduces a min-max optimization formulation for the Graph Signal Denoising (GSD) problem. In this formulation, we first maximize the second term of GSD by introducing perturbations to the graph structure based on Laplacian distance and then minimize the overall loss of the GSD. By solving the min-max optimization problem, we derive a new variant of the Graph Diffusion Convolution (GDC) architecture, called Graph Adversarial Diffusion Convolution (GADC). GADC differs from GDC by incorporating an additional term that enhances robustness against adversarial attacks on the graph structure and noise in node features. Moreover, GADC improves the performance of GDC on heterophilic graphs. Extensive experiments demonstrate the effectiveness of GADC across various datasets. Code is available at https://github.com/SongtaoLiu0823/GADC.

Submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted by ICML 2024

arXiv:2406.01581 [pdf, other]

Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit

Authors: Jason D. Lee, Kazusato Oko, Taiji Suzuki, Denny Wu

Abstract: We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \textstyleσ_*\left(\langle\boldsymbol{x},\boldsymbolθ\rangle\right)$ under isotropic Gaussian data in $\mathbb{R}^d$, where the link function $σ_*:\mathbb{R}\to\mathbb{R}$ is an unknown degree $q$ polynomial with information exponent $p$ (defined as the lowest degree in the Hermite expansion). Prior works showed that gradient-based training of neural networks can learn this target with $n\gtrsim d^{Θ(p)}$ samples, and such statistical complexity is predicted to be necessary by the correlational statistical query lower bound. Surprisingly, we prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ of arbitrary polynomial link function with a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d\mathrm{polylog} d$, where constant $C(q)$ only depends on the degree of $σ_*$, regardless of information exponent; this dimension dependence matches the information theoretic limit up to polylogarithmic factors. Core to our analysis is the reuse of minibatch in the gradient computation, which gives rise to higher-order information beyond correlational queries.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 34 pages

arXiv:2406.01007 [pdf, other]

Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay

Authors: Daya Bay collaboration, F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, J. Cheng, Y. -C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng , et al. (177 additional authors not shown)

Abstract: This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive region, the relative $\overlineν_{e}$ rates and energy spectra variation among the near and far detectors gives $\mathrm{sin}^22θ_{13} = 0.0759_{-0.0049}^{+0.0050}$ and $Δm^2_{32} = (2.72^{+0.14}_{-0.15})\times10^{-3}$ eV$^2$ assuming the normal neutrino mass ordering, and $Δm^2_{32} = (-2.83^{+0.15}_{-0.14})\times10^{-3}$ eV$^2$ for the inverted neutrino mass ordering. This estimate of $\sin^2 2θ_{13}$ is consistent with and essentially independent from the one obtained using the capture-on-gadolinium sample at Daya Bay. The combination of these two results yields $\mathrm{sin}^22θ_{13}= 0.0833\pm0.0022$, which represents an 8% relative improvement in precision regarding the Daya Bay full 3158-day capture-on-gadolinium result.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.00714 [pdf, other]

A Survey of Deep Learning Based Radar and Vision Fusion for 3D Object Detection in Autonomous Driving

Authors: Di Wu, Feng Yang, Benlian Xu, Pan Liao, Bo Liu

Abstract: With the rapid advancement of autonomous driving technology, there is a growing need for enhanced safety and efficiency in the automatic environmental perception of vehicles during their operation. In modern vehicle setups, cameras and mmWave radar (radar), being the most extensively employed sensors, demonstrate complementary characteristics, inherently rendering them conducive to fusion and facilitating the achievement of both robust performance and cost-effectiveness. This paper focuses on a comprehensive survey of radar-vision (RV) fusion based on deep learning methods for 3D object detection in autonomous driving. We offer a comprehensive overview of each RV fusion category, specifically those employing region of interest (ROI) fusion and end-to-end fusion strategies. As the most promising fusion strategy at present, we provide a deeper classification of end-to-end fusion methods, including those 3D bounding box prediction based and BEV based approaches. Moreover, aligning with recent advancements, we delineate the latest information on 4D radar and its cutting-edge applications in autonomous vehicles (AVs). Finally, we present the possible future trends of RV fusion and summarize this paper.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2406.00645 [pdf, other]

FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning

Authors: Yuwei Fu, Haichao Zhang, Di Wu, Wei Xu, Benoit Boulet

Abstract: In this work, we investigate how to leverage pre-trained visual-language models (VLM) for online Reinforcement Learning (RL). In particular, we focus on sparse reward tasks with pre-defined textual task descriptions. We first identify the problem of reward misalignment when applying VLM as a reward in RL tasks. To address this issue, we introduce a lightweight fine-tuning method, named Fuzzy VLM reward-aided RL (FuRL), based on reward alignment and relay RL. Specifically, we enhance the performance of SAC/DrQ baseline agents on sparse reward tasks by fine-tuning VLM representations and using relay RL to avoid local minima. Extensive experiments on the Meta-world benchmark tasks demonstrate the efficacy of the proposed method. Code is available at: https://github.com/fuyw/FuRL.

Submitted 4 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

Comments: ICML 2024

arXiv:2406.00579 [pdf, other]

Kerr-MOG-(A)dS black hole and its shadow in scalar-tensor-vector gravity theory

Authors: Wentao Liu, Di Wu, Xiongjun Fang, Jiliang Jing, Jieci Wang

Abstract: The scalar-tensor-vector gravity (STVG) theory has attracted significant interest due to its ability to effectively address the issue of galaxy rotation curves and clusters of galaxies without considering the influence of dark matter. In this paper, we construct rotating black hole solutions with a cosmological constant in the STVG theory (i.e., Kerr-MOG-(A)dS black hole solutions), where the import of a gravitational charge as a source modifies the gravitational constant, determined by $ G=G_{\text{N}}(1+α) $. For Kerr-MOG-dS spacetime, the observer is situated at a specific location within the domain of outer communication, rather than being located infinitely far away. Since black hole shadows are shaped by light propagation in spacetime, the interaction between the MOG parameter and the cosmological constant is expected to produce novel effects on these shadows. As the cosmological constant $Λ$ increases, the apparent size of the black hole shadow on the screen decreases. Additionally, the shadow expands with an increase in the MOG parameter $α$, reaching a maximum at a certain value, and its shape becomes more rounded under an arbitrary rotation parameter, which leads to degeneracy between different black hole parameters. However, by employing numerical backward ray-tracing techniques, we have found that gravitational lensing and the frame-dragging effect effectively distinguish this degeneracy. Our work contributes to a deeper understanding of black holes in modified gravity, their observational signatures, and constraints.

Submitted 6 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

Comments: 12 pages, 25 figures, comments are welcome

arXiv:2406.00262 [pdf, other]

Contrastive Learning Via Equivariant Representation

Authors: Sifan Song, Jinfeng Wang, Qiaochu Zhao, Xiang Li, Dufan Wu, Angelos Stefanidis, Jionglong Su, S. Kevin Zhou, Quanzheng Li

Abstract: Invariant-based Contrastive Learning (ICL) methods have achieved impressive performance across various domains. However, the absence of latent space representation for distortion (augmentation)-related information in the latent space makes ICL sub-optimal regarding training efficiency and robustness in downstream tasks. Recent studies suggest that introducing equivariance into Contrastive Learning (CL) can improve overall performance. In this paper, we rethink the roles of augmentation strategies and equivariance in improving CL efficacy. We propose a novel Equivariant-based Contrastive Learning (ECL) framework, CLeVER (Contrastive Learning Via Equivariant Representation), compatible with augmentation strategies of arbitrary complexity for various mainstream CL methods and model frameworks. Experimental results demonstrate that CLeVER effectively extracts and incorporates equivariant information from data, thereby improving the training efficiency and robustness of baseline models in downstream tasks.

Submitted 31 May, 2024; originally announced June 2024.

Comments: Preprint. Under review

arXiv:2405.20849 [pdf, ps, other]

Locally Stationary Distributions: A Framework for Analyzing Slow-Mixing Markov Chains

Authors: Kuikui Liu, Sidhanth Mohanty, Prasad Raghavendra, Amit Rajaraman, David X. Wu

Abstract: Many natural Markov chains fail to mix to their stationary distribution in polynomially many steps. Often, this slow mixing is inevitable since it is computationally intractable to sample from their stationary measure. Nevertheless, Markov chains can be shown to always converge quickly to measures that are *locally stationary*, i.e., measures that don't change over a small number of steps. These locally stationary measures are analogous to local minima in continuous optimization, while stationary measures correspond to global minima. While locally stationary measures can be statistically far from stationary measures, do they enjoy provable theoretical guarantees that have algorithmic implications? We study this question in this work and demonstrate three algorithmic applications of locally stationary measures: 1. We show that Glauber dynamics on the hardcore model can be used to find independent sets of size $Ω\left(\frac{\log d}{d} \cdot n\right)$ in triangle-free graphs of degree at most $d$. 2. Let $W$ be a symmetric real matrix with bounded spectral diameter and $v$ be a unit vector. Given the matrix $M = λvv^\top + W$ with a planted rank-one spike along vector $v$, for sufficiently large constant $λ$, Glauber dynamics on the Ising model defined by $M$ samples vectors $x \in \{\pm 1\}^n$ that have constant correlation with the vector $v$. 3. Let $M = A_{\mathbf{G}} - \frac{d}{n}\mathbf{1}\mathbf{1}^\top$ be a centered version of the adjacency matrix where the graph $\mathbf{G}$ is drawn from a sparse 2-community stochastic block model. We show that for sufficiently large constant $λ$, Glauber dynamics on the Ising model defined by $M$ samples vectors $x \in \{\pm 1\}^n$ that have constant correlation with the hidden community vector $\mathbfσ$.

Submitted 31 May, 2024; originally announced May 2024.

Comments: 34 pages

arXiv:2405.20614 [pdf, other]

EPIDetect: Video-based convulsive seizure detection in chronic epilepsy mouse model for anti-epilepsy drug screening

Authors: Junming Ren, Zhoujian Xiao, Yujia Zhang, Yujie Yang, Ling He, Ezra Yoon, Stephen Temitayo Bello, Xi Chen, Dapeng Wu, Micky Tortorella, Jufang He

Abstract: In preclinical translational studies, drug candidates with remarkable anti-epileptic efficacy demonstrate long-term suppression of spontaneous recurrent seizures (SRSs), particularly convulsive seizures (CSs), in mouse models of chronic epilepsy. However, the current methods for monitoring CSs have limitations in terms of invasiveness, specific laboratory settings, high cost, and complex operation, which hinder drug screening efforts. In this study, a camera-based system for automated detection of CSs in chronically epileptic mice is first established to screen potential anti-epilepsy drugs.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.20584 [pdf, other]

Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization

Authors: Yisu Liu, Jinyang An, Wanqian Zhang, Dayan Wu, Jingzi Gu, Zheng Lin, Weiping Wang

Abstract: With the development of diffusion-based customization methods like DreamBooth, individuals now have access to train the models that can generate their personalized images. Despite the convenience, malicious users have misused these techniques to create fake images, thereby triggering a privacy security crisis. In light of this, proactive adversarial attacks are proposed to protect users against customization. The adversarial examples are trained to distort the customization model's outputs and thus block the misuse. In this paper, we propose DisDiff (Disrupting Diffusion), a novel adversarial attack method to disrupt the diffusion model outputs. We first delve into the intrinsic image-text relationships, well known as cross-attention, and empirically find that the subject-identifier token plays an important role in guiding image generation. Thus, we propose the Cross-Attention Erasure module to explicitly "erase" the indicated attention maps and disrupt the text guidance. Besides, we analyze the influence of the sampling process of the diffusion model on Projected Gradient Descent (PGD) attack and introduce a novel Merit Sampling Scheduler to adaptively modulate the perturbation updating amplitude in a step-aware manner. Our DisDiff outperforms the state-of-the-art methods by 12.75% of FDFR scores and 7.25% of ISM scores across two facial benchmarks and two commonly used prompts on average.

Submitted 30 May, 2024; originally announced May 2024.

Comments: Under review

ACM Class: I.2.10

arXiv:2405.19630 [pdf]

The use of a humanoid robot for older people with dementia in aged care facilities

Authors: Dongjun Wu, Lihui Pu, Jun Jo, Rene Hexel, Wendy Moyle

Abstract: This paper presents an interdisciplinary PhD project using a humanoid robot to encourage interactive activities for people with dementia living in two aged care facilities. The aim of the project was to develop software and use technologies to achieve successful robot-led engagement with older people with dementia. This paper outlines the qualitative findings from the project's feasibility stage. The researcher's observations, the participants' attitudes and the feedback from carers are presented and discussed.

Submitted 29 May, 2024; originally announced May 2024.

Comments: Accepted for the Second Workshop on Care Robots for Older Adults (CROA), RO-MAN 2023, Busan, Korea

arXiv:2405.19161 [pdf]

Origin of the density wave instability in trilayer nickelate La4Ni3O10 revealed by optical and ultrafast spectroscopy

Authors: Shuxiang Xu, Cui-Qun Chen, Mengwu Huo, Deyuan Hu, Hao Wang, Qiong Wu, Rongsheng Li, Dong Wu, Meng Wang, Dao-Xin Yao, Tao Dong, Nanlin Wang

Abstract: Here we employed optical spectroscopy and ultrafast reflectivity measurements to investigate the density wave instability of trilayer nickelate La4Ni3O10 at ambient pressure. Our optical spectroscopy measurements indicate that La4Ni3O10 is metallic with a large plasma frequency at room temperature. As the temperature decreases, we observe the formation of an energy gap in reflectivity below TDW, signaling the charge/spin density wave transition. The Drude component was largely removed due to the gap opening in the Fermi surface. Our Drude-Lorentz analysis reveals that the energy gap in La4Ni3O10 is approximately 61 meV, which is three times larger than that obtained from ARPES measurements. The density wave gap feature is more prominent than that observed in bilayer nickelate La3Ni2O7, suggesting more carriers are gapped at the Fermi surface across the density wave transition. By comparing the measured plasma frequency with the first-principles calculation, we categorize La4Ni3O10 as a moderately electronic correlation material, similar to the parent compound of iron-based superconductors, however, being weaker than the bilayer nickelate La3Ni2O7. Our ultrafast pump-probe experiments also show that the relaxation time diverges near the transition temperature. By analyzing the amplitude and relaxation time with the Rothwarf-Taylor model, we estimate the energy gap to be 58 meV, which agrees with the result of optical spectroscopy. The more prominent gap feature and weaker electronic correlation might be the cause of a lower superconductivity transition temperature in La4Ni3O10 under high pressure. These findings significantly contribute to understanding the origin of density wave and superconductivity in trilayer nickelate La4Ni3O10.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18455 [pdf, other]

Coloring some $(P_6,C_4)$-free graphs with $Δ-1$ colors

Authors: Ran Chen, Di Wu, Xiaowen Zhang

Abstract: The Borodin-Kostochka Conjecture states that for a graph $G$, if $Δ(G)\geq9$, then $χ(G)\leq\max\{Δ(G)-1,ω(G)\}$. We use $P_t$ and $C_t$ to denote a path and a cycle on $t$ vertices, respectively. Let $C=v_1v_2v_3v_4v_5v_1$ be an induced $C_5$. A {\em $C_5^+$} is a graph obtained from $C$ by adding a $C_3=xyzx$ and a $P_2=t_1t_2$ such that (1) $x$ and $y$ are both exactly adjacent to $v_1,v_2,v_3$ in $V(C)$, $z$ is exactly adjacent to $v_2$ in $V(C)$, $t_1$ is exactly adjacent to $v_4,v_5$ in $V(C)$ and $t_2$ is exactly adjacent to $v_1,v_4,v_5$ in $V(C)$, (2) $t_1$ is exactly adjacent to $z$ in $\{x,y,z\}$ and $t_2$ has no neighbors in $\{x,y,z\}$. In this paper, we show that the Borodin-Kostochka Conjecture holds for ($P_6,C_4,H$)-free graphs, where $H\in \{K_7,C_5^+\}$. This generalizes some results of Gupta and Pradhan in \cite{GP21,GP24}.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.18361 [pdf, other]

Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?

Authors: Yifan Bai, Dongming Wu, Yingfei Liu, Fan Jia, Weixin Mao, Ziheng Zhang, Yucheng Zhao, Jianbing Shen, Xing Wei, Tiancai Wang, Xiangyu Zhang

Abstract: Rapid advancements in Autonomous Driving (AD) tasks have brought a significant shift toward end-to-end approaches, particularly in the utilization of vision-language models (VLMs) that integrate robust logical reasoning and cognitive abilities to enable comprehensive end-to-end planning. However, these VLM-based approaches tend to integrate 2D vision tokenizers and a large language model (LLM) for ego-car planning, which lack 3D geometric priors as a cornerstone of reliable planning. Naturally, this observation raises a critical concern: Can a 2D-tokenized LLM accurately perceive the 3D environment? Our evaluation of current VLM-based methods across 3D object detection, vectorized map construction, and environmental caption suggests that the answer is, unfortunately, NO. In other words, a 2D-tokenized LLM fails to provide reliable autonomous driving. In response, we introduce DETR-style 3D perceptrons as 3D tokenizers, which connect LLM with a one-layer linear projector. This simple yet elegant strategy, termed Atlas, harnesses the inherent priors of the 3D physical world, enabling it to simultaneously process high-resolution multi-view images and employ spatiotemporal modeling. Despite its simplicity, Atlas demonstrates superior performance in both 3D detection and ego planning tasks on the nuScenes dataset, proving that 3D-tokenized LLM is the key to reliable autonomous driving. The code and datasets will be released.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17792 [pdf, other]

JUNO Sensitivity to Invisible Decay Modes of Neutrons

Authors: JUNO Collaboration, Angel Abusleme, Thomas Adam, Kai Adamowicz, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, Marco Beretta, Antonio Bergnoli, Daniel Bick, et al. (635 additional authors not shown)

Abstract: We explore the decay of bound neutrons into invisible particles (e.g., $n\rightarrow 3 ν$ or $nn \rightarrow 2 ν$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $ n \rightarrow { inv} $ and $ nn \rightarrow { inv} $. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation modes of the excited residual nuclei can produce a time- and space-correlated triple coincidence signal in the JUNO detector. Based on a full Monte Carlo simulation informed with the latest available data, we estimate all backgrounds, including inverse beta decay events of the reactor antineutrino $\barν_e$, natural radioactivity, cosmogenic isotopes and neutral current interactions of atmospheric neutrinos. Pulse shape discrimination and multivariate analysis techniques are employed to further suppress backgrounds. With two years of exposure, JUNO is expected to give an order of magnitude improvement compared to the current best limits. After 10 years of data taking, the JUNO expected sensitivities at a 90% confidence level are $τ/B( n \rightarrow { inv} ) > 5.0 \times 10^{31} \, {\rm yr}$ and $τ/B( nn \rightarrow { inv} ) > 1.4 \times 10^{32} \, {\rm yr}$.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 28 pages, 7 figures, 4 tables

arXiv:2405.16789 [pdf, other]

NoteLLM-2: Multimodal Large Representation Models for Recommendation

Authors: Chao Zhang, Haoxin Zhang, Shiwei Wu, Di Wu, Tong Xu, Yan Gao, Yao Hu, Enhong Chen

Abstract: Large Language Models (LLMs) have demonstrated exceptional text understanding. Existing works explore their application in text embedding tasks. However, there are few works utilizing LLMs to assist multimodal representation tasks. In this work, we investigate the potential of LLMs to enhance multimodal representation in multimodal item-to-item (I2I) recommendations. One feasible method is the transfer of Multimodal Large Language Models (MLLMs) for representation tasks. However, pre-training MLLMs usually requires collecting high-quality, web-scale multimodal data, resulting in complex training procedures and high costs. This leads the community to rely heavily on open-source MLLMs, hindering customized training for representation scenarios. Therefore, we aim to design an end-to-end training method that customizes the integration of any existing LLMs and vision encoders to construct efficient multimodal representation models. Preliminary experiments show that fine-tuned LLMs in this end-to-end method tend to overlook image content. To overcome this challenge, we propose a novel training framework, NoteLLM-2, specifically designed for multimodal representation. We propose two ways to enhance the focus on visual information. The first method is based on the prompt viewpoint, which separates multimodal content into visual content and textual content. NoteLLM-2 adopts the multimodal In-Content Learning method to teach LLMs to focus on both modalities and aggregate key information. The second method is from the model architecture, utilizing a late fusion mechanism to directly fuse visual information into textual information. Extensive experiments have been conducted to validate the effectiveness of our method.

Submitted 26 May, 2024; originally announced May 2024.

Comments: 19 pages, 5 figures

arXiv:2405.15176 [pdf, other]

MonoDETRNext: Next-generation Accurate and Efficient Monocular 3D Object Detection Method

Authors: Pan Liao, Feng Yang, Di Wu, Liu Bo

Abstract: Monocular vision-based 3D object detection is crucial in various sectors, yet existing methods face significant challenges in terms of accuracy and computational efficiency. Building on the successful strategies in 2D detection and depth estimation, we propose MonoDETRNext, which seeks to optimally balance precision and processing speed. Our methodology includes the development of an efficient hybrid visual encoder, enhancement of depth prediction mechanisms, and introduction of an innovative query generation strategy, augmented by an advanced depth predictor. Building on MonoDETR, MonoDETRNext introduces two variants: MonoDETRNext-F, which emphasizes speed, and MonoDETRNext-A, which focuses on precision. We posit that MonoDETRNext establishes a new benchmark in monocular 3D object detection and opens avenues for future research. We conducted an exhaustive evaluation demonstrating the model's superior performance against existing solutions. Notably, MonoDETRNext-A demonstrated a 4.60% improvement in the AP3D metric on the KITTI test benchmark over MonoDETR, while MonoDETRNext-F showed a 2.21% increase. Additionally, the computational efficiency of MonoDETRNext-F slightly exceeds that of its predecessor.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14691 [pdf, other]

CityGPT: Towards Urban IoT Learning, Analysis and Interaction with Multi-Agent System

Authors: Qinghua Guan, Jinhui Ouyang, Di Wu, Weiren Yu

Abstract: The spatiotemporal data generated by massive sensors in the Internet of Things (IoT) is extremely dynamic, heterogeneous, large scale and time-dependent. It poses great challenges (e.g. accuracy, reliability, and stability) in real-time analysis and decision making for different IoT applications. The complexity of IoT data prevents the common people from gaining a deeper understanding of it. Agentized systems help address the lack of data insight for the common people. We propose a generic framework, namely CityGPT, to facilitate the learning and analysis of IoT time series with an end-to-end paradigm. CityGPT employs three agents to accomplish the spatiotemporal analysis of IoT data. The requirement agent facilitates user inputs based on natural language. Then, the analysis tasks are decomposed into temporal and spatial analysis processes, completed by corresponding data analysis agents (temporal and spatial agents). Finally, the spatiotemporal fusion agent visualizes the system's analysis results by receiving analysis results from data analysis agents and invoking sub-visualization agents, and can provide corresponding textual descriptions based on user demands. To increase the insight for common people using our framework, we have agentized the framework, facilitated by a large language model (LLM), to increase the data comprehensibility. Our evaluation results on real-world data with different time dependencies show that the CityGPT framework can guarantee robust performance in IoT computing.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14081 [pdf, other]

Laboratory-scale Perpendicular Collisionless Shock Generation and Ion Acceleration in Magnetized Head-on Colliding Plasmas

Authors: P. Liu, D. Wu, D. W. Yuan, G. Zhao, Z. M. Sheng, X. T. He, J. Zhang

Abstract: Magnetized collisionless shocks drive particle acceleration broadly in space and astrophysics. We perform the first large-scale particle-in-cell simulations with realistic laboratory parameters (density, temperature, and velocity) to investigate the magnetized shock in head-on colliding plasmas with an applied magnetic field of tens of Tesla. It is shown that a perpendicular collisionless shock is formed with about a fourfold density jump when two pre-magnetized flows collide. This shock is also characterized by a rapid increase of neutron yield, triggered by the beam-beam nuclear reactions between injected deuterons and ones reflected by the shock. Distinct from the shocks arising from the interaction of injected flows with a magnetized background, the self-generated magnetic field in these colliding plasmas experiences a significant amplification due to the increasing diamagnetic current, reaching approximately 30 times the upstream magnetic field. Moreover, we find that ions, regardless of whether they pass through or are reflected by the shock, can gain energy by the shock surfing acceleration, generating a power-law energy spectrum. In addition, we also demonstrate that the shock mediated only by filamentation instability cannot be generated under the prevailing unmagnetized experimental parameters. These results provide a direct connection of astrophysical field amplification to the magnetized shock formation and nonthermal ion generation.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13260 [pdf, other]

Assessing Proton-Boron Fusion Feasibility under non-Thermal Equilibrium Conditions: Rider's Inhibition Revisited

Authors: S. J. Liu, D. Wu, B. Liu, Y. -K. M. Peng, J. Q. Dong, T. Y. Liang, Z. M. Sheng

Abstract: Compared to the D-T reaction, the neutron-free proton-boron (p-$^{11}$B) fusion has garnered increasing attention in recent years. However, significant Bremsstrahlung losses pose a formidable challenge in p-$^{11}$B plasmas in achieving $Q>1$ in thermal equilibrium. The primary aim of this study is to corroborate Todd H. Rider's seminal 1997 work in Physics of Plasmas, which investigated the feasibility of sustaining p-$^{11}$B fusion under non-thermal equilibrium conditions. Employing a series of simulations with a new fusion cross-section, we assessed the minimum recirculating power that must be recycled to maintain the system's non-thermal equilibrium and found that it is substantially greater than the fusion power output, aligning with Rider's conclusions. Whether under the conditions of non-Maxwellian electron distribution or Maxwellian electron distribution, reactors reliant on non-equilibrium plasmas for p-$^{11}$B fusion are unlikely to achieve net power production without the aid of highly efficient external heat engines. However, maintaining the ion temperature at 300 keV and the Coulomb logarithm at 15, while increasing the electron temperature beyond 23.33 keV set by Rider, leads to diminished electron-ion energy transfer and heightened Bremsstrahlung radiation. When the electron temperature approaches approximately 140 keV, this progression ultimately leads to a scenario where the power of Bremsstrahlung loss equals the power of electron-ion interactions, yet remains inferior to the fusion power. Consequently, this results in a net gain in energy production.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.13054 [pdf, ps, other]

Fibonometry and Beyond

Authors: Nikhil Byrapuram, Adam Ge, Selena Ge, Tanya Khovanova, Sylvia Zia Lee, Rajarshi Mandal, Gordon Redwine, Soham Samanta, Daniel Wu, Danyang Xu, Ray Zhao

Abstract: In 2013, Conway and Ryba wrote a fascinating paper called Fibonometry. The paper, as one might guess, is about the connection between Fibonacci numbers and trigonometry. We were fascinated by this paper and looked at how we could generalize it. We discovered that we weren't the first. In this paper, we describe our journey and summarize the results.

Submitted 19 May, 2024; originally announced May 2024.

Comments: 10 pages, 2 tables

MSC Class: 00A08; 11B39; 97G60

arXiv:2405.10825 [pdf, other]

Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities

Authors: Hao Zhou, Chengming Hu, Ye Yuan, Yufei Cui, Yili Jin, Can Chen, Haolun Wu, Dun Yuan, Li Jiang, Di Wu, Xue Liu, Charlie Zhang, Xianbin Wang, Jiangchuan Liu

Abstract: Large language models (LLMs) have received considerable attention recently due to their outstanding comprehension and reasoning capabilities, leading to great progress in many fields. The advancement of LLM techniques also offers promising opportunities to automate many tasks in the telecommunication (telecom) field. After pre-training and fine-tuning, LLMs can perform diverse downstream tasks based on human instructions, paving the way to artificial general intelligence (AGI)-enabled 6G. Given the great potential of LLM technologies, this work aims to provide a comprehensive overview of LLM-enabled telecom networks. In particular, we first present LLM fundamentals, including model architecture, pre-training, fine-tuning, inference and utilization, model evaluation, and telecom deployment. Then, we introduce LLM-enabled key techniques and telecom applications in terms of generation, classification, optimization, and prediction problems. Specifically, the LLM-enabled generation applications include telecom domain knowledge, code, and network configuration generation. After that, the LLM-based classification applications involve network security, text, image, and traffic classification problems. Moreover, multiple LLM-enabled optimization techniques are introduced, such as automated reward function design for reinforcement learning and verbal reinforcement learning. Furthermore, for LLM-aided prediction problems, we discuss time-series prediction models and multi-modality prediction problems for telecom. Finally, we highlight the challenges and identify the future directions of LLM-enabled telecom networks.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.10812

VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling

Authors: Siyuan Li, Zedong Wang, Zicheng Liu, Di Wu, Cheng Tan, Jiangbin Zheng, Yufei Huang, Stan Z. Li

Abstract: Similar to natural language models, pre-trained genome language models are proposed to capture the underlying intricacies within genomes with unsupervised sequence modeling. They have become essential tools for researchers and practitioners in biology. However, the hand-crafted tokenization policies used in these models may not encode the most discriminative patterns from the limited vocabulary of genomic data. In this paper, we introduce VQDNA, a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning. By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings in an end-to-end manner. To further push its limits, we propose Hierarchical Residual Quantization (HRQ), where varying scales of codebooks are designed in a hierarchy to enrich the genome vocabulary in a coarse-to-fine manner. Extensive experiments on 32 genome datasets demonstrate VQDNA's superiority and favorable parameter efficiency compared to existing genome language models. Notably, empirical analysis of SARS-CoV-2 mutations reveals the fine-grained pattern awareness and biological significance of learned HRQ vocabulary, highlighting its untapped potential for broader applications in genomics.

Submitted 2 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

Comments: ICML 2024. Preprint V2 with 17 pages and 5 figures

arXiv:2405.07744

MoCo: Fuzzing Deep Learning Libraries via Assembling Code

Authors: Pin Ji, Yang Feng, Duo Wu, Lingyue Yan, Pengling Chen, Jia Liu, Zhihong Zhao

Abstract: The rapidly developing deep learning (DL) techniques have been applied in software systems with various application scenarios. However, they could also pose new safety threats with potentially serious consequences, especially in safety-critical domains. DL libraries serve as the underlying foundation for DL systems, and bugs in them can have unpredictable impacts that directly affect the behaviors of DL systems. Previous research on fuzzing DL libraries still has limitations in the diversity of test inputs, the construction of test oracles, and the precision of detection. In this paper, we propose MoCo, a novel fuzzing testing method for DL libraries via assembling code. MoCo first disassembles the seed code file to obtain the template and code blocks, and then employs code block mutation operators (e.g., API replacement, random generation, and boundary checking) to generate more new code blocks adapted to the template. By inserting context-appropriate code blocks into the template step by step, MoCo can generate a tree of code files with intergenerational relations. According to the derivation relations in this tree and the applied mutation operators, we construct the test oracle based on the execution state consistency. Since the granularity of code assembly and mutation is controlled rather than randomly divergent, we can quickly pinpoint the lines of code where the bugs are located and the corresponding triggering conditions. We conduct a comprehensive experiment to evaluate the efficiency and effectiveness of MoCo using three widely-used DL libraries (i.e., TensorFlow, PyTorch, and Jittor). During the experiment, MoCo detects 64 new bugs of four types in three DL libraries, where 51 bugs have been confirmed, and 13 bugs have been fixed by developers.

Submitted 13 May, 2024; originally announced May 2024.

Showing 1–50 of 1,481 results for author: Wu, D