
PID: Physics-Informed Diffusion Model for Infrared Image Generation

Authors: Fangyuan Mao, Jilin Mei, Shun Lu, Fuyang Liu, Liang Chen, Fangzhou Zhao, Yu Hu

Abstract: Infrared imaging technology has gained significant attention for its reliable sensing ability in low visibility conditions, prompting many studies to convert the abundant RGB images to infrared images. However, most existing image translation methods treat infrared images as a stylistic variation, neglecting the underlying physical laws, which limits their practical application. To address these issues, we propose a Physics-Informed Diffusion (PID) model for translating RGB images to infrared images that adhere to physical laws. Our method leverages the iterative optimization of the diffusion model and incorporates strong physical constraints based on prior knowledge of infrared laws during training. This approach enhances the similarity between translated infrared images and the real infrared domain without adding extra training parameters. Experimental results demonstrate that PID significantly outperforms existing state-of-the-art methods. Our code is available at https://github.com/fangyuanmao/PID.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.08496 [pdf, ps, other]

Convergences of Combinatorial Ricci Flows to Degenerated Circle Packings in Hyperbolic Background Geometry

Authors: Guangming Hu, Sicheng Lu, Dong Tan, Youliang Zhong, Puchun Zhou

Abstract: This paper investigates a class of degenerated circle packings in hyperbolic background geometry. A main problem is whether prescribed total geodesic curvature data can be realized by a degenerated circle packing or not. We fully characterize the sufficient and necessary conditions and show the uniqueness. Furthermore, we introduce the combinatorial Ricci flow to find the desired degenerated circle packed surface, analogous to the methods of Chow-Luo and Takatsu.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 36 pages, 9 figures

MSC Class: 52C26; 57M50

arXiv:2407.07016 [pdf]

Is Large Language Model All You Need to Predict the Synthesizability and Precursors of Crystal Structures?

Authors: Zhilong Song, Shuaihua Lu, Minggang Ju, Qionghua Zhou, Jinlan Wang

Abstract: Assessing the synthesizability of crystal structures is pivotal for advancing the practical application of theoretical material structures designed by machine learning or high-throughput screening. However, a significant gap exists between the actual synthesizability and thermodynamic or kinetic stability, which is commonly used for screening theoretical structures for experiments. To address this, we develop the Crystal Synthesis Large Language Models (CSLLM) framework, which includes three LLMs for predicting the synthesizability, synthesis methods, and precursors. We create a comprehensive synthesizability dataset including 140,120 crystal structures and develop an efficient text representation method for crystal structures to fine-tune the LLMs. The Synthesizability LLM achieves a remarkable 98.6% accuracy, significantly outperforming traditional synthesizability screening based on thermodynamic and kinetic stability by 106.1% and 44.5%, respectively. The Methods LLM achieves a classification accuracy of 91.02%, and the Precursors LLM has an 80.2% success rate in predicting synthesis precursors. Furthermore, we develop a user-friendly graphical interface that enables automatic predictions of synthesizability and precursors from uploaded crystal structure files. Through these contributions, CSLLM bridges the gap between theoretical material design and experimental synthesis, paving the way for the rapid discovery of novel and synthesizable functional materials.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06915 [pdf, ps, other]

FE-GUT: Factor Graph Optimization hybrid with Extended Kalman Filter for tightly coupled GNSS/UWB Integration

Authors: Qijia Zhao, Shaolin Lü, Jianan Lou, Rong Zhang

Abstract: Precise positioning and navigation information has become increasingly important with the development of the consumer electronics market. Because the Global Navigation Satellite System (GNSS) has some deficits, such as susceptibility to interference, integrating GNSS with additional alternative sensors is a promising approach to overcome the performance limitations of GNSS-based localization systems. Ultra-Wideband (UWB) can be used to enhance GNSS in constructing an integrated localization system. However, most low-cost UWB devices lack a hardware-level time synchronization feature, which necessitates the estimation and compensation of the time-offset in the tightly coupled GNSS/UWB integration. Given the flexibility of probabilistic graphical models, the time-offset can be modeled as an invariant constant in the discretization of the continuous model. This work proposes a novel architecture in which Factor Graph Optimization (FGO) is hybridized with an Extended Kalman Filter (EKF) for tightly coupled GNSS/UWB integration with online Temporal calibration (FE-GUT). FGO is utilized to precisely estimate the time-offset, while EKF provides initialization for the new factors and performs time-offset compensation. Simulation-based experiments validate the integrated localization performance of FE-GUT. In a four-wheeled robot scenario, the results demonstrate that, compared to EKF, FE-GUT can improve horizontal and vertical localization accuracy by 58.59% and 34.80%, respectively, while the time-offset estimation accuracy is improved by 76.80%. All the source code and datasets are available at https://github.com/zhaoqj23/FE-GUT/.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06784 [pdf, other]

Preasymptotic error estimates of EEM and CIP-EEM for the time-harmonic Maxwell equations with large wave number

Authors: Shuaishuai Lu, Haijun Wu

Abstract: Preasymptotic error estimates are derived for the linear edge element method (EEM) and the linear $\boldsymbol{H}(\boldsymbol{\mathrm{curl}})$-conforming interior penalty edge element method (CIP-EEM) for the time-harmonic Maxwell equations with large wave number. It is shown that under the mesh condition that $κ^3 h^2$ is sufficiently small, the errors of the solutions to both methods are bounded by $\mathcal{O} (κh + κ^3 h^2 )$ in the energy norm and $\mathcal{O} (κh^2 + κ^2 h^2 )$ in the $\boldsymbol{L}^2$ norm, where $κ$ is the wave number and $h$ is the mesh size. Numerical tests are provided to verify our theoretical results and to illustrate the potential of CIP-EEM in significantly reducing the pollution effect.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06489 [pdf]

T2MAT (text-to-materials): A universal framework for generating material structures with goal properties from a single sentence

Authors: Zhilong Song, Shuaihua Lu, Qionghua Zhou, Jinlan Wang

Abstract: Artificial Intelligence-Generated Content (AIGC), content autonomously produced by AI systems without human intervention, has significantly boosted efficiency across various fields. However, AIGC in material science faces challenges in efficiently discovering innovative materials that surpass existing databases, alongside the invariances and stability considerations of crystal structures. To address these challenges, we develop T2MAT (Text-to-Material), a comprehensive framework that proceeds from a user-input sentence to the inverse design of material structures with goal properties beyond the existing database by globally exploring chemical space, followed by an entirely automated workflow of first-principles validation. Furthermore, we propose CGTNet (Crystal Graph Transformer NETwork), a novel graph neural network model that captures long-term interactions, to enhance the accuracy and data efficiency of property prediction and thereby improve the reliability of inverse design. Through these contributions, T2MAT minimizes the dependency on human expertise and significantly enhances the efficiency of designing novel, high-performance functional materials, thereby actualizing AIGC in the materials design domain.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05078 [pdf, ps, other]

Function and derivative approximation by shallow neural networks

Authors: Yuanyuan Li, Shuai Lu

Abstract: We investigate a Tikhonov regularization scheme specifically tailored for shallow neural networks within the context of solving a classic inverse problem: approximating an unknown function and its derivatives within a unit cubic domain based on noisy measurements. The proposed Tikhonov regularization scheme incorporates a penalty term that takes three distinct yet intricately related network (semi)norms: the extended Barron norm, the variation norm, and the Radon-BV seminorm. These choices of the penalty term are contingent upon the specific architecture of the neural network being utilized. We establish the connection between various network norms and particularly trace the dependence of the dimensionality index, aiming to deepen our understanding of how these norms interplay with each other. We revisit the universality of function approximation through various norms, establish rigorous error-bound analysis for the Tikhonov regularization scheme, and explicitly elucidate the dependency of the dimensionality index, providing a clearer understanding of how the dimensionality affects the approximation performance and how one designs a neural network with diverse approximating tasks.

Submitted 6 July, 2024; originally announced July 2024.

MSC Class: 65D15; 65F22; 65J20

arXiv:2407.04995 [pdf]

A Broadband Algorithm for Adiabatic Mode Evolution and An Application on Polarization Splitter-Rotator on LNOI Platform

Authors: Geng Chen, Chijun Li, Xuanhao Wang, Yuankang Huang, Siyu Lu, Yiqi Dai, Xiangyu Meng, Cheng Zeng, Jinsong Xia

Abstract: Adiabatic mode evolution waveguides (AMEWs) are widely utilized in integrated photonics, including tapered waveguides, edge couplers, mode converters, splitters, etc. An analytical theory and a novel AMEW design algorithm are developed to create shortcuts to adiabaticity (STA). With the new algorithm, we demonstrate a broadband and highly efficient polarization splitter-rotator (PSR) on a lithium-niobate-on-insulator (LNOI) platform with an LN thickness of 500 nm. The fabricated PSR, with a total length of 2 mm, exhibits an insertion loss (IL) of 0.8 dB and a polarization extinction ratio (ER) of 12 dB over a wavelength range exceeding 76 nm.

Submitted 6 July, 2024; originally announced July 2024.

Comments: 9 pages, 6 figures, 2 tables

arXiv:2407.04909 [pdf, ps, other]

The averaging principle of stochastic functional partial differential equations with Hölder coefficients and infinite delay

Authors: Shuaishuai Lu, Xue Yang, Yong Li

Abstract: In this paper, we establish the averaging principle for stochastic functional partial differential equations (SFPDEs) characterized by Hölder coefficients and infinite delay. Firstly, we rigorously establish the existence and uniqueness of strong solutions for a specific class of finite-dimensional systems characterized by Hölder continuous coefficients and infinite delay. We extend these results to their infinite-dimensional counterparts using the variational approach and Galerkin projection technique. Subsequently, we establish the averaging principle (the first Bogolyubov theorem) for SFPDEs with infinite delay, subject to conditions of linear growth and Hölder continuity. This is achieved through classical Khasminskii time discretization and reductio ad absurdum, illustrating the convergence of solutions from the original Cauchy problem to those of the averaged equation across the finite interval [0, T]. To illustrate our findings, we present two applications: stochastic generalized porous media equations and stochastic reaction-diffusion equations with Hölder coefficients.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.02973 [pdf, other]

NOEMA formIng Cluster survEy (NICE): Characterizing eight massive galaxy groups at $1.5 < z < 4$ in the COSMOS field

Authors: Nikolaj B. Sillassen, Shuowen Jin, Georgios E. Magdis, Emanuele Daddi, Tao Wang, Shiying Lu, Hanwen Sun, Vinod Arumugam, Daizhong Liu, Malte Brinch, Chiara D'Eugenio, Raphael Gobat, Carlos Gómez-Guijarro, Michael Rich, Eva Schinnerer, Veronica Strazzullo, Qinghua Tan, Francesco Valentino, Yijun Wang, Mengyuan Xiao, Luwenjia Zhou, David Blánquez-Sesé, Zheng Cai, Yanmei Chen, Laure Ciesla, et al. (19 additional authors not shown)

Abstract: The NOEMA formIng Cluster survEy (NICE) is a large program targeting 69 massive galaxy group candidates at $z>2$ in six deep fields. We report spectroscopic confirmation of eight groups at $1.65\leq z\leq3.61$ in COSMOS. Homogeneously selected as significant overdensities of red IRAC sources with red Herschel colors, four groups are confirmed by CO and [CI] with NOEMA 3mm observations, three are confirmed with ALMA, and one is confirmed by H$α$ from Subaru/FMOS. We constructed the integrated FIR SEDs for the eight groups, obtaining total IR SFR $=260-1300~{\rm M_\odot}~{\rm yr}^{-1}$. We adopted six methods to estimate the dark matter masses, including stellar mass to halo mass relations, overdensity with galaxy bias, and NFW profile fitting to radial stellar mass density. We found that the radial stellar mass densities are consistent with an NFW profile, supporting the view that they are collapsed structures hosted by a single dark matter halo. The best halo mass estimates are $\log(M_{\rm h}/{\rm M_\odot})=12.8-13.7$ with an uncertainty of 0.3 dex. From the halo mass estimates, we derive a baryonic accretion rate ${\rm BAR}=(1-8)\times10^{3}\,{\rm M_{\odot}/yr}$ for this sample. We find a quasi-linear correlation between the integrated SFR/BAR and the theoretical halo mass limit for cold streams, $M_{\rm stream}/M_{\rm h}$, with ${\rm SFR/BAR}=10^{-0.46\pm0.22}\left({M_{\rm stream}/M_{\rm h}}\right)^{0.71\pm0.16}$ with a scatter of $0.40\,{\rm dex}$. Further, we compare halo masses and stellar masses with simulations, and find all structures are consistent with being progenitors of $M_{\rm h}(z=0)>10^{14}\,{\rm M_{\odot}}$ galaxy clusters, and the most massive central galaxies have stellar masses consistent with brightest cluster galaxies (BCGs) progenitors in the TNG300 simulation. The results strongly suggest these structures are forming massive galaxy clusters via baryonic and dark matter accretion.

Submitted 5 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

Comments: 44 pages (27pp appendix), 32 figures, 18 tables, accepted for publication in A&A

arXiv:2407.00588 [pdf, other]

Forward and backward problems for coupled subdiffusion systems

Authors: Dian Feng, Yikan Liu, Shuai Lu

Abstract: In this article, we investigate both forward and backward problems for coupled systems of time-fractional diffusion equations, encompassing scenarios of strong coupling. For the forward problem, we establish the well-posedness of the system, leveraging the eigensystem of the corresponding elliptic system as the foundation. When considering the backward problem, specifically the determination of initial values through final time observations, we demonstrate a Lipschitz stability estimate, which is consistent with the stability observed in the case of a single equation. To numerically address this backward problem, we refer to the explicit formulation of Tikhonov regularization to devise a multi-channel neural network architecture. This innovative architecture offers a versatile approach, exhibiting its efficacy in multidimensional settings through numerical examples and its robustness in handling initial values that have not been trained.

Submitted 30 June, 2024; originally announced July 2024.

Comments: 26 pages, 7 figures

MSC Class: 35R11; 35K58; 35B44

arXiv:2407.00178 [pdf, other]

Shower Separation in Five Dimensions for Highly Granular Calorimeters using Machine Learning

Authors: S. Lai, J. Utehs, A. Wilhahn, M. C. Fouz, O. Bach, E. Brianne, A. Ebrahimi, K. Gadow, P. Göttlicher, O. Hartbrich, D. Heuchel, A. Irles, K. Krüger, J. Kvasnicka, S. Lu, C. Neubüser, A. Provenza, M. Reinecke, F. Sefkow, S. Schuwalow, M. De Silva, Y. Sudo, H. L. Tran, L. Liu, R. Masuda, et al. (26 additional authors not shown)

Abstract: To achieve state-of-the-art jet energy resolution for Particle Flow, sophisticated energy clustering algorithms must be developed that can fully exploit available information to separate energy deposits from charged and neutral particles. Three published neural network-based shower separation models were applied to simulation and experimental data to measure the performance of the highly granular CALICE Analogue Hadronic Calorimeter (AHCAL) technological prototype in distinguishing the energy deposited by a single charged and single neutral hadron for Particle Flow. The performance of models trained using only standard spatial and energy and charged track position information from an event was compared to models trained using timing information available from AHCAL, which is expected to improve sensitivity to shower development and, therefore, aid in clustering. Both simulation and experimental data were used to train and test the models and their performances were compared. The best-performing neural network achieved significantly superior event reconstruction when timing information was utilised in training for the case where the charged hadron had more energy than the neutral one, motivating temporally sensitive calorimeters. All models under test were observed to tend to allocate energy deposited by the more energetic of the two showers to the less energetic one. Similar shower reconstruction performance was observed for a model trained on simulation and applied to data and a model trained and applied to data.

Submitted 28 June, 2024; originally announced July 2024.

arXiv:2406.16005 [pdf, other]

A Tale of Two Paths: Toward a Hybrid Data Plane for Efficient Far-Memory Applications

Authors: Lei Chen, Shi Liu, Chenxi Wang, Haoran Ma, Yifan Qiao, Zhe Wang, Chenggang Wu, Youyou Lu, Xiaobing Feng, Huimin Cui, Shan Lu, Harry Xu

Abstract: With rapid advances in network hardware, far memory has gained a great deal of traction due to its ability to break the memory capacity wall. Existing far memory systems fall into one of two data paths: one that uses the kernel's paging system to transparently access far memory at the page granularity, and a second that bypasses the kernel, fetching data at the object granularity. While it is generally believed that object fetching outperforms paging due to its fine-grained access, it requires significantly more compute resources to run object-level LRU and eviction. We built Atlas, a hybrid data plane enabled by a runtime-kernel co-design that simultaneously enables accesses via these two data paths to provide high efficiency for real-world applications. Atlas uses always-on profiling to continuously measure page locality. For workloads already with good locality, paging is used to fetch data, whereas for those without, object fetching is employed. Object fetching moves objects that are accessed close in time to contiguous local space, dynamically improving locality and making the execution increasingly amenable to paging, which is much more resource-efficient. Our evaluation shows that Atlas improves the throughput (e.g., by 1.5x and 3.2x) and reduces the tail latency (e.g., by one and two orders of magnitude) when using remote memory, compared with AIFM and Fastswap, the state-of-the-art techniques respectively in the two categories.

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.15484 [pdf, other]

JobFair: A Framework for Benchmarking Gender Hiring Bias in Large Language Models

Authors: Ze Wang, Zekun Wu, Xin Guan, Michael Thaler, Adriano Koshiyama, Skylar Lu, Sachin Beepath, Ediz Ertekin Jr., Maria Perez-Ortiz

Abstract: This paper presents a novel framework for benchmarking hierarchical gender hiring bias in Large Language Models (LLMs) for resume scoring, revealing significant issues of reverse bias and overdebiasing. Our contributions are fourfold: First, we introduce a framework using a real, anonymized resume dataset from the Healthcare, Finance, and Construction industries, meticulously used to avoid confounding factors. It evaluates gender hiring biases across hierarchical levels, including Level bias, Spread bias, Taste-based bias, and Statistical bias. This framework can be generalized to other social traits and tasks easily. Second, we propose novel statistical and computational hiring bias metrics based on a counterfactual approach, including Rank After Scoring (RAS), Rank-based Impact Ratio, Permutation Test-Based Metrics, and Fixed Effects Model-based Metrics. These metrics, rooted in labor economics, NLP, and law, enable holistic evaluation of hiring biases. Third, we analyze hiring biases in ten state-of-the-art LLMs. Six out of ten LLMs show significant biases against males in healthcare and finance. An industry-effect regression reveals that the healthcare industry is the most biased against males. GPT-4o and GPT-3.5 are the most biased models, showing significant bias in all three industries. Conversely, Gemini-1.5-Pro, Llama3-8b-Instruct, and Llama3-70b-Instruct are the least biased. The hiring bias of all LLMs, except for Llama3-8b-Instruct and Claude-3-Sonnet, remains consistent regardless of random expansion or reduction of resume content. Finally, we offer a user-friendly demo to facilitate adoption and practical application of the framework.

Submitted 17 June, 2024; originally announced June 2024.

Comments: Submitted to EMNLP 2024

arXiv:2406.14523 [pdf, other]

doi 10.1103/PhysRevB.109.245119

Optical and Raman selection rules for odd-parity clean superconductors

Authors: Shuangyuan Lu, Xu Yang, Yuan-Ming Lu

Abstract: We derive selection rules in optical absorption and Raman scattering spectra that can determine the parity of pairing order parameters under inversion symmetry in two classes of clean superconductors: (i) chiral superconductors with strong spin-orbit couplings, (ii) singlet superconductors with negligible spin-orbit couplings. Experimentally, the inversion parity of pair wave functions can be determined by comparing the "optical gap" $Δ_\text{op}$ in Raman and optical spectroscopy and the "thermodynamic gap" $2Δ$ in specific heat measurements, and the selection rules apply when $Δ_\text{op}>2Δ$. We demonstrate the selection rules in superconductivity in models of (i) doped Weyl semimetals and (ii) doped graphene. Our derivation is based on the relation between pairing symmetry and fermion projective symmetry group of a superconductor. We further derive similar selection rules for two-dimensional superconductors with 2-fold rotational symmetry, and discuss how they apply to the superconducting state in magic-angle twisted bilayer graphene.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 16 pages, 12 figures

Journal ref: Phys. Rev. B 109, 245119 (2024)

arXiv:2406.12718 [pdf, other]

AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

Authors: Wenbin An, Feng Tian, Sicong Leng, Jiahao Nie, Haonan Lin, QianYing Wang, Guang Dai, Ping Chen, Shijian Lu

Abstract: Despite their great success across various multimodal tasks, Large Vision-Language Models (LVLMs) are facing a prevalent problem with object hallucinations, where the generated textual responses are inconsistent with ground-truth objects in the given image. This paper investigates various LVLMs and pinpoints attention deficiency toward discriminative local image features as one root cause of object hallucinations. Specifically, LVLMs predominantly attend to prompt-independent global image features, while failing to capture prompt-relevant local features, consequently undermining the visual grounding capacity of LVLMs and leading to hallucinations. To this end, we propose Assembly of Global and Local Attention (AGLA), a training-free and plug-and-play approach that mitigates object hallucinations by exploring an ensemble of global features for response generation and local features for visual discrimination simultaneously. Our approach exhibits an image-prompt matching scheme that captures prompt-relevant local features from images, leading to an augmented view of the input image where prompt-relevant content is reserved while irrelevant distractions are masked. With the augmented view, a calibrated decoding distribution can be derived by integrating generative global features from the original image and discriminative local features from the augmented image. Extensive experiments show that AGLA consistently mitigates object hallucinations and enhances general perception capability for LVLMs across various discriminative and generative benchmarks. Our code will be released at https://github.com/Lackel/AGLA.

Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12386 [pdf, other]

IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models

Authors: Qiyao Wang, Jianguo Huang, Shule Lu, Yuan Lin, Kan Xu, Liang Yang, Hongfei Lin

Abstract: The rapid development of Large Language Models (LLMs) in vertical domains, including intellectual property (IP), lacks a specific evaluation benchmark for assessing their understanding, application, and reasoning abilities. To fill this gap, we introduce IPEval, the first evaluation benchmark tailored for IP agency and consulting tasks. IPEval comprises 2657 multiple-choice questions across four major dimensions: creation, application, protection, and management of IP. These questions span patent rights (inventions, utility models, designs), trademarks, copyrights, trade secrets, and other related laws. Evaluation methods include zero-shot, 5-few-shot, and Chain of Thought (CoT) for seven LLM types, predominantly in English or Chinese. Results show superior English performance by models like GPT series and Qwen series, while Chinese-centric LLMs excel in Chinese tests, albeit specialized IP LLMs lag behind general-purpose ones. Regional and temporal aspects of IP underscore the need for LLMs to grasp legal nuances and evolving laws. IPEval aims to accurately gauge LLM capabilities in IP and spur development of specialized models. Website: https://ipeval.github.io/

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.11937 [pdf, other]

Using graph neural networks to reconstruct charged pion showers in the CMS High Granularity Calorimeter

Authors: M. Aamir, B. Acar, G. Adamov, T. Adams, C. Adloff, S. Afanasiev, C. Agrawal, C. Agrawal, A. Ahmad, H. A. Ahmed, S. Akbar, N. Akchurin, B. Akgul, B. Akgun, R. O. Akpinar, E. Aktas, A. AlKadhim, V. Alexakhin, J. Alimena, J. Alison, A. Alpana, W. Alshehri, P. Alvarez Dominguez, M. Alyari, C. Amendola, et al. (550 additional authors not shown)

Abstract: A novel method to reconstruct the energy of hadronic showers in the CMS High Granularity Calorimeter (HGCAL) is presented. The HGCAL is a sampling calorimeter with very fine transverse and longitudinal granularity. The active media are silicon sensors and scintillator tiles read out by SiPMs, and the absorbers are a combination of lead and Cu/CuW in the electromagnetic section, and steel in the hadronic section. The shower reconstruction method is based on graph neural networks and it makes use of a dynamic reduction network architecture. It is shown that the algorithm is able to capture and mitigate the main effects that normally hinder the reconstruction of hadronic showers using classical reconstruction methods, by compensating for fluctuations in the multiplicity, energy, and spatial distributions of the shower's constituents. The performance of the algorithm is evaluated using test beam data collected in 2018 prototype of the CMS HGCAL accompanied by a section of the CALICE AHCAL prototype. The capability of the method to mitigate the impact of energy leakage from the calorimeter is also demonstrated.

Submitted 30 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: Prepared for submission to JINST

arXiv:2406.11571 [pdf, other]

PRIMER: JWST/MIRI reveals the evolution of star-forming structures in galaxies at z<2.5

Authors: Yipeng Lyu, Benjamin Magnelli, David Elbaz, Pablo G. Pérez-González, Camila Correa, Emanuele Daddi, Carlos Gómez-Guijarro, James S. Dunlop, Norman A. Grogin, Anton M. Koekemoer, Derek J. McLeod, Shiying Lu

Abstract: The stellar structures of star-forming galaxies (SFGs) undergo significant size growth during their mass assembly and must pass through a compaction phase as they evolve into quiescent galaxies (QGs). To shed light on the mechanisms behind this structural evolution, we study the morphology of the star-forming components of 665 SFGs at 0<z<2.5 measured using JWST/MIRI observations and compare them with the morphology of their stellar components taken from the literature. The stellar and star-forming components of most SFGs (66%) have extended disk-like structures that are aligned with each other and are of the same size. The star-forming components of these galaxies follow a mass-size relation, similar to that followed by their stellar components. At the highest mass, the optical Sérsic index of these SFGs increases to 2.5, suggesting the presence of a dominant stellar bulge. Because their star-forming components remain disk-like, these bulges cannot have formed by secular in-situ growth. We identify a second population of galaxies lying below the MIR mass-size relation, with compact star-forming components embedded in extended stellar components (EC galaxies). These galaxies are overall rare (15%) but become more dominant (30%) at high mass ($>10^{10.5}M_\odot$). The compact star-forming components of these galaxies are also concentrated and slightly spheroidal, suggesting that this compaction phase can build dense bulges in situ. Finally, we identify a third population of SFGs (19%), with both compact stellar and star-forming components. The densities of their stellar cores resemble those of QGs, and these galaxies are compatible with being the descendants of EC galaxies. Overall, the structural evolution of SFGs is mainly dominated by a secular inside-out growth, which can, however, be interrupted by violent compaction phase(s) that can build dominant stellar bulges like those in massive SFGs or QGs.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 24 pages, 17 figures, submitted to A&A, comments are welcome

arXiv:2406.10724 [pdf, other]

Beyond the Visible: Jointly Attending to Spectral and Spatial Dimensions with HSI-Diffusion for the FINCH Spacecraft

Authors: Ian Vyse, Rishit Dagli, Dav Vrat Chadha, John P. Ma, Hector Chen, Isha Ruparelia, Prithvi Seran, Matthew Xie, Eesa Aamer, Aidan Armstrong, Naveen Black, Ben Borstein, Kevin Caldwell, Orrin Dahanaggamaarachchi, Joe Dai, Abeer Fatima, Stephanie Lu, Maxime Michet, Anoushka Paul, Carrie Ann Po, Shivesh Prakash, Noa Prosser, Riddhiman Roy, Mirai Shinjo, Iliya Shofman, et al. (4 additional authors not shown)

Abstract: Satellite remote sensing missions have gained popularity over the past fifteen years due to their ability to cover large swaths of land at regular intervals, making them ideal for monitoring environmental trends. The FINCH mission, a 3U+ CubeSat equipped with a hyperspectral camera, aims to monitor crop residue cover in agricultural fields. Although hyperspectral imaging captures both spectral and spatial information, it is prone to various types of noise, including random noise, stripe noise, and dead pixels. Effective denoising of these images is crucial for downstream scientific tasks. Traditional methods, including hand-crafted techniques encoding strong priors, learned 2D image denoising methods applied across different hyperspectral bands, or diffusion generative models applied independently on bands, often struggle with varying noise strengths across spectral bands, leading to significant spectral distortion. This paper presents a novel approach to hyperspectral image denoising using latent diffusion models that integrate spatial and spectral information. We particularly do so by building a 3D diffusion model and presenting a 3-stage training approach on real and synthetically crafted datasets. The proposed method preserves image structure while reducing noise. Evaluations on both popular hyperspectral denoising datasets and synthetically crafted datasets for the FINCH mission demonstrate the effectiveness of this approach.

Submitted 15 June, 2024; originally announced June 2024.

Comments: To appear in 38th Annual Small Satellite Conference

arXiv:2406.10511 [pdf, other]

Efficient Hardware Accelerator Based on Medium Granularity Dataflow for SpTRSV

Authors: Qian Chen, Xiaofeng Yang, Shengli Lu

Abstract: Sparse triangular solve (SpTRSV) is widely used in various domains. Numerous studies have been conducted using CPUs, GPUs, and specific hardware accelerators, where dataflow can be categorized into coarse and fine granularity. Coarse dataflow offers good spatial locality but suffers from low parallelism, while fine dataflow provides high parallelism but disrupts the spatial structure, leading to increased nodes and poor data reuse. This paper proposes a novel hardware accelerator for SpTRSV or SpTRSV-like DAGs. The accelerator implements a medium granularity dataflow through hardware-software codesign and achieves both excellent spatial locality and high parallelism. Additionally, a partial sum caching mechanism is introduced to reduce the blocking frequency of processing elements (PEs), and a reordering algorithm of intra-node edges computation is developed to enhance data reuse. Experimental results on 264 benchmarks with node counts reaching up to 85,392 demonstrate that this work achieves average performance improvements of 12.2$\times$ (up to 874.5$\times$) over CPUs and 10.1$\times$ (up to 740.4$\times$) over GPUs. Compared to the state-of-the-art technique (DPU-v2), this work shows a 2.5$\times$ (up to 5.9$\times$) average performance improvement and 1.8$\times$ (up to 4.1$\times$) average energy efficiency enhancement.

Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.10416 [pdf, other]

Byzantine-Robust Decentralized Federated Learning

Authors: Minghong Fang, Zifan Zhang, Hairi, Prashant Khanduri, Jia Liu, Songtao Lu, Yuchen Liu, Neil Gong

Abstract: Federated learning (FL) enables multiple clients to collaboratively train machine learning models without revealing their private training data. In conventional FL, the system follows the server-assisted architecture (server-assisted FL), where the training process is coordinated by a central server. However, the server-assisted FL framework suffers from poor scalability due to a communication bottleneck at the server, and trust dependency issues. To address these challenges, decentralized federated learning (DFL) architecture has been proposed to allow clients to train models collaboratively in a serverless and peer-to-peer manner. However, due to its fully decentralized nature, DFL is highly vulnerable to poisoning attacks, where malicious clients could manipulate the system by sending carefully-crafted local models to their neighboring clients. To date, only a limited number of Byzantine-robust DFL methods have been proposed, most of which are either communication-inefficient or remain vulnerable to advanced poisoning attacks. In this paper, we propose a new algorithm called BALANCE (Byzantine-robust averaging through local similarity in decentralization) to defend against poisoning attacks in DFL. In BALANCE, each client leverages its own local model as a similarity reference to determine if the received model is malicious or benign. We establish the theoretical convergence guarantee for BALANCE under poisoning attacks in both strongly convex and non-convex settings. Furthermore, the convergence rate of BALANCE under poisoning attacks matches those of the state-of-the-art counterparts in Byzantine-free settings. Extensive experiments also demonstrate that BALANCE outperforms existing DFL methods and effectively defends against poisoning attacks.

Submitted 20 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

Comments: To appear in ACM Conference on Computer and Communications Security 2024 (CCS '24)

arXiv:2406.09121 [pdf, other]

MMRel: A Relation Understanding Dataset and Benchmark in the MLLM Era

Authors: Jiahao Nie, Gongjie Zhang, Wenbin An, Yap-Peng Tan, Alex C. Kot, Shijian Lu

Abstract: Despite the recent advancements in Multi-modal Large Language Models (MLLMs), understanding inter-object relations, i.e., interactions or associations between distinct objects, remains a major challenge for such models. This issue significantly hinders their advanced reasoning capabilities and is primarily due to the lack of large-scale, high-quality, and diverse multi-modal data essential for training and evaluating MLLMs. In this paper, we provide a taxonomy of inter-object relations and introduce Multi-Modal Relation Understanding (MMRel), a comprehensive dataset designed to bridge this gap by providing large-scale, high-quality and diverse data for studying inter-object relations with MLLMs. MMRel features three distinctive attributes: (i) It includes over 15K question-answer pairs, which are sourced from three distinct domains, ensuring large scale and high diversity; (ii) It contains a subset featuring highly unusual relations, on which MLLMs often fail due to hallucinations, thus are very challenging; (iii) It provides manually verified high-quality labels for inter-object relations. Thanks to these features, MMRel is ideal for evaluating MLLMs on relation understanding, as well as being used to fine-tune MLLMs to enhance relation understanding and even benefit overall performance in various vision-language tasks. Extensive experiments on various popular MLLMs validate the effectiveness of MMRel. Both MMRel dataset and the complete labeling scripts have been made publicly available.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.04252 [pdf]

Sub-nanometer depth resolution and single dopant visualization achieved by tilt-coupled multislice electron ptychography

Authors: Zehao Dong, Yang Zhang, Chun-Chien Chiu, Sicheng Lu, Jianbing Zhang, Yu-Chen Liu, Suya Liu, Jan-Chi Yang, Pu Yu, Yayu Wang, Zhen Chen

Abstract: Real-space imaging of three-dimensional atomic structures is a critical yet challenging task in materials science. Although scanning transmission electron microscopy has achieved sub-angstrom lateral resolution through techniques like electron ptychography [1,2], depth resolution remains limited to only 2 to 3 nanometers with a single projection setup [3,4]. Attaining better depth resolution typically necessitates large sample tilt angles and many projections, as seen in atomic electron tomography [5,6]. Here, we develop a new algorithm based on multislice electron ptychography which couples only a few projections at small tilt angles, but is sufficient to improve the depth resolution by more than threefold to the sub-nanometer scale, and potentially to the atomic level. This technique maintains high resolving power for both light and heavy atoms, and significantly improves the visibility of single dopants. We are thus able to experimentally detect dilute substitutional praseodymium dopants in a brownmillerite oxide, Ca2Co2O5, in three dimensions and observe the accompanying lattice distortion. This technique requires only a moderate level of data acquisition or processing, and can be seamlessly integrated into electron microscopes equipped with conventional components.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 27 pages, 5 figures, 10 supplementary figures

arXiv:2406.03496 [pdf, other]

Wings: Learning Multimodal LLMs without Text-only Forgetting

Authors: Yi-Kai Zhang, Shiyin Lu, Yang Li, Yanqing Ma, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, De-Chuan Zhan, Han-Jia Ye

Abstract: Multimodal large language models (MLLMs), initiated with a trained LLM, first align images with text and then fine-tune on multimodal mixed inputs. However, the MLLM catastrophically forgets the text-only instructions, which do not include images and can be addressed within the initial LLM. In this paper, we present Wings, a novel MLLM that excels in both text-only dialogues and multimodal comprehension. Analyzing MLLM attention in multimodal instructions reveals that text-only forgetting is related to the attention shifts from pre-image to post-image text. From that, we construct extra modules that act as the boosted learner to compensate for the attention shift. The complementary visual and textual learners, like "wings" on either side, are connected in parallel within each layer's attention block. Initially, image and text inputs are aligned with visual learners operating alongside the main attention, balancing focus on visual elements. Textual learners are later collaboratively integrated with attention-based routing to blend the outputs of the visual and textual learners. We design the Low-Rank Residual Attention (LoRRA) to guarantee high efficiency for learners. Our experimental results demonstrate that Wings outperforms equally-scaled MLLMs in both text-only and visual question-answering tasks. On a newly constructed Interleaved Image-Text (IIT) benchmark, Wings exhibits superior performance from text-only-rich to multimodal-rich question-answering tasks.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.02672 [pdf, other]

A comparison of pre-existing $Λ$CDM predictions with the abundance of JWST galaxies at high redshift

Authors: Shengdong Lu, Carlos S. Frenk, Sownak Bose, Cedric G. Lacey, Shaun Cole, Carlton M. Baugh, John C. Helly

Abstract: Observations with the James Webb Space Telescope have revealed a high abundance of bright galaxies at redshift, $z\gtrsim 12$, which has been widely interpreted as conflicting with the $Λ$CDM model. In Cowley et al. (2018) predictions were made - prior to the JWST observations - for the expected abundance of these galaxies using the Durham semi-analytic galaxy formation model, GALFORM, which is known to produce a realistic population of galaxies at lower redshifts including the present day. Key to this model is the assumption of a "top-heavy" initial mass function of stars formed in bursts (required to explain the number counts and redshift distribution of sub-millimetre galaxies). Here, we compare the rest-frame ultraviolet luminosity functions derived from JWST observations with those predicted by the Cowley et al. model up to $z=14$ and make further predictions for $z=16$. We find that below $z\sim 10$, the Cowley et al. predictions agree very well with observations, while agreement at $z\gtrsim12$ requires extending the model to take into account the timescale for the growth of obscuring dust grains and its dependence on gas metallicity. We trace the evolution of these galaxies from $z=14$ to $z=0$ and find that their descendants typically reside in halos with a median mass of $10^{13.6}\,h^{-1}\,\mathrm{M_{\odot}}$. The stellar masses of the descendants range from $10^{7}\,h^{-1}\,\mathrm{M_{\odot}}$ to $10^{11.5}\,h^{-1}\,\mathrm{M_{\odot}}$. Although these galaxies were all central galaxies at $z=14$, nearly half of their descendants end up as satellites in massive halos.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 14 pages, 8 figures, submitted to MNRAS on 4 June, 2024

arXiv:2406.02539 [pdf, other]

Parrot: Multilingual Visual Instruction Tuning

Authors: Hai-Long Sun, Da-Wei Zhou, Yang Li, Shiyin Lu, Chao Yi, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, De-Chuan Zhan, Han-Jia Ye

Abstract: The rapid development of Multimodal Large Language Models (MLLMs) like GPT-4V has marked a significant step towards artificial general intelligence. Existing methods mainly focus on aligning vision encoders with LLMs through supervised fine-tuning (SFT) to endow LLMs with multimodal abilities, making MLLMs' inherent ability to react to multiple languages progressively deteriorate as the training process evolves. We empirically find that the imbalanced SFT datasets, primarily composed of English-centric image-text pairs, lead to significantly reduced performance in non-English languages. This is due to the failure of aligning the vision encoder and LLM with multilingual tokens during the SFT process. In this paper, we introduce Parrot, a novel method that utilizes textual guidance to drive visual token alignment at the language level. Parrot makes the visual tokens condition on diverse language inputs and uses Mixture-of-Experts (MoE) to promote the alignment of multilingual tokens. Specifically, to enhance non-English visual tokens alignment, we compute the cross-attention using the initial visual features and textual embeddings, the result of which is then fed into the MoE router to select the most relevant experts. The selected experts subsequently convert the initial visual tokens into language-specific visual tokens. Moreover, considering the current lack of benchmarks for evaluating multilingual capabilities within the field, we collect and make available a Massive Multilingual Multimodal Benchmark which includes 6 languages, 15 categories, and 12,000 questions, named as MMMB. Our method not only demonstrates state-of-the-art performance on multilingual MMBench and MMMB, but also excels across a broad range of multimodal tasks. Both the source code and the training dataset of Parrot will be made publicly available.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.02260 [pdf]

Near-Room-Temperature Field-Controllable Exchange Bias in 2D van der Waals Ferromagnet Fe3GaTe2

Authors: Jifeng Shao, Xiaolong Yin, Chunhao Bao, Sirong Lu, Xiaoming Ma, Shu Guo, Le Wang, Xi Zhang, Zhiyue Li, Longxiang Li, Yue Zhao, Tingyong Chen

Abstract: Exchange bias (EB) is a cornerstone of modern magnetic memory and sensing technologies. Its extension to the realm of two-dimensional (2D) van der Waals (vdW) magnets holds promise for revolutionary advancements in miniaturized and efficient atomic spintronic devices. However, the blocking temperature of EB in 2D vdW magnets is currently well below room temperature (~130 K). This study reports a robust EB phenomenon in Fe3GaTe2 thin-layer devices, which significantly increases the blocking temperature to a near-room-temperature record of 280 K. Both the bias direction and magnitude can be isothermally tuned by adjusting the field sweep range, in striking contrast to the conventional EB in ferromagnetic/antiferromagnetic (FM/AFM) bilayers. We propose an exchange spring model in which crystal defects with higher coercivity act as the pivotal pinning source for the observed EB phenomenon, deviating from the conventional FM/AFM interface mechanism. Cumulative growth of minor loops and multiple magnetization reversal paths are observed in field cycles below the saturation field, consistent with the hard FM defects behavior of our exchange spring model. These findings provide insights into the complex magnetic order in 2D ferromagnets and open new avenues for developing practical ultrathin vdW spintronic devices with EB-like properties at room temperature.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 14 pages, 5 figures

arXiv:2406.00734 [pdf, other]

GLADformer: A Mixed Perspective for Graph-level Anomaly Detection

Authors: Fan Xu, Nan Wang, Hao Wu, Xuezhi Wen, Dalin Zhang, Siyang Lu, Binyong Li, Wei Gong, Hai Wan, Xibin Zhao

Abstract: Graph-Level Anomaly Detection (GLAD) aims to distinguish anomalous graphs within a graph dataset. However, current methods are constrained by their receptive fields, struggling to learn global features within the graphs. Moreover, most contemporary methods are based on spatial domain and lack exploration of spectral characteristics. In this paper, we propose a multi-perspective hybrid graph-level anomaly detector namely GLADformer, consisting of two key modules. Specifically, we first design a Graph Transformer module with global spectrum enhancement, which ensures balanced and resilient parameter distributions by fusing global features and spectral distribution characteristics. Furthermore, to uncover local anomalous attributes, we customize a band-pass spectral GNN message passing module that further enhances the model's generalization capability. Through comprehensive experiments on ten real-world datasets from multiple domains, we validate the effectiveness and robustness of GLADformer. This demonstrates that GLADformer outperforms current state-of-the-art models in graph-level anomaly detection, particularly in effectively capturing global anomaly representations and spectral characteristics.

Submitted 3 July, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.20797 [pdf, other]

Ovis: Structural Embedding Alignment for Multimodal Large Language Model

Authors: Shiyin Lu, Yang Li, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Han-Jia Ye

Abstract: Current Multimodal Large Language Models (MLLMs) typically integrate a pre-trained LLM with another pre-trained vision transformer through a connector, such as an MLP, endowing the LLM with visual capabilities. However, the misalignment between two embedding strategies in MLLMs -- the structural textual embeddings based on an embedding look-up table and the continuous embeddings generated directly by the vision encoder -- poses challenges for a more seamless fusion of visual and textual information. We propose Ovis, a novel MLLM architecture designed to structurally align visual and textual embeddings. Ovis integrates an additional learnable visual embedding table into the visual encoder's process. To capture rich visual semantics, each image patch indexes the visual embedding table multiple times, resulting in a final visual embedding that is a probabilistic combination of the indexed embeddings. This structural approach mirrors the method used for generating textual embeddings. Empirical evaluations on various multimodal benchmarks show that Ovis outperforms open-source MLLMs of similar parameter scales and even surpasses the proprietary model Qwen-VL-Plus overall. These results highlight the potential of Ovis' structured visual representation for advancing MLLM architectural design and promoting more effective multimodal learning. Code, datasets, and models are available at https://github.com/AIDC-AI/Ovis.

Submitted 17 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.20598 [pdf]

Mott insulating phase and coherent-incoherent crossover across magnetic phase transition in 2D antiferromagnetic CrSBr

Authors: Fan Wu, Xuefeng Zhang, Yi Chen, Ding Pei, Mengwen Zhan, Zicheng Tao, Cheng Chen, Shipeng Lu, Jingzhi Chen, Shujie Tang, Xia Wang, Yanfeng Guo, Lexian Yang, Yan Zhang, Yulin Chen, Qixi Mi, Gang Li, Zhongkai Liu

Abstract: In two-dimensional van der Waals magnetic materials, the interplay between magnetism and electron correlation can give rise to new ground states and lead to novel transport and optical properties. A fundamental question in these materials is how the electron correlation manifests and interacts with the magnetic orders. In this study, we demonstrate that the recently discovered 2D antiferromagnetic material, CrSBr, is a Mott insulator, through the combined use of resonant and temperature-dependent angle-resolved photoemission spectroscopy techniques, supplemented by dynamical mean-field theory analysis. Intriguingly, we found that as the system transitions from the antiferromagnetic to the paramagnetic phases, its Mott bands undergo a reconfiguration, and a coherent-incoherent crossover, driven by the dissolution of the magnetic order. Our findings reveal a distinctive evolution of band structure associated with magnetic phase transitions, shedding light on the investigation of intricate interplay between correlation and magnetic orders in strongly correlated van der Waals magnetic materials.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.20340 [pdf, other]

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Authors: Ling-Hao Chen, Shunlin Lu, Ailing Zeng, Hao Zhang, Benyou Wang, Ruimao Zhang, Lei Zhang

Abstract: This study delves into the realm of multi-modality (i.e., video and motion modalities) human behavior understanding by leveraging the powerful capabilities of Large Language Models (LLMs). Diverging from recent LLMs designed for video-only or motion-only understanding, we argue that understanding human behavior necessitates joint modeling from both videos and motion sequences (e.g., SMPL sequences) to capture nuanced body part dynamics and semantics effectively. In light of this, we present MotionLLM, a straightforward yet effective framework for human motion understanding, captioning, and reasoning. Specifically, MotionLLM adopts a unified video-motion training strategy that leverages the complementary advantages of existing coarse video-text data and fine-grained motion-text data to glean rich spatial-temporal insights. Furthermore, we collect a substantial dataset, MoVid, comprising diverse videos, motions, captions, and instructions. Additionally, we propose the MoVid-Bench, with careful manual annotations, for better evaluation of human behavior understanding on video and motion. Extensive experiments show the superiority of MotionLLM in captioning, spatial-temporal comprehension, and reasoning ability.

Submitted 30 May, 2024; originally announced May 2024.

Comments: MotionLLM version 1.0, project page see https://lhchen.top/MotionLLM

arXiv:2405.19767 [pdf]

MAE-GAN: A Novel Strategy for Simultaneous Super-resolution Reconstruction and Denoising of Post-stack Seismic Profile

Authors: Wenshuo Yu, Shiqi Dong, Shaoping Lu, Xintong Dong

Abstract: Post-stack seismic profiles are images that reflect geological structures and provide a critical foundation for understanding the distribution of oil and gas resources. However, due to the limitations of seismic acquisition equipment and data collecting geometry, the post-stack profiles suffer from low resolution and strong noise issues, which severely affect subsequent seismic interpretation. To better enhance the spatial resolution and signal-to-noise ratio of post-stack profiles, a multi-scale attention encoder-decoder network based on generative adversarial network (MAE-GAN) is proposed. This method improves the resolution of post-stack profiles, effectively suppresses noise, and recovers weak signals as well. A multi-scale residual module is proposed to extract geological features under different receptive fields. At the same time, an attention module is designed to further guide the network to focus on important feature information. Additionally, to better recover the global and local information of post-stack profiles, an adversarial network based on a Markov discriminator is proposed. Finally, by introducing an edge information preservation loss function, the conventional loss function of the Generative Adversarial Network is improved, which enables better recovery of the edge information of the original post-stack profiles.
Experimental results on simulated and field post-stack profiles demonstrate that the proposed MAE-GAN method outperforms two advanced convolutional neural network-based methods in noise suppression and weak signal recovery. Furthermore, the profiles reconstructed by the MAE-GAN method preserve more geological structures.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19487 [pdf, other]

A Full-duplex Speech Dialogue Scheme Based On Large Language Models

Authors: Peng Wang, Songshuo Lu, Yaohua Tang, Sijie Yan, Yuanjun Xiong, Wei Xia

Abstract: We present a generative dialogue system capable of operating in a full-duplex manner, allowing for seamless interaction. It is based on a large language model (LLM) carefully aligned to be aware of a perception module, a motor function module, and the concept of a simple finite state machine (called neural FSM) with two states. The perception and motor function modules operate simultaneously, allowing the system to speak and listen to the user at the same time. The LLM generates textual tokens for inquiry responses and makes autonomous decisions to start responding to, wait for, or interrupt the user by emitting control tokens to the neural FSM. All these tasks of the LLM are carried out as next token prediction on a serialized view of the dialogue in real-time. In automatic quality evaluations simulating real-life interaction, the proposed system reduces the average conversation response latency by more than threefold compared with LLM-based half-duplex dialogue systems while responding within less than 500 milliseconds in more than 50% of evaluated interactions. Running an LLM with only 8 billion parameters, our system exhibits an 8% higher interruption precision rate than the best available commercial LLM for voice-based dialogue.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18891 [pdf]

Inverse Design of Promising Alloys for Electrocatalytic CO$_2$ Reduction via Generative Graph Neural Networks Combined with Bird Swarm Algorithm

Authors: Zhilong Song, Linfeng Fan, Shuaihua Lu, Qionghua Zhou, Chongyi Ling, Jinlan Wang

Abstract: Directly generating material structures with optimal properties is a long-standing goal in material design. One of the fundamental challenges lies in how to overcome the limitation of traditional generative models to efficiently explore the global chemical space rather than a small localized space. Herein, we develop a framework named MAGECS to address this dilemma, by integrating the bird swarm algorithm and supervised graph neural network to effectively navigate the generative model in the immense chemical space towards materials with target properties. As a demonstration, MAGECS is applied to design compelling alloy electrocatalysts for CO$_2$ reduction reaction (CO$_2$RR) and works extremely well. Specifically, the chemical space of CO$_2$RR is effectively explored, where over 250,000 promising structures with high activity have been generated and, notably, the proportion of desired structures is increased 2.5-fold. Moreover, five predicted alloys, i.e., CuAl, AlPd, Sn$_2$Pd$_5$, Sn$_9$Pd$_7$, and CuAlSe$_2$, are successfully synthesized and characterized experimentally, two of which exhibit about 90% Faraday efficiency of CO$_2$RR, and CuAl achieved 76% efficiency for C$_2$ products. This pioneering application of inverse design in CO$_2$RR catalysis showcases the potential of MAGECS to dramatically accelerate the development of functional materials, paving the way for fully automated, artificial intelligence-driven material design.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18858 [pdf, other]

Distributed Bilevel Optimization with Communication Compression

Authors: Yutong He, Jie Hu, Xinmeng Huang, Songtao Lu, Bin Wang, Kun Yuan

Abstract: Stochastic bilevel optimization tackles challenges involving nested optimization structures. Its fast-growing scale nowadays necessitates efficient distributed algorithms. In conventional distributed bilevel methods, each worker must transmit full-dimensional stochastic gradients to the server every iteration, leading to significant communication overhead and thus hindering efficiency and scalability. To resolve this issue, we introduce the first family of distributed bilevel algorithms with communication compression. The primary challenge in algorithmic development is mitigating bias in hypergradient estimation caused by the nested structure. We first propose C-SOBA, a simple yet effective approach with unbiased compression and provable linear speedup convergence. However, it relies on strong assumptions on bounded gradients. To address this limitation, we explore the use of moving average, error feedback, and multi-step compression in bilevel optimization, resulting in a series of advanced algorithms with relaxed assumptions and improved convergence properties. Numerical experiments show that our compressed bilevel algorithms can achieve a $10\times$ reduction in communication overhead without severe performance degradation.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.17792 [pdf, other]

JUNO Sensitivity to Invisible Decay Modes of Neutrons

Authors: JUNO Collaboration, Angel Abusleme, Thomas Adam, Kai Adamowicz, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, Marco Beretta, Antonio Bergnoli, Daniel Bick, et al. (635 additional authors not shown)

Abstract: We explore the decay of bound neutrons into invisible particles (e.g., $n\rightarrow 3 \nu$ or $nn \rightarrow 2 \nu$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $ n \rightarrow { inv} $ and $ nn \rightarrow { inv} $. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation modes of the excited residual nuclei can produce a time- and space-correlated triple coincidence signal in the JUNO detector. Based on a full Monte Carlo simulation informed with the latest available data, we estimate all backgrounds, including inverse beta decay events of the reactor antineutrino $\bar{\nu}_e$, natural radioactivity, cosmogenic isotopes and neutral current interactions of atmospheric neutrinos. Pulse shape discrimination and multivariate analysis techniques are employed to further suppress backgrounds. With two years of exposure, JUNO is expected to give an order of magnitude improvement compared to the current best limits. After 10 years of data taking, the JUNO expected sensitivities at a 90% confidence level are $\tau/B( n \rightarrow { inv} ) > 5.0 \times 10^{31} \, {\rm yr}$ and $\tau/B( nn \rightarrow { inv} ) > 1.4 \times 10^{32} \, {\rm yr}$.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 28 pages, 7 figures, 4 tables

arXiv:2405.16444 [pdf, other]

CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion

Authors: Jiayi Yao, Hanchen Li, Yuhan Liu, Siddhant Ray, Yihua Cheng, Qizheng Zhang, Kuntai Du, Shan Lu, Junchen Jiang

Abstract: Large language models (LLMs) often incorporate multiple text chunks in their inputs to provide the necessary contexts. To speed up the prefill of the long LLM inputs, one can pre-compute the KV cache of a text and re-use the KV cache when the context is reused as the prefix of another LLM input. However, the reused text chunks are not always the input prefix, and when they are not, their precomputed KV caches cannot be directly used since they ignore the text's cross-attention with the preceding text in the LLM input. Thus, the benefits of reusing KV caches remain largely unrealized. This paper tackles just one question: when an LLM input contains multiple text chunks, how to quickly combine their precomputed KV caches in order to achieve the same generation quality as the expensive full prefill (i.e., without reusing KV cache)? We present CacheBlend, a scheme that reuses the pre-computed KV caches, regardless of whether they belong to the prefix, and selectively recomputes the KV values of a small subset of tokens to partially update each reused KV cache. Meanwhile, the small extra delay for recomputing some tokens can be pipelined with the retrieval of KV caches within the same job, allowing CacheBlend to store KV caches in slower devices with more storage capacity while retrieving them without increasing the inference delay.
By comparing CacheBlend with the state-of-the-art KV cache reusing schemes on three open-source LLMs of various sizes and four popular benchmark datasets of different tasks, we show that CacheBlend reduces time-to-first-token (TTFT) by 2.2-3.3X and increases the inference throughput by 2.8-5X, compared with full KV recompute, without compromising generation quality or incurring more storage cost.

Submitted 3 June, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.15920 [pdf, other]

SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning

Authors: Shuai Zhang, Heshan Devaka Fernando, Miao Liu, Keerthiram Murugesan, Songtao Lu, Pin-Yu Chen, Tianyi Chen, Meng Wang

Abstract: This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics. In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward mapping: the former characterizes the transition dynamics, and the latter characterizes the task-specific reward function. This Q-function decomposition, coupled with a policy improvement operator known as generalized policy improvement (GPI), reduces the sample complexity of finding the optimal Q-function, and thus the SF & GPI framework exhibits promising empirical performance compared to traditional RL methods like Q-learning. However, its theoretical foundations remain largely unestablished, especially when learning the successor features using deep neural networks (SF-DQN). This paper studies the provable knowledge transfer using SF-DQN in transfer RL problems. We establish the first convergence analysis with provable generalization guarantees for SF-DQN with GPI. The theory reveals that SF-DQN with GPI outperforms conventional RL approaches, such as deep Q-network, in terms of both faster convergence rate and better generalization. Numerical experiments on real and synthetic RL tasks support the superior performance of SF-DQN & GPI, aligning with our theoretical findings.

Submitted 24 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2310.16173
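The Q-function decomposition and GPI step described in the SF-DQN abstract can be sketched as follows. This is a toy illustration rather than the paper's algorithm: it assumes the common linear form in which the Q-value is the inner product of a successor feature vector and a task reward weight vector, and all names and numbers are hypothetical.

```python
def q_value(psi, w):
    # Assumed linear decomposition: Q(s, a) = psi(s, a) . w
    return sum(p * x for p, x in zip(psi, w))

def gpi_action(psis_per_policy, w):
    # psis_per_policy[i][a] holds the successor feature vector of stored
    # policy i for action a in the current state. GPI selects the action
    # whose best Q-value across all stored policies is largest.
    n_actions = len(psis_per_policy[0])
    best = [max(q_value(policy[a], w) for policy in psis_per_policy)
            for a in range(n_actions)]
    return max(range(n_actions), key=lambda a: best[a])

# Two hypothetical stored policies, two actions, 2-dimensional features.
psis = [
    [[1.0, 0.0], [0.0, 1.0]],  # policy 0
    [[0.5, 0.5], [0.0, 2.0]],  # policy 1
]
action_for_task_a = gpi_action(psis, [1.0, 0.0])  # reward weights of a task A
action_for_task_b = gpi_action(psis, [0.0, 1.0])  # reward weights of a task B
```

Only the weight vector changes between tasks; the successor features are reused, which is the source of the knowledge transfer the abstract refers to.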

arXiv:2405.14325 [pdf, other]

Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection

Authors: Jia Guo, Shuai Lu, Weihang Zhang, Huiqi Li

Abstract: Recent studies highlighted a practical setting of unsupervised anomaly detection (UAD) that builds a unified model for multi-class images, serving as an alternative to the conventional one-class-one-model setup. Despite various advancements addressing this challenging task, the detection performance under the multi-class setting still lags far behind state-of-the-art class-separated models. Our research aims to bridge this substantial performance gap. In this paper, we introduce a minimalistic reconstruction-based anomaly detection framework, namely Dinomaly, which leverages pure Transformer architectures without relying on complex designs, additional modules, or specialized tricks. Given this powerful framework consisting only of Attentions and MLPs, we found four simple components that are essential to multi-class anomaly detection: (1) Foundation Transformers that extract universal and discriminative features, (2) Noisy Bottleneck where pre-existing Dropouts do all the noise injection tricks, (3) Linear Attention that naturally cannot focus, and (4) Loose Reconstruction that does not force layer-to-layer and point-by-point reconstruction. Extensive experiments are conducted across three popular anomaly detection benchmarks including MVTec-AD, VisA, and the recently released Real-IAD. Our proposed Dinomaly achieves impressive image AUROC of 99.6%, 98.7%, and 89.3% on the three datasets respectively, which is not only superior to state-of-the-art multi-class UAD methods, but also surpasses the most advanced class-separated UAD records.

Submitted 29 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.11205 [pdf, other]

Fuse & Calibrate: A bi-directional Vision-Language Guided Framework for Referring Image Segmentation

Authors: Yichen Yan, Xingjian He, Sihan Chen, Shichen Lu, Jing Liu

Abstract: Referring Image Segmentation (RIS) aims to segment an object described in natural language from an image, with the main challenge being a text-to-pixel correlation. Previous methods typically rely on single-modality features, such as vision or language features, to guide the multi-modal fusion process. However, this approach limits the interaction between vision and language, leading to a lack of fine-grained correlation between the language description and pixel-level details during the decoding process. In this paper, we introduce FCNet, a framework that employs a bi-directional guided fusion approach where both vision and language play guiding roles. Specifically, we use a vision-guided approach to conduct initial multi-modal fusion, obtaining multi-modal features that focus on key vision information. We then propose a language-guided calibration module to further calibrate these multi-modal features, ensuring they understand the context of the input sentence. This bi-directional vision-language guided approach produces higher-quality multi-modal features sent to the decoder, facilitating adaptive propagation of fine-grained semantic information from textual features to visual features. Experiments on RefCOCO, RefCOCO+, and G-Ref datasets with various backbones consistently show our approach outperforming state-of-the-art methods.

Submitted 18 May, 2024; originally announced May 2024.

Comments: 12 pages, 4 figures, ICIC2024

arXiv:2405.08847 [pdf]

Double symmetry and phase-controlled continuous transformation between skyrmion and meron topology

Authors: Sen Lu, Xiong Xiong, Xuefei Zi, Zhe Shen

Abstract: Topological quasiparticles, including skyrmions and merons, are topological textures with sophisticated vectorial structures that can be used for optical information storage, precision metrology, position sensing, etc. Here, we build a simple model to generate the isolated Néel-type field-skyrmion and derive its analytical solution. By employing a series of well-designed double-symmetry apertures and controlling the initial phase of light, we realized the continuous transformation between the isolated skyrmion, the meron lattice, and the skyrmion lattice. We show that the field symmetry determines the possible forms of the topological texture, and the initial phase switches its presentation form. These results enrich the methods for generating and transforming topological textures, provide new insights into the symmetry of the electromagnetic field, and open up new opportunities for precision measurement and topological photonics.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.07696 [pdf, other]

MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders

Authors: Xueying Jiang, Sheng Jin, Xiaoqin Zhang, Ling Shao, Shijian Lu

Abstract: Monocular 3D object detection aims for precise 3D localization and identification of objects from a single-view image. Despite its recent progress, it often struggles while handling pervasive object occlusions that tend to complicate and degrade the prediction of object dimensions, depths, and orientations. We design MonoMAE, a monocular 3D detector inspired by Masked Autoencoders that addresses the object occlusion issue by masking and reconstructing objects in the feature space. MonoMAE consists of two novel designs. The first is depth-aware masking that selectively masks certain parts of non-occluded object queries in the feature space for simulating occluded object queries for network training. It masks non-occluded object queries by balancing the masked and preserved query portions adaptively according to the depth information. The second is lightweight query completion that works with the depth-aware masking to learn to reconstruct and complete the masked object queries. With the proposed object occlusion and completion, MonoMAE learns enriched 3D representations that achieve superior monocular 3D detection performance qualitatively and quantitatively for both occluded and non-occluded objects. Additionally, MonoMAE learns generalizable representations that can work well in new domains.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.07468 [pdf]

Evaluating large language models in medical applications: a survey

Authors: Xiaolan Chen, Jiayang Xiang, Shanfu Lu, Yexin Liu, Mingguang He, Danli Shi

Abstract: Large language models (LLMs) have emerged as powerful tools with transformative potential across numerous domains, including healthcare and medicine. In the medical domain, LLMs hold promise for tasks ranging from clinical decision support to patient education. However, evaluating the performance of LLMs in medical contexts presents unique challenges due to the complex and critical nature of medical information. This paper provides a comprehensive overview of the landscape of medical LLM evaluation, synthesizing insights from existing studies and highlighting evaluation data sources, task scenarios, and evaluation methods. Additionally, it identifies key challenges and opportunities in medical LLM evaluation, emphasizing the need for continued research and innovation to ensure the responsible integration of LLMs into clinical practice.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 4 figures, 1 table

arXiv:2405.06938 [pdf, ps, other]

Stochastic functional partial differential equations with monotone coefficients: Poisson stability measures, exponential mixing and limit theorems

Authors: Shuaishuai Lu, Xue Yang, Yong Li

Abstract: This paper examines Poisson stable (including stationary, periodic, almost periodic, Levitan almost periodic, Bohr almost automorphic, pseudo-periodic, Birkhoff recurrent, pseudo-recurrent, etc.) measures and limit theorems for stochastic functional partial differential equations (SFPDEs) with monotone coefficients. We first show the existence and uniqueness of the entrance measure $\mu_{t}$ for SFPDEs by the dissipative method (or remote start). Then, with the help of Shcherbakov's comparability method by character of recurrence, we prove that the entrance measure inherits the same recurrence as the coefficients. Thirdly, we show the tightness of the set of measures $\mu_{t}$. As a result, any sequence of averages of $\{\mu_{t}\}_{t\in\mathbb{R} }$ has a limit point $\mu^{*}$. Further, we study the uniform exponential mixing of the measure $\mu^{*}$ in the sense of the Wasserstein metric. Fourthly, under uniform exponential mixing and the Markov property, we establish the strong law of large numbers and the central limit theorem, and estimate the corresponding rates of convergence for solution maps of SFPDEs. Finally, we give applications to stochastic generalized porous media equations with delay to illustrate our results.

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2405.06563 [pdf, other]

What Can Natural Language Processing Do for Peer Review?

Authors: Ilia Kuznetsov, Osama Mohammed Afzal, Koen Dercksen, Nils Dycke, Alexander Goldberg, Tom Hope, Dirk Hovy, Jonathan K. Kummerfeld, Anne Lauscher, Kevin Leyton-Brown, Sheng Lu, Mausam, Margot Mieskes, Aurélie Névéol, Danish Pruthi, Lizhen Qu, Roy Schwartz, Noah A. Smith, Thamar Solorio, Jingyan Wang, Xiaodan Zhu, Anna Rogers, Nihar B. Shah, Iryna Gurevych

Abstract: The number of scientific articles produced every year is growing rapidly. Providing quality control over them is crucial for scientists and, ultimately, for the public good. In modern science, this process is largely delegated to peer review -- a distributed procedure in which each submission is evaluated by several independent experts in the field. Peer review is widely used, yet it is hard, time-consuming, and prone to error. Since the artifacts involved in peer review -- manuscripts, reviews, discussions -- are largely text-based, Natural Language Processing has great potential to improve reviewing. As the emergence of large language models (LLMs) has enabled NLP assistance for many new tasks, the discussion on machine-assisted peer review is picking up the pace. Yet, where exactly is help needed, where can NLP help, and where should it stand aside? The goal of our paper is to provide a foundation for the future efforts in NLP for peer-reviewing assistance. We discuss peer review as a general process, exemplified by reviewing at AI conferences. We detail each step of the process from manuscript submission to camera-ready revision, and discuss the associated challenges and opportunities for NLP assistance, illustrated by existing work. We then turn to the big challenges in NLP for peer review as a whole, including data acquisition and licensing, operationalization and experimentation, and ethical issues. To help consolidate community efforts, we create a companion repository that aggregates key datasets pertaining to peer review.
Finally, we issue a detailed call for action for the scientific community, NLP and AI researchers, policymakers, and funding bodies to help bring the research in NLP for peer review forward. We hope that our work will help set the agenda for research in machine-assisted scientific quality control in the age of AI, within the NLP community and beyond.

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.06223 [pdf, ps, other]

McKean-Vlasov SPDEs with Hölder continuous coefficients: existence, uniqueness, ergodicity, exponential mixing and limit theorems

Authors: Shuaishuai Lu, Xue Yang, Yong Li

Abstract: This paper investigates the existence and uniqueness of solutions, as well as the ergodicity and exponential mixing to invariant measures, and limit theorems for a class of McKean-Vlasov SPDEs characterized by Hölder continuity. We rigorously establish the existence and uniqueness of strong solutions for a specific class of finite-dimensional systems with Hölder continuous coefficients. We then extend these results to the infinite-dimensional counterparts using the Galerkin projection technique. Additionally, we explore the properties of the solutions, including time homogeneity, the Markov property, and the Feller property. Building upon these properties, we examine the exponential ergodicity and mixing of invariant measures under Lyapunov conditions. Finally, within the framework of coefficients meeting the criteria of Hölder continuity and Lyapunov conditions, alongside the uniform mixing property of invariant measures, we establish the strong law of large numbers and the central limit theorem for the solution and obtain estimates of corresponding convergence rates.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.05367 [pdf, ps, other]

A Space/Time Interchange Symmetry of Rotating AdS Black Holes in General Dimensions

Authors: Si-Yue Lu, Peng Zhao, H. Lu

Abstract: We revisit the previously known local inversion symmetry of the five-dimensional Kerr-AdS metric that relates the over-rotating black hole to the under-rotating one and reinterpret it as an interchanging symmetry between time and the longitudinal angular coordinates. We generalize this to all $D$ dimensions, including $D=4$, thereby enlarging the trivial linear $\mathbb Z_N$ symmetry of the $N=\lfloor(D-1)/2\rfloor$ longitudinal angular coordinates to the nonlinearly realized $\mathbb Z_{N+1}$ symmetry that involves time.

Submitted 8 May, 2024; originally announced May 2024.

Comments: LaTeX, 9 pages

arXiv:2405.04434 [pdf, other]

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding , et al. (132 additional authors not shown)

Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.

Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.01762 [pdf, ps, other]

EiG-Search: Generating Edge-Induced Subgraphs for GNN Explanation in Linear Time

Authors: Shengyao Lu, Bang Liu, Keith G. Mills, Jiao He, Di Niu

Abstract: Understanding and explaining the predictions of Graph Neural Networks (GNNs) is crucial for enhancing their safety and trustworthiness. Subgraph-level explanations are gaining attention for their intuitive appeal. However, most existing subgraph-level explainers face efficiency challenges in explaining GNNs due to complex search processes. The key challenge is to find a balance between intuitiveness and efficiency while ensuring transparency. Additionally, these explainers usually induce subgraphs by nodes, which may introduce less-intuitive disconnected nodes in the subgraph-level explanations or omit many important subgraph structures. In this paper, we reveal that inducing subgraph explanations by edges is more comprehensive than other subgraph-inducing techniques. We also emphasize the need to determine the subgraph explanation size for each data instance, as different data instances may involve different important substructures. Building upon these considerations, we introduce a training-free approach named EiG-Search. We employ an efficient linear-time search algorithm over the edge-induced subgraphs, where the edges are ranked by an enhanced gradient-based importance. We conduct extensive experiments on a total of seven datasets, demonstrating its superior performance and efficiency both quantitatively and qualitatively over the leading baselines.

Submitted 16 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: 19 pages

Journal ref: ICML 2024

Showing 1–50 of 945 results for author: Lu, S