
Research on the Tender Leaf Identification and Mechanically Perceptible Plucking Finger for High-quality Green Tea

Authors: Wei Zhang, Yong Chen, Qianqian Wang, Jun Chen

Abstract: BACKGROUND: Intelligent identification and precise plucking are the keys to intelligent tea harvesting robots, which are of increasing significance nowadays. Aiming at plucking tender leaves for high-quality green tea production, this paper proposes a tender leaf identification algorithm and a mechanically perceptible plucking finger. RESULTS: Based on a segmentation algorithm and color features, the tender leaf identification algorithm shows an average identification accuracy of over 92.8%. The mechanically perceptible plucking finger plucks tender leaves the way a human hand does, so as to maintain the high quality of tea products. Through finite element analysis, we determine the ideal size of the grippers and the location of strain gauge attachment on a gripper to enable feedback control of the desired gripping force. Our experiments show that the success rate of tender leaf plucking reaches 92.5%, demonstrating the effectiveness of our design. CONCLUSION: The results show that the tender leaf identification algorithm and the mechanically perceptible plucking finger are effective for tender leaf identification and plucking, providing a foundation for the development of an intelligent tender leaf plucking robot.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.04967 [pdf, other]

MatterSim: A Deep Learning Atomistic Model Across Elements, Temperatures and Pressures

Authors: Han Yang, Chenxi Hu, Yichi Zhou, Xixian Liu, Yu Shi, Jielan Li, Guanzhi Li, Zekun Chen, Shuizhou Chen, Claudio Zeni, Matthew Horton, Robert Pinsler, Andrew Fowler, Daniel Zügner, Tian Xie, Jake Smith, Lixin Sun, Qian Wang, Lingyu Kong, Chang Liu, Hongxia Hao, Ziheng Lu

Abstract: Accurate and fast prediction of materials properties is central to the digital transformation of materials design. However, the vast design space and diverse operating conditions pose significant challenges for accurately modeling arbitrary material candidates and forecasting their properties. We present MatterSim, a deep learning model actively learned from large-scale first-principles computations, for efficient atomistic simulations at first-principles level and accurate prediction of broad material properties across the periodic table, spanning temperatures from 0 to 5000 K and pressures up to 1000 GPa. Out-of-the-box, the model serves as a machine learning force field, and shows remarkable capabilities not only in predicting ground-state material structures and energetics, but also in simulating their behavior under realistic temperatures and pressures, signifying an up to ten-fold enhancement in precision compared to the prior best-in-class. This enables MatterSim to compute materials' lattice dynamics, mechanical and thermodynamic properties, and beyond, to an accuracy comparable with first-principles methods. Specifically, MatterSim predicts Gibbs free energies for a wide range of inorganic solids with near-first-principles accuracy and achieves a 15 meV/atom resolution for temperatures up to 1000 K compared with experiments. This opens an opportunity to predict experimental phase diagrams of materials at minimal computational cost. Moreover, MatterSim also serves as a platform for continuous learning and customization by integrating domain-specific data. The model can be fine-tuned for atomistic simulations at a desired level of theory or for direct structure-to-property predictions, achieving high data efficiency with a reduction in data requirements by up to 97%.

Submitted 10 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.04596 [pdf, other]

Cross-Platform Autonomous Control of Minimal Kitaev Chains

Authors: David van Driel, Rouven Koch, Vincent P. M. Sietses, Sebastiaan L. D. ten Haaf, Chun-Xiao Liu, Francesco Zatelli, Bart Roovers, Alberto Bordin, Nick van Loo, Guanzhong Wang, Jan Cornelis Wolff, Grzegorz P. Mazur, Tom Dvir, Ivan Kulesh, Qingzhen Wang, A. Mert Bozkurt, Sasa Gazibegovic, Ghada Badawy, Erik P. A. M. Bakkers, Michael Wimmer, Srijit Goswami, Jose L. Lado, Leo P. Kouwenhoven, Eliska Greplova

Abstract: Contemporary quantum devices are reaching new limits in size and complexity, allowing for the experimental exploration of emergent quantum modes. However, this increased complexity introduces significant challenges in device tuning and control. Here, we demonstrate autonomous tuning of emergent Majorana zero modes in a minimal realization of a Kitaev chain. We achieve this task using cross-platform transfer learning. First, we train a tuning model on a theory model. Next, we retrain it using a Kitaev chain realization in a two-dimensional electron gas. Finally, we apply this model to tune a Kitaev chain realized in quantum dots coupled through a semiconductor-superconductor section in a one-dimensional nanowire. Utilizing a convolutional neural network, we predict the tunneling and Cooper pair splitting rates from differential conductance measurements, employing these predictions to adjust the electrochemical potential to a Majorana sweet spot. The algorithm successfully converges to the immediate vicinity of a sweet spot (within 1.5 mV in 67.6% of attempts and within 4.5 mV in 80.9% of cases), typically finding a sweet spot in 45 minutes or less. This advancement is a stepping stone towards autonomous tuning of emergent modes in interacting systems, and towards foundational tuning machine learning models that can be deployed across a range of experimental platforms.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.03747 [pdf, other]

ATClean: A Novel Method for Detecting Low-Luminosity Transients and Application to Pre-explosion Counterparts from SN 2023ixf

Authors: S. Rest, A. Rest, C. D. Kilpatrick, J. E. Jencson, S. von Coelln, L. Strolger, S. Smartt, J. P. Anderson, A. Clocchiatti, D. A. Coulter, L. Denneau, S. Gomez, A. Heinze, R. Ridden-Harper, K. W. Smith, B. Stalder, J. L. Tonry, Q. Wang, Y. Zenati

Abstract: In an effort to search for faint sources of emission over arbitrary timescales, we present a novel method for analyzing forced photometry light curves in difference imaging from optical surveys. Our method, "ATLAS Clean" or ATClean, utilizes the reported fluxes, uncertainties, and fits to the point-spread function from difference images to quantify the statistical significance of individual measurements. We apply this method to control light curves across the image to determine whether any source of flux is present in the data for a range of specific timescales. From ATLAS $o$-band imaging at the site of the Type II supernova (SN) 2023ixf in M101 from 2015-2023, we show that this method accurately reproduces the 3$σ$ flux limits produced from other, more computationally expensive methods. We derive limits for emission on timescales of 5 days and 80-300 days at the site of SN 2023ixf, which are 19.8 and 21.3 mag, respectively. The latter limits rule out variability for unextinguished red supergiants (RSG) with initial masses $>$22 $M_{\odot}$, comparable to the most luminous predictions for the SN 2023ixf progenitor system. We also compare our limits to short timescale outbursts, similar to those expected for Type IIn SN progenitor stars or the Type II SN 2020tlf, and rule out outburst ejecta masses of $>$0.021 $M_{\odot}$, much lower than the inferred mass of circumstellar matter around SN 2023ixf in the literature. In the future, these methods can be applied to any forced point-spread function photometry on difference imaging from other surveys, such as Rubin optical imaging.

Submitted 6 May, 2024; originally announced May 2024.

Comments: 24 pages, 19 figures

arXiv:2405.03673 [pdf, other]

MemoryMamba: Memory-Augmented State Space Model for Defect Recognition

Authors: Qianning Wang, He Hu, Yucheng Zhou

Abstract: As automation advances in manufacturing, the demand for precise and sophisticated defect detection technologies grows. Existing vision models for defect recognition are insufficient for handling the complexities and variations of defects in contemporary manufacturing settings. These models especially struggle in scenarios involving limited or imbalanced defect data. In this work, we introduce MemoryMamba, a novel memory-augmented state space model (SSM), designed to overcome the limitations of existing defect recognition models. MemoryMamba integrates the state space model with a memory augmentation mechanism, enabling the system to maintain and retrieve essential defect-specific information in training. Its architecture is designed to capture dependencies and intricate defect characteristics, which are crucial for effective defect detection. In the experiments, MemoryMamba was evaluated across four industrial datasets with diverse defect types and complexities. The model consistently outperformed other methods, demonstrating its capability to adapt to various defect recognition scenarios.

Submitted 6 May, 2024; originally announced May 2024.

Comments: 15 pages, 7 figures

arXiv:2405.03160 [pdf, ps, other]

Moore Determinant of Dual Quaternion Hermitian Matrices

Authors: Chunfeng Cui, Liqun Qi, Guangjing Song, Qingwen Wang

Abstract: In this paper, we extend the Chen and Moore determinants of quaternion Hermitian matrices to dual quaternion Hermitian matrices. We show that the Chen determinant of dual quaternion Hermitian matrices is invariant under addition, switching, multiplication, and unitary operations on both sides. We then show that the Chen and Moore determinants of dual quaternion Hermitian matrices are equal to each other, and that they are also equal to the products of eigenvalues. The characteristic polynomial of a dual quaternion Hermitian matrix is also studied.

Submitted 18 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.03003 [pdf, other]

Parameter-Efficient Fine-Tuning with Discrete Fourier Transform

Authors: Ziqi Gao, Qichao Wang, Aochuan Chen, Zijing Liu, Bingzhe Wu, Liang Chen, Jia Li

Abstract: Low-rank adaptation (LoRA) has recently gained much interest in fine-tuning foundation models. It effectively reduces the number of trainable parameters by incorporating low-rank matrices $A$ and $B$ to represent the weight change, i.e., $ΔW=BA$. Despite LoRA's progress, it faces storage challenges when handling extensive customization adaptations or larger base models. In this work, we aim to further compress trainable parameters by exploiting the powerful expressiveness of the Fourier transform. Specifically, we introduce FourierFT, which treats $ΔW$ as a matrix in the spatial domain and learns only a small fraction of its spectral coefficients. With the trained spectral coefficients, we implement the inverse discrete Fourier transform to recover $ΔW$. Empirically, our FourierFT method shows comparable or better performance with fewer parameters than LoRA on various tasks, including natural language understanding, natural language generation, instruction tuning, and image classification. For example, when performing instruction tuning on the LLaMA2-7B model, FourierFT surpasses LoRA with only 0.064M trainable parameters, compared to LoRA's 33.5M. Our code is released at https://github.com/Chaos96/fourierft.

Submitted 5 May, 2024; originally announced May 2024.

Comments: Accepted by ICML 2024

arXiv:2405.02474 [pdf, other]

Nonlinear magnetic sensing with hybrid nitrogen-vacancy/magnon systems

Authors: Zhongqiang Hu, Zhiping He, Qiuyuan Wang, Chung-Tao Chou, Justin T. Hou, Luqiao Liu

Abstract: Magnetic sensing beyond the linear regime could broaden the frequency range of detectable magnetic fields, which is crucial to various microwave and quantum applications. Recently, nonlinear interactions in diamond nitrogen-vacancy (NV) centers, one of the most extensively studied quantum magnetic sensors, have been proposed to realize magnetic sensing across arbitrary frequencies. In this work, we enhance these capabilities by exploiting the nonlinear spin dynamics in hybrid systems of NV centers and ferri- or ferro-magnetic (FM) thin films. We study the frequency mixing effect in the hybrid NV/magnon systems, and demonstrate that the introduction of FM not only amplifies the intensity of nonlinear resonance signals that are intrinsic to NV spins, but also enables novel frequency mixings through parametric pumping and nonlinear magnon scattering effects. The discovery and understanding of the magnetic nonlinearities in hybrid NV/magnon systems position them as a prime candidate for magnetic sensing with a broad frequency range and high tunability, particularly meaningful for nanoscale, dynamical, and non-invasive materials characterization.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2405.02359 [pdf, other]

doi 10.1007/978-3-031-43412-9_11

CVTGAD: Simplified Transformer with Cross-View Attention for Unsupervised Graph-level Anomaly Detection

Authors: Jindong Li, Qianli Xing, Qi Wang, Yi Chang

Abstract: Unsupervised graph-level anomaly detection (UGAD) has achieved remarkable performance in various critical disciplines, such as chemistry analysis and bioinformatics. Existing UGAD paradigms often adopt data augmentation techniques to construct multiple views, and then employ different strategies to obtain representations from different views for jointly conducting UGAD. However, most previous works only considered the relationship between nodes/graphs from a limited receptive field, resulting in some key structure patterns and feature information being neglected. In addition, most existing methods consider different views separately in a parallel manner, which cannot directly explore the inter-relationship across different views. Thus, a method with a larger receptive field that can explore the inter-relationship across different views directly is needed. In this paper, we propose a novel Simplified Transformer with Cross-View Attention for Unsupervised Graph-level Anomaly Detection, namely, CVTGAD. To increase the receptive field, we construct a simplified transformer-based module, exploiting the relationship between nodes/graphs from both intra-graph and inter-graph perspectives. Furthermore, we design a cross-view attention mechanism to directly exploit the view co-occurrence between different views, bridging the inter-view gap at node level and graph level. To the best of our knowledge, this is the first work to apply transformer and cross attention to UGAD, which realizes graph neural network and transformer working collaboratively. Extensive experiments on 15 real-world datasets of 3 fields demonstrate the superiority of CVTGAD on the UGAD task. The code is available at https://github.com/jindongli-Ai/CVTGAD.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2405.00673 [pdf, other]

Quantum algorithms for matrix geometric means

Authors: Nana Liu, Qisheng Wang, Mark M. Wilde, Zhicheng Zhang

Abstract: Matrix geometric means between two positive definite matrices can be defined equivalently from distinct perspectives - as solutions to certain nonlinear systems of equations, as points along geodesics in Riemannian geometry, and as solutions to certain optimisation problems. This diversity already suggests the potential for varied applications, as well as acting as a bridge between different domains. Here we devise new quantum subroutines to efficiently prepare quantum unitary operators that embed the standard matrix geometric mean and its generalisations called the weighted matrix geometric mean. This enables the construction of solutions to the algebraic Riccati equation, which is an important class of nonlinear systems of equations that appears in machine learning, optimal control, estimation, and filtering. Using these subroutines, we present a new class of quantum learning algorithms called quantum geometric mean metric learning. This has applications in efficiently finding the best distance measure and solving classification problems in the weakly supervised limit and for anomaly detection, for both classical and quantum problems. We also show how our method can be generalised to a particular p^th-order system of nonlinear equations. These quantum subroutines for matrix geometric means are also useful in other areas of quantum information. For example, we show how to use them in the estimation of geometric Renyi relative entropies and the Uhlmann fidelity by means of the Fuchs-Caves observable. In particular, our quantum algorithms for estimating the Uhlmann and Matsumoto fidelities have optimal dependence on the precision. Finally, we provide a BQP-complete problem based on matrix geometric means that can be solved by our subroutines, thus characterising their computational capability.

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2405.00188 [pdf, other]

A Revisit of the Optimal Excess-of-Loss Contract

Authors: Ernest Aboagye, Vali Asimit, Tsz Chai Fung, Liang Peng, Qiuqi Wang

Abstract: It is well known that Excess-of-Loss reinsurance has more marketability than Stop-Loss reinsurance, though Stop-Loss reinsurance is the most prominent setting discussed in the optimal (re)insurance design literature. We point out that the optimal reinsurance policy under Stop-Loss leads to a zero insolvency probability, which motivates our paper. We provide a remedy to this peculiar property of the optimal Stop-Loss reinsurance contract by investigating the optimal Excess-of-Loss reinsurance contract instead. We also provide estimators for the optimal Excess-of-Loss and Stop-Loss contracts and investigate their statistical properties under many premium principle assumptions and various risk preferences, which, to our knowledge, have never been investigated in the literature. Simulated data and real-life data are used to illustrate our main theoretical findings.

Submitted 30 April, 2024; originally announced May 2024.

arXiv:2404.19437 [pdf, other]

Quintom cosmology and modified gravity after DESI 2024

Authors: Yuhang Yang, Xin Ren, Qingqing Wang, Zhiyu Lu, Dongdong Zhang, Yi-Fu Cai, Emmanuel N. Saridakis

Abstract: We reconstruct the cosmological background evolution under the scenario of dynamical dark energy through the Gaussian process approach, using the latest Dark Energy Spectroscopic Instrument (DESI) baryon acoustic oscillations (BAO) \cite{DESI:2024mwx} combined with other observations. Our results reveal that the reconstructed dark-energy equation-of-state (EoS) parameter $w(z)$ exhibits the so-called quintom-B behavior, crossing $-1$ from the phantom to the quintessence regime as the universe expands. We investigate under what conditions this type of evolution could be achieved from the perspectives of field theories and modified gravity. In particular, we reconstruct the corresponding actions for $f(R)$, $f(T)$, and $f(Q)$ gravity, respectively. We explicitly show that certain modified gravity theories can exhibit the quintom dynamics and fit the recent DESI data efficiently, and that for all cases the quadratic deviation from the $Λ$CDM scenario is mildly favored.

Submitted 30 April, 2024; originally announced April 2024.

Comments: 10 pages, 3 figures

arXiv:2404.18598 [pdf, other]

Anywhere: A Multi-Agent Framework for Reliable and Diverse Foreground-Conditioned Image Inpainting

Authors: Tianyidan Xie, Rui Ma, Qian Wang, Xiaoqian Ye, Feixuan Liu, Ying Tai, Zhenyu Zhang, Zili Yi

Abstract: Recent advancements in image inpainting, particularly through diffusion modeling, have yielded promising outcomes. However, when tested in scenarios involving the completion of images based on the foreground objects, current methods that aim to inpaint an image in an end-to-end manner encounter challenges such as "over-imagination", inconsistency between foreground and background, and limited diversity. In response, we introduce Anywhere, a pioneering multi-agent framework designed to address these issues. Anywhere utilizes a sophisticated pipeline framework comprising various agents such as Visual Language Model (VLM), Large Language Model (LLM), and image generation models. This framework consists of three principal components: the prompt generation module, the image generation module, and the outcome analyzer. The prompt generation module conducts a semantic analysis of the input foreground image, leveraging VLM to predict relevant language descriptions and LLM to recommend optimal language prompts. In the image generation module, we employ a text-guided canny-to-image generation model to create a template image based on the edge map of the foreground image and language prompts, and an image refiner to produce the outcome by blending the input foreground and the template image. The outcome analyzer employs VLM to evaluate image content rationality, aesthetic score, and foreground-background relevance, triggering prompt and image regeneration as needed. Extensive experiments demonstrate that our Anywhere framework excels in foreground-conditioned image inpainting, mitigating "over-imagination", resolving foreground-background discrepancies, and enhancing diversity. It successfully elevates foreground-conditioned image inpainting to produce more reliable and diverse results.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 16 pages, 9 figures, project page: https://anywheremultiagent.github.io

arXiv:2404.18547 [pdf, other]

Light tetraquark states with exotic quantum numbers $J^{PC}=2^{+-}$

Authors: Qi-Nan Wang, Ding-Kun Lian, Wei Chen

Abstract: We study the masses of light tetraquark states $ud\bar{u}\bar{d}$, $us\bar{u}\bar{s}$ and $ss\bar{s}\bar{s}$ with exotic quantum numbers $J^{PC}=2^{+-}$ using the method of QCD sum rules. It is found that there is no tetraquark operator with two Lorentz indices coupling to the $2^{+-}$ quantum numbers. To investigate such tetraquark states, we construct the interpolating tetraquark currents with three Lorentz indices and without derivative operators. We calculate the correlation functions up to dimension 10 condensates, and extract the $2^{+-}$ invariant functions via the projector operator. Our results show that the masses of the $ud\bar{u}\bar{d}$, $us\bar{u}\bar{s}$ and $ss\bar{s}\bar{s}$ tetraquark states with $J^{PC}=2^{+-}$ are about $3.3-3.5 ~\mathrm{GeV}$, $3.5-3.6 ~\mathrm{GeV}$ and $3.6 ~\mathrm{GeV}$, respectively. We further discuss the strong decays of these light tetraquarks into the two-meson and baryon-antibaryon final states, and suggest searching for them in the $ρπ, ωπ, φπ$, $b_{1}π$, $h_{1}π$, $K\bar K^\ast, K\bar{K}_{1}$, $Δ\barΔ$, $Σ^{\ast} \bar{Σ}^{\ast}$, $Ξ^{\ast} \bar{Ξ}^{\ast}$, $Ω\bar{Ω}$ channels in the future.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 12 pages, 7 figures

arXiv:2404.18319 [pdf, other]

User Welfare Optimization in Recommender Systems with Competing Content Creators

Authors: Fan Yao, Yiming Liao, Mingzhe Wu, Chuanhao Li, Yan Zhu, James Yang, Qifan Wang, Haifeng Xu, Hongning Wang

Abstract: Driven by the new economic opportunities created by the creator economy, an increasing number of content creators rely on and compete for revenue generated from online content recommendation platforms. This burgeoning competition reshapes the dynamics of content distribution and profoundly impacts long-term user welfare on the platform. However, the absence of a comprehensive picture of global user preference distribution often traps the competition, especially the creators, in states that yield sub-optimal user welfare. To encourage creators to best serve a broad user population with relevant content, it becomes the platform's responsibility to leverage its information advantage regarding user preference distribution to accurately signal creators. In this study, we perform system-side user welfare optimization under a competitive game setting among content creators. We propose an algorithmic solution for the platform, which dynamically computes a sequence of weights for each user based on their satisfaction with the recommended content. These weights are then utilized to design mechanisms that adjust the recommendation policy or the post-recommendation rewards, thereby influencing creators' content production strategies. To validate the effectiveness of our proposed method, we report our findings from a series of experiments, including: 1. a proof-of-concept negative example illustrating how creators' strategies converge towards sub-optimal states without platform intervention; 2. offline experiments employing our proposed intervention mechanisms on diverse datasets; and 3. results from a three-week online experiment conducted on a leading short-video recommendation platform.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.17871 [pdf, other]

A Survey of Deep Learning Library Testing Methods

Authors: Xiaoyu Zhang, Weipeng Jiang, Chao Shen, Qi Li, Qian Wang, Chenhao Lin, Xiaohong Guan

Abstract: In recent years, software systems powered by deep learning (DL) techniques have significantly facilitated people's lives in many aspects. As the backbone of these DL systems, various DL libraries undertake the underlying optimization and computation. However, like traditional software, DL libraries are not immune to bugs, which can pose serious threats to users' personal property and safety. Studying the characteristics of DL libraries, their associated bugs, and the corresponding testing methods is crucial for enhancing the security of DL systems and advancing the widespread application of DL technology. This paper provides an overview of the testing research related to various DL libraries, discusses the strengths and weaknesses of existing methods, and provides guidance and reference for the application of the DL library. This paper first introduces the workflow of DL underlying libraries and the characteristics of three kinds of DL libraries involved, namely DL framework, DL compiler, and DL hardware library. It then provides definitions for DL underlying library bugs and testing. Additionally, this paper summarizes the existing testing methods and tools tailored to these DL libraries separately and analyzes their effectiveness and limitations. It also discusses the existing challenges of DL library testing and outlines potential directions for future research.

Submitted 27 April, 2024; originally announced April 2024.

Comments: 34 pages, 8 figures, 4 tables

arXiv:2404.17795 [pdf, other]

Discovery of Giant Unit-Cell Super-Structure in the Infinite-Layer Nickelate PrNiO$_2$

Authors: J. Oppliger, J. Küspert, A. -C. Dippel, M. v. Zimmermann, O. Gutowski, X. Ren, X. J. Zhou, Z. Zhu, R. Frison, Q. Wang, L. Martinelli, I. Biało, J. Chang

Abstract: Spectacular quantum phenomena such as superconductivity often emerge in flat-band systems where Coulomb interactions overpower electron kinetics. Engineering strategies for flat-band physics is therefore of great importance. Here, using high-energy grazing-incidence x-ray diffraction, we demonstrate how in-situ temperature annealing of the infinite-layer nickelate PrNiO$_2$ induces a giant superlattice structure. The annealing effect has a maximum well above room temperature. By covering a large scattering volume, we show a rare period-six in-plane (bi-axial) symmetry and a period-four symmetry in the out-of-plane direction. This giant unit-cell superstructure likely stems from ordering of diffusive oxygen. The stability of this superlattice structure suggests a connection to an energetically favorable electronic state of matter. As such, our study provides a new pathway - different from Moiré structures - to ultra-small Brillouin zone electronics.

Submitted 27 April, 2024; originally announced April 2024.

Comments: Main: 7 pages, 4 figures. Supplementary: 2 pages, 3 figures

arXiv:2404.16687 [pdf, other]

NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Content (AIGC). The challenge is divided into the image track and the video track. The image track uses the AIGIQA-20K, which contains 20,000 AI-Generated Images (AIGIs) generated by 15 popular generative models. The image track has a total of 318 registered participants. A total of 1,646 submissions are received in the development phase, and 221 submissions are received in the test phase. Finally, 16 participating teams submitted their models and fact sheets. The video track uses the T2VQA-DB, which contains 10,000 AI-Generated Videos (AIGVs) generated by 9 popular Text-to-Video (T2V) models. A total of 196 participants have registered in the video track. A total of 991 submissions are received in the development phase, and 185 submissions are received in the test phase. Finally, 12 participating teams submitted their models and fact sheets. Some methods have achieved better results than baseline methods, and the winning methods in both tracks have demonstrated superior prediction performance on AIGC.

Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.16484 [pdf, other]

Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF codec, instead of JPEG. All the proposed methods improve PSNR fidelity over Lanczos interpolation, and process images under 10ms. Out of the 160 participants, 25 teams submitted their code and models. The solutions present novel designs tailored for memory-efficiency and runtime on edge devices. This survey describes the best solutions for real-time SR of compressed high-resolution images.

Submitted 25 April, 2024; originally announced April 2024.

Comments: CVPR 2024, AI for Streaming (AIS) Workshop

arXiv:2404.15943 [pdf, other]

Decentralized Personalized Federated Learning based on a Conditional Sparse-to-Sparser Scheme

Authors: Qianyu Long, Qiyuan Wang, Christos Anagnostopoulos, Daning Bi

Abstract: Decentralized Federated Learning (DFL) has become popular due to its robustness and avoidance of centralized coordination. In this paradigm, clients actively engage in training by exchanging models with their networked neighbors. However, DFL introduces increased costs in terms of training and communication. Existing methods focus on minimizing communication, often overlooking training efficiency and data heterogeneity. To address this gap, we propose a novel \textit{sparse-to-sparser} training scheme: DA-DPFL. DA-DPFL initializes with a subset of model parameters, which progressively reduces during training via \textit{dynamic aggregation} and leads to substantial energy savings while retaining adequate information during critical learning periods. Our experiments showcase that DA-DPFL substantially outperforms DFL baselines in test accuracy, while achieving up to $5$ times reduction in energy costs. We provide a theoretical analysis of DA-DPFL's convergence by solidifying its applicability in decentralized and personalized learning. The code is available at: https://github.com/EricLoong/da-dpfl

Submitted 25 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

Comments: 15 pages, 9 figures, 3 pages theory

arXiv:2404.15874 [pdf, other]

Addendum to "Power-law decay of the fraction of the mixed eigenstates in kicked top model with mixed-type classical phase space"

Authors: Hua Yan, Qian Wang, Marko Robnik

Abstract: By using the Krylov subspace technique to generate the spin coherent states in the kicked top model, a prototype model for studying quantum chaos, the accessible system size for studying the Husimi functions of eigenstates can be much larger than that reported in the literature and our previous study Phys. Rev. E 108, 054217 (2023) [arXiv:2308.04824]. In the fully chaotic kicked top, we find that the mean Wehrl entropy localization measure approaches the prediction given by the Circular Unitary Ensemble. In the mixed-type case, we identify mixed eigenstates by the overlap of the Husimi function with regular and chaotic regions in classical compact phase space. Numerically, we show that the fraction of mixed eigenstates scales as $j^{-ζ}$, a power-law decay as the system size $j$ increases, across nearly two orders of magnitude. This provides supporting evidence for the principle of uniform semiclassical condensation of Husimi functions and the Berry-Robnik picture in the semiclassical limit.

Submitted 24 April, 2024; originally announced April 2024.
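
A minimal sketch, not taken from the paper, of how a power-law exponent such as the $ζ$ in $j^{-ζ}$ mentioned in the abstract above could be estimated: a straight-line least-squares fit in log-log space. The data below are synthetic and purely illustrative, and the function name `fit_power_law` and the values `true_zeta` and `true_c` are assumptions made for this example.

```python
import math


def fit_power_law(sizes, fractions):
    """Fit fraction = c * size**(-zeta) by least squares on the logs.

    Returns (zeta, c). Hypothetical helper for illustration only.
    """
    xs = [math.log(j) for j in sizes]
    ys = [math.log(f) for f in fractions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope and intercept of log(f) versus log(j).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The slope of the log-log line is -zeta; the intercept is log(c).
    return -slope, math.exp(intercept)


# Synthetic data: an exact power law with an assumed exponent and prefactor.
sizes = [100, 200, 400, 800, 1600, 3200, 6400]
true_zeta, true_c = 0.5, 2.0
fractions = [true_c * j ** (-true_zeta) for j in sizes]

zeta, c = fit_power_law(sizes, fractions)
print(f"estimated zeta = {zeta:.3f}, prefactor = {c:.3f}")
```

With real, noisy data the fitted exponent would carry an uncertainty that this sketch does not estimate.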

arXiv:2404.15690 [pdf, other]

Neural Proto-Language Reconstruction

Authors: Chenxuan Cui, Ying Chen, Qinxin Wang, David R. Mortensen

Abstract: Proto-form reconstruction has been a painstaking process for linguists. Recently, computational models such as RNNs and Transformers have been proposed to automate this process. We take three different approaches to improve upon previous methods, including data augmentation to recover missing reflexes, adding a VAE structure to the Transformer model for proto-to-language prediction, and using a neural machine translation model for the reconstruction task. We find that with the additional VAE structure, the Transformer model performs better on the WikiHan dataset, and the data augmentation step stabilizes the training.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.15677 [pdf, other]

CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models

Authors: Qinghe Wang, Baolu Li, Xiaomin Li, Bing Cao, Liqian Ma, Huchuan Lu, Xu Jia

Abstract: Recent advances in text-to-image models have opened new frontiers in human-centric generation. However, these models cannot be directly employed to generate images with consistent newly coined identities. In this work, we propose CharacterFactory, a framework that allows sampling new characters with consistent identities in the latent space of GANs for diffusion models. More specifically, we consider the word embeddings of celeb names as ground truths for the identity-consistent generation task and train a GAN model to learn the mapping from a latent space to the celeb embedding space. In addition, we design a context-consistent loss to ensure that the generated identity embeddings can produce identity-consistent images in various contexts. Remarkably, the whole model only takes 10 minutes for training, and can sample infinite characters end-to-end during inference. Extensive experiments demonstrate excellent performance of the proposed CharacterFactory on character creation in terms of identity consistency and editability. Furthermore, the generated characters can be seamlessly combined with the off-the-shelf image/video/3D diffusion models. We believe that the proposed CharacterFactory is an important step for identity-consistent character generation. Project page is available at: https://qinghew.github.io/CharacterFactory/.

Submitted 27 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

Comments: Code will be released very soon: https://github.com/qinghew/CharacterFactory

arXiv:2404.15595 [pdf, other]

Variational Deep Survival Machines: Survival Regression with Censored Outcomes

Authors: Qinxin Wang, Jiayuan Huang, Junhui Li, Jiaming Liu

Abstract: Survival regression aims to predict the time when an event of interest will take place, typically a death or a failure. A fully parametric method [18] is proposed to estimate the survival function as a mixture of individual parametric distributions in the presence of censoring. In this paper, we present a novel method to predict the survival time by better clustering the survival data and combining primitive distributions. We propose two variants of variational auto-encoder (VAE), discrete and continuous, to generate the latent variables for clustering input covariates. The model is trained end to end by jointly optimizing the VAE loss and regression loss. Thorough experiments on the SUPPORT and FLCHAIN datasets show that our method can effectively improve the clustering result and reach competitive scores with previous methods. We demonstrate the superior result of our model prediction in the long-term. Our code is available at https://github.com/qinzzz/auton-survival-785.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.15580 [pdf, other]

MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis

Authors: Jiaxin Zhuang, Linshan Wu, Qiong Wang, Varut Vardhanabhuti, Lin Luo, Hao Chen

Abstract: The Vision Transformer (ViT) has demonstrated remarkable performance in Self-Supervised Learning (SSL) for 3D medical image analysis. Mask AutoEncoder (MAE) for feature pre-training can further unleash the potential of ViT on various medical vision tasks. However, due to large spatial sizes with much higher dimensions of 3D medical images, the lack of hierarchical design for MAE may hinder the performance of downstream tasks. In this paper, we propose a novel \textit{Mask in Mask (MiM)} pre-training framework for 3D medical images, which aims to advance MAE by learning discriminative representation from hierarchical visual tokens across varying scales. We introduce multiple levels of granularity for masked inputs from the volume, which are then reconstructed simultaneously at both fine and coarse levels. Additionally, a cross-level alignment mechanism is applied to adjacent level volumes to enforce anatomical similarity hierarchically. Furthermore, we adopt a hybrid backbone to enhance the hierarchical representation learning efficiently during the pre-training. MiM was pre-trained on a large scale of available 3D volumetric images, \textit{i.e.,} Computed Tomography (CT) images containing various body parts. Extensive experiments on thirteen public datasets demonstrate the superiority of MiM over other SSL methods in organ/lesion/tumor segmentation and disease classification. We further scale up the MiM to large pre-training datasets with more than 10k volumes, showing that large-scale pre-training can further enhance the performance of downstream tasks. The improvement also suggests that the research community should pay more attention to the scale of the pre-training dataset towards the healthcare foundation model for 3D medical images.

Submitted 23 April, 2024; originally announced April 2024.

Comments: submitted to journal

arXiv:2404.14512 [pdf, other]

Distinguishing homolytic versus heterolytic bond dissociation of phenyl sulfonium cations with localized active space methods

Authors: Qiaohong Wang, Valay Agarawal, Matthew R. Hermes, Mario Motta, Julia E. Rice, Gavin O. Jones, Laura Gagliardi

Abstract: Modeling chemical reactions with quantum chemical methods is challenging when the electronic structure varies significantly throughout the reaction, as well as when electronic excited states are involved. Multireference methods such as complete active space self-consistent field (CASSCF) can handle these multiconfigurational situations. However, even if the size of the needed active space is affordable, in many cases the active space does not change consistently from reactant to product, causing discontinuities in the potential energy surface. The localized active space SCF (LASSCF) is a cheaper alternative to CASSCF for strongly correlated systems with weakly correlated fragments. The method is used for the first time to study a chemical reaction, namely the bond dissociation of a mono-, di-, and triphenylsulfonium cation. LASSCF calculations generate smooth potential energy scans more easily than the corresponding, more computationally expensive, CASSCF calculations, while predicting similar bond dissociation energies. Our calculations suggest a homolytic bond cleavage for di- and triphenylsulfonium, and a heterolytic pathway for monophenylsulfonium.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.14274 [pdf, ps, other]

A Locally Divergence-Free Oscillation-Eliminating Discontinuous Galerkin Method for Ideal Magnetohydrodynamic Equations

Authors: Wei Zeng, Qian Wang

Abstract: Numerical simulations of ideal compressible magnetohydrodynamic (MHD) equations are challenging, as the solutions are required to be magnetic divergence-free for general cases as well as oscillation-free for cases involving discontinuities. To overcome these difficulties, we develop a locally divergence-free oscillation-eliminating discontinuous Galerkin (LDF-OEDG) method for ideal compressible MHD equations. In the LDF-OEDG method, the numerical solution is advanced in time by using a strong stability preserving Runge-Kutta scheme. Following the solution update in each Runge-Kutta stage, an oscillation-eliminating (OE) procedure is performed to suppress spurious oscillations near discontinuities by damping the modal coefficients of the numerical solution. Subsequently, on each element, the magnetic field of the oscillation-free DG solution is projected onto a local divergence-free space, to satisfy the divergence-free condition. The OE procedure and the LDF projection are fully decoupled from the Runge-Kutta stage update, and can be non-intrusively integrated into existing DG codes as independent modules. The damping equation of the OE procedure can be solved exactly, so the LDF-OEDG method remains stable under normal CFL conditions. These features enable a straightforward implementation of a high-order LDF-OEDG solver, which can be used to efficiently simulate the ideal compressible MHD equations. Numerical results for benchmark cases demonstrate the high-order accuracy, strong shock capturing capability and robustness of the LDF-OEDG method.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.14162 [pdf, other]

FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on

Authors: Chenhui Wang, Tao Chen, Zhihao Chen, Zhizhong Huang, Taoran Jiang, Qi Wang, Hongming Shan

Abstract: Despite their impressive generative performance, latent diffusion model-based virtual try-on (VTON) methods lack faithfulness to crucial details of the clothes, such as style, pattern, and text. To alleviate these issues caused by the diffusion stochastic nature and latent supervision, we propose a novel Faithful Latent Diffusion Model for VTON, termed FLDM-VTON. FLDM-VTON improves the conventional latent diffusion process in three major aspects. First, we propose incorporating warped clothes as both the starting point and local condition, supplying the model with faithful clothes priors. Second, we introduce a novel clothes flattening network to constrain generated try-on images, providing clothes-consistent faithful supervision. Third, we devise a clothes-posterior sampling for faithful inference, further enhancing the model performance over conventional clothes-agnostic Gaussian sampling. Extensive experimental results on the benchmark VITON-HD and Dress Code datasets demonstrate that our FLDM-VTON outperforms state-of-the-art baselines and is able to generate photo-realistic try-on images with faithful clothing details.

Submitted 19 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: Accepted by IJCAI 2024

arXiv:2404.13947 [pdf, other]

Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering

Authors: Dongze Hao, Qunbo Wang, Longteng Guo, Jie Jiang, Jing Liu

Abstract: While large pre-trained visual-language models have shown promising results on traditional visual question answering benchmarks, it is still challenging for them to answer complex VQA problems that require diverse world knowledge. Motivated by the research of retrieval-augmented generation in the field of natural language processing, we use Dense Passage Retrieval (DPR) to retrieve related knowledge to help the model answer questions. However, DPR conducts retrieval in natural language space, which may not ensure comprehensive acquisition of image information. Thus, the retrieved knowledge is not truly conducive to helping answer the question, affecting the performance of the overall system. To address this issue, we propose a novel framework that leverages the visual-language model to select the key knowledge retrieved by DPR and answer questions. The framework consists of two modules: Selector and Answerer, where both are initialized by the MLLM and parameter-efficiently finetuned by self-bootstrapping: find key knowledge in the retrieved knowledge documents using the Selector, and then use them to finetune the Answerer to predict answers; obtain the pseudo-labels of key knowledge documents based on the predictions of the Answerer and weak supervision labels, and then finetune the Selector to select key knowledge; repeat. Our framework significantly enhances the performance of the baseline on the challenging open-domain Knowledge-based VQA benchmark, OK-VQA, achieving a state-of-the-art accuracy of 62.83\%.

Submitted 16 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.13840 [pdf, other]

Study of $e^+e^-\toωX(3872)$ and $γX(3872)$ from 4.66 to 4.95 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (634 additional authors not shown)

Abstract: Using data samples with an integrated luminosity of $4.5~\text{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies ranging from 4.66 to 4.95 GeV, we study the processes of $e^+e^-\toωX(3872)$ and $e^+e^-\toγX(3872)$. With the $e^+e^-\toωX(3872)$ process, the branching fraction ratio $R\equiv\frac{\mathcal{B}(X(3872)\toγJ/ψ)}{\mathcal{B}(X(3872)\toπ^+π^- J/ψ)}$ is measured to be $0.38\pm0.20_\text{stat.}\pm0.01_\text{syst.}$ ($R< 0.83$ at 90\% confidence level). In addition, we measure the ratio of the average cross section of $e^+e^-\toωX(3872)$ to $e^+e^-\toωχ_{c1}(ωχ_{c2})$ to be $σ_{ωX(3872)}/σ_{ωχ_{c1}}~(σ_{ωX(3872)}/σ_{ωχ_{c2}})=5.2\pm1.0_\text{stat.}\pm1.9_\text{syst.}~ (5.5\pm1.1_\text{stat.}\pm2.4_\text{syst.})$. Finally, we search for the process of $e^+e^-\toγX(3872)$, and no obvious signal is observed. The upper limit on the ratio of the average cross section of $e^+e^-\toγX(3872)$ to $e^+e^-\toωX(3872)$ is set as $σ_{γX(3872)}/σ_{ωX(3872)}<0.23$ at 90\% confidence level.

Submitted 21 April, 2024; originally announced April 2024.

Comments: 19 pages, 10 figures

arXiv:2404.13396 [pdf]

Angle-Resolved Magneto-Chiral Anisotropy in a Non-Centrosymmetric Atomic Layer Superlattice

Authors: Long Cheng, Mingrui Bao, Jingxian Zhang, Xue Zhang, Qun Yang, Qiang Li, Hui Cao, Dawei Qiu, Jia Liu, Fei Ye, Qing Wang, Genhao Liang, Hui Li, Guanglei Cheng, Hua Zhou, Jian-Min Zuo, Xiaodong Zhou, Jian Shen, Zhifeng Zhu, Sai Mu, Wenbo Wang, Xiaofang Zhai

Abstract: Chirality in solid-state materials has sparked significant interest due to potential applications of topologically-protected chiral states in next-generation information technology. The electrical magneto-chiral effect (eMChE), arising from relativistic spin-orbit interactions, shows great promise for developing chiral materials and devices for electronic integration. Here we demonstrate an angle-resolved eMChE in an A-B-C-C type atomic-layer superlattice lacking time and space inversion symmetry. We observe non-superimposable enantiomers of left-handed and right-handed tilted uniaxial magnetic anisotropy as the sample rotates under static fields, with the tilting angle reaching a striking 45 degrees. Magnetic force microscopy and atomistic simulations correlate the tilt to the emergence and evolution of chiral spin textures. The Dzyaloshinskii-Moriya interaction lock effect in competition with Zeeman effect is demonstrated to be responsible for the angle-resolved eMChE. Our findings open up a new horizon for engineering angle-resolved magneto-chiral anisotropy, shedding light on the development of novel angle-resolved sensing or writing techniques in chiral spintronics.

Submitted 20 April, 2024; originally announced April 2024.

arXiv:2404.12927 [pdf, other]

The Localized Active Space Method with Unitary Selective Coupled Cluster

Authors: Abhishek Mitra, Ruhee D'Cunha, Qiaohong Wang, Matthew R. Hermes, Yuri Alexeev, Stephen K. Gray, Matthew Otten, Laura Gagliardi

Abstract: We introduce a hybrid quantum-classical algorithm, the localized active space unitary selective coupled cluster singles and doubles (LAS-USCCSD) method. Derived from the localized active space unitary coupled cluster (LAS-UCCSD) method, LAS-USCCSD first performs a classical LASSCF calculation, then selectively identifies the most important parameters (cluster amplitudes used to build the multireference UCC ansatz) for restoring inter-fragment interaction energy using this reduced set of parameters with the variational quantum eigensolver method. We benchmark LAS-USCCSD against LAS-UCCSD by calculating the total energies of $(\mathrm{H}_2)_2$, $(\mathrm{H}_2)_4$ and \textit{trans}-butadiene, and the magnetic coupling constant for a bimetallic compound [Cr$_2$(OH)$_3$(NH$_3$)$_6$]$^{3+}$. For these systems, we find that LAS-USCCSD reduces the number of required parameters and thus the circuit depth by at least one order of magnitude, an aspect which is important for the practical implementation of multireference hybrid quantum-classical algorithms like LAS-UCCSD on near-term quantum computers.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.12587 [pdf, other]

Reinforcement Learning Approach for Integrating Compressed Contexts into Knowledge Graphs

Authors: Ngoc Quach, Qi Wang, Zijun Gao, Qifeng Sun, Bo Guan, Lillian Floyd

Abstract: The widespread use of knowledge graphs in various fields has brought about a challenge in effectively integrating and updating information within them. When it comes to incorporating contexts, conventional methods often rely on rules or basic machine learning models, which may not fully grasp the complexity and fluidity of context information. This research suggests an approach based on reinforcement learning (RL), specifically utilizing Deep Q Networks (DQN) to enhance the process of integrating contexts into knowledge graphs. By considering the state of the knowledge graph as environment states, defining actions as operations for integrating contexts, and using a reward function to gauge the improvement in knowledge graph quality post-integration, this method aims to automatically develop strategies for optimal context integration. Our DQN model utilizes networks as function approximators, continually updating Q values to estimate the action value function, thus enabling effective integration of intricate and dynamic context information. Initial experimental findings show that our RL method outperforms techniques in achieving precise context integration across various standard knowledge graph datasets, highlighting the potential and effectiveness of reinforcement learning in enhancing and managing knowledge graphs.

Submitted 18 April, 2024; originally announced April 2024.

Comments: This paper has been accepted by the 2024 International Conference on Machine Learning and Neural Networks (MLNN 2024)

arXiv:2404.12180 [pdf, other]

Microwave seeding time crystal in Floquet driven Rydberg atoms

Authors: Bang Liu, Li-Hua Zhang, Yu Ma, Tian-Yu Han, Qi-Feng Wang, Jun Zhang, Zheng-Yuan Zhang, Shi-Yao Shao, Qing Li, Han-Chao Chen, Ya-Jun Wang, Jia-Dou Nan, Yi-Ming Yin, Dong-Sheng Ding, Bao-Sen Shi

Abstract: Crystal seeding enables a deeper understanding of phase behavior, leading to the development of methods for controlling and manipulating phase transitions in various applications such as materials synthesis, crystallization processes, and phase transformation engineering. How to seed crystalline order in the time domain is an open question, which is of great significance and may provide an avenue to understand and control time-dependent quantum many-body physics. Here, we utilize a microwave pulse as a seed to induce the formation of a discrete time crystal in Floquet driven Rydberg atoms. In the experiment, the periodic driving on Rydberg states acts as a seeded crystalline order in subspace, which triggers the time-translation symmetry breaking across the entire ensemble. The behavior of the emergent time crystal is elaborately linked to alterations in the seed, such as the relative phase shift and the frequency difference, which result in phase dependent seeding and corresponding shift in periodicity of the time crystal, leading to embryonic synchronization. This result opens up new possibilities for studying and harnessing time-dependent quantum many-body phenomena, offering insights into the behavior of complex many-body systems under seeding.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.12022 [pdf, other]

Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration

Authors: Pengfei Wu, Jiahao Liu, Zhuocheng Gong, Qifan Wang, Jinpeng Li, Jingang Wang, Xunliang Cai, Dongyan Zhao

Abstract: Large language models (LLMs) have recently shown remarkable performance across a wide range of tasks. However, the substantial number of parameters in LLMs contributes to significant latency during model inference. This is particularly evident when utilizing autoregressive decoding methods, which generate one token in a single forward process, thereby not fully capitalizing on the parallel computing capabilities of GPUs. In this paper, we propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass. The idea is to transfer the intermediate hidden states of the previous context to the pseudo hidden states of the future tokens to be generated; the pseudo hidden states then pass through the following transformer layers, thereby assimilating more semantic information and achieving superior predictive accuracy of the future tokens. Besides, we use the novel tree attention mechanism to simultaneously generate and verify multiple candidates of output sequences, which ensures lossless generation and further improves the generation efficiency of our method. Experiments demonstrate the effectiveness of our method. We conduct a lot of analytic experiments to prove our motivation. In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.11613 [pdf, other]

InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior

Authors: Zhiheng Liu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Jie Xiao, Kai Zhu, Nan Xue, Yu Liu, Yujun Shen, Yang Cao

Abstract: 3D Gaussians have recently emerged as an efficient representation for novel view synthesis. This work studies its editability with a particular focus on the inpainting task, which aims to supplement an incomplete set of 3D Gaussians with additional points for visually harmonious rendering. Compared to 2D inpainting, the crux of inpainting 3D Gaussians is to figure out the rendering-relevant properties of the introduced points, whose optimization largely benefits from their initial 3D positions. To this end, we propose to guide the point initialization with an image-conditioned depth completion model, which learns to directly restore the depth map based on the observed image. Such a design allows our model to fill in depth values at an aligned scale with the original depth, and also to harness strong generalizability from large-scale diffusion prior. Thanks to the more accurate depth completion, our approach, dubbed InFusion, surpasses existing alternatives with sufficiently better fidelity and efficiency under various complex scenarios. We further demonstrate the effectiveness of InFusion with several practical applications, such as inpainting with user-specific texture or with novel object insertion.

Submitted 17 April, 2024; originally announced April 2024.

Comments: Project page: https://johanan528.github.io/Infusion

arXiv:2404.11415 [pdf, other]

Achromatic Full Stokes Polarimetry Metasurface for Full-color Polarization Imaging in the Visible

Authors: Yueqiang Hu, Yi Zhang, Yuting Jiang, Quan Wang, Meiyan Pan, Huigao Duan

Abstract: Metasurfaces composed of anisotropic subwavelength structures provide an ultrathin platform for a compact, real-time polarimeter. However, applications in polychromatic scenes are restricted by the limited operating bandwidths and degraded imaging quality due to the loss of spectral information. Here, we demonstrated full-color polarization imaging based on an achromatic polarimeter consisting of four polarization-dependent metalenses. Boosted by an intelligent design scheme, arbitrary phase compensation and multi-objective matching are effectively compatible with a limited database. Broadband achromaticity for wavelengths ranging from 450 nm to 650 nm, with a relative bandwidth of nearly 0.435, is achieved for the full Stokes imaging. The experimental polarization reconstructed errors for operating wavelengths of 450 nm, 550 nm, and 650 nm are 7.5%, 5.9%, and 3.8%, respectively. The full-color and full-polarization imaging capability of the device is also verified with a customized object. The proposed scheme paves the way for further developing polarization imaging toward practical applications.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.11112 [pdf, other]

An Adaptive Regularized Proximal Newton-Type Methods for Composite Optimization over the Stiefel Manifold

Authors: Qinsi Wang, Wei Hong Yang

Abstract: Recently, the proximal Newton-type method and its variants have been generalized to solve composite optimization problems over the Stiefel manifold whose objective function is the summation of a smooth function and a nonsmooth function. In this paper, we propose an adaptive quadratically regularized proximal quasi-Newton method, named ARPQN, to solve this class of problems. Under some mild assumptions, the global convergence, the local linear convergence rate and the iteration complexity of ARPQN are established. Numerical experiments and comparisons with other state-of-the-art methods indicate that ARPQN is very promising. We also propose an adaptive quadratically regularized proximal Newton method, named ARPN. It is shown that the ARPN method has a local superlinear convergence rate under certain reasonable assumptions, which demonstrates attractive convergence properties of regularized proximal Newton methods.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 38 pages, 6 figures

arXiv:2404.10343 [pdf, other]

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum-up of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. They gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.

Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

arXiv:2404.09793 [pdf, other]

First Search for Light Fermionic Dark Matter Absorption on Electrons Using Germanium Detector in CDEX-10 Experiment

Authors: J. X. Liu, L. T. Yang, Q. Yue, K. J. Kang, Y. J. Li, H. P. An, Greeshma C., J. P. Chang, Y. H. Chen, J. P. Cheng, W. H. Dai, Z. Deng, C. H. Fang, X. P. Geng, H. Gong, Q. J. Guo, T. Guo, X. Y. Guo, L. He, J. R. He, J. W. Hu, H. X. Huang, T. C. Huang, L. Jiang, S. Karmakar , et al. (61 additional authors not shown)

Abstract: We present the first results of the search for sub-MeV fermionic dark matter absorbed by electron targets of Germanium using the 205.4~kg$\cdot$day data collected by the CDEX-10 experiment, with the analysis threshold of 160~eVee. No significant dark matter (DM) signals over the background are observed. Results are presented as limits on the cross section of DM--electron interaction. We present new constraints of cross section in the DM range of 0.1--10 keV/$c^2$ for vector and axial-vector interaction. The upper limit on the cross section is set to be $\rm 5.5\times10^{-46}~cm^2$ for vector interaction, and $\rm 1.8\times10^{-46}~cm^2$ for axial-vector interaction at DM mass of 5 keV/$c^2$.

Submitted 15 April, 2024; originally announced April 2024.

Comments: 6 pages, 4 figures

arXiv:2404.09757 [pdf, other]

Ultra-Wide Dual-band Rydberg Atomic Receiver Based on Space Division Multiplexing RF-Chip Modules

Authors: Li-Hua Zhang, Bang Liu, Zong-Kai Liu, Zheng-Yuan Zhang, Shi-Yao Shao, Qi-Feng Wang, Yu Ma, Tian-Yu Han, Guang-Can Guo, Dong-Sheng Ding, Bao-Sen Shi

Abstract: Detecting microwave signals over a wide frequency range has numerous advantages as it enables simultaneous transmission of a large amount of information and access to more spectrum resources. This capability is crucial for applications such as microwave communication, remote sensing, and radar. However, conventional microwave receiving systems are limited by amplifiers and band-pass filters that can only operate efficiently in a specific frequency range. Typically, these systems can only process signals within a three-fold frequency range, which limits the data transfer bandwidth of the microwave communication systems. Developing novel atom-integrated microwave sensors, for example, radio frequency (RF)-chip coupled Rydberg atomic receiver, provides opportunities for a large working bandwidth of microwave sensing at the atomic level. Here, an ultra-wide dual-band RF sensing scheme is demonstrated by space-division multiplexing two RF-chip-integrated atomic receiver modules. The system can simultaneously receive dual-band microwave signals that span a frequency range exceeding 6 octaves (300 MHz and 24 GHz). This work paves the way for multi-band microwave reception applications within an ultra-wide range by RF-chip-integrated Rydberg atomic sensor.

Submitted 16 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

Comments: 11 pages, 5 figures

arXiv:2404.09619 [pdf, other]

UNIAA: A Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark

Authors: Zhaokun Zhou, Qiulin Wang, Bin Lin, Yiwei Su, Rui Chen, Xin Tao, Amin Zheng, Li Yuan, Pengfei Wan, Di Zhang

Abstract: As an alternative to expensive expert evaluation, Image Aesthetic Assessment (IAA) stands out as a crucial task in computer vision. However, traditional IAA methods are typically constrained to a single data source or task, restricting the universality and broader application. In this work, to better align with human aesthetics, we propose a Unified Multi-modal Image Aesthetic Assessment (UNIAA) framework, including a Multi-modal Large Language Model (MLLM) named UNIAA-LLaVA and a comprehensive benchmark named UNIAA-Bench. We choose MLLMs with both visual perception and language ability for IAA and establish a low-cost paradigm for transforming the existing datasets into unified and high-quality visual instruction tuning data, from which the UNIAA-LLaVA is trained. To further evaluate the IAA capability of MLLMs, we construct the UNIAA-Bench, which consists of three aesthetic levels: Perception, Description, and Assessment. Extensive experiments validate the effectiveness and rationality of UNIAA. UNIAA-LLaVA achieves competitive performance on all levels of UNIAA-Bench, compared with existing MLLMs. Specifically, our model performs better than GPT-4V in aesthetic perception and even approaches the junior-level human. We find MLLMs have great potential in IAA, yet there remains plenty of room for further improvement. The UNIAA-LLaVA and UNIAA-Bench will be released.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.09540 [pdf, other]

Text-Driven Diverse Facial Texture Generation via Progressive Latent-Space Refinement

Authors: Chi Wang, Junming Huang, Rong Zhang, Qi Wang, Haotian Yang, Haibin Huang, Chongyang Ma, Weiwei Xu

Abstract: Automatic 3D facial texture generation has gained significant interest recently. Existing approaches may not support the traditional physically based rendering pipeline or rely on 3D data captured by Light Stage. Our key contribution is a progressive latent space refinement approach that can bootstrap from 3D Morphable Models (3DMMs)-based texture maps generated from facial images to generate high-quality and diverse PBR textures, including albedo, normal, and roughness. It starts with enhancing Generative Adversarial Networks (GANs) for text-guided and diverse texture generation. To this end, we design a self-supervised paradigm to overcome the reliance on ground truth 3D textures and train the generative model with only entangled texture maps. Besides, we foster mutual enhancement between GANs and Score Distillation Sampling (SDS). SDS boosts GANs with more generative modes, while GANs promote more efficient optimization of SDS. Furthermore, we introduce an edge-aware SDS for multi-view consistent facial structure. Experiments demonstrate that our method outperforms existing 3D texture generation methods regarding photo-realistic quality, diversity, and efficiency.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.09422 [pdf, other]

FEASTS Combined with Interferometry (I): Overall Properties of Diffuse HI and Implications for Gas Accretion in Nearby Galaxies

Authors: Jing Wang, Xuchen Lin, Dong Yang, Lister Staveley-Smith, Fabian Walter, Q. Daniel Wang, Ran Wang, A. J. Battisti, Barbara Catinella, Hsiao-Wen Chen, Luca Cortese, D. B. Fisher, Luis C. Ho, Suoqing Ji, Peng Jiang, Guinevere Kauffmann, Xu Kong, Ziming Liu, Li Shao, Jie Wang, Lile Wang, Shun Wang

Abstract: We present a statistical study of the properties of diffuse HI in ten nearby galaxies, comparing the HI detected by the single-dish telescope FAST (FEASTS program) and the interferometer VLA (THINGS program), respectively. The THINGS observation missed HI with a median of 23% due to the short-spacing problem of interferometry and limited sensitivity. We extract the diffuse HI by subtracting the dense HI, which is obtained from the THINGS data with a uniform flux-density threshold, from the total HI detected by FAST. Among the sample, the median diffuse-HI fraction is 34%, and more diffuse HI is found in galaxies exhibiting more prominent tidal-interaction signatures. The diffuse HI we detected seems to be distributed in disk-like layers within a typical thickness of $1\,\text{kpc}$, different from the more halo-like diffuse HI detected around NGC 4631 in a previous study. Most of the diffuse HI is cospatial with the dense HI and has a typical column density of $10^{17.7}$-$10^{20.1}\,\text{cm}^{-2}$. The diffuse and dense HI exhibit a similar rotational motion, but the former lags by a median of 25% in at least the inner disks, and its velocity dispersions are typically twice as high. Based on a simplified estimation of circum-galactic medium properties and assuming pressure equilibrium, the volume density of diffuse HI appears to be constant within each individual galaxy, implying its role as a cooling interface. Comparing with existing models, these results are consistent with a possible link between tidal interactions, the formation of diffuse HI, and gas accretion.