-
Magnon squeezing via reservoir-engineered optomagnomechanics
Authors:
Zhi-Yuan Fan,
Huai-Bing Zhu,
Hao-Tian Li,
Jie Li
Abstract:
We show how to prepare magnonic squeezed states in an optomagnomechanical system, in which magnetostriction-induced mechanical displacement couples to an optical cavity via radiation pressure. We discuss two scenarios depending on whether the magnomechanical coupling is linear or dispersive. We show that in both cases the strong mechanical squeezing obtained via two-tone driving of the optical cavity can be efficiently transferred to the magnon mode. In the linear coupling case, stationary magnon squeezing is achieved, while in the dispersive coupling case, a transient magnonic squeezed state is prepared in a two-step protocol. The proposed magnonic squeezed states find promising applications in quantum information processing and quantum sensing using magnons.
Submitted 11 July, 2024;
originally announced July 2024.
-
Expressive Gaussian Human Avatars from Monocular RGB Video
Authors:
Hezhen Hu,
Zhiwen Fan,
Tianhao Wu,
Yihan Xi,
Seoyoung Lee,
Georgios Pavlakos,
Zhangyang Wang
Abstract:
Nuanced expressiveness, particularly through fine-grained hand and facial expressions, is pivotal for enhancing the realism and vitality of digital human representations. In this work, we focus on investigating the expressiveness of human avatars when learned from monocular RGB video, a setting that introduces new challenges in capturing and animating fine-grained details. To this end, we introduce EVA, a drivable human model that meticulously sculpts fine details based on 3D Gaussians and SMPL-X, an expressive parametric human model. Focused on enhancing expressiveness, our work makes three key contributions. First, we highlight the critical importance of aligning the SMPL-X model with RGB frames for effective avatar learning. Recognizing the limitations of current SMPL-X prediction methods for in-the-wild videos, we introduce a plug-and-play module that significantly ameliorates misalignment issues. Second, we propose a context-aware adaptive density control strategy, which adaptively adjusts the gradient thresholds to accommodate the varied granularity across body parts. Last but not least, we develop a feedback mechanism that predicts per-pixel confidence to better guide the learning of 3D Gaussians. Extensive experiments on two benchmarks demonstrate the superiority of our framework both quantitatively and qualitatively, especially on the fine-grained hand and facial details. See the project website at https://evahuman.github.io
Submitted 3 July, 2024;
originally announced July 2024.
-
Multi-Epoch learning with Data Augmentation for Deep Click-Through Rate Prediction
Authors:
Zhongxiang Fan,
Zhaocheng Liu,
Jian Liang,
Dongying Kong,
Han Li,
Peng Jiang,
Shuang Li,
Kun Gai
Abstract:
This paper investigates the one-epoch overfitting phenomenon in Click-Through Rate (CTR) models, where performance notably declines at the start of the second epoch. Despite extensive research, the efficacy of multi-epoch training over the conventional one-epoch approach remains unclear. We identify the overfitting of the embedding layer, caused by high-dimensional data sparsity, as the primary issue. To address this, we introduce a novel and simple Multi-Epoch learning with Data Augmentation (MEDA) framework, suitable for both non-continual and continual learning scenarios, which can be seamlessly integrated into existing deep CTR models and may have potential applications to handle the "forgetting or overfitting" dilemma in the retraining and the well-known catastrophic forgetting problems. MEDA minimizes overfitting by reducing the dependency of the embedding layer on subsequent training data or the Multi-Layer Perceptron (MLP) layers, and achieves data augmentation through training the MLP with varied embedding spaces. Our findings confirm that pre-trained MLP layers can adapt to new embedding spaces, enhancing performance without overfitting. This adaptability underscores the MLP layers' role in learning a matching function focused on the relative relationships among embeddings rather than their absolute positions. To our knowledge, MEDA represents the first multi-epoch training strategy tailored for deep CTR prediction models. We conduct extensive experiments on several public and business datasets, and the effectiveness of data augmentation and superiority over conventional single-epoch training are fully demonstrated. Besides, MEDA has exhibited significant benefits in a real-world online advertising system.
Submitted 27 June, 2024;
originally announced July 2024.
-
GaussianStego: A Generalizable Stenography Pipeline for Generative 3D Gaussians Splatting
Authors:
Chenxin Li,
Hengyu Liu,
Zhiwen Fan,
Wuyang Li,
Yifan Liu,
Panwang Pan,
Yixuan Yuan
Abstract:
Recent advancements in large generative models and real-time neural rendering using point-based techniques pave the way for a future of widespread visual data distribution through sharing synthesized 3D assets. However, while standardized methods for embedding proprietary or copyright information, either overtly or subtly, exist for conventional visual content such as images and videos, this issue remains unexplored for emerging generative 3D formats like Gaussian Splatting. We present GaussianStego, a method for embedding steganographic information in the rendering of generated 3D assets. Our approach employs an optimization framework that enables the accurate extraction of hidden information from images rendered using Gaussian assets derived from large models, while maintaining their original visual quality. We conduct preliminary evaluations of our method across several potential deployment scenarios and discuss issues identified through analysis. GaussianStego represents an initial exploration into the novel challenge of embedding customizable, imperceptible, and recoverable information within the renders produced by current 3D generative models, while ensuring minimal impact on the rendered content's quality.
Submitted 1 July, 2024;
originally announced July 2024.
-
A response to commenter Ke Lan's comment on our paper published in Nature Communications (2023)14:5782 by J. Yan et al
Authors:
Ji Yan,
Jiwei Li,
X. T. He,
Lifeng Wang,
Yaohua Chen,
Feng Wang,
Xiaoying Han,
Kaiqiang Pan,
Juxi Liang,
Yulong Li,
Zanyang Guan,
Xiangming Liu,
Xingsen Che,
Zhongjing Chen,
Xing Zhang,
Yan Xu,
Bin Li,
Minging He,
Hongbo Cai,
Liang. Hao,
Zhanjun Liu,
Chunyang Zheng,
Zhensheng Dai,
Zhengfeng Fan,
Bin Qiao
, et al. (4 additional authors not shown)
Abstract:
A response to commenter Ke Lan's comment on our paper published in Nature Communications (2023)14:5782 by J. Yan et al
Submitted 25 June, 2024;
originally announced June 2024.
-
MLPHand: Real Time Multi-View 3D Hand Mesh Reconstruction via MLP Modeling
Authors:
Jian Yang,
Jiakun Li,
Guoming Li,
Zhen Shen,
Huai-Yu Wu,
Zhaoxin Fan,
Heng Huang
Abstract:
Multi-view hand mesh reconstruction is a critical task for applications in virtual reality and human-computer interaction, but it remains a formidable challenge. Although existing multi-view hand reconstruction methods achieve remarkable accuracy, they typically come with an intensive computational burden that hinders real-time inference. To this end, we propose MLPHand, a novel method designed for real-time multi-view single-hand reconstruction. MLPHand consists of two primary modules: (1) a lightweight MLP-based Skeleton2Mesh model that efficiently recovers hand meshes from hand skeletons, and (2) a multi-view geometry feature fusion prediction module that enhances the Skeleton2Mesh model with detailed geometric information from multiple views. Experiments on three widely used datasets demonstrate that MLPHand can reduce computational complexity by 90% while achieving comparable reconstruction accuracy to existing state-of-the-art baselines.
Submitted 23 June, 2024;
originally announced June 2024.
-
Trustworthy Enhanced Multi-view Multi-modal Alzheimer's Disease Prediction with Brain-wide Imaging Transcriptomics Data
Authors:
Shan Cong,
Zhoujie Fan,
Hongwei Liu,
Yinghan Zhang,
Xin Wang,
Haoran Luo,
Xiaohui Yao
Abstract:
Brain transcriptomics provides insights into the molecular mechanisms by which the brain coordinates its functions and processes. However, existing multimodal methods for predicting Alzheimer's disease (AD) primarily rely on imaging and sometimes genetic data, often neglecting the transcriptomic basis of the brain. Furthermore, while striving to integrate complementary information between modalities, most studies overlook the informativeness disparities between modalities. Here, we propose TMM, a trusted multiview multimodal graph attention framework for AD diagnosis, using extensive brain-wide transcriptomics and imaging data. First, we construct view-specific brain regional co-function networks (RRIs) from transcriptomics and multimodal radiomics data to incorporate interaction information from both biomolecular and imaging perspectives. Next, we apply graph attention (GAT) processing to each RRI network to produce graph embeddings and employ cross-modal attention to fuse the transcriptomics-derived embedding with each imaging-derived embedding. Finally, a novel true-false-harmonized class probability (TFCP) strategy is designed to assess and adaptively adjust the prediction confidence of each modality for AD diagnosis. We evaluate TMM using the AHBA database with brain-wide transcriptomics data and the ADNI database with three imaging modalities (AV45-PET, FDG-PET, and VBM-MRI). The results demonstrate the superiority of our method in identifying AD, EMCI, and LMCI compared to state-of-the-art methods. Code and data are available at https://github.com/Yaolab-fantastic/TMM.
Submitted 21 June, 2024;
originally announced June 2024.
-
From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking
Authors:
Siyuan Wang,
Zhuohan Long,
Zhihao Fan,
Zhongyu Wei
Abstract:
The rapid development of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has exposed vulnerabilities to various adversarial attacks. This paper provides a comprehensive overview of jailbreaking research targeting both LLMs and MLLMs, highlighting recent advancements in evaluation benchmarks, attack techniques and defense strategies. Compared to the more advanced state of unimodal jailbreaking, the multimodal domain remains underexplored. We summarize the limitations and potential research directions of multimodal jailbreaking, aiming to inspire future research and further enhance the robustness and security of MLLMs.
Submitted 21 June, 2024;
originally announced June 2024.
-
Communication with Quantum Catalysts
Authors:
Yuqi Li,
Junjing Xing,
Dengke Qu,
Lei Xiao,
Zhaobing Fan,
Zhu-Jun Zheng,
Haitao Ma,
Peng Xue,
Kishor Bharti,
Dax Enshan Koh,
Yunlong Xiao
Abstract:
Communication is essential for advancing science and technology. Quantum communication, in particular, benefits from the use of catalysts. During the communication process, these catalysts enhance performance while remaining unchanged. Although chemical catalysts that undergo deactivation typically perform worse than those that remain unaffected, quantum catalysts, referred to as embezzling catalysts, can surprisingly outperform their non-deactivating counterparts despite experiencing slight alterations. In this work, we employ embezzling quantum catalysts to enhance the transmission of both quantum and classical information. Our results reveal that using embezzling catalysts augments the efficiency of information transmission across noisy quantum channels, ensuring a non-zero catalytic channel capacity. Furthermore, we introduce catalytic superdense coding, demonstrating how embezzling catalysts can enhance the transmission of classical information. Finally, we explore methods to reduce the dimensionality of catalysts, a step toward making quantum catalysis a practical reality.
Submitted 20 June, 2024;
originally announced June 2024.
-
Teleportation with Embezzling Catalysts
Authors:
Junjing Xing,
Yuqi Li,
Dengke Qu,
Lei Xiao,
Zhaobing Fan,
Haitao Ma,
Peng Xue,
Kishor Bharti,
Dax Enshan Koh,
Yunlong Xiao
Abstract:
Quantum teleportation is the process of transferring quantum information using classical communication and pre-shared entanglement. This process can benefit from the use of catalysts, which are ancillary entangled states that can enhance teleportation without being consumed. While chemical catalysts undergoing deactivation invariably exhibit inferior performance compared to those unaffected by deactivation, quantum catalysts, termed embezzling catalysts, that are subject to deactivation, may surprisingly outperform their non-deactivating counterparts. In this work, we present teleportation protocols with embezzling catalysts that can achieve arbitrarily high fidelity, meaning that the teleported state can be made arbitrarily close to the original state, with finite-dimensional embezzling catalysts. We show that some embezzling catalysts are universal, meaning that they can improve the teleportation fidelity for any pre-shared entanglement. We also explore methods to reduce the dimension of catalysts without increasing catalyst consumption, an essential step towards realizing quantum catalysis in practice.
Submitted 20 June, 2024;
originally announced June 2024.
-
4K4DGen: Panoramic 4D Generation at 4K Resolution
Authors:
Renjie Li,
Panwang Pan,
Bangbang Yang,
Dejia Xu,
Shijie Zhou,
Xuanyang Zhang,
Zeming Li,
Achuta Kadambi,
Zhangyang Wang,
Zhiwen Fan
Abstract:
The blooming of virtual reality and augmented reality (VR/AR) technologies has driven an increasing demand for the creation of high-quality, immersive, and dynamic environments. However, existing generative techniques either focus solely on dynamic objects or perform outpainting from a single perspective image, failing to meet the needs of VR/AR applications. In this work, we tackle the challenging task of elevating a single panorama to an immersive 4D experience. For the first time, we demonstrate the capability to generate omnidirectional dynamic scenes with 360-degree views at 4K resolution, thereby providing an immersive user experience. Our method introduces a pipeline that facilitates natural scene animations and optimizes a set of 4D Gaussians using efficient splatting techniques for real-time exploration. To overcome the lack of scene-scale annotated 4D data and models, especially in panoramic formats, we propose a novel Panoramic Denoiser that adapts generic 2D diffusion priors to animate consistently in 360-degree images, transforming them into panoramic videos with dynamic scenes at targeted regions. Subsequently, we elevate the panoramic video into a 4D immersive environment while preserving spatial and temporal consistency. By transferring prior knowledge from 2D models in the perspective domain to the panoramic domain and the 4D lifting with spatial appearance and geometry regularization, we achieve high-quality Panorama-to-4D generation at a resolution of (4096 $\times$ 2048) for the first time. See the project website at https://4k4dgen.github.io.
Submitted 4 July, 2024; v1 submitted 19 June, 2024;
originally announced June 2024.
-
HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors
Authors:
Panwang Pan,
Zhuo Su,
Chenguo Lin,
Zhen Fan,
Yongjie Zhang,
Zeming Li,
Tingting Shen,
Yadong Mu,
Yebin Liu
Abstract:
Despite recent advancements in high-fidelity human reconstruction techniques, the requirements for densely captured images or time-consuming per-instance optimization significantly hinder their applications in broader scenarios. To tackle these issues, we present HumanSplat which predicts the 3D Gaussian Splatting properties of any human from a single input image in a generalizable manner. In particular, HumanSplat comprises a 2D multi-view diffusion model and a latent reconstruction transformer with human structure priors that adeptly integrate geometric priors and semantic features within a unified framework. A hierarchical loss that incorporates human semantic information is further designed to achieve high-fidelity texture modeling and better constrain the estimated multiple views. Comprehensive experiments on standard benchmarks and in-the-wild images demonstrate that HumanSplat surpasses existing state-of-the-art methods in achieving photorealistic novel-view synthesis.
Submitted 18 June, 2024;
originally announced June 2024.
-
On the Metastability of Quantum Fields in Thermal Bath
Authors:
Zhiyi Fan,
Takeo Moroi
Abstract:
We investigate the metastability of scalar fields in quantum field theories at finite temperature, focusing on a detailed understanding of the bounce solution. At finite temperature, the bounce solution depends on two variables: the Euclidean time $τ$ and the spatial radial distance $r$, and it is periodic in the $τ$ direction. We propose a novel method to determine the bounce that describes transitions in a thermal bath, suitable for numerical calculations. Two types of bounces exist for transitions in the thermal bath: $τ$-dependent and $τ$-independent bounces. We apply our method to compute these bounces in several models, including both thin-wall and thick-wall scenarios, to examine their properties. Specifically, we evaluate the critical temperature below which the $τ$-independent bounce becomes destabilized due to fluctuations, rendering it irrelevant. We demonstrate that in the thick-wall case, the $τ$-dependent bounce smoothly transitions into the $τ$-independent one as temperature increases, whereas in the thin-wall case, the transition between the two types of bounces is discontinuous.
Submitted 24 June, 2024; v1 submitted 16 June, 2024;
originally announced June 2024.
-
Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses
Authors:
Zhiwen Fan,
Pu Wang,
Yang Zhao,
Yibo Zhao,
Boris Ivanovic,
Zhangyang Wang,
Marco Pavone,
Hao Frank Yang
Abstract:
The increasing rate of road accidents worldwide not only results in significant loss of life but also imposes billions in financial burdens on societies. Current research in traffic crash frequency modeling and analysis has predominantly approached the problem as classification tasks, focusing mainly on learning-based classification or ensemble learning methods. These approaches often overlook the intricate relationships among the complex infrastructure, environmental, human and contextual factors related to traffic crashes and risky situations. In contrast, we first propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports and incorporating infrastructure data, environmental and traffic textual and visual information in Washington State. Leveraging this rich dataset, we formulate crash event feature learning as a novel text reasoning problem and fine-tune various large language models (LLMs) to predict detailed accident outcomes, such as crash types, severity and number of injuries, based on contextual and environmental factors. The proposed model, CrashLLM, distinguishes itself from existing solutions by leveraging the inherent text reasoning capabilities of LLMs to parse and learn from complex, unstructured data, thereby enabling a more nuanced analysis of contributing factors. Our experimental results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes, all with the averaged F1 score boosted from 34.9% to 53.8%. Furthermore, CrashLLM can provide valuable insights for numerous open-world what-if situational-awareness traffic safety analyses with learned reasoning features, which existing models cannot offer. We make our benchmark, datasets, and model publicly available for further exploration.
Submitted 15 June, 2024;
originally announced June 2024.
-
A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing
Authors:
Ming Meng,
Yufei Zhao,
Bo Zhang,
Yonggui Zhu,
Weimin Shi,
Maxwell Wen,
Zhaoxin Fan
Abstract:
Talking head synthesis, an advanced method for generating portrait videos from a still image driven by specific content, has garnered widespread attention in virtual reality, augmented reality and game production. Recently, significant breakthroughs have been made with the introduction of novel models such as the transformer and the diffusion model. Current methods can not only generate new content but also edit the generated material. This survey systematically reviews the technology, categorizing it into three pivotal domains: portrait generation, driven mechanisms, and editing techniques. We summarize milestone studies and critically analyze their innovations and shortcomings within each domain. Additionally, we organize an extensive collection of datasets and provide a thorough performance analysis of current methodologies based on various evaluation metrics, aiming to furnish a clear framework and robust data support for future research. Finally, we explore application scenarios of talking head synthesis, illustrate them with specific cases, and examine potential future directions.
Submitted 18 June, 2024; v1 submitted 15 June, 2024;
originally announced June 2024.
-
Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
In this work we search for signals generated by ultra-heavy dark matter in data from the Large High Altitude Air Shower Observatory (LHAASO). We look for possible gamma rays from dark matter annihilation or decay in 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter, since they have low fluxes of astrophysical $γ$-ray background and a large amount of dark matter. In an analysis of more than 700 days of observational data from LHAASO, no significant dark matter signal from 1 TeV to 1 EeV is detected. Accordingly, we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. The constraints on the lifetime of dark matter in decay mode are also derived.
Submitted 12 June, 2024;
originally announced June 2024.
-
DeTriever: Decoder-representation-based Retriever for Improving NL2SQL In-Context Learning
Authors:
Yuxi Feng,
Raymond Li,
Zhenan Fan,
Giuseppe Carenini,
Mohammadreza Pourreza,
Weiwei Zhang,
Yong Zhang
Abstract:
While in-context learning (ICL) has proven to be an effective technique to improve the performance of Large Language Models (LLMs) in a variety of complex tasks, notably in translating natural language questions into Structured Query Language (NL2SQL), the question of how to select the most beneficial demonstration examples remains an open research problem. Although prior works often adapted off-the-shelf encoders to retrieve examples dynamically, an inherent discrepancy exists in the representational capacities between the external retrievers and the LLMs. Further, optimizing the selection of examples is a non-trivial task, since there are no straightforward methods to assess the relative benefits of examples without performing pairwise inference. To address these shortcomings, we propose DeTriever, a novel demonstration retrieval framework that learns a weighted combination of LLM hidden states, where rich semantic information is encoded. To train the model, we propose a proxy score that estimates the relative benefits of examples based on the similarities between output queries. Experiments on two popular NL2SQL benchmarks demonstrate that our method significantly outperforms the state-of-the-art baselines on one-shot NL2SQL tasks.
Submitted 12 June, 2024;
originally announced June 2024.
-
Output-sensitive Conjunctive Query Evaluation
Authors:
Shaleen Deep,
Hangdong Zhao,
Austen Z. Fan,
Paraschos Koutris
Abstract:
Join evaluation is one of the most fundamental operations performed by database systems and arguably the most well-studied problem in the Database community. A staggering number of join algorithms have been developed, and commercial database engines use finely tuned join heuristics that take into account many factors including the selectivity of predicates, memory, IO, etc. However, most of the results have catered to either full join queries or non-full join queries but with degree constraints (such as PK-FK relationships) that make joins easier to evaluate. Further, most of the algorithms are also not output-sensitive.
In this paper, we present a novel, output-sensitive algorithm for the evaluation of acyclic Conjunctive Queries (CQs) that contain arbitrary free variables. Our result is based on a novel generalization of the Yannakakis algorithm and shows that it is possible to improve the running time guarantee of the Yannakakis algorithm by a polynomial factor. Importantly, our algorithmic improvement does not depend on the use of fast matrix multiplication, as a recently proposed algorithm does. The upper bound is complemented with matching lower bounds conditioned on two variants of the $k$-clique conjecture. The application of our algorithm recovers known prior results and improves on known state-of-the-art results for common queries such as paths and stars.
Submitted 14 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
Some quenched and annealed limit theorems of superprocesses in random environments
Authors:
Zeteng Fan,
Jieliang Hong,
Jie Xiong
Abstract:
Let $X=(X_t, t\geq 0)$ be a superprocess in a random environment described by a Gaussian noise $W=\{W(t,x), t\geq 0, x\in \mathbb{R}^d\}$ white in time and colored in space with correlation kernel $g(x,y)$. When $d\geq 3$, under the condition that the correlation function $g(x,y)$ is bounded above by some appropriate function $\bar{g}(x-y)$, we present the quenched and annealed Strong Law of Large Numbers and the Central Limit Theorems regarding the weighted occupation measure $\int_0^t X_s ds$ as $t\to \infty$.
Submitted 10 June, 2024;
originally announced June 2024.
-
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
Authors:
Peng Xia,
Ze Chen,
Juanxi Tian,
Yangrui Gong,
Ruibo Hou,
Yue Xu,
Zhenbang Wu,
Zhiyuan Fan,
Yiyang Zhou,
Kangyu Zhu,
Wenhao Zheng,
Zhaoyang Wang,
Xiao Wang,
Xuchao Zhang,
Chetan Bansal,
Marc Niethammer,
Junzhou Huang,
Hongtu Zhu,
Yun Li,
Jimeng Sun,
Zongyuan Ge,
Gang Li,
James Zou,
Huaxiu Yao
Abstract:
Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES, which aims to comprehensively evaluate the trustworthiness of Med-LVLMs across the medical domain. We assess the trustworthiness of Med-LVLMs along five dimensions: trustfulness, fairness, safety, privacy, and robustness. CARES comprises about 41K question-answer pairs in both closed and open-ended formats, covering 16 medical image modalities and 27 anatomical regions. Our analysis reveals that the models consistently exhibit concerns regarding trustworthiness, often displaying factual inaccuracies and failing to maintain fairness across different demographic groups. Furthermore, they are vulnerable to attacks and demonstrate a lack of privacy awareness. We publicly release our benchmark and code at https://github.com/richard-peng-xia/CARES.
Submitted 10 June, 2024;
originally announced June 2024.
-
Imputation of Missing Photometric Data and Photometric Redshift Estimation for CSST
Authors:
Zhijian Luo,
Zhirui Tang,
Zhu Chen,
Liping Fu,
Wei Du,
Shaohua Zhang,
Yan Gong,
Chenggang Shu,
Junhao Lu,
Yicheng Li,
Xian-Min Meng,
Xingchen Zhou,
Zuhui Fan
Abstract:
Accurate photometric redshift (photo-$z$) estimation requires support from multi-band observational data. However, in the actual process of astronomical observations and data processing, some sources may have missing observational data in certain bands for various reasons. This could greatly affect the accuracy and reliability of photo-$z$ estimation for these sources, and even render some estimation methods unusable. The same situation may exist for the upcoming Chinese Space Station Telescope (CSST). In this study, we employ a deep learning method called Generative Adversarial Imputation Networks (GAIN) to impute the missing photometric data in CSST, aiming to reduce the impact of missing data on photo-$z$ estimation and improve estimation accuracy. Our results demonstrate that the GAIN technique can effectively fill in the missing photometric data in CSST. In particular, when the missing-data rate is below 30\%, the imputation of photometric data exhibits high accuracy, with higher accuracy in the $g$, $r$, $i$, $z$, and $y$ bands than in the $NUV$ and $u$ bands. After filling in the missing values, the quality of photo-$z$ estimation obtained by the widely used Easy and Accurate Zphot from Yale (EAZY) software is notably enhanced. Evaluation metrics for assessing the quality of photo-$z$ estimation, including the catastrophic outlier fraction ($f_{\rm out}$), the normalized median absolute deviation ($\sigma_{\rm NMAD}$), and the bias of photometric redshift ($bias$), all show some degree of improvement. Our research will help maximize the utilization of observational data and provide a new method for handling missing values in samples for applications that require complete photometry data to produce results.
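The three evaluation metrics named in this abstract can be illustrated with a short sketch. The definitions below follow conventions that are common in the photo-$z$ literature and are assumptions here, not necessarily the exact definitions used in the paper: the normalized residual is taken as (z_phot - z_spec) / (1 + z_spec), the outlier threshold is 0.15, and the NMAD scale factor is 1.48. The function name `photoz_metrics` is hypothetical.

```python
import numpy as np

def photoz_metrics(z_phot, z_spec, outlier_cut=0.15):
    """Compute common photo-z quality metrics (assumed conventions)."""
    z_phot = np.asarray(z_phot, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    # Normalized residual between estimated and reference redshift.
    dz = (z_phot - z_spec) / (1.0 + z_spec)
    # Bias: mean of the normalized residual.
    bias = float(np.mean(dz))
    # Normalized median absolute deviation (robust scatter estimate).
    sigma_nmad = float(1.48 * np.median(np.abs(dz - np.median(dz))))
    # Catastrophic outlier fraction: share of sources beyond the cut.
    f_out = float(np.mean(np.abs(dz) > outlier_cut))
    return {"f_out": f_out, "sigma_nmad": sigma_nmad, "bias": bias}
```

Comparing these values before and after imputation, on the same sources, is one way to quantify the kind of improvement the abstract describes.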
Submitted 3 June, 2024;
originally announced June 2024.
-
Kronecker-product random matrices and a matrix least squares problem
Authors:
Zhou Fan,
Renyuan Ma
Abstract:
We study the eigenvalue distribution and resolvent of a Kronecker-product random matrix model $A \otimes I_{n \times n}+I_{n \times n} \otimes B+Θ\otimes Ξ\in \mathbb{C}^{n^2 \times n^2}$, where $A,B$ are independent Wigner matrices and $Θ,Ξ$ are deterministic and diagonal. For fixed spectral arguments, we establish a quantitative approximation for the Stieltjes transform by that of an approximating free operator, and a diagonal deterministic equivalent approximation for the resolvent. We further obtain sharp estimates in operator norm for the $n \times n$ resolvent blocks, and show that off-diagonal resolvent entries fall on two differing scales of $n^{-1/2}$ and $n^{-1}$ depending on their locations in the Kronecker structure.
Our study is motivated by consideration of a matrix-valued least-squares optimization problem $\min_{X \in \mathbb{R}^{n \times n}} \frac{1}{2}\|XA+BX\|_F^2+\frac{1}{2}\sum_{ij} ξ_iθ_j x_{ij}^2$ subject to a linear constraint. For random instances of this problem defined by Wigner inputs $A,B$, our analyses imply an asymptotic characterization of the minimizer $X$ and its associated minimum objective value as $n \to \infty$.
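The link between the least-squares objective and the Kronecker-product structure can be checked numerically with the standard column-major vectorization identities vec(XA) = (Aᵀ ⊗ I) vec(X) and vec(BX) = (I ⊗ B) vec(X). The sketch below is only an illustration of these identities, not a reproduction of the paper's analysis; the helper names `vec`, `kron_operator`, and `diagonal_penalty` are hypothetical.

```python
import numpy as np

def vec(X):
    """Column-major (Fortran-order) vectorization of a matrix."""
    return X.reshape(-1, order="F")

def kron_operator(A, B):
    """Matrix K with K @ vec(X) == vec(X @ A + B @ X)."""
    n = A.shape[0]
    I = np.eye(n)
    return np.kron(A.T, I) + np.kron(I, B)

def diagonal_penalty(theta, xi):
    """Diagonal matrix D = Theta (x) Xi, so that
    vec(X) @ D @ vec(X) == sum_ij xi_i * theta_j * x_ij**2."""
    return np.kron(np.diag(theta), np.diag(xi))
```

For symmetric A, the operator returned by `kron_operator` has the form A ⊗ I + I ⊗ B appearing in the abstract, and the squared Frobenius norm of XA + BX equals the squared Euclidean norm of `kron_operator(A, B) @ vec(X)`.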
Submitted 2 June, 2024;
originally announced June 2024.
-
LLMGeo: Benchmarking Large Language Models on Image Geolocation In-the-wild
Authors:
Zhiqiang Wang,
Dejia Xu,
Rana Muhammad Shahroz Khan,
Yanbin Lin,
Zhiwen Fan,
Xingquan Zhu
Abstract:
Image geolocation is a critical task in various image-understanding applications. However, existing methods often fail when analyzing challenging, in-the-wild images. Inspired by the exceptional background knowledge of multimodal language models, we systematically evaluate their geolocation capabilities using a novel image dataset and a comprehensive evaluation framework. We first collect images from various countries via Google Street View. Then, we conduct both training-free and training-based evaluations on closed-source and open-source multimodal language models. Our findings indicate that closed-source models demonstrate superior geolocation abilities, while open-source models can achieve comparable performance through fine-tuning.
Submitted 30 May, 2024;
originally announced May 2024.
-
Federated Learning under Partially Class-Disjoint Data via Manifold Reshaping
Authors:
Ziqing Fan,
Jiangchao Yao,
Ruipeng Zhang,
Lingjuan Lyu,
Ya Zhang,
Yanfeng Wang
Abstract:
Statistical heterogeneity severely limits the performance of federated learning (FL), motivating several approaches, e.g., FedProx, MOON, and FedDyn, to alleviate this problem. Despite their effectiveness, the scenario they consider generally requires samples from almost all classes during the local training of each client, although some covariate shifts may exist among clients. In fact, the natural case of partially class-disjoint data (PCDD), where each client contributes a few classes (instead of all classes) of samples, is practical yet underexplored. Specifically, the unique collapse and invasion characteristics of PCDD can induce a biased optimization direction in local training, which impairs the efficiency of federated learning. To address this dilemma, we propose a manifold reshaping approach called FedMR to calibrate the feature space of local training. FedMR adds two interplaying losses to vanilla federated learning: an intra-class loss that decorrelates feature dimensions to prevent collapse, and an inter-class loss that guarantees a proper margin among categories in the feature expansion. We conduct extensive experiments on a range of datasets to demonstrate that FedMR achieves much higher accuracy and better communication efficiency. Source code is available at: https://github.com/MediaBrain-SJTU/FedMR.git.
Submitted 3 June, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.
-
Federated Learning with Bilateral Curation for Partially Class-Disjoint Data
Authors:
Ziqing Fan,
Ruipeng Zhang,
Jiangchao Yao,
Bo Han,
Ya Zhang,
Yanfeng Wang
Abstract:
Partially class-disjoint data (PCDD), a common yet under-explored data formation where each client contributes only a subset of classes (instead of all classes) of samples, severely challenges the performance of federated algorithms. Without full classes, the local objective will contradict the global objective, yielding the angle collapse problem for locally missing classes and the space waste problem for locally existing classes. As far as we know, none of the existing methods can intrinsically mitigate PCDD challenges to achieve holistic improvement in the bilateral views (both global view and local view) of federated learning. To address this dilemma, we are inspired by the strong generalization of the simplex Equiangular Tight Frame~(ETF) on imbalanced data, and propose a novel approach called FedGELA, where the classifier is globally fixed as a simplex ETF while locally adapted to the personal distributions. Globally, FedGELA provides fair and equal discrimination for all classes and avoids inaccurate updates of the classifier, while locally it utilizes the space of locally missing classes for locally existing classes. We conduct extensive experiments on a range of datasets to demonstrate that FedGELA achieves promising performance~(averaged improvement of 3.9% over FedAvg and 1.5% over the best baselines) and provide both local and global convergence guarantees. Source code is available at: https://github.com/MediaBrain-SJTU/FedGELA.git.
Submitted 29 May, 2024;
originally announced May 2024.
-
Locally Estimated Global Perturbations are Better than Local Perturbations for Federated Sharpness-aware Minimization
Authors:
Ziqing Fan,
Shengchao Hu,
Jiangchao Yao,
Gang Niu,
Ya Zhang,
Masashi Sugiyama,
Yanfeng Wang
Abstract:
In federated learning (FL), the multi-step update and data heterogeneity among clients often lead to a loss landscape with sharper minima, degrading the performance of the resulting global model. Prevalent federated approaches incorporate sharpness-aware minimization (SAM) into local training to mitigate this problem. However, the local loss landscapes may not accurately reflect the flatness of the global loss landscape in heterogeneous environments; as a result, minimizing local sharpness and calculating perturbations on client data might not bring the efficacy of SAM in FL in line with that of centralized training. To overcome this challenge, we propose FedLESAM, a novel algorithm that locally estimates the direction of the global perturbation on the client side as the difference between the global models received in the previous active round and the current round. Besides the improved quality, FedLESAM also speeds up federated SAM-based approaches since it performs backpropagation only once in each iteration. Theoretically, we prove a slightly tighter bound than that of the original FedSAM by ensuring consistent perturbation. Empirically, we conduct comprehensive experiments on four federated benchmark datasets under three partition strategies to demonstrate the superior performance and efficiency of FedLESAM.
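The idea of estimating the perturbation from two received global models, rather than from an extra local gradient computation, can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' implementation: the sign convention (previous minus current global model, which is proportional to a gradient under a plain descent update), the radius `rho`, the learning rate, and both function names are hypothetical.

```python
import numpy as np

def estimated_global_perturbation(w_prev_global, w_curr_global, rho=0.05, eps=1e-12):
    """Perturbation of norm ~rho along the difference of two global models."""
    # Assumption: previous minus current approximates an ascent direction.
    direction = w_prev_global - w_curr_global
    norm = np.linalg.norm(direction)
    return rho * direction / (norm + eps)

def local_sam_step(w, grad_fn, perturbation, lr=0.1):
    """One local update using a single gradient evaluation.

    Standard SAM needs one gradient to build the perturbation and a second
    one at the perturbed point; here the perturbation is supplied, so only
    the gradient at the perturbed point is computed.
    """
    g = grad_fn(w + perturbation)
    return w - lr * g
```

A client would compute the perturbation once per round from the two global models it has received and reuse it in each local step.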
Submitted 29 May, 2024;
originally announced May 2024.
-
Domain-Inspired Sharpness-Aware Minimization Under Domain Shifts
Authors:
Ruipeng Zhang,
Ziqing Fan,
Jiangchao Yao,
Ya Zhang,
Yanfeng Wang
Abstract:
This paper presents a Domain-Inspired Sharpness-Aware Minimization (DISAM) algorithm for optimization under domain shifts. It is motivated by the inconsistent convergence degree of SAM across different domains, which induces optimization bias towards certain domains and thus impairs the overall convergence. To address this issue, we consider the domain-level convergence consistency in the sharpness estimation to prevent the overwhelming (deficient) perturbations for less (well) optimized domains. Specifically, DISAM introduces the constraint of minimizing variance in the domain loss, which allows elastic gradient calibration in perturbation generation: when one domain is optimized above the average level \textit{w.r.t.} loss, the gradient perturbation towards that domain will be weakened automatically, and vice versa. Under this mechanism, we theoretically show that DISAM can achieve faster overall convergence and improved generalization in principle when inconsistent convergence emerges. Extensive experiments on various domain generalization benchmarks show the superiority of DISAM over a range of state-of-the-art methods. Furthermore, we show the superior efficiency of DISAM in parameter-efficient fine-tuning combined with pretrained models. The source code is released at https://github.com/MediaBrain-SJTU/DISAM.
Submitted 29 May, 2024;
originally announced May 2024.
-
HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning
Authors:
Shengchao Hu,
Ziqing Fan,
Li Shen,
Ya Zhang,
Yanfeng Wang,
Dacheng Tao
Abstract:
The purpose of offline multi-task reinforcement learning (MTRL) is to develop a unified policy applicable to diverse tasks without the need for online environmental interaction. Recent advancements approach this through sequence modeling, leveraging the Transformer architecture's scalability and the benefits of parameter sharing to exploit task similarities. However, variations in task content and complexity pose significant challenges in policy formulation, necessitating judicious parameter sharing and management of conflicting gradients for optimal policy performance. In this work, we introduce the Harmony Multi-Task Decision Transformer (HarmoDT), a novel solution designed to identify an optimal harmony subspace of parameters for each task. We approach this as a bi-level optimization problem, employing a meta-learning framework that leverages gradient-based techniques. The upper level of this framework is dedicated to learning a task-specific mask that delineates the harmony subspace, while the inner level focuses on updating parameters to enhance the overall performance of the unified policy. Empirical evaluations on a series of benchmarks demonstrate the superiority of HarmoDT, verifying the effectiveness of our approach.
Submitted 28 May, 2024;
originally announced May 2024.
-
Block encoding of sparse structured matrices coming from ocean acoustics in quantum computing
Authors:
Chunlin Yang,
Hongmei Yao,
Zexian Li,
Zhaobing Fan,
Guofeng Zhang,
Jianshe Liu
Abstract:
Block encoding is a data input model commonly used in quantum computing. It is an ingenious technique that embeds a matrix $A$ satisfying $\left\|A/ α\right\| \leq 1$ into a larger unitary matrix $U_{A}$. Its complexity can affect the complexity of quantum algorithms in the framework of block encoding. In this paper, a new base scheme of block encoding is given which generalizes the one in \cite{camps2024explicit} by removing the constraint that every data item should appear in all columns. By applying preamplification and state preparation methods, the base scheme is further improved, which results in lower \textit{figures of merit} than those in the special case of \cite{sunderhauf2024block}. Then, the construction of oracles in block encoding schemes is discussed in detail. Considering special sparse structured matrices coming from ocean acoustics, two concrete examples are used to illustrate the feasibility of the proposed base scheme of block encoding, and their explicit quantum circuits are implemented. Finally, the corresponding \verb|MATLAB| codes are presented to effectively simulate the quantum circuits.
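The embedding of $A/α$ into a larger unitary can be illustrated with the textbook unitary dilation, in which the scaled matrix occupies the top-left block. The sketch below is a dense linear-algebra illustration only, assuming a nonzero square input; it is not the paper's circuit-level scheme, and the function names `psd_sqrt` and `block_encode` are hypothetical.

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a Hermitian positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(M)
    # Clip tiny negative eigenvalues caused by rounding.
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def block_encode(A, alpha=None):
    """Return (U, alpha) with U unitary and U[:n, :n] == A / alpha.

    Assumes A is square and nonzero, and that alpha >= ||A||_2.
    If alpha is omitted, the spectral norm of A is used.
    """
    A = np.asarray(A, dtype=complex)
    if alpha is None:
        alpha = np.linalg.norm(A, 2)
    S = A / alpha
    n = S.shape[0]
    I = np.eye(n)
    # Dilation: [[S, sqrt(I - S S^†)], [sqrt(I - S^† S), -S^†]].
    top_right = psd_sqrt(I - S @ S.conj().T)
    bottom_left = psd_sqrt(I - S.conj().T @ S)
    U = np.block([[S, top_right], [bottom_left, -S.conj().T]])
    return U, alpha
```

This construction doubles the dimension, which corresponds to one ancilla qubit when n is a power of two. Practical schemes for sparse structured matrices build $U_A$ from oracles and gates instead of dense matrix square roots.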
Submitted 9 July, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
Q-value Regularized Transformer for Offline Reinforcement Learning
Authors:
Shengchao Hu,
Ziqing Fan,
Chaoqin Huang,
Li Shen,
Ya Zhang,
Yanfeng Wang,
Dacheng Tao
Abstract:
Recent advancements in offline reinforcement learning (RL) have underscored the capabilities of Conditional Sequence Modeling (CSM), a paradigm that learns the action distribution based on history trajectory and target returns for each state. However, these methods often struggle with stitching together optimal trajectories from sub-optimal ones due to the inconsistency between the sampled returns within individual trajectories and the optimal returns across multiple trajectories. Fortunately, Dynamic Programming (DP) methods offer a solution by leveraging a value function to approximate optimal future returns for each state, although these techniques are prone to unstable learning behaviors, particularly in long-horizon and sparse-reward scenarios. Building upon these insights, we propose the Q-value regularized Transformer (QT), which combines the trajectory modeling ability of the Transformer with the predictability of optimal future returns from DP methods. QT learns an action-value function and integrates a term maximizing action-values into the training loss of CSM, which aims to seek optimal actions that align closely with the behavior policy. Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional DP and CSM methods, highlighting the potential of QT to enhance the state-of-the-art in offline RL.
Submitted 27 May, 2024;
originally announced May 2024.
-
Dipolar bosons in a twisted bilayer geometry
Authors:
Chao Zhang,
Zhijie Fan,
Barbara Capogrosso-Sansone,
Youjin Deng
Abstract:
In recent years, twisted bilayer systems such as bilayer graphene have attracted a great deal of attention as the twist angle introduces a degree of freedom which can be used to non-trivially modify system properties. This idea has been picked up in the cold atom community, first with a theoretical proposal to simulate twisted bilayers in state-dependent optical lattices, and, more recently, with an experimental realization of twisted bilayers with bosonic atoms in two different spin states. In this manuscript, we theoretically investigate dipolar bosons in a twisted bilayer geometry. The interplay between dipolar interaction and the twist between the layers results in the emergence of quantum states not observed in the absence of twist. We study how system properties vary as we change the twist angle at fixed distance between the layers and fixed dipolar interaction. We find that at a twist angle $θ=0.1^{\circ}$, the observed quantum phases are consistent with those seen in the absence of twist angle, i.e. paired superfluid, paired supersolid, and paired solid phases. However, a slight increase in the twist angle to $θ=0.2^{\circ}$ disrupts these paired phases in favor of a phase separation between checkerboard solid and superfluid regions. Notably, at a twist angle of $θ=5.21^{\circ}$, the local occupation number follows the moiré pattern of the underlying moiré bilayers so that a periodic structure of insulating islands is formed. These insulating islands are surrounded by a superfluid.
Submitted 26 May, 2024;
originally announced May 2024.
-
Trajectory-Based Multi-Objective Hyperparameter Optimization for Model Retraining
Authors:
Wenyu Wang,
Zheyi Fan,
Szu Hui Ng
Abstract:
Training machine learning models inherently involves a resource-intensive and noisy iterative learning procedure that allows epoch-wise monitoring of the model performance. However, in multi-objective hyperparameter optimization scenarios, the insights gained from the iterative learning procedure typically remain underutilized. We notice that tracking the model performance across multiple epochs under a hyperparameter setting creates a trajectory in the objective space and that trade-offs along the trajectories are often overlooked despite their potential to offer valuable insights to decision-making for model retraining. Therefore, in this study, we propose to enhance the multi-objective hyperparameter optimization problem by having training epochs as an additional decision variable to incorporate trajectory information. Correspondingly, we present a novel trajectory-based multi-objective Bayesian optimization algorithm characterized by two features: 1) an acquisition function that captures the improvement made by the predictive trajectory of any hyperparameter setting and 2) a multi-objective early stopping mechanism that determines when to terminate the trajectory to maximize epoch efficiency. Numerical experiments on diverse synthetic simulations and hyperparameter tuning benchmarks indicate that our algorithm outperforms the state-of-the-art multi-objective optimizers in both locating better trade-offs and tuning efficiency.
Submitted 24 May, 2024;
originally announced May 2024.
-
Minimizing UCB: a Better Local Search Strategy in Local Bayesian Optimization
Authors:
Zheyi Fan,
Wenyu Wang,
Szu Hui Ng,
Qingpei Hu
Abstract:
Local Bayesian optimization is a promising practical approach to solving high-dimensional black-box function optimization problems. Among such approaches is the approximated gradient class of methods, which implements a strategy similar to gradient descent. These methods have achieved good experimental results and theoretical guarantees. However, given the distributional properties of the Gaussian processes used in these methods, there may be potential to further exploit the information of the Gaussian processes to facilitate the BO search. In this work, we develop the relationship between the steps of the gradient descent method and one that minimizes the Upper Confidence Bound (UCB), and show that the latter can be a better strategy than direct gradient descent when a Gaussian process is applied as a surrogate. Through this insight, we propose a new local Bayesian optimization algorithm, MinUCB, which replaces the gradient descent step with minimizing UCB in GIBO. We further show that MinUCB maintains a convergence rate similar to that of GIBO. We then improve the acquisition function of MinUCB further through a look-ahead strategy, and obtain a more efficient algorithm, LA-MinUCB. We apply our algorithms to different synthetic and real-world functions, and the results show the effectiveness of our method. Our algorithms also illustrate improvements on local search strategies from an upper bound perspective in Bayesian optimization, and provide a new direction for future algorithm design.
Submitted 24 May, 2024;
originally announced May 2024.
-
CuckooGraph: A Scalable and Space-Time Efficient Data Structure for Large-Scale Dynamic Graphs
Authors:
Zhuochen Fan,
Yalun Cai,
Zirui Liu,
Jiarui Guo,
Xin Fan,
Tong Yang,
Bin Cui
Abstract:
Graphs play an increasingly important role in various big data applications. However, existing graph data structures cannot simultaneously address the performance bottlenecks caused by the dynamic updates, large scale, and high query complexity of current graphs. This paper proposes a novel data structure for large-scale dynamic graphs called CuckooGraph. It does not need to know the amount of graph data in advance, and can adaptively resize to the most memory-efficient form according to the data scale, realizing multiple graph analytic tasks faster. The key techniques of CuckooGraph include TRANSFORMATION and DENYLIST. TRANSFORMATION fully utilizes the limited memory by designing related data structures that allow flexible space transformations to smoothly expand/tighten the required space depending on the number of incoming items. DENYLIST efficiently handles item insertion failures and further improves processing speed. We conduct extensive experiments, and the results show that CuckooGraph significantly reduces query time by four orders of magnitude on 1-hop successor and precursor queries compared to the state-of-the-art.
Submitted 23 May, 2024;
originally announced May 2024.
-
Calibrated Self-Rewarding Vision Language Models
Authors:
Yiyang Zhou,
Zhiyuan Fan,
Dongjie Cheng,
Sihan Yang,
Zhaorun Chen,
Chenhang Cui,
Xiyao Wang,
Yun Li,
Linjun Zhang,
Huaxiu Yao
Abstract:
Large Vision-Language Models (LVLMs) have made substantial progress by integrating pre-trained large language models (LLMs) and vision models through instruction tuning. Despite these advancements, LVLMs often exhibit the hallucination phenomenon, where generated text responses appear linguistically plausible but contradict the input image, indicating a misalignment between image and text pairs. This misalignment arises because the model tends to prioritize textual information over visual input, even when both the language model and visual representations are of high quality. Existing methods leverage additional models or human annotations to curate preference data and enhance modality alignment through preference optimization. These approaches may not effectively reflect the target LVLM's preferences, making the curated preferences easily distinguishable. Our work addresses these challenges by proposing the Calibrated Self-Rewarding (CSR) approach, which enables the model to self-improve by iteratively generating candidate responses, evaluating the reward for each response, and curating preference data for fine-tuning. In the reward modeling, we employ a step-wise strategy and incorporate visual constraints into the self-rewarding process to place greater emphasis on visual input. Empirical results demonstrate that CSR enhances performance and reduces hallucinations across ten benchmarks and tasks, achieving substantial improvements over existing methods by 7.62%. Our empirical results are further supported by rigorous theoretical analysis, under mild assumptions, verifying the effectiveness of introducing visual constraints into the self-rewarding paradigm. Additionally, CSR shows compatibility with different vision-language models and the ability to incrementally improve performance through iterative fine-tuning. Our data and code are available at https://github.com/YiyangZhou/CSR.
Submitted 31 May, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Specular Polynomials
Authors:
Zhimin Fan,
Jie Guo,
Yiming Wang,
Tianyu Xiao,
Hao Zhang,
Chenxi Zhou,
Zhenyu Chen,
Pengpei Hong,
Yanwen Guo,
Ling-Qi Yan
Abstract:
Finding valid light paths that involve specular vertices in Monte Carlo rendering requires solving many non-linear, transcendental equations in high-dimensional space. Existing approaches heavily rely on Newton iterations in path space, which are limited to obtaining at most a single solution each time and easily diverge when initialized with improper seeds.
We propose specular polynomials, a Newton iteration-free methodology for finding a complete set of admissible specular paths connecting two arbitrary endpoints in a scene. The core is a reformulation of specular constraints into polynomial systems, which makes it possible to reduce the task to a univariate root-finding problem. We first derive bivariate systems utilizing rational coordinate mapping between the coordinates of consecutive vertices. Subsequently, we adopt the hidden variable resultant method for variable elimination, converting the problem into finding zeros of the determinant of univariate matrix polynomials. This can be effectively solved through Laplacian expansion for one bounce and a bisection solver for more bounces.
Our solution is generic, completely deterministic, accurate for the case of one bounce, and GPU-friendly. We develop efficient CPU and GPU implementations and apply them to challenging glints and caustic rendering. Experiments on various scenarios demonstrate the superiority of specular polynomial-based solutions compared to Newton iteration-based counterparts.
Submitted 22 May, 2024;
originally announced May 2024.
-
Prompt-Enhanced Spatio-Temporal Graph Transfer Learning
Authors:
Junfeng Hu,
Xu Liu,
Zhencheng Fan,
Yifang Yin,
Shili Xiang,
Savitha Ramasamy,
Roger Zimmermann
Abstract:
Spatio-temporal graph neural networks have demonstrated efficacy in capturing complex dependencies for urban computing tasks such as forecasting and kriging. However, their performance is constrained by the reliance on extensive data for training on specific tasks, which limits their adaptability to new urban domains with varied demands. Although transfer learning has been proposed to address this problem by leveraging knowledge across domains, cross-task generalization remains underexplored in spatio-temporal graph transfer learning methods due to the absence of a unified framework. To bridge this gap, we propose Spatio-Temporal Graph Prompting (STGP), a prompt-enhanced transfer learning framework capable of adapting to diverse tasks in data-scarce domains. Specifically, we first unify different tasks into a single template and introduce a task-agnostic network architecture that aligns with this template. This approach enables the capture of spatio-temporal dependencies shared across tasks. Furthermore, we employ learnable prompts to achieve domain and task transfer in a two-stage prompting pipeline, enabling the prompts to effectively capture domain knowledge and task-specific properties at each stage. Extensive experiments demonstrate that STGP outperforms state-of-the-art baselines by a notable margin in three downstream tasks: forecasting, kriging, and extrapolation.
Submitted 20 May, 2024;
originally announced May 2024.
-
Separability and lower bounds of quantum entanglement based on realignment
Authors:
Jiaxin Sun,
Hongmei Yao,
Shao-Ming Fei,
Zhaobing Fan
Abstract:
The detection and estimation of quantum entanglement are the essential issues in the theory of quantum entanglement. We construct matrices based on the realignment of density matrices and the vectorization of the reduced density matrices, from which a family of separability criteria are presented for both bipartite and multipartite systems. Moreover, new lower bounds of concurrence and convex-roof extended negativity are derived. Criteria are also given to detect the genuine tripartite entanglement. Lower bounds of the concurrence of genuine tripartite entanglement are presented. By detailed examples we show that our results are better than the corresponding ones in identifying and estimating quantum entanglement as well as genuine multipartite entanglement.
Submitted 20 May, 2024;
originally announced May 2024.
-
Data quality control system and long-term performance monitor of the LHAASO-KM2A
Authors:
Zhen Cao,
F. Aharonian,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
H. X. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen
, et al. (263 additional authors not shown)
Abstract:
The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.
Submitted 13 June, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
Quantum entanglement estimation via symmetric measurement based positive maps
Authors:
Jiaxin Li,
Hongmei Yao,
Shao-Ming Fei,
Zhaobing Fan,
Haitao Ma
Abstract:
We provide a class of positive and trace-preserving maps based on symmetric measurements. From these positive maps we present separability criteria, entanglement witnesses, as well as the lower bounds of concurrence. We show by detailed examples that our separability criteria, entanglement witnesses and lower bounds can detect and estimate the quantum entanglement better than the related existing results.
Submitted 19 May, 2024;
originally announced May 2024.
-
High Discrimination Ratio, Broadband Circularly Polarized Light Photodetector Using Dielectric Achiral Nanostructures
Authors:
Guanyu Zhang,
Xiaying Lyu,
Yulu Qin,
Yaolong Li,
Zipu Fan,
Xianghan Meng,
Yuqing Cheng,
Zini Cao,
Yixuan Xu,
Dong Sun,
Yunan Gao,
Qihuang Gong,
Guowei Lu
Abstract:
The on-chip measurement of polarization states plays an increasingly crucial role in modern sensing and imaging applications. While high-performance monolithic linearly polarized photodetectors have been extensively studied, integrated circularly polarized light (CPL) photodetectors are still hindered by inadequate discrimination capability. In this study, we employ achiral all-dielectric nanostructures to develop a broadband CPL photodetector with an impressive discrimination ratio of ~107 at the wavelength of 405 nm, significantly surpassing its counterparts by two orders of magnitude. Our device shows outstanding CPL discrimination capability across the visible band without requiring intensity calibration. Its function mechanism is based on the CPL-dependent near-field modes within achiral structures: under left or right CPL illumination, distinct near-field modes are excited, resulting in asymmetric irradiation of the two electrodes and generating a photovoltage with directions determined by the chirality of the incident light field. The proposed design strategy facilitates the realization of ultra-compact CPL detection across diverse materials, structures, and spectral ranges, presenting a novel avenue for achieving high-performance monolithic CPL detection.
Submitted 19 May, 2024;
originally announced May 2024.
-
Rapidly Achieving Chemical Accuracy with Quantum Computing Enforced Language Model
Authors:
Honghui Shang,
Xiongzhi Zeng,
Ming Gong,
Yangju Wu,
Shaojun Guo,
Haoran Qian,
Chen Zha,
Zhijie Fan,
Kai Yan,
Xiaobo Zhu,
Zhenyu Li,
Yi Luo,
Jian-Wei Pan,
Jinlong Yang
Abstract:
Finding the accurate ground-state energy of a many-body system has been a major challenge in quantum chemistry. The integration of classical and quantum computers has shed new light on resolving this outstanding problem. Here we propose QiankunNet-VQE, a transformer-based language model enforced with quantum computing to learn and generate quantum states. It has been implemented using up to 12 qubits and attains an accuracy level competitive with state-of-the-art classical methods. By leveraging both quantum and classical resources, this scheme overcomes the limitations of the variational quantum eigensolver (VQE) without the need for cumbersome error mitigation. Moreover, QiankunNet-VQE provides a different route to achieve a practical quantum advantage for solving the many-electron Schrödinger equation without requiring extremely precise preparation and measurement of the ground-state wavefunction on a quantum computer.
Submitted 15 May, 2024;
originally announced May 2024.
-
NAFRSSR: a Lightweight Recursive Network for Efficient Stereo Image Super-Resolution
Authors:
Yihong Chen,
Zhen Fan,
Shuai Dong,
Zhiwei Chen,
Wenjie Li,
Minghui Qin,
Min Zeng,
Xubing Lu,
Guofu Zhou,
Xingsen Gao,
Jun-Ming Liu
Abstract:
Stereo image super-resolution (SR) refers to the reconstruction of a high-resolution (HR) image from a pair of low-resolution (LR) images as typically captured by a dual-camera device. To enhance the quality of SR images, most previous studies focused on increasing the number and size of feature maps and introducing complex and computationally intensive structures, resulting in models with high computational complexity. Here, we propose a simple yet efficient stereo image SR model called NAFRSSR, which is modified from the previous state-of-the-art model NAFSSR by introducing recursive connections and lightweighting the constituent modules. Our NAFRSSR model is composed of nonlinear activation free and group convolution-based blocks (NAFGCBlocks) and depth-separated stereo cross attention modules (DSSCAMs). The NAFGCBlock improves feature extraction and reduces the number of parameters by removing the simple channel attention mechanism from NAFBlock and using group convolution. The DSSCAM enhances feature fusion and reduces the number of parameters by replacing the 1x1 pointwise convolution in SCAM with a weight-shared 3x3 depthwise convolution. In addition, we propose incorporating a trainable edge detection operator into NAFRSSR to further improve the model performance. Four variants of NAFRSSR with different sizes, namely NAFRSSR-Mobile (NAFRSSR-M), NAFRSSR-Tiny (NAFRSSR-T), NAFRSSR-Super (NAFRSSR-S) and NAFRSSR-Base (NAFRSSR-B), are designed, and they all exhibit fewer parameters, higher PSNR/SSIM, and faster speed than the previous state-of-the-art models. In particular, to the best of our knowledge, NAFRSSR-M is the lightest (0.28M parameters) and fastest (50 ms inference time) model achieving an average PSNR/SSIM as high as 24.657 dB/0.7622 on the benchmark datasets. Codes and models will be released at https://github.com/JNUChenYiHong/NAFRSSR.
Submitted 14 May, 2024;
originally announced May 2024.
-
Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
The first source catalog of the Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma-ray source, 1LHAASO J1219+2915. In this paper, a further detailed study of the spectral and temporal behavior of this point-like source has been carried out. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) is compatible with NGC 4278 within $\sim0.03$ degree. Variation analysis shows an indication of variability at the level of a few months in the TeV band, which is consistent with low frequency observations. Based on these observations, we report the detection of TeV $γ$-ray emissions from this low-luminosity AGN NGC 4278. The observations by LHAASO-WCDA during the active period have a significance level of 8.8\,$σ$ with best-fit photon spectral index $\varGamma=2.56\pm0.14$ and a flux $f_{1-10\,\rm{TeV}}=(7.0\pm1.1_{\rm{sta}}\pm0.35_{\rm{syst}})\times10^{-13}\,\rm{photons\,cm^{-2}\,s^{-1}}$, or approximately $5\%$ of the Crab Nebula. The discovery of VHE emission from NGC 4278 indicates that the compact, weak radio jet can efficiently accelerate particles and emit TeV photons.
Submitted 13 May, 2024;
originally announced May 2024.
-
Perfect basis theory for quantum Borcherds-Bozec algebras
Authors:
Zhaobing Fan,
Shaolong Han,
Seok-Jin Kang,
Young Rock Kim
Abstract:
In this paper, we develop the perfect basis theory for quantum Borcherds-Bozec algebras $U_{q}(\mathfrak g)$ and their irreducible highest weight modules $V(λ)$. We show that the lower perfect graph (resp. upper perfect graph) of every lower perfect basis (resp. upper perfect basis) of $U_{q}^{-}(\mathfrak g)$ (resp. $V(λ)$) is isomorphic to the crystal $B(\infty)$ (resp. $B(λ)$).
Submitted 9 May, 2024;
originally announced May 2024.
-
MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results
Authors:
Yaqi Wu,
Zhihao Fan,
Xiaofeng Chu,
Jimmy S. Ren,
Xiaoming Li,
Zongsheng Yue,
Chongyi Li,
Shangcheng Zhou,
Ruicheng Feng,
Yuekun Dai,
Peiqing Yang,
Chen Change Loy,
Senyan Xu,
Zhijing Sun,
Jiaying Zhu,
Yurui Zhu,
Xueyang Fu,
Zheng-Jun Zha,
Jun Cao,
Cheng Li,
Shu Chen,
Liang Ma,
Shiyang Zhou,
Haijin Zeng,
Kai Feng
, et al. (24 additional authors not shown)
Abstract:
The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.
Submitted 8 May, 2024;
originally announced May 2024.
-
Codexity: Secure AI-assisted Code Generation
Authors:
Sung Yong Kim,
Zhiyu Fan,
Yannic Noller,
Abhik Roychoudhury
Abstract:
Despite the impressive performance of Large Language Models (LLMs) in software development activities, recent studies show the concern of introducing vulnerabilities into software codebase by AI programming assistants (e.g., Copilot, CodeWhisperer). In this work, we present Codexity, a security-focused code generation framework integrated with five LLMs. Codexity leverages the feedback of static analysis tools such as Infer and CppCheck to mitigate security vulnerabilities in LLM-generated programs. Our evaluation in a real-world benchmark with 751 automatically generated vulnerable subjects demonstrates Codexity can prevent 60% of the vulnerabilities being exposed to the software developer.
Submitted 6 May, 2024;
originally announced May 2024.
-
Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent
Authors:
Shang Shang,
Xinqiang Zhao,
Zhongjiang Yao,
Yepeng Yao,
Liya Su,
Zijing Fan,
Xiaodan Zhang,
Zhengwei Jiang
Abstract:
To demonstrate and address the underlying maliciousness, we propose a theoretical hypothesis and analytical approach, and introduce a new black-box jailbreak attack methodology named IntentObfuscator, exploiting this identified flaw by obfuscating the true intentions behind user prompts. This approach compels LLMs to inadvertently generate restricted content, bypassing their built-in content security measures. We detail two implementations under this framework: "Obscure Intention" and "Create Ambiguity", which manipulate query complexity and ambiguity to evade malicious intent detection effectively. We empirically validate the effectiveness of the IntentObfuscator method across several models, including ChatGPT-3.5, ChatGPT-4, Qwen and Baichuan, achieving an average jailbreak success rate of 69.21\%. Notably, our tests on ChatGPT-3.5, which claims 100 million weekly active users, achieved a remarkable success rate of 83.65\%. We also extend our validation to diverse types of sensitive content like graphic violence, racism, sexism, political sensitivity, cybersecurity threats, and criminal skills, further proving the substantial impact of our findings on enhancing 'Red Team' strategies against LLM content security frameworks.
Submitted 7 May, 2024; v1 submitted 6 May, 2024;
originally announced May 2024.
-
Polynomial lower bound on the effective resistance for the one-dimensional critical long-range percolation
Authors:
Jian Ding,
Zherui Fan,
Lu-Jing Huang
Abstract:
In this work, we study the critical long-range percolation on $\mathbb{Z}$, where an edge connects $i$ and $j$ independently with probability $1-\exp\{-β|i-j|^{-2}\}$ for some fixed $β>0$. Viewing this as a random electric network where each edge has a unit conductance, we show that with high probability the effective resistances from the origin 0 to $[-N, N]^c$ and from the interval $[-N,N]$ to $[-2N,2N]^c$ (conditioned on no edge joining $[-N,N]$ and $[-2N,2N]^c$) both have a polynomial lower bound in $N$. Our bound holds for all $β>0$ and thus rules out a potential phase transition (around $β= 1$) which seemed to be a reasonable possibility.
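The edge rule stated in the abstract can be illustrated with a minimal Python sketch. It only samples the random graph on a finite window and does not compute effective resistances; the function names, the window $\{-n,\dots,n\}$, and the seed argument are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def edge_probability(i: int, j: int, beta: float) -> float:
    """Connection probability 1 - exp(-beta * |i - j|^(-2)) for distinct sites."""
    if i == j:
        raise ValueError("sites must be distinct")
    return 1.0 - math.exp(-beta / (i - j) ** 2)


def sample_edges(n: int, beta: float, seed: int = 0) -> set[tuple[int, int]]:
    """Sample edges independently on the finite window {-n, ..., n}.

    Hypothetical helper: the model in the abstract lives on all of Z;
    truncating to a window is an assumption made so the sketch terminates.
    """
    rng = random.Random(seed)
    edges = set()
    for i in range(-n, n + 1):
        for j in range(i + 1, n + 1):
            if rng.random() < edge_probability(i, j, beta):
                edges.add((i, j))  # stored with i < j
    return edges


if __name__ == "__main__":
    edges = sample_edges(n=20, beta=1.0)
    print(len(edges), "edges sampled on [-20, 20]")
```

Because the probability decays like $β|i-j|^{-2}$ at large distances, long edges are rare but not negligible, which is the regime the abstract calls critical.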
Submitted 6 May, 2024;
originally announced May 2024.
-
Auto-Encoding Morph-Tokens for Multimodal LLM
Authors:
Kaihang Pan,
Siliang Tang,
Juncheng Li,
Zhaoyu Fan,
Wei Chow,
Shuicheng Yan,
Tat-Seng Chua,
Yueting Zhuang,
Hanwang Zhang
Abstract:
For multimodal LLMs, the synergy of visual comprehension (textual output) and generation (visual output) presents an ongoing challenge. This is due to a conflicting objective: for comprehension, an MLLM needs to abstract the visuals; for generation, it needs to preserve the visuals as much as possible. Thus, the objective is a dilemma for visual-tokens. To resolve the conflict, we propose encoding images into morph-tokens to serve a dual purpose: for comprehension, they act as visual prompts instructing MLLM to generate texts; for generation, they take on a different, non-conflicting role as complete visual-tokens for image reconstruction, where the missing visual cues are recovered by the MLLM. Extensive experiments show that morph-tokens can achieve a new SOTA for multimodal comprehension and generation simultaneously. Our project is available at https://github.com/DCDmllm/MorphTokens.
Submitted 3 May, 2024;
originally announced May 2024.