Bayesian Detector Combination for Object Detection with Crowdsourced Annotations

Authors: Zhi Qin Tan, Olga Isupova, Gustavo Carneiro, Xiatian Zhu, Yunpeng Li

Abstract: Acquiring fine-grained object detection annotations in unconstrained images is time-consuming, expensive, and prone to noise, especially in crowdsourcing scenarios. Most prior object detection methods assume accurate annotations; a few recent works have studied object detection with noisy crowdsourced annotations, with evaluation on distinct synthetic crowdsourced datasets of varying setups under artificial assumptions. To address these algorithmic limitations and evaluation inconsistency, we first propose a novel Bayesian Detector Combination (BDC) framework to more effectively train object detectors with noisy crowdsourced annotations, with the unique ability of automatically inferring the annotators' label qualities. Unlike previous approaches, BDC is model-agnostic, requires no prior knowledge of the annotators' skill level, and seamlessly integrates with existing object detection models. Due to the scarcity of real-world crowdsourced datasets, we introduce large synthetic datasets by simulating varying crowdsourcing scenarios. This allows consistent evaluation of different models at scale. Extensive experiments on both real and synthetic crowdsourced datasets show that BDC outperforms existing state-of-the-art methods, demonstrating its superiority in leveraging crowdsourced data for object detection. Our code and data are available at https://github.com/zhiqin1998/bdc.

Submitted 10 July, 2024; originally announced July 2024.

Comments: Accepted at ECCV 2024

arXiv:2407.02973 [pdf, other]

NOEMA formIng Cluster survEy (NICE): Characterizing eight massive galaxy groups at $1.5 < z < 4$ in the COSMOS field

Authors: Nikolaj B. Sillassen, Shuowen Jin, Georgios E. Magdis, Emanuele Daddi, Tao Wang, Shiying Lu, Hanwen Sun, Vinod Arumugam, Daizhong Liu, Malte Brinch, Chiara D'Eugenio, Raphael Gobat, Carlos Gómez-Guijarro, Michael Rich, Eva Schinnerer, Veronica Strazzullo, Qinghua Tan, Francesco Valentino, Yijun Wang, Mengyuan Xiao, Luwenjia Zhou, David Blánquez-Sesé, Zheng Cai, Yanmei Chen, Laure Ciesla , et al. (19 additional authors not shown)

Abstract: The NOEMA formIng Cluster survEy (NICE) is a large program targeting 69 massive galaxy group candidates at $z>2$ in six deep fields. We report spectroscopic confirmation of eight groups at $1.65\leq z\leq3.61$ in COSMOS. Homogeneously selected as significant overdensities of red IRAC sources with red Herschel colors, four groups are confirmed by CO and [CI] with NOEMA 3mm observations, three are confirmed with ALMA, and one is confirmed by H$\alpha$ from Subaru/FMOS. We constructed the integrated FIR SEDs for the eight groups, obtaining total IR SFR $=260-1300~{\rm M_\odot}$~yr$^{-1}$. We adopted six methods to estimate the dark matter masses, including stellar mass to halo mass relations, overdensity with galaxy bias, and NFW profile fitting to radial stellar mass density. We found the radial stellar mass densities are consistent with an NFW profile, supporting that they are collapsed structures hosted by a single dark matter halo. The best halo mass estimates are $\log(M_{\rm h}/{\rm M_\odot})=12.8-13.7$ with uncertainty of 0.3 dex. From halo mass estimates, we derive baryonic accretion rate ${\rm BAR}=(1-8)\times10^{3}\,{\rm M_{\odot}/yr}$ for this sample. We find a quasi-linear correlation between the integrated SFR/BAR and the theoretical halo mass limit for cold streams, $M_{\rm stream}/M_{\rm h}$, with ${\rm SFR/BAR}=10^{-0.46\pm0.22}\left({M_{\rm stream}/M_{\rm h}}\right)^{0.71\pm0.16}$ with a scatter of $0.40\,{\rm dex}$. Further, we compare halo masses and stellar masses with simulations, and find all structures are consistent with being progenitors of $M_{\rm h}(z=0)>10^{14}\,{\rm M_{\odot}}$ galaxy clusters, and the most massive central galaxies have stellar masses consistent with brightest cluster galaxies (BCGs) progenitors in the TNG300 simulation. The results strongly suggest these structures are forming massive galaxy clusters via baryonic and dark matter accretion.

Submitted 5 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

Comments: 44 pages (27pp appendix), 32 figures, 18 tables, accepted for publication in A&A
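
Note on the NICE scaling relation: the abstract above quotes a best-fit power law, ${\rm SFR/BAR}=10^{-0.46}\left(M_{\rm stream}/M_{\rm h}\right)^{0.71}$. The following is a minimal sketch that evaluates only the central values of that fit. The function name and the sample ratios are illustrative assumptions, and the quoted uncertainties ($\pm0.22$, $\pm0.16$) and the $0.40$ dex scatter are not propagated.

```python
def sfr_over_bar(mstream_over_mh: float,
                 log_norm: float = -0.46,
                 slope: float = 0.71) -> float:
    """Evaluate SFR/BAR = 10**log_norm * (M_stream/M_h)**slope.

    Defaults are the central best-fit values quoted in the abstract;
    the function name and interface are hypothetical.
    """
    if mstream_over_mh <= 0:
        raise ValueError("M_stream/M_h must be positive")
    return 10.0 ** log_norm * mstream_over_mh ** slope


# Illustrative ratios only; these are not values taken from the paper.
for ratio in (0.1, 1.0, 10.0):
    print(f"M_stream/M_h = {ratio:g} -> SFR/BAR ~ {sfr_over_bar(ratio):.3f}")
```

Because the slope is below one, the predicted SFR/BAR grows more slowly than $M_{\rm stream}/M_{\rm h}$ itself, which is what "quasi-linear" refers to here.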

arXiv:2407.02715 [pdf, other]

Revealing the Electronic Structure of NiPS$_3$ through Synchrotron-Based ARPES and Alkali Metal Dosing

Authors: Yifeng Cao, Qishuo Tan, Yucheng Guo, Clóvis Guerim Vieira, Mário S. C. Mazzon, Jude Laverock, Nicholas Russo, Hongze Gao, Chris Jozwiak, Aaron Bostwick, Eli Rotenberg, Jinghua Guo, Ming Yi, Matheus J. S. Matos, Xi Ling, Kevin E. Smith

Abstract: This study presents a comprehensive analysis of the band structure in NiPS$_3$, a van der Waals layered antiferromagnet, utilizing high-resolution synchrotron-based angle-resolved photoemission spectroscopy (ARPES) and corroborative density functional theory (DFT) calculations. By tuning the parameters of the light source, we obtained a very clear and wide energy range band structure of NiPS$_3$. Comparison with DFT calculations allows for the identification of the orbital character of the observed bands. Our DFT calculations perfectly match the experimental results, and no adaptations were made to the calculations based on the experimental outcomes. The appearance of novel electronic structure upon alkali metal dosing (AMD) was also observed in this ARPES study. Above the valence band maximum, the structure of the conduction bands and bands from defect states were observed for the first time in NiPS$_3$. We provide the direct determination of the band gap of NiPS$_3$ as 1.3 eV from the band structure by AMD. In addition, detailed temperature dependent ARPES spectra were obtained across a range that spans both below and above the Néel transition temperature of NiPS$_3$. We found that the paramagnetic and antiferromagnetic states have almost identical spectra, indicating the highly localized nature of Ni $d$ states.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 4 figures

arXiv:2407.01881 [pdf, other]

Spectral evidence for NiPS3 as a Mott-Hubbard insulator

Authors: Yifeng Cao, Nicholas Russo, Qishuo Tan, Xi Ling, Jinghua Guo, Yi-de Chuang, Kevin E. Smith

Abstract: The layered van der Waals trichalcogenide NiPS3 has attracted widespread attention due to its unique optical, magnetic, and electronic properties. The complexity of NiPS3 itself, however, has also led to ongoing debates regarding its characteristics such as the existence of self-doped ligand holes. In this study, X-ray absorption spectroscopy and resonant inelastic X-ray scattering have been applied to investigate the electronic structure of NiPS3. With the aid of theoretical calculations using the charge-transfer multiplet model, we provide experimental evidence for NiPS3 being a Mott-Hubbard insulator rather than a charge-transfer insulator. Moreover, we explain why some previous XAS studies have concluded that NiPS3 is a charge-transfer insulator by comparing surface and bulk sensitive spectra.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 6 figures

arXiv:2407.01284 [pdf, other]

We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

Authors: Runqi Qiao, Qiuna Tan, Guanting Dong, Minhui Wu, Chong Sun, Xiaoshuai Song, Zhuoma GongQue, Shanglin Lei, Zhe Wei, Miaoxuan Zhang, Runfeng Qiao, Yifan Zhang, Xiao Zong, Yida Xu, Muxi Diao, Zhimin Bao, Chen Li, Honggang Zhang

Abstract: Visual mathematical reasoning, as a fundamental visual reasoning ability, has received widespread attention from the Large Multimodal Models (LMMs) community. Existing benchmarks, such as MathVista and MathVerse, focus more on the result-oriented performance but neglect the underlying principles in knowledge acquisition and generalization. Inspired by human-like mathematical reasoning, we introduce WE-MATH, the first benchmark specifically designed to explore the problem-solving principles beyond end-to-end performance. We meticulously collect and categorize 6.5K visual math problems, spanning 67 hierarchical knowledge concepts and five layers of knowledge granularity. We decompose composite problems into sub-problems according to the required knowledge concepts and introduce a novel four-dimensional metric, namely Insufficient Knowledge (IK), Inadequate Generalization (IG), Complete Mastery (CM), and Rote Memorization (RM), to hierarchically assess inherent issues in LMMs' reasoning process. With WE-MATH, we conduct a thorough evaluation of existing LMMs in visual mathematical reasoning and reveal a negative correlation between solving steps and problem-specific performance. We confirm the IK issue of LMMs can be effectively improved via knowledge augmentation strategies. More notably, the primary challenge of GPT-4o has significantly transitioned from IK to IG, establishing it as the first LMM advancing towards the knowledge generalization stage. In contrast, other LMMs exhibit a marked inclination towards Rote Memorization - they correctly solve composite problems involving multiple knowledge concepts yet fail to answer sub-problems. We anticipate that WE-MATH will open new pathways for advancements in visual mathematical reasoning for LMMs. The WE-MATH data and evaluation code are available at https://github.com/We-Math/We-Math.

Submitted 1 July, 2024; originally announced July 2024.

Comments: Work in progress
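
Note on the WE-MATH four-dimensional metric: the abstract above describes Rote Memorization as solving a composite problem correctly while failing its sub-problems. The sketch below shows one way such a four-way labelling could be coded. Only the RM rule comes from the abstract; the rules for IK, IG, and CM are assumptions inferred from the category names, and the function names are hypothetical rather than taken from the released evaluation code.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def classify(sub_correct: Sequence[bool], composite_correct: bool) -> str:
    """Label one composite problem from its sub-problem outcomes.

    RM follows the abstract (composite solved, some sub-problem failed).
    The IK, IG and CM rules are assumptions based on the category names.
    """
    all_subs_correct = all(sub_correct)
    if composite_correct:
        # Every step and the whole solved: mastery; otherwise memorization.
        return "CM" if all_subs_correct else "RM"
    # Composite failed: the knowledge is either present but not combined
    # (IG) or missing for at least one sub-problem (IK).
    return "IG" if all_subs_correct else "IK"


def tally(results: Iterable[Tuple[Sequence[bool], bool]]) -> Counter:
    """Count labels over (sub_correct, composite_correct) pairs."""
    return Counter(classify(subs, comp) for subs, comp in results)


# Hypothetical outcomes for four composite problems.
demo = [([True, True], True), ([True, False], True),
        ([True, True], False), ([False, True], False)]
print(tally(demo))
```

Keeping the labelling as a pure function of the per-problem outcomes makes it easy to aggregate over any subset of knowledge concepts.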

arXiv:2406.15711 [pdf, other]

Parameterized quasinormal frequencies and Hawking radiation for axial gravitational perturbations of a holonomy-corrected black hole

Authors: Sen Yang, Wen-Di Guo, Qin Tan, Li Zhao, Yu-Xiao Liu

Abstract: As the fingerprints of black holes, quasinormal modes are closely associated with many properties of black holes. In particular, the ringdown phase of gravitational waveforms from the merger of compact binary components can be described by quasinormal modes. Serving as a model-independent approach, the framework of parameterized quasinormal frequencies offers a universal method for investigating quasinormal modes of diverse black holes. In this work, we first obtain the Schrödinger-like master equation of the axial gravitational perturbation of a holonomy-corrected black hole. We calculate the corresponding quasinormal frequencies using the Wentzel-Kramers-Brillouin approximation and asymptotic iteration methods. We investigate the numerical evolution of an initial wave packet on the background spacetime. Then, we deduce the parameterized expression of the quasinormal frequencies and find that $r_0 \leq 10^{-2}$ is a necessary condition for the parameterized approximation to be valid. We also study the impact of the quantum parameter $r_0$ on the greybody factor and Hawking radiation. With more ringdown signals of gravitational waves detected in the future, our research will contribute to the study of the quantum properties of black holes.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 13 pages, 6 figures, and 6 tables

arXiv:2406.14045 [pdf, other]

Understanding Different Design Choices in Training Large Time Series Models

Authors: Yu-Neng Chuang, Songchen Li, Jiayi Yuan, Guanchu Wang, Kwei-Herng Lai, Leisheng Yu, Sirui Ding, Chia-Yuan Chang, Qiaoyu Tan, Daochen Zha, Xia Hu

Abstract: Inspired by Large Language Models (LLMs), Time Series Forecasting (TSF), a long-standing task in time series analysis, is undergoing a transition towards Large Time Series Models (LTSMs), aiming to train universal transformer-based models for TSF. However, training LTSMs on heterogeneous time series data poses unique challenges, including diverse frequencies, dimensions, and patterns across datasets. Recent endeavors have studied and evaluated various design choices aimed at enhancing LTSM training and generalization capabilities, spanning pre-processing techniques, model configurations, and dataset configurations. In this work, we comprehensively analyze these design choices and aim to identify the best practices for training LTSM. Moreover, we propose \emph{time series prompt}, a novel statistical prompting strategy tailored to time series data. Furthermore, based on the observations in our analysis, we introduce \texttt{LTSM-bundle}, which bundles the best design choices we have identified. Empirical results demonstrate that \texttt{LTSM-bundle} achieves superior zero-shot and few-shot performances compared to state-of-the-art LSTMs and traditional TSF methods on benchmark datasets.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.13934 [pdf, other]

Reasoning Like a Doctor: Improving Medical Dialogue Systems via Diagnostic Reasoning Process Alignment

Authors: Kaishuai Xu, Yi Cheng, Wenjun Hou, Qiaoyu Tan, Wenjie Li

Abstract: Medical dialogue systems have attracted significant attention for their potential to act as medical assistants. Enabling these medical systems to emulate clinicians' diagnostic reasoning process has been a long-standing research focus. Previous studies rudimentarily realized the simulation of clinicians' diagnostic process by fine-tuning language models on high-quality dialogue datasets. Nonetheless, they overly focus on the outcomes of the clinician's reasoning process while ignoring their internal thought processes and alignment with clinician preferences. Our work aims to build a medical dialogue system that aligns with clinicians' diagnostic reasoning processes. We propose a novel framework, Emulation, designed to generate an appropriate response that relies on abductive and deductive diagnostic reasoning analyses and aligns with clinician preferences through thought process modeling. Experimental results on two datasets confirm the efficacy of Emulation. Crucially, our framework furnishes clear explanations for the generated responses, enhancing its transparency in medical consultations.

Submitted 19 June, 2024; originally announced June 2024.

Comments: Accepted by ACL 2024 Findings

arXiv:2406.13705 [pdf, other]

EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy

Authors: Long Bai, Tong Chen, Qiaozhi Tan, Wan Jun Nah, Yanheng Li, Zhicheng He, Sishen Yuan, Zhen Chen, Jinlin Wu, Mobarakol Islam, Zhen Li, Hongbin Liu, Hongliang Ren

Abstract: Wireless Capsule Endoscopy (WCE) is highly valued for its non-invasive and painless approach, though its effectiveness is compromised by uneven illumination from hardware constraints and complex internal dynamics, leading to overexposed or underexposed images. While researchers have discussed the challenges of low-light enhancement in WCE, the issue of correcting for different exposure levels remains underexplored. To tackle this, we introduce EndoUIC, a WCE unified illumination correction solution using an end-to-end promptable diffusion transformer (DiT) model. In our work, the illumination prompt module shall navigate the model to adapt to different exposure levels and perform targeted image enhancement, in which the Adaptive Prompt Integration (API) and Global Prompt Scanner (GPS) modules shall further boost the concurrent representation learning between the prompt parameters and features. Besides, the U-shaped restoration DiT model shall capture the long-range dependencies and contextual information for unified illumination restoration. Moreover, we present a novel Capsule-endoscopy Exposure Correction (CEC) dataset, including ground-truth and corrupted image pairs annotated by expert photographers. Extensive experiments against a variety of state-of-the-art (SOTA) methods on four datasets showcase the effectiveness of our proposed method and components in WCE illumination restoration, and the additional downstream experiments further demonstrate its utility for clinical diagnosis and surgical assistance.

Submitted 8 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

Comments: To appear in MICCAI 2024. Code and dataset availability: https://github.com/longbai1006/EndoUIC

arXiv:2406.12950 [pdf, other]

MolecularGPT: Open Large Language Model (LLM) for Few-Shot Molecular Property Prediction

Authors: Yuyan Liu, Sirui Ding, Sheng Zhou, Wenqi Fan, Qiaoyu Tan

Abstract: Molecular property prediction (MPP) is a fundamental and crucial task in drug discovery. However, prior methods are limited by the requirement for a large number of labeled molecules and their restricted ability to generalize for unseen and new tasks, both of which are essential for real-world applications. To address these challenges, we present MolecularGPT for few-shot MPP. From a perspective on instruction tuning, we fine-tune large language models (LLMs) based on curated molecular instructions spanning over 1000 property prediction tasks. This enables building a versatile and specialized LLM that can be adapted to novel MPP tasks without any fine-tuning through zero- and few-shot in-context learning (ICL). MolecularGPT exhibits competitive in-context reasoning capabilities across 10 downstream evaluation datasets, setting new benchmarks for few-shot molecular prediction tasks. More importantly, with just two-shot examples, MolecularGPT can outperform standard supervised graph neural network methods on 4 out of 7 datasets. It also outperforms state-of-the-art LLM baselines by up to a 16.6% increase in classification accuracy and a decrease of 199.17 on regression metrics (e.g., RMSE) under zero-shot. This study demonstrates the potential of LLMs as effective few-shot molecular property predictors. The code is available at https://github.com/NYUSHCS/MolecularGPT.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12052 [pdf, other]

UniGLM: Training One Unified Language Model for Text-Attributed Graphs

Authors: Yi Fang, Dongzhe Fan, Sirui Ding, Ninghao Liu, Qiaoyu Tan

Abstract: Representation learning on text-attributed graphs (TAGs), where nodes are represented by textual descriptions, is crucial for textual and relational knowledge systems and recommendation systems. Currently, state-of-the-art embedding methods for TAGs primarily focus on fine-tuning language models (e.g., BERT) using structure-aware training signals. While effective, these methods are tailored to individual TAGs and cannot generalize across various graph scenarios. Given the shared textual space, leveraging multiple TAGs for joint fine-tuning, aligning text and graph structure from different aspects, would be more beneficial. Motivated by this, we introduce a novel Unified Graph Language Model (UniGLM) framework, the first graph embedding model that generalizes well to both in-domain and cross-domain TAGs. Specifically, UniGLM is trained over multiple TAGs with different domains and scales using self-supervised contrastive learning. UniGLM includes an adaptive positive sample selection technique for identifying structurally similar nodes and a lazy contrastive module that is devised to accelerate training by minimizing repetitive encoding calculations. Extensive empirical results across 9 benchmark TAGs demonstrate UniGLM's efficacy against leading embedding baselines in terms of generalization (various downstream tasks and backbones) and transfer learning (in and out of domain scenarios). The code is available at https://github.com/NYUSHCS/UniGLM.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11945 [pdf, other]

GAugLLM: Improving Graph Contrastive Learning for Text-Attributed Graphs with Large Language Models

Authors: Yi Fang, Dongzhe Fan, Daochen Zha, Qiaoyu Tan

Abstract: This work studies self-supervised graph learning for text-attributed graphs (TAGs) where nodes are represented by textual attributes. Unlike traditional graph contrastive methods that perturb the numerical feature space and alter the graph's topological structure, we aim to improve view generation through language supervision. This is driven by the prevalence of textual attributes in real applications, which complement graph structures with rich semantic information. However, this presents challenges for two major reasons. First, text attributes often vary in length and quality, making it difficult to perturb raw text descriptions without altering their original semantic meanings. Second, although text attributes complement graph structures, they are not inherently well-aligned. To bridge the gap, we introduce GAugLLM, a novel framework for augmenting TAGs. It leverages advanced large language models like Mistral to enhance self-supervised graph learning. Specifically, we introduce a mixture-of-prompt-expert technique to generate augmented node features. This approach adaptively maps multiple prompt experts, each of which modifies raw text attributes using prompt engineering, into numerical feature space. Additionally, we devise a collaborative edge modifier to leverage structural and textual commonalities, enhancing edge augmentation by examining or building connections between nodes. Empirical results across five benchmark datasets spanning various domains underscore our framework's ability to enhance the performance of leading contrastive methods as a plug-in tool. Notably, we observe that the augmented features and graph structure can also enhance the performance of standard generative methods, as well as popular graph neural networks. The open-sourced implementation of our GAugLLM is available at GitHub.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.08587 [pdf, other]

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

Authors: Xiaoshuai Song, Muxi Diao, Guanting Dong, Zhengyang Wang, Yujia Fu, Runqi Qiao, Zhexu Wang, Dayuan Fu, Huangxuan Wu, Bin Liang, Weihao Zeng, Yejie Wang, Zhuoma GongQue, Jianing Yu, Qiuna Tan, Weiran Xu

Abstract: Computer Science (CS) stands as a testament to the intricacies of human intelligence, profoundly advancing the development of artificial intelligence and modern society. However, the current community of large language models (LLMs) overly focuses on benchmarks for analyzing specific foundational skills (e.g. mathematics and code generation), neglecting an all-round evaluation of the computer science field. To bridge this gap, we introduce CS-Bench, the first bilingual (Chinese-English) benchmark dedicated to evaluating the performance of LLMs in computer science. CS-Bench comprises approximately 5K meticulously curated test samples, covering 26 subfields across 4 key areas of computer science, encompassing various task forms and divisions of knowledge and reasoning. Utilizing CS-Bench, we conduct a comprehensive evaluation of over 30 mainstream LLMs, revealing the relationship between CS performance and model scales. We also quantitatively analyze the reasons for failures in existing LLMs and highlight directions for improvements, including knowledge supplementation and CS-specific reasoning. Further cross-capability experiments show a high correlation between LLMs' capabilities in computer science and their abilities in mathematics and coding. Moreover, expert LLMs specialized in mathematics and coding also demonstrate strong performances in several CS subfields. Looking ahead, we envision CS-Bench serving as a cornerstone for LLM applications in the CS field and paving new avenues in assessing LLMs' diverse reasoning capabilities. The CS-Bench data and evaluation code are available at https://github.com/csbench/csbench.

Submitted 12 June, 2024; originally announced June 2024.

Comments: Work in progress

arXiv:2406.08310 [pdf, other]

GraphFM: A Comprehensive Benchmark for Graph Foundation Model

Authors: Yuhao Xu, Xinqi Liu, Keyu Duan, Yi Fang, Yu-Neng Chuang, Daochen Zha, Qiaoyu Tan

Abstract: Foundation Models (FMs) serve as a general class for the development of artificial intelligence systems, offering broad potential for generalization across a spectrum of downstream tasks. Despite extensive research into self-supervised learning as the cornerstone of FMs, several outstanding issues persist in Graph Foundation Models that rely on graph self-supervised learning, namely: 1) Homogenization. The extent of generalization capability on downstream tasks remains unclear. 2) Scalability. It is unknown how effectively these models can scale to large datasets. 3) Efficiency. The training time and memory usage of these models require evaluation. 4) Training Stop Criteria. Determining the optimal stopping strategy for pre-training across multiple tasks to maximize performance on downstream tasks. To address these questions, we have constructed a rigorous benchmark that thoroughly analyzes and studies the generalization and scalability of self-supervised Graph Neural Network (GNN) models. Regarding generalization, we have implemented and compared the performance of various self-supervised GNN models, trained to generate node representations, across tasks such as node classification, link prediction, and node clustering. For scalability, we have compared the performance of various models after training using full-batch and mini-batch strategies. Additionally, we have assessed the training efficiency of these models by conducting experiments to test their GPU memory usage and throughput. Through these experiments, we aim to provide insights to motivate future research. The code for this benchmark is publicly available at https://github.com/NYUSHCS/GraphFM.

Submitted 14 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.05597 [pdf, other]

doi 10.1103/PhysRevA.109.063508

Optimal control of linear Gaussian quantum systems via quantum learning control

Authors: Yu-Hong Liu, Yexiong Zeng, Qing-Shou Tan, Daoyi Dong, Franco Nori, Jie-Qiao Liao

Abstract: Efficiently controlling linear Gaussian quantum (LGQ) systems is a significant task in both the study of fundamental quantum theory and the development of modern quantum technology. Here, we propose a general quantum-learning-control method for optimally controlling LGQ systems based on the gradient-descent algorithm. Our approach flexibly designs the loss function for diverse tasks by utilizing first- and second-order moments that completely describe the quantum state of LGQ systems. We demonstrate both deep optomechanical cooling and large optomechanical entanglement using this approach. Our approach enables the fast and deep ground-state cooling of a mechanical resonator within a short time, surpassing the limitations of sideband cooling in the continuous-wave driven strong-coupling regime. Furthermore, optomechanical entanglement could be generated remarkably fast and surpass several times the corresponding steady-state entanglement, even when the thermal phonon occupation reaches one hundred. This work will not only broaden the application of quantum learning control, but also open an avenue for optimal control of LGQ systems.

Submitted 8 June, 2024; originally announced June 2024.

Comments: 14 pages, 7 figures

Journal ref: Phys. Rev. A 109, 063508 (2024)

arXiv:2406.04553 [pdf, other]

Better Late Than Never: Formulating and Benchmarking Recommendation Editing

Authors: Chengyu Lai, Sheng Zhou, Zhimeng Jiang, Qiaoyu Tan, Yuanchen Bei, Jiawei Chen, Ningyu Zhang, Jiajun Bu

Abstract: Recommendation systems play a pivotal role in suggesting items to users based on their preferences. However, in online platforms, these systems inevitably offer unsuitable recommendations due to limited model capacity, poor data quality, or evolving user interests. Enhancing user experience necessitates efficiently rectifying such unsuitable recommendation behaviors. This paper introduces a novel and significant task termed recommendation editing, which focuses on modifying known and unsuitable recommendation behaviors. Specifically, this task aims to adjust the recommendation model to eliminate known unsuitable items without accessing training data or retraining the model. We formally define the problem of recommendation editing with three primary objectives: strict rectification, collaborative rectification, and concentrated rectification. Three evaluation metrics are developed to quantitatively assess the achievement of each objective. We present a straightforward yet effective benchmark for recommendation editing using a novel Editing Bayesian Personalized Ranking Loss. To demonstrate the effectiveness of the proposed method, we establish a comprehensive benchmark that incorporates various methods from related fields. The codebase is available at https://github.com/cycl2018/Recommendation-Editing.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03929 [pdf, ps, other]

Ringing Thick Braneworld with Finite Extra Dimension

Authors: Hai-Long Jia, Wen-Di Guo, Qin Tan, Yu-Xiao Liu

Abstract: In this work, we investigate the quasinormal modes of the Poincaré thick brane with a finite extra dimension. Unlike the case with an infinite extra dimension, the gravitational effective potential exhibits three distinct shapes within different ranges of the parameter $n$ in the warp factor: harmonic oscillator potential, Pöschl-Teller potential, and volcano-like potential. We then study various types of perturbations in this system. Utilizing a combination of analytical, semi-analytical, and numerical methods, we obtain the quasinormal modes of the perturbed fields. Our findings reveal a set of discrete quasinormal modes for the thick brane, similar to those of black holes. Interestingly, when $n=1$, the quasinormal modes exhibit purely imaginary behavior. This study may provide a new way to detect the existence of extra dimensions.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.00452 [pdf, other]

Towards a Unified Framework of Clustering-based Anomaly Detection

Authors: Zeyu Fang, Ming Gu, Sheng Zhou, Jiawei Chen, Qiaoyu Tan, Haishuai Wang, Jiajun Bu

Abstract: Unsupervised Anomaly Detection (UAD) plays a crucial role in identifying abnormal patterns within data without labeled examples, holding significant practical implications across various domains. Although the individual contributions of representation learning and clustering to anomaly detection are well-established, their interdependencies remain under-explored due to the absence of a unified theoretical framework. Consequently, their collective potential to enhance anomaly detection performance remains largely untapped. To bridge this gap, in this paper, we propose a novel probabilistic mixture model for anomaly detection to establish a theoretical connection among representation learning, clustering, and anomaly detection. By maximizing a novel anomaly-aware data likelihood, representation learning and clustering can effectively reduce the adverse impact of anomalous data and collaboratively benefit anomaly detection. Meanwhile, a theoretically substantiated anomaly score is naturally derived from this framework. Lastly, drawing inspiration from gravitational analysis in physics, we have devised an improved anomaly score that more effectively harnesses the combined power of representation learning and clustering. Extensive experiments, involving 17 baseline methods across 30 diverse datasets, validate the effectiveness and generalization capability of the proposed method, surpassing state-of-the-art methods.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.20701 [pdf, other]

Unveiling the Lexical Sensitivity of LLMs: Combinatorial Optimization for Prompt Enhancement

Authors: Pengwei Zhan, Zhen Xu, Qian Tan, Jie Song, Ru Xie

Abstract: Large language models (LLMs) demonstrate exceptional instruct-following ability to complete various downstream tasks. Although this impressive ability makes LLMs flexible task solvers, their performance in solving tasks also heavily relies on instructions. In this paper, we reveal that LLMs are over-sensitive to lexical variations in task instructions, even when the variations are imperceptible to humans. By providing models with neighborhood instructions, which are closely situated in the latent representation space and differ by only one semantically similar word, the performance on downstream tasks can be vastly different. Following this property, we propose a black-box Combinatorial Optimization framework for Prompt Lexical Enhancement (COPLE). COPLE performs iterative lexical optimization according to the feedback from a batch of proxy tasks, using a search strategy related to word influence. Experiments show that even widely-used human-crafted prompts for current benchmarks suffer from the lexical sensitivity of models, and COPLE recovers the declined model ability in both instruct-following and solving downstream tasks.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.13284 [pdf, other]

Sub-kiloparsec scaling relations between hot gas, dense gas and star formation rate in five nearby star-forming galaxies

Authors: Chunyi Zhang, Junfeng Wang, Qing-Hua Tan, Yu Gao, Shuting Ling, Xiaoyu Xu

Abstract: Based on the newly acquired dense gas observations from the JCMT MALATANG survey and X-ray data from Chandra, we explore the correlation between hot gas and HCN $J=4 \rightarrow 3$, HCO$^+\ J=4 \rightarrow 3$ emission for the first time at sub-kiloparsec scale of five nearby star-forming galaxies, namely M82, M83, IC 342, NGC 253, and NGC 6946. We find that both HCN $J=4 \rightarrow 3$ and HCO$^+\ J=4 \rightarrow 3$ line luminosity show a statistically significant correlation with the 0.5${-}$2 keV X-ray emission of the diffuse hot gas ($L_{\rm 0.5 - 2\,keV}^{\rm gas}$). The Bayesian regression analysis gives the best fit of ${\rm log}(L_{\rm 0.5-2\,keV}^{\rm gas} /{\rm erg\,s^{-1}})=2.39\,{\rm log}(L'_{\rm HCN(4-3)} /{\rm K\,km\,s^{-1}\,pc^{2}})+24.83$ and ${\rm log}(L_{\rm 0.5-2\,keV}^{\rm gas} /{\rm erg\,s^{-1}})=2.48\,{\rm log}(L'_{\rm HCO^{+}(4-3)} /{\rm K\,km\,s^{-1}\,pc^{2}})+23.84$, with dispersion of $\thicksim$0.69 dex and 0.54 dex, respectively. At the sub-kiloparsec scale, we find that the power-law index of the $L_{\rm 0.5 - 2\,keV}^{\rm gas}$ ${-}$ star formation rate (SFR) relation is ${\rm log}(L_{\rm 0.5-2\,keV}^{\rm gas} /{\rm erg\,s^{-1}})=1.80\,{\rm log} ({\rm SFR} /M_\odot\,{\rm yr}^{-1})+39.16$, deviated from previous linear relations at global scale. This implies that the global property of hot gas significantly differs from individual resolved regions, which is influenced by the local physical conditions close to the sites of star formation.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 10 pages, 3 figures, accepted for publication in the ApJ Letters. Dedicated to Prof. Yu Gao, who initiated this work

arXiv:2405.03401 [pdf, other]

E2GNN: Efficient Graph Neural Network Ensembles for Semi-Supervised Classification

Authors: Xin Zhang, Daochen Zha, Qiaoyu Tan

Abstract: This work studies ensemble learning for graph neural networks (GNNs) under the popular semi-supervised setting. Ensemble learning has shown superiority in improving the accuracy and robustness of traditional machine learning by combining the outputs of multiple weak learners. However, adopting a similar idea to integrate different GNN models is challenging for two reasons. First, GNN is notorious for its poor inference ability, so naively assembling multiple GNN models would deteriorate the inference efficiency. Second, when GNN models are trained with few labeled nodes, their performance is limited. In this case, the vanilla ensemble approach, e.g., majority vote, may be sub-optimal since most base models, i.e., GNNs, may make the wrong predictions. To this end, in this paper, we propose an efficient ensemble learner--E2GNN to assemble multiple GNNs in a learnable way by leveraging both labeled and unlabeled nodes. Specifically, we first pre-train different GNN models on a given data scenario according to the labeled nodes. Next, instead of directly combining their outputs for label inference, we train a simple multi-layer perceptron--MLP model to mimic their predictions on both labeled and unlabeled nodes. Then the unified MLP model is deployed to infer labels for unlabeled or new nodes. Since the predictions of unlabeled nodes from different GNN models may be incorrect, we develop a reinforced discriminator to effectively filter out those wrongly predicted nodes to boost the performance of MLP. By doing this, we suggest a principled approach to tackle the inference issues of GNN ensembles and maintain the merit of ensemble learning: improved performance. Comprehensive experiments over both transductive and inductive settings, across different GNN backbones and 8 benchmark datasets, demonstrate the superiority of E2GNN.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.14135 [pdf, other]

Text in the Dark: Extremely Low-Light Text Image Enhancement

Authors: Che-Tsung Lin, Chun Chet Ng, Zhi Qin Tan, Wan Jun Nah, Xinyu Wang, Jie Long Kew, Pohao Hsu, Shang Hong Lai, Chee Seng Chan, Christopher Zach

Abstract: Extremely low-light text images are common in natural scenes, making scene text detection and recognition challenging. One solution is to enhance these images using low-light image enhancement methods before text extraction. However, previous methods often do not try to particularly address the significance of low-level features, which are crucial for optimal performance on downstream scene text tasks. Further research is also hindered by the lack of extremely low-light text datasets. To address these limitations, we propose a novel encoder-decoder framework with an edge-aware attention module to focus on scene text regions during enhancement. Our proposed method uses novel text detection and edge reconstruction losses to emphasize low-level scene text features, leading to successful text extraction. Additionally, we present a Supervised Deep Curve Estimation (Supervised-DCE) model to synthesize extremely low-light images based on publicly available scene text datasets such as ICDAR15 (IC15). We also labeled texts in the extremely low-light See In the Dark (SID) and ordinary LOw-Light (LOL) datasets to allow for objective assessment of extremely low-light image enhancement through scene text tasks. Extensive experiments show that our model outperforms state-of-the-art methods in terms of both image quality and scene text metrics on the widely-used LOL, SID, and synthetic IC15 datasets. Code and dataset will be released publicly at https://github.com/chunchet-ng/Text-in-the-Dark.

Submitted 22 April, 2024; originally announced April 2024.

Comments: The first two authors contributed equally to this work

arXiv:2404.11217 [pdf, ps, other]

Quasibound and quasinormal modes of a thick brane in Rastall gravity

Authors: Qin Tan, Yi Zhong, Wen-Di Guo

Abstract: In this work, we study the gravitational quasinormal modes of the thick brane in Rastall gravity. Using the asymptotic iteration and direct integration methods, we solve the quasinormal frequencies of the Rastall thick brane. We also obtained the waveforms of these quasinormal modes through numerical evolution. The results indicate that although the Rastall thick brane lacks a bound zero mode, when the Rastall parameter $\lambda\gtrsim0$, a long-lived quasinormal mode appears. This long-lived quasinormal mode may restore the four-dimensional effective Newtonian potential on the brane on a large scale. This may provide a new perspective for the localization of gravity on thick branes: a thick brane does not necessarily require gravity to be localized; quasi-localization may be sufficient.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 10 pages, 6 figures, and 1 table

arXiv:2403.19631 [pdf, other]

Retrieval-Enhanced Knowledge Editing for Multi-Hop Question Answering in Language Models

Authors: Yucheng Shi, Qiaoyu Tan, Xuansheng Wu, Shaochen Zhong, Kaixiong Zhou, Ninghao Liu

Abstract: Large Language Models (LLMs) have shown proficiency in question-answering tasks but often struggle to integrate real-time knowledge updates, leading to potentially outdated or inaccurate responses. This problem becomes even more challenging when dealing with multi-hop questions since they require LLMs to update and integrate multiple knowledge pieces relevant to the questions. To tackle the problem, we propose the Retrieval-Augmented model Editing (RAE) framework tailored for multi-hop question answering. RAE first retrieves edited facts and then refines the language model through in-context learning. Specifically, our retrieval approach, based on mutual information maximization, leverages the reasoning abilities of LLMs to identify chain facts that naïve similarity-based searches might miss. Additionally, our framework incorporates a pruning strategy to eliminate redundant information from the retrieved facts, which enhances the editing accuracy and mitigates the hallucination problem. Our framework is supported by theoretical justification for its fact retrieval efficacy. Finally, comprehensive evaluation across various LLMs validates RAE's ability in providing accurate answers with updated knowledge.

Submitted 28 March, 2024; originally announced March 2024.

Comments: Work in progress

arXiv:2403.16149 [pdf, other]

A Survey on Consumer IoT Traffic: Security and Privacy

Authors: Yan Jia, Yuxin Song, Zihou Liu, Qingyin Tan, Fangming Wang, Yu Zhang, Zheli Liu

Abstract: For the past few years, the Consumer Internet of Things (CIoT) has entered public lives. While CIoT has improved the convenience of people's daily lives, it has also brought new security and privacy concerns. In this survey, we try to figure out what researchers can learn about the security and privacy of CIoT by traffic analysis, a popular method in the security community. From the security and privacy perspective, this survey seeks out the new characteristics in CIoT traffic analysis, the state-of-the-art progress in CIoT traffic analysis, and the challenges yet to be solved. We collected 310 papers from January 2018 to December 2023 related to CIoT traffic analysis from the security and privacy perspective and summarized the process of CIoT traffic analysis in which the new characteristics of CIoT are identified. Then, we detail existing works based on five application goals: device fingerprinting, user activity inference, malicious traffic analysis, security analysis, and measurement. At last, we discuss the new challenges and future research directions.

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.01268 [pdf, other]

Defending Against Data Reconstruction Attacks in Federated Learning: An Information Theory Approach

Authors: Qi Tan, Qi Li, Yi Zhao, Zhuotao Liu, Xiaobing Guo, Ke Xu

Abstract: Federated Learning (FL) trains a black-box and high-dimensional model among different clients by exchanging parameters instead of direct data sharing, which mitigates the privacy leak incurred by machine learning. However, FL still suffers from membership inference attacks (MIA) or data reconstruction attacks (DRA). In particular, an attacker can extract the information from local datasets by constructing DRA, which cannot be effectively throttled by existing techniques, e.g., Differential Privacy (DP). In this paper, we aim to ensure a strong privacy guarantee for FL under DRA. We prove that reconstruction errors under DRA are constrained by the information acquired by an attacker, which means that constraining the transmitted information can effectively throttle DRA. To quantify the information leakage incurred by FL, we establish a channel model, which depends on the upper bound of joint mutual information between the local dataset and multiple transmitted parameters. Moreover, the channel model indicates that the transmitted information can be constrained through data space operation, which can improve training efficiency and the model accuracy under constrained information. According to the channel model, we propose algorithms to constrain the information transmitted in a single round of local training. With a limited number of training rounds, the algorithms ensure that the total amount of transmitted information is limited. Furthermore, our channel model can be applied to various privacy-enhancing techniques (such as DP) to enhance privacy guarantees against DRA. Extensive experiments with real-world datasets validate the effectiveness of our methods.

Submitted 2 March, 2024; originally announced March 2024.

Comments: Accepted by USENIX Security '24

arXiv:2402.14265 [pdf, ps, other]

doi 10.3390/universe9070320

Quasinormal modes of a charged black hole with scalar hair

Authors: Wen-Di Guo, Qin Tan

Abstract: From a five-dimensional Einstein-Maxwell theory, Bah et al. constructed a singularity-free topological star/black hole [Phys. Rev. Lett. 126, 151101 (2021)]. After the Kaluza-Klein reduction, i.e., integrating the extra space dimension, one obtains an effective four-dimensional static spherical charged black hole with scalar hair. In this paper, we study the quasinormal modes (QNMs) of the scalar field, electromagnetic field, and gravitational field on the background of this effective four-dimensional charged black hole. The radial parts of the perturbed fields all satisfy a Schrödinger-like equation. Using the asymptotic iteration method, we obtain the QNM frequencies semianalytically. For low overtone QNMs, the results obtained from the asymptotic iteration method and the Wentzel-Kramers-Brillouin approximation method agree well. In the null coordinates, the evolution of a Gaussian package is also studied. The QNM frequencies obtained by fitting the evolution data also agree well with the results obtained by the asymptotic iteration method.

Submitted 21 February, 2024; originally announced February 2024.

Comments: 10 pages, 4 figures, and 3 tables

Journal ref: Universe 9, 97, 320 (2023)

arXiv:2402.11476 [pdf, other]

EndoOOD: Uncertainty-aware Out-of-distribution Detection in Capsule Endoscopy Diagnosis

Authors: Qiaozhi Tan, Long Bai, Guankun Wang, Mobarakol Islam, Hongliang Ren

Abstract: Wireless capsule endoscopy (WCE) is a non-invasive diagnostic procedure that enables visualization of the gastrointestinal (GI) tract. Deep learning-based methods have shown effectiveness in disease screening using WCE data, alleviating the burden on healthcare professionals. However, existing capsule endoscopy classification methods mostly rely on pre-defined categories, making it challenging to identify and classify out-of-distribution (OOD) data, such as undefined categories or anatomical landmarks. To address this issue, we propose the Endoscopy Out-of-Distribution (EndoOOD) framework, which aims to effectively handle the OOD detection challenge in WCE diagnosis. The proposed framework focuses on improving the robustness and reliability of WCE diagnostic capabilities by incorporating uncertainty-aware mixup training and long-tailed in-distribution (ID) data calibration techniques. Additionally, virtual-logit matching is employed to accurately distinguish between OOD and ID data while minimizing information loss. To assess the performance of our proposed solution, we conduct evaluations and comparisons with 12 state-of-the-art (SOTA) methods using two publicly available datasets. The results demonstrate the effectiveness of the proposed framework in enhancing diagnostic accuracy and supporting clinical decision-making.

Submitted 18 February, 2024; originally announced February 2024.

Comments: To appear in IEEE ISBI 2024

arXiv:2402.06852 [pdf]

ChemLLM: A Chemical Large Language Model

Authors: Di Zhang, Wei Liu, Qian Tan, Jingdan Chen, Hang Yan, Yuliang Yan, Jiatong Li, Weiran Huang, Xiangyu Yue, Wanli Ouyang, Dongzhan Zhou, Shufei Zhang, Mao Su, Han-Sen Zhong, Yuqiang Li

Abstract: Large language models (LLMs) have made impressive progress in chemistry applications. However, the community lacks an LLM specifically designed for chemistry. The main challenges are two-fold: firstly, most chemical data and scientific knowledge are stored in structured databases, which limits the model's ability to sustain coherent dialogue when used directly. Secondly, there is an absence of objective and fair benchmarks that encompass most chemistry tasks. Here, we introduce ChemLLM, a comprehensive framework that features the first LLM dedicated to chemistry. It also includes ChemData, a dataset specifically designed for instruction tuning, and ChemBench, a robust benchmark covering nine essential chemistry tasks. ChemLLM is adept at performing various tasks across chemical disciplines with fluid dialogue interaction. Notably, ChemLLM achieves results comparable to GPT-4 on the core chemical tasks and demonstrates competitive performance with LLMs of similar size in general scenarios. ChemLLM paves a new path for exploration in chemical studies, and our method of incorporating structured chemical knowledge into dialogue systems sets a new standard for developing LLMs in various scientific fields. Codes, Datasets, and Model weights are publicly accessible at https://hf.co/AI4Chem

Submitted 25 April, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

Comments: 9 pages, 5 figures

arXiv:2401.17913 [pdf, ps, other]

Hilbert modular forms and class numbers

Authors: Qinyun Tan, Bingyong Xie

Abstract: In 1975, Goldfeld gave an effective solution to Gauss's conjecture on the class numbers of imaginary quadratic fields. In this paper, we generalize Goldfeld's theorem to the setting of totally real number fields.

Submitted 31 January, 2024; originally announced January 2024.

Comments: 35 pages

arXiv:2401.02875 [pdf, other]

The Dust Attenuation Scaling Relation of Star-Forming Galaxies in the EAGLE Simulations

Authors: Man Qiao, Xian Zhong Zheng, Antonios Katsianis, Jianbo Qin, Zhizheng Pan, Wenhao Liu, Qing-Hua Tan, Fang Xia An, Dong Dong Shi, Zongfei Lü, Yuheng Zhang, Run Wen, Shuang Liu, Chao Yang

Abstract: Dust attenuation in star-forming galaxies (SFGs), as parameterized by the infrared excess (IRX $\equiv L_{\rm IR}/L_{\rm UV}$), is found to be tightly correlated with star formation rate (SFR), metallicity and galaxy size, following a universal IRX relation up to $z=3$. This scaling relation can provide a fundamental constraint for theoretical models to reconcile galaxy star formation, chemical enrichment, and structural evolution across cosmic time. We attempt to reproduce the universal IRX relation over $0.1\leq z\leq 2.5$ using the EAGLE hydrodynamical simulations and examine sensitive parameters in determining galaxy dust attenuation. Our findings show that while the predicted universal IRX relation from EAGLE approximately aligns with observations at $z\leq 0.5$, noticeable disparities arise at different stellar masses and higher redshifts. Specifically, we investigate how modifying various galaxy parameters can affect the predicted universal IRX relation in comparison to the observed data. We demonstrate that the simulated gas-phase metallicity is the critical quantity for the shape of the predicted universal IRX relation. We find that the influence of the infrared luminosity and infrared excess is less important while galaxy size has virtually no significant effect. Overall, the EAGLE simulations are not able to replicate some of the observed characteristics between IRX and galaxy parameters of SFGs, emphasizing the need for further investigation and testing for our current state-of-the-art theoretical models.

Submitted 5 January, 2024; originally announced January 2024.

Comments: 19 pages, 15 figures, accepted for publication in MNRAS

arXiv:2312.16700 [pdf, other]

doi 10.1093/mnras/stad3999

Understanding the Universal Dust Attenuation Scaling Relation of Star-Forming Galaxies

Authors: J. Qin, X. Z. Zheng, S. Wuyts, Z. Lv, M. Qiao, J. -S. Huang, F. S. Liu, A. Katsianis, V. Gonzalez, F. Bian, H. Xu, Z. Pan, W. Liu, Q. -H. Tan, F. X. An, D. D. Shi, Y. Zhang, R. Wen, S. Liu, C. Yang

Abstract: Star-forming galaxies (SFGs) adhere to a surprisingly tight scaling relation of dust attenuation parameterized by the infrared excess (IRX=$L_{\rm IR}/L_{\rm UV}$), being jointly determined by the star formation rate (SFR), galaxy size ($R_{\rm e}$), metallicity ($Z$/Z$_\odot$) and axial ratio ($b/a$). We examine how these galaxy parameters determine the effective dust attenuation and give rise to the universal IRX relation, utilizing a simple two-component star-dust geometry model in which dust in the dense and diffuse interstellar medium (ISM) follows exponential mass density profiles, connected with but not necessarily identical to the stellar mass profiles. Meanwhile, empirical relations are adopted to link galaxy properties, including the gas--star formation relation, the dust-to-stellar size relation, as well as the dust-to-gas ratio versus metallicity relation. By fitting a large sample of local SFGs with the model, we obtain the best-fitting model parameters as a function of metallicity, showing that the two-component geometry model is able to successfully reproduce the dependence of IRX on SFR, $R_{\rm e}$, $b/a$ at given $Z$/Z$_\odot$, as well as the dependence of power-law indices on metallicity. Moreover, we also retrieve constraints on the model geometry parameters, including the optical depth of birth clouds (BCs), BC-to-total dust mass fraction, BC covering factor of UV-emitting stars, and star-to-total dust disc radius ratio, which all evolve with galaxy metallicity. Finally, a consistent picture of how the star-dust geometry in SFGs evolves with galaxy metallicity is discussed.

Submitted 30 January, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

Comments: 20 pages, 10 figures, published in MNRAS (2024, Volume 528, Issue 1, pp.658-675); a Python package IRX_TAU_TOT is available at https://github.com/LvZF/irx_tau_tot/ to calculate the total dust optical depth of a galaxy with given metallicity and best-fitting geometry parameters

Journal ref: MNRAS, 528, 658 (2024)

arXiv:2312.16605 [pdf, ps, other]

Quasinormal modes and greybody factor of a Lorentz-violating black hole

Authors: Wen-Di Guo, Qin Tan, Yu-Xiao Liu

Abstract: Recently, a static spherically symmetric black hole solution was found in gravity nonminimally coupled to a background Kalb-Ramond field. The Lorentz symmetry is spontaneously broken when the Kalb-Ramond field has a nonvanishing vacuum expectation value. In this work, we focus on the quasinormal modes and greybody factor of this black hole. The master equations for the perturbed scalar field, electromagnetic field, and gravitational field can be written in a uniform form. We use three methods to solve for the quasinormal frequencies in the frequency domain. The results agree well with each other. The time evolution of a Gaussian wave packet is studied. The quasinormal frequencies fitted from the time evolution data agree well with those from the frequency domain. The greybody factor is calculated by the Wentzel-Kramers-Brillouin (WKB) method. The effects of the Lorentz-violating parameter on the quasinormal modes and greybody factor are also studied.

Submitted 27 December, 2023; originally announced December 2023.

Comments: 10 pages, 4 figures, and 3 tables

arXiv:2312.16031 [pdf, other]

Transverse electric waves in Bandos-Lechner-Sorokin-Townsend nonlinear electrodynamics

Authors: Yang Shi, Qinyan Tan, Towe Wang

Abstract: In the generalized Born-Infeld electrodynamics discovered by Bandos, Lechner, Sorokin and Townsend, we study transverse electric waves propagating perpendicular to a constant magnetic field background in a parallel-plate waveguide. The directions of propagation and polarization of the waves are perpendicular to each other, and both of them are parallel to the perfectly conducting plates. Two specific configurations are studied, in which the background magnetic field is either normal to the plates or along the polarization direction. The dispersion relation, the velocity and the cutoff frequency of the lowest-order lowest-frequency mode are calculated in both configurations. This paves the way for a promising test of the generalized Born-Infeld electrodynamics.

Submitted 26 December, 2023; originally announced December 2023.

Comments: 10 pages, 4 figures

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.08987 [pdf, other]

doi 10.1038/s43588-023-00576-2

Unbiased organism-agnostic and highly sensitive signal peptide predictor with deep protein language model

Authors: Junbo Shen, Qinze Yu, Shenyang Chen, Qingxiong Tan, Jingcheng Li, Yu Li

Abstract: Signal peptide (SP) is a short peptide located in the N-terminus of proteins. It is essential to target and transfer transmembrane and secreted proteins to correct positions. Compared with traditional experimental methods to identify signal peptides, computational methods are faster and more efficient, which are more practical for analyzing thousands or even millions of protein sequences, especially for metagenomic data. Here we present Unbiased Organism-agnostic Signal Peptide Network (USPNet), a signal peptide classification and cleavage site prediction deep learning method that takes advantage of protein language models. We propose to apply label distribution-aware margin loss to handle data imbalance problems and use evolutionary information of protein to enrich representation and overcome species information dependence.

Submitted 14 December, 2023; originally announced December 2023.

Comments: 23 pages 5 figures. Nat Comput Sci (2023)

arXiv:2312.05526 [pdf, other]

Reinforcement Neighborhood Selection for Unsupervised Graph Anomaly Detection

Authors: Yuanchen Bei, Sheng Zhou, Qiaoyu Tan, Hao Xu, Hao Chen, Zhao Li, Jiajun Bu

Abstract: Unsupervised graph anomaly detection is crucial for various practical applications as it aims to identify anomalies in a graph that exhibit rare patterns deviating significantly from the majority of nodes. Recent advancements have utilized Graph Neural Networks (GNNs) to learn high-quality node representations for anomaly detection by aggregating information from neighborhoods. However, the presence of anomalies may render the observed neighborhood unreliable and result in misleading information aggregation for node representation learning. Selecting the proper neighborhood is critical for graph anomaly detection but also challenging due to the absence of anomaly-oriented guidance and the interdependence with representation learning. To address these issues, we utilize the advantages of reinforcement learning in adaptively learning in complex environments and propose a novel method that incorporates Reinforcement neighborhood selection for unsupervised graph ANomaly Detection (RAND). RAND begins by enriching the candidate neighbor pool of the given central node with multiple types of indirect neighbors. Next, RAND designs a tailored reinforcement anomaly evaluation module to assess the reliability and reward of considering the given neighbor. Finally, RAND selects the most reliable subset of neighbors based on these rewards and introduces an anomaly-aware aggregator to amplify messages from reliable neighbors while diminishing messages from unreliable ones. Extensive experiments on three synthetic and two real-world datasets demonstrate that RAND outperforms the state-of-the-art methods.

Submitted 9 December, 2023; originally announced December 2023.

Comments: 10 pages, 7 figures, accepted by ICDM2023

arXiv:2312.05425 [pdf, other]

Fitting pseudo-Sérsic (Spergel) light profiles to galaxies in interferometric data: the excellence of the $uv$-plane

Authors: Qing-Hua Tan, Emanuele Daddi, Victor de Souza Magalhães, Carlos Gómez-Guijarro, Jérôme Pety, Boris S. Kalita, David Elbaz, Zhaoxuan Liu, Benjamin Magnelli, Annagrazia Puglisi, Wiphu Rujopakarn, John D. Silverman, Francesco Valentino, Shao-Bo Zhang

Abstract: Modern (sub)millimeter interferometers, such as ALMA and NOEMA, offer high angular resolution and unprecedented sensitivity. This provides the possibility to characterize the morphology of the gas and dust in distant galaxies. To assess the capabilities of current software in recovering morphologies and surface brightness profiles in interferometric observations, we test the performance of the Spergel model for fitting in the $uv$-plane, which has been recently implemented in the IRAM software GILDAS (uv_fit). Spergel profiles provide an alternative to the Sersic profile, with the advantage of having an analytical Fourier transform, making them ideal to model visibilities in the $uv$-plane. We provide an approximate conversion between Spergel index and Sersic index, which depends on the ratio of the galaxy size to the angular resolution of the data. We show through extensive simulations that Spergel modeling in the $uv$-plane is a more reliable method for parameter estimation than modeling in the image-plane, as it returns parameters that are less affected by systematic biases and results in a higher effective signal-to-noise ratio (S/N). The better performance in the $uv$-plane is likely driven by the difficulty of accounting for correlated signal in interferometric images. Even in the $uv$-plane, the integrated source flux needs to be at least 50 times larger than the noise per beam to enable a reasonably good measurement of a Spergel index. We characterise the performance of Spergel model fitting in detail by showing that parameter biases are generally low (< 10%) and that uncertainties returned by uv_fit are reliable within a factor of two. Finally, we showcase the power of Spergel fitting by re-examining two claims of extended halos around galaxies from the literature, showing that galaxies and halos can be successfully fitted simultaneously with a single Spergel model.

Submitted 8 December, 2023; originally announced December 2023.

Comments: 23 pages, 15 figures, accepted for publication in A&A

arXiv:2312.01431 [pdf, other]

D$^2$ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition

Authors: Wenjie Pei, Qizhong Tan, Guangming Lu, Jiandong Tian

Abstract: Adapting large pre-trained image models to few-shot action recognition has proven to be an effective and efficient strategy for learning robust feature extractors, which is essential for few-shot learning. The typical fine-tuning-based adaptation paradigm is prone to overfitting in few-shot learning scenarios and offers little modeling flexibility for learning temporal features in video data. In this work we present the Disentangled-and-Deformable Spatio-Temporal Adapter (D$^2$ST-Adapter), which is a novel adapter tuning framework well-suited for few-shot action recognition due to its lightweight design and low parameter-learning overhead. It is designed in a dual-pathway architecture to encode spatial and temporal features in a disentangled manner. In particular, we devise the anisotropic Deformable Spatio-Temporal Attention module as the core component of D$^2$ST-Adapter, which can be tailored with anisotropic sampling densities along spatial and temporal domains to learn spatial and temporal features specifically in corresponding pathways, allowing our D$^2$ST-Adapter to encode features in a global view in 3D spatio-temporal space while maintaining a lightweight design. Extensive experiments with instantiations of our method on both pre-trained ResNet and ViT demonstrate the superiority of our method over state-of-the-art methods for few-shot action recognition. Our method is particularly well-suited to challenging scenarios where temporal dynamics are critical for action recognition.

Submitted 20 April, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

arXiv:2312.00738 [pdf, other]

SeaLLMs -- Large Language Models for Southeast Asia

Authors: Xuan-Phi Nguyen, Wenxuan Zhang, Xin Li, Mahani Aljunied, Zhiqiang Hu, Chenhui Shen, Yew Ken Chia, Xingxuan Li, Jianyu Wang, Qingyu Tan, Liying Cheng, Guanzheng Chen, Yue Deng, Sen Yang, Chaoqun Liu, Hang Zhang, Lidong Bing

Abstract: Despite the remarkable achievements of large language models (LLMs) in various tasks, there remains a linguistic bias that favors high-resource languages, such as English, often at the expense of low-resource and regional languages. To address this imbalance, we introduce SeaLLMs, an innovative series of language models that specifically focuses on Southeast Asian (SEA) languages. SeaLLMs are built upon the Llama-2 model and further advanced through continued pre-training with an extended vocabulary, specialized instruction and alignment tuning to better capture the intricacies of regional languages. This allows them to respect and reflect local cultural norms, customs, stylistic preferences, and legal considerations. Our comprehensive evaluation demonstrates that SeaLLM-13b models exhibit superior performance across a wide spectrum of linguistic tasks and assistant-style instruction-following capabilities relative to comparable open-source models. Moreover, they outperform ChatGPT-3.5 in non-Latin languages, such as Thai, Khmer, Lao, and Burmese, by large margins while remaining lightweight and cost-effective to operate.

Submitted 1 July, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

Comments: Technical report, ACL 2024 DEMO TRACK

arXiv:2311.17610 [pdf, ps, other]

The Calabi-Yau Equation on Symplectic Manifolds

Authors: Qiang Tan, Hongyu Wang

Abstract: By using the global deformation of almost complex structures which are compatible with a symplectic form off a Lebesgue measure zero subset, we construct a (measurable) Lipschitz Kahler metric such that the one-form type Calabi-Yau equation on an open dense submanifold is reduced to the complex Monge-Ampere equation with respect to the measurable Kahler metric. We give an existence theorem for solutions to the one-form type Calabi-Yau equation on closed symplectic manifolds.

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.12201 [pdf, other]

doi 10.1021/acs.nanolett.4c00772

Observation of three-state nematicity and domain evolution in atomically-thin antiferromagnetic NiPS3

Authors: Qishuo Tan, Connor A. Occhialini, Hongze Gao, Jiaruo Li, Hikari Kitadai, Riccardo Comin, Xi Ling

Abstract: Nickel phosphorus trisulfide (NiPS3), a van der Waals (vdW) 2D antiferromagnet, has attracted enormous attention for its intriguing physics in recent years. However, despite its fundamental importance in the physics of magnetism and promising potential for technological applications, the study of magnetic domains in NiPS3 down to the atomically thin limit is still lacking. Here, we report the layer-dependent magnetic characteristics and magnetic domains within antiferromagnetic NiPS3 by employing linear dichroism (LD) combined with polarized microscopy, spin-correlated photoluminescence (PL), and Raman spectroscopy. Our results reveal the existence of the paramagnetic-to-antiferromagnetic phase transition in bulk to bilayer NiPS3 with stronger spin fluctuation in thinner NiPS3. Furthermore, our study identifies three distinct antiferromagnetic domains within atomically thin NiPS3 and captures the thermally-activated domain evolution. Our findings provide crucial insights for the development of antiferromagnetic spintronics and related technologies.

Submitted 27 February, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.09821 [pdf, other]

Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning

Authors: Qingyu Tan, Hwee Tou Ng, Lidong Bing

Abstract: Knowledge in the real world is being updated constantly. However, it is costly to frequently update large language models (LLMs). Therefore, it is crucial for LLMs to understand the concept of temporal knowledge. However, prior works on temporal question answering (TQA) did not emphasize multi-answer and multi-hop types of temporal reasoning. In this paper, we propose a complex temporal question-answering dataset Complex-TR that focuses on multi-answer and multi-hop temporal reasoning. Besides, we also propose a novel data augmentation strategy to improve the complex temporal reasoning capability and robustness of LLMs. We conducted experiments on multiple temporal QA datasets. Experimental results show that our method is able to improve LLMs' performance on temporal QA benchmarks by significant margins. Our code and data are released at: https://github.com/nusnlp/complex-tr.

Submitted 12 July, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: To appear in Findings of ACL 2024

arXiv:2311.08823 [pdf, other]

Ultrafast 3-D Super Resolution Ultrasound using Row-Column Array specific Coherence-based Beamforming and Rolling Acoustic Sub-aperture Processing: In Vitro, In Vivo and Clinical Study

Authors: Joseph Hansen-Shearer, Jipeng Yan, Marcelo Lerendegui, Biao Huang, Matthieu Toulemonde, Kai Riemer, Qingyuan Tan, Johanna Tonko, Peter D. Weinberg, Chris Dunsby, Meng-Xing Tang

Abstract: The row-column addressed array is an emerging probe for ultrafast 3-D ultrasound imaging. It achieves this with far fewer independent electronic channels and a wider field of view than traditional 2-D matrix arrays, of the same channel count, making it a good candidate for clinical translation. However, the image quality of row-column arrays is generally poor, particularly when investigating tissue. Ultrasound localisation microscopy allows for the production of super-resolution images even when the initial image resolution is not high. Unfortunately, the row-column probe can suffer from imaging artefacts that can degrade the quality of super-resolution images as 'secondary' lobes from bright microbubbles can be mistaken as microbubble events, particularly when operated using plane wave imaging. These false events move through the image in a physiologically realistic way so can be challenging to remove via tracking, leading to the production of 'false vessels'. Here, a new type of rolling window image reconstruction procedure was developed, which integrated a row-column array-specific coherence-based beamforming technique with acoustic sub-aperture processing for the purposes of reducing 'secondary' lobe artefacts, noise and increasing the effective frame rate. Using an in vitro cross tube, it was found that the procedure reduced the percentage of 'false' locations from ~26% to ~15% compared to traditional orthogonal plane wave compounding. Additionally, it was found that the noise could be reduced by ~7 dB and that the effective frame rate could be increased to over 4000 fps. Subsequently, in vivo ultrasound localisation microscopy was used to produce images non-invasively of a rabbit kidney and a human thyroid.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2310.16523 [pdf, other]

Improving Diversity of Demographic Representation in Large Language Models via Collective-Critiques and Self-Voting

Authors: Preethi Lahoti, Nicholas Blumm, Xiao Ma, Raghavendra Kotikalapudi, Sahitya Potluri, Qijun Tan, Hansa Srinivasan, Ben Packer, Ahmad Beirami, Alex Beutel, Jilin Chen

Abstract: A crucial challenge for generative large language models (LLMs) is diversity: when a user's prompt is under-specified, models may follow implicit assumptions while generating a response, which may result in homogenization of the responses, as well as certain demographic groups being under-represented or even erased from the generated responses. In this paper, we formalize diversity of representation in generative LLMs. We present evaluation datasets and propose metrics to measure diversity in generated responses along people and culture axes. We find that LLMs understand the notion of diversity, and that they can reason and critique their own responses for that goal. This finding motivated a new prompting technique called collective-critique and self-voting (CCSV) to self-improve people diversity of LLMs by tapping into their diversity reasoning capabilities, without relying on handcrafted examples or prompt tuning. Extensive empirical experiments with both human and automated evaluations show that our proposed approach is effective at improving people and culture diversity, and outperforms all baseline methods by a large margin.

Submitted 25 October, 2023; originally announced October 2023.

Comments: To appear at EMNLP 2023 main conference

arXiv:2310.15925 [pdf, other]

doi 10.1051/0004-6361/202348351

Noema formIng Cluster survEy (NICE): Discovery of a starbursting galaxy group with a radio-luminous core at z=3.95

Authors: Luwenjia Zhou, Tao Wang, Emanuele Daddi, Rosemary Coogan, Hanwen Sun, Ke Xu, Vinodiran Arumugam, Shuowen Jin, Daizhong Liu, Shiying Lu, Nikolaj Sillassen, Yijun Wang, Yong Shi, Zhi-Yu Zhang, Qinghua Tan, Qiusheng Gu, David Elbaz, Aurelien Le Bail, Benjamin Magnelli, Carlos Gómez-Guijarro, Chiara d'Eugenio, Georgios E. Magdis, Francesco Valentino, Zhiyuan Ji, Raphael Gobat, et al. (12 additional authors not shown)

Abstract: The study of distant galaxy groups and clusters at the peak epoch of star formation is limited by the lack of a statistically and homogeneously selected and spectroscopically confirmed sample. Recent discoveries of concentrated starburst activities in cluster cores have opened a new window to hunt for these structures based on their integrated IR luminosities. Hereby we carry out the large NOEMA (NOrthern Extended Millimeter Array) program targeting a statistical sample of infrared-luminous sources associated with overdensities of massive galaxies at z>2, the Noema formIng Cluster survEy (NICE). We present the first result from the ongoing NICE survey, a compact group at z=3.95 in the Lockman Hole field (LH-SBC3), confirmed via four massive (M_star>10^10.5M_sun) galaxies detected in CO(4-3) and [CI](1-0) lines. The four CO-detected members of LH-SBC3 are distributed over a 180 kpc physical scale, and the entire structure has an estimated halo mass of ~10^13Msun and total star formation rate (SFR) of ~4000Msun/yr. In addition, the most massive galaxy hosts a radio-loud AGN with L_1.4GHz, rest = 3.0*10^25W/Hz. The discovery of LH-SBC3 demonstrates the feasibility of our method to efficiently identify high-z compact groups or forming cluster cores. The existence of these starbursting cluster cores up to z~4 provides critical insights into the mass assembly history of the central massive galaxies in clusters.

Submitted 29 April, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: 10 pages, 8 figures, published by A&A

Journal ref: A&A, 684, A196 (2024)

arXiv:2309.00181 [pdf, other]

doi 10.1145/3576915.3616643

Security Verification of Low-Trust Architectures

Authors: Qinhan Tan, Yonathan Fisseha, Shibo Chen, Lauren Biernacki, Jean-Baptiste Jeannin, Sharad Malik, Todd Austin

Abstract: Low-trust architectures work on, from the viewpoint of software, always-encrypted data, and significantly reduce the amount of hardware trust to a small software-free enclave component. In this paper, we perform a complete formal verification of a specific low-trust architecture, the Sequestered Encryption (SE) architecture, to show that the design is secure against direct data disclosures and digital side channels for all possible programs. We first define the security requirements of the ISA of SE low-trust architecture. Looking upwards, this ISA serves as an abstraction of the hardware for the software, and is used to show how any program comprising these instructions cannot leak information, including through digital side channels. Looking downwards this ISA is a specification for the hardware, and is used to define the proof obligations for any RTL implementation arising from the ISA-level security requirements. These cover both functional and digital side-channel leakage. Next, we show how these proof obligations can be successfully discharged using commercial formal verification tools. We demonstrate the efficacy of our RTL security verification technique for seven different correct and buggy implementations of the SE architecture.

Submitted 31 August, 2023; originally announced September 2023.

Comments: 19 pages with appendix

arXiv:2308.09663 [pdf, other]

GiGaMAE: Generalizable Graph Masked Autoencoder via Collaborative Latent Space Reconstruction

Authors: Yucheng Shi, Yushun Dong, Qiaoyu Tan, Jundong Li, Ninghao Liu

Abstract: Self-supervised learning with masked autoencoders has recently gained popularity for its ability to produce effective image or textual representations, which can be applied to various downstream tasks without retraining. However, we observe that the current masked autoencoder models lack good generalization ability on graph data. To tackle this issue, we propose a novel graph masked autoencoder framework called GiGaMAE. Unlike existing masked autoencoders that learn node representations by explicitly reconstructing the original graph components (e.g., features or edges), in this paper, we propose to collaboratively reconstruct informative and integrated latent embeddings. By considering embeddings encompassing graph topology and attribute information as reconstruction targets, our model could capture more generalized and comprehensive knowledge. Furthermore, we introduce a mutual information based reconstruction loss that enables the effective reconstruction of multiple targets. This learning objective allows us to differentiate between the exclusive knowledge learned from a single target and common knowledge shared by multiple targets. We evaluate our method on three downstream tasks with seven datasets as benchmarks. Extensive experiments demonstrate the superiority of GiGaMAE against state-of-the-art baselines. We hope our results will shed light on the design of foundation models on graph-structured data. Our code is available at: https://github.com/sycny/GiGaMAE.

Submitted 18 August, 2023; originally announced August 2023.

Comments: Accepted by CIKM 2023

arXiv:2308.07231 [pdf, other]

Large-scale environment mapping and immersive human-robot interaction for agricultural mobile robot teleoperation

Authors: Tao Liu, Baohua Zhang, Qianqiu Tan

Abstract: Remote operation is a crucial solution to problems encountered in agricultural machinery operations. However, traditional video streaming control methods fall short in overcoming the challenges of single perspective views and the inability to obtain 3D information. In light of these issues, our research proposes a large-scale digital map reconstruction and immersive human-machine remote control framework for agricultural scenarios. In our methodology, a DJI unmanned aerial vehicle (UAV) was utilized for data collection, and a novel video segmentation approach based on feature points was introduced. To tackle texture richness variability, an enhanced Structure from Motion (SfM) using superpixel segmentation was implemented. This method integrates the open Multiple View Geometry (openMVG) framework along with Local Features from Transformers (LoFTR). The enhanced SfM results in a point cloud map, which is further processed through Multi-View Stereo (MVS) to generate a complete map model. For control, a closed-loop system utilizing TCP for VR control and positioning of agricultural machinery was introduced. Our system offers a fully visual-based immersive control method, where upon connection to the local area network, operators can utilize VR for immersive remote control. The proposed method enhances both the robustness and convenience of the reconstruction process, thereby significantly facilitating operators in acquiring more comprehensive on-site information and engaging in immersive remote control operations. The code is available at: https://github.com/LiuTao1126/Enhance-SFM

Submitted 1 March, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

Showing 1–50 of 216 results for author: Tan, Q