How coronal mass ejections are influenced by the morphology and toroidal flux of their source magnetic flux ropes?

Authors: J. H. Guo, L. Linan, S. Poedts, Y. Guo, B. Schmieder, A. Lani, Y. W. Ni, M. Brchnelova, B. Perri, T. Baratashvili, S. T. Li, P. F. Chen

Abstract: Coronal mass ejections (CMEs) stand as intense eruptions of magnetized plasma from the Sun, playing a pivotal role in driving significant changes in the heliospheric environment. Deducing the properties of CMEs from their progenitors in solar source regions is crucial for space weather forecasting. The primary objective of this paper is to establish a connection between CMEs and their progenitors in solar source regions, enabling us to infer the magnetic structures of CMEs before their full development. To this end, we create a dataset comprising a magnetic flux rope series with varying projection shapes, sizes and toroidal fluxes, using the Regularized Biot-Savart Laws (RBSL). Thereafter, we simulate the propagation of these flux ropes from the solar surface to a distance of 25$R_{\odot}$ with our global coronal MHD model, COCONUT. Our parametric survey reveals significant impacts of source flux ropes on the consequent CMEs. We find that the projection shape can influence the magnetic structures of CMEs at 20$R_{\odot}$, albeit with minimal impacts on the propagation speed. However, these impacts diminish as source flux ropes become fat. In terms of toroidal flux, our simulation results demonstrate a pronounced correlation with the propagation speed of CMEs, as well as with the success of the eruption. This work builds the bridge between the CMEs in the outer corona and their progenitors in solar source regions. Our parametric survey suggests that the projection shape, cross-section radius and toroidal flux of source flux ropes are crucial parameters in predicting magnetic structures and propagation speed of CMEs, providing valuable insights for space weather prediction.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 11 pages, 10 figures, accepted for publication by A&A

arXiv:2407.06584 [pdf, other]

HiLMa-Res: A General Hierarchical Framework via Residual RL for Combining Quadrupedal Locomotion and Manipulation

Authors: Xiaoyu Huang, Qiayuan Liao, Yiming Ni, Zhongyu Li, Laura Smith, Sergey Levine, Xue Bin Peng, Koushil Sreenath

Abstract: This work presents HiLMa-Res, a hierarchical framework leveraging reinforcement learning to tackle manipulation tasks while performing continuous locomotion using quadrupedal robots. Unlike most previous efforts that focus on solving a specific task, HiLMa-Res is designed to be general for various loco-manipulation tasks that require quadrupedal robots to maintain sustained mobility. The novel design of this framework tackles the challenges of integrating continuous locomotion control and manipulation using legs. It develops an operational space locomotion controller that can track arbitrary robot end-effector (toe) trajectories while walking at different velocities. This controller is designed to be general to different downstream tasks, and therefore, can be utilized in high-level manipulation planning policy to address specific tasks. To demonstrate the versatility of this framework, we utilize HiLMa-Res to tackle several challenging loco-manipulation tasks using a quadrupedal robot in the real world. These tasks span from leveraging state-based policy to vision-based policy, from training purely from the simulation data to learning from real-world data. In these tasks, HiLMa-Res shows better performance than other methods.

Submitted 9 July, 2024; originally announced July 2024.

Comments: IROS 2024

arXiv:2406.15252 [pdf, other]

VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

Authors: Xuan He, Dongfu Jiang, Ge Zhang, Max Ku, Achint Soni, Sherman Siu, Haonan Chen, Abhranil Chandra, Ziyan Jiang, Aaran Arulraj, Kai Wang, Quy Duc Do, Yuansheng Ni, Bohan Lyu, Yaswanth Narsupalli, Rongqi Fan, Zhiheng Lyu, Yuchen Lin, Wenhu Chen

Abstract: Recent years have witnessed great advances in video generation. However, the development of automatic video metrics is lagging significantly behind. None of the existing metrics is able to provide reliable scores over generated videos. The main barrier is the lack of a large-scale human-annotated dataset. In this paper, we release VideoFeedback, the first large-scale dataset containing human-provided multi-aspect scores over 37.6K synthesized videos from 11 existing video generative models. We train VideoScore (initialized from Mantis) based on VideoFeedback to enable automatic video quality assessment. Experiments show that the Spearman correlation between VideoScore and humans can reach 77.1 on VideoFeedback-test, beating the prior best metrics by about 50 points. Further results on the held-out EvalCrafter, GenAI-Bench, and VBench show that VideoScore has consistently much higher correlation with human judges than other metrics. Due to these results, we believe VideoScore can serve as a great proxy for human raters to (1) rate different video models to track progress and (2) simulate fine-grained human feedback in Reinforcement Learning with Human Feedback (RLHF) to improve current video generation models.

Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.
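
The VideoScore abstract above reports agreement with human raters as a Spearman correlation. As a reminder of what that statistic measures, here is a minimal sketch in plain Python: it ranks both score lists (averaging tied ranks) and takes the Pearson correlation of the ranks. The function names and toy scores are illustrative assumptions, not the authors' evaluation code, and the reported 77.1 is presumably the coefficient scaled by 100.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy example (made-up scores): the metric orders the videos exactly as humans do,
# so the rank correlation is perfect even though the raw values differ in scale.
human_scores = [1, 2, 3, 4]
metric_scores = [1, 4, 9, 100]
rho = spearman(human_scores, metric_scores)
```

Because only the ordering matters, a metric can score well on this measure without matching the human rating scale itself.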

arXiv:2406.11168 [pdf, other]

Two-Timescale Optimization Framework for Decentralized Linear-Quadratic Optimal Control

Authors: Lechen Feng, Yuan-Hua Ni, Xuebo Zhang

Abstract: This study investigates a decentralized linear-quadratic optimal control problem, and several approximate separable constrained optimization problems are formulated for the first time based on the selection of sparsity promoting functions. First, for the optimization problem with weighted $\ell_1$ sparsity promoting function, a two-timescale algorithm is adopted that is based on the BSUM (Block Successive Upper-bound Minimization) framework and a differential equation solver. Second, a piecewise quadratic sparsity promoting function is introduced, and the induced optimization problem demonstrates an accelerated convergence rate by performing the same two-timescale algorithm. Finally, the optimization problem with $\ell_0$ sparsity promoting function is considered that is nonconvex and discontinuous, and can be approximated by successive coordinatewise convex optimization problems.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.10318 [pdf, other]

Creating a Lens of Chinese Culture: A Multimodal Dataset for Chinese Pun Rebus Art Understanding

Authors: Tuo Zhang, Tiantian Feng, Yibin Ni, Mengqin Cao, Ruying Liu, Katharine Butler, Yanjun Weng, Mi Zhang, Shrikanth S. Narayanan, Salman Avestimehr

Abstract: Large vision-language models (VLMs) have demonstrated remarkable abilities in understanding everyday content. However, their performance in the domain of art, particularly culturally rich art forms, remains less explored. As a pearl of human wisdom and creativity, art encapsulates complex cultural narratives and symbolism. In this paper, we offer the Pun Rebus Art Dataset, a multimodal dataset for art understanding deeply rooted in traditional Chinese culture. We focus on three primary tasks: identifying salient visual elements, matching elements with their symbolic meanings, and explanations for the conveyed messages. Our evaluation reveals that state-of-the-art VLMs struggle with these tasks, often providing biased and hallucinated explanations and showing limited improvement through in-context learning. By releasing the Pun Rebus Art Dataset, we aim to facilitate the development of VLMs that can better understand and interpret culturally specific content, promoting greater inclusiveness beyond English-based corpora.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.05862 [pdf, other]

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

Authors: Ziqiang Liu, Feiteng Fang, Xi Feng, Xinrun Du, Chenhao Zhang, Zekun Wang, Yuelin Bai, Qixuan Zhao, Liyang Fan, Chengguang Gan, Hongquan Lin, Jiaming Li, Yuansheng Ni, Haihong Wu, Yaswanth Narsupalli, Zhigang Zheng, Chengming Li, Xiping Hu, Ruifeng Xu, Xiaojun Chen, Min Yang, Jiaheng Liu, Ruibo Liu, Wenhao Huang, Ge Zhang, et al. (1 additional author not shown)

Abstract: The rapid advancements in the development of multimodal large language models (MLLMs) have consistently led to new breakthroughs on various benchmarks. In response, numerous challenging and comprehensive benchmarks have been proposed to more accurately assess the capabilities of MLLMs. However, there is a dearth of exploration of the higher-order perceptual capabilities of MLLMs. To fill this gap, we propose the Image Implication understanding Benchmark, II-Bench, which aims to evaluate the model's higher-order perception of images. Through extensive experiments on II-Bench across multiple MLLMs, we have made significant findings. Initially, a substantial gap is observed between the performance of MLLMs and humans on II-Bench. The pinnacle accuracy of MLLMs attains 74.8%, whereas human accuracy averages 90%, peaking at an impressive 98%. Subsequently, MLLMs perform worse on abstract and complex images, suggesting limitations in their ability to understand high-level semantics and capture image details. Finally, it is observed that most models exhibit enhanced accuracy when image sentiment polarity hints are incorporated into the prompts. This observation underscores a notable deficiency in their inherent understanding of image sentiment. We believe that II-Bench will inspire the community to develop the next generation of MLLMs, advancing the journey towards expert artificial general intelligence (AGI). II-Bench is publicly available at https://huggingface.co/datasets/m-a-p/II-Bench.

Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

Comments: 100 pages, 82 figures, add citations

arXiv:2406.04485 [pdf, other]

GenAI Arena: An Open Evaluation Platform for Generative Models

Authors: Dongfu Jiang, Max Ku, Tianle Li, Yuansheng Ni, Shizhuo Sun, Rongqi Fan, Wenhu Chen

Abstract: Generative AI has made remarkable strides to revolutionize fields such as image and video generation. These advancements are driven by innovative algorithms, architecture, and data. However, the rapid proliferation of generative models has highlighted a critical gap: the absence of trustworthy evaluation metrics. Current automatic assessments such as FID, CLIP, FVD, etc., often fail to capture the nuanced quality and user satisfaction associated with generative outputs. This paper proposes an open platform, GenAI-Arena, to evaluate different image and video generative models, where users can actively participate in evaluating these models. By leveraging collective user feedback and votes, GenAI-Arena aims to provide a more democratic and accurate measure of model performance. It covers three arenas for text-to-image generation, text-to-video generation, and image editing respectively. Currently, we cover a total of 27 open-source generative models. GenAI-Arena has been operating for four months, amassing over 6000 votes from the community. We describe our platform, analyze the data, and explain the statistical methods for ranking the models. To further promote the research in building model-based evaluation metrics, we release a cleaned version of our preference data for the three tasks, namely GenAI-Bench. We prompt existing multi-modal models like Gemini and GPT-4o to mimic human voting. We compute the correlation between model voting and human voting to understand their judging abilities. Our results show existing multimodal models are still lagging in assessing the generated visual content: even the best model, GPT-4o, only achieves a Pearson correlation of 0.22 in the quality subscore, and behaves like random guessing in others.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 9 pages, 7 figures

arXiv:2406.02803 [pdf, other]

DRust: Language-Guided Distributed Shared Memory with Fine Granularity, Full Transparency, and Ultra Efficiency

Authors: Haoran Ma, Yifan Qiao, Shi Liu, Shan Yu, Yuanjiang Ni, Qingda Lu, Jiesheng Wu, Yiying Zhang, Miryung Kim, Harry Xu

Abstract: Despite being a powerful concept, distributed shared memory (DSM) has not been made practical due to the extensive synchronization needed between servers to implement memory coherence. This paper shows a practical DSM implementation based on the insight that the ownership model embedded in programming languages such as Rust automatically constrains the order of read and write, providing opportunities for significantly simplifying the coherence implementation if the ownership semantics can be exposed to and leveraged by the runtime. This paper discusses the design and implementation of DistR, a Rust-based DSM system that outperforms the two state-of-the-art DSM systems GAM and Grappa by up to 2.64x and 29.16x in throughput, and scales much better with the number of servers.

Submitted 27 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.02664 [pdf, other]

Discrepancies Between JWST Observations and Simulations of Quenched Massive Galaxies at $z > 3$: A Comparative Study With IllustrisTNG and ASTRID

Authors: Emma Jane Weller, Fabio Pacucci, Yueying Ni, Lars Hernquist, Minjung Park

Abstract: Recent JWST observations have uncovered an unexpectedly large population of massive quiescent galaxies at $z>3$. Using the cosmological simulations IllustrisTNG and ASTRID, we identify analogous galaxies and investigate their abundance, formation, quenching mechanisms, and post-quenching evolution for stellar masses $9.5 < \log_{10}{(M_\star/{\rm M}_\odot)} < 12$. We apply three different quenching definitions and find that both simulations significantly underestimate the comoving number density of quenched massive galaxies at $z \gtrsim 3$ compared to JWST observations by up to $\sim 2$ dex. This fact highlights the necessity for improved physical models of AGN feedback in galaxy formation simulations. In both simulations, the high-$z$ quenched massive galaxies often host overmassive central black holes above the standard $M_{BH}-M_\star$ relation, implying that the AGN feedback plays a crucial role in quenching galaxies in the early Universe. The typical quenching timescales for these galaxies are $\sim 200-600$ Myr. IllustrisTNG primarily employs AGN kinetic feedback, while ASTRID relies on AGN thermal feedback, which is less effective and has a longer quenching timescale. We also study the post-quenching evolution of the high-$z$ massive quiescent galaxies and find that many experience subsequent reactivation of star formation, evolving into primary progenitors of $z=0$ brightest cluster galaxies.

Submitted 4 June, 2024; originally announced June 2024.

Comments: Submitted to The Astrophysical Journal. 13 pages, 12 figures

arXiv:2406.01574 [pdf, other]

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Authors: Yubo Wang, Xueguang Ma, Ge Zhang, Yuansheng Ni, Abhranil Chandra, Shiguang Guo, Weiming Ren, Aaran Arulraj, Xuan He, Ziyan Jiang, Tianle Li, Max Ku, Kai Wang, Alex Zhuang, Rongqi Fan, Xiang Yue, Wenhu Chen

Abstract: In the age of large-scale language models, benchmarks like the Massive Multitask Language Understanding (MMLU) have been pivotal in pushing the boundaries of what AI can achieve in language comprehension and reasoning across diverse domains. However, as models continue to improve, their performance on these benchmarks has begun to plateau, making it increasingly difficult to discern differences in model capabilities. This paper introduces MMLU-Pro, an enhanced dataset designed to extend the mostly knowledge-driven MMLU benchmark by integrating more challenging, reasoning-focused questions and expanding the choice set from four to ten options. Additionally, MMLU-Pro eliminates the trivial and noisy questions in MMLU. Our experimental results show that MMLU-Pro not only raises the challenge, causing a significant drop in accuracy by 16% to 33% compared to MMLU but also demonstrates greater stability under varying prompts. With 24 different prompt styles tested, the sensitivity of model scores to prompt variations decreased from 4-5% in MMLU to just 2% in MMLU-Pro. Additionally, we found that models utilizing Chain of Thought (CoT) reasoning achieved better performance on MMLU-Pro compared to direct answering, which is in stark contrast to the findings on the original MMLU, indicating that MMLU-Pro includes more complex reasoning questions. Our assessments confirm that MMLU-Pro is a more discriminative benchmark to better track progress in the field.

Submitted 23 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.01255 [pdf, other]

On the Nonlinearity of Layer Normalization

Authors: Yunhao Ni, Yuxin Guo, Junlong Jia, Lei Huang

Abstract: Layer normalization (LN) is a ubiquitous technique in deep learning, but our theoretical understanding of it remains elusive. This paper investigates a new theoretical direction for LN, regarding its nonlinearity and representation capacity. We investigate the representation capacity of a network with layerwise composition of linear and LN transformations, referred to as LN-Net. We theoretically show that, given $m$ samples with any label assignment, an LN-Net with only 3 neurons in each layer and $O(m)$ LN layers can correctly classify them. We further show the lower bound of the VC dimension of an LN-Net. The nonlinearity of LN can be amplified by group partition, which is also theoretically demonstrated under mild assumptions and empirically supported by our experiments. Based on our analyses, we consider designing neural architectures by exploiting and amplifying the nonlinearity of LN, and the effectiveness is supported by our experiments.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 42 pages, accepted to ICML 2024
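
The claim that LN is itself a nonlinear map can be checked numerically. The sketch below is an illustration only, not the paper's construction: it assumes the standard LN without learned scale and shift (subtract the mean, divide by the standard deviation, with `eps` set to 0 for exactness) and shows that the output is unchanged when the input is doubled, and that LN of a sum differs from the sum of LNs. A linear map would satisfy neither observation.

```python
import math


def layer_norm(x, eps=0.0):
    """Normalize a vector to zero mean and unit variance (no affine parameters).

    With eps=0 the input must not be constant, otherwise the variance is zero.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


x = [1.0, 2.0, 4.0]
y = [3.0, -1.0, 0.5]

ln_x = layer_norm(x)
ln_2x = layer_norm([2 * v for v in x])

# Scale invariance: LN(2x) equals LN(x), whereas a linear map f would give f(2x) = 2 f(x).
scale_invariant = all(abs(a - b) < 1e-12 for a, b in zip(ln_x, ln_2x))

# Additivity fails: LN(x + y) differs from LN(x) + LN(y).
ln_sum = layer_norm([a + b for a, b in zip(x, y)])
sum_ln = [a + b for a, b in zip(layer_norm(x), layer_norm(y))]
additive = all(abs(a - b) < 1e-9 for a, b in zip(ln_sum, sum_ln))
```

Here `scale_invariant` is true and `additive` is false, which is the elementary sense in which LN contributes nonlinearity even when every other layer is linear.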

arXiv:2405.18203 [pdf, other]

IAPT: Instruction-Aware Prompt Tuning for Large Language Models

Authors: Wei Zhu, Aaron Xuxiang Tian, Congrui Yin, Yuan Ni, Xiaoling Wang, Guotong Xie

Abstract: Soft prompt tuning is a widely studied parameter-efficient fine-tuning method. However, it has a clear drawback: many soft tokens must be inserted into the input sequences to guarantee downstream performance. As a result, soft prompt tuning is less considered than Low-rank adaptation (LoRA) in the large language modeling (LLM) era. In this work, we propose a novel prompt tuning method, Instruction-Aware Prompt Tuning (IAPT), that requires only four soft tokens. First, we install a parameter-efficient soft prompt generator at each Transformer layer to generate idiosyncratic soft prompts for each input instruction. The generated soft prompts can be seen as a semantic summary of the input instructions and can effectively guide the output generation. Second, the soft prompt generators are modules with a bottleneck architecture consisting of a self-attention pooling operation, two linear projections, and an activation function. Pilot experiments show that prompt generators at different Transformer layers require different activation functions. Thus, we propose to learn the idiosyncratic activation functions for prompt generators automatically with the help of rational functions. We have conducted experiments on various tasks, and the experimental results demonstrate that (a) our IAPT method can outperform the recent baselines with comparable tunable parameters. (b) Our IAPT method is more efficient than LoRA under the single-backbone multi-tenant setting.

Submitted 7 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: Accepted by ACL-2024

arXiv:2405.14051 [pdf, ps, other]

A Concentration Inequality for Maximum Mean Discrepancy (MMD)-based Statistics and Its Application in Generative Models

Authors: Yijin Ni, Xiaoming Huo

Abstract: Maximum Mean Discrepancy (MMD) is a probability metric that has found numerous applications in machine learning. In this work, we focus on its application in generative models, including the minimum MMD estimator, Generative Moment Matching Network (GMMN), and Generative Adversarial Network (GAN). In these cases, MMD is part of an objective function in a minimization or min-max optimization problem. Even if its empirical performance is competitive, the consistency and convergence rate analysis of the corresponding MMD-based estimators has yet to be carried out. We propose a uniform concentration inequality for a class of Maximum Mean Discrepancy (MMD)-based estimators, that is, a maximum deviation bound of empirical MMD values over a collection of generated distributions and adversarially learned kernels. Here, our inequality serves as an efficient tool in the theoretical analysis for MMD-based generative models. As elaborating examples, we applied our main result to provide the generalization error bounds for the MMD-based estimators in the context of the minimum MMD estimator and MMD GAN.

Submitted 22 May, 2024; originally announced May 2024.
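
For readers unfamiliar with the quantity this abstract bounds, the following is a minimal sketch of the empirical (biased, V-statistic) squared MMD between two one-dimensional samples. The Gaussian kernel, its bandwidth, and the function names are assumptions chosen for illustration; the paper itself considers collections of generated distributions and adversarially learned kernels, which this sketch does not model.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel on real numbers; the bandwidth is an illustrative choice."""
    return math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2))


def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased empirical estimate of MMD^2 between samples xs and ys.

    MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y),
    where each mean runs over all pairs, including i == j.
    """
    m, n = len(xs), len(ys)
    k_xx = sum(kernel(a, b) for a in xs for b in xs) / (m * m)
    k_yy = sum(kernel(a, b) for a in ys for b in ys) / (n * n)
    k_xy = sum(kernel(a, b) for a in xs for b in ys) / (m * n)
    return k_xx + k_yy - 2 * k_xy


# Toy samples: "near" overlaps with the reference, "far" is shifted away from it.
reference = [0.0, 0.2, -0.1, 0.4]
near = [0.1, 0.3, -0.2, 0.5]
far = [5.0, 5.2, 4.9, 5.4]

mmd_same = mmd_squared(reference, reference)
mmd_near = mmd_squared(reference, near)
mmd_far = mmd_squared(reference, far)
```

In a minimum MMD estimator or MMD GAN, a value like `mmd_far` is what training drives toward zero; the concentration inequality in the abstract concerns how far such empirical values can deviate from their population counterparts.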

arXiv:2405.10343 [pdf, other]

UniCorn: A Unified Contrastive Learning Approach for Multi-view Molecular Representation Learning

Authors: Shikun Feng, Yuyan Ni, Minghao Li, Yanwen Huang, Zhi-Ming Ma, Wei-Ying Ma, Yanyan Lan

Abstract: Recently, a noticeable trend has emerged in developing pre-trained foundation models in the domains of CV and NLP. However, for molecular pre-training, there lacks a universal model capable of effectively applying to various categories of molecular tasks, since existing prevalent pre-training methods exhibit effectiveness for specific types of downstream tasks. Furthermore, the lack of profound understanding of existing pre-training methods, including 2D graph masking, 2D-3D contrastive learning, and 3D denoising, hampers the advancement of molecular foundation models. In this work, we provide a unified comprehension of existing pre-training methods through the lens of contrastive learning. Thus their distinctions lie in clustering different views of molecules, which is shown beneficial to specific downstream tasks. To achieve a complete and general-purpose molecular representation, we propose a novel pre-training framework, named UniCorn, that inherits the merits of the three methods, depicting molecular views in three different levels. SOTA performance across quantum, physicochemical, and biological tasks, along with comprehensive ablation study, validate the universality and effectiveness of UniCorn.

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.07542 [pdf, other]

EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models

Authors: Yunsheng Ni, Chuanjian Liu, Yehui Tang, Kai Han, Yunhe Wang

Abstract: Speculative decoding emerges as a pivotal technique for enhancing the inference speed of Large Language Models (LLMs). Despite recent research aiming to improve prediction efficiency, multi-sample speculative decoding has been overlooked due to varying numbers of accepted tokens within a batch in the verification phase. The vanilla method adds padding tokens in order to ensure that the number of new tokens remains consistent across samples. However, this increases the computational and memory access overhead, thereby reducing the speedup ratio. We propose a novel method that can resolve the issue of inconsistent tokens accepted by different samples without necessitating an increase in memory or computing overhead. Furthermore, our proposed method can handle the situation where the prediction tokens of different samples are inconsistent without the need to add padding tokens. Sufficient experiments demonstrate the efficacy of our method. Our code is available at https://github.com/niyunsheng/EMS-SD.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.02940 [pdf]

Spherulite-enhanced Macroscopic Polarization in Molecular Ferroelectric Films from Vacuum Deposition

Authors: Bibek Tiwari, Yuanyuan Ni, Jackson Savage, Ellen Daugherty, Bharat Giri, Xin Li, Xiaoshan Xu

Abstract: Proton-transfer type molecular ferroelectrics hold great application potential due to their large spontaneous polarizations, high Curie temperatures, and small switching fields. However, it is puzzling that preparation of quasi-2D films with macroscopic ferroelectric behaviors has only been reported in few molecular ferroelectrics. To resolve this puzzle, we studied the effect of microstructures on macroscopic ferroelectric properties of 5,6-Dichloro-2-methylbenzimidazole (DC-MBI) films grown using the low-temperature deposition followed by restrained crystallization (LDRC) method. We revealed a competition between dense spherulites and porous microstructures containing randomly oriented nanograins in as-grown films. Post-growth annealing at moderate temperature promotes the formation of spherulites, which leads to macroscopic ferroelectric polarization switching. These results highlight microstructure density as a critical factor for macroscopic ferroelectric properties, potentially resolving the puzzle of the absence of macroscopic ferroelectric behavior in molecular ferroelectric films. We expect the approach for enhancing microstructure density offered in this work to greatly advance fabrication of quasi-2D molecular ferroelectric films and to unlock their potential in device applications.

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2404.18911 [pdf, other]

Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting

Authors: Fangcheng Liu, Yehui Tang, Zhenhua Liu, Yunsheng Ni, Kai Han, Yunhe Wang

Abstract: Speculative decoding has demonstrated its effectiveness in accelerating the inference of large language models while maintaining a consistent sampling distribution. However, the conventional approach of training a separate draft model to achieve a satisfactory token acceptance rate can be costly. Drawing inspiration from early exiting, we propose a novel self-speculative decoding framework \emph{Kangaroo}, which uses a fixed shallow sub-network as a self-draft model, with the remaining layers serving as the larger target model. We train a lightweight and efficient adapter module on top of the sub-network to bridge the gap between the sub-network and the full model's representation ability. It is noteworthy that the inference latency of the self-draft model may no longer be negligible compared to the large model, necessitating strategies to increase the token acceptance rate while minimizing the drafting steps of the small model. To address this challenge, we introduce an additional early exiting mechanism for generating draft tokens. Specifically, we halt the small model's subsequent prediction during the drafting phase once the confidence level for the current token falls below a certain threshold. Extensive experiments on Spec-Bench demonstrate the effectiveness of Kangaroo. Under single-sequence verification, Kangaroo achieves speedups up to $1.68\times$ on Spec-Bench, outperforming Medusa-1 with 88.7\% fewer additional parameters (67M compared to 591M). The code for Kangaroo is available at https://github.com/Equationliu/Kangaroo.

Submitted 29 April, 2024; originally announced April 2024.
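
The drafting-phase early exit described in the Kangaroo abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `draft_with_early_exit`, the `draft_step` callback, and the default `threshold` and `max_steps` values are all hypothetical, and dropping the low-confidence token (rather than keeping it as the final draft token) is an assumption made here for simplicity.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical callback type: given the tokens so far, return
# (next_token, confidence in [0, 1]) from the small draft model.
DraftStep = Callable[[Sequence[int]], Tuple[int, float]]


def draft_with_early_exit(
    draft_step: DraftStep,
    prefix: Sequence[int],
    threshold: float = 0.6,  # assumed value, not taken from the paper
    max_steps: int = 6,      # assumed value, not taken from the paper
) -> List[int]:
    """Collect draft tokens until confidence drops below the threshold."""
    context = list(prefix)
    drafted: List[int] = []
    for _ in range(max_steps):
        token, confidence = draft_step(context)
        if confidence < threshold:
            # Assumption: the low-confidence token is discarded and
            # drafting stops; the target model takes over from here.
            break
        drafted.append(token)
        context.append(token)
    return drafted


if __name__ == "__main__":
    # Scripted stand-in for a draft model, used only for demonstration.
    script = [(11, 0.9), (12, 0.8), (13, 0.3), (14, 0.95)]
    prompt = [0]
    stub = lambda ctx: script[len(ctx) - len(prompt)]
    print(draft_with_early_exit(stub, prompt))  # prints [11, 12]
```

In a full speculative-decoding loop the returned draft tokens would then be verified by the target model; that verification step is omitted here.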

arXiv:2404.05148 [pdf, other]

Generalized Criterion for Identifiability of Additive Noise Models Using Majorization

Authors: Aramayis Dallakyan, Yang Ni

Abstract: The discovery of causal relationships from observational data is very challenging. Many recent approaches rely on complexity or uncertainty concepts to impose constraints on probability distributions, aiming to identify specific classes of directed acyclic graph (DAG) models. In this paper, we introduce a novel identifiability criterion for DAGs that places constraints on the conditional variances of additive noise models. We demonstrate that this criterion extends and generalizes existing identifiability criteria in the literature that employ (conditional) variances as measures of uncertainty in (conditional) distributions. For linear Structural Equation Models, we present a new algorithm that leverages the concept of weak majorization applied to the diagonal elements of the Cholesky factor of the covariance matrix to learn a topological ordering of variables. Through extensive simulations and the analysis of bank connectivity data, we provide evidence of the effectiveness of our approach in successfully recovering DAGs. The code for reproducing the results in this paper is available in Supplementary Materials.

Submitted 7 April, 2024; originally announced April 2024.
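
For readers unfamiliar with the term used in the abstract above, weak majorization can be checked in a few lines. The sketch below implements only the standard textbook definition (the weak submajorization convention: every partial sum of the descending-sorted first vector is at least the matching partial sum of the second). The function name `weakly_majorizes` is hypothetical, and how the paper applies the relation to the Cholesky diagonal is not stated in the abstract and is not reproduced here.

```python
from itertools import accumulate
from typing import Sequence


def weakly_majorizes(x: Sequence[float], y: Sequence[float]) -> bool:
    """Return True if x weakly majorizes y (weak submajorization)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Sort both vectors in descending order.
    xs = sorted(x, reverse=True)
    ys = sorted(y, reverse=True)
    # Compare running sums of the k largest entries for every k.
    return all(a >= b for a, b in zip(accumulate(xs), accumulate(ys)))


if __name__ == "__main__":
    print(weakly_majorizes([4, 0], [2, 2]))  # prints True
    print(weakly_majorizes([2, 2], [3, 0]))  # prints False
```

The comparison is exact, so floating-point inputs that are equal only up to rounding may need a small tolerance.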

arXiv:2404.04949 [pdf, other]

SilverSight: A Multi-Task Chinese Financial Large Language Model Based on Adaptive Semantic Space Learning

Authors: Yuhang Zhou, Zeping Li, Siyu Tian, Yuchen Ni, Sen Liu, Guangnan Ye, Hongfeng Chai

Abstract: Large language models (LLMs) are increasingly being applied across various specialized fields, leveraging their extensive knowledge to empower a multitude of scenarios within these domains. However, each field encompasses a variety of specific tasks that require learning, and the diverse, heterogeneous data across these domains can lead to conflicts during model task transfer. In response to this challenge, our study introduces an Adaptive Semantic Space Learning (ASSL) framework, which utilizes the adaptive reorganization of data distributions within the semantic space to enhance the performance and selection efficacy of multi-expert models. Utilizing this framework, we trained a financial multi-task LLM named "SilverSight". Our research findings demonstrate that our framework can achieve results close to those obtained with full data training using only 10% of the data, while also exhibiting strong generalization capabilities.

Submitted 7 April, 2024; originally announced April 2024.

Comments: 17 pages, 17 figures

arXiv:2404.00521 [pdf, other]

CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization

Authors: Yao Ni, Piotr Koniusz

Abstract: Generative Adversarial Networks (GANs) have significantly advanced image generation, but their performance heavily depends on abundant training data. In scenarios with limited data, GANs often struggle with discriminator overfitting and unstable training. Batch Normalization (BN), despite being known for enhancing generalization and training stability, has rarely been used in the discriminator of Data-Efficient GANs. Our work addresses this gap by identifying a critical flaw in BN: the tendency for gradient explosion during the centering and scaling steps. To tackle this issue, we present CHAIN (lipsCHitz continuity constrAIned Normalization), which replaces the conventional centering step with zero-mean regularization and integrates a Lipschitz continuity constraint in the scaling step. CHAIN further enhances GAN training by adaptively interpolating the normalized and unnormalized features, effectively avoiding discriminator overfitting. Our theoretical analyses firmly establish CHAIN's effectiveness in reducing gradients in latent features and weights, improving stability and generalization in GAN training. Empirical evidence supports our theory. CHAIN achieves state-of-the-art results in data-limited scenarios on CIFAR-10/100, ImageNet, five low-shot and seven high-resolution few-shot image datasets. Code: https://github.com/MaxwellYaoNi/CHAIN

Submitted 1 June, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

Comments: Accepted by CVPR 2024. 26 pages. Improve Lemma 3.1 - Prop. 3.1 logic flow. Code: https://github.com/MaxwellYaoNi/CHAIN

arXiv:2403.17372 [pdf, other]

An Empirical Study of Training ID-Agnostic Multi-modal Sequential Recommenders

Authors: Youhua Li, Hanwen Du, Yongxin Ni, Yuanqi He, Junchen Fu, Xiangyan Liu, Qi Guo

Abstract: Sequential Recommendation (SR) aims to predict future user-item interactions based on historical interactions. While many SR approaches concentrate on user IDs and item IDs, the human perception of the world through multi-modal signals, like text and images, has inspired researchers to delve into constructing SR from multi-modal information without using IDs. However, the complexity of multi-modal learning manifests in diverse feature extractors, fusion methods, and pre-trained models. Consequently, designing a simple and universal Multi-Modal Sequential Recommendation (MMSR) framework remains a formidable challenge. We systematically summarize the existing multi-modal related SR methods and distill the essence into four core components: visual encoder, text encoder, multimodal fusion module, and sequential architecture. Along these dimensions, we dissect the model designs and answer the following sub-questions: First, we explore how to construct MMSR from scratch, ensuring its performance is on par with or exceeds existing SR methods without complex techniques. Second, we examine if MMSR can benefit from existing multi-modal pre-training paradigms. Third, we assess MMSR's capability in tackling common challenges like cold start and domain transferring. Our experimental results across four real-world recommendation scenarios demonstrate the great potential of ID-agnostic multi-modal sequential recommendation. Our framework can be found at: https://github.com/MMSR23/MMSR.

Submitted 30 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.16797 [pdf, other]

Privacy Preservation by Intermittent Transmission in Cooperative LQG Control Systems

Authors: Wenhao Lin, Yuqing Ni, Wen Yang, Chao Yang

Abstract: In this paper, we study a cooperative linear quadratic Gaussian (LQG) control system with a single user and a server. In this system, the user runs a process and employs the server to meet its computation needs. However, the user regards its state trajectories as private. Therefore, we propose a privacy scheme in which the user sends data to the server intermittently. Under this scheme, the information the server receives about the user is reduced, and consequently the user's privacy is preserved. In this paper, we consider a periodic transmission scheme. We analyze the performance of privacy preservation and LQG control for different transmission periods. Under a given threshold on the control performance loss, a trade-off optimization problem is proposed. Finally, we give the solution to the optimization problem.

Submitted 28 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.15983 [pdf, other]

Bayesian segmented Gaussian copula factor model for single-cell sequencing data

Authors: Junsouk Choi, Hee Cheol Chung, Irina Gaynanova, Yang Ni

Abstract: Single-cell sequencing technologies have significantly advanced molecular and cellular biology, offering unprecedented insights into cellular heterogeneity by allowing for the measurement of gene expression at an individual cell level. However, the analysis of such data is challenged by the prevalence of low counts due to dropout events and the skewed nature of the data distribution, which conventional Gaussian factor models struggle to handle effectively. To address these challenges, we propose a novel Bayesian segmented Gaussian copula model to explicitly account for inflation of zero and near-zero counts, and to address the high skewness in the data. By employing a Dirichlet-Laplace prior for each column of the factor loadings matrix, we shrink the loadings of unnecessary factors towards zero, which leads to a simple approach to automatically determine the number of latent factors, and resolve the identifiability issue inherent in factor models due to the rotational invariance of the factor loadings matrix. Through simulation studies, we demonstrate the superior performance of our method over existing approaches in conducting factor analysis on data exhibiting the characteristics of single-cell data, such as excessive low counts and high skewness. Furthermore, we apply the proposed method to a real single-cell RNA-sequencing dataset from a lymphoblastoid cell line, successfully identifying biologically meaningful latent factors and detecting previously uncharacterized cell subtypes.

Submitted 23 March, 2024; originally announced March 2024.

arXiv:2403.14027 [pdf, other]

EcoSense: Energy-Efficient Intelligent Sensing for In-Shore Ship Detection through Edge-Cloud Collaboration

Authors: Wenjun Huang, Hanning Chen, Yang Ni, Arghavan Rezvani, Sanggeon Yun, Sungheon Jeon, Eric Pedley, Mohsen Imani

Abstract: Detecting marine objects inshore presents challenges owing to algorithmic intricacies and complexities in system deployment. We propose a difficulty-aware edge-cloud collaborative sensing system that splits the task into object localization and fine-grained classification. Objects are classified either at the edge or within the cloud, based on their estimated difficulty. The framework comprises a low-power device-tailored front-end model for object localization, classification, and difficulty estimation, along with a transformer-graph convolutional network-based back-end model for fine-grained classification. Our system demonstrates superior performance (mAP@0.5 +4.3%) on widely used marine object detection datasets, significantly reducing both data transmission volume (by 95.43%) and energy consumption (by 72.7%) at the system level. We validate the proposed system across various embedded system platforms and in real-world scenarios involving drone deployment.

Submitted 26 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.12987 [pdf, other]

Rethinking Specificity in SBDD: Leveraging Delta Score and Energy-Guided Diffusion

Authors: Bowen Gao, Minsi Ren, Yuyan Ni, Yanwen Huang, Bo Qiang, Zhi-Ming Ma, Wei-Ying Ma, Yanyan Lan

Abstract: In the field of Structure-based Drug Design (SBDD), deep learning-based generative models have achieved outstanding performance in terms of docking score. However, further study shows that both the existing molecular generative methods and docking scores lack consideration of specificity, which means that generated molecules bind to almost every protein pocket with high affinity. To address this, we introduce the Delta Score, a new metric for evaluating the specificity of molecular binding. To further incorporate this insight for generation, we develop an innovative energy-guided approach using contrastive learning, with active compounds as decoys, to direct generative models toward creating molecules with high specificity. Our empirical results show that this method not only enhances the delta score but also maintains or improves traditional docking scores, successfully bridging the gap between SBDD and real-world needs.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.10648 [pdf, other]

Debiasing with Diffusion: Probabilistic reconstruction of Dark Matter fields from galaxies with CAMELS

Authors: Victoria Ono, Core Francisco Park, Nayantara Mudur, Yueying Ni, Carolina Cuesta-Lazaro, Francisco Villaescusa-Navarro

Abstract: Galaxies are biased tracers of the underlying cosmic web, which is dominated by dark matter components that cannot be directly observed. Galaxy formation simulations can be used to study the relationship between dark matter density fields and galaxy distributions. However, this relationship can be sensitive to assumptions in cosmology and astrophysical processes embedded in the galaxy formation models, which remain uncertain in many aspects. In this work, we develop a diffusion generative model to reconstruct dark matter fields from galaxies. The diffusion model is trained on the CAMELS simulation suite that contains thousands of state-of-the-art galaxy formation simulations with varying cosmological parameters and sub-grid astrophysics. We demonstrate that the diffusion model can predict the unbiased posterior distribution of the underlying dark matter fields from the given stellar mass fields, while being able to marginalize over uncertainties in cosmological and astrophysical models. Interestingly, the model generalizes to simulation volumes approximately 500 times larger than those it was trained on, and across different galaxy formation models. Code for reproducing these results can be found at https://github.com/victoriaono/variational-diffusion-cdm

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.08108 [pdf, other]

TaskCLIP: Extend Large Vision-Language Model for Task Oriented Object Detection

Authors: Hanning Chen, Wenjun Huang, Yang Ni, Sanggeon Yun, Fei Wen, Hugo Latapie, Mohsen Imani

Abstract: Task-oriented object detection aims to find objects suitable for accomplishing specific tasks. As a challenging task, it requires simultaneous visual data processing and reasoning under ambiguous semantics. Recent solutions are mainly all-in-one models. However, the object detection backbones are pre-trained without text supervision. Thus, to incorporate task requirements, their intricate models undergo extensive learning on a highly imbalanced and scarce dataset, resulting in capped performance, laborious training, and poor generalizability. In contrast, we propose TaskCLIP, a more natural two-stage design composed of general object detection and task-guided object selection. Particularly for the latter, we resort to the recently successful large Vision-Language Models (VLMs) as our backbone, which provides rich semantic knowledge and a uniform embedding space for images and texts. Nevertheless, the naive application of VLMs leads to sub-optimal quality, due to the misalignment between embeddings of object images and their visual attributes, which are mainly adjective phrases. To this end, we design a transformer-based aligner after the pre-trained VLMs to re-calibrate both embeddings. Finally, we employ a trainable score function to post-process the VLM matching results for object selection. Experimental results demonstrate that our TaskCLIP outperforms the state-of-the-art DETR-based model TOIST by 3.5% and only requires a single NVIDIA RTX 4090 for both training and inference.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.05763 [pdf, other]

HDReason: Algorithm-Hardware Codesign for Hyperdimensional Knowledge Graph Reasoning

Authors: Hanning Chen, Yang Ni, Ali Zakeri, Zhuowen Zou, Sanggeon Yun, Fei Wen, Behnam Khaleghi, Narayan Srinivasa, Hugo Latapie, Mohsen Imani

Abstract: In recent times, a plethora of hardware accelerators have been put forth for graph learning applications such as vertex classification and graph classification. However, previous works have paid little attention to Knowledge Graph Completion (KGC), a task that is well-known for its significantly higher algorithm complexity. The state-of-the-art KGC solutions based on graph convolution neural network (GCN) involve extensive vertex/relation embedding updates and complicated score functions, which are inherently cumbersome for acceleration. As a result, existing accelerator designs are no longer optimal, and a novel algorithm-hardware co-design for KG reasoning is needed. Recently, brain-inspired HyperDimensional Computing (HDC) has been introduced as a promising solution for lightweight machine learning, particularly for graph learning applications. In this paper, we leverage HDC for an intrinsically more efficient and acceleration-friendly KGC algorithm. We also co-design an acceleration framework named HDReason targeting FPGA platforms. On the algorithm level, HDReason achieves a balance between high reasoning accuracy, strong model interpretability, and lower computation complexity. In terms of architecture, HDReason offers reconfigurability, high training throughput, and low energy consumption. When compared with an NVIDIA RTX 4090 GPU, the proposed accelerator achieves an average 10.6x speedup and 65x energy efficiency improvement. In cross-model and cross-platform comparisons, HDReason yields an average 4.2x higher performance and 3.4x better energy efficiency with similar accuracy versus the state-of-the-art FPGA-based GCN training platform.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.03944 [pdf, other]

MR.RGM: An R Package for Fitting Bayesian Multivariate Bidirectional Mendelian Randomization Networks

Authors: Bitan Sarkar, Yang Ni

Abstract: Motivation: Mendelian randomization (MR) infers causal relationships between exposures and outcomes using genetic variants as instrumental variables. Typically, MR considers only a pair of exposure and outcome at a time, limiting its capability of capturing the entire causal network. We overcome this limitation by developing 'MR.RGM' (Mendelian randomization via reciprocal graphical model), a fast R-package that implements the Bayesian reciprocal graphical model and enables practitioners to construct holistic causal networks with possibly cyclic/reciprocal causation and proper uncertainty quantifications, offering a comprehensive understanding of complex biological systems and their interconnections. Results: We developed 'MR.RGM', an open-source R package that applies bidirectional MR using a network-based strategy, enabling the exploration of causal relationships among multiple variables in complex biological systems. 'MR.RGM' holds the promise of unveiling intricate interactions and advancing our understanding of genetic networks, disease risks, and phenotypic complexities.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2402.13779 [pdf, other]

Contextual Molecule Representation Learning from Chemical Reaction Knowledge

Authors: Han Tang, Shikun Feng, Bicheng Lin, Yuyan Ni, Jingjing Liu, Wei-Ying Ma, Yanyan Lan

Abstract: In recent years, self-supervised learning has emerged as a powerful tool to harness abundant unlabelled data for representation learning and has been broadly adopted in diverse areas. However, when applied to molecular representation learning (MRL), prevailing techniques such as masked sub-unit reconstruction often fall short, due to the high degree of freedom in the possible combinations of atoms within molecules, which brings insurmountable complexity to the masking-reconstruction paradigm. To tackle this challenge, we introduce REMO, a self-supervised learning framework that takes advantage of well-defined atom-combination rules in common chemistry. Specifically, REMO pre-trains graph/Transformer encoders on 1.7 million known chemical reactions in the literature. We propose two pre-training objectives: Masked Reaction Centre Reconstruction (MRCR) and Reaction Centre Identification (RCI). REMO offers a novel solution to MRL by exploiting the underlying shared patterns in chemical reactions as context for pre-training, which effectively infers meaningful representations of common chemistry knowledge. Such contextual representations can then be utilized to support diverse downstream molecular tasks with minimal fine-tuning, such as affinity prediction and drug-drug interaction prediction. Extensive experimental results on MoleculeACE, ACNet, drug-drug interaction (DDI), and reaction type classification show that across all tested downstream tasks, REMO outperforms the standard baseline of single-molecule masked modeling used in current MRL. Remarkably, REMO is the pioneering deep learning model surpassing fingerprint-based methods in activity cliff benchmarks.

Submitted 21 February, 2024; originally announced February 2024.

Comments: Preprint. Under Review

arXiv:2402.12713 [pdf, ps, other]

Are LLMs Rational Investors? A Study on Detecting and Reducing the Financial Bias in LLMs

Authors: Yuhang Zhou, Yuchen Ni, Yunhui Gan, Zhangyue Yin, Xiang Liu, Jian Zhang, Sen Liu, Xipeng Qiu, Guangnan Ye, Hongfeng Chai

Abstract: Large Language Models (LLMs) are increasingly adopted in financial analysis for interpreting complex market data and trends. However, their use is challenged by intrinsic biases (e.g., risk-preference bias) and a superficial understanding of market intricacies, necessitating a thorough assessment of their financial insight. To address these issues, we introduce Financial Bias Indicators (FBI), a framework with components like Bias Unveiler, Bias Detective, Bias Tracker, and Bias Antidote to identify, detect, analyze, and eliminate irrational biases in LLMs. By combining behavioral finance principles with bias examination, we evaluate 23 leading LLMs and propose a de-biasing method based on financial causal knowledge. Results show varying degrees of financial irrationality among models, influenced by their design and training. Models trained specifically on financial datasets may exhibit more irrationality, and even larger financial language models (FinLLMs) can show more bias than smaller, general models. We utilize four prompt-based methods incorporating causal debiasing, effectively reducing financial biases in these models. This work enhances the understanding of LLMs' bias in financial applications, laying the foundation for developing more reliable and rational financial analysis tools.

Submitted 1 July, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.11223 [pdf, other]

HEAL: Brain-inspired Hyperdimensional Efficient Active Learning

Authors: Yang Ni, Zhuowen Zou, Wenjun Huang, Hanning Chen, William Youngwoo Chung, Samuel Cho, Ranganath Krishnan, Pietro Mercati, Mohsen Imani

Abstract: Drawing inspiration from the outstanding learning capability of our human brains, Hyperdimensional Computing (HDC) emerges as a novel computing paradigm, and it leverages high-dimensional vector presentation and operations for brain-like lightweight Machine Learning (ML). Practical deployments of HDC have significantly enhanced the learning efficiency compared to current deep ML methods on a broad spectrum of applications. However, boosting the data efficiency of HDC classifiers in supervised learning remains an open question. In this paper, we introduce Hyperdimensional Efficient Active Learning (HEAL), a novel Active Learning (AL) framework tailored for HDC classification. HEAL proactively annotates unlabeled data points via uncertainty and diversity-guided acquisition, leading to a more efficient dataset annotation and lowering labor costs. Unlike conventional AL methods that only support classifiers built upon deep neural networks (DNN), HEAL operates without the need for gradient or probabilistic computations. This allows it to be effortlessly integrated with any existing HDC classifier architecture. The key design of HEAL is a novel approach for uncertainty estimation in HDC classifiers through a lightweight HDC ensemble with prior hypervectors. Additionally, by exploiting hypervectors as prototypes (i.e., compact representations), we develop an extra metric for HEAL to select diverse samples within each batch for annotation. Our evaluation shows that HEAL surpasses a diverse set of baselines in AL quality and achieves notably faster acquisition than many BNN-powered or diversity-guided AL methods, recording 11 times to 40,000 times speedup in acquisition runtime per batch.

Submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.06079 [pdf, other]

DiscDiff: Latent Diffusion Model for DNA Sequence Generation

Authors: Zehui Li, Yuhao Ni, William A V Beardall, Guoxuan Xia, Akashaditya Das, Guy-Bart Stan, Yiren Zhao

Abstract: This paper introduces a novel framework for DNA sequence generation, comprising two key components: DiscDiff, a Latent Diffusion Model (LDM) tailored for generating discrete DNA sequences, and Absorb-Escape, a post-training algorithm designed to refine these sequences. Absorb-Escape enhances the realism of the generated sequences by correcting `round errors' inherent in the conversion process between latent and input spaces. Our approach not only sets new standards in DNA sequence generation but also demonstrates superior performance over existing diffusion models, in generating both short and long DNA sequences. Additionally, we introduce EPD-GenDNA, the first comprehensive, multi-species dataset for DNA generation, encompassing 160,000 unique sequences from 15 species. We hope this study will advance the generative modelling of DNA, with potential implications for gene therapy and protein production.

Submitted 17 April, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: Different from the prior work "Latent Diffusion Model for DNA Sequence Generation" (arXiv:2310.06150), we updated the evaluation framework and compared the DiscDiff with other methods comprehensively. In addition, a post-training framework is proposed to increase the quality of generated sequences

arXiv:2402.03584 [pdf, other]

Helium-deficient ER UMa-type dwarf nova below the period minimum with a hot secondary

Authors: Youngdae Lee, Dae-Sik Moon, Sang Chul Kim, Hong Soo Park, Yuan Qi Ni

Abstract: We present the discovery of a peculiar dwarf nova KSP-OT-201712a using high-cadence, multi-color observations made with the Korea Microlensing Telescope Network. KSP-OT-201712a exhibits a rare presence of outbursts during standstills as well as strong H$α$ emission for a dwarf nova below the period minimum with an orbital period of 58.75 $\pm$ 0.02 minutes. The outburst cycles are ~ 6.6 days within standstills but increase to ~ 15 days outside of them. Both B-V and V-I colors become bluer and redder as the outburst luminosities increase and decrease, respectively, for the outburst within standstill, while they evolve in the opposite directions outside of the standstills. The presence of strong double-peaked H$α$ and weak He I emission lines with He/H flux ratio of 0.27, together with absorption lines of Mg b and Na D in the source, leads to the estimation Teff ~ 4570 $\pm$ 40 K, [Fe/H] ~ 0.06 $\pm$ 0.15 dex, and log g ~ 4.5 $\pm$ 0.1 for its secondary. KSP-OT-201712a is the second He-deficient dwarf nova below the period minimum, while the temperature of the secondary is measured for the first time in such objects. We identify it to be an ER UMa type dwarf nova suggesting that the evolution of dwarf novae across the period minimum is accompanied by large mass transfers. The high temperature of the secondary indicates that the system started its mass transfer when the secondary was about 93$\%$ of its main sequence age. The system will evolve to a helium cataclysmic variable or to AM CVn once its hydrogen envelope is exhausted before it explodes as a Type Ia supernova.

Submitted 5 February, 2024; originally announced February 2024.

Comments: 12 pages, 5 figures, accepted for publication in ApJ

arXiv:2402.02791 [pdf, other]

Rethinking Optimization and Architecture for Tiny Language Models

Authors: Yehui Tang, Fangcheng Liu, Yunsheng Ni, Yuchuan Tian, Zheyuan Bai, Yi-Qi Hu, Sichao Liu, Shangling Jui, Kai Han, Yunhe Wang

Abstract: The power of large language models (LLMs) has been demonstrated through numerous data and computing resources. However, the application of language models on mobile devices is facing huge challenge on the computation and memory costs, that is, tiny language models with high performance are urgently required. Limited by the highly complex training process, there are many details for optimizing language models that are seldom studied carefully. In this study, based on a tiny language model with 1B parameters, we carefully design a series of empirical study to analyze the effect of each component. Three perspectives are mainly discussed, i.e., neural architecture, parameter initialization, and optimization strategy. Several design formulas are empirically proved especially effective for tiny language models, including tokenizer compression, architecture tweaking, parameter inheritance and multiple-round training. Then we train PanGu-$π$-1B Pro and PanGu-$π$-1.5B Pro on 1.6T multilingual corpora, following the established formulas. Experimental results demonstrate the improved optimization and architecture yield a notable average improvement of 8.87 on benchmark evaluation sets for PanGu-$π$-1B Pro. Besides, PanGu-$π$-1.5B Pro surpasses a range of SOTA models with larger model sizes, validating its superior performance. The code is available at https://github.com/YuchuanTian/RethinkTinyLM.

Submitted 6 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

arXiv:2402.02043 [pdf, other]

A Plug-in Tiny AI Module for Intelligent and Selective Sensor Data Transmission

Authors: Wenjun Huang, Arghavan Rezvani, Hanning Chen, Yang Ni, Sanggeon Yun, Sungheon Jeong, Mohsen Imani

Abstract: Applications in the Internet of Things (IoT) utilize machine learning to analyze sensor-generated data. However, a major challenge lies in the lack of targeted intelligence in current sensing systems, leading to vast data generation and increased computational and communication costs. To address this challenge, we propose a novel sensing module to equip sensing frameworks with intelligent data transmission capabilities by integrating a highly efficient machine learning model placed near the sensor. This model provides prompt feedback for the sensing system to transmit only valuable data while discarding irrelevant information by regulating the frequency of data transmission. The near-sensor model is quantized and optimized for real-time sensor control. To enhance the framework's performance, the training process is customized and a "lazy" sensor deactivation strategy utilizing temporal information is introduced. The suggested method is orthogonal to other IoT frameworks and can be considered as a plugin for selective data transmission. The framework is implemented, encompassing both software and hardware components. The experiments demonstrate that the framework utilizing the suggested module achieves over 85% system efficiency in terms of energy consumption and storage, with negligible impact on performance. This methodology has the potential to significantly reduce data output from sensors, benefiting a wide range of IoT applications.

Submitted 3 February, 2024; originally announced February 2024.

Comments: 14 pages, 6 figures

arXiv:2402.00395 [pdf, other]

ONE-SA: Enabling Nonlinear Operations in Systolic Arrays for Efficient and Flexible Neural Network Inference

Authors: Ruiqi Sun, Yinchen Ni, Xin He, Jie Zhao, An Zou

Abstract: The computation and memory-intensive nature of DNNs limits their use in many mobile and embedded contexts. Application-specific integrated circuit (ASIC) hardware accelerators employ matrix multiplication units (such as the systolic arrays) and dedicated nonlinear function units to speed up DNN computations. A close examination of these ASIC accelerators reveals that the designs are often specialized and lack versatility across different networks, especially when the networks have different types of computation. In this paper, we introduce a novel systolic array architecture, which is capable of executing nonlinear functions. By encompassing both inherent linear and newly enabled nonlinear functions within the systolic arrays, the proposed architecture facilitates versatile network inferences, substantially enhancing computational power and energy efficiency. Experimental results show that employing this systolic array enables seamless execution of entire DNNs, incurring only a negligible loss in the network inference accuracy. Furthermore, assessment and evaluation with FPGAs reveal that integrating nonlinear computation capacity into a systolic array does not introduce notable extra (less than 1.5%) block RAMs (BRAMs), look-up tables (LUTs), or digital signal processors (DSPs), but a mere 13.3% - 24.1% more flip-flops (FFs). In comparison to existing methodologies, executing the networks with the proposed systolic array, which enables the flexibility of different network models, yields up to 25.73x, 5.21x, and 1.54x computational efficiency when compared to general-purpose CPUs, GPUs, and SoCs respectively, while achieving comparable (83.4% - 135.8%) performance with the conventional accelerators which are designed for specific neural network models.

Submitted 1 February, 2024; originally announced February 2024.

Comments: Accepted to DATE 2024

arXiv:2401.16608 [pdf, other]

The evolution of galaxy morphology from redshift z=6 to 3: Mock JWST observations of galaxies in the ASTRID simulation

Authors: Patrick LaChance, Rupert Croft, Yueying Ni, Nianyi Chen, Tiziana Di Matteo, Simeon Bird

Abstract: We present mock JWST observations for more than 215,000 different galaxies from the Astrid simulation with $3 \leq z \leq 6$. The mock observations are made using the BPASS stellar SED model, and a simple dust model. They are then viewed through NIRCam filters, convolved with a PSF, have noise added, and are drizzled together to emulate the Cosmic Evolution Early Release Science (CEERS) survey. We analyse this dataset by computing a number of morphological measures and find our catalog to have comparable statistics to similar mock catalogs, and the first release of CEERS data. We find that most of the Sersic indices of galaxies in our redshift range are lower than observed, with most having n less than one. Additionally, we observe the sizes of galaxies of all masses to increase from redshift z=6 to redshift z=3 consistent with other results. The number of galaxies in our catalog allows us to examine how relationships like the mass-size relation evolve with redshift, and compare the accuracy of a variety of traditional galaxy classification techniques (Sersic fit, Asymmetry-Concentration, and Gini-$M_{20}$) within our redshift range. We find the mass-size relation to be nearly flat at redshift z=6, and consistently increases as redshift decreases, and find the galaxy classification methods have minimal correlation with each other in our redshift range. We also investigate the impact that different stages of our imaging pipeline have on these morphological measures to determine how robust mock catalogs are to different choices at each step. Finally, we test the addition of incorporating light from AGNs into our pipeline and find that while the population of galaxies that have significant AGN luminosity is low, those galaxies do tend to have higher Sersic indices once the AGN luminosity is added, rectifying some of the systematic bias towards lower Sersic indices present in our dataset.

Submitted 29 January, 2024; originally announced January 2024.

Comments: 17 pages, 14 figures

arXiv:2401.10400 [pdf, other]

Auto-Calibration and Biconvex Compressive Sensing with Applications to Parallel MRI

Authors: Yuan Ni, Thomas Strohmer

Abstract: We study an auto-calibration problem in which a transform-sparse signal is compressive-sensed by multiple sensors in parallel with unknown sensing parameters. The problem has an important application in pMRI reconstruction, where explicit coil calibrations are often difficult and costly to achieve in practice, but nevertheless a fundamental requirement for high-precision reconstructions. Most auto-calibrated strategies result in reconstruction that corresponds to solving a challenging biconvex optimization problem. We transform the auto-calibrated parallel sensing into a convex optimization problem using the idea of `lifting'. By exploiting sparsity structures in the signal and the redundancy introduced by multiple sensors, we solve a mixed-norm minimization problem to recover the underlying signal and the sensing parameters simultaneously. Robust and stable recovery guarantees are derived in the presence of noise and sparsity deficiencies in the signals. For the pMRI application, our method provides a theoretically guaranteed approach to self-calibrated parallel imaging to accelerate MRI acquisitions under appropriate assumptions. Developments in MRI are discussed, and numerical simulations using the analytical phantom and simulated coil sensitivities are presented to support our theoretical results.

Submitted 18 January, 2024; originally announced January 2024.

Comments: Keywords: Self-calibration, Compressive sensing, Convex optimization, Random matrices, Parallel MRI

arXiv:2401.08914 [pdf, other]

Simulated host galaxy analogs of high-z quasars observed with JWST

Authors: Sabrina Berger, Madeline A. Marshall, J. Stuart B. Wyithe, Tiziana di Matteo, Yueying Ni, Stephen M. Wilkins

Abstract: The hosts of two low-luminosity high-z quasars, J2255+0251 and J2236+0032, were recently detected using JWST's NIRCam instrument. These represent the first high-z quasar host galaxy stellar detections and open a new window into studying high-z quasars. We examine the implications of the measured properties of J2255+0251 and J2236+0032 within the context of the hydrodynamic simulation BlueTides at z = 6.5. We find that these observed quasars fall on the BlueTides stellar to black hole mass relation and have similar luminosities to the brightest simulated quasars. We predict their star formation rates, estimating approximately $10^{2-3}$ $M_{\odot}/ \rm yr$ for both quasar hosts. J2255+0251 and J2236+0032's host galaxy radii also fall within estimates of the radii of the simulated host galaxies of similar luminosity quasars. We generate mock JWST NIRCam images of analogs to the observed quasars within BlueTides and perform a point source removal to illustrate both a qualitative and quantitative comparison of the measured and simulated radii and magnitudes. The quasar subtraction works well for similar luminosity quasars, and the recovered host images are consistent with what was observed for J2255+0251 and J2236+0032, further supporting the success of those observations. We also use our mock imaging pipeline to make predictions for the detection of J2255+0251 and J2236+0032's hosts in upcoming JWST observations. We anticipate that the simulation analogs of future high-z quasar host discoveries will allow us to make accurate predictions of their properties beyond the capabilities of JWST.

Submitted 18 April, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: Accepted to MNRAS. 15 pages, 11 figures

arXiv:2401.02034 [pdf, other]

Text2MDT: Extracting Medical Decision Trees from Medical Texts

Authors: Wei Zhu, Wenfeng Li, Xing Tian, Pengfei Wang, Xiaoling Wang, Jin Chen, Yuanbin Wu, Yuan Ni, Guotong Xie

Abstract: Knowledge of the medical decision process, which can be modeled as medical decision trees (MDTs), is critical to build clinical decision support systems. However, the current MDT construction methods rely heavily on time-consuming and laborious manual annotation. In this work, we propose a novel task, Text2MDT, to explore the automatic extraction of MDTs from medical texts such as medical guidelines and textbooks. We normalize the form of the MDT and create an annotated Text-to-MDT dataset in Chinese with the participation of medical experts. We investigate two different methods for the Text2MDT tasks: (a) an end-to-end framework which relies only on instruction tuning of a GPT-style large language model (LLM) to generate all the node information and tree structures; (b) a pipeline framework which decomposes the Text2MDT task into three subtasks. Experiments on our Text2MDT dataset demonstrate that: (a) the end-to-end method based on LLMs (7B parameters or larger) shows promising results and successfully outperforms the pipeline methods; (b) the chain-of-thought (COT) prompting method \cite{Wei2022ChainOT} can improve the performance of the fine-tuned LLMs on the Text2MDT test set; (c) the lightweight pipelined method based on encoder-based pretrained models can perform comparably with LLMs with model complexity two orders of magnitude smaller. Our Text2MDT dataset is open-sourced at https://tianchi.aliyun.com/dataset/95414, and the source codes are open-sourced at https://github.com/michael-wzhu/text2dt.

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2401.01286 [pdf, other]

A Comprehensive Study of Knowledge Editing for Large Language Models

Authors: Ningyu Zhang, Yunzhi Yao, Bozhong Tian, Peng Wang, Shumin Deng, Mengru Wang, Zekun Xi, Shengyu Mao, Jintian Zhang, Yuansheng Ni, Siyuan Cheng, Ziwen Xu, Xin Xu, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Lei Liang, Zhiqiang Zhang, Xiaowei Zhu, Jun Zhou, Huajun Chen

Abstract: Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. However, a primary limitation lies in the significant computational demands during training, arising from their extensive parameterization. This challenge is further intensified by the dynamic nature of the world, necessitating frequent updates to LLMs to correct outdated information or integrate new knowledge, thereby ensuring their continued relevance. Note that many applications demand continual model adjustments post-training to address deficiencies or undesirable behaviors. There is an increasing interest in efficient, lightweight methods for on-the-fly model modifications. To this end, recent years have seen a burgeoning in the techniques of knowledge editing for LLMs, which aim to efficiently modify LLMs' behaviors within specific domains while preserving overall performance across various inputs. In this paper, we first define the knowledge editing problem and then provide a comprehensive review of cutting-edge approaches. Drawing inspiration from educational and cognitive research theories, we propose a unified categorization criterion that classifies knowledge editing methods into three groups: resorting to external knowledge, merging knowledge into the model, and editing intrinsic knowledge. Furthermore, we introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches. Additionally, we provide an in-depth analysis of knowledge location, which can give a deeper understanding of the knowledge structures inherent within LLMs. Finally, we discuss several potential applications of knowledge editing, outlining its broad and impactful implications.

Submitted 28 March, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

Comments: Ongoing work; 52 pages, 282 citations; benchmark is available at https://huggingface.co/datasets/zjunlp/KnowEdit code is available at https://github.com/zjunlp/EasyEdit paper list is available at https://github.com/zjunlp/KnowledgeEditingPapers

arXiv:2312.14263 [pdf, other]

z~2 dual AGN host galaxies are disky: stellar kinematics in the ASTRID Simulation

Authors: Ekaterine Dadiani, Tiziana Di Matteo, Nianyi Chen, Patrick Lachance, Yue Shen, Yu-Ching Chen, Rupert Croft, Yueying Ni, Simeon Bird

Abstract: We study dual AGN host galaxy morphologies at $z=2$ using the ASTRID simulation, selecting black hole (BH) pairs with small separation ($Δr<30\rm{kpc}$), high mass ($M_{\text{BH,12}}>10^7M_\odot$), and luminosity ($L_{\text{bol,12}}>10^{43}\rm{erg/s}$). We kinematically decompose (using MORDOR) $\sim1000$ dual AGN hosts into standard components - a 'disk' (thin and thick disk, pseudo-bulge) and 'bulge' (bulge and halo) and define disk-dominated galaxies by the disk-to-total $D/T\geq0.5$. In ASTRID, $60.9\pm2.1\%$ of dual AGN hosts (independent of separation) are disk-dominated, with the $D/T$ distribution peaking at $\sim0.7$. Notably, hosts of BH pairs have similar morphologies (most either both disk or bulge-dominated). In dual-AGN hosts, the $D/T$ increases from $\sim17\% $ at $M_{\rm *}\sim 10^{9} M_{\odot}$ to $ 64\% $ for $M_{\rm *} \sim 10^{11.5} M_{\odot}$, and the pseudo-bulge is the dominant component of the disk fraction at the high mass end. Moreover, dual AGN hosts exhibit a higher fraction of disk/large pseudo-bulge than single-AGN hosts. The Disk-to-Total ratio is approximately constant with BH mass or AGN luminosity. We also create mock images of dual AGN host galaxies, employing morphological fitting software Statmorph to calculate morphological parameters and compare them with our kinematic decomposition results. Around $83.3\pm2.4\%$ of galaxies display disk-like profiles, of which $\sim60.7\pm2.2\%$ are kinematically confirmed as disks. Sérsic indices and half-mass radii of dual AGN host galaxies align with observational measurements from HST at $z\sim2$. Around $34\%$ are identified as mergers from the $\text{Gini}-M_{20}$ relation. We find two dual AGN hosted by galaxies that exhibit disk-like Sérsic index $n_{12}<1$ and $(D/T)_{12}>0.5$, which are in remarkable agreement with properties of recently discovered dual quasars in disk galaxies at $z\sim 2$.

Submitted 21 December, 2023; originally announced December 2023.

Comments: 15 pages, 12 figures, submitted to the Open Journal of Astrophysics

arXiv:2312.09602 [pdf, other]

Multi-Modality is All You Need for Transferable Recommender Systems

Authors: Youhua Li, Hanwen Du, Yongxin Ni, Pengpeng Zhao, Qi Guo, Fajie Yuan, Xiaofang Zhou

Abstract: ID-based Recommender Systems (RecSys), where each item is assigned a unique identifier and subsequently converted into an embedding vector, have dominated the designing of RecSys. Though prevalent, such ID-based paradigm is not suitable for developing transferable RecSys and is also susceptible to the cold-start issue. In this paper, we unleash the boundaries of the ID-based paradigm and propose a Pure Multi-Modality based Recommender system (PMMRec), which relies solely on the multi-modal contents of the items (e.g., texts and images) and learns transition patterns general enough to transfer across domains and platforms. Specifically, we design a plug-and-play framework architecture consisting of multi-modal item encoders, a fusion module, and a user encoder. To align the cross-modal item representations, we propose a novel next-item enhanced cross-modal contrastive learning objective, which is equipped with both inter- and intra-modality negative samples and explicitly incorporates the transition patterns of user behaviors into the item encoders. To ensure the robustness of user representations, we propose a novel noised item detection objective and a robustness-aware contrastive learning objective, which work together to denoise user sequences in a self-supervised manner. PMMRec is designed to be loosely coupled, so after being pre-trained on the source data, each component can be transferred alone, or in conjunction with other components, allowing PMMRec to achieve versatility under both multi-modality and single-modality transfer learning settings. Extensive experiments on 4 sources and 10 target datasets demonstrate that PMMRec surpasses the state-of-the-art recommenders in both recommendation performance and transferability. Our code and dataset are available at: https://github.com/ICDE24/PMMRec.

Submitted 18 December, 2023; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: ICDE'24 Accepted

arXiv:2312.09183 [pdf, other]

MAGICS I. The First Few Orbits Encode the Fate of Seed Massive Black Hole Pairs

Authors: Nianyi Chen, Diptajyoti Mukherjee, Tiziana Di Matteo, Yueying Ni, Simeon Bird, Rupert Croft

Abstract: The elusive massive black hole (MBH) seeds stand to be revealed by the Laser Interferometer Space Antenna through mergers. As an aftermath of galaxy mergers, MBH coalescence is a vastly multi-scale process connected to galaxy formation. We introduce the "Massive black hole Assembly in Galaxies Informed by Cosmological Simulations" (MAGICS) suite, with galaxy/MBH properties and orbits recovered from large-volume cosmological simulation ASTRID. The simulations include subgrid star formation, supernovae feedback, and MBH accretion/feedback. In this first suite, we extract fifteen representative galaxy mergers with seed MBHs to examine their dynamics at an improved mass and spatial resolution (by $\sim2000$ and $\sim20$) and follow MBH orbits down to $\sim10\,\text{pc}$. We find that the seed MBH energy loss and orbital decay are largely governed by global torques induced by the galaxy merger process on scales resolvable by cosmological simulations. Specifically, pairs sink quickly if their orbits shrink rapidly below $1\,\text{kpc}$ during the first $\sim200\,\text{Myr}$ of pairing due to effective energy loss in major galaxy mergers, whereas MBHs gaining energy in minor galaxy mergers with head-on collisions are likely to stall. High initial eccentricities ($e_\text{init}>0.5$) and high stellar densities at kpc scales ($ρ_\text{star}>0.05\,M_\odot/\text{pc}^3$) also lead to most efficient decays. $\sim50\%$ high-redshift seed MBH pairs experience consecutive galaxy mergers and are more likely to stall at $\sim1\,\text{kpc}$. For a subset of systems, we carry out N-Body re-simulations until binary formation and find that some stalled systems merge at high-z when embedded in sufficient nuclear star clusters.

Submitted 25 April, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 21 pages, 20 Figures. Submitted to The Open Journal of Astrophysics. Comments welcome!

arXiv:2312.05860 [pdf, other]

Hund's coupling driven interorbital entanglement in orbital-selective Mott phase

Authors: Yuekun Niu, Yu Ni, Haishan Zhang, Liang Qiu, Jianli Wang, Leiming Chen, Yun Song, Shiping Feng

Abstract: We examine the orbital-selective Mott transition in the non-hybridized two-band Hubbard model using the dynamical mean-field theory. We find that the orbital-selective Mott transition could be depicted by the local quantum state fidelity. Additionally, within the orbital-selective Mott phase, the combined characteristics of the two orbitals lead to the presence of interorbital entanglement, which is characterized by the non-semi-integer values of local quantum state fidelity. It is demonstrated that this entanglement is driven by transverse Hund's coupling, and the mechanisms underlying the orbital-selective Mott transition show prominent variations depending on the presence or absence of Hund's coupling and its transverse terms.

Submitted 10 December, 2023; originally announced December 2023.

Comments: 6 pages, 3 figures

arXiv:2312.04426 [pdf, other]

SN2023ixf in Messier 101: the twilight years of the progenitor as seen by Pan-STARRS

Authors: Conor L. Ransome, V. Ashley Villar, Anna Tartaglia, Sebastian Javier Gonzalez, Wynn V. Jacobson-Galán, Charles D. Kilpatrick, Raffaella Margutti, Ryan J. Foley, Matthew Grayling, Yuan Qi Ni, Ricardo Yarza, Christine Ye, Katie Auchettl, Thomas de Boer, Kenneth C. Chambers, David A. Coulter, Maria R. Drout, Diego Farias, Christa Gall, Hua Gao, Mark E. Huber, Adaeze L. Ibik, David O. Jones, Nandita Khetan, Chien-Cheng Lin, et al. (6 additional authors not shown)

Abstract: The nearby type II supernova SN2023ixf in M101 exhibits signatures of early-time interaction with circumstellar material in the first week post-explosion. This material may be the consequence of prior mass loss suffered by the progenitor, which possibly manifested in the form of a detectable pre-supernova outburst. We present an analysis of the long-baseline pre-explosion photometric data in $g$, $w$, $r$, $i$, $z$ and $y$ filters from Pan-STARRS as part of the Young Supernova Experiment, spanning $\sim$5,000 days. We find no significant detections in the Pan-STARRS pre-explosion light curve. We train a multilayer perceptron neural network to classify pre-supernova outbursts. We find no evidence of eruptive pre-supernova activity to a limiting absolute magnitude of $-7$. The limiting magnitudes from the full set of $gwrizy$ data (average absolute magnitude $\approx-8$) are consistent with previous pre-explosion studies. We use deep photometry from the literature to constrain the progenitor of SN2023ixf, finding that these data are consistent with a dusty red supergiant (RSG) progenitor with luminosity $\log\left(L/L_\odot\right)\approx5.12$ and temperature $\approx3950$ K, corresponding to a mass of 14-20 M$_\odot$.

Submitted 7 December, 2023; originally announced December 2023.

Comments: 19 pages, 8 figures, 1 table

arXiv:2311.16502 [pdf, other]

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Authors: Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen

Abstract: We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning. MMMU includes 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering. These questions span 30 subjects and 183 subfields, comprising 30 highly heterogeneous image types, such as charts, diagrams, maps, tables, music sheets, and chemical structures. Unlike existing benchmarks, MMMU focuses on advanced perception and reasoning with domain-specific knowledge, challenging models to perform tasks akin to those faced by experts. The evaluation of 14 open-source LMMs as well as the proprietary GPT-4V(ision) and Gemini highlights the substantial challenges posed by MMMU. Even the advanced GPT-4V and Gemini Ultra only achieve accuracies of 56% and 59% respectively, indicating significant room for improvement. We believe MMMU will stimulate the community to build next-generation multimodal foundation models towards expert artificial general intelligence.

Submitted 13 June, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: CVPR 2024 Oral

arXiv:2311.14531 [pdf, other]

Formation of a long filament through the connection of two filament segments observed by CHASE

Authors: H. T. Li, X. Cheng, Y. W. Ni, C. Li, S. H. Rao, J. H. Guo, M. D. Ding, P. F. Chen

Abstract: We present imaging and spectroscopic diagnostics of a long filament during its formation with observations from the Chinese H$\alpha$ Solar Explorer (CHASE) and the Solar Dynamics Observatory. The seed filament first appeared at about 05:00 UT on 2022 September 13. Afterwards, it grew gradually and connected to another filament segment nearby, building up a long filament at about 20:00 UT on the same day. The CHASE H$\alpha$ spectra show an obvious centroid absorption with mild broadening at the main spine of the long filament, which is interpreted as evidence of filament material accumulation. More interestingly, near the footpoints of the filament, persistent redshifts were detected in the H$\alpha$ spectra during the filament formation, indicating continuous drainage of filament material. Furthermore, inspection of the extreme ultraviolet (EUV) images and magnetograms shows that EUV jets and brightenings appeared repeatedly at the junction of the two filament segments, where opposite magnetic polarities converged and canceled each other continuously. These results suggest the occurrence of intermittent magnetic reconnection that not only connects the magnetic structures of the two filament segments but also supplies cold material to the filament channel, likely through the condensation of injected hot plasma, even though part of the cold material falls down to the filament footpoints at the same time.

Submitted 24 November, 2023; originally announced November 2023.

Comments: 11 pages, 6 figures, Accepted for publication in ApJL

arXiv:2311.13432 [pdf, other]

Modelling the propagation of coronal mass ejections with COCONUT: implementation of the Regularized Biot-Savart Laws flux rope model

Authors: Jinhan Guo, L. Linan, S. Poedts, Y. Guo, A. Lani, B. Schmieder, M. Brchnelova, B. Perri, T. Baratashvili, Y. W. Ni, P. F. Chen

Abstract: Context: Coronal mass ejections (CMEs) are rapid eruptions of magnetized plasma that occur on the Sun, which are known as the main drivers of adverse space weather. Accurately tracking their evolution in the heliosphere in numerical models is of utmost importance for space weather forecasting. Aims: The main objective of this paper is to implement the Regularized Biot-Savart Laws (RBSL) method in a new global corona model COCONUT. This approach has the capability to construct the magnetic flux rope with an axis of arbitrary shape. Methods: We present the implementation process of the RBSL flux rope model in COCONUT, which is superposed onto a realistic solar wind reconstructed from the observed magnetogram around the minimum of solar activity. Based on this, we simulate the propagation of an S-shaped flux rope from the solar surface to a distance of 25 solar radii. Results: Our simulation successfully reproduces the birth process of a CME originating from a sigmoid in a self-consistent way. The model effectively captures various physical processes and retrieves the prominent features of the CMEs in observations. In addition, the simulation results indicate that the magnetic topology of the CME flux rope at around 20 solar radii deviates from a coherent structure, and manifests as a mix of open and closed field lines with diverse footpoints. Conclusions: This work demonstrates the potential of the RBSL flux rope model in reproducing CME events that are more consistent with observations. Moreover, our findings strongly suggest that magnetic reconnection during the CME propagation plays a critical role in destroying the coherent characteristic of a CME flux rope.

Submitted 22 November, 2023; originally announced November 2023.

Comments: 14 pages, 8 figures, accepted for publication in A&A
