
A rotational ellipsoid model for solid Earth tide with high precision

Authors: Yongfeng Yang, Yunfei Zhang, Qiang Liu, Xianqing Lv, Pu Huang

Abstract: Solid Earth tide represents the elastic response of solid Earth to the lunar (solar) gravitational force. The solid Earth yielding to this force has been thought to be a prolate ellipsoid since the time of Lord Kelvin, yet the ellipsoid's geometry, such as the lengths of its semi-major and semi-minor axes and its oblateness, remains unresolved. Additionally, the tidal displacement of solid Earth is conventionally resolved through a combination of expanded potential equations and a given Earth model. Here we present a geometric model in which both the ellipsoid's geometry and the tidal displacement of solid Earth can be resolved through an ellipse rotating with respect to the Moon (Sun). We test the geometric model using 23 years of gravity data from 22 superconducting gravimeter (SG) stations and compare it with the current model recommended by the IERS (International Earth Rotation and Reference Systems Service) Conventions (2010). The average Root Mean Square (RMS) deviation of the gravity change yielded by the geometric model against observation is 6.47 μGal (equivalent to 2.07 cm), while that yielded by the current model is 30.77 μGal (equivalent to 9.85 cm). The geometric model represents a significant advance in understanding and predicting solid Earth tide, and will greatly contribute to many application fields such as geodesy, geophysics, astronomy, and oceanography.
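The abstract reports each model's fit as an RMS deviation in μGal alongside an equivalent displacement in cm. A minimal Python sketch of that bookkeeping follows; the 0.32 cm/μGal factor is an assumption back-calculated from the two quoted pairs (6.47 μGal ↔ 2.07 cm, 30.77 μGal ↔ 9.85 cm), not a constant stated by the authors, and the function names are illustrative.

```python
import math

# ASSUMPTION: 0.32 cm per μGal, back-calculated from the abstract's quoted pairs
# (6.47 μGal -> 2.07 cm, 30.77 μGal -> 9.85 cm); not a constant given by the authors.
CM_PER_MICROGAL = 0.32


def rms_deviation(predicted, observed):
    """Root Mean Square deviation between two equal-length series (same units)."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("series must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def microgal_to_cm(value_microgal):
    """Convert a gravity deviation (μGal) to an equivalent displacement (cm)."""
    return value_microgal * CM_PER_MICROGAL


print(round(microgal_to_cm(6.47), 2))   # 2.07
print(round(microgal_to_cm(30.77), 2))  # 9.85
```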

Submitted 11 July, 2024; originally announced July 2024.

Comments: 20 pages, 4 figures, 1 table

arXiv:2407.04051 [pdf, other]

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

Authors: Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Haoneng Luo, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang , et al. (8 additional authors not shown)

Abstract: This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, speaking style, and speaker identity. SenseVoice-Small delivers exceptionally low-latency ASR for 5 languages, and SenseVoice-Large supports high-precision ASR for over 50 languages, while CosyVoice excels in multilingual voice generation, zero-shot in-context learning, cross-lingual voice cloning, and instruction following. The models related to SenseVoice and CosyVoice have been open-sourced on ModelScope and Hugging Face, along with the corresponding training, inference, and fine-tuning code released on GitHub. By integrating these models with LLMs, FunAudioLLM enables applications such as speech-to-speech translation, emotional voice chat, interactive podcasts, and expressive audiobook narration, thereby pushing the boundaries of voice interaction technology. Demos are available at https://fun-audio-llm.github.io, and the code can be accessed at https://github.com/FunAudioLLM.
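The speech-to-speech translation application described above chains recognition, an LLM, and synthesis. The sketch below shows only that composition; the stage signatures and the `VoicePipeline` class are hypothetical and do not reproduce the actual SenseVoice or CosyVoice APIs, and the lambdas are stand-ins for real models.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; NOT the real SenseVoice / CosyVoice APIs.
Recognizer = Callable[[bytes], str]         # audio -> transcript (SenseVoice's role)
Responder = Callable[[str], str]            # transcript -> reply text (the LLM's role)
Synthesizer = Callable[[str, str], bytes]   # (text, speaker_id) -> audio (CosyVoice's role)


@dataclass
class VoicePipeline:
    recognize: Recognizer
    respond: Responder
    synthesize: Synthesizer
    speaker_id: str = "default"

    def __call__(self, audio_in: bytes) -> bytes:
        text_in = self.recognize(audio_in)      # speech -> text
        text_out = self.respond(text_in)        # text -> text (e.g. translation)
        return self.synthesize(text_out, self.speaker_id)  # text -> speech


# Toy stand-ins so the sketch runs without any models.
pipe = VoicePipeline(
    recognize=lambda audio: audio.decode("utf-8"),
    respond=lambda text: f"Translated: {text}",
    synthesize=lambda text, spk: f"[{spk}] {text}".encode("utf-8"),
)
print(pipe(b"hello"))  # b'[default] Translated: hello'
```

Keeping each stage a plain callable makes it easy to swap a stub for a real model without touching the composition.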

Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

Comments: Work in progress. Authors are listed in alphabetical order by family name

arXiv:2406.18556 [pdf]

Renal digital pathology visual knowledge search platform based on language large model and book knowledge

Authors: Xiaomin Lv, Chong Lai, Liya Ding, Maode Lai, Qingrong Sun

Abstract: Large models have become mainstream, yet their applications in digital pathology still require exploration. Meanwhile, renal pathology images play an important role in the diagnosis of renal diseases. We segmented images from 60 renal pathology books and paired them with their corresponding text descriptions, performed clustering analysis on all image and text-description features using large models, and ultimately built a retrieval system based on the semantic features of large models. Based on the above analysis, we established a knowledge base of 10,317 renal pathology images with paired text descriptions, and then evaluated the semantic feature capabilities of 4 large models (GPT-2, Gemma, LLaMA, and Qwen) and the image feature capabilities of the DINOv2 large model. Furthermore, we built a semantic retrieval system, named RppD (aidp.zjsru.edu.cn), that retrieves pathological images based on text descriptions.
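At query time, a text-to-image semantic retrieval system of the kind described reduces to nearest-neighbour search over embeddings. A minimal sketch, assuming each knowledge-base image is represented by a precomputed embedding of its paired description; the toy vectors and image ids are invented, and in the real system the embeddings would come from one of the evaluated large models.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(query_vec, knowledge_base, k=3):
    """Ids of the k entries whose description embeddings best match the query."""
    scored = sorted(knowledge_base.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [image_id for image_id, _ in scored[:k]]


# Hypothetical 3-d embeddings standing in for real text-description features.
kb = {
    "img_crescent": [0.9, 0.1, 0.0],
    "img_sclerosis": [0.1, 0.9, 0.1],
    "img_necrosis": [0.0, 0.2, 0.9],
}
print(retrieve([0.8, 0.2, 0.0], kb, k=2))  # ['img_crescent', 'img_sclerosis']
```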

Submitted 26 May, 2024; originally announced June 2024.

Comments: 9 pages, 6 figures

arXiv:2406.15486 [pdf, other]

SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention

Authors: Qianchao Zhu, Jiangfei Duan, Chang Chen, Siran Liu, Xiuhong Li, Guanyu Feng, Xin Lv, Huanqi Cao, Xiao Chuanfu, Xingcheng Zhang, Dahua Lin, Chao Yang

Abstract: Large language models (LLMs) now support extremely long context windows, but the quadratic complexity of vanilla attention results in long Time-to-First-Token (TTFT) latency. Existing approaches to address this complexity require additional pretraining or finetuning, and often sacrifice model accuracy. In this paper, we first provide both theoretical and empirical foundations for near-lossless sparse attention. We find that dynamically capturing head-specific sparse patterns at runtime with low overhead is crucial. To address this, we propose SampleAttention, an adaptive structured and near-lossless sparse attention. Leveraging the significant sparse patterns we observe, SampleAttention attends to a fixed percentage of adjacent tokens to capture local window patterns, and employs a two-stage query-guided key-value filtering approach, which adaptively selects a minimum set of key-values with low overhead, to capture column stripe patterns. Comprehensive evaluations show that SampleAttention can seamlessly replace vanilla attention in off-the-shelf LLMs with nearly no accuracy loss, and reduces TTFT by up to $2.42\times$ compared with FlashAttention.
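A rough NumPy illustration of the two sparsity patterns named in the abstract follows: a local window covering a fixed fraction of adjacent tokens, plus column stripes chosen by a two-stage filter (score keys with a random sample of queries, then keep the smallest key set covering a target share of the attention mass). The parameter names and defaults (`window_frac`, `sample_frac`, `cover`) are assumptions, not SampleAttention's actual settings, and the sketch builds a dense mask, so it shows the selection logic only and none of the kernel-level speedup.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sparse_attention_sketch(q, k, v, window_frac=0.1, sample_frac=0.25, cover=0.9, seed=0):
    """Causal attention restricted to a local window plus selected key columns."""
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    causal = np.tril(np.ones((n, n), dtype=bool))

    # Local window: each query sees a fixed fraction of the most recent tokens.
    w = max(1, int(round(window_frac * n)))
    idx = np.arange(n)
    local = ((idx[:, None] - idx[None, :]) < w) & causal

    # Stage 1: estimate per-key importance from a random subset of queries.
    rng = np.random.default_rng(seed)
    m = max(1, int(round(sample_frac * n)))
    rows = rng.choice(n, size=m, replace=False)
    sampled = np.where(causal[rows], q[rows] @ k.T * scale, -np.inf)
    col_mass = softmax(sampled, axis=-1).sum(axis=0)

    # Stage 2: keep the smallest set of key columns covering `cover` of the mass.
    order = np.argsort(col_mass)[::-1]
    cum = np.cumsum(col_mass[order]) / col_mass.sum()
    keep = order[: np.searchsorted(cum, cover) + 1]
    stripes = np.zeros((n, n), dtype=bool)
    stripes[:, keep] = True

    mask = (local | stripes) & causal
    scores = np.where(mask, q @ k.T * scale, -np.inf)
    return softmax(scores, axis=-1) @ v, mask


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((64, 16)) for _ in range(3))
out, mask = sparse_attention_sketch(q, k, v)
print(out.shape, mask.sum(), "of", np.tril(np.ones((64, 64))).sum(), "causal entries kept")
```

Setting `window_frac=1.0` makes the mask fully causal and recovers ordinary causal attention, which is a convenient sanity check.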

Submitted 28 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.12793 [pdf, other]

ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

Authors: Team GLM, Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang , et al. (32 additional authors not shown)

Abstract: We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. These are our most capable models, trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained on ten trillion tokens, mostly in Chinese and English, along with a small corpus covering 24 languages, and aligned primarily for Chinese and English usage. The high-quality alignment is achieved via a multi-stage post-training process, which involves supervised fine-tuning and learning from human feedback. Evaluations show that GLM-4 1) closely rivals or outperforms GPT-4 in terms of general metrics such as MMLU, GSM8K, MATH, BBH, GPQA, and HumanEval, 2) gets close to GPT-4-Turbo in instruction following as measured by IFEval, 3) matches GPT-4 Turbo (128K) and Claude 3 for long context tasks, and 4) outperforms GPT-4 in Chinese alignment as measured by AlignBench. The GLM-4 All Tools model is further aligned to understand user intent and autonomously decide when and which tool(s) to use -- including web browser, Python interpreter, text-to-image model, and user-defined functions -- to effectively complete complex tasks. In practical applications, it matches and even surpasses GPT-4 All Tools in tasks like accessing online information via web browsing and solving math problems using a Python interpreter.
Over the course of this work, we have open-sourced a series of models, including ChatGLM-6B (three generations), GLM-4-9B (128K, 1M), GLM-4V-9B, WebGLM, and CodeGeeX, attracting over 10 million downloads on Hugging Face in the year 2023 alone. The open models can be accessed through https://github.com/THUDM and https://huggingface.co/THUDM.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12295 [pdf, other]

Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding

Authors: Kaiyan Zhang, Jianyu Wang, Ning Ding, Biqing Qi, Ermo Hua, Xingtai Lv, Bowen Zhou

Abstract: Large Language Models (LLMs) demonstrate impressive performance in diverse applications, yet they face significant drawbacks, including high inference latency, expensive training cost, and hallucination. Collaborative decoding between large and small language models (SLMs) offers a novel approach to address these challenges. Inspired by dual-process cognitive theory, we integrate these methods into a unified framework termed Fast and Slow Generating (FS-GEN). This paper explores several techniques within the FS-GEN framework, including speculative decoding, contrastive decoding, and emulator or proxy fine-tuning. We provide a comprehensive analysis of these methodologies, offering insights into their similarities and differences under this framework. Our study delves into the differential knowledge capabilities of LLMs versus SLMs through the FS-GEN lens, revealing that fewer than 20% of collaborative interactions are required across various methods. These interactions adhere to a scaling law relative to the parameter ratios, thereby facilitating predictable collaboration. Furthermore, we investigate the specific positions where collaboration is most effective from an uncertainty perspective, yielding novel insights that could refine FS-GEN methods. Our findings reveal that the essential difference between models of different sizes lies in the uncertainty of the next-token prediction, where interventions by larger models are most needed to assist the smaller ones. Code for reproduction: https://github.com/TsinghuaC3I/FS-GEN
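The finding that larger models matter most where the small model is uncertain about the next token suggests a simple gating rule. The sketch below uses the small model's predictive entropy as that uncertainty signal and defers to the large model above a threshold; the entropy criterion, the threshold value, and the toy distributions are illustrative assumptions, not the FS-GEN implementation.

```python
import math
from typing import Callable, Dict, List

Dist = Dict[str, float]  # next-token probability distribution


def entropy(dist: Dist) -> float:
    """Shannon entropy (nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def collaborative_decode(small: Callable[[List[str]], Dist],
                         large: Callable[[List[str]], Dist],
                         prompt: List[str], max_new: int, threshold: float):
    """Greedy decoding that asks the large model only when the small one is uncertain."""
    tokens = list(prompt)
    interventions = 0
    for _ in range(max_new):
        dist = small(tokens)
        if entropy(dist) > threshold:   # small model uncertain -> defer to large model
            dist = large(tokens)
            interventions += 1
        tokens.append(max(dist, key=dist.get))
    return tokens, interventions


# Toy models: the small model is unsure only after "the".
def small_lm(ctx):
    if ctx[-1] == "the":
        return {"cat": 0.4, "dog": 0.35, "end": 0.25}
    return {"the": 0.9, "end": 0.1}


def large_lm(ctx):
    if ctx[-1] == "the":
        return {"dog": 0.8, "cat": 0.2}
    return {"the": 0.95, "end": 0.05}


print(collaborative_decode(small_lm, large_lm, ["<s>"], max_new=4, threshold=0.5))
# (['<s>', 'the', 'dog', 'the', 'dog'], 2)
```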

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.10744 [pdf, other]

Technique Report of CVPR 2024 PBDL Challenges

Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu , et al. (75 additional authors not shown)

Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the image formation process to recover scene properties such as shape, reflectance, light distribution, and medium properties from images. In recent years, deep learning has shown promising improvements for various vision tasks, and when combined with physics-based vision, these approaches can enhance the robustness and accuracy of vision systems. This technical report summarizes the outcomes of the Physics-Based Vision Meets Deep Learning (PBDL) 2024 challenge, held at a CVPR 2024 workshop. The challenge consisted of eight tracks, focusing on Low-Light Enhancement and Detection as well as High Dynamic Range (HDR) Imaging. This report details the objectives, methodologies, and results of each track, highlighting the top-performing solutions and their innovative approaches.

Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

arXiv:2406.03949 [pdf, other]

UltraMedical: Building Specialized Generalists in Biomedicine

Authors: Kaiyan Zhang, Sihang Zeng, Ermo Hua, Ning Ding, Zhang-Ren Chen, Zhiyuan Ma, Haoxin Li, Ganqu Cui, Biqing Qi, Xuekai Zhu, Xingtai Lv, Hu Jinfang, Zhiyuan Liu, Bowen Zhou

Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains and are moving towards more specialized areas. Recent advanced proprietary models such as GPT-4 and Gemini have achieved significant advancements in biomedicine, which have also raised privacy and security challenges. The construction of specialized generalists hinges largely on high-quality datasets, enhanced by techniques like supervised fine-tuning, reinforcement learning from human or AI feedback, and direct preference optimization. However, these leading technologies (e.g., preference learning) are still significantly limited in the open-source community due to the scarcity of specialized data. In this paper, we present the UltraMedical collections, which consist of high-quality manual and synthetic datasets in the biomedicine domain, featuring preference annotations across multiple advanced LLMs. Using these datasets, we fine-tune a suite of specialized medical models based on the Llama-3 series, demonstrating strong capabilities across various medical benchmarks. Moreover, we develop powerful reward models that perform well on biomedical and general reward benchmarks, further enhancing online preference learning within the biomedical LLM community.
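Direct preference optimization, one of the techniques listed above, trains on (chosen, rejected) response pairs. For reference, the standard per-example DPO loss is $-\log\sigma(\beta[(\log\pi_\theta(y_w|x)-\log\pi_{\mathrm{ref}}(y_w|x))-(\log\pi_\theta(y_l|x)-\log\pi_{\mathrm{ref}}(y_l|x))])$; the sketch computes it from summed sequence log-probabilities. This is the generic objective, not necessarily UltraMedical's exact training recipe, and `beta=0.1` is an assumed default.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss from sequence log-probs under the policy and reference models."""
    # Implicit reward margin: how much more the policy prefers the chosen answer
    # than the reference model does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)) == softplus(-beta * margin), computed stably.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


print(dpo_loss(-10.0, -10.0, -10.0, -10.0))  # no preference learned yet: log(2)
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))   # chosen favoured: loss below log(2)
```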

Submitted 6 June, 2024; originally announced June 2024.

Comments: Datasets and models are available at https://github.com/TsinghuaC3I/UltraMedical

arXiv:2406.00434 [pdf, other]

MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos

Authors: Qingming Liu, Yuan Liu, Jiepeng Wang, Xianqiang Lv, Peng Wang, Wenping Wang, Junhui Hou

Abstract: In this paper, we propose MoDGS, a new pipeline to render novel-view images in dynamic scenes using only casually captured monocular videos. Previous monocular dynamic NeRF or Gaussian Splatting methods strongly rely on the rapid movement of input cameras to construct multiview consistency but fail to reconstruct dynamic scenes on casually captured input videos whose cameras are static or move slowly. To address this challenging task, MoDGS adopts recent single-view depth estimation methods to guide the learning of the dynamic scene. Then, a novel 3D-aware initialization method is proposed to learn a reasonable deformation field and a new robust depth loss is proposed to guide the learning of dynamic scene geometry. Comprehensive experiments demonstrate that MoDGS is able to render high-quality novel view images of dynamic scenes from just a casually captured monocular video, which outperforms baseline methods by a significant margin.
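Single-view depth estimators generally predict depth only up to an unknown scale and shift, so depth supervision from them is often applied after an affine alignment. The sketch below shows that generic scale-and-shift-invariant L1 loss as a simpler stand-in; it is not the robust depth loss proposed in MoDGS, whose exact form the abstract does not give.

```python
import numpy as np


def scale_shift_invariant_l1(pred, target, valid=None):
    """Mean |s*pred + b - target| after least-squares fitting of scale s and shift b."""
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    if valid is None:
        valid = np.ones_like(pred, dtype=bool)
    else:
        valid = np.asarray(valid, dtype=bool).ravel()
    p, t = pred[valid], target[valid]
    # Solve [p 1] @ [s, b]^T ≈ t in the least-squares sense.
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, b), *_ = np.linalg.lstsq(A, t, rcond=None)
    return float(np.mean(np.abs(s * p + b - t)))


d = np.linspace(1.0, 5.0, 10)
print(scale_shift_invariant_l1(d, 2.0 * d + 3.0))  # ~0: same depth up to scale/shift
```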

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.11870 [pdf, other]

Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process

Authors: Ermo Hua, Biqing Qi, Kaiyan Zhang, Yue Yu, Ning Ding, Xingtai Lv, Kai Tian, Bowen Zhou

Abstract: Supervised Fine-Tuning (SFT) and Preference Optimization (PO) are two fundamental processes for enhancing the capabilities of Language Models (LMs) after pre-training, aligning them better with human preferences. SFT is more training-efficient, whereas PO delivers better alignment, so the two are often combined. However, common practices simply apply them sequentially without integrating their optimization objectives, ignoring the opportunity to bridge their paradigm gap and combine their strengths. To obtain a unified understanding, we interpret SFT and PO with two sub-processes -- Preference Estimation and Transition Optimization -- defined at the token level within the Markov Decision Process (MDP) framework. This modeling shows that SFT is only a specialized case of PO with inferior estimation and optimization. PO evaluates the quality of the model's entire generated answer, whereas SFT only scores predicted tokens based on preceding tokens from target answers. Therefore, SFT overestimates the ability of the model, leading to inferior optimization. Building on this view, we introduce Intuitive Fine-Tuning (IFT) to integrate SFT and Preference Optimization into a single process. IFT captures LMs' intuitive sense of the entire answer through a temporal residual connection, while relying solely on a single policy and the same volume of non-preference-labeled data as SFT.
Our experiments show that IFT performs comparably to or better than sequential recipes of SFT and some typical Preference Optimization methods across several tasks, particularly those requiring generation, reasoning, and fact-following abilities. An explainable Frozen Lake game further validates the effectiveness of IFT for obtaining a competitive policy.

Submitted 28 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

arXiv:2405.09552 [pdf, other]

ODFormer: Semantic Fundus Image Segmentation Using Transformer for Optic Nerve Head Detection

Authors: Jiayi Wang, Yi-An Mao, Xiaoyu Ma, Sicen Guo, Yuting Shao, Xiao Lv, Wenting Han, Mark Christopher, Linda M. Zangwill, Yanlong Bi, Rui Fan

Abstract: Optic nerve head (ONH) detection has been a crucial area of study in ophthalmology for years. However, the significant discrepancy between fundus image datasets, each generated using a single type of fundus camera, poses challenges to the generalizability of ONH detection approaches developed based on semantic segmentation networks. Despite the numerous recent advancements in general-purpose semantic segmentation methods using convolutional neural networks (CNNs) and Transformers, there is currently a lack of benchmarks for these state-of-the-art (SoTA) networks specifically trained for ONH detection. Therefore, in this article, we make contributions from three key aspects: network design, the publication of a dataset, and the establishment of a comprehensive benchmark. Our newly developed ONH detection network, referred to as ODFormer, is based upon the Swin Transformer architecture and incorporates two novel components: a multi-scale context aggregator and a lightweight bidirectional feature recalibrator. Our published large-scale dataset, known as TongjiU-DROD, provides multi-resolution fundus images for each participant, captured using two distinct types of cameras. Our established benchmark involves three datasets: DRIONS-DB, DRISHTI-GS1, and TongjiU-DROD, created by researchers from different countries and containing fundus images captured from participants of diverse races and ages. Extensive experimental results demonstrate that our proposed ODFormer outperforms other state-of-the-art (SoTA) networks in terms of performance and generalizability.
Our dataset and source code are publicly available at mias.group/ODFormer.
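The abstract does not name its evaluation metrics. Dice and IoU are common overlap measures for binary segmentation masks such as optic-disc maps, so the following is offered only as an assumed example of how such a benchmark might score predictions.

```python
import numpy as np


def dice_and_iou(pred, gt):
    """Dice coefficient and IoU for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    # Two empty masks are treated as a perfect match.
    dice = 2.0 * inter / total if total else 1.0
    iou = inter / union if union else 1.0
    return float(dice), float(iou)


pred = [[1, 1, 0], [0, 1, 0]]
gt = [[1, 0, 0], [0, 1, 1]]
print(dice_and_iou(pred, gt))  # Dice = 4/6, IoU = 2/4
```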

Submitted 2 June, 2024; v1 submitted 15 April, 2024; originally announced May 2024.

arXiv:2404.19584 [pdf, other]

Broadband microwave-rate dark pulse microcombs in dissipation-engineered LiNbO$_3$ microresonators

Authors: Xiaomin Lv, Binbin Nie, Chen Yang, Rui Ma, Ze Wang, Yanwu Liu, Xing Jin, Kaixuan Zhu, Zhenyu Chen, Du Qian, Guanyu Zhang, Guowei Lv, Qihuang Gong, Fang Bo, Qi-Fan Yang

Abstract: Kerr microcombs generated in optical microresonators provide broadband light sources bridging optical and microwave signals. Their translation to thin-film lithium niobate unlocks second-order nonlinear optical interfaces such as electro-optic modulation and frequency doubling for completing comb functionalities. However, the strong Raman response of LiNbO$_3$ has complicated the formation of Kerr microcombs. Until now, dark pulse microcombs, requiring a double balance between Kerr nonlinearity and normal group velocity dispersion as well as gain and loss, have remained elusive in LiNbO$_3$ microresonators. Here, by incorporating dissipation engineering, we demonstrate dark pulse microcombs with 25 GHz repetition frequency and 200 nm span in a high-$Q$ LiNbO$_3$ microresonator. Resonances near the Raman-active wavelengths are strongly damped by controlling phase-matching conditions of a specially designed pulley coupler. The coherence and tunability of the dark pulse microcombs are also investigated. Our work provides a solution to realize high-power microcombs operating at microwave rates on LiNbO$_3$ chips, promising new opportunities for the monolithic integration of applications spanning communication to microwave photonics.
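A quick sanity check on the comb figures: the number of comb lines is roughly the optical span divided by the 25 GHz spacing. The abstract does not state the center wavelength, so the sketch assumes 1550 nm and converts the 200 nm span to frequency using its two endpoints.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def comb_span_and_lines(span_m, center_m, rep_rate_hz):
    """Optical span in Hz and approximate comb-line count for a wavelength span."""
    lam_short = center_m - span_m / 2
    lam_long = center_m + span_m / 2
    span_hz = C / lam_short - C / lam_long   # frequency difference of the span's endpoints
    return span_hz, span_hz / rep_rate_hz


# ASSUMPTION: 1550 nm center wavelength (not stated in the abstract).
span_hz, n_lines = comb_span_and_lines(200e-9, 1550e-9, 25e9)
print(f"span ≈ {span_hz / 1e12:.1f} THz, ≈ {n_lines:.0f} comb lines")
```

Under that assumption the span is about 25 THz, i.e., on the order of 1,000 comb lines.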

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.16687 [pdf, other]

NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng , et al. (89 additional authors not shown)

Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. The challenge addresses a major problem in the field of image and video processing, namely Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Content (AIGC). The challenge is divided into an image track and a video track. The image track uses AIGIQA-20K, which contains 20,000 AI-Generated Images (AIGIs) generated by 15 popular generative models. The image track had a total of 318 registered participants. A total of 1,646 submissions were received in the development phase and 221 in the test phase. Finally, 16 participating teams submitted their models and fact sheets. The video track uses T2VQA-DB, which contains 10,000 AI-Generated Videos (AIGVs) generated by 9 popular Text-to-Video (T2V) models. A total of 196 participants registered in the video track. A total of 991 submissions were received in the development phase and 185 in the test phase. Finally, 12 participating teams submitted their models and fact sheets. Some methods achieved better results than the baseline methods, and the winning methods in both tracks demonstrated superior prediction performance on AIGC.
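The abstract does not say how predictions were scored. Quality-assessment challenges commonly report Spearman (SRCC) and Pearson (PLCC) correlations between predicted and subjective scores, so the dependency-free sketch below is offered as an assumption about the kind of metric involved.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            r[order[t]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


predicted = [1, 2, 3, 4]
subjective = [1, 4, 9, 16]   # monotonic but non-linear
print(spearman(predicted, subjective), pearson(predicted, subjective))
```

SRCC rewards getting the ordering right, while PLCC also penalizes a non-linear relationship, which is why the two are usually reported together.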

Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.13299 [pdf, other]

PCQA: A Strong Baseline for AIGC Quality Assessment Based on Prompt Condition

Authors: Xi Fang, Weigang Wang, Xiaoxin Lv, Jun Yan

Abstract: The development of Large Language Models (LLMs) and Diffusion Models has driven the boom in Artificial Intelligence Generated Content (AIGC). It is essential to build an effective quality assessment framework that provides a quantifiable evaluation of images or videos produced by AIGC technologies. The content generated by AIGC methods is driven by crafted prompts. Therefore, it is intuitive that the prompts can also serve as the foundation of AIGC quality assessment. This study proposes an effective AIGC quality assessment (QA) framework. First, we propose a hybrid prompt encoding method based on a dual-source CLIP (Contrastive Language-Image Pre-Training) text encoder to understand and respond to the prompt conditions. Second, we propose an ensemble-based feature mixer module to effectively blend the adapted prompt and vision features. Experiments on two datasets, AIGIQA-20K (AI-Generated Image Quality Assessment database) and T2VQA-DB (Text-to-Video Quality Assessment DataBase), validate the effectiveness of our proposed method, Prompt Condition Quality Assessment (PCQA). Our simple and feasible framework may promote research development in the multimodal generation field.

Submitted 20 April, 2024; originally announced April 2024.

Comments: Published in CVPR-2024's NTIRE: New Trends in Image Restoration and Enhancement workshop and challenges

arXiv:2404.10253 [pdf, other]

Kilometer-Level Coupled Modeling Using 40 Million Cores: An Eight-Year Journey of Model Development

Authors: Xiaohui Duan, Yuxuan Li, Zhao Liu, Bin Yang, Juepeng Zheng, Haohuan Fu, Shaoqing Zhang, Shiming Xu, Yang Gao, Wei Xue, Di Wei, Xiaojing Lv, Lifeng Yan, Haopeng Huang, Haitian Lu, Lingfeng Wan, Haoran Lin, Qixin Chang, Chenlin Li, Quanjie He, Zeyu Song, Xuantong Wang, Yangyang Yu, Xilong Fan, Zhaopeng Qu , et al. (16 additional authors not shown)

Abstract: With current and future leading systems adopting heterogeneous architectures, adapting existing models to heterogeneous supercomputers is urgently needed for improving model resolution and reducing modeling uncertainty. This paper presents our eight-year effort on porting a complex earth system model, CESM 2.2, to a 40-million-core Sunway supercomputer. Taking a non-intrusive approach that minimizes manual code modifications, our project aims to achieve both improved performance and consistency of the model code. By using a hierarchical grid system and an OpenMP-based offloading toolkit, our porting and parallelization effort covers over 80% of the code, and achieves a simulation speed of 340 SDPD (simulated days per day) for 5-km atmosphere, 265 SDPD for 3-km ocean, and 222 SDPD for a coupled model, thus making multi-year or even multi-decadal experiments at such high resolution possible.
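To put the reported speeds in context, SDPD converts to simulated years per wall-clock day (SYPD) and to the time a multi-year run would take. A small sketch, assuming a 365-day model calendar:

```python
DAYS_PER_YEAR = 365  # ASSUMPTION: 365-day (no-leap) model calendar


def sypd(sdpd):
    """Simulated years per wall-clock day."""
    return sdpd / DAYS_PER_YEAR


def wallclock_days(simulated_years, sdpd):
    """Wall-clock days needed to simulate `simulated_years` at a given SDPD."""
    return simulated_years * DAYS_PER_YEAR / sdpd


for name, speed in [("5-km atmosphere", 340), ("3-km ocean", 265), ("coupled", 222)]:
    print(f"{name}: {sypd(speed):.2f} SYPD; 10 years in {wallclock_days(10, speed):.1f} days")
```

For the coupled model at 222 SDPD this gives about 0.61 SYPD, so a 10-year simulation takes roughly 16.4 wall-clock days.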

Submitted 15 April, 2024; originally announced April 2024.

Comments: 18 pages, 13 figures

arXiv:2404.03577 [pdf, other]

Untangle the KNOT: Interweaving Conflicting Knowledge and Reasoning Skills in Large Language Models

Authors: Yantao Liu, Zijun Yao, Xin Lv, Yuchen Fan, Shulin Cao, Jifan Yu, Lei Hou, Juanzi Li

Abstract: Providing knowledge documents to large language models (LLMs) has emerged as a promising solution for updating the static knowledge stored in their parameters. However, knowledge in the document may conflict with the memory of LLMs due to outdated or incorrect knowledge in the LLMs' parameters. This makes it necessary to examine the capability of LLMs to assimilate supplemental external knowledge that conflicts with their memory. While previous studies have explained to what extent LLMs extract conflicting knowledge from the provided text, they neglect the need to reason with conflicting knowledge. Furthermore, there is a lack of detailed analysis of strategies that enable LLMs to resolve conflicting knowledge via prompting, decoding strategy, and supervised fine-tuning. To address these limitations, we construct a new dataset, dubbed KNOT, for examining knowledge conflict resolution in the form of question answering. KNOT facilitates in-depth analysis by dividing reasoning with conflicting knowledge into three levels: (1) Direct Extraction, which directly extracts conflicting knowledge to answer questions; (2) Explicit Reasoning, which reasons with conflicting knowledge when the reasoning path is explicitly provided in the question; and (3) Implicit Reasoning, where reasoning with conflicting knowledge requires LLMs to infer the reasoning path independently to answer questions. We also conduct extensive experiments on KNOT to establish empirical guidelines for LLMs to utilize conflicting knowledge in complex circumstances.
The dataset and associated code can be accessed at https://github.com/THU-KEG/KNOT.

Submitted 4 April, 2024; originally announced April 2024.

Comments: Accepted by LREC-COLING 2024 as long paper

arXiv:2403.15872 [pdf, other]

RAAMove: A Corpus for Analyzing Moves in Research Article Abstracts

Authors: Hongzheng Li, Ruojin Wang, Ge Shi, Xing Lv, Lei Lei, Chong Feng, Fang Liu, Jinkun Lin, Yangguang Mei, Lingnan Xu

Abstract: Move structures have been studied in English for Specific Purposes (ESP) and English for Academic Purposes (EAP) for decades. However, there are few move annotation corpora for Research Article (RA) abstracts. In this paper, we introduce RAAMove, a comprehensive multi-domain corpus dedicated to the annotation of move structures in RA abstracts. The primary objective of RAAMove is to facilitate move analysis and automatic move identification. This paper provides a thorough discussion of the corpus construction process, including the scheme, data collection, annotation guidelines, and annotation procedures. The corpus is constructed in two stages: first, expert annotators manually annotate high-quality data; then, based on the human-annotated data, a BERT-based model is employed for automatic annotation, with its output corrected by experts. The result is a large-scale and high-quality corpus comprising 33,988 annotated instances. We also conduct preliminary move identification experiments using the BERT-based model to verify the effectiveness of the proposed corpus and model. The annotated corpus is available for academic research purposes and can serve as an essential resource for move analysis, English language teaching and writing, as well as move/discourse-related tasks in Natural Language Processing (NLP).

Submitted 23 March, 2024; originally announced March 2024.

Comments: Accepted by LREC-COLING 2024

arXiv:2403.12021 [pdf, other]

A tweezer array with 6100 highly coherent atomic qubits

Authors: Hannah J. Manetsch, Gyohei Nomura, Elie Bataille, Kon H. Leung, Xudong Lv, Manuel Endres

Abstract: Optical tweezer arrays have had a transformative impact on atomic and molecular physics over the past years, and they now form the backbone for a wide range of leading experiments in quantum computing, simulation, and metrology. Underlying this development is the simplicity of single particle control and detection inherent to the technique. Typical experiments trap tens to hundreds of atomic qubits, and very recently systems with around one thousand atoms were realized without defining qubits or demonstrating coherent control. However, scaling to thousands of atomic qubits with long coherence times and low-loss, high-fidelity imaging is an outstanding challenge and critical for progress in quantum computing, simulation, and metrology, in particular, towards applications with quantum error correction. Here, we experimentally realize an array of optical tweezers trapping over 6,100 neutral atoms in around 12,000 sites while simultaneously surpassing state-of-the-art performance for several key metrics associated with fundamental limitations of the platform. Specifically, while scaling to such a large number of atoms, we also demonstrate a coherence time of 12.6(1) seconds, a record for hyperfine qubits in an optical tweezer array. Further, we show trapping lifetimes close to 23 minutes in a room-temperature apparatus, enabling record-high imaging survival of 99.98952(1)% in combination with an imaging fidelity of over 99.99%.
Our results, together with other recent developments, indicate that universal quantum computing with ten thousand atomic qubits could be a near-term prospect. Furthermore, our work could pave the way for quantum simulation and metrology experiments with inherent single particle readout and positioning capabilities at a similar scale.
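
One way to read the imaging-survival figure at this scale is as follows. If each of N atoms independently survives one imaging round with probability s, the expected loss per round is N(1 - s) and the chance of losing no atom at all is s^N. The sketch below is a back-of-the-envelope illustration under that independence assumption, not an analysis from the paper. It uses the quoted 6,100 atoms and 99.98952% survival, and the helper names are hypothetical.

```python
import math


def expected_losses(n_atoms: int, survival: float) -> float:
    """Mean number of atoms lost in one imaging round, assuming independent losses."""
    return n_atoms * (1.0 - survival)


def prob_no_loss(n_atoms: int, survival: float) -> float:
    """Probability that all atoms survive one imaging round, assuming independent losses."""
    if not 0.0 < survival <= 1.0:
        raise ValueError("survival must be in (0, 1]")
    # Equivalent to survival ** n_atoms, written in log form.
    return math.exp(n_atoms * math.log(survival))


N_ATOMS = 6100        # "over 6,100" atoms, from the abstract
SURVIVAL = 0.9998952  # 99.98952% imaging survival, from the abstract

print(f"expected losses per image: {expected_losses(N_ATOMS, SURVIVAL):.3f}")
print(f"P(no atom lost): {prob_no_loss(N_ATOMS, SURVIVAL):.3f}")
```

With these inputs, the expected loss is about 0.64 atoms per imaging round, and roughly 53% of rounds lose no atom at all.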

Submitted 19 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: H.J.M., G.N., and E.B. contributed equally to this work

arXiv:2403.11832 [pdf, other]

Precise measurement of the cosmic-ray spectrum and $\left \langle \ln A \right \rangle$ by LHAASO -- connecting the Galactic to the extragalactic components

Authors: Xing-Jian Lv, Xiao-Jun Bi, Kun Fang, Yi-Qing Guo, Hui-Hai He, Ling-Ling Ma, Peng-Fei Yin, Qiang Yuan, Meng-Jie Zhao

Abstract: Recently, the LHAASO Collaboration reported precise measurements of the cosmic-ray (CR) all-particle energy spectrum and mean logarithmic mass $\left \langle \ln A \right \rangle$ from 0.3 PeV to 30 PeV. Combining the CR measurements made in space by AMS-02 and DAMPE with those made on the ground by LHAASO and Auger, we construct a model that reproduces all these measurements from tens of GeV to tens of EeV. We find that the LHAASO measurement is crucial to the model construction because it connects the Galactic component to the extragalactic component. The precise measurements of CR spectra for individual species by AMS-02 and DAMPE, together with the newest LHAASO results, clearly indicate three Galactic CR components, namely a soft low-energy background, a hard high-energy component, and a local source contribution. However, the LHAASO data show that above $\sim 10^{16}$ eV a non-negligible extragalactic component must be included. Combining the Auger and LHAASO results, we infer that the extragalactic CRs require at least two components, at lower and higher energies. Thanks to the precise measurements by LHAASO, the constraints on the model parameters are quite stringent. The spectral features and mass measurements over the whole energy range are well reproduced by the model.
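
For readers unfamiliar with the composition observable: $\langle \ln A \rangle$ is the abundance-weighted mean of the natural logarithm of the mass number, $\langle \ln A \rangle = \sum_i f_i \ln A_i$, where $f_i$ is the fraction of the all-particle flux carried by species $i$ at a given energy. The minimal sketch below evaluates that definition. The example composition is hypothetical and is not LHAASO data.

```python
import math


def mean_ln_a(fractions: dict[int, float]) -> float:
    """<ln A> = sum_i f_i * ln(A_i), with the fractions normalized to sum to 1.

    `fractions` maps mass number A_i to the (unnormalized) share f_i of the
    all-particle flux at one energy.
    """
    total = sum(fractions.values())
    if total <= 0:
        raise ValueError("fractions must sum to a positive value")
    return sum(f * math.log(a) for a, f in fractions.items()) / total


# Hypothetical composition at one energy (NOT LHAASO data): p, He, N, Fe.
example = {1: 0.3, 4: 0.3, 14: 0.2, 56: 0.2}
print(f"<ln A> = {mean_ln_a(example):.3f}")
```

A pure-proton flux gives 0 and a pure-iron flux gives ln 56 ≈ 4.03, which bracket the values possible for compositions between protons and iron.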

Submitted 18 March, 2024; originally announced March 2024.

Comments: 11 pages, 2 figures, 4 tables

arXiv:2403.09384 [pdf]

Anomalous thermal transport and high thermoelectric performance of Cu-based vanadate CuVO3

Authors: Xin Jin, Qiling Ou, Haoran Wei, Xianyong Ding, Fangyang Zhan, Rui Wang, Xiaolong Yang, Xuewei Lv, Peng Yu

Abstract: Thermoelectric (TE) conversion technology, capable of transforming heat into electricity, is critical for sustainable energy solutions. Many promising TE materials contain rare or toxic elements, so the development of cost-effective and eco-friendly high-performance TE materials is highly urgent. Herein, we explore the thermal transport and TE properties of the transition metal vanadate CuVO3 using first-principles calculations. On the basis of the unified theory of heat conduction, we uncover a hierarchical thermal transport feature in CuVO3, where wave-like tunneling makes a significant contribution to the lattice thermal conductivity ($\kappa_l$) and results in the anomalously weak temperature dependence of $\kappa_l$. This is primarily attributable to the complex phononic band structure caused by the heterogeneity of Cu-O and V-O bonds. Simultaneously, we report a high power factor of 5.45 mW m$^{-1}$ K$^{-2}$ realized in hole-doped CuVO3, which arises from a high electrical conductivity and a large Seebeck coefficient enabled by the multiple valleys and large electronic density of states near the valence band edge. Impressively, the low $\kappa_l$ and the high power factor give p-type CuVO3 a ZT of up to 1.39, with an excellent average ZT above 1.0 from 300 to 600 K, which is superior to most reported Cu-based TE materials. Our findings suggest that CuVO3 is a promising candidate for energy conversion applications in innovative TE devices.
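
The quantities in this abstract are linked by the standard thermoelectric figure of merit, $ZT = S^2 \sigma T / \kappa$. Here $S^2\sigma$ is the power factor and $\kappa = \kappa_e + \kappa_l$ combines the electronic and lattice thermal conductivities. The sketch below evaluates that relation. The temperature and conductivity values in the example are hypothetical placeholders, not results from the paper. Only the power factor comes from the abstract, converted from mW m$^{-1}$ K$^{-2}$ to SI units.

```python
def mw_to_w(value_mw: float) -> float:
    """Convert a power factor from mW m^-1 K^-2 to W m^-1 K^-2."""
    return value_mw * 1e-3


def figure_of_merit(power_factor: float, temperature: float,
                    kappa_lattice: float, kappa_electronic: float) -> float:
    """ZT = PF * T / (kappa_l + kappa_e).

    PF in W m^-1 K^-2, T in K, thermal conductivities in W m^-1 K^-1.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    kappa_total = kappa_lattice + kappa_electronic
    if kappa_total <= 0:
        raise ValueError("total thermal conductivity must be positive")
    return power_factor * temperature / kappa_total


pf = mw_to_w(5.45)  # power factor quoted in the abstract
# Hypothetical T and conductivities, chosen only to exercise the formula.
zt = figure_of_merit(pf, temperature=600.0, kappa_lattice=1.0, kappa_electronic=1.0)
print(f"ZT = {zt:.3f}")
```

Because ZT scales inversely with the total thermal conductivity, a low $\kappa_l$ raises ZT at a fixed power factor. The abstract credits this combination for the high ZT of p-type CuVO3.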

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.08281 [pdf, other]

Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models

Authors: Ning Ding, Yulin Chen, Ganqu Cui, Xingtai Lv, Weilin Zhao, Ruobing Xie, Bowen Zhou, Zhiyuan Liu, Maosong Sun

Abstract: The underlying data distributions of natural language, programming code, and mathematical symbols vary vastly, presenting a complex challenge for large language models (LLMs) that strive to achieve high performance across all three domains simultaneously. Achieving a very high level of proficiency for an LLM within a specific domain often requires extensive training with relevant corpora, which is typically accompanied by a sacrifice in performance in other domains. In this paper, we propose to directly fuse models that are already highly specialized. The proposed fusing framework, UltraFuser, consists of three distinct specialists that are already sufficiently trained on language, coding, and mathematics. A token-level gating mechanism is introduced to blend the specialists' outputs. A two-stage training strategy accompanied by balanced sampling is designed to ensure stability. To effectively train the fused model, we further construct a high-quality supervised instruction tuning dataset, UltraChat 2, which includes text, code, and mathematical content. This dataset comprises approximately 300,000 instructions and covers a wide range of topics in each domain. Experiments show that our model can simultaneously achieve mastery of the three crucial domains.
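
The abstract does not specify how the gate combines the specialists. One common way to realize token-level gating works in three steps. First, a small gate scores each specialist at every position. Second, a softmax normalizes those scores. Third, the fused prediction is the weighted mixture of the specialists' next-token distributions. The pure-Python sketch below shows only that mixing step. Mixing probabilities, rather than logits or hidden states, is an assumption, and the function names and toy numbers are hypothetical, not UltraFuser's implementation.

```python
import math
from typing import Sequence


def softmax(scores: Sequence[float]) -> list[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_mixture(
    specialist_probs: Sequence[Sequence[Sequence[float]]],
    gate_scores: Sequence[Sequence[float]],
) -> list[list[float]]:
    """Blend specialists token by token (hypothetical sketch).

    specialist_probs[k][t][v]: probability of vocabulary item v at position t
        predicted by specialist k.
    gate_scores[t][k]: unnormalized gate score for specialist k at position t.
    Returns mixed[t][v], a convex combination of the specialists' distributions.
    """
    n_specialists = len(specialist_probs)
    mixed: list[list[float]] = []
    for t, scores in enumerate(gate_scores):
        if len(scores) != n_specialists:
            raise ValueError("one gate score per specialist is required")
        weights = softmax(scores)
        vocab_size = len(specialist_probs[0][t])
        mixed.append([
            sum(weights[k] * specialist_probs[k][t][v] for k in range(n_specialists))
            for v in range(vocab_size)
        ])
    return mixed


# Toy example: three specialists (text, code, math), two positions, vocabulary of 2.
text = [[0.9, 0.1], [0.5, 0.5]]
code = [[0.2, 0.8], [0.1, 0.9]]
maths = [[0.5, 0.5], [0.3, 0.7]]
gates = [[0.0, 0.0, 0.0],   # position 0: no preference, so a plain average
         [0.0, 10.0, 0.0]]  # position 1: strongly favor the code specialist
mixed = gated_mixture([text, code, maths], gates)
print(mixed)
```

At the first position, the equal gate scores produce a plain average of the three specialists. At the second, the large score for the code specialist makes the mixture follow it almost exactly.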

Submitted 26 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2402.18109 [pdf, other]

doi 10.1007/s11042-023-17517-w

Dual-Context Aggregation for Universal Image Matting

Authors: Qinglin Liu, Xiaoqian Lv, Wei Yu, Changyong Guo, Shengping Zhang

Abstract: Natural image matting aims to estimate the alpha matte of the foreground from a given image. Various approaches have been explored to address this problem, such as interactive matting methods that use guidance such as clicks or trimaps, and automatic matting methods tailored to specific objects. However, existing matting methods are designed for specific objects or guidance, neglecting the common requirement of aggregating global and local contexts in image matting. As a result, these methods often struggle to accurately identify the foreground and generate precise boundaries, which limits their effectiveness in unforeseen scenarios. In this paper, we propose a simple and universal matting framework, named Dual-Context Aggregation Matting (DCAM), which enables robust image matting with arbitrary guidance or without guidance. Specifically, DCAM first adopts a semantic backbone network to extract low-level features and context features from the input image and guidance. Then, we introduce a dual-context aggregation network that incorporates global object aggregators and local appearance aggregators to iteratively refine the extracted context features. By performing both global contour segmentation and local boundary refinement, DCAM exhibits robustness to diverse types of guidance and objects. Finally, we adopt a matting decoder network to fuse the low-level features and the refined context features for alpha matte estimation.
Experimental results on five matting datasets demonstrate that the proposed DCAM outperforms state-of-the-art matting methods in both automatic matting and interactive matting tasks, which highlights the strong universality and high performance of DCAM. The source code is available at https://github.com/Windaway/DCAM.
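
Matting rests on the compositing equation $I = \alpha F + (1 - \alpha) B$. Each observed pixel $I$ blends a foreground color $F$ and a background color $B$ with opacity $\alpha$. In practice $F$, $B$, and $\alpha$ are all unknown, which is why methods such as DCAM learn $\alpha$ from the image and guidance. The sketch below only illustrates the equation itself. It composites a pixel and then, in the idealized case where $F$ and $B$ are known, recovers $\alpha$ by least-squares projection. The function names are hypothetical and unrelated to the DCAM code base.

```python
from typing import Sequence

Color = tuple[float, float, float]


def composite(alpha: float, fg: Color, bg: Color) -> tuple[float, ...]:
    """Compositing equation I = alpha * F + (1 - alpha) * B, applied per channel."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


def alpha_from_known_fg_bg(pixel: Sequence[float], fg: Sequence[float],
                           bg: Sequence[float]) -> float:
    """Least-squares alpha when F and B are known (idealized; real matting must estimate them).

    Projects (I - B) onto (F - B) and clips the result to [0, 1].
    """
    diff = [f - b for f, b in zip(fg, bg)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        raise ValueError("F and B coincide, so alpha is not identifiable")
    num = sum((i - b) * d for i, b, d in zip(pixel, bg, diff))
    return min(1.0, max(0.0, num / denom))


fg = (1.0, 0.2, 0.0)  # hypothetical foreground color
bg = (0.0, 0.4, 1.0)  # hypothetical background color
pixel = composite(0.3, fg, bg)
print(pixel, alpha_from_known_fg_bg(pixel, fg, bg))
```

When F and B have the same color, the projection is undefined. This illustrates why the problem is ill-posed without additional cues.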

Submitted 28 February, 2024; originally announced February 2024.

Journal ref: Multimed Tools Appl (2023)

arXiv:2402.15149 [pdf, other]

Possible spectral irregularities in the AMS-02 positron spectrum

Authors: Xing-Jian Lv, Xiao-Jun Bi, Kun Fang, Peng-Fei Yin, Meng-Jie Zhao

Abstract: The excesses in the electron and positron spectra observed by many experiments, such as PAMELA and AMS-02, have sparked significant theoretical investigation. It is not easy to distinguish between the two primary hypotheses, dark matter annihilation/decay and pulsars, from the spectral features. Should pulsars be the source of this excess, the expected variability in their distribution may introduce distinct irregularities in the positron energy spectrum. In this study, we use an irregularity estimator to detect these potential features in the positron energy spectrum of AMS-02. Our analysis of the current AMS-02 data reveals these spectral irregularities with a statistical significance of $1.75\sigma$. However, our projection indicates that, with AMS-02 data collected over a period of 20 years, such irregularities could be identified at the $3\sigma$ confidence level in 71% of our simulations.
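
Significances quoted in σ correspond to Gaussian tail probabilities. The abstract does not say whether its 1.75σ and 3σ figures are one- or two-sided, so the sketch below provides both conversions. It uses only the Python standard library (`math.erfc` and `statistics.NormalDist`). It is a generic helper, not the irregularity estimator used in the paper.

```python
import math
from statistics import NormalDist


def p_one_sided(z: float) -> float:
    """Upper-tail Gaussian probability P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def p_two_sided(z: float) -> float:
    """Two-sided Gaussian probability P(|Z| > z)."""
    return math.erfc(z / math.sqrt(2.0))


def z_from_one_sided_p(p: float) -> float:
    """Inverse of p_one_sided: the z at which the upper tail equals p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    return NormalDist().inv_cdf(1.0 - p)


for z in (1.75, 3.0):
    print(f"{z} sigma: one-sided p = {p_one_sided(z):.4g}, two-sided p = {p_two_sided(z):.4g}")
```

For example, 1.75σ corresponds to a one-sided p of about 0.04 (two-sided about 0.08). By comparison, 3σ corresponds to about 0.00135 one-sided (0.0027 two-sided).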

Submitted 29 May, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

Comments: 6 pages, 6 figures

arXiv:2402.14840 [pdf, other]

RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning

Authors: Congyun Jin, Ming Zhang, Xiaowei Ma, Li Yujiao, Yingbo Wang, Yabo Jia, Yuliang Du, Tao Sun, Haowen Wang, Cong Fan, Jinjie Gu, Chenfei Chi, Xiangguo Lv, Fangzhou Li, Wei Xue, Yiran Huang

Abstract: Recent advancements in Large Language Models (LLMs) and Large Multi-modal Models (LMMs) have shown potential in various medical applications, such as Intelligent Medical Diagnosis. Although impressive results have been achieved, we find that existing benchmarks do not reflect the complexity of real medical reports or the need for specialized in-depth reasoning capabilities. In this work, we introduce RJUA-MedDQA, a comprehensive benchmark in the field of medical specialization, which poses several challenges: comprehensively interpreting image content across diverse challenging layouts, possessing the numerical reasoning ability to identify abnormal indicators, and demonstrating the clinical reasoning ability to provide statements of disease diagnosis, status, and advice based on medical contexts. We carefully design the data generation pipeline and propose the Efficient Structural Restoration Annotation (ESRA) Method, aimed at restoring textual and tabular content in medical report images. This method substantially enhances annotation efficiency, doubling the productivity of each annotator, and yields a 26.8% improvement in accuracy. We conduct extensive evaluations, including few-shot assessments of 5 LMMs that are capable of solving Chinese medical QA tasks. To further investigate the limitations and potential of current LMMs, we conduct comparative experiments on a set of strong LLMs using image-text generated by the ESRA method.
We report the performance of baselines and offer several observations: (1) The overall performance of existing LMMs is still limited; however, LMMs are more robust to low-quality and diversely structured images than LLMs. (3) Reasoning across context and image content presents significant challenges. We hope this benchmark helps the community make progress on these challenging tasks in multi-modal medical document understanding and facilitates its application in healthcare.

Submitted 19 February, 2024; originally announced February 2024.

Comments: 15 pages, 13 figures

arXiv:2401.18058 [pdf, other]

LongAlign: A Recipe for Long Context Alignment of Large Language Models

Authors: Yushi Bai, Xin Lv, Jiajie Zhang, Yuze He, Ji Qi, Lei Hou, Jie Tang, Yuxiao Dong, Juanzi Li

Abstract: Extending large language models to effectively handle long contexts requires instruction fine-tuning on input sequences of similar length. To address this, we present LongAlign, a recipe covering the instruction data, training, and evaluation for long context alignment. First, we construct a long instruction-following dataset using Self-Instruct. To ensure data diversity, it covers a broad range of tasks from various long context sources. Second, we adopt packing and sorted batching strategies to speed up supervised fine-tuning on data with varied length distributions. Additionally, we develop a loss weighting method to balance the contribution to the loss across different sequences during packing training. Third, we introduce the LongBench-Chat benchmark for evaluating instruction-following capabilities on queries of 10k-100k in length. Experiments show that LongAlign outperforms existing recipes for LLMs in long context tasks by up to 30%, while also maintaining their proficiency in handling short, generic tasks. The code, data, and long-aligned models are open-sourced at https://github.com/THUDM/LongAlign.
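
A toy example can illustrate the two efficiency strategies and the loss-weighting issue. Packing concatenates several sequences into one training row up to a length budget. Sorted batching groups sequences of similar length. Under packing, a plain token-mean loss lets long sequences dominate. The sketch below uses first-fit-decreasing packing, with a per-sequence average as one plausible balancing scheme. These specific choices, the function names, and the numbers are assumptions for illustration, not LongAlign's exact formulation.

```python
from typing import Sequence


def pack_first_fit_decreasing(lengths: Sequence[int], max_len: int) -> list[list[int]]:
    """Group sequence indices into packs whose total length stays within max_len."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    packs: list[list[int]] = []
    free: list[int] = []  # remaining token budget of each pack
    for i in order:
        n = lengths[i]
        if n > max_len:
            raise ValueError(f"sequence {i} has {n} tokens, more than max_len={max_len}")
        for p, room in enumerate(free):
            if n <= room:
                packs[p].append(i)
                free[p] -= n
                break
        else:  # no existing pack has room: open a new one
            packs.append([i])
            free.append(max_len - n)
    return packs


def sorted_batches(lengths: Sequence[int], batch_size: int) -> list[list[int]]:
    """Sort indices by length and cut them into batches of similar-length sequences."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return [order[k:k + batch_size] for k in range(0, len(order), batch_size)]


def token_mean_loss(token_losses: Sequence[Sequence[float]]) -> float:
    """Plain average over all tokens in a pack: long sequences dominate."""
    flat = [x for seq in token_losses for x in seq]
    return sum(flat) / len(flat)


def sequence_balanced_loss(token_losses: Sequence[Sequence[float]]) -> float:
    """Average within each sequence first, then across sequences (equal weight per sequence)."""
    per_seq = [sum(seq) / len(seq) for seq in token_losses if seq]
    return sum(per_seq) / len(per_seq)


lengths = [9000, 3000, 2500, 6000, 1000]     # hypothetical token counts
print(pack_first_fit_decreasing(lengths, max_len=10000))
print(sorted_batches(lengths, batch_size=2))
pack_losses = [[1.0, 1.0, 1.0, 1.0], [3.0]]  # hypothetical per-token losses
print(token_mean_loss(pack_losses), sequence_balanced_loss(pack_losses))
```

In the loss example, the one-token sequence counts for a fifth of the token-mean loss but half of the sequence-balanced loss.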

Submitted 31 January, 2024; originally announced January 2024.

arXiv:2401.11204 [pdf, other]

Towards Category Unification of 3D Single Object Tracking on Point Clouds

Authors: Jiahao Nie, Zhiwei He, Xudong Lv, Xueyi Zhou, Dong-Kyu Chae, Fei Xie

Abstract: Category-specific models have proven valuable in 3D single object tracking (SOT) in both Siamese and motion-centric paradigms. However, such over-specialized model designs incur redundant parameters, thus limiting the broader applicability of the 3D SOT task. This paper first introduces unified models that can simultaneously track objects across all categories using a single network with shared model parameters. Specifically, we propose to explicitly encode distinct attributes associated with different object categories, enabling the model to adapt to cross-category data. We find that the attribute variances of point cloud objects arise primarily from varying size and shape (e.g., large and square vehicles vs. small and slender humans). Based on this observation, we design a novel point set representation learning network built on the transformer architecture, termed AdaFormer, which adaptively encodes the dynamically varying shape and size information from cross-category data in a unified manner. We further incorporate the size and shape prior derived from the known template targets into the model's inputs and learning objective, facilitating the learning of a unified representation. Equipped with such designs, we construct two category-unified models, SiamCUT and MoCUT. Extensive experiments demonstrate that SiamCUT and MoCUT exhibit strong generalization and training stability.
Furthermore, our category-unified models outperform their category-specific counterparts by a significant margin (e.g., on the KITTI dataset, 12% and 3% performance gains in the Siamese and motion paradigms). Our code will be available.

Submitted 20 January, 2024; originally announced January 2024.

Comments: Accepted by ICLR2024 (poster)

arXiv:2312.16051 [pdf, other]

Inter-X: Towards Versatile Human-Human Interaction Analysis

Authors: Liang Xu, Xintao Lv, Yichao Yan, Xin Jin, Shuwen Wu, Congsheng Xu, Yifan Liu, Yizhou Zhou, Fengyun Rao, Xingdong Sheng, Yunhui Liu, Wenjun Zeng, Xiaokang Yang

Abstract: The analysis of ubiquitous human-human interactions is pivotal for understanding humans as social beings. Existing human-human interaction datasets typically suffer from inaccurate body motions and a lack of hand gestures and fine-grained textual descriptions. To better perceive and generate human-human interactions, we propose Inter-X, currently the largest human-human interaction dataset, with accurate body movements and diverse interaction patterns, together with detailed hand gestures. The dataset includes ~11K interaction sequences and more than 8.1M frames. We also equip Inter-X with versatile annotations of more than 34K fine-grained human part-level textual descriptions, semantic interaction categories, interaction order, and the relationship and personality of the subjects. Based on the elaborate annotations, we propose a unified benchmark composed of 4 categories of downstream tasks from both the perceptual and generative directions. Extensive experiments and comprehensive analysis show that Inter-X serves as a testbed for promoting the development of versatile human-human interaction analysis. Our dataset and benchmark will be publicly available for research purposes.

Submitted 26 December, 2023; originally announced December 2023.

Comments: Project page: https://liangxuy.github.io/inter-x/

arXiv:2312.11196 [pdf, other]

Coherence time of 20 s with a single cesium atom in an optical dipole trap

Authors: Zhuangzhuang Tian, Haobo Chang, Xin Lv, Mengna Yang, Zhihui Wang, Pengfei Yang, Pengfei Zhang, Gang Li, Tiancai Zhang

Abstract: We analyze the decoherence between two ground electronic states of an optically trapped atom by adopting a full description of the atomic wavefunction, in which the motional state, i.e., the phonon state, is taken into account. In addition to the decoherence due to the variance of the differential light shift (DLS), a new decoherence mechanism, phonon-jumping-induced decoherence (PJID), is discovered and verified experimentally. A coherence time of $T_2\approx 20$ s is then obtained for a single Cs atom by suppressing both the DLS variance and PJID, which is achieved by trapping the atom in a blue-detuned BBT and preparing it in its three-dimensional motional ground state. Our work opens a new route to extending the coherence time of optically trapped single atoms.

Submitted 31 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: 5 pages, 4 figures in the main text; 6 pages, 8 figures in the supplementary material

arXiv:2312.07336 [pdf]

doi 10.1088/0256-307X/39/6/067101

Giant domain wall anomalous Hall effect in an antiferromagnet

Authors: Wei Xia, Bo Bai, Xuejiao Chen, Yichen Yang, Yang Zhang, Jian Yuan, Qiang Li, Kunya Yang, Xiangqi Liu, Yang Shi, Haiyang Ma, Huali Yang, Mingquan He, Lei Li, Chuanying Xi, Li Pi, Xiaodong Lv, Xia Wang, Xuerong Liu, Shiyan Li, Xiaodong Zhou, Jianpeng Liu, Yulin Chen, Jian Shen, Dawei Shen , et al. (3 additional authors not shown)

Abstract: The Hall effect plays a crucial role in establishing the band theory of solids and in discovering emergent phases of interacting electrons, such as the topological phases of matter. Generally, the dissipationless Hall effect requires time-reversal symmetry breaking (TRSB): TRSB induced by an external magnetic field results in the ordinary Hall effect, while TRSB caused by spontaneous magnetization gives rise to the anomalous Hall effect (AHE), which scales with the net magnetization. The AHE is therefore not expected in antiferromagnets with vanishingly small magnetization. However, a large AHE was recently observed in certain antiferromagnets with noncollinear spin structure and nonvanishing Berry curvature, thus opening a new area for exploring large AHE in antiferromagnets. Here, we experimentally report, for the first time, another origin of the AHE in a layered antiferromagnet, namely domain wall (DW) skew scattering with Weyl points near the Fermi level. Interestingly, the DWs form a unique periodic stripe structure whose periodicity is controllable by an external magnetic field, decreasing nearly monotonically from 975 nm at 0 T to 232 nm at 4 T. Electrons incident on DWs with topological bound states experience strong asymmetric scattering, leading to a giant extrinsic AHE. The DW Hall conductivity (DWHC) at 2 K and 1.2 T reaches about 1.51×10$^4$ S cm$^{-1}$, a record value among bulk systems and two orders of magnitude larger than the intrinsic anomalous Hall conductivity.
The observation of giant DWHC and a controllable stripe DW structure in an antiferromagnet not only sets a new paradigm for exploring large extrinsic anomalous Hall effects but also points to potential applications in spintronic devices.
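
Hall conductivities such as the DWHC are typically obtained by inverting the measured resistivity tensor rather than measured directly. For an in-plane isotropic sample with $\rho_{yy} = \rho_{xx}$ and $\rho_{yx} = -\rho_{xy}$, inversion gives $\sigma_{xy} = \rho_{yx}/(\rho_{xx}^2 + \rho_{yx}^2)$ and $\sigma_{xx} = \rho_{xx}/(\rho_{xx}^2 + \rho_{yx}^2)$. The sketch below implements that standard conversion. Sign conventions differ between groups. This sketch also does not reproduce how the authors separate the DW contribution from the intrinsic term. The function name is hypothetical.

```python
def invert_isotropic_resistivity(rho_xx: float, rho_yx: float) -> tuple[float, float]:
    """Return (sigma_xx, sigma_xy) for rho = [[rho_xx, -rho_yx], [rho_yx, rho_xx]].

    Units follow the input: resistivities in ohm cm give conductivities in S cm^-1.
    """
    denom = rho_xx ** 2 + rho_yx ** 2
    if denom == 0.0:
        raise ValueError("resistivity tensor is singular")
    return rho_xx / denom, rho_yx / denom


# Hypothetical resistivities in arbitrary units, chosen only to exercise the formula.
sigma_xx, sigma_xy = invert_isotropic_resistivity(2.0, 1.0)
print(sigma_xx, sigma_xy)
```

The full conductivity tensor is then [[σxx, σxy], [-σxy, σxx]]. Multiplying it by the resistivity tensor returns the identity, which is a quick consistency check on the signs.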

Submitted 12 December, 2023; originally announced December 2023.

Comments: 19 pages Main Text, 5 main figures

Journal ref: Chinese Physics Letters 2022, 39: 067101

arXiv:2312.06718 [pdf, other]

Large Scale Foundation Models for Intelligent Manufacturing Applications: A Survey

Authors: Haotian Zhang, Semujju Stuart Dereck, Zhicheng Wang, Xianwei Lv, Kang Xu, Liang Wu, Ye Jia, Jing Wu, Zhuo Long, Wensheng Liang, X. G. Ma, Ruiyan Zhuang

Abstract: Although applications of artificial intelligence, especially deep learning, have greatly improved various aspects of intelligent manufacturing, they still face challenges to wide adoption due to poor generalization ability, difficulty in establishing high-quality training datasets, and the unsatisfactory performance of deep learning methods. The emergence of large scale foundation models (LSFMs) has triggered a wave in the field of artificial intelligence, shifting deep learning models from single-task, single-modal, limited-data patterns to a paradigm encompassing diverse tasks, multiple modalities, and pre-training on massive datasets. Although LSFMs have demonstrated powerful generalization capabilities, automatic high-quality training dataset generation, and superior performance across various domains, applications of LSFMs in intelligent manufacturing are still in their nascent stage. A systematic overview of this topic has been lacking, especially regarding which challenges of deep learning can be addressed by LSFMs and how these challenges can be systematically tackled. To fill this gap, this paper systematically expounds the current status of LSFMs and their advantages in the context of intelligent manufacturing, and relates them comprehensively to the challenges faced by current deep learning models in various intelligent manufacturing applications. We also outline roadmaps for utilizing LSFMs to address these challenges.
Finally, case studies of applications of LSFMs in real-world intelligent manufacturing scenarios are presented to illustrate how LSFMs could help industries improve their efficiency.

Submitted 22 December, 2023; v1 submitted 10 December, 2023; originally announced December 2023.

arXiv:2311.17494 [pdf, other]

Resolved Raman sideband cooling of a single optically trapped cesium atom

Authors: Zhuangzhuang Tian, Haobo Chang, Xin Lv, Mengna Yang, Zhihui Wang, Pengfei Yang, Pengfei Zhang, Gang Li, Tiancai Zhang

Abstract: We developed a resolved Raman sideband cooling scheme that can efficiently prepare a single optically trapped cesium (Cs) atom in its motional ground state. A two-photon Raman process between the two outermost Zeeman sublevels of a single hyperfine state is applied to reduce the phonon number. Our scheme is less sensitive to variations in the magnetic field than the commonly used scheme, in which the two outermost Zeeman sublevels belonging to the two separate ground hyperfine states are used. Fast optical pumping with less spontaneous emission ensures the efficiency of the cooling process. After cooling for 50 ms, 82% of the Cs atoms populate their three-dimensional ground states. Our scheme improves the long-term stability of Raman sideband cooling in the presence of magnetic field drift and is thus suitable for cooling other trapped atoms or ions with abundant magnetic sublevels.

Submitted 31 December, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: 4 pages, 3 figures, 1 table
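A rough reading of the 82% figure, under an assumption that is not stated in the abstract: if the residual motion along each of the three trap axes is thermal with the same mean phonon number $\bar n$, the ground-state population per axis is $1/(\bar n + 1)$, and the three-dimensional ground-state population is the product over the axes:

```latex
P_0^{(3\mathrm{D})} = \prod_{i \in \{x,y,z\}} \frac{1}{\bar n_i + 1}
  = \frac{1}{(\bar n + 1)^3} = 0.82
\quad\Longrightarrow\quad
\bar n = 0.82^{-1/3} - 1 \approx 0.07
```

Sideband-cooled distributions are not necessarily thermal or isotropic, so this only indicates the order of magnitude of the residual phonon number; it is not a value reported by the authors.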

arXiv:2311.13982 [pdf, other]

Probabilistic Tree-of-thought Reasoning for Answering Knowledge-intensive Complex Questions

Authors: Shulin Cao, Jiajie Zhang, Jiaxin Shi, Xin Lv, Zijun Yao, Qi Tian, Juanzi Li, Lei Hou

Abstract: Large language models (LLMs) are capable of answering knowledge-intensive complex questions with chain-of-thought (CoT) reasoning. However, they tend to generate factually incorrect reasoning steps when the required knowledge is not available or not up to date in the models' parameters. Recent works turn to retrieving external knowledge to augment CoT reasoning. Despite being promising, these chain-based methods suffer from: 1) Negative retrieval. Unnecessary or incorrect retrieval may mislead the reasoning; 2) Limited sight. Lacking the ability to look backward or forward, a local error in one step will propagate along the chain. In this paper, we propose a novel approach: Probabilistic Tree-of-thought Reasoning (ProbTree). First, LLMs translate a complex question into a query tree, in which each non-root node denotes a sub-question of its parent node. Then, probabilistic reasoning is conducted over the tree by solving questions from leaf to root, considering the confidence of both question decomposition and answering. During reasoning, for leaf nodes, LLMs choose the more confident answer from Closed-book QA, which employs parametric knowledge, and Open-book QA, which employs retrieved external knowledge, thus eliminating the negative retrieval problem. For non-leaf nodes, with the hierarchical structure, LLMs have a broader view and are able to reason globally with the information from child nodes, thus recovering from local errors. Experiments on three Complex QA datasets under the open-domain setting show that our approach significantly outperforms SOTA methods, demonstrating the effectiveness of probabilistic tree-of-thought reasoning.

Submitted 23 November, 2023; originally announced November 2023.

Comments: Accepted by EMNLP 2023
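The leaf-to-root selection rule described in the ProbTree abstract can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the `Node` class, the use of log-probabilities as confidences, and the rule that a composed answer's score is its own score plus the sum of its children's scores are all assumptions made here for concreteness.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Candidate = Tuple[str, float]  # (answer, confidence as a log-probability)

@dataclass
class Node:
    """Hypothetical query-tree node; confidences would come from an LLM scorer."""
    question: str
    closed_book: Optional[Candidate] = None  # answer from parametric knowledge
    open_book: Optional[Candidate] = None    # answer using retrieved documents
    composed: Optional[Candidate] = None     # answer built from the children's answers
    children: List["Node"] = field(default_factory=list)

def solve(node: Node) -> Candidate:
    """Solve from leaves to root, keeping the most confident candidate per node."""
    candidates = [c for c in (node.closed_book, node.open_book) if c is not None]
    if node.children and node.composed is not None:
        # Assumption: a composed answer is only as reliable as its sub-answers,
        # so the children's log-confidences are added to its own score.
        child_score = sum(solve(child)[1] for child in node.children)
        answer, score = node.composed
        candidates.append((answer, score + child_score))
    return max(candidates, key=lambda c: c[1])

# Usage with made-up confidences:
leaf1 = Node("q1", closed_book=("A", -0.5), open_book=("B", -0.2))
leaf2 = Node("q2", closed_book=("C", -0.1), open_book=("D", -1.0))
root = Node("q", closed_book=("X", -3.0), children=[leaf1, leaf2], composed=("Y", -0.4))
print(solve(leaf1))  # the open-book answer wins at this leaf
print(solve(root))   # the composed answer wins at the root
```

Because log-probabilities are added, a composed answer that rests on shaky sub-answers is penalized, which is one simple way to let a parent node fall back on its own closed- or open-book answer when decomposition goes wrong.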

arXiv:2311.11696 [pdf, other]

Sparse Low-rank Adaptation of Pre-trained Language Models

Authors: Ning Ding, Xingtai Lv, Qiaosen Wang, Yulin Chen, Bowen Zhou, Zhiyuan Liu, Maosong Sun

Abstract: Fine-tuning pre-trained large language models in a parameter-efficient manner is widely studied for its effectiveness and efficiency. The popular method of low-rank adaptation (LoRA) offers a notable approach, hypothesizing that the adaptation process is intrinsically low-dimensional. Although LoRA has demonstrated commendable performance, it is implemented with a fixed and unalterable intrinsic rank that might not always be the ideal choice. Recognizing the need for more flexible adaptation, we extend the methodology of LoRA to an innovative approach we call sparse low-rank adaptation (SoRA) that enables dynamic adjustments to the intrinsic rank during the adaptation process. We achieve this by incorporating a gate unit, optimized with the proximal gradient method in the training stage, that controls the cardinality of the rank under the sparsity of the gate. In the subsequent inference stage, we eliminate the parameter blocks corresponding to the zeroed-out ranks, reducing each SoRA module back to a concise yet rank-optimal LoRA. Our approach strengthens the representation power of LoRA by initializing it with a higher rank, while efficiently taming a temporarily increased number of parameters via sparse updates. We further introduce a sparsifying scheduler for SoRA, aiming to examine the impact of the number of non-zero parameters on the model's memorization and generalization. Our experimental results demonstrate that SoRA can outperform other baselines even with 70% retained parameters and 70% training time.

Submitted 20 November, 2023; originally announced November 2023.

Comments: Accepted to EMNLP 2023 (Main Conference)
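The gate mechanism in the SoRA abstract can be illustrated with a minimal proximal-gradient step. This sketch assumes, for illustration, that sparsity comes from an L1 penalty with weight `lam` on the gate vector $g$ of a gated low-rank update $\Delta W = B\,\mathrm{diag}(g)\,A$; the proximal operator of the L1 penalty is soft-thresholding, which sets small gate entries exactly to zero. The function names are hypothetical, and the task-loss gradient is taken as given.

```python
from typing import List

def soft_threshold(x: float, tau: float) -> float:
    """Proximal operator of tau * |x|: shrink x toward zero by tau, clipping at zero."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0

def proximal_gate_step(gate: List[float], grad: List[float], lr: float, lam: float) -> List[float]:
    """Gradient step on the task loss, then the L1 proximal step on each gate entry."""
    return [soft_threshold(g - lr * dg, lr * lam) for g, dg in zip(gate, grad)]

def surviving_ranks(gate: List[float]) -> List[int]:
    """Rank indices with a non-zero gate; the rank-1 blocks (column of B, row of A)
    for the other indices can be deleted, leaving a plain LoRA of smaller rank."""
    return [i for i, g in enumerate(gate) if g != 0.0]

# Usage: start with rank 4 and a zero task gradient so only the L1 shrinkage acts.
gate = [0.9, 0.05, -0.3, 0.02]
gate = proximal_gate_step(gate, grad=[0.0] * 4, lr=0.1, lam=1.0)
print(gate)                   # small entries are now exactly 0.0
print(surviving_ranks(gate))  # indices of the ranks that remain
```

Soft-thresholding, rather than a plain gradient step on the L1 term, is what produces exact zeros, so the pruned ranks can be identified without picking an arbitrary cutoff.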

arXiv:2311.07153 [pdf]

Double Dome and Reemergence of Superconductivity in Pristine 6R-TaS2 under Pressure

Authors: Xindeng Lv, Hao Song, Kun Chen, Sirui Liu, Yanping Huang, Yuqiang Fang, Tian Cui

Abstract: Investigating the implications of interlayer coupling on superconductivity is essential for comprehending the intrinsic mechanisms of high-temperature superconductors. Van der Waals heterojunctions have attracted extensive research due to their exotic interlayer coupling. Here, we present a natural heterojunction superconductor, 6R-TaS2, that demonstrates a double dome of superconductivity in addition to the reemergence of superconductivity under high pressure. Our first-principles calculations show that the first superconducting dome in 6R-TaS2 can be attributed to changes in interlayer coupling and charge transfer. The second superconducting dome and the reemergence of superconductivity can be ascribed to changes in the density of states (DOS) resulting from Fermi surface reconstruction, in which the DOS of the T-layer and S p-orbitals play a crucial role. We report the first observation in TMDs of non-metallic atoms playing a dominant role in the reemergence of superconductivity, as well as of the influence of two Lifshitz transitions on superconducting properties.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2310.05635 [pdf, other]

Nanoscale engineering and dynamical stabilization of mesoscopic spin textures

Authors: Kieren Harkins, Christoph Fleckenstein, Noella D'Souza, Paul M. Schindler, David Marchiori, Claudia Artiaco, Quentin Reynard-Feytis, Ushoshi Basumallick, William Beatrez, Arjun Pillai, Matthias Hagn, Aniruddha Nayak, Samantha Breuer, Xudong Lv, Maxwell McAllister, Paul Reshetikhin, Emanuel Druga, Marin Bukov, Ashok Ajoy

Abstract: Thermalization phenomena, while ubiquitous in quantum systems, have traditionally been viewed as obstacles to be mitigated. In this study, we demonstrate the ability, instead, to harness thermalization to dynamically engineer and stabilize structured quantum states in a mesoscopically large ensemble of spins. Specifically, we showcase the capacity to generate, control, stabilize, and read out 'shell-like' spin textures with interacting $^{13}\mathrm{C}$ nuclear spins in diamond, wherein spins are polarized oppositely on either side of a critical radius. The texture spans several nanometers and encompasses many hundreds of spins. We capitalize on the thermalization process to impose a quasi-equilibrium upon the generated texture; as a result, it is highly stable, immune to spin diffusion, and endures over multiple-minute-long periods, over a million times longer than the intrinsic interaction scale of the spins. Additionally, the texture is created and interrogated without locally controlling or probing the nuclear spins. These features are accomplished by using an electron spin as a nanoscale injector of spin polarization and employing it as a source of spatially varying dissipation, allowing for serial readout of the emergent spin texture. Long-time stabilization is achieved via prethermalization to a Floquet-induced Hamiltonian under the electronic gradient field. Our work presents a new approach to robust nanoscale spin-state engineering and paves the way for new applications in quantum simulation, quantum information science, and nanoscale imaging.

Submitted 9 October, 2023; originally announced October 2023.

Comments: 8 + 32 pages

arXiv:2309.06912 [pdf, other]

Multi-behavior Recommendation with SVD Graph Neural Networks

Authors: Shengxi Fu, Qianqian Ren, Xingfeng Lv, Jinbao Li

Abstract: Graph Neural Networks (GNNs) have been extensively employed in the field of recommendation systems, offering users personalized recommendations and yielding remarkable outcomes. Recently, GNNs incorporating contrastive learning have demonstrated promising performance in handling the sparse data problem of recommendation systems. However, existing contrastive learning methods still have limitations in resisting noise interference, especially for multi-behavior recommendation. To mitigate these issues, this paper proposes a GNN-based multi-behavior recommendation model called MB-SVD that utilizes Singular Value Decomposition (SVD) graphs to enhance model performance. In particular, MB-SVD considers user preferences across different behaviors, improving recommendation effectiveness. First, MB-SVD integrates the representations of users and items under different behaviors with learnable weight scores, which efficiently accounts for the influence of different behaviors. Then, MB-SVD generates augmented graph representations with global collaborative relations. Next, we simplify the contrastive learning framework by directly contrasting the original representations with the enhanced representations using the InfoNCE loss. Extensive experiments on diverse real-world datasets demonstrate the remarkable performance of MB-SVD in multi-behavior recommendation.

Submitted 9 May, 2024; v1 submitted 13 September, 2023; originally announced September 2023.
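The final contrastive step in the MB-SVD abstract, contrasting each original representation with its enhanced counterpart under the InfoNCE loss, can be sketched in plain Python as follows. Cosine similarity, the temperature `tau = 0.2`, and treating the other rows in the batch as negatives are common choices assumed here, not details taken from the paper; the SVD-based graph augmentation that would produce `augmented` is not shown.

```python
import math
from typing import List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(original: List[Sequence[float]], augmented: List[Sequence[float]],
             tau: float = 0.2) -> float:
    """Mean InfoNCE loss: row i of `augmented` is the positive for row i of
    `original`, and every other row of `augmented` serves as a negative."""
    n = len(original)
    total = 0.0
    for i in range(n):
        logits = [cosine(original[i], augmented[j]) / tau for j in range(n)]
        m = max(logits)  # subtract the max before exponentiating, for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / n

# Usage: matched pairs give a small loss, mismatched pairs a large one.
users = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(users, [[0.9, 0.1], [0.1, 0.9]]))  # positives aligned
print(info_nce(users, [[0.1, 0.9], [0.9, 0.1]]))  # positives swapped
```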

arXiv:2309.01961 [pdf, other]

NICE: CVPR 2023 Challenge on Zero-shot Image Captioning

Authors: Taehoon Kim, Pyunghwan Ahn, Sangyun Kim, Sihaeng Lee, Mark Marsden, Alessandra Sala, Seung Hwan Kim, Bohyung Han, Kyoung Mu Lee, Honglak Lee, Kyounghoon Bae, Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu, Youngtaek Oh, Jae Won Cho, Dong-jin Kim, In So Kweon, Junmo Kim, Wooyoung Kang, Won Young Jhoo, Byungseok Roh, et al. (17 additional authors not shown)

Abstract: In this report, we introduce the NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of the 2023 challenge. This project is designed to challenge the computer vision community to develop robust image captioning models that advance the state of the art in terms of both accuracy and fairness. Through the challenge, the image captioning models were tested using a new evaluation dataset that includes a large variety of visual concepts from many domains. No specific training data were provided for the challenge, and therefore the challenge entries were required to adapt to new types of image descriptions that had not been seen during training. This report includes information on the newly proposed NICE dataset, evaluation methods, challenge results, and technical details of top-ranking entries. We expect that the outcomes of the challenge will contribute to the improvement of AI models on various vision-language tasks.

Submitted 10 September, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: Tech report, project page https://nice.lgresearch.ai/

arXiv:2309.01341 [pdf, ps, other]

Decentralized Control for Discrete-time Mean-Field Systems with Multiple Controllers of Delayed Information

Authors: Qingyuan Qi, Zhiqiang Liu, Qianqian Zhang, Xinbei Lv

Abstract: In this paper, the finite-horizon asymmetric information linear quadratic (LQ) control problem is investigated for a discrete-time mean-field system. Unlike previous works, multiple controllers with different information sets are involved in the mean-field system dynamics. The coupling of different controllers makes it quite difficult to find the optimal control strategy. Fortunately, by applying Pontryagin's maximum principle, the corresponding finite-horizon decentralized control problem is investigated. The contributions of this paper can be summarized as follows: For the first time, based on the solution of a group of mean-field forward and backward stochastic difference equations (MF-FBSDEs), necessary and sufficient solvability conditions are derived for asymmetric information LQ control of the mean-field system with multiple controllers. Furthermore, by using an innovative orthogonal decomposition approach, the optimal decentralized control strategy is derived, which is based on the solution to a non-symmetric Riccati-type equation.

Submitted 3 September, 2023; originally announced September 2023.
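For orientation, the classical single-controller, full-information version of the finite-horizon LQ problem is solved by a backward Riccati recursion, sketched below for a scalar state. This is a swapped-in textbook case for illustration only: the paper's mean-field, multi-controller, asymmetric-information setting leads instead to MF-FBSDEs and a non-symmetric Riccati-type equation, which this sketch does not implement. The function and parameter names are chosen here.

```python
from typing import List, Tuple

def lq_riccati(a: float, b: float, q: float, r: float, q_final: float,
               horizon: int) -> Tuple[List[float], List[float]]:
    """Backward Riccati recursion for a scalar finite-horizon LQ problem.

    Dynamics: x[k+1] = a*x[k] + b*u[k]
    Cost:     sum_{k<N} (q*x[k]**2 + r*u[k]**2) + q_final*x[N]**2,  N = horizon
    Returns (P, K): the optimal cost-to-go is P[k]*x[k]**2 and u[k] = -K[k]*x[k].
    """
    P = [0.0] * (horizon + 1)
    K = [0.0] * horizon
    P[horizon] = q_final
    for k in range(horizon - 1, -1, -1):
        K[k] = a * b * P[k + 1] / (r + b * b * P[k + 1])
        P[k] = q + a * a * P[k + 1] - a * b * P[k + 1] * K[k]
    return P, K

def closed_loop_cost(a: float, b: float, q: float, r: float, q_final: float,
                     K: List[float], x0: float) -> float:
    """Simulate u[k] = -K[k]*x[k] from x0 and accumulate the LQ cost."""
    x, cost = x0, 0.0
    for gain in K:
        u = -gain * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost + q_final * x * x

P, K = lq_riccati(a=1.0, b=1.0, q=1.0, r=1.0, q_final=0.0, horizon=2)
print(P, K)
print(closed_loop_cost(1.0, 1.0, 1.0, 1.0, 0.0, K, x0=2.0))  # matches P[0] * 2.0**2
```

The simulated closed-loop cost equals $P_0 x_0^2$, which is a quick consistency check on the recursion.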

arXiv:2308.14508 [pdf, other]

LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

Authors: Yushi Bai, Xin Lv, Jiajie Zhang, Hongchang Lyu, Jiankai Tang, Zhidian Huang, Zhengxiao Du, Xiao Liu, Aohan Zeng, Lei Hou, Yuxiao Dong, Jie Tang, Juanzi Li

Abstract: Although large language models (LLMs) demonstrate impressive performance on many language tasks, most of them can only handle texts a few thousand tokens long, limiting their applications to longer sequence inputs such as books, reports, and codebases. Recent works have proposed methods to improve LLMs' long-context capabilities by extending context windows and developing more sophisticated memory mechanisms. However, comprehensive benchmarks tailored for evaluating long context understanding are lacking. In this paper, we introduce LongBench, the first bilingual, multi-task benchmark for long context understanding, enabling a more rigorous evaluation of long context understanding. LongBench comprises 21 datasets across 6 task categories in both English and Chinese, with an average length of 6,711 words (English) and 13,386 characters (Chinese). These tasks cover key long-text application areas including single-doc QA, multi-doc QA, summarization, few-shot learning, synthetic tasks, and code completion. All datasets in LongBench are standardized into a unified format, allowing for effortless automatic evaluation of LLMs. Upon comprehensive evaluation of 8 LLMs on LongBench, we find that: (1) The commercial model (GPT-3.5-Turbo-16k) outperforms the open-source models but still struggles on longer contexts. (2) Scaled position embedding and fine-tuning on longer sequences lead to substantial improvements in long context understanding. (3) Context compression techniques such as retrieval bring improvements for models with weak long-context ability, but their performance still lags behind models with strong long context understanding capability. The code and datasets are available at https://github.com/THUDM/LongBench.

Submitted 19 June, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: ACL 2024

arXiv:2308.06605 [pdf, other]

Towards Exascale Computation for Turbomachinery Flows

Authors: Yuhang Fu, Weiqi Shen, Jiahuan Cui, Yao Zheng, Guangwen Yang, Zhao Liu, Jifa Zhang, Tingwei Ji, Fangfang Xie, Xiaojing Lv, Hanyue Liu, Xu Liu, Xiyang Liu, Xiaoyu Song, Guocheng Tao, Yan Yan, Paul Tucker, Steven A. E. Miller, Shirui Luo, Seid Koric, Weimin Zheng

Abstract: A state-of-the-art large eddy simulation code has been developed to solve compressible flows in turbomachinery. The code has been engineered with a high degree of scalability, enabling it to effectively leverage the many-core architecture of the new Sunway system. A consistent performance of 115.8 DP-PFLOPs has been achieved on a high-pressure turbine cascade consisting of over 1.69 billion mesh elements and 865 billion degrees of freedom (DOFs). By leveraging a high-order unstructured solver and its portability to large heterogeneous parallel systems, we have progressed towards solving the grand challenge problem outlined by NASA, which involves a time-dependent simulation of a complete engine, incorporating all the aerodynamic and heat transfer components.

Submitted 29 December, 2023; v1 submitted 12 August, 2023; originally announced August 2023.

Comments: SC23, November 2023, Denver, CO, USA

arXiv:2307.07114 [pdf, other]

Reexamine the dark matter scenario accounting for the positron excess in a new cosmic ray propagation model

Authors: Xing-Jian Lv, Xiao-Jun Bi, Kun Fang, Peng-Fei Yin, Meng-Jie Zhao

Abstract: The positron excess in cosmic rays has stimulated a lot of interest in the last decade. The dark matter origin of the extra positrons has attracted great attention. However, $γ$-ray searches set very stringent constraints on the dark matter annihilation/decay rate, which strongly disfavor the dark matter scenario. In this work, we incorporate recent progress in cosmic-ray propagation and reexamine the dark matter scenario accounting for the positron excess. Recent observations indicate that cosmic-ray propagation in the Milky Way may not be uniform and that diffusion in the Galactic disk should be slower than that in the halo. In the spatially dependent propagation model, the positrons/electrons are more concentrated in the disk, which leads to a smaller dark matter annihilation/decay rate needed to account for the positron excess and also a smaller deficit in the background positron flux. Especially for the $μ^+μ^-$ channel, the positron spectrum fits the latest AMS-02 data perfectly, and the annihilation rate satisfies all present constraints from $γ$-ray and CMB observations.

Submitted 11 February, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

Comments: 11 pages, 4 figures

arXiv:2307.03130 [pdf, other]

VisKoP: Visual Knowledge oriented Programming for Interactive Knowledge Base Question Answering

Authors: Zijun Yao, Yuanyong Chen, Xin Lv, Shulin Cao, Amy Xin, Jifan Yu, Hailong Jin, Jianjun Xu, Peng Zhang, Lei Hou, Juanzi Li

Abstract: We present the Visual Knowledge oriented Programming platform (VisKoP), a knowledge base question answering (KBQA) system that integrates humans into the loop to edit and debug knowledge base (KB) queries. VisKoP not only provides a neural program induction module, which converts natural language questions into knowledge oriented program language (KoPL), but also maps KoPL programs into graphical elements. KoPL programs can be edited with simple graphical operators, such as dragging to add knowledge operators and slot filling to designate operator arguments. Moreover, VisKoP provides auto-completion for its knowledge base schema, and users can easily debug a KoPL program by checking its intermediate results. To facilitate practical KBQA on a million-entity-level KB, we design a highly efficient KoPL execution engine for the back end. Experimental results show that VisKoP is highly efficient and that user interaction can fix a large portion of wrong KoPL programs to acquire the correct answer. The VisKoP online demo https://demoviskop.xlore.cn (stable release of this paper) and https://viskop.xlore.cn (beta release with new features), the highly efficient KoPL engine https://pypi.org/project/kopl-engine, and a screencast video https://youtu.be/zAbJtxFPTXo are now publicly available.

Submitted 6 July, 2023; originally announced July 2023.

arXiv:2307.03115 [pdf, other]

KoRC: Knowledge oriented Reading Comprehension Benchmark for Deep Text Understanding

Authors: Zijun Yao, Yantao Liu, Xin Lv, Shulin Cao, Jifan Yu, Lei Hou, Juanzi Li

Abstract: Deep text understanding, which requires connecting a given document with prior knowledge beyond its text, has been highlighted by many benchmarks in recent years. However, these benchmarks have two major limitations. On the one hand, most of them require human annotation of knowledge, which leads to limited knowledge coverage. On the other hand, they usually use choices or spans in the texts as the answers, which results in a narrow answer space. To overcome these limitations, we build a new challenging benchmark named KoRC in this paper. Compared with previous benchmarks, KoRC has two advantages: broad knowledge coverage and a flexible answer format. Specifically, we utilize massive knowledge bases to guide annotators or large language models (LLMs) to construct knowledgeable questions. Moreover, we use labels in knowledge bases rather than spans or choices as the final answers. We test state-of-the-art models on KoRC, and the experimental results show that the strongest baseline only achieves 68.3% and 30.0% F1 on the in-distribution and out-of-distribution test sets, respectively. These results indicate that deep text understanding is still an unsolved challenge. The benchmark dataset, leaderboard, and baseline methods are released at https://github.com/THU-KEG/KoRC.

Submitted 6 July, 2023; originally announced July 2023.

arXiv:2307.03084 [pdf, other]

OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of Pre-trained Models

Authors: Shengding Hu, Ning Ding, Weilin Zhao, Xingtai Lv, Zhen Zhang, Zhiyuan Liu, Maosong Sun

Abstract: The scale of large pre-trained models (PTMs) poses significant challenges in adapting to downstream tasks due to the high optimization overhead and storage costs associated with full-parameter fine-tuning. To address this, many studies explore parameter-efficient tuning methods, also framed as "delta tuning", which update only a small subset of parameters, known as "delta modules", while keeping the backbone model's parameters fixed. However, the practicality and flexibility of delta tuning have been limited because existing implementations directly modify the code of the backbone PTMs and hard-code specific delta tuning methods for each PTM. In this paper, we present OpenDelta, an open-source library that overcomes these limitations by providing a plug-and-play implementation of various delta tuning methods. Our novel techniques eliminate the need to modify the backbone PTMs' code, making OpenDelta compatible with different, even novel, PTMs. OpenDelta is designed to be simple, modular, and extensible, providing a comprehensive platform for researchers and practitioners to adapt large PTMs efficiently.

Submitted 5 July, 2023; originally announced July 2023.

Comments: Accepted to ACL 2023 Demo track

arXiv:2306.12821 [pdf]

doi 10.1103/PhysRevB.107.235420

Momentum matching and band-alignment type in van der Waals heterostructures: Interfacial effects and materials screening

Authors: Yue-Jiao Zhang, Yin-Ti Ren, Xiao-Huan Lv, Xiao-Lin Zhao, Rui Yang, Nie-Wei Wang, Chen-Dong Jin, Hu Zhang, Ru-Qian Lian, Peng-Lai Gong, Rui-Ning Wang, Jiang-Long Wang, Xing-Qiang Shi

Abstract: Momentum-matched type II van der Waals heterostructures (vdWHs) have been designed by assembling layered two-dimensional semiconductors (2DSs) with special band-structure combinations, that is, the valence band edge at the Gamma point (the Brillouin-zone center) for one 2DS and the conduction band edge at the Gamma point for the other [Ubrig et al., Nat. Mater. 19, 299 (2020)]. However, the band offset sizes, the band-alignment types, and whether the bands are momentum matched are all affected by the interfacial effects between the component 2DSs, such as the quasichemical-bonding (QB) interaction between layers and the electrical dipole moment formed around the vdW interface. Here, based on density-functional theory calculations, we first probe the interfacial effects (including different QBs for valence and conduction bands, the interface dipole, and the synergistic effects of these two aspects) on band-edge evolution in energy and valley (location in the Brillouin zone), and the resulting changes in band alignment and momentum matching, for a typical vdWH of monolayer InSe and bilayer WS2, in which the band edges of the subsystems satisfy the special band-structure combination for a momentum-matched type II vdWH. Then, based on the conclusions about the interfacial effects, we propose a practical screening method for robust momentum-matched type II vdWHs. This screening method can also be applied to other band-alignment types. Our study opens a way for the practical screening and design of vdWHs with robust momentum matching and band-alignment type.

Submitted 22 June, 2023; originally announced June 2023.

Journal ref: Phys. Rev. B 2023

arXiv:2306.09296 [pdf, other]

KoLA: Carefully Benchmarking World Knowledge of Large Language Models

Authors: Jifan Yu, Xiaozhi Wang, Shangqing Tu, Shulin Cao, Daniel Zhang-Li, Xin Lv, Hao Peng, Zijun Yao, Xiaohan Zhang, Hanming Li, Chunyang Li, Zheyuan Zhang, Yushi Bai, Yantao Liu, Amy Xin, Nianyi Lin, Kaifeng Yun, Linlu Gong, Jianhui Chen, Zhili Wu, Yunjia Qi, Weikai Li, Yong Guan, Kaisheng Zeng, Ji Qi, et al. (10 additional authors not shown)

Abstract: The unprecedented performance of large language models (LLMs) necessitates improvements in evaluations. Rather than merely exploring the breadth of LLM abilities, we believe meticulous and thoughtful designs are essential to thorough, unbiased, and applicable evaluations. Given the importance of world knowledge to LLMs, we construct a Knowledge-oriented LLM Assessment benchmark (KoLA), in which we carefully design three crucial factors: (1) For ability modeling, we mimic human cognition to form a four-level taxonomy of knowledge-related abilities, covering 19 tasks. (2) For data, to ensure fair comparisons, we use both Wikipedia, a corpus on which LLMs are prevalently pre-trained, and continuously collected emerging corpora, aiming to evaluate the capacity to handle unseen data and evolving knowledge. (3) For evaluation criteria, we adopt a contrastive system, including overall standard scores for better numerical comparability across tasks and models and a unique self-contrast metric for automatically evaluating knowledge-creating ability. We evaluate 28 open-source and commercial LLMs and obtain some intriguing findings. The KoLA dataset and open-participation leaderboard are publicly released at https://kola.xlore.cn and will be continuously updated to provide references for developing LLMs and knowledge-related systems.

Submitted 30 June, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: Accepted by ICLR 2024

arXiv:2306.07652 [pdf]

Inactivated COVID-19 Vaccination did not affect In vitro fertilization (IVF) / Intra-Cytoplasmic Sperm Injection (ICSI) cycle outcomes

Authors: Qi Wan, Ying Ling Yao, XingYu Lv, Li Hong Geng, Yue Wang, Enoch Appiah Adu-Gyamfi, Xue Jiao Wang, Yue Qian, Juan Yang, Ming Xing Chend, Zhao Hui Zhong, Yuan Li, Yu Bin Ding

Abstract: Background: The objective of this study is to evaluate the impact of COVID-19 inactivated vaccine administration on the outcomes of in vitro fertilization (IVF) and intracytoplasmic sperm injection (ICSI) cycles in infertile couples in China. Methods: We collected data from the CYART prospective cohort, which included couples undergoing IVF treatment from January 2021 to September 2022 at Sichuan Jinxin Xinan Women & Children's Hospital. Based on whether they received vaccination before ovarian stimulation, the couples were divided into the vaccination group and the non-vaccination group. We compared the laboratory parameters and pregnancy outcomes between the two groups. Findings: After performing propensity score matching (PSM), the analysis demonstrated similar clinical pregnancy rates, biochemical pregnancy and ongoing pregnancy rates between vaccinated and unvaccinated women. No significant disparities were found in terms of embryo development and laboratory parameters among the groups. Moreover, male vaccination had no impact on patient performance or pregnancy outcomes in assisted reproductive technology treatments. Additionally, there were no significant differences observed in the effects of vaccination on embryo development and pregnancy outcomes among couples undergoing ART. Interpretation: The findings suggest that COVID-19 vaccination did not have a significant effect on patients undergoing IVF/ICSI with fresh embryo transfer. Therefore, it is recommended that couples should receive COVID-19 vaccination as scheduled to help mitigate the COVID-19 pandemic.

Submitted 13 June, 2023; originally announced June 2023.

Comments: 26 pages, 4 figures and 5 tables

arXiv:2306.04181 [pdf, other]

Benchmarking Foundation Models with Language-Model-as-an-Examiner

Authors: Yushi Bai, Jiahao Ying, Yixin Cao, Xin Lv, Yuze He, Xiaozhi Wang, Jifan Yu, Kaisheng Zeng, Yijia Xiao, Haozhe Lyu, Jiayin Zhang, Juanzi Li, Lei Hou

Abstract: Numerous benchmarks have been established to assess the performance of foundation models on open-ended question answering, which serves as a comprehensive test of a model's ability to understand and generate language in a manner similar to humans. Most of these works focus on proposing new datasets; however, we see two main issues within previous benchmarking pipelines, namely testing leakage and evaluation automation. In this paper, we propose a novel benchmarking framework, Language-Model-as-an-Examiner, where the LM serves as a knowledgeable examiner that formulates questions based on its knowledge and evaluates responses in a reference-free manner. Our framework allows for effortless extensibility as various LMs can be adopted as the examiner, and the questions can be constantly updated given more diverse trigger topics. For a more comprehensive and equitable evaluation, we devise three strategies: (1) We instruct the LM examiner to generate questions across a multitude of domains to probe for a broad acquisition, and raise follow-up questions to engage in a more in-depth assessment. (2) Upon evaluation, the examiner combines both scoring and ranking measurements, providing a reliable result as it aligns closely with human annotations. (3) We additionally propose a decentralized Peer-examination method to address the biases in a single examiner. Our data and benchmarking results are available at: http://lmexam.xlore.cn.

Submitted 4 November, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: NeurIPS 2023 Datasets and Benchmarks

arXiv:2305.19787 [pdf, other]

DeepMerge: Deep-Learning-Based Region-Merging for Image Segmentation

Authors: Xianwei Lv, Claudio Persello, Wangbin Li, Xiao Huang, Dongping Ming, Alfred Stein

Abstract: Image segmentation aims to partition an image according to the objects in the scene and is a fundamental step in analysing very high spatial-resolution (VHR) remote sensing imagery. Current methods struggle to effectively consider land objects with diverse shapes and sizes. Additionally, the determination of segmentation scale parameters frequently adheres to a static and empirical doctrine, posing limitations on the segmentation of large-scale remote sensing images and yielding algorithms with limited interpretability. To address the above challenges, we propose a deep-learning-based region merging method dubbed DeepMerge to handle the segmentation of complete objects in large VHR images by integrating deep learning and region adjacency graph (RAG). This is the first method to use deep learning to learn the similarity and merge similar adjacent super-pixels in RAG. We propose a modified binary tree sampling method to generate shift-scale data, serving as inputs for transformer-based deep learning networks, a shift-scale attention with 3-Dimension relative position embedding to learn features across scales, and an embedding to fuse learned features with hand-crafted features. DeepMerge can achieve high segmentation accuracy in a supervised manner from large-scale remotely sensed images and provides an interpretable optimal scale parameter, which is validated using a remote sensing image of 0.55 m resolution covering an area of 5,660 km². The experimental results show that DeepMerge achieves the highest F value (0.9550) and the lowest total error TE (0.0895), correctly segmenting objects of different sizes and outperforming all competing segmentation methods.

Submitted 5 January, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

arXiv:2305.15056 [pdf, other]

Reasoning over Hierarchical Question Decomposition Tree for Explainable Question Answering

Authors: Jiajie Zhang, Shulin Cao, Tingjia Zhang, Xin Lv, Jiaxin Shi, Qi Tian, Juanzi Li, Lei Hou

Abstract: Explainable question answering (XQA) aims to answer a given question and provide an explanation why the answer is selected. Existing XQA methods focus on reasoning on a single knowledge source, e.g., structured knowledge bases, unstructured corpora, etc. However, integrating information from heterogeneous knowledge sources is essential to answer complex questions. In this paper, we propose to leverage question decomposing for heterogeneous knowledge integration, by breaking down a complex question into simpler ones, and selecting the appropriate knowledge source for each sub-question. To facilitate reasoning, we propose a novel two-stage XQA framework, Reasoning over Hierarchical Question Decomposition Tree (RoHT). First, we build the Hierarchical Question Decomposition Tree (HQDT) to understand the semantics of a complex question; then, we conduct probabilistic reasoning over HQDT from root to leaves recursively, to aggregate heterogeneous knowledge at different tree levels and search for a best solution considering the decomposing and answering probabilities. The experiments on complex QA datasets KQA Pro and Musique show that our framework outperforms SOTA methods significantly, demonstrating the effectiveness of leveraging question decomposing for knowledge integration and our RoHT framework.

Submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted by ACL 2023

Showing 1–50 of 165 results for author: Lv, X