Showing 1–50 of 168 results for author: Cao, Q

  1. arXiv:2406.19853  [pdf, other]

    cs.CL cs.AI

    YuLan: An Open-source Large Language Model

    Authors: Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang, Lei Zhang, Junyi Li, Xiaolei Wang, Lei Wang, Beichen Zhang, Zican Dong, Xiaoxue Cheng, Yuhan Chen, Xinyu Tang, Yupeng Hou, Qiangqiang Ren, Xincheng Pang, Shufang Xie, Wayne Xin Zhao, Zhicheng Dou, et al. (13 additional authors not shown)

    Abstract: Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with $12$ billi…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.18159  [pdf, other]

    cs.CV cs.GR

    Human-Aware 3D Scene Generation with Spatially-constrained Diffusion Models

    Authors: Xiaolin Hong, Hongwei Yi, Fazhi He, Qiong Cao

    Abstract: Generating 3D scenes from human motion sequences supports numerous applications, including virtual reality and architectural design. However, previous auto-regression-based human-aware 3D scene generation methods have struggled to accurately capture the joint distribution of multiple objects and input humans, often resulting in overlapping object generation in the same space. To address this limit…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.17223  [pdf, ps, other]

    cs.IT

    On Zero-Error Capacity of Graphs with One Edge

    Authors: Qi Cao, Qi Chen, Baoming Bai

    Abstract: In this paper, we study the zero-error capacity of channels with memory, which are represented by graphs. We provide a method to construct code for any graph with one edge, thereby determining a lower bound on its zero-error capacity. Moreover, this code can achieve zero-error capacity when the symbols in a vertex with degree one are the same. We further apply our method to the one-edge graphs rep…

    Submitted 24 June, 2024; originally announced June 2024.

  4. arXiv:2406.16988  [pdf, other]

    cs.LG stat.ML

    MD tree: a model-diagnostic tree grown on loss landscape

    Authors: Yefan Zhou, Jianlong Chen, Qinxue Cao, Konstantin Schürholt, Yaoqing Yang

    Abstract: This paper considers "model diagnosis", which we formulate as a classification problem. Given a pre-trained neural network (NN), the goal is to predict the source of failure from a set of failure modes (such as a wrong hyperparameter, inadequate model size, and insufficient data) without knowing the training configuration of the pre-trained NN. The conventional diagnosis approach uses training and…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: ICML 2024, first two authors contributed equally

  5. arXiv:2406.15658  [pdf, other]

    cs.CV cs.AI

    TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning

    Authors: Nemin Wu, Qian Cao, Zhangyu Wang, Zeping Liu, Yanlin Qi, Jielu Zhang, Joshua Ni, Xiaobai Yao, Hongxu Ma, Lan Mu, Stefano Ermon, Tanuja Ganu, Akshay Nambi, Ni Lao, Gengchen Mai

    Abstract: Spatial representation learning (SRL) aims at learning general-purpose neural network representations from various types of spatial data (e.g., points, polylines, polygons, networks, images, etc.) in their native formats. Learning good spatial representations is a fundamental problem for various downstream applications such as species distribution modeling, weather forecasting, trajectory generati…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 9 pages, 2 figures. Submitted to NeurIPS 2024 Datasets and Benchmarks Track. Under review

  6. arXiv:2406.14408  [pdf, other]

    cs.AI cs.CL cs.LG

    FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving

    Authors: Xiaohan Lin, Qingxing Cao, Yinya Huang, Haiming Wang, Jianqiao Lu, Zhengying Liu, Linqi Song, Xiaodan Liang

    Abstract: Formal verification (FV) has witnessed growing significance with current emerging program synthesis by the evolving large language models (LLMs). However, current formal verification mainly resorts to symbolic verifiers or hand-craft rules, resulting in limitations for extensive and flexible verification. On the other hand, formal languages for automated theorem proving, such as Isabelle, as anoth…

    Submitted 20 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  7. arXiv:2406.14367  [pdf, other]

    cs.CV cs.AI

    PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions

    Authors: Sihan Ma, Jing Zhang, Qiong Cao, Dacheng Tao

    Abstract: Pose estimation aims to accurately identify anatomical keypoints in humans and animals using monocular images, which is crucial for various applications such as human-machine interaction, embodied AI, and autonomous driving. While current models show promising results, they are typically trained and tested on clean data, potentially overlooking the corruption during real-world deployment and thus…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Technical report. Project page: https://xymsh.github.io/PoseBench/

  8. arXiv:2406.03808  [pdf]

    cs.LG cs.AI stat.AP

    Cross-variable Linear Integrated ENhanced Transformer for Photovoltaic power forecasting

    Authors: Jiaxin Gao, Qinglong Cao, Yuntian Chen, Dongxiao Zhang

    Abstract: Photovoltaic (PV) power forecasting plays a crucial role in optimizing the operation and planning of PV systems, thereby enabling efficient energy management and grid integration. However, uncertainties caused by fluctuating weather conditions and complex interactions between different variables pose significant challenges to accurate PV power forecasting. In this study, we propose PV-Client (Cro…

    Submitted 6 June, 2024; originally announced June 2024.

  9. arXiv:2406.02376  [pdf, other]

    cs.CL

    Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs

    Authors: Zhiwei Cao, Qian Cao, Yu Lu, Ningxin Peng, Luyang Huang, Shanbo Cheng, Jinsong Su

    Abstract: The growing popularity of Large Language Models has sparked interest in context compression for Large Language Models (LLMs). However, the performance of previous methods degrades dramatically as compression ratios increase, sometimes even falling to the closed-book level. This decline can be attributed to the loss of key information during the compression process. Our preliminary study supports t…

    Submitted 17 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  10. arXiv:2406.01341  [pdf, other]

    cs.SI

    Important node identification for complex networks based on improved Electre Multi-Attribute fusion

    Authors: Qi Cao, Yurong Song, Min Li, Ruqi Li, Hongbo Qu, Guo-Ping Jiang, Jinye Xiong

    Abstract: Influence maximization problem involves selecting a subset of seed nodes within a social network to maximize information spread under a given diffusion model, so how to identify the important nodes is the problem to be considered in this paper. Due to the great differences in the reality of the network, a class of multi-attribute decision fusion methods is often used to solve this problem. Electre…

    Submitted 3 June, 2024; originally announced June 2024.

  11. arXiv:2405.11971  [pdf, other]

    cs.CV

    Data Augmentation for Text-based Person Retrieval Using Large Language Models

    Authors: Zheng Li, Lijia Si, Caili Guo, Yang Yang, Qiushi Cao

    Abstract: Text-based Person Retrieval (TPR) aims to retrieve person images that match the description given a text query. The performance improvement of the TPR model relies on high-quality data for supervised training. However, it is difficult to construct a large-scale, high-quality TPR dataset due to expensive annotation and privacy protection. Recently, Large Language Models (LLMs) have approached or ev…

    Submitted 20 May, 2024; originally announced May 2024.

  12. arXiv:2405.08668  [pdf, other]

    cs.CV cs.AI cs.LG stat.AP

    Promoting AI Equity in Science: Generalized Domain Prompt Learning for Accessible VLM Research

    Authors: Qinglong Cao, Yuntian Chen, Lu Lu, Hao Sun, Zhenzhong Zeng, Xiaokang Yang, Dongxiao Zhang

    Abstract: Large-scale Vision-Language Models (VLMs) have demonstrated exceptional performance in natural vision tasks, motivating researchers across domains to explore domain-specific VLMs. However, the construction of powerful domain-specific VLMs demands vast amounts of annotated data, substantial electrical energy, and computing resources, primarily accessible to industry, yet hindering VLM research in a…

    Submitted 14 May, 2024; originally announced May 2024.

  13. arXiv:2405.07973  [pdf, other]

    cs.PL

    A Natural Formalized Proof Language

    Authors: Lihan Xie, Zhicheng Hui, Qinxiang Cao

    Abstract: Artificial intelligence assisted mathematical proof has become a highly focused area nowadays. One key problem in this field is to generate formal mathematical proofs from natural language proofs. Due to historical reasons, the formal proof languages adopted by traditional theorem provers were not intended to represent natural language proofs. Therefore, they are not well-suited for the aforementi…

    Submitted 13 May, 2024; originally announced May 2024.

  14. arXiv:2405.06677  [pdf, other]

    cs.CL cs.AI

    ATG: Benchmarking Automated Theorem Generation for Generative Language Models

    Authors: Xiaohan Lin, Qingxing Cao, Yinya Huang, Zhicheng Yang, Zhengying Liu, Zhenguo Li, Xiaodan Liang

    Abstract: Humans can develop new theorems to explore broader and more complex mathematical results. While current generative language models (LMs) have achieved significant improvement in automatically proving theorems, their ability to generate new or reusable theorems is still under-explored. Without the new theorems, current LMs struggle to prove harder theorems that are distant from the given hypotheses…

    Submitted 4 May, 2024; originally announced May 2024.

  15. arXiv:2404.17297  [pdf, ps, other]

    cs.PL

    Denotation-based Compositional Compiler Verification

    Authors: Zhang Cheng, Jiyang Wu, Di Wang, Qinxiang Cao

    Abstract: A desired but challenging property of compiler verification is compositionality in the sense that the compilation correctness of a program can be deduced from that of its substructures ranging from statements, functions, and modules incrementally. Previously proposed approaches have devoted extensive effort to module-level compositionality based on small-step semantics and simulation theories. Thi…

    Submitted 15 May, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: 38 pages, 8 figures

  16. arXiv:2404.17287  [pdf, other]

    cs.CL

    When to Trust LLMs: Aligning Confidence with Response Quality

    Authors: Shuchang Tao, Liuyi Yao, Hanxing Ding, Yuexiang Xie, Qi Cao, Fei Sun, Jinyang Gao, Huawei Shen, Bolin Ding

    Abstract: Despite the success of large language models (LLMs) in natural language generation, much evidence shows that LLMs may produce incorrect or nonsensical text. This limitation highlights the importance of discerning when to trust LLMs, especially in safety-critical domains. Existing methods often express reliability by confidence level, however, their effectiveness is limited by the lack of objective…

    Submitted 9 June, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted by ACL 2024

  17. arXiv:2404.16376  [pdf, ps, other]

    cs.IT cs.MA eess.SY

    A Hypergraph Approach to Distributed Broadcast

    Authors: Qi Cao, Yulin Shao, Fan Yang

    Abstract: This paper explores the distributed broadcast problem within the context of network communications, a critical challenge in decentralized information dissemination. We put forth a novel hypergraph-based approach to address this issue, focusing on minimizing the number of broadcasts to ensure comprehensive data sharing among all network users. A key contribution of our work is the establishment of…

    Submitted 25 April, 2024; originally announced April 2024.

  18. arXiv:2404.14619  [pdf, other]

    cs.CL cs.AI cs.LG

    OpenELM: An Efficient Language Model Family with Open Training and Inference Framework

    Authors: Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari

    Abstract: The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of…

    Submitted 1 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Minor corrections

  19. arXiv:2404.10761  [pdf, ps, other]

    cs.LG

    TorchSurv: A Lightweight Package for Deep Survival Analysis

    Authors: Mélodie Monod, Peter Krusche, Qian Cao, Berkman Sahiner, Nicholas Petrick, David Ohlssen, Thibaud Coroller

    Abstract: TorchSurv is a Python package that serves as a companion tool to perform deep survival modeling within the PyTorch environment. Unlike existing libraries that impose specific parametric forms, TorchSurv enables the use of custom PyTorch-based deep survival models. With its lightweight design, minimal input requirements, full PyTorch backend, and freedom from restrictive survival model parameteriza…

    Submitted 17 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: https://opensource.nibr.com/torchsurv/

  20. A Coq Library of Sets for Teaching Denotational Semantics

    Authors: Qinxiang Cao, Xiwei Wu, Yalun Liang

    Abstract: Sets and relations are very useful concepts for defining denotational semantics. In the Coq proof assistant, curried functions to Prop are used to represent sets and relations, e.g. A -> Prop, A -> B -> Prop, A -> B -> C -> Prop, etc. Further, the membership relation can be encoded by function applications, e.g. X a represents a in X if X: A -> Prop. This is very convenient for developing formal d…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: In Proceedings ThEdu'23, arXiv:2404.03709

    Journal ref: EPTCS 400, 2024, pp. 79-95

  21. arXiv:2404.04935  [pdf, other]

    cs.CV

    Anomaly Detection in Electrocardiograms: Advancing Clinical Diagnosis Through Self-Supervised Learning

    Authors: Aofan Jiang, Chaoqin Huang, Qing Cao, Yuchen Xu, Zi Zeng, Kang Chen, Ya Zhang, Yanfeng Wang

    Abstract: The electrocardiogram (ECG) is an essential tool for diagnosing heart disease, with computer-aided systems improving diagnostic accuracy and reducing healthcare costs. Despite advancements, existing systems often miss rare cardiac anomalies that could be precursors to serious, life-threatening issues or alterations in the cardiac macro/microstructure. We address this gap by focusing on self-superv…

    Submitted 7 April, 2024; originally announced April 2024.

  22. arXiv:2404.00368  [pdf, other]

    cs.CV

    Towards Variable and Coordinated Holistic Co-Speech Motion Generation

    Authors: Yifei Liu, Qiong Cao, Yandong Wen, Huaiguang Jiang, Changxing Ding

    Abstract: This paper addresses the problem of generating lifelike holistic co-speech motions for 3D avatars, focusing on two key aspects: variability and coordination. Variability allows the avatar to exhibit a wide range of motions even with similar speech content, while coordination ensures a harmonious alignment among facial expressions, hand gestures, and body poses. We aim to achieve both with ProbTalk…

    Submitted 15 April, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: CVPR 2024

  23. arXiv:2403.15709  [pdf, other]

    cs.CV cs.AI

    Contact-aware Human Motion Generation from Textual Descriptions

    Authors: Sihan Ma, Qiong Cao, Jing Zhang, Dacheng Tao

    Abstract: This paper addresses the problem of generating 3D interactive human motion from text. Given a textual description depicting the actions of different body parts in contact with objects, we synthesize sequences of 3D body poses that are visually natural and physically plausible. Yet, this task poses a significant challenge due to the inadequate consideration of interactions by physical contacts in b…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: Project page: https://xymsh.github.io/RICH-CAT/

  24. arXiv:2403.10588  [pdf, other]

    cs.SE cs.AI

    S3LLM: Large-Scale Scientific Software Understanding with LLMs using Source, Metadata, and Document

    Authors: Kareem Shaik, Dali Wang, Weijian Zheng, Qinglei Cao, Heng Fan, Peter Schwartz, Yunhe Feng

    Abstract: The understanding of large-scale scientific software poses significant challenges due to its diverse codebase, extensive code length, and target computing architectures. The emergence of generative AI, specifically large language models (LLMs), provides novel pathways for understanding such complex scientific codes. This paper presents S3LLM, an LLM-based framework designed to enable the examinati…

    Submitted 15 March, 2024; originally announced March 2024.

  25. arXiv:2402.08957  [pdf, other]

    cs.AI cs.CL cs.FL cs.LG cs.PL

    MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data

    Authors: Yinya Huang, Xiaohan Lin, Zhengying Liu, Qingxing Cao, Huajian Xin, Haiming Wang, Zhenguo Li, Linqi Song, Xiaodan Liang

    Abstract: Recent large language models (LLMs) have witnessed significant advancement in various tasks, including mathematical reasoning and theorem proving. As these two tasks require strict and formal multi-step inference, they are appealing domains for exploring the reasoning ability of LLMs but still face important challenges. Previous studies such as Chain-of-Thought (CoT) have revealed the effectivenes…

    Submitted 22 May, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Journal ref: ICLR 2024 spotlight

  26. arXiv:2401.17723  [pdf, other]

    cs.IR

    LoRec: Large Language Model for Robust Sequential Recommendation against Poisoning Attacks

    Authors: Kaike Zhang, Qi Cao, Yunfan Wu, Fei Sun, Huawei Shen, Xueqi Cheng

    Abstract: Sequential recommender systems stand out for their ability to capture users' dynamic interests and the patterns of item-to-item transitions. However, the inherent openness of sequential recommender systems renders them vulnerable to poisoning attacks, where fraudulent users are injected into the training data to manipulate learned patterns. Traditional defense strategies predominantly depend on pr…

    Submitted 31 January, 2024; originally announced January 2024.

  27. arXiv:2401.15913  [pdf, other]

    eess.IV cs.CV cs.LG physics.flu-dyn stat.AP

    Vision-Informed Flow Image Super-Resolution with Quaternion Spatial Modeling and Dynamic Flow Convolution

    Authors: Qinglong Cao, Zhengqin Xu, Chao Ma, Xiaokang Yang, Yuntian Chen

    Abstract: Flow image super-resolution (FISR) aims at recovering high-resolution turbulent velocity fields from low-resolution flow images. Existing FISR methods mainly process the flow images in natural image patterns, while the critical and distinct flow visual properties are rarely considered. This negligence would cause the significant domain gap between flow and natural images to severely hamper the acc…

    Submitted 29 January, 2024; originally announced January 2024.

  28. arXiv:2401.12200  [pdf, other]

    cs.CL cs.LG

    APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference

    Authors: Bowen Zhao, Hannaneh Hajishirzi, Qingqing Cao

    Abstract: Fine-tuning and inference with large Language Models (LM) are generally known to be expensive. Parameter-efficient fine-tuning over pretrained LMs reduces training memory by updating a small number of LM parameters but does not improve inference efficiency. Structured pruning improves LM inference efficiency by removing consistent parameter blocks, yet often increases training memory and time. To…

    Submitted 4 June, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

    Comments: Accepted to ICML 2024 Oral; code available at https://github.com/ROIM1998/APT

  29. arXiv:2401.11911  [pdf, other]

    cs.CL cs.AI

    Blinded by Generated Contexts: How Language Models Merge Generated and Retrieved Contexts When Knowledge Conflicts?

    Authors: Hexiang Tan, Fei Sun, Wanli Yang, Yuanzhuo Wang, Qi Cao, Xueqi Cheng

    Abstract: While auxiliary information has become a key to enhancing Large Language Models (LLMs), relatively little is known about how LLMs merge these contexts, specifically contexts generated by LLMs and those retrieved from external sources. To investigate this, we formulate a systematic framework to identify whether LLMs' responses are attributed to either generated or retrieved contexts. To easily trac…

    Submitted 10 June, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

    Comments: Accepted at ACL 2024 Main, Homepage (https://tan-hexiang.github.io/Blinded_by_Generated_Contexts/)

  30. arXiv:2401.11089  [pdf, other]

    cs.CR cs.AI cs.DC cs.IR

    FedRKG: A Privacy-preserving Federated Recommendation Framework via Knowledge Graph Enhancement

    Authors: Dezhong Yao, Tongtong Liu, Qi Cao, Hai Jin

    Abstract: Federated Learning (FL) has emerged as a promising approach for preserving data privacy in recommendation systems by training models locally. Recently, Graph Neural Networks (GNN) have gained popularity in recommendation tasks due to their ability to capture high-order interactions between users and items. However, privacy concerns prevent the global sharing of the entire user-item graph. To addre…

    Submitted 19 January, 2024; originally announced January 2024.

  31. arXiv:2401.09852  [pdf, other]

    cs.CV cs.AI

    Enhancing the Fairness and Performance of Edge Cameras with Explainable AI

    Authors: Truong Thanh Hung Nguyen, Vo Thanh Khang Nguyen, Quoc Hung Cao, Van Binh Truong, Quoc Khanh Nguyen, Hung Cao

    Abstract: The rising use of Artificial Intelligence (AI) in human detection on Edge camera systems has led to accurate but complex models, challenging to interpret and debug. Our research presents a diagnostic method using Explainable AI (XAI) for model debugging, with expert-driven problem identification and solution creation. Validated on the Bytetrack model in a real-world office Edge network, we found t…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: IEEE ICCE 2024

  32. arXiv:2312.17016  [pdf, other]

    cs.CV cs.AI

    On the Promises and Challenges of Multimodal Foundation Models for Geographical, Environmental, Agricultural, and Urban Planning Applications

    Authors: Chenjiao Tan, Qian Cao, Yiwei Li, Jielu Zhang, Xiao Yang, Huaqin Zhao, Zihao Wu, Zhengliang Liu, Hao Yang, Nemin Wu, Tao Tang, Xinyue Ye, Lilong Chai, Ninghao Liu, Changying Li, Lan Mu, Tianming Liu, Gengchen Mai

    Abstract: The advent of large language models (LLMs) has heightened interest in their potential for multimodal applications that integrate language and vision. This paper explores the capabilities of GPT-4V in the realms of geography, environmental science, agriculture, and urban planning by evaluating its performance across a variety of tasks. Data sources comprise satellite imagery, aerial photos, ground-…

    Submitted 23 December, 2023; originally announced December 2023.

    Comments: 110 Pages; 61 Figures

    ACM Class: I.2.7; I.2.10; I.4.6; I.4.8; J.2

  33. arXiv:2312.10163  [pdf, other]

    cs.CV cs.LG

    Towards the Unification of Generative and Discriminative Visual Foundation Model: A Survey

    Authors: Xu Liu, Tong Zhou, Yuanxin Wang, Yuping Wang, Qinjingwen Cao, Weizhi Du, Yonghuan Yang, Junjun He, Yu Qiao, Yiqing Shen

    Abstract: The advent of foundation models, which are pre-trained on vast datasets, has ushered in a new era of computer vision, characterized by their robustness and remarkable zero-shot generalization capabilities. Mirroring the transformative impact of foundation models like large language models (LLMs) in natural language processing, visual foundation models (VFMs) have become a catalyst for groundbreaki…

    Submitted 15 December, 2023; originally announced December 2023.

  34. arXiv:2312.08878  [pdf, other]

    cs.CV cs.LG stat.AP

    Domain Prompt Learning with Quaternion Networks

    Authors: Qinglong Cao, Zhengqin Xu, Yuntian Chen, Chao Ma, Xiaokang Yang

    Abstract: Prompt learning has emerged as an effective and data-efficient technique in large Vision-Language Models (VLMs). However, when adapting VLMs to specialized domains such as remote sensing and medical imaging, domain prompt learning remains underexplored. While large-scale domain-specific foundation models can help tackle this challenge, their concentration on a single vision level makes it challeng…

    Submitted 12 December, 2023; originally announced December 2023.

  35. arXiv:2312.05758  [pdf, other]

    cs.LG stat.AP

    CLeaRForecast: Contrastive Learning of High-Purity Representations for Time Series Forecasting

    Authors: Jiaxin Gao, Yuxiao Hu, Qinglong Cao, Siqi Dai, Yuntian Chen

    Abstract: Time series forecasting (TSF) holds significant importance in modern society, spanning numerous domains. Previous representation learning-based TSF algorithms typically embrace a contrastive learning paradigm featuring segregated trend-periodicity representations. Yet, these methodologies disregard the inherent high-impact noise embedded within time series data, resulting in representation inaccur…

    Submitted 9 December, 2023; originally announced December 2023.

  36. arXiv:2312.04780  [pdf, other]

    cs.CV cs.AI

    Fine-Tuning InstructPix2Pix for Advanced Image Colorization

    Authors: Zifeng An, Zijing Xu, Eric Fan, Qi Cao

    Abstract: This paper presents a novel approach to human image colorization by fine-tuning the InstructPix2Pix model, which integrates a language model (GPT-3) with a text-to-image model (Stable Diffusion). Despite the original InstructPix2Pix model's proficiency in editing images based on textual instructions, it exhibits limitations in the focused domain of colorization. To address this, we fine-tuned the…

    Submitted 7 December, 2023; originally announced December 2023.

  37. arXiv:2312.02298  [pdf, other]

    eess.SP cs.CV cs.LG stat.AP

    MoE-AMC: Enhancing Automatic Modulation Classification Performance Using Mixture-of-Experts

    Authors: Jiaxin Gao, Qinglong Cao, Yuntian Chen

    Abstract: Automatic Modulation Classification (AMC) plays a vital role in time series analysis, such as signal classification and identification within wireless communications. Deep learning-based AMC models have demonstrated significant potential in this domain. However, current AMC models inadequately consider the disparities in handling signals under conditions of low and high Signal-to-Noise Ratio (SNR)…

    Submitted 4 December, 2023; originally announced December 2023.

  38. arXiv:2311.18805  [pdf, other]

    cs.CL cs.AI

    Unnatural Error Correction: GPT-4 Can Almost Perfectly Handle Unnatural Scrambled Text

    Authors: Qi Cao, Takeshi Kojima, Yutaka Matsuo, Yusuke Iwasawa

    Abstract: While Large Language Models (LLMs) have achieved remarkable performance in many tasks, much about their inner workings remains unclear. In this study, we present novel experimental insights into the resilience of LLMs, particularly GPT-4, when subjected to extensive character-level permutations. To investigate this, we first propose the Scrambled Bench, a suite designed to measure the capacity of…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023 (with an additional analysis section in appendix)

  39. arXiv:2311.10483  [pdf, other]

    cs.PL

    Towards General Loop Invariant Generation: A Benchmark of Programs with Memory Manipulation

    Authors: Chang Liu, Xiwei Wu, Yuan Feng, Qinxiang Cao, Junchi Yan

    Abstract: Program verification is vital for ensuring software reliability, especially in the context of increasingly complex systems. Loop invariants, remaining true before and after each iteration of loops, are crucial for this verification process. Traditional provers and machine learning based methods for generating loop invariants often require expert intervention or extensive labeled data, and typicall…

    Submitted 7 June, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

    Comments: Preprint, under review

  40. arXiv:2310.19626  [pdf, other]

    cs.AI

    Transformation vs Tradition: Artificial General Intelligence (AGI) for Arts and Humanities

    Authors: Zhengliang Liu, Yiwei Li, Qian Cao, Junwen Chen, Tianze Yang, Zihao Wu, John Hale, John Gibbs, Khaled Rasheed, Ninghao Liu, Gengchen Mai, Tianming Liu

    Abstract: Recent advances in artificial general intelligence (AGI), particularly large language models and creative image generation systems have demonstrated impressive capabilities on diverse tasks spanning the arts and humanities. However, the swift evolution of AGI has also raised critical questions about its responsible deployment in these culturally significant domains traditionally seen as profoundly…

    Submitted 30 October, 2023; originally announced October 2023.

    ACM Class: J.5; I.2.7; I.2.10

  41. arXiv:2310.17616  [pdf, other]

    cs.PL

    Verifying Programs with Logic and Extended Proof Rules: Deep Embedding v.s. Shallow Embedding

    Authors: Zhongye Wang, Qinxiang Cao, Yichen Tao

    Abstract: Many foundational program verification tools have been developed to build machine-checked program correctness proofs, a majority of which are based on Hoare logic. Their program logics, their assertion languages, and their underlying programming languages can be formalized by either a shallow embedding or a deep embedding. Tools like Iris and early versions of Verified Software Toolchain (VST) cho…

    Submitted 26 October, 2023; originally announced October 2023.

  42. arXiv:2310.10180  [pdf, other]

    cs.CL

    TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models

    Authors: Jing Xiong, Jianhao Shen, Ye Yuan, Haiming Wang, Yichun Yin, Zhengying Liu, Lin Li, Zhijiang Guo, Qingxing Cao, Yinya Huang, Chuanyang Zheng, Xiaodan Liang, Ming Zhang, Qun Liu

    Abstract: Automated theorem proving (ATP) has become an appealing domain for exploring the reasoning ability of the recent successful generative language models. However, current ATP benchmarks mainly focus on symbolic inference, but rarely involve the understanding of complex number combination reasoning. In this work, we propose TRIGO, an ATP benchmark that not only requires a model to reduce a trigonomet…

    Submitted 24 October, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted by EMNLP 2023. Code is available at https://github.com/menik1126/TRIGO

  43. arXiv:2310.07730  [pdf, other]

    cs.CV eess.IV

    Domain-Controlled Prompt Learning

    Authors: Qinglong Cao, Zhengqin Xu, Yuntian Chen, Chao Ma, Xiaokang Yang

    Abstract: Large pre-trained vision-language models, such as CLIP, have shown remarkable generalization capabilities across various tasks when appropriate text prompts are provided. However, adapting these models to specific domains, like remote sensing images (RSIs), medical images, etc., remains unexplored and challenging. Existing prompt learning methods often lack domain-awareness or domain-transfer mecha…

    Submitted 12 December, 2023; v1 submitted 29 September, 2023; originally announced October 2023.

  44. arXiv:2310.02954  [pdf, other]

    cs.CL

    DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning

    Authors: Jing Xiong, Zixuan Li, Chuanyang Zheng, Zhijiang Guo, Yichun Yin, Enze Xie, Zhicheng Yang, Qingxing Cao, Haiming Wang, Xiongwei Han, Jing Tang, Chengming Li, Xiaodan Liang

    Abstract: Recent advances in natural language processing, primarily propelled by Large Language Models (LLMs), have showcased their remarkable capabilities grounded in in-context learning. A promising avenue for guiding LLMs in intricate reasoning tasks involves the utilization of intermediate reasoning steps within the Chain-of-Thought (CoT) paradigm. Nevertheless, the central challenge lies in the effecti…

    Submitted 2 March, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted in ICLR 2024

  45. arXiv:2310.01329  [pdf, other]

    cs.CL cs.AI

    BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models

    Authors: Qingqing Cao, Sewon Min, Yizhong Wang, Hannaneh Hajishirzi

    Abstract: Retrieval augmentation addresses many critical problems in large language models such as hallucination, staleness, and privacy leaks. However, running retrieval-augmented language models (LMs) is slow and difficult to scale due to processing large amounts of retrieved text. We introduce binary token representations (BTR), which use 1-bit vectors to precompute every token in passages, significantly…

    Submitted 3 May, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: ICLR 2024 camera-ready version

  46. arXiv:2310.00656  [pdf, other]

    cs.AI

    LEGO-Prover: Neural Theorem Proving with Growing Libraries

    Authors: Haiming Wang, Huajian Xin, Chuanyang Zheng, Lin Li, Zhengying Liu, Qingxing Cao, Yinya Huang, Jing Xiong, Han Shi, Enze Xie, Jian Yin, Zhenguo Li, Heng Liao, Xiaodan Liang

    Abstract: Despite the success of large language models (LLMs), the task of theorem proving still remains one of the hardest reasoning tasks that is far from being fully solved. Prior methods using language models have demonstrated promising results, but they still struggle to prove even middle school level theorems. One common limitation of these methods is that they assume a fixed theorem library during th…

    Submitted 27 October, 2023; v1 submitted 1 October, 2023; originally announced October 2023.

  47. arXiv:2309.15850  [pdf, other]

    cs.CV eess.IV

    Reflection Invariance Learning for Few-shot Semantic Segmentation

    Authors: Qinglong Cao, Yuntian Chen, Chao Ma, Xiaokang Yang

    Abstract: Few-shot semantic segmentation (FSS) aims to segment objects of unseen classes in query images with only a few annotated support images. Existing FSS algorithms typically focus on mining category representations from the single-view support to match semantic objects of the single-view query. However, the limited annotated samples render the single-view matching struggle to perceive the reflection…

    Submitted 1 June, 2023; originally announced September 2023.

  48. arXiv:2309.05590  [pdf, other]

    cs.CV cs.AI cs.MM

    Temporal Action Localization with Enhanced Instant Discriminability

    Authors: Dingfeng Shi, Qiong Cao, Yujie Zhong, Shan An, Jian Cheng, Haogang Zhu, Dacheng Tao

    Abstract: Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video. The unclear boundaries of actions in videos often result in imprecise predictions of action boundaries by existing methods. To resolve this issue, we propose a one-stage framework named TriDet. First, we propose a Trident-head to model the action boundary via an estimated…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: An extended version of the CVPR paper arXiv:2303.07347, submitted to IJCV

  49. arXiv:2309.02057  [pdf, other]

    cs.IR

    Robust Recommender System: A Survey and Future Directions

    Authors: Kaike Zhang, Qi Cao, Fei Sun, Yunfan Wu, Shuchang Tao, Huawei Shen, Xueqi Cheng

    Abstract: With the rapid growth of information, recommender systems have become integral for providing personalized suggestions and overcoming information overload. However, their practical deployment often encounters "dirty" data, where noise or malicious information can lead to abnormal recommendations. Research on improving recommender systems' robustness against such dirty data has thus gained significa…

    Submitted 5 September, 2023; originally announced September 2023.

  50. arXiv:2308.01639  [pdf, other]

    cs.CV

    Multi-scale Cross-restoration Framework for Electrocardiogram Anomaly Detection

    Authors: Aofan Jiang, Chaoqin Huang, Qing Cao, Shuang Wu, Zi Zeng, Kang Chen, Ya Zhang, Yanfeng Wang

    Abstract: Electrocardiogram (ECG) is a widely used diagnostic tool for detecting heart conditions. Rare cardiac diseases may be underdiagnosed using traditional ECG analysis, considering that no training dataset can exhaust all possible cardiac disorders. This paper proposes using anomaly detection to identify any unhealthy status, with normal ECGs solely for training. However, detecting anomalies in ECG ca…

    Submitted 3 August, 2023; originally announced August 2023.

    Comments: MICCAI 2023 Early Accept