Showing 1–50 of 210 results for author: Du, L

  1. arXiv:2407.11033  [pdf, other]

    cs.LG cs.CL

    Hadamard Adapter: An Extreme Parameter-Efficient Adapter Tuning Method for Pre-trained Language Models

    Authors: Yuyan Chen, Qiang Fu, Ge Fan, Lun Du, Jian-Guang Lou, Shi Han, Dongmei Zhang, Zhixu Li, Yanghua Xiao

    Abstract: In recent years, pre-trained language models (PLMs) have swept into various fields of artificial intelligence and achieved great success. However, most PLMs, such as T5 and GPT3, have a huge number of parameters; fine-tuning them is often expensive and time-consuming, and storing them takes up a lot of space. Therefore, it is necessary to adopt a parameter-efficient approach to reduce parameters of P…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted to CIKM 2023 (Long Paper)

  2. arXiv:2407.04752  [pdf, other]

    cs.LG cs.CL cs.NE

    SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking

    Authors: Xingrun Xing, Boyan Gao, Zheng Zhang, David A. Clifton, Shitao Xiao, Li Du, Guoqi Li, Jiajun Zhang

    Abstract: The recent advancements in large language models (LLMs) with billions of parameters have significantly boosted their performance across various real-world applications. However, the inference processes for these models require substantial energy and computational resources, presenting considerable deployment challenges. In contrast, human brains, which contain approximately 86 billion biological n…

    Submitted 5 July, 2024; originally announced July 2024.

  3. arXiv:2407.03082  [pdf, other]

    cs.LG stat.ML

    Stable Heterogeneous Treatment Effect Estimation across Out-of-Distribution Populations

    Authors: Yuling Zhang, Anpeng Wu, Kun Kuang, Liang Du, Zixun Sun, Zhi Wang

    Abstract: Heterogeneous treatment effect (HTE) estimation is vital for understanding the change of treatment effect across individuals or subgroups. Most existing HTE estimation methods focus on addressing selection bias induced by imbalanced distributions of confounders between treated and control units, but ignore distribution shifts across populations. Consequently, their applicability has been limited to the…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted by ICDE'2024

  4. arXiv:2407.02913  [pdf, other]

    cs.LG cs.AI eess.IV eess.SP math.NA

    SFC: Achieve Accurate Fast Convolution under Low-precision Arithmetic

    Authors: Liulu He, Yufei Zhao, Rui Gao, Yuan Du, Li Du

    Abstract: Fast convolution algorithms, including Winograd and FFT, can efficiently accelerate convolution operations in deep models. However, these algorithms depend on high-precision arithmetic to maintain inference accuracy, which conflicts with model quantization. To resolve this conflict and further improve the efficiency of quantized convolution, we propose SFC, a new algebra transform for fast co…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: ICML 2024

  5. arXiv:2406.09008  [pdf, other]

    cs.CL

    LLM Reading Tea Leaves: Automatically Evaluating Topic Models with Large Language Models

    Authors: Xiaohao Yang, He Zhao, Dinh Phung, Wray Buntine, Lan Du

    Abstract: Topic modeling has been a widely used tool for unsupervised text analysis. However, comprehensive evaluations of a topic model remain challenging. Existing evaluation methods are either less comparable across different models (e.g., perplexity) or focus on only one specific aspect of a model (e.g., topic quality or document representation quality) at a time, which is insufficient to reflect the ov…

    Submitted 13 June, 2024; originally announced June 2024.

  6. arXiv:2406.04680  [pdf, other]

    eess.IV cs.CV

    MTS-Net: Dual-Enhanced Positional Multi-Head Self-Attention for 3D CT Diagnosis of May-Thurner Syndrome

    Authors: Yixin Huang, Yiqi Jin, Ke Tao, Kaijian Xia, Jianfeng Gu, Lei Yu, Lan Du, Cunjian Chen

    Abstract: May-Thurner Syndrome (MTS), also known as iliac vein compression syndrome or Cockett's syndrome, is a condition potentially impacting over 20 percent of the population, leading to an increased risk of iliofemoral deep venous thrombosis. In this paper, we present a 3D-based deep learning approach called MTS-Net for diagnosing May-Thurner Syndrome using CT scans. To effectively capture the spatial-t…

    Submitted 7 June, 2024; originally announced June 2024.

  7. arXiv:2406.02088  [pdf, other]

    cs.AR

    Fast and Practical Strassen's Matrix Multiplication using FPGAs

    Authors: Afzal Ahmad, Linfeng Du, Wei Zhang

    Abstract: Matrix multiplication is a cornerstone operation in a wide array of scientific fields, including machine learning and computer graphics. The standard algorithm for matrix multiplication has a complexity of $\mathcal{O}(n^3)$ for $n\times n$ matrices. Strassen's algorithm improves this to $\mathcal{O}(n^{2.807})$, but its practicality is limited for small to medium matrix sizes due to the large num…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted at 34th International Conference on Field-Programmable Logic and Applications (FPL 2024), 7 pages

    ACM Class: C.1.3

  8. arXiv:2406.00958  [pdf, other]

    cs.LG cs.CV

    Navigating Conflicting Views: Harnessing Trust for Learning

    Authors: Jueqing Lu, Lan Du, Wray Buntine, Myong Chol Jung, Joanna Dipnall, Belinda Gabbe

    Abstract: Resolving conflicts is essential to make the decisions of multi-view classification more reliable. Much research has been conducted on learning consistent informative representations among different views, assuming that all views are identically important and strictly aligned. However, real-world multi-view data may not always conform to these assumptions, as some views may express distinct inform…

    Submitted 2 June, 2024; originally announced June 2024.

  9. arXiv:2405.16486  [pdf, other]

    cs.CV cs.AI

    Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation

    Authors: Rongyu Zhang, Aosong Cheng, Yulin Luo, Gaole Dai, Huanrui Yang, Jiaming Liu, Ran Xu, Li Du, Yuan Du, Yanbing Jiang, Shanghang Zhang

    Abstract: Continual Test-Time Adaptation (CTTA), which aims to adapt the pre-trained model to ever-evolving target domains, emerges as an important task for vision models. As current vision models appear to be heavily biased towards texture, continuously adapting the model from one domain distribution to another can result in serious catastrophic forgetting. Drawing inspiration from the human visual system'…

    Submitted 26 May, 2024; originally announced May 2024.

  10. arXiv:2405.16447  [pdf, other]

    cs.LG

    Fast Asymmetric Factorization for Large Scale Multiple Kernel Clustering

    Authors: Yan Chen, Liang Du, Lei Duan

    Abstract: Kernel methods are extensively employed for nonlinear data clustering, yet their effectiveness relies heavily on selecting suitable kernels and associated parameters, which are difficult to determine in advance. In response, Multiple Kernel Clustering (MKC) has emerged as a solution, allowing the fusion of information from multiple base kernels for clustering. However, both early fusion and late fu…

    Submitted 26 May, 2024; originally announced May 2024.

  11. arXiv:2405.16091  [pdf, other]

    cs.CV

    Enhancing Near OOD Detection in Prompt Learning: Maximum Gains, Minimal Costs

    Authors: Myong Chol Jung, He Zhao, Joanna Dipnall, Belinda Gabbe, Lan Du

    Abstract: Prompt learning has been shown to be an efficient and effective fine-tuning method for vision-language models like CLIP. While numerous studies have focused on the generalisation of these models in few-shot classification, their capability in near out-of-distribution (OOD) detection has been overlooked. A few recent works have highlighted the promising performance of prompt learning in far OOD detectio…

    Submitted 25 May, 2024; originally announced May 2024.

  12. arXiv:2405.10630  [pdf, other]

    cs.CL cs.AI

    Medical Dialogue: A Survey of Categories, Methods, Evaluation and Challenges

    Authors: Xiaoming Shi, Zeming Liu, Li Du, Yuxuan Wang, Hongru Wang, Yuhang Guo, Tong Ruan, Jie Xu, Shaoting Zhang

    Abstract: This paper surveys and organizes research works on medical dialog systems, which is an important yet challenging task. Although these systems have been surveyed in the medical community from an application perspective, a systematic review from a rigorous technical perspective has to date remained noticeably absent. As a result, an overview of the categories, methods, and evaluation of medical dial…

    Submitted 17 May, 2024; originally announced May 2024.

  13. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions were submitted and evaluated on the collected dataset KVQ from a popular short-form video platform, i.e., the Kuaishou/Kwai platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  14. arXiv:2404.08985  [pdf, other]

    cs.LG cs.AI

    Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning

    Authors: Yijiang Liu, Rongyu Zhang, Huanrui Yang, Kurt Keutzer, Yuan Du, Li Du, Shanghang Zhang

    Abstract: Large Language Models (LLMs) have demonstrated significant potential in performing multiple tasks in multimedia applications, ranging from content generation to interactive entertainment, and artistic creation. However, the diversity of downstream tasks in multitask scenarios presents substantial adaptation challenges for LLMs. While traditional methods often succumb to knowledge confusion on thei…

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: 13 pages, 5 figures

  15. arXiv:2404.08564  [pdf, ps, other]

    cs.LG

    Federated Distillation: A Survey

    Authors: Lin Li, Jianping Gou, Baosheng Yu, Lan Du, Zhang Yi, Dacheng Tao

    Abstract: Federated Learning (FL) seeks to train a model collaboratively without sharing private training data from individual clients. Despite its promise, FL encounters challenges such as high communication costs for large-scale models and the necessity for uniform model architectures across all clients and the server. These challenges severely restrict the practical applications of FL. To address these l…

    Submitted 1 April, 2024; originally announced April 2024.

  16. Accel-NASBench: Sustainable Benchmarking for Accelerator-Aware NAS

    Authors: Afzal Ahmad, Linfeng Du, Zhiyao Xie, Wei Zhang

    Abstract: One of the primary challenges impeding the progress of Neural Architecture Search (NAS) is its extensive reliance on exorbitant computational resources. NAS benchmarks aim to simulate runs of NAS experiments at zero cost, remediating the need for extensive compute. However, existing NAS benchmarks use synthetic datasets and model proxies that make simplified assumptions about the characteristics o…

    Submitted 18 June, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: Accepted at Design Automation Conference DAC'24

  17. arXiv:2404.01677  [pdf, other]

    cs.AI cs.CL

    Towards Generalizable and Faithful Logic Reasoning over Natural Language via Resolution Refutation

    Authors: Zhouhao Sun, Xiao Ding, Li Du, Bibo Cai, Jinglong Gao, Ting Liu, Qin Bing

    Abstract: Large language models (LLMs) have achieved significant performance in various natural language reasoning tasks. However, they still struggle with performing first-order logic reasoning over formal logical theories expressed in natural language. This is because previous LLM-based reasoning systems suffer from theoretical incompleteness. As a result, they can only address a limited set of simp…

    Submitted 3 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: LREC-Coling 2024

  18. arXiv:2403.18051  [pdf, other]

    cs.CL cs.AI

    Supervisory Prompt Training

    Authors: Jean Ghislain Billa, Min Oh, Liang Du

    Abstract: The performance of Large Language Models (LLMs) relies heavily on the quality of prompts, which are often manually engineered and task-specific, making them costly and non-scalable. We propose a novel approach, Supervisory Prompt Training (SPT). SPT automates the generation of highly effective prompts using a dual LLM system. In this system, one LLM, the generator, performs a task while the other,…

    Submitted 26 March, 2024; originally announced March 2024.

  19. arXiv:2402.11537  [pdf, other]

    cs.CL cs.AI

    Deciphering the Impact of Pretraining Data on Large Language Models through Machine Unlearning

    Authors: Yang Zhao, Li Du, Xiao Ding, Kai Xiong, Zhouhao Sun, Jun Shi, Ting Liu, Bing Qin

    Abstract: Through pretraining on a corpus with various sources, Large Language Models (LLMs) have gained impressive performance. However, the impact of each component of the pretraining corpus remains opaque. As a result, the organization of the pretraining corpus is still empirical and may deviate from the optimal. To address this issue, we systematically analyze the impact of 48 datasets from 5 major cate…

    Submitted 26 March, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  20. EmoWear: Exploring Emotional Teasers for Voice Message Interaction on Smartwatches

    Authors: Pengcheng An, Jiawen Zhu, Zibo Zhang, Yifei Yin, Qingyuan Ma, Che Yan, Linghao Du, Jian Zhao

    Abstract: Voice messages, by nature, prevent users from gauging the emotional tone without fully diving into the audio content. This hinders the shared emotional experience at the pre-retrieval stage. Research has scarcely explored "Emotional Teasers": pre-retrieval cues offering a glimpse into an awaiting message's emotional tone without disclosing its content. We introduce EmoWear, a smartwatch voice messaging…

    Submitted 11 February, 2024; originally announced February 2024.

    Comments: To appear at ACM CHI '24

  21. arXiv:2402.05359  [pdf, other]

    cs.AI cs.CL cs.LG

    An Examination on the Effectiveness of Divide-and-Conquer Prompting in Large Language Models

    Authors: Yizhou Zhang, Lun Du, Defu Cao, Qiang Fu, Yan Liu

    Abstract: Foundation models, such as Large Language Models (LLMs), have attracted a significant amount of interest due to their large number of applications. However, when handling tasks involving repetitive sub-tasks and/or deceptive contents, such as arithmetic calculation and article-level fake news detection, simple instructional prompts suffer from inaccurate responses. Existing works show that more comp…

    Submitted 2 July, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: Preprint

  22. arXiv:2402.03741  [pdf, other]

    cs.LG cs.AI cs.CR

    SUB-PLAY: Adversarial Policies against Partially Observed Multi-Agent Reinforcement Learning Systems

    Authors: Oubo Ma, Yuwen Pu, Linkang Du, Yang Dai, Ruo Wang, Xiaolei Liu, Yingcai Wu, Shouling Ji

    Abstract: Recent advancements in multi-agent reinforcement learning (MARL) have opened up vast application prospects, such as swarm control of drones, collaborative manipulation by robotic arms, and multi-target encirclement. However, potential security threats during the MARL deployment need more attention and thorough investigation. Recent research reveals that attackers can rapidly exploit the victim's v…

    Submitted 26 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: To appear in the ACM Conference on Computer and Communications Security (CCS'24), October 14-18, 2024, Salt Lake City, UT, USA

  23. arXiv:2401.17862  [pdf, other]

    cs.CV

    Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis

    Authors: Jianing Li, Xi Nan, Ming Lu, Li Du, Shanghang Zhang

    Abstract: Multi-modal large language models (MLLMs) have demonstrated remarkable vision-language capabilities, primarily due to the exceptional in-context understanding and multi-task learning strengths of large language models (LLMs). The advent of visual instruction tuning has further enhanced MLLMs' performance in vision-language understanding. However, while existing MLLMs adeptly recognize \textit{what…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 15 pages, version 1

    ACM Class: I.5.4; I.2.7

  24. arXiv:2401.07853  [pdf, other]

    cs.CV

    VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness

    Authors: Rongyu Zhang, Zefan Cai, Huanrui Yang, Zidong Liu, Denis Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Baobao Chang, Yuan Du, Li Du, Shanghang Zhang

    Abstract: Finetuning a pretrained vision model (PVM) is a common technique for learning downstream vision tasks. However, the conventional finetuning process with randomly sampled data points results in diminished training efficiency. To address this drawback, we propose a novel approach, Vision-language Collaborative Active Finetuning (VeCAF). With the emerging availability of labels and natural language a…

    Submitted 13 April, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: 13 pages

  25. arXiv:2401.07525  [pdf, other]

    cs.CL cs.AI

    TAROT: A Hierarchical Framework with Multitask Co-Pretraining on Semi-Structured Data towards Effective Person-Job Fit

    Authors: Yihan Cao, Xu Chen, Lun Du, Hao Chen, Qiang Fu, Shi Han, Yushu Du, Yanbin Kang, Guangming Lu, Zi Li

    Abstract: Person-job fit is an essential part of online recruitment platforms in serving various downstream applications like Job Search and Candidate Recommendation. Recently, pretrained large language models have further enhanced the effectiveness by leveraging richer textual information in user profiles and job descriptions apart from user behavior features and job metadata. However, the general domain-o…

    Submitted 17 January, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: ICASSP 2024 camera ready. 5 pages, 1 figure, 3 tables

  26. arXiv:2401.07395  [pdf, other]

    cs.LG cs.AI

    Harnessing the Power of Beta Scoring in Deep Active Learning for Multi-Label Text Classification

    Authors: Wei Tan, Ngoc Dang Nguyen, Lan Du, Wray Buntine

    Abstract: Within the scope of natural language processing, the domain of multi-label text classification is uniquely challenging due to its expansive and uneven label distribution. The complexity deepens due to the demand for an extensive set of annotated data for training an advanced deep learning model, especially in specialized fields where the labeling task can be labor-intensive and often requires doma…

    Submitted 14 January, 2024; originally announced January 2024.

    Comments: 7 pages AAAI 2024

  27. arXiv:2401.00010  [pdf, other]

    cs.SI cs.LG

    Professional Network Matters: Connections Empower Person-Job Fit

    Authors: Hao Chen, Lun Du, Yuxuan Lu, Qiang Fu, Xu Chen, Shi Han, Yanbin Kang, Guangming Lu, Zi Li

    Abstract: Online recruitment platforms typically employ Person-Job Fit models in the core service that automatically match suitable job seekers with appropriate job positions. While existing works leverage historical or contextual information, they often disregard a crucial aspect: job seekers' social relationships in professional networks. This paper emphasizes the importance of incorporating professional…

    Submitted 19 December, 2023; originally announced January 2024.

    Comments: Accepted at WSDM 2024

  28. arXiv:2312.17710  [pdf, other]

    cs.CL cs.LG

    Principled Gradient-based Markov Chain Monte Carlo for Text Generation

    Authors: Li Du, Afra Amini, Lucas Torroba Hennigen, Xinyan Velocity Yu, Jason Eisner, Holden Lee, Ryan Cotterell

    Abstract: Recent papers have demonstrated the possibility of energy-based text generation by adapting gradient-based sampling algorithms, a paradigm of MCMC algorithms that promises fast convergence. However, as we show in this paper, previous attempts at this approach to text generation all fail to sample correctly from the target language model distributions. To address this limitation, we consider the pr…

    Submitted 29 December, 2023; originally announced December 2023.

    Comments: Preprint

  29. arXiv:2312.13671  [pdf, other]

    cs.CL cs.LG

    Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries

    Authors: Xinyi He, Mengyu Zhou, Xinrun Xu, Xiaojun Ma, Rui Ding, Lun Du, Yan Gao, Ran Jia, Xu Chen, Shi Han, Zejian Yuan, Dongmei Zhang

    Abstract: Tabular data analysis is crucial in various fields, and large language models show promise in this area. However, current research mostly focuses on rudimentary tasks like Text2SQL and TableQA, neglecting advanced analysis like forecasting and chart generation. To address this gap, we developed the Text2Analysis benchmark, incorporating advanced analysis tasks that go beyond the SQL-compatible ope…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI'2024

  30. Bayesian Estimate of Mean Proper Scores for Diversity-Enhanced Active Learning

    Authors: Wei Tan, Lan Du, Wray Buntine

    Abstract: The effectiveness of active learning largely depends on the sampling efficiency of the acquisition function. Expected Loss Reduction (ELR) focuses on a Bayesian estimate of the reduction in classification error, and more general costs fit in the same framework. We propose Bayesian Estimate of Mean Proper Scores (BEMPS) to estimate the increase in strictly proper scores such as log probability or n…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: 16 pages, TPAMI. arXiv admin note: text overlap with arXiv:2110.14171

    Journal ref: TPAMI, 2023

  31. arXiv:2312.09039  [pdf, other]

    cs.CL cs.AI

    TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning

    Authors: Yuan Sui, Jiaru Zou, Mengyu Zhou, Xinyi He, Lun Du, Shi Han, Dongmei Zhang

    Abstract: Table-based reasoning has shown remarkable progress in combining deep models with discrete reasoning, which requires reasoning over both free-form natural language (NL) questions and semi-structured tabular data. However, previous table reasoning solutions only consider small-sized tables and exhibit limitations in handling larger tables. In addition, most existing methods struggle to reason over…

    Submitted 17 February, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

  32. arXiv:2312.08937  [pdf, other]

    cs.LG

    BiPFT: Binary Pre-trained Foundation Transformer with Low-rank Estimation of Binarization Residual Polynomials

    Authors: Xingrun Xing, Li Du, Xinyuan Wang, Xianlin Zeng, Yequan Wang, Zheng Zhang, Jiajun Zhang

    Abstract: Pretrained foundation models offer substantial benefits for a wide range of downstream tasks and may be one of the most promising techniques for approaching artificial general intelligence. However, scaling up foundation transformers for maximal task-agnostic knowledge has brought about computational challenges, especially on resource-limited devices such as mobiles. This work proposes the first Bina…

    Submitted 20 June, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence. 2024, 38(14): 16094-16102

  33. arXiv:2312.00305  [pdf, other]

    stat.ME cs.LG math.ST stat.ML

    Multiple Testing of Linear Forms for Noisy Matrix Completion

    Authors: Wanteng Ma, Lilun Du, Dong Xia, Ming Yuan

    Abstract: Many important tasks of large-scale recommender systems can be naturally cast as testing multiple linear forms for noisy matrix completion. These problems, however, present unique challenges because of the subtle bias-and-variance tradeoff and an intricate dependence among the estimated entries induced by the low-rank structure. In this paper, we develop a general approach to overcome these dif…

    Submitted 30 November, 2023; originally announced December 2023.

  34. arXiv:2311.18780  [pdf, other]

    cs.LG

    MultiResFormer: Transformer with Adaptive Multi-Resolution Modeling for General Time Series Forecasting

    Authors: Linfeng Du, Ji Xin, Alex Labach, Saba Zuberi, Maksims Volkovs, Rahul G. Krishnan

    Abstract: Transformer-based models have greatly pushed the boundaries of time series forecasting recently. Existing methods typically encode time series data into $\textit{patches}$ using one or a fixed set of patch lengths. This, however, could result in a lack of ability to capture the variety of intricate temporal dependencies present in real-world multi-periodic time series. In this paper, we propose Mu…

    Submitted 8 February, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

  35. arXiv:2311.06065  [pdf, other]

    cs.SC

    On the Existence of Telescopers for P-recursive Sequences

    Authors: Lixin Du

    Abstract: We extend the criterion on the existence of telescopers for hypergeometric terms to the case of P-recursive sequences. This criterion is based on the concept of integral bases and the generalized Abramov-Petkovsek reduction for P-recursive sequences.

    Submitted 10 November, 2023; originally announced November 2023.

    Comments: 18 pages

    MSC Class: 68W30; 12H10; 33F10; 39A06; 68W40

  36. arXiv:2311.05246  [pdf, ps, other]

    cs.SC

    Reduction-based Creative Telescoping for P-recursive Sequences via Integral Bases

    Authors: Shaoshi Chen, Lixin Du, Manuel Kauers, Rong-Hua Wang

    Abstract: We propose a way to split a given bivariate P-recursive sequence into a summable part and a non-summable part in such a way that the non-summable part is minimal in some sense. This decomposition gives rise to a new reduction-based creative telescoping algorithm based on the concept of integral bases.

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: 20 pages

  37. arXiv:2311.04918  [pdf, other]

    cs.CL cs.LG

    Low-Resource Named Entity Recognition: Can One-vs-All AUC Maximization Help?

    Authors: Ngoc Dang Nguyen, Wei Tan, Lan Du, Wray Buntine, Richard Beare, Changyou Chen

    Abstract: Named entity recognition (NER), a task that identifies and categorizes named entities such as persons or organizations from text, is traditionally framed as a multi-class classification problem. However, this approach often overlooks the issue of imbalanced label distributions, particularly in low-resource settings, which are common in certain NER contexts, like biomedical NER (bioNER). To address…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: 6 pages, 3 figures, ICDM 2023

  38. arXiv:2311.04329  [pdf, other]

    cs.CL

    Formal Aspects of Language Modeling

    Authors: Ryan Cotterell, Anej Svete, Clara Meister, Tianyu Liu, Li Du

    Abstract: Large language models have become one of the most commonly deployed NLP inventions. In the past half-decade, their integration into core natural language processing tools has dramatically increased the performance of such tools, and they have entered the public discourse surrounding artificial intelligence. Consequently, it is important for both developers and researchers alike to understand the m…

    Submitted 17 April, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

  39. arXiv:2311.00906  [pdf, other]

    cs.CL cs.LG

    Re-weighting Tokens: A Simple and Effective Active Learning Strategy for Named Entity Recognition

    Authors: Haocheng Luo, Wei Tan, Ngoc Dang Nguyen, Lan Du

    Abstract: Active learning, a widely adopted technique for enhancing machine learning models in text and image classification tasks with limited annotation resources, has received relatively little attention in the domain of Named Entity Recognition (NER). The challenge of data imbalance in NER has hindered the effectiveness of active learning, as sequence labellers lack sufficient learning signals. To addre…

    Submitted 1 November, 2023; originally announced November 2023.

  40. On the Representational Capacity of Recurrent Neural Language Models

    Authors: Franz Nowak, Anej Svete, Li Du, Ryan Cotterell

    Abstract: This work investigates the computational expressivity of language models (LMs) based on recurrent neural networks (RNNs). Siegelmann and Sontag (1992) famously showed that RNNs with rational weights and hidden states and unbounded computation time are Turing complete. However, LMs define weightings over strings in addition to just (unweighted) language membership and the analysis of the computatio…

    Submitted 30 May, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: Added requirement for non-negative probabilities to definitions 2.3 and 3.1, fixed typos

    Journal ref: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 7011-7034

  41. arXiv:2310.03094  [pdf, other]

    cs.CL cs.AI cs.LG

    Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning

    Authors: Murong Yue, Jie Zhao, Min Zhang, Liang Du, Ziyu Yao

    Abstract: Large language models (LLMs) such as GPT-4 have exhibited remarkable performance in a variety of tasks, but this strong performance often comes with the high expense of using paid API services. In this paper, we are motivated to study building an LLM cascade to save the cost of using LLMs, particularly for performing reasoning (e.g., mathematical, causal) tasks. Our cascade pipeline follows the in…

    Submitted 8 February, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted to ICLR 2024

  42. arXiv:2309.14891  [pdf, other]

    cs.IR

    RE-SORT: Removing Spurious Correlation in Multilevel Interaction for CTR Prediction

    Authors: Song-Li Wu, Liang Du, Jia-Qi Yang, Yu-Ai Wang, De-Chuan Zhan, Shuang Zhao, Zi-Xun Sun

    Abstract: Click-through rate (CTR) prediction is a critical task in recommendation systems, serving as the ultimate filtering step to sort items for a user. Most recent cutting-edge methods primarily focus on investigating complex implicit and explicit feature interactions; however, these methods neglect the spurious correlation issue caused by confounding factors, thereby diminishing the model's generaliza…

    Submitted 10 May, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: 15 pages, 7 figures

  43. arXiv:2309.14623  [pdf, other]

    cs.CV

    Text-to-Image Generation for Abstract Concepts

    Authors: Jiayi Liao, Xu Chen, Qiang Fu, Lun Du, Xiangnan He, Xiang Wang, Shi Han, Dongmei Zhang

    Abstract: Recent years have witnessed substantial progress of large-scale models across various domains, such as natural language processing and computer vision, facilitating the expression of concrete concepts. Unlike concrete concepts that are usually directly associated with physical objects, expressing abstract concepts through natural language requires considerable effort, which results from their…

    Submitted 27 September, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

  44. arXiv:2309.05217  [pdf, other]

    cs.AI cs.CL

    Quantifying and Attributing the Hallucination of Large Language Models via Association Analysis

    Authors: Li Du, Yequan Wang, Xingrun Xing, Yiqun Yao, Xiang Li, Xin Jiang, Xuezhi Fang

    Abstract: Although demonstrating superb performance on various NLP tasks, large language models (LLMs) still suffer from the hallucination problem, which threatens the reliability of LLMs. To measure the level of hallucination of LLMs, previous works first categorize the hallucination according to the phenomenon similarity, then quantify the proportion of model outputs that contain hallucinatory content. Howe…

    Submitted 10 September, 2023; originally announced September 2023.

  45. arXiv:2309.05021  [pdf, other]

    cs.CL

    Chat2Brain: A Method for Mapping Open-Ended Semantic Queries to Brain Activation Maps

    Authors: Yaonai Wei, Tuo Zhang, Han Zhang, Tianyang Zhong, Lin Zhao, Zhengliang Liu, Chong Ma, Songyao Zhang, Muheng Shang, Lei Du, Xiao Li, Tianming Liu, Junwei Han

    Abstract: Over decades, neuroscience has accumulated a wealth of research results in the text modality that can be used to explore cognitive processes. Meta-analysis is a typical method that successfully establishes a link from text queries to brain activation maps using these research results, but it still relies on an ideal query environment. In practical applications, text queries used for meta-analyses…

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: 8 pages, 4 figures

  46. arXiv:2309.04506  [pdf, other]

    cs.CV

    Unsupervised Gaze-aware Contrastive Learning with Subject-specific Condition

    Authors: Lingyu Du, Xucong Zhang, Guohao Lan

    Abstract: Appearance-based gaze estimation has shown great promise in many applications by using a single general-purpose camera as the input device. However, its success depends heavily on the availability of large-scale well-annotated gaze datasets, which are sparse and expensive to collect. To alleviate this challenge, we propose ConGaze, a contrastive learning-based framework that leverages unlabeled…

    Submitted 8 September, 2023; originally announced September 2023.

  47. arXiv:2309.03852  [pdf, other]

    cs.CL cs.AI

    FLM-101B: An Open LLM and How to Train It with $100K Budget

    Authors: Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Xuying Meng, Siqi Fan, Peng Han, Jing Li, Li Du, Bowen Qin, Zheng Zhang, Aixin Sun, Yequan Wang

    Abstract: Large language models (LLMs) have achieved remarkable success in NLP and multimodal tasks, among others. Despite these successes, two main challenges remain in developing LLMs: (i) high computational cost, and (ii) fair and objective evaluations. In this paper, we report a solution to significantly reduce LLM training cost through a growth strategy. We demonstrate that a 101B-parameter LLM with 0.…

    Submitted 17 September, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

  48. ORL-AUDITOR: Dataset Auditing in Offline Deep Reinforcement Learning

    Authors: Linkang Du, Min Chen, Mingyang Sun, Shouling Ji, Peng Cheng, Jiming Chen, Zhikun Zhang

    Abstract: Data is a critical asset in AI, as high-quality datasets can significantly improve the performance of machine learning models. In safety-critical domains such as autonomous vehicles, offline deep reinforcement learning (offline DRL) is frequently used to train models on pre-collected datasets, as opposed to training these models by interacting with the real-world environment as in online DRL. To…

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: To appear in the Network and Distributed System Security Symposium (NDSS) 2024, San Diego, CA, USA

  49. arXiv:2309.00384  [pdf, other]

    cs.CL

    BatchPrompt: Accomplish more with less

    Authors: Jianzhe Lin, Maurice Diesendruck, Liang Du, Robin Abraham

    Abstract: As the ever-increasing token limits of large language models (LLMs) have enabled long context as input, prompting with single data samples might no longer be an efficient way. A straightforward strategy for improving efficiency is to batch data within the token limit (e.g., 8k for gpt-3.5-turbo; 32k for GPT-4), which we call BatchPrompt. We have two initial observations for prompting with batched data. F…

    Submitted 15 July, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

    Comments: 20 pages, 5 figures

  50. arXiv:2308.15987  [pdf, other]

    cs.CL cs.AI cs.LG

    FPTQ: Fine-grained Post-Training Quantization for Large Language Models

    Authors: Qingyuan Li, Yifan Zhang, Liang Li, Peng Yao, Bo Zhang, Xiangxiang Chu, Yerui Sun, Li Du, Yuchen Xie

    Abstract: In the era of large-scale language models, the substantial parameter size poses significant challenges for deployment. Being a prevalent compression technique, quantization has emerged as the mainstream practice to tackle this issue, which is mainly centered on two recipes, W8A8 and W4A16 (i.e., weights and activations in such bit widths). In this study, we propose a novel W4A8 post-training quantiz…

    Submitted 30 August, 2023; originally announced August 2023.