Showing 51–100 of 887 results for author: Tao, D

  1. arXiv:2403.10980  [pdf, other]

    cs.GT eess.SY math.OC

    Inverse learning of black-box aggregator for robust Nash equilibrium

    Authors: Guanpu Chen, Gehui Xu, Fengxiang He, Dacheng Tao, Thomas Parisini, Karl Henrik Johansson

    Abstract: In this note, we investigate the robustness of Nash equilibria (NE) in multi-player aggregative games with coupling constraints. There are many algorithms for computing an NE of an aggregative game given a known aggregator. When the coupling parameters are affected by uncertainty, robust NE need to be computed. We consider a scenario where players' weight in the aggregator is unknown, making the a…

    Submitted 16 March, 2024; originally announced March 2024.

  2. arXiv:2403.09963  [pdf, other]

    cs.CL cs.AI cs.IR

    Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction

    Authors: Ziyang Xu, Keqin Peng, Liang Ding, Dacheng Tao, Xiliang Lu

    Abstract: Recent research shows that pre-trained language models (PLMs) suffer from "prompt bias" in factual knowledge extraction, i.e., prompts tend to introduce biases toward specific labels. Prompt bias presents a significant challenge in assessing the factual knowledge within PLMs. Therefore, this paper aims to improve the reliability of existing benchmarks by thoroughly investigating and mitigating pro…

    Submitted 26 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted by COLING 2024

  3. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  4. arXiv:2403.04228  [pdf, other]

    cs.CV eess.IV

    Single-Image HDR Reconstruction Assisted Ghost Suppression and Detail Preservation Network for Multi-Exposure HDR Imaging

    Authors: Huafeng Li, Zhenmei Yang, Yafei Zhang, Dapeng Tao, Zhengtao Yu

    Abstract: The reconstruction of high dynamic range (HDR) images from multi-exposure low dynamic range (LDR) images in dynamic scenes presents significant challenges, especially in preserving and restoring information in oversaturated regions and avoiding ghosting artifacts. While current methods often struggle to address these challenges, our work aims to bridge this gap by developing a multi-exposure HDR i…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: IEEE Transactions on Computational Imaging

  5. arXiv:2403.01454  [pdf, ps, other]

    cs.IT

    Maximum Length RLL Sequences in de Bruijn Graph

    Authors: Yeow Meng Chee, Tuvi Etzion, Tien Long Nguyen, Duy Hoang Ta, Vinh Duc Tran, Van Khu Vu

    Abstract: A timing and synchronization system based on a de Bruijn sequence has been proposed and studied recently for a channel associated with quantum communication that requires reliable synchronization. To avoid a long period of no-pulse in such a system on-off pulses are used to simulate a zero and on-on pulses are used to simulate a one. However, these sequences have high redundancy. To reduce the red…

    Submitted 3 March, 2024; originally announced March 2024.

  6. arXiv:2403.00467  [pdf, other]

    cs.CV

    When ControlNet Meets Inexplicit Masks: A Case Study of ControlNet on its Contour-following Ability

    Authors: Wenjie Xuan, Yufei Xu, Shanshan Zhao, Chaoyue Wang, Juhua Liu, Bo Du, Dacheng Tao

    Abstract: ControlNet excels at creating content that closely matches precise contours in user-provided masks. However, when these masks contain noise, as a frequent occurrence with non-expert users, the output would include unwanted artifacts. This paper first highlights the crucial role of controlling the impact of these inexplicit masks with diverse deterioration levels through in-depth analysis. Subseque…

    Submitted 1 March, 2024; originally announced March 2024.

  7. arXiv:2402.19159  [pdf, other]

    cs.CV

    Trajectory Consistency Distillation: Improved Latent Consistency Distillation by Semi-Linear Consistency Function with Trajectory Mapping

    Authors: Jianbin Zheng, Minghui Hu, Zhongyi Fan, Chaoyue Wang, Changxing Ding, Dacheng Tao, Tat-Jen Cham

    Abstract: Latent Consistency Model (LCM) extends the Consistency Model to the latent space and leverages the guided consistency distillation technique to achieve impressive performance in accelerating text-to-image synthesis. However, we observed that LCM struggles to generate images with both clarity and detailed intricacy. Consequently, we introduce Trajectory Consistency Distillation (TCD), which encompa…

    Submitted 15 April, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Project Page: https://mhh0318.github.io/tcd

  8. arXiv:2402.17537  [pdf]

    cond-mat.mes-hall

    Forming 1D Periodic J-aggregates by Mechanical Bending of BNNTs: Evidence of Activated Molecular Diffusion

    Authors: J. -B. Marceau, D. -M Ta, A. Aguilar, A. Loiseau, R. Martel, P. Bon, R. Voituriez, G. Recher, E. Gaufrès

    Abstract: Driving molecular assembly into micrometer-scale patterns is key for defining advanced materials of interest in various fields, including life sciences, photovoltaics, and quantum photonics. However, the driving process competes with other forces, such as Brownian motion, ripening phenomena, capillary forces, and non-specific adsorption. Here we report on a guided diffusion mechanism of luminescen…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Supplementary added page 21

  9. arXiv:2402.15253  [pdf, other]

    cs.DC

    PICO: Accelerating All k-Core Paradigms on GPU

    Authors: Chen Zhao, Ting Yu, Zhigao Zheng, Song Jin, Jiawei Jiang, Bo Du, Dacheng Tao

    Abstract: Core decomposition is a well-established graph mining problem with various applications that involves partitioning the graph into hierarchical subgraphs. Solutions to this problem have been developed using both bottom-up and top-down approaches from the perspective of vertex convergence dependency. However, existing algorithms have not effectively harnessed GPU performance to expedite core decompo…

    Submitted 23 February, 2024; originally announced February 2024.

  10. arXiv:2402.13589  [pdf, other]

    cs.HC

    Affective Computing for Healthcare: Recent Trends, Applications, Challenges, and Beyond

    Authors: Yuanyuan Liu, Ke Wang, Lin Wei, Jingying Chen, Yibing Zhan, Dapeng Tao, Zhe Chen

    Abstract: Affective computing, which aims to recognize, interpret, and understand human emotions, provides benefits in healthcare, such as improving patient care and enhancing doctor-patient communication. However, there is a noticeable absence of a comprehensive summary of recent advancements in affective computing for healthcare, which could pose difficulties for researchers entering this field. To addres…

    Submitted 21 February, 2024; originally announced February 2024.

  11. arXiv:2402.13408  [pdf, other]

    cs.CL

    Healthcare Copilot: Eliciting the Power of General LLMs for Medical Consultation

    Authors: Zhiyao Ren, Yibing Zhan, Baosheng Yu, Liang Ding, Dacheng Tao

    Abstract: The copilot framework, which aims to enhance and tailor large language models (LLMs) for specific complex tasks without requiring fine-tuning, is gaining increasing attention from the community. In this paper, we introduce the construction of a Healthcare Copilot designed for medical consultation. The proposed Healthcare Copilot comprises three main components: 1) the Dialogue component, responsib…

    Submitted 20 February, 2024; originally announced February 2024.

  12. arXiv:2402.13116  [pdf, other]

    cs.CL

    A Survey on Knowledge Distillation of Large Language Models

    Authors: Xiaohan Xu, Ming Li, Chongyang Tao, Tao Shen, Reynold Cheng, Jinyang Li, Can Xu, Dacheng Tao, Tianyi Zhou

    Abstract: In the era of Large Language Models (LLMs), Knowledge Distillation (KD) emerges as a pivotal methodology for transferring advanced capabilities from leading proprietary LLMs, such as GPT-4, to their open-source counterparts like LLaMA and Mistral. Additionally, as open-source LLMs flourish, KD plays a crucial role in both compressing these models, and facilitating their self-improvement by employi…

    Submitted 8 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: 44 pages

  13. arXiv:2402.11960  [pdf, other]

    cs.LG cs.AI cs.CL

    DB-LLM: Accurate Dual-Binarization for Efficient LLMs

    Authors: Hong Chen, Chengtao Lv, Liang Ding, Haotong Qin, Xiabin Zhou, Yifu Ding, Xuebo Liu, Min Zhang, Jinyang Guo, Xianglong Liu, Dacheng Tao

    Abstract: Large language models (LLMs) have significantly advanced the field of natural language processing, while the expensive memory and computation consumption impede their practical deployment. Quantization emerges as one of the most effective methods for improving the computational efficiency of LLMs. However, existing ultra-low-bit quantization always causes severe accuracy drops. In this paper, we e…

    Submitted 19 February, 2024; originally announced February 2024.

  14. arXiv:2402.11890  [pdf, other]

    cs.CL

    Revisiting Knowledge Distillation for Autoregressive Language Models

    Authors: Qihuang Zhong, Liang Ding, Li Shen, Juhua Liu, Bo Du, Dacheng Tao

    Abstract: Knowledge distillation (KD) is a common approach to compress a teacher model to reduce its inference cost and memory footprint, by training a smaller student model. However, in the context of autoregressive language models (LMs), we empirically find that larger teacher LMs might dramatically result in a poorer student. In response to this problem, we conduct a series of analyses and reveal that di…

    Submitted 16 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL2024 Main Conference

  15. arXiv:2402.11889  [pdf, other]

    cs.CL

    ROSE Doesn't Do That: Boosting the Safety of Instruction-Tuned Large Language Models with Reverse Prompt Contrastive Decoding

    Authors: Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao

    Abstract: With the development of instruction-tuned large language models (LLMs), improving the safety of LLMs has become more critical. However, the current approaches for aligning the LLMs' output with expected safety usually require substantial training efforts, e.g., high-quality safety data and expensive computational resources, which are costly and inefficient. To this end, we present reverse prompt co…

    Submitted 16 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL2024 Findings

  16. arXiv:2402.11857  [pdf, other]

    cs.LG cs.DC

    Communication-Efficient Distributed Learning with Local Immediate Error Compensation

    Authors: Yifei Cheng, Li Shen, Linli Xu, Xun Qian, Shiwei Wu, Yiming Zhou, Tie Zhang, Dacheng Tao, Enhong Chen

    Abstract: Gradient compression with error compensation has attracted significant attention with the target of reducing the heavy communication overhead in distributed learning. However, existing compression methods either perform only unidirectional compression in one iteration with higher communication cost, or bidirectional compression with slower convergence rate. In this work, we propose the Local Immed…

    Submitted 19 February, 2024; originally announced February 2024.

  17. arXiv:2402.11778  [pdf, other]

    cs.LG cs.AI

    Towards Theoretical Understandings of Self-Consuming Generative Models

    Authors: Shi Fu, Sen Zhang, Yingjie Wang, Xinmei Tian, Dacheng Tao

    Abstract: This paper tackles the emerging challenge of training generative models within a self-consuming loop, wherein successive generations of models are recursively trained on mixtures of real and synthetic data from previous generations. We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models, including parametric a…

    Submitted 24 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: Accepted at ICML 2024

  18. arXiv:2402.11565  [pdf, other]

    cs.LG cs.AI

    Continual Learning on Graphs: Challenges, Solutions, and Opportunities

    Authors: Xikun Zhang, Dongjin Song, Dacheng Tao

    Abstract: Continual learning on graph data has recently attracted paramount attention for its aim to resolve the catastrophic forgetting problem on existing tasks while adapting the sequentially updated model to newly emerged graph tasks. While there have been efforts to summarize progress on continual learning research over Euclidean data, e.g., images and texts, a systematic review of progress in continua…

    Submitted 18 February, 2024; originally announced February 2024.

  19. arXiv:2402.09345  [pdf, other]

    cs.LG cs.AI

    InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling

    Authors: Yuchun Miao, Sen Zhang, Liang Ding, Rong Bao, Lefei Zhang, Dacheng Tao

    Abstract: Despite the success of reinforcement learning from human feedback (RLHF) in aligning language models with human values, reward hacking, also termed reward overoptimization, remains a critical challenge. This issue primarily arises from reward misgeneralization, where reward models (RMs) compute reward using spurious features that are irrelevant to human preferences. In this work, we tackle this pr…

    Submitted 23 May, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: 35 pages, 28 figures

  20. arXiv:2402.08552  [pdf, other]

    cs.LG cs.CV

    Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases

    Authors: Ziyi Zhang, Sen Zhang, Yibing Zhan, Yong Luo, Yonggang Wen, Dacheng Tao

    Abstract: Bridging the gap between diffusion models and human preferences is crucial for their integration into practical generative workflows. While optimizing downstream reward models has emerged as a promising alignment strategy, concerns arise regarding the risk of excessive optimization with learned reward models, which potentially compromises ground-truth performance. In this work, we confront the rew…

    Submitted 5 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: Accepted to ICML 2024

  21. arXiv:2402.05964  [pdf, other]

    cs.LG cs.CL cs.CV

    A Survey on Transformer Compression

    Authors: Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao

    Abstract: Transformer plays a vital role in the realms of natural language processing (NLP) and computer vision (CV), especially for constructing large language models (LLM) and large vision models (LVM). Model compression methods reduce the memory and computational cost of Transformer, which is a necessary step to implement large language/vision models on practical devices. Given the unique architecture of…

    Submitted 7 April, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Model Compression, Transformer, Large Language Model, Large Vision Model, LLM

  22. arXiv:2402.03667  [pdf, other]

    cs.CL cs.AI

    Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning

    Authors: Yanfang Zhang, Yiliu Sun, Yibing Zhan, Dapeng Tao, Dacheng Tao, Chen Gong

    Abstract: Recently, increasing attention has been drawn to improving the ability of Large Language Models (LLMs) to perform complex reasoning. However, previous methods, such as Chain-of-Thought and Self-Consistency, mainly follow Direct Reasoning (DR) frameworks, so they will meet difficulty in solving numerous real-world tasks which can hardly be solved via DR. Therefore, to strengthen the reason…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 20 pages, 13 figures, 4 tables

  23. arXiv:2402.02705  [pdf, other]

    cs.LG cs.AI cs.CV

    Representation Surgery for Multi-Task Model Merging

    Authors: Enneng Yang, Li Shen, Zhenyi Wang, Guibing Guo, Xiaojun Chen, Xingwei Wang, Dacheng Tao

    Abstract: Multi-task learning (MTL) compresses the information from multiple tasks into a unified backbone to improve computational efficiency and generalization. Recent work directly merges multiple independently trained models to perform MTL instead of collecting their raw data for joint training, greatly expanding the application scenarios of MTL. However, by visualizing the representation distribution o…

    Submitted 28 May, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: Forty-first International Conference on Machine Learning (ICML 2024)

  24. arXiv:2402.02687  [pdf, other]

    cs.LG cs.AI stat.ML

    Poisson Process for Bayesian Optimization

    Authors: Xiaoxing Wang, Jiaxing Li, Chao Xue, Wei Liu, Weifeng Liu, Xiaokang Yang, Junchi Yan, Dacheng Tao

    Abstract: Bayesian Optimization (BO) is a sample-efficient black-box optimizer, and extensive methods have been proposed to build the absolute function response of the black-box function through a probabilistic surrogate model, including Tree-structured Parzen Estimator (TPE), random forest (SMAC), and Gaussian process (GP). However, few methods have been explored to estimate the relative rankings of candidat…

    Submitted 4 February, 2024; originally announced February 2024.

  25. arXiv:2402.02399  [pdf, other]

    cs.LG cs.AI stat.AP stat.ML

    FreDF: Learning to Forecast in Frequency Domain

    Authors: Hao Wang, Licheng Pan, Zhichao Chen, Degui Yang, Sen Zhang, Yifei Yang, Xinggao Liu, Haoxuan Li, Dacheng Tao

    Abstract: Time series modeling is uniquely challenged by the presence of autocorrelation in both historical and label sequences. Current research predominantly focuses on handling autocorrelation within the historical sequence but often neglects its presence in the label sequence. Specifically, emerging forecast models mainly conform to the direct forecast (DF) paradigm, generating multi-step forecasts unde…

    Submitted 4 February, 2024; originally announced February 2024.

  26. arXiv:2402.00433  [pdf, other]

    cs.LG cs.CV

    Merging Multi-Task Models via Weight-Ensembling Mixture of Experts

    Authors: Anke Tang, Li Shen, Yong Luo, Nan Yin, Lefei Zhang, Dacheng Tao

    Abstract: Merging various task-specific Transformer-based models trained on different tasks into a single unified model can execute all the tasks concurrently. Previous methods, exemplified by task arithmetic, have been proven to be both effective and scalable. Existing methods have primarily focused on seeking a static optimal solution within the original model parameter space. A notable challenge is mitig…

    Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  27. arXiv:2401.17904  [pdf, other]

    cs.CV

    Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation

    Authors: Maoyuan Ye, Jing Zhang, Juhua Liu, Chenyu Liu, Baocai Yin, Cong Liu, Bo Du, Dacheng Tao

    Abstract: The Segment Anything Model (SAM), a profound vision foundation model pre-trained on a large-scale dataset, breaks the boundaries of general segmentation and sparks various downstream applications. This paper introduces Hi-SAM, a unified model leveraging SAM for hierarchical text segmentation. Hi-SAM excels in text segmentation across four hierarchies, including stroke, word, text-line, and paragra…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: GitHub repository: https://github.com/ymy-k/Hi-SAM

  28. arXiv:2401.17766  [pdf, other]

    cs.CV

    Fine-Grained Zero-Shot Learning: Advances, Challenges, and Prospects

    Authors: Jingcai Guo, Zhijie Rao, Zhi Chen, Jingren Zhou, Dacheng Tao

    Abstract: Recent zero-shot learning (ZSL) approaches have integrated fine-grained analysis, i.e., fine-grained ZSL, to mitigate the commonly known seen/unseen domain bias and misaligned visual-semantics mapping problems, and have made profound progress. Notably, this paradigm differs from existing close-set fine-grained methods and, therefore, can pose unique and nontrivial challenges. However, to the best…

    Submitted 4 February, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: 9 pages, 1 figure, 4 tables

  29. arXiv:2401.15287  [pdf, other]

    cs.CV cs.DM math.NA

    Applications of Tao General Difference in Discrete Domain

    Authors: Linmi Tao, Ruiyang Liu, Donglai Tao, Wu Xia, Feilong Ma, Yu Cheng, Jingmao Cui

    Abstract: Numerical difference computation is one of the cores and indispensable in the modern digital era. Tao general difference (TGD) is a novel theory and approach to difference computation for discrete sequences and arrays in multidimensional space. Built on the solid theoretical foundation of the general difference in a finite interval, the TGD operators demonstrate exceptional signal processing capab…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: This paper is the application part of the paper "Tao General Differential and Difference: Theory and Application". The theory part of the paper is renamed as "A Theory of General Difference in Continuous and Discrete Domain", which is Arxived in arXiv:2305.08098v2

  30. arXiv:2401.13444  [pdf, other]

    cs.CL cs.AI

    Clue-Guided Path Exploration: An Efficient Knowledge Base Question-Answering Framework with Low Computational Resource Consumption

    Authors: Dehao Tao, Feng Huang, Yongfeng Huang, Minghu Jiang

    Abstract: In recent times, large language models (LLMs) have showcased remarkable capabilities. However, updating their knowledge poses challenges, potentially leading to inaccuracies when confronted with unfamiliar queries. While integrating knowledge graphs with LLMs has been explored, existing approaches treat LLMs as primary decision-makers, imposing high demands on their capabilities. This is particula…

    Submitted 24 January, 2024; originally announced January 2024.

  31. Topology-aware Embedding Memory for Continual Learning on Expanding Networks

    Authors: Xikun Zhang, Dongjin Song, Yixin Chen, Dacheng Tao

    Abstract: Memory replay based techniques have shown great success for continual learning with incrementally accumulated Euclidean data. Directly applying them to continually expanding networks, however, leads to the potential memory explosion problem due to the need to buffer representative nodes and their associated topological neighborhood structures. To this end, we systematically analyze the key challen…

    Submitted 30 June, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: This paper has been accepted by KDD 2024

  32. arXiv:2401.12479  [pdf, other]

    cs.CV

    TD^2-Net: Toward Denoising and Debiasing for Dynamic Scene Graph Generation

    Authors: Xin Lin, Chong Shi, Yibing Zhan, Zuopeng Yang, Yaqi Wu, Dacheng Tao

    Abstract: Dynamic scene graph generation (SGG) focuses on detecting objects in a video and determining their pairwise relationships. Existing dynamic SGG methods usually suffer from several issues, including 1) Contextual noise, as some frames might contain occluded and blurred objects. 2) Label bias, primarily due to the high imbalance between a few positive relationship samples and numerous negative ones.…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI 2024

  33. arXiv:2401.12087  [pdf, other]

    cs.CL

    Revisiting Demonstration Selection Strategies in In-Context Learning

    Authors: Keqin Peng, Liang Ding, Yancheng Yuan, Xuebo Liu, Min Zhang, Yuanxin Ouyang, Dacheng Tao

    Abstract: Large language models (LLMs) have shown an impressive ability to perform a wide range of tasks using in-context learning (ICL), where a few examples are used to describe a task to the model. However, the performance of ICL varies significantly with the choice of demonstrations, and it is still unclear why this happens or what factors will influence its choice. In this work, we first revisit the fa…

    Submitted 23 June, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

    Comments: ACL 2024

  34. arXiv:2401.08478  [pdf, other]

    cs.LG cs.AI

    Solving Continual Offline Reinforcement Learning with Decision Transformer

    Authors: Kaixin Huang, Li Shen, Chen Zhao, Chun Yuan, Dacheng Tao

    Abstract: Continuous offline reinforcement learning (CORL) combines continuous and offline reinforcement learning, enabling agents to learn multiple tasks from static datasets without forgetting prior tasks. However, CORL faces challenges in balancing stability and plasticity. Existing methods, employing Actor-Critic structures and experience replay (ER), suffer from distribution shifts, low efficiency, and…

    Submitted 7 April, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: 11 pages, 6 figures

  35. arXiv:2401.07080  [pdf, other]

    cs.CV

    GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching

    Authors: Haibin He, Maoyuan Ye, Jing Zhang, Juhua Liu, Dacheng Tao

    Abstract: Beyond the text detection and recognition tasks in image text spotting, video text spotting presents an augmented challenge with the inclusion of tracking. While advanced end-to-end trainable methods have shown commendable performance, the pursuit of multi-task optimization may pose the risk of producing sub-optimal outcomes for individual tasks. In this paper, we highlight a main bottleneck in th…

    Submitted 13 January, 2024; originally announced January 2024.

  36. arXiv:2401.06659  [pdf, other]

    cs.CL

    WisdoM: Improving Multimodal Sentiment Analysis by Fusing Contextual World Knowledge

    Authors: Wenbin Wang, Liang Ding, Li Shen, Yong Luo, Han Hu, Dacheng Tao

    Abstract: Sentiment analysis is rapidly advancing by utilizing various data modalities (e.g., text, image). However, most previous works relied on superficial information, neglecting the incorporation of contextual world knowledge (e.g., background information derived from but beyond the given image and text pairs) and thereby restricting their ability to achieve better multimodal sentiment analysis (MSA).…

    Submitted 20 February, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

  37. arXiv:2401.06628  [pdf, other]

    cs.CL

    OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models

    Authors: Shuai Wang, Liang Ding, Li Shen, Yong Luo, Bo Du, Dacheng Tao

    Abstract: Advancing automated programming necessitates robust and comprehensive code generation benchmarks, yet current evaluation frameworks largely neglect object-oriented programming (OOP) in favor of functional programming (FP), e.g., HumanEval and MBPP. To address this, our study introduces a pioneering OOP-focused benchmark, featuring 431 Python programs that encompass essential OOP concepts and featu…

    Submitted 21 February, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: 20 pages, 15 figures

  38. arXiv:2401.06561  [pdf, other]

    cs.CL

    Intention Analysis Makes LLMs A Good Jailbreak Defender

    Authors: Yuqi Zhang, Liang Ding, Lefei Zhang, Dacheng Tao

    Abstract: Aligning large language models (LLMs) with human values, particularly in the face of complex and stealthy jailbreak attacks, presents a formidable challenge. In this study, we present a simple yet highly effective defense strategy, i.e., Intention Analysis ($\mathbb{IA}$). The principle behind this is to trigger LLMs' inherent self-correct and improve ability through a two-stage process: 1) essent…

    Submitted 29 April, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: 20 pages, 16 figures

  39. arXiv:2401.05806  [pdf, other]

    cs.CV

    CLIP-Driven Semantic Discovery Network for Visible-Infrared Person Re-Identification

    Authors: Xiaoyan Yu, Neng Dong, Liehuang Zhu, Hao Peng, Dapeng Tao

    Abstract: Visible-infrared person re-identification (VIReID) primarily deals with matching identities across person images from different modalities. Due to the modality gap between visible and infrared images, cross-modality identity matching poses significant challenges. Recognizing that high-level semantics of pedestrian appearance, such as gender, shape, and clothing style, remain consistent across moda…

    Submitted 12 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

  40. arXiv:2401.03203  [pdf, other]

    cs.CV

    Hi-Map: Hierarchical Factorized Radiance Field for High-Fidelity Monocular Dense Mapping

    Authors: Tongyan Hua, Haotian Bai, Zidong Cao, Ming Liu, Dacheng Tao, Lin Wang

    Abstract: In this paper, we introduce Hi-Map, a novel monocular dense mapping approach based on Neural Radiance Field (NeRF). Hi-Map is exceptional in its capacity to achieve efficient and high-fidelity mapping using only posed RGB inputs. Our method eliminates the need for external depth priors derived from e.g., a depth estimation model. Our key idea is to represent the scene as a hierarchical feature gri…

    Submitted 6 January, 2024; originally announced January 2024.

  41. arXiv:2312.17276  [pdf, other]

    cs.CL cs.LG

    PanGu-$π$: Enhancing Language Model Architectures via Nonlinearity Compensation

    Authors: Yunhe Wang, Hanting Chen, Yehui Tang, Tianyu Guo, Kai Han, Ying Nie, Xutao Wang, Hailin Hu, Zheyuan Bai, Yun Wang, Fangcheng Liu, Zhicheng Liu, Jianyuan Guo, Sinan Zeng, Yinchen Zhang, Qinghua Xu, Qun Liu, Jun Yao, Chao Xu, Dacheng Tao

    Abstract: The recent trend of large language models (LLMs) is to increase the scale of both model size (a.k.a. the number of parameters) and dataset to achieve better generative ability, which is definitely proved by a lot of work such as the famous GPT and Llama. However, large models often involve massive computational costs, and practical applications cannot afford such high prices. However, the method of…

    Submitted 27 December, 2023; originally announced December 2023.

  42. arXiv:2312.15939  [pdf]

    cond-mat.mes-hall cond-mat.mtrl-sci

    Discovery of acousto-drag photovoltaic effect

    Authors: Jiaming Gu, Yicheng Mou, Jianwen Ma, Haonan Chen, Chuanxin Zhang, Yuxiang Wang, Jiayu Wang, Hangwen Guo, Wu Shi, Xiang Yuan, Xue Jiang, Dean Ta, Jian Shen, Cheng Zhang

    Abstract: As a key ingredient in energy harvesting and photodetection, light-to-electricity conversion requires efficient separation of photoexcited electron-hole pairs before recombination. Traditional junction-based mechanisms mainly use built-in electric fields to achieve pair separation and generate photovoltaic effect, which fail to collect photoexcited pairs away from local barrier region. The ability…

    Submitted 26 December, 2023; originally announced December 2023.

  43. arXiv:2312.15172  [pdf, other]

    cs.CV

    Pre-trained Trojan Attacks for Visual Recognition

    Authors: Aishan Liu, Xinwei Zhang, Yisong Xiao, Yuguang Zhou, Siyuan Liang, Jiakai Wang, Xianglong Liu, Xiaochun Cao, Dacheng Tao

    Abstract: Pre-trained vision models (PVMs) have become a dominant component due to their exceptional performance when fine-tuned for downstream tasks. However, the presence of backdoors within PVMs poses significant threats. Unfortunately, existing studies primarily focus on backdooring PVMs for the classification task, neglecting potential inherited backdoors in downstream tasks such as detection and segme…

    Submitted 23 December, 2023; originally announced December 2023.

    Comments: 19 pages

  44. arXiv:2312.11112  [pdf, other]

    cs.CV

    ConDaFormer: Disassembled Transformer with Local Structure Enhancement for 3D Point Cloud Understanding

    Authors: Lunhao Duan, Shanshan Zhao, Nan Xue, Mingming Gong, Gui-Song Xia, Dacheng Tao

    Abstract: Transformers have been recently explored for 3D point cloud understanding with impressive progress achieved. A large number of points, over 0.1 million, make the global self-attention infeasible for point cloud data. Thus, most methods propose to apply the transformer in a local region, e.g., spherical or cubic window. However, it still contains a large number of Query-Key pairs, which requires hi…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023. Code: https://github.com/LHDuan/ConDaFormer

  45. arXiv:2312.10301  [pdf, other]

    cs.DB

    FCBench: Cross-Domain Benchmarking of Lossless Compression for Floating-Point Data

    Authors: Xinyu Chen, Jiannan Tian, Ian Beaver, Cynthia Freeman, Yan Yan, Jianguo Wang, Dingwen Tao

    Abstract: While both the database and high-performance computing (HPC) communities utilize lossless compression methods to minimize floating-point data size, a disconnect persists between them. Each community designs and assesses methods in a domain-specific manner, making it unclear if HPC compression techniques can benefit database applications or vice versa. With the HPC community increasingly leaning to…

    Submitted 20 January, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: 12 pages, 11 figures, 11 tables, accepted by VLDB '24

  46. arXiv:2312.07495  [pdf, other]

    cs.CV

    Exploring Plain ViT Reconstruction for Multi-class Unsupervised Anomaly Detection

    Authors: Jiangning Zhang, Xuhai Chen, Yabiao Wang, Chengjie Wang, Yong Liu, Xiangtai Li, Ming-Hsuan Yang, Dacheng Tao

    Abstract: This work studies the recently proposed challenging and practical Multi-class Unsupervised Anomaly Detection (MUAD) task, which only requires normal images for training while simultaneously testing both normal/anomaly images for multiple classes. Existing reconstruction-based methods typically adopt pyramid networks as encoders/decoders to obtain multi-resolution features, accompanied by elaborate…

    Submitted 12 December, 2023; originally announced December 2023.

  47. arXiv:2312.06173  [pdf, other]

    cs.LG

    Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion

    Authors: Anke Tang, Li Shen, Yong Luo, Liang Ding, Han Hu, Bo Du, Dacheng Tao

    Abstract: Merging models fine-tuned from a common, extensively pre-trained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multi-task model that performs well across diverse tasks. Recent research, exemplified by task arithmetic, highlights that this multi-task model can be derived through arithmetic operations on task vectors. Neverthele…

    Submitted 11 December, 2023; originally announced December 2023.

  48. arXiv:2312.05492  [pdf, other]

    cs.DC

    cuSZ-$i$: High-Ratio Scientific Lossy Compression on GPUs with Optimized Multi-Level Interpolation

    Authors: Jinyang Liu, Jiannan Tian, Shixun Wu, Sheng Di, Boyuan Zhang, Robert Underwood, Yafan Huang, Jiajun Huang, Kai Zhao, Guanpeng Li, Dingwen Tao, Zizhong Chen, Franck Cappello

    Abstract: Error-bounded lossy compression is a critical technique for significantly reducing scientific data volumes. Compared to CPU-based compressors, GPU-based compressors exhibit substantially higher throughputs, fitting better for today's HPC applications. However, the critical limitations of existing GPU-based compressors are their low compression ratios and qualities, severely restricting their appli…

    Submitted 11 July, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

    Comments: accepted by SC '24

  49. arXiv:2312.05479  [pdf, other]

    cs.LG cs.AI

    Exploring Sparsity in Graph Transformers

    Authors: Chuang Liu, Yibing Zhan, Xueqi Ma, Liang Ding, Dapeng Tao, Jia Wu, Wenbin Hu, Bo Du

    Abstract: Graph Transformers (GTs) have achieved impressive results on various graph-related tasks. However, the huge computational cost of GTs hinders their deployment and application, especially in resource-constrained environments. Therefore, in this paper, we explore the feasibility of sparsifying GTs, a significant yet under-explored topic. We first discuss the redundancy of GTs based on the characteri…

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: 9 pages, 8 figures

  50. arXiv:2312.01713  [pdf, other]

    cs.CV

    Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection

    Authors: Xubin Zhong, Changxing Ding, Yupeng Hu, Dacheng Tao

    Abstract: Human-Object Interaction (HOI) detection is a core task for human-centric image understanding. Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction; however, the interaction representations obtained using this method are entangled and lack interpretability. In contrast, traditional two-stage methods benefit significantly from th…

    Submitted 4 December, 2023; originally announced December 2023.