Showing 151–200 of 887 results for author: Tao, D

  1. arXiv:2308.05721  [pdf, other]

    cs.CV

    Deformable Mixer Transformer with Gating for Multi-Task Learning of Dense Prediction

    Authors: Yangyang Xu, Yibo Yang, Bernard Ghanem, Lefei Zhang, Du Bo, Dacheng Tao

    Abstract: CNNs and Transformers have their own advantages and both have been widely used for dense prediction in multi-task learning (MTL). Most of the current studies on MTL solely rely on CNN or Transformer. In this work, we present a novel MTL model by combining both merits of deformable CNN and query-based Transformer with shared gating for multi-task learning of dense prediction. This combination may o…

    Submitted 21 September, 2023; v1 submitted 10 August, 2023; originally announced August 2023.

    Comments: submitted to IJCV; an extension to our previous AAAI 2023 paper arXiv:2301.03461

  2. arXiv:2308.03822  [pdf, other]

    astro-ph.HE

    Search for Eccentric Black Hole Coalescences during the Third Observing Run of LIGO and Virgo

    Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, H. Abe, F. Acernese, K. Ackley, C. Adamcewicz, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, et al. (1750 additional authors not shown)

    Abstract: Despite the growing number of confident binary black hole coalescences observed through gravitational waves so far, the astrophysical origin of these binaries remains uncertain. Orbital eccentricity is one of the clearest tracers of binary formation channels. Identifying binary eccentricity, however, remains challenging due to the limited availability of gravitational waveforms that include effect…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: 24 pages, 5 figures

    Report number: LIGO-P2300080

  3. arXiv:2308.02883  [pdf, other]

    cs.CV

    Cross-modal & Cross-domain Learning for Unsupervised LiDAR Semantic Segmentation

    Authors: Yiyang Chen, Shanshan Zhao, Changxing Ding, Liyao Tang, Chaoyue Wang, Dacheng Tao

    Abstract: In recent years, cross-modal domain adaptation has been studied on the paired 2D image and 3D LiDAR data to ease the labeling costs for 3D LiDAR semantic segmentation (3DLSS) in the target domain. However, in such a setting the paired 2D and 3D data in the source domain are still collected with additional effort. Since the 2D-3D projections can enable the 3D model to learn semantic information fro…

    Submitted 5 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM Multimedia 2023

  4. arXiv:2308.01746  [pdf, other]

    cs.LG cs.CV

    Neural Collapse Terminus: A Unified Solution for Class Incremental Learning and Its Variants

    Authors: Yibo Yang, Haobo Yuan, Xiangtai Li, Jianlong Wu, Lefei Zhang, Zhouchen Lin, Philip Torr, Dacheng Tao, Bernard Ghanem

    Abstract: How to enable learnability for new classes while keeping the capability well on old classes has been a crucial challenge for class incremental learning. Beyond the normal case, long-tail class incremental learning and few-shot class incremental learning are also proposed to consider the data imbalance and data scarcity, respectively, which are common in real-world implementations and further exace…

    Submitted 3 August, 2023; originally announced August 2023.

    Comments: An extension of our ICLR 2023 paper https://openreview.net/pdf?id=y5W8tpojhtJ. arXiv admin note: text overlap with arXiv:2302.03004

  5. arXiv:2308.00522  [pdf, other]

    cs.LG cs.DC math.OC

    Efficient Federated Learning via Local Adaptive Amended Optimizer with Linear Speedup

    Authors: Yan Sun, Li Shen, Hao Sun, Liang Ding, Dacheng Tao

    Abstract: Adaptive optimization has achieved notable success for distributed learning while extending adaptive optimizer to federated learning (FL) suffers from severe inefficiency, including (i) rugged convergence due to inaccurate gradient estimation in global adaptive optimizer; (ii) client drifts exacerbated by local over-fitting with the local adaptive optimizer. In this work, we propose a novel moment…

    Submitted 30 July, 2023; originally announced August 2023.

    Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence

  6. PNT-Edge: Towards Robust Edge Detection with Noisy Labels by Learning Pixel-level Noise Transitions

    Authors: Wenjie Xuan, Shanshan Zhao, Yu Yao, Juhua Liu, Tongliang Liu, Yixin Chen, Bo Du, Dacheng Tao

    Abstract: Relying on large-scale training data with pixel-level labels, previous edge detection methods have achieved high performance. However, it is hard to manually label edges accurately, especially for large datasets, and thus the datasets inevitably contain noisy labels. This label-noise issue has been studied extensively for classification, while still remaining under-explored for edge detection. To…

    Submitted 15 October, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

    Comments: Accepted by ACM-MM 2023

  7. arXiv:2307.13962  [pdf, other]

    cs.LG cs.AI

    Understanding Deep Neural Networks via Linear Separability of Hidden Layers

    Authors: Chao Zhang, Xinyu Chen, Wensheng Li, Lixue Liu, Wei Wu, Dacheng Tao

    Abstract: In this paper, we measure the linear separability of hidden layer outputs to study the characteristics of deep neural networks. In particular, we first propose Minkowski difference based linear separability measures (MD-LSMs) to evaluate the linear separability degree of two points sets. Then, we demonstrate that there is a synchronicity between the linear separability degree of hidden layer outpu…

    Submitted 26 July, 2023; originally announced July 2023.

  8. arXiv:2307.12180  [pdf, other]

    eess.IV cs.CV cs.LG

    Prototype-Driven and Multi-Expert Integrated Multi-Modal MR Brain Tumor Image Segmentation

    Authors: Yafei Zhang, Zhiyuan Li, Huafeng Li, Dapeng Tao

    Abstract: For multi-modal magnetic resonance (MR) brain tumor image segmentation, current methods usually directly extract the discriminative features from input images for tumor sub-region category determination and localization. However, the impact of information aliasing caused by the mutual inclusion of tumor sub-regions is often ignored. Moreover, existing methods usually do not take tailored efforts t…

    Submitted 22 July, 2023; originally announced July 2023.

  9. arXiv:2307.12049  [pdf, other]

    cs.CV

    Patch-Wise Point Cloud Generation: A Divide-and-Conquer Approach

    Authors: Cheng Wen, Baosheng Yu, Rao Fu, Dacheng Tao

    Abstract: A generative model for high-fidelity point clouds is of great importance in synthesizing 3d environments for applications such as autonomous driving and robotics. Despite the recent success of deep generative models for 2d images, it is non-trivial to generate 3d point clouds without a comprehensive understanding of both local and global geometric structures. In this paper, we devise a new 3d poin…

    Submitted 22 July, 2023; originally announced July 2023.

  10. arXiv:2307.10616  [pdf, other]

    cs.LG cs.AI cs.CV

    Heterogeneous Federated Learning: State-of-the-art and Research Challenges

    Authors: Mang Ye, Xiuwen Fang, Bo Du, Pong C. Yuen, Dacheng Tao

    Abstract: Federated learning (FL) has drawn increasing attention owing to its potential use in large-scale industrial applications. Existing federated learning works mainly focus on model homogeneous settings. However, practical federated learning typically faces the heterogeneity of data distributions, model architectures, network environments, and hardware devices among participant clients. Heterogeneous…

    Submitted 8 September, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: 42 pages, 11 figures, and 4 tables

  11. arXiv:2307.09609  [pdf, other]

    cs.DC

    AMRIC: A Novel In Situ Lossy Compression Framework for Efficient I/O in Adaptive Mesh Refinement Applications

    Authors: Daoce Wang, Jesus Pulido, Pascal Grosset, Jiannan Tian, Sian Jin, Houjun Tang, Jean Sexton, Sheng Di, Zarija Lukić, Kai Zhao, Bo Fang, Franck Cappello, James Ahrens, Dingwen Tao

    Abstract: As supercomputers advance towards exascale capabilities, computational intensity increases significantly, and the volume of data requiring storage and transmission experiences exponential growth. Adaptive Mesh Refinement (AMR) has emerged as an effective solution to address these two challenges. Concurrently, error-bounded lossy compression is recognized as one of the most efficient approaches to…

    Submitted 13 July, 2023; originally announced July 2023.

    Comments: 12 pages, 18 figures, 3 tables, accepted by ACM/IEEE SC '23

  12. arXiv:2307.08526  [pdf, other]

    cs.CV cs.AI cs.LG

    Image Captions are Natural Prompts for Text-to-Image Models

    Authors: Shiye Lei, Hao Chen, Sen Zhang, Bo Zhao, Dacheng Tao

    Abstract: With the rapid development of Artificial Intelligence Generated Content (AIGC), it has become common practice in many learning tasks to train or fine-tune large models on synthetic data due to the data-scarcity and privacy leakage problems. Albeit promising with unlimited data generation, owing to massive and diverse information conveyed in real images, it is challenging for text-to-image generati…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: 20 pages, 1 figure, 10 tables

  13. arXiv:2307.06527  [pdf, other]

    cs.CV

    Free-Form Composition Networks for Egocentric Action Recognition

    Authors: Haoran Wang, Qinghua Cheng, Baosheng Yu, Yibing Zhan, Dapeng Tao, Liang Ding, Haibin Ling

    Abstract: Egocentric action recognition is gaining significant attention in the field of human action recognition. In this paper, we address data scarcity issue in egocentric action recognition from a compositional generalization perspective. To tackle this problem, we propose a free-form composition network (FFCN) that can simultaneously learn disentangled verb, preposition, and noun representations, and t…

    Submitted 14 October, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

  14. arXiv:2307.05510  [pdf, ps, other]

    physics.soc-ph quant-ph

    Carbon Emissions of Quantum Circuit Simulation: More than You Would Think

    Authors: Jinyang Li, Qiang Guan, Dingwen Tao, Weiwen Jiang

    Abstract: The rapid advancement of quantum hardware brings a host of research opportunities and the potential for quantum advantages across numerous fields. In this landscape, quantum circuit simulations serve as an indispensable tool by emulating quantum behavior on classical computers. They offer easy access, noise-free environments, and real-time observation of quantum states. However, the sustainability…

    Submitted 4 July, 2023; originally announced July 2023.

  15. arXiv:2307.03903  [pdf, other]

    cs.CV

    Adversarial Self-Attack Defense and Spatial-Temporal Relation Mining for Visible-Infrared Video Person Re-Identification

    Authors: Huafeng Li, Le Xu, Yafei Zhang, Dapeng Tao, Zhengtao Yu

    Abstract: In visible-infrared video person re-identification (re-ID), extracting features not affected by complex scenes (such as modality, camera views, pedestrian pose, background, etc.) changes, and mining and utilizing motion information are the keys to solving cross-modal pedestrian identity matching. To this end, the paper proposes a new visible-infrared video person re-ID method from a novel perspect…

    Submitted 11 August, 2023; v1 submitted 8 July, 2023; originally announced July 2023.

    Comments: 11 pages, 8 figures

  16. arXiv:2306.17504  [pdf, other]

    cs.AI

    Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer

    Authors: Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Tianshuo Xu, Xiaoshuai Sun, Tongliang Liu, Rongrong Ji, Dacheng Tao

    Abstract: Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes. Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the maximized change of training loss when adding a perturbation to the weight. However, indiscriminate perturbation of SAM on all parameters is suboptimal and results in excessive computatio…

    Submitted 30 June, 2023; originally announced June 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2210.05177

  17. arXiv:2306.16736  [pdf, other]

    cs.CV cs.AI

    GraMMaR: Ground-aware Motion Model for 3D Human Motion Reconstruction

    Authors: Sihan Ma, Qiong Cao, Hongwei Yi, Jing Zhang, Dacheng Tao

    Abstract: Demystifying complex human-ground interactions is essential for accurate and realistic 3D human motion reconstruction from RGB videos, as it ensures consistency between the humans and the ground plane. Prior methods have modeled human-ground interactions either implicitly or in a sparse manner, often resulting in unrealistic and incorrect motions when faced with noise and uncertainty. In contrast,…

    Submitted 16 August, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

    Comments: Accepted to ACM Multimedia 2023. The code will be available at https://github.com/xymsh/GraMMaR

  18. arXiv:2306.15880  [pdf, other]

    cs.CV cs.AI

    Towards Open Vocabulary Learning: A Survey

    Authors: Jianzong Wu, Xiangtai Li, Shilin Xu, Haobo Yuan, Henghui Ding, Yibo Yang, Xia Li, Jiangning Zhang, Yunhai Tong, Xudong Jiang, Bernard Ghanem, Dacheng Tao

    Abstract: In the field of visual scene understanding, deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection. However, most approaches operate on the close-set assumption, meaning that the model can only identify pre-defined categories that are present in the training set. Recently, open vocabulary settings were proposed due to the rapid progr…

    Submitted 1 February, 2024; v1 submitted 27 June, 2023; originally announced June 2023.

    Comments: Accepted by IEEE T-PAMI. Project page: https://github.com/jianzongwu/Awesome-Open-Vocabulary

  19. arXiv:2306.10858  [pdf, other]

    cs.CV

    FHA-Kitchens: A Novel Dataset for Fine-Grained Hand Action Recognition in Kitchen Scenes

    Authors: Ting Zhe, Yongqian Li, Jing Zhang, Yong Luo, Han Hu, Bo Du, Yonggang Wen, Dacheng Tao

    Abstract: A typical task in the field of video understanding is hand action recognition, which has a wide range of applications. Existing works either mainly focus on full-body actions, or the defined action categories are relatively coarse-grained. In this paper, we propose FHA-Kitchens, a novel dataset of fine-grained hand actions in kitchen scenes. In particular, we focus on human hand interaction region…

    Submitted 19 June, 2023; originally announced June 2023.

  20. arXiv:2306.09595  [pdf, other]

    cs.LG cs.AI

    Structured Cooperative Learning with Graphical Model Priors

    Authors: Shuangtong Li, Tianyi Zhou, Xinmei Tian, Dacheng Tao

    Abstract: We study how to train personalized models for different tasks on decentralized devices with limited local data. We propose "Structured Cooperative Learning (SCooL)", in which a cooperation graph across devices is generated by a graphical model prior to automatically coordinate mutual learning between devices. By choosing graphical models enforcing different structures, we can derive a rich class o…

    Submitted 21 June, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Accepted by ICML 2023

  21. arXiv:2306.05706  [pdf, other]

    cs.LG cs.DC math.OC

    Understanding How Consistency Works in Federated Learning via Stage-wise Relaxed Initialization

    Authors: Yan Sun, Li Shen, Dacheng Tao

    Abstract: Federated learning (FL) is a distributed paradigm that coordinates massive local clients to collaboratively train a global model via stage-wise local training processes on the heterogeneous dataset. Previous works have implicitly studied that FL suffers from the "client-drift" problem, which is caused by the inconsistent optimum across local clients. However, till now it still lacks solid theore…

    Submitted 9 June, 2023; originally announced June 2023.

    Comments: 32 pages

  22. arXiv:2306.04875  [pdf, other]

    cs.LG

    Instructed Diffuser with Temporal Condition Guidance for Offline Reinforcement Learning

    Authors: Jifeng Hu, Yanchao Sun, Sili Huang, SiYuan Guo, Hechang Chen, Li Shen, Lichao Sun, Yi Chang, Dacheng Tao

    Abstract: Recent works have shown the potential of diffusion models in computer vision and natural language processing. Apart from the classical supervised learning fields, diffusion models have also shown strong competitiveness in reinforcement learning (RL) by formulating decision-making as sequential generation. However, incorporating temporal information of sequential data and utilizing it to guide diff…

    Submitted 7 June, 2023; originally announced June 2023.

  23. arXiv:2306.03679  [pdf, other]

    cs.CV cs.AI cs.CR cs.LG stat.ML

    Human-imperceptible, Machine-recognizable Images

    Authors: Fusheng Hao, Fengxiang He, Yikai Wang, Fuxiang Wu, Jing Zhang, Jun Cheng, Dacheng Tao

    Abstract: Massive human-related data is collected to train neural networks for computer vision tasks. A major conflict is exposed relating to software engineers between better developing AI systems and distancing from the sensitive training data. To reconcile this conflict, this paper proposes an efficient privacy-preserving learning paradigm, where images are first encrypted to become "human-imperceptible…

    Submitted 6 June, 2023; originally announced June 2023.

  24. arXiv:2306.03481  [pdf, other]

    quant-ph cs.AI cs.IT cs.LG

    Transition Role of Entangled Data in Quantum Machine Learning

    Authors: Xinbiao Wang, Yuxuan Du, Zhuozhuo Tu, Yong Luo, Xiao Yuan, Dacheng Tao

    Abstract: Entanglement serves as the resource to empower quantum computing. Recent progress has highlighted its positive impact on learning quantum dynamics, wherein the integration of entanglement into quantum operations or measurements of quantum machine learning (QML) models leads to substantial reductions in training data size, surpassing a specified prediction error threshold. However, an analytical un…

    Submitted 12 May, 2024; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: Accepted to Nature Communications 15, 3716 (2024)

  25. arXiv:2306.03266  [pdf, other]

    cs.LG stat.ML

    Extending the Design Space of Graph Neural Networks by Rethinking Folklore Weisfeiler-Lehman

    Authors: Jiarui Feng, Lecheng Kong, Hao Liu, Dacheng Tao, Fuhai Li, Muhan Zhang, Yixin Chen

    Abstract: Message passing neural networks (MPNNs) have emerged as the most popular framework of graph neural networks (GNNs) in recent years. However, their expressive power is limited by the 1-dimensional Weisfeiler-Lehman (1-WL) test. Some works are inspired by $k$-WL/FWL (Folklore WL) and design the corresponding neural versions. Despite the high expressive power, there are serious limitations in this li…

    Submitted 14 January, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: Accepted to NeurIPS 2023

  26. arXiv:2306.03166  [pdf, other]

    cs.IR cs.CL

    Unsupervised Dense Retrieval with Relevance-Aware Contrastive Pre-Training

    Authors: Yibin Lei, Liang Ding, Yu Cao, Changtong Zan, Andrew Yates, Dacheng Tao

    Abstract: Dense retrievers have achieved impressive performance, but their demand for abundant training data limits their application scenarios. Contrastive pre-training, which constructs pseudo-positive examples from unlabeled data, has shown great potential to solve this problem. However, the pseudo-positive examples crafted by data augmentations can be irrelevant. To this end, we propose relevance-aware…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: ACL 2023 Findings (Short), 5 pages main + 1 page references + 1 page appendix

  27. arXiv:2306.02913  [pdf, other]

    cs.LG cs.CY cs.DC eess.SY stat.ML

    Decentralized SGD and Average-direction SAM are Asymptotically Equivalent

    Authors: Tongtian Zhu, Fengxiang He, Kaixuan Chen, Mingli Song, Dacheng Tao

    Abstract: Decentralized stochastic gradient descent (D-SGD) allows collaborative learning on massive devices simultaneously without the control of a central server. However, existing theories claim that decentralization invariably undermines generalization. In this paper, we challenge the conventional belief and present a completely new perspective for understanding decentralized learning. We prove that D-S…

    Submitted 9 November, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: 40th International Conference on Machine Learning (ICML 2023)

  28. arXiv:2306.02585  [pdf, other]

    cs.CV

    MotionTrack: Learning Motion Predictor for Multiple Object Tracking

    Authors: Changcheng Xiao, Qiong Cao, Yujie Zhong, Long Lan, Xiang Zhang, Zhigang Luo, Dacheng Tao

    Abstract: Significant progress has been achieved in multi-object tracking (MOT) through the evolution of detection and re-identification (ReID) techniques. Despite these advancements, accurately tracking objects in scenarios with homogeneous appearance and heterogeneous motion remains a challenge. This challenge arises from two main factors: the insufficient discriminability of ReID features and the predomi…

    Submitted 11 March, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

  29. arXiv:2306.01257  [pdf, other]

    cs.CV

    Collect-and-Distribute Transformer for 3D Point Cloud Analysis

    Authors: Haibo Qiu, Baosheng Yu, Dacheng Tao

    Abstract: Remarkable advancements have been made recently in point cloud analysis through the exploration of transformer architecture, but it remains challenging to effectively learn local and global structures within point clouds. In this paper, we propose a new transformer network equipped with a collect-and-distribute mechanism to communicate short- and long-range contexts of point clouds, which we refer…

    Submitted 30 October, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Code is available at https://github.com/haibo-qiu/CDFormer

  30. arXiv:2306.00964  [pdf, other]

    cs.CV cs.LG

    Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation

    Authors: Minghui Hu, Jianbin Zheng, Daqing Liu, Chuanxia Zheng, Chaoyue Wang, Dacheng Tao, Tat-Jen Cham

    Abstract: Text-conditional diffusion models are able to generate high-fidelity images with diverse contents. However, linguistic representations frequently exhibit ambiguous descriptions of the envisioned objective imagery, requiring the incorporation of additional control signals to bolster the efficacy of text-guided diffusion models. In this work, we propose Cocktail, a pipeline to mix various modalities…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: Project Page: https://mhh0318.github.io/cocktail/

  31. arXiv:2306.00434  [pdf, other]

    cs.CL

    Divide, Conquer, and Combine: Mixture of Semantic-Independent Experts for Zero-Shot Dialogue State Tracking

    Authors: Qingyue Wang, Liang Ding, Yanan Cao, Yibing Zhan, Zheng Lin, Shi Wang, Dacheng Tao, Li Guo

    Abstract: Zero-shot transfer learning for Dialogue State Tracking (DST) helps to handle a variety of task-oriented dialogue domains without the cost of collecting in-domain data. Existing works mainly study common data- or model-level augmentation methods to enhance the generalization but fail to effectively decouple the semantics of samples, limiting the zero-shot performance of DST. In this paper, we pres…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted to ACL 2023

  32. arXiv:2305.19957  [pdf, other]

    cs.CV

    DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting

    Authors: Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, Dacheng Tao

    Abstract: End-to-end text spotting aims to integrate scene text detection and recognition into a unified framework. Dealing with the relationship between the two sub-tasks plays a pivotal role in designing effective spotters. Although Transformer-based methods eliminate the heuristic post-processing, they still suffer from the synergy issue between the sub-tasks and low training efficiency. Besides, they ov…

    Submitted 18 March, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: The extension of the CVPR 2023 paper (DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting). arXiv admin note: substantial text overlap with arXiv:2211.10772

  33. arXiv:2305.18413  [pdf, other]

    cs.LG cs.AI cs.CV

    Learning to Learn from APIs: Black-Box Data-Free Meta-Learning

    Authors: Zixuan Hu, Li Shen, Zhenyi Wang, Baoyuan Wu, Chun Yuan, Dacheng Tao

    Abstract: Data-free meta-learning (DFML) aims to enable efficient learning of new tasks by meta-learning from a collection of pre-trained models without access to the training data. Existing DFML work can only meta-learn from (i) white-box and (ii) small-scale pre-trained models (iii) with the same architecture, neglecting the more practical setting where the users only have inference access to the APIs wit…

    Submitted 19 June, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

  34. arXiv:2305.16690  [pdf, other]

    eess.AS

    Learning Representation of Therapist Empathy in Counseling Conversation Using Siamese Hierarchical Attention Network

    Authors: Dehua Tao, Tan Lee, Harold Chui, Sarah Luk

    Abstract: Counseling is an activity of conversational speaking between a therapist and a client. Therapist empathy is an essential indicator of counseling quality and assessed subjectively by considering the entire conversation. This paper proposes to encode long counseling conversation using a hierarchical attention network. Conversations with extreme values of empathy rating are used to train a Siamese ne…

    Submitted 26 May, 2023; originally announced May 2023.

  35. arXiv:2305.16379  [pdf, other]

    cs.LG cs.AI cs.CV

    Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning

    Authors: Guozheng Ma, Linrui Zhang, Haoyu Wang, Lu Li, Zilin Wang, Zhen Wang, Li Shen, Xueqian Wang, Dacheng Tao

    Abstract: Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms. Notably, employing simple observation transformations alone can yield outstanding performance without extra auxiliary representation tasks or pre-trained encoders. However, it remains unclear which attributes of DA account for its effectiveness in achieving sample-eff…

    Submitted 27 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023 poster

  36. arXiv:2305.15832  [pdf, other]

    cs.CV

    All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation

    Authors: Liyao Tang, Zhe Chen, Shanshan Zhao, Chaoyue Wang, Dacheng Tao

    Abstract: Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning. Existing methods often rely on empirical label selection strategies, such as confidence thresholding, to generate beneficial pseudo-labels for model training. This approach may, however, hinder the comprehensive exploitation of unlabeled data points. We hyp…

    Submitted 20 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  37. arXiv:2305.15275  [pdf, other]

    cs.CL

    Self-Evolution Learning for Discriminative Language Model Pretraining

    Authors: Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao

    Abstract: Masked language modeling, widely used in discriminative language model (e.g., BERT) pretraining, commonly adopts a random masking strategy. However, random masking does not consider the importance of the different words in the sentence meaning, where some of them are more worthy to be predicted. Therefore, various masking strategies (e.g., entity-level masking) are proposed, but most of them requi…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted to Findings of ACL 2023

  38. arXiv:2305.15273  [pdf, other]

    cs.CL

    Revisiting Token Dropping Strategy in Efficient BERT Pretraining

    Authors: Qihuang Zhong, Liang Ding, Juhua Liu, Xuebo Liu, Min Zhang, Bo Du, Dacheng Tao

    Abstract: Token dropping is a recently-proposed strategy to speed up the pretraining of masked language models, such as BERT, by skipping the computation of a subset of the input tokens at several middle layers. It can effectively reduce the training time without degrading much performance on downstream tasks. However, we empirically find that token dropping is prone to a semantic loss problem and falls sho…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023 Main Conference

  39. arXiv:2305.15157  [pdf, other]

    cs.LG cs.DC math.OC

    Towards More Suitable Personalization in Federated Learning via Decentralized Partial Model Training

    Authors: Yifan Shi, Yingqi Liu, Yan Sun, Zihao Lin, Li Shen, Xueqian Wang, Dacheng Tao

    Abstract: Personalized federated learning (PFL) aims to produce the greatest personalized model for each client to face an insurmountable problem--data heterogeneity in real FL systems. However, almost all existing works have to face large communication burdens and the risk of disruption if the central server fails. Only limited efforts have been used in a decentralized way but still suffers from inferior r…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: 26 pages

  40. arXiv:2305.13871  [pdf, other]

    cs.LG

    Improving Heterogeneous Model Reuse by Density Estimation

    Authors: Anke Tang, Yong Luo, Han Hu, Fengxiang He, Kehua Su, Bo Du, Yixin Chen, Dacheng Tao

    Abstract: This paper studies multiparty learning, aiming to learn a model using the private data of different participants. Model reuse is a promising solution for multiparty learning, assuming that a local model has been trained for each party. Considering the potential sample selection bias among different parties, some heterogeneous model reuse approaches have been developed. However, although pre-traine…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: 9 pages, 5 figures. Accepted by IJCAI 2023

  41. arXiv:2305.13547  [pdf, other]

    cs.CL cs.NI

    Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks

    Authors: Haoqi Zheng, Qihuang Zhong, Liang Ding, Zhiliang Tian, Xin Niu, Dongsheng Li, Dacheng Tao

    Abstract: Text classification tasks often encounter few shot scenarios with limited labeled data, and addressing data scarcity is crucial. Data augmentation with mixup has shown to be effective on various text classification tasks. However, most of the mixup methods do not consider the varying degree of learning difficulty in different stages of training and generate new samples with one hot labels, resulti…

    Submitted 27 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  42. arXiv:2305.12972  [pdf, other]

    cs.CV

    VanillaNet: the Power of Minimalism in Deep Learning

    Authors: Hanting Chen, Yunhe Wang, Jianyuan Guo, Dacheng Tao

    Abstract: At the heart of foundation models is the philosophy of "more is different", exemplified by the astonishing success in computer vision and natural language processing. However, the challenges of optimization and inherent complexity of transformer models call for a paradigm shift towards simplicity. In this study, we introduce VanillaNet, a neural network architecture that embraces elegance in desig…

    Submitted 23 May, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  43. arXiv:2305.11584  [pdf, other]

    cs.LG cs.DC math.OC

    Dynamic Regularized Sharpness Aware Minimization in Federated Learning: Approaching Global Consistency and Smooth Landscape

    Authors: Yan Sun, Li Shen, Shixiang Chen, Liang Ding, Dacheng Tao

    Abstract: In federated learning (FL), a cluster of local clients are chaired under the coordination of the global server and cooperatively train one model with privacy protection. Due to the multiple local updates and the isolated non-iid dataset, clients are prone to overfit into their own optima, which extremely deviates from the global objective and significantly undermines the performance. Most previous…

    Submitted 1 April, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: ICML2023, Oral Presentation

    Journal ref: PMLR 202:32991-33013, 2023

  44. arXiv:2305.10714  [pdf, other]

    cs.CV

    Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding

    Authors: Taolin Zhang, Sunan He, Dai Tao, Bin Chen, Zhi Wang, Shu-Tao Xia

    Abstract: In recent years, vision language pre-training frameworks have made significant progress in natural language processing and computer vision, achieving remarkable performance improvement on various downstream tasks. However, when extended to point cloud data, existing works mainly focus on building task-specific models, and fail to extract universal 3D vision-language embedding that generalize well.…

    Submitted 18 May, 2023; originally announced May 2023.

  45. arXiv:2305.09648  [pdf, other]

    cs.LG cs.AI

    Prompt-Tuning Decision Transformer with Preference Ranking

    Authors: Shengchao Hu, Li Shen, Ya Zhang, Dacheng Tao

    Abstract: Prompt-tuning has emerged as a promising method for adapting pre-trained models to downstream tasks or aligning with human preferences. Prompt learning is widely used in NLP but has limited applicability to RL due to the complex physical meaning and environment-specific information contained within RL prompts. These factors require supervised learning to imitate the demonstrations and may result i…

    Submitted 16 May, 2023; originally announced May 2023.

    Comments: 18 pages

  46. arXiv:2305.08098  [pdf, other]

    cs.DM cs.CV math.NA

    A Theory of General Difference in Continuous and Discrete Domain

    Authors: Linmi Tao, Ruiyang Liu, Donglai Tao, Wu Xia, Feilong Ma, Yu Cheng, Jingmao Cui

    Abstract: Though a core element of the digital age, numerical difference algorithms struggle with noise susceptibility. This stems from a key disconnect between the infinitesimal quantities in continuous differentiation and the finite intervals in its discrete counterpart. This disconnect violates the fundamental definition of differentiation (Leibniz and Cauchy). To bridge this gap, we build a novel genera…

    Submitted 25 January, 2024; v1 submitted 14 May, 2023; originally announced May 2023.

  47. arXiv:2305.05992  [pdf, other]

    cs.CV

    MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis

    Authors: Jianbin Zheng, Daqing Liu, Chaoyue Wang, Minghui Hu, Zuopeng Yang, Changxing Ding, Dacheng Tao

    Abstract: Existing multimodal conditional image synthesis (MCIS) methods generate images conditioned on any combinations of various modalities that require all of them must be exactly conformed, hindering the synthesis controllability and leaving the potential of cross-modality under-exploited. To this end, we propose to generate images conditioned on the compositions of multimodal control signals, where mo…

    Submitted 10 May, 2023; originally announced May 2023.

  48. Coronal Heating as Determined by the Solar Flare Frequency Distribution Obtained by Aggregating Case Studies

    Authors: James Paul Mason, Alexandra Werth, Colin G. West, Allison A. Youngblood, Donald L. Woodraska, Courtney Peck, Kevin Lacjak, Florian G. Frick, Moutamen Gabir, Reema A. Alsinan, Thomas Jacobsen, Mohammad Alrubaie, Kayla M. Chizmar, Benjamin P. Lau, Lizbeth Montoya Dominguez, David Price, Dylan R. Butler, Connor J. Biron, Nikita Feoktistov, Kai Dewey, N. E. Loomis, Michal Bodzianowski, Connor Kuybus, Henry Dietrick, Aubrey M. Wolfe , et al. (977 additional authors not shown)

    Abstract: Flare frequency distributions represent a key approach to addressing one of the largest problems in solar and stellar physics: determining the mechanism that counter-intuitively heats coronae to temperatures that are orders of magnitude hotter than the corresponding photospheres. It is widely accepted that the magnetic field is responsible for the heating, but there are two competing mechanisms th…

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: 1,002 authors, 14 pages, 4 figures, 3 tables, published by The Astrophysical Journal on 2023-05-09, volume 948, page 71

  49. arXiv:2305.02034  [pdf, other]

    cs.CV

    SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model

    Authors: Di Wang, Jing Zhang, Bo Du, Minqiang Xu, Lin Liu, Dacheng Tao, Liangpei Zhang

    Abstract: The success of the Segment Anything Model (SAM) demonstrates the significance of data-centric machine learning. However, due to the difficulties and high costs associated with annotating Remote Sensing (RS) images, a large amount of valuable RS data remains unlabeled, particularly at the pixel level. In this study, we leverage SAM and existing RS object detection datasets to develop an efficient p…

    Submitted 12 October, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: Accepted by NeurIPS 2023 Datasets and Benchmarks Track

  50. arXiv:2305.01899  [pdf, other]

    cs.AI cs.CY eess.IV

    Revolutionizing Agrifood Systems with Artificial Intelligence: A Survey

    Authors: Tao Chen, Liang Lv, Di Wang, Jing Zhang, Yue Yang, Zeyang Zhao, Chen Wang, Xiaowei Guo, Hao Chen, Qingye Wang, Yufei Xu, Qiming Zhang, Bo Du, Liangpei Zhang, Dacheng Tao

    Abstract: With the world population rapidly increasing, transforming our agrifood systems to be more productive, efficient, safe, and sustainable is crucial to mitigate potential food shortages. Recently, artificial intelligence (AI) techniques such as deep learning (DL) have demonstrated their strong abilities in various areas, including language, vision, remote sensing (RS), and agrifood systems applicati…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: Submitted to ACM