Iterative Data Augmentation with Large Language Models for Aspect-based Sentiment Analysis

Authors: Haiyun Li, Qihuang Zhong, Ke Zhu, Juhua Liu, Bo Du, Dacheng Tao

Abstract: Aspect-based Sentiment Analysis (ABSA) is an important sentiment analysis task, which aims to determine the sentiment polarity towards an aspect in a sentence. Due to the expensive and limited labeled data, data augmentation (DA) has become the standard for improving the performance of ABSA. However, current DA methods usually have some shortcomings: 1) poor fluency and coherence, 2) lack of diversity of generated data, and 3) reliance on some existing labeled data, hindering its applications in real-world scenarios. In response to these problems, we propose a systematic Iterative Data augmentation framework, namely IterD, to boost the performance of ABSA. The core of IterD is to leverage the powerful ability of large language models (LLMs) to iteratively generate more fluent and diverse synthetic labeled data, starting from an unsupervised sentence corpus. Extensive experiments on 4 widely-used ABSA benchmarks show that IterD brings consistent and significant performance gains among 5 baseline ABSA models. More encouragingly, the synthetic data generated by IterD can achieve comparable or even better performance against the manually annotated data.

Submitted 29 June, 2024; originally announced July 2024.

Comments: Work in process

arXiv:2404.14963 [pdf, other]

Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems

Authors: Qihuang Zhong, Kang Wang, Ziyang Xu, Juhua Liu, Liang Ding, Bo Du, Dacheng Tao

Abstract: Chain-of-Thought (CoT) prompting has enhanced the performance of Large Language Models (LLMs) across various reasoning tasks. However, CoT still falls short in dealing with complex math word problems, as it usually suffers from three pitfalls: semantic misunderstanding errors, calculation errors and step-missing errors. Prior studies involve addressing the calculation errors and step-missing errors, but neglect the semantic misunderstanding errors, which is the major factor limiting the LLMs' performance. To this end, we propose a simple-yet-effective method, namely Deeply Understanding the Problems (DUP), to improve the LLMs' math problem-solving ability by addressing semantic misunderstanding errors. The core of our method is to encourage the LLMs to deeply understand the problems and extract the key problem-solving information used for better reasoning. Extensive experiments on 10 diverse reasoning benchmarks show that our DUP method consistently outperforms the other counterparts by a large margin. More encouragingly, DUP achieves a new SOTA result on the GSM8K benchmark, with an accuracy of 97.1% under zero-shot setting.

Submitted 29 May, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

Comments: Work in progress

arXiv:2403.07673 [pdf, other]

Towards Model Extraction Attacks in GAN-Based Image Translation via Domain Shift Mitigation

Authors: Di Mi, Yanjun Zhang, Leo Yu Zhang, Shengshan Hu, Qi Zhong, Haizhuan Yuan, Shirui Pan

Abstract: Model extraction attacks (MEAs) enable an attacker to replicate the functionality of a victim deep neural network (DNN) model by only querying its API service remotely, posing a severe threat to the security and integrity of pay-per-query DNN-based services. Although the majority of current research on MEAs has primarily concentrated on neural classifiers, there is a growing prevalence of image-to-image translation (I2IT) tasks in our everyday activities. However, techniques developed for MEA of DNN classifiers cannot be directly transferred to the case of I2IT, rendering the vulnerability of I2IT models to MEA attacks often underestimated. This paper unveils the threat of MEA in I2IT tasks from a new perspective. Diverging from the traditional approach of bridging the distribution gap between attacker queries and victim training samples, we opt to mitigate the effect caused by the different distributions, known as the domain shift. This is achieved by introducing a new regularization term that penalizes high-frequency noise, and seeking a flatter minimum to avoid overfitting to the shifted distribution. Extensive experiments on different image translation tasks, including image super-resolution and style transfer, are performed on different backbone victim models, and the new design consistently outperforms the baseline by a large margin across all metrics. A few real-life I2IT APIs are also verified to be extremely vulnerable to our attack, emphasizing the need for enhanced defenses and potentially revised API publishing policies.

Submitted 19 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: Accepted by AAAI 2024

arXiv:2402.16072 [pdf]

Demonstration of 3 V Programmable Josephson Junction Arrays Using Non-Integer-Multiple Logic

Authors: Wenhui Cao, Erkun Yang, Jinjin Li, Huan Qiao, Yuan Zhong, Qing Zhong, Da Xu, Xueshen Wang, Xiaolong Xu, Shijian Wang, Jian Chen

Abstract: This article demonstrates a new kind of programmable logic for the representation of an integer that can be used for the programmable Josephson voltage standard. It can enable the numbers of junctions in most bits to be variable integer values, which is different from normal binary logic or ternary logic. Consequently, missing junctions due to superconducting short circuits can be tolerated under this logic. This logic can also have nearly the same segmentation efficiency as ternary logic. The completeness of the sequences using this logic is proven by the recursive method in mathematics in this paper. After that, a new algorithm for the representation of integers is presented according to the proven process, and an analysis of the number of fault-tolerant junctions for each bit is provided. Although the first and second bits are not tolerant to missing junctions, bits beyond these can tolerate one to hundreds of missing junctions. Due to the non-fixed multiples between the bits of the sequence, this logic is called non-integer-multiple logic. Finally, the design and fabrication of a 3 V programmable Josephson junction array using this logic are described, and the measurements and analysis of the characteristic parameters are presented.

Submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.11890 [pdf, other]

Revisiting Knowledge Distillation for Autoregressive Language Models

Authors: Qihuang Zhong, Liang Ding, Li Shen, Juhua Liu, Bo Du, Dacheng Tao

Abstract: Knowledge distillation (KD) is a common approach to compress a teacher model to reduce its inference cost and memory footprint, by training a smaller student model. However, in the context of autoregressive language models (LMs), we empirically find that larger teacher LMs might dramatically result in a poorer student. In response to this problem, we conduct a series of analyses and reveal that different tokens have different teaching modes, neglecting which will lead to performance degradation. Motivated by this, we propose a simple yet effective adaptive teaching approach (ATKD) to improve the KD. The core of ATKD is to reduce rote learning and make teaching more diverse and flexible. Extensive experiments on 8 LM tasks show that, with the help of ATKD, various baseline KD methods can achieve consistent and significant performance gains (up to +3.04% average score) across all model types and sizes. More encouragingly, ATKD can improve the student model generalization effectively.

Submitted 16 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: Accepted to ACL2024 Main Conference
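
For readers unfamiliar with the baseline that the abstract above builds on, the following is a minimal sketch of the conventional temperature-softened distillation loss (KL divergence between teacher and student distributions). It is not ATKD, whose token-adaptive weighting the abstract does not spell out; the function names, the temperature of 2.0, and the toy logits are assumptions chosen only for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled, numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student))
    return temperature ** 2 * kl

# Toy next-token logits over a three-token vocabulary (made-up values).
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]   # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]     # prefers a different token

loss_close = kd_loss(close_student, teacher)
loss_far = kd_loss(far_student, teacher)
print(f"close: {loss_close:.4f}, far: {loss_far:.4f}")
```

The loss is near zero when the student already matches the teacher and grows as the two distributions diverge, which is the signal a student is trained to minimize.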

arXiv:2402.11889 [pdf, other]

ROSE Doesn't Do That: Boosting the Safety of Instruction-Tuned Large Language Models with Reverse Prompt Contrastive Decoding

Authors: Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao

Abstract: With the development of instruction-tuned large language models (LLMs), improving the safety of LLMs has become more critical. However, the current approaches for aligning the LLMs output with expected safety usually require substantial training efforts, e.g., high-quality safety data and expensive computational resources, which are costly and inefficient. To this end, we present reverse prompt contrastive decoding (ROSE), a simple-yet-effective method to directly boost the safety of existing instruction-tuned LLMs without any additional training. The principle of ROSE is to improve the probability of desired safe output via suppressing the undesired output induced by the carefully-designed reverse prompts. Experiments on 6 safety and 2 general-purpose tasks show that, our ROSE not only brings consistent and significant safety improvements (up to +13.8% safety score) upon 5 types of instruction-tuned LLMs, but also benefits the general-purpose ability of LLMs. In-depth analyses explore the underlying mechanism of ROSE, and reveal when and where to use it.

Submitted 16 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: Accepted to ACL2024 Findings
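
As a rough illustration of the contrastive idea in the abstract above, the sketch below subtracts a scaled copy of the next-token logits obtained under a reverse prompt from those obtained under the normal prompt, then renormalizes. The abstract does not give the exact scoring rule, so the subtraction form, the function name `contrastive_next_token_probs`, the weight `alpha`, and the toy logits are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def contrastive_next_token_probs(pos_logits, neg_logits, alpha=0.5):
    """Hypothetical rule: normal-prompt logits minus alpha * reverse-prompt logits."""
    if len(pos_logits) != len(neg_logits):
        raise ValueError("logit vectors must cover the same vocabulary")
    adjusted = [p - alpha * n for p, n in zip(pos_logits, neg_logits)]
    return softmax(adjusted)

# Toy vocabulary of three tokens; token 2 is strongly favored by the reverse prompt.
pos = [2.0, 1.0, 1.5]   # logits under the normal prompt (made-up values)
neg = [0.0, 0.0, 3.0]   # logits under the reverse prompt (made-up values)

baseline = softmax(pos)
probs = contrastive_next_token_probs(pos, neg, alpha=0.5)
print([round(p, 3) for p in baseline])
print([round(p, 3) for p in probs])
```

In this toy case the token that the reverse prompt promotes loses probability mass after the adjustment, while no model weights are changed, which matches the training-free framing of the abstract.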

arXiv:2401.11117 [pdf]

A Finger on the Pulse of Cardiovascular Health: Smartphone Photoplethysmography-Based Pulse Waveform Analysis for Blood Pressure Measurement

Authors: Ivan Liu, Fangyuan Liu, Qi Zhong, Shiguang Ni

Abstract: Routine blood pressure (BP) monitoring, crucial for health assessment, faces challenges such as limited access to medical-grade equipment and expertise. Portable cuff BP devices, on the other hand, are cumbersome to carry all day and often cost-prohibitive in less developed countries. Besides, these sphygmomanometer-based devices can cause discomfort and disrupt blood flow during measurement. This study explores the use of smartphones for continuous BP monitoring, focusing on overcoming the trust barriers associated with the opacity of machine learning models in predicting BP from low-quality PPG signals. Our approach included developing models based on cardiovascular literature, using simple statistical methods to estimate BP from smartphone PPG signals with comprehensive data pre-processing, applying SHAP for enhanced interpretability and feature identification, and comparing our methods against standard references using Bland-Altman analysis. Validated with data from 125 participants, the study demonstrated significant correlations in waveform features between smartphone and reference BP monitoring devices. The cross-validation of linear regression [MAE=9.86 and 8.01 mmHg for systolic blood pressure (SBP) and diastolic blood pressure (DBP), respectively] and random forest model (MAE=8.91 and 6.68 mmHg for SBP and DBP) using waveform-only variables demonstrated the feasibility of using a smartphone to estimate BP.
Although SHAP analysis identified key feature sets, Bland-Altman results did not fully meet established thresholds (84.64% and 94.69% of MAE<15 mmHg for SBP and DBP, respectively). The study suggests the potential of smartphone cameras to enhance the accuracy and interpretability of machine learning models for daily BP estimation, but also indicates that smartphone PPG-based BP prediction is not yet a replacement for traditional medical devices.

Submitted 20 January, 2024; originally announced January 2024.

Comments: 33 pages, 9 figures
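
The Bland-Altman comparison mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the standard bias and 95% limits-of-agreement computation plus a mean absolute error, not the authors' code; the function names and the blood pressure readings are made up for the example.

```python
import statistics

def bland_altman(measured, reference):
    """Return (bias, lower limit, upper limit) of agreement for paired readings."""
    if len(measured) != len(reference):
        raise ValueError("paired readings must have equal length")
    diffs = [m - r for m, r in zip(measured, reference)]
    bias = statistics.mean(diffs)          # mean difference between methods
    sd = statistics.stdev(diffs)           # sample standard deviation of differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def mean_absolute_error(measured, reference):
    """Average absolute difference between paired readings."""
    return statistics.mean(abs(m - r) for m, r in zip(measured, reference))

# Hypothetical systolic readings in mmHg: smartphone estimate vs. cuff reference.
phone_sbp = [120, 130, 125, 140]
cuff_sbp = [118, 133, 121, 141]

bias, lower, upper = bland_altman(phone_sbp, cuff_sbp)
mae = mean_absolute_error(phone_sbp, cuff_sbp)
print(f"bias={bias:.2f} mmHg, limits=({lower:.2f}, {upper:.2f}), MAE={mae:.2f}")
```

A narrow interval around a bias near zero indicates that the two methods agree closely; a wide interval signals that individual readings can differ substantially even when the average error looks small.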

arXiv:2401.09145 [pdf]

Your blush gives you away: detecting hidden mental states with remote photoplethysmography and thermal imaging

Authors: Ivan Liu, Fangyuan Liu, Qi Zhong, Fei Ma, Shiguang Ni

Abstract: Multimodal emotion recognition techniques are increasingly essential for assessing mental states. Image-based methods, however, tend to focus predominantly on overt visual cues and often overlook subtler mental state changes. Psychophysiological research has demonstrated that HR and skin temperature are effective in detecting ANS activities, thereby revealing these subtle changes. However, traditional HR tools are generally more costly and less portable, while skin temperature analysis usually necessitates extensive manual processing. Advances in remote-PPG and automatic thermal ROI detection algorithms have been developed to address these issues, yet their accuracy in practical applications remains limited. This study aims to bridge this gap by integrating r-PPG with thermal imaging to enhance prediction performance. Ninety participants completed a 20-minute questionnaire to induce cognitive stress, followed by watching a film aimed at eliciting moral elevation. The results demonstrate that the combination of r-PPG and thermal imaging effectively detects emotional shifts. Using r-PPG alone, the prediction accuracy was 77% for cognitive stress and 61% for moral elevation, as determined by SVM. Thermal imaging alone achieved 79% accuracy for cognitive stress and 78% for moral elevation, utilizing a RF algorithm. An early fusion strategy of these modalities significantly improved accuracies, achieving 87% for cognitive stress and 83% for moral elevation using RF.
Further analysis, which utilized statistical metrics and explainable machine learning methods including SHAP, highlighted key features and clarified the relationship between cardiac responses and facial temperature variations. Notably, it was observed that cardiovascular features derived from r-PPG models had a more pronounced influence in data fusion, despite thermal imaging's higher predictive accuracy in unimodal analysis.

Submitted 17 January, 2024; originally announced January 2024.

Comments: 28 pages, 6 figures

arXiv:2311.16831 [pdf, other]

Tracking a Year of Polarized Twitter Discourse on Abortion

Authors: Ashwin Rao, Rong-Ching Chang, Qiankun Zhong, Kristina Lerman, Magdalena Wojcieszak

Abstract: Abortion is one of the most contentious issues in American politics. The Dobbs v. Jackson Women's Health Organization ruling in 2022 shifted the authority to regulate abortion from the federal government to the states, triggering intense protests and emotional debates across the nation. Yet, little is known about how online discourse about abortion rights fluctuated on social media platforms. This study analyzes a corpus of over 57M abortion-related tweets from January 2022 to January 2023 to show how emotions, hateful rhetoric, toxic speech, use of obscenities and insults, and also framing strategies fluctuated over the span of one year among liberal and conservative users. We offer three key findings. (1) Fluctuations in emotions were temporary; key events during the analyzed period did not bring about lasting shifts in expressed emotions. (2) We observe significant ideological differences in the use of hate speech: conservatives resorted to hateful rhetoric more than liberals. Yet, liberals were especially likely to use obscenities and insults, especially on the days the ruling was leaked and after the Dobbs decision. In turn, toxic language sharply increased among both groups following the leak and after the SCOTUS ruling. (3) Conservatives employ religious and fetal personhood frames, while liberals emphasize women's health and bodily autonomy, with each group reacting negatively to the other group's frames.
Our results offer an in-depth insight into the dynamics of online discourse on one of the most contentious issues in contemporary America.

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2310.13315 [pdf, other]

Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models

Authors: Miaoxi Zhu, Qihuang Zhong, Li Shen, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao

Abstract: Quantization is a promising approach for reducing memory overhead and accelerating inference, especially in large pre-trained language model (PLM) scenarios. However, the lack of access to original training data due to security and privacy concerns has created a demand for zero-shot quantization. Most of the cutting-edge zero-shot quantization methods primarily 1) apply to computer vision tasks, and 2) neglect the overfitting problem in the generative adversarial learning process, leading to sub-optimal performance. Motivated by this, we propose a novel zero-shot sharpness-aware quantization (ZSAQ) framework for the zero-shot quantization of various PLMs. The key algorithm in solving ZSAQ is the SAM-SGA optimization, which aims to improve the quantization accuracy and model generalization via optimizing a minimax problem. We theoretically prove the convergence rate for the minimax optimization problem and this result can be applied to other nonconvex-PL minimax optimization frameworks. Extensive experiments on 11 tasks demonstrate that our method brings consistent and significant performance gains on both discriminative and generative PLMs, i.e., up to +6.98 average score. Furthermore, we empirically validate that our method can effectively improve the model generalization.

Submitted 20 October, 2023; originally announced October 2023.

Comments: Accepted to EMNLP2023 (Main). Miaoxi Zhu and Qihuang Zhong contribute equally to this work

arXiv:2310.01753 [pdf, other]

CausalTime: Realistically Generated Time-series for Benchmarking of Causal Discovery

Authors: Yuxiao Cheng, Ziqian Wang, Tingxiong Xiao, Qin Zhong, Jinli Suo, Kunlun He

Abstract: Time-series causal discovery (TSCD) is a fundamental problem of machine learning. However, existing synthetic datasets cannot properly evaluate or predict the algorithms' performance on real data. This study introduces the CausalTime pipeline to generate time-series that highly resemble the real data and with ground truth causal graphs for quantitative performance evaluation. The pipeline starts from real observations in a specific scenario and produces a matching benchmark dataset. Firstly, we harness deep neural networks along with normalizing flow to accurately capture realistic dynamics. Secondly, we extract hypothesized causal graphs by performing importance analysis on the neural network or leveraging prior knowledge. Thirdly, we derive the ground truth causal graphs by splitting the causal model into causal term, residual term, and noise term. Lastly, using the fitted network and the derived causal graph, we generate corresponding versatile time-series proper for algorithm assessment. In the experiments, we validate the fidelity of the generated data through qualitative and quantitative experiments, followed by a benchmarking of existing TSCD algorithms using these generated datasets. CausalTime offers a feasible solution to evaluating TSCD algorithms in real applications and can be generalized to a wide range of fields. For easy use of the proposed approach, we also provide a user-friendly website, hosted on www.causaltime.cc.

Submitted 2 October, 2023; originally announced October 2023.

arXiv:2309.08096 [pdf, other]

GelSplitter: Tactile Reconstruction from Near Infrared and Visible Images

Authors: Yuankai Lin, Yulin Zhou, Kaiji Huang, Qi Zhong, Tao Cheng, Hua Yang, Zhouping Yin

Abstract: The GelSight-like visual tactile (VT) sensor has gained popularity as a high-resolution tactile sensing technology for robots, capable of measuring touch geometry using a single RGB camera. However, the development of multi-modal perception for VT sensors remains a challenge, limited by the mono camera. In this paper, we propose the GelSplitter, a new framework that approaches the multi-modal VT sensor with synchronized multi-modal cameras and resembles a more human-like tactile receptor. Furthermore, we focus on 3D tactile reconstruction and implement a compact sensor structure that maintains a comparable size to state-of-the-art VT sensors, even with the addition of a prism and a near infrared (NIR) camera. We also design a photometric fusion stereo neural network (PFSNN), which estimates surface normals of objects and reconstructs touch geometry from both infrared and visible images. Our results demonstrate that the accuracy of RGB and NIR fusion is higher than that of RGB images alone. Additionally, our GelSplitter framework allows for a flexible configuration of different camera sensor combinations, such as RGB and thermal imaging.

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2307.12616 [pdf, other]

CTVIS: Consistent Training for Online Video Instance Segmentation

Authors: Kaining Ying, Qing Zhong, Weian Mao, Zhenhua Wang, Hao Chen, Lin Yuanbo Wu, Yifan Liu, Chengxiang Fan, Yunzhi Zhuge, Chunhua Shen

Abstract: The discrimination of instance embeddings plays a vital role in associating instances across time for online video instance segmentation (VIS). Instance embedding learning is directly supervised by the contrastive loss computed upon the contrastive items (CIs), which are sets of anchor/positive/negative embeddings. Recent online VIS methods leverage CIs sourced from one reference frame only, which we argue is insufficient for learning highly discriminative embeddings. Intuitively, a possible strategy to enhance CIs is replicating the inference phase during training. To this end, we propose a simple yet effective training strategy, called Consistent Training for Online VIS (CTVIS), which is devoted to aligning the training and inference pipelines in terms of building CIs. Specifically, CTVIS constructs CIs by referring to the momentum-averaged embedding and memory bank storage mechanisms used at inference, and by adding noise to the relevant embeddings. Such an extension allows a reliable comparison between embeddings of current instances and the stable representations of historical instances, thereby conferring an advantage in modeling VIS challenges such as occlusion, re-identification, and deformation. Empirically, CTVIS outstrips the SOTA VIS models by up to +5.0 points on three VIS benchmarks, including YTVIS19 (55.1% AP), YTVIS21 (50.1% AP) and OVIS (35.5% AP). Furthermore, we find that pseudo-videos transformed from images can train robust models surpassing fully-supervised ones.

Submitted 24 July, 2023; originally announced July 2023.

Comments: Accepted by ICCV 2023. The code is available at https://github.com/KainingYing/CTVIS

arXiv:2305.15275 [pdf, other]

Self-Evolution Learning for Discriminative Language Model Pretraining

Authors: Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao

Abstract: Masked language modeling, widely used in discriminative language model (e.g., BERT) pretraining, commonly adopts a random masking strategy. However, random masking does not consider the importance of the different words in the sentence meaning, where some of them are more worthy to be predicted. Therefore, various masking strategies (e.g., entity-level masking) are proposed, but most of them require expensive prior knowledge and generally train from scratch without reusing existing model weights. In this paper, we present Self-Evolution learning (SE), a simple and effective token masking and learning method to fully and wisely exploit the knowledge from data. SE focuses on learning the informative yet under-explored tokens and adaptively regularizes the training by introducing a novel Token-specific Label Smoothing approach. Experiments on 10 tasks show that our SE brings consistent and significant improvements (+1.43~2.12 average scores) upon different PLMs. In-depth analyses demonstrate that SE improves linguistic knowledge learning and generalization.

Submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted to Findings of ACL2023

arXiv:2305.15273 [pdf, other]

Revisiting Token Dropping Strategy in Efficient BERT Pretraining

Authors: Qihuang Zhong, Liang Ding, Juhua Liu, Xuebo Liu, Min Zhang, Bo Du, Dacheng Tao

Abstract: Token dropping is a recently-proposed strategy to speed up the pretraining of masked language models, such as BERT, by skipping the computation of a subset of the input tokens at several middle layers. It can effectively reduce the training time without degrading much performance on downstream tasks. However, we empirically find that token dropping is prone to a semantic loss problem and falls short in handling semantic-intense tasks. Motivated by this, we propose a simple yet effective semantic-consistent learning method (ScTD) to improve the token dropping. ScTD aims to encourage the model to learn how to preserve the semantic information in the representation space. Extensive experiments on 12 tasks show that, with the help of our ScTD, token dropping can achieve consistent and significant performance gains across all task types and model sizes. More encouragingly, ScTD saves up to 57% of pretraining time and brings up to +1.56% average improvement over the vanilla token dropping.

Submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted to ACL2023 Main Conference

arXiv:2305.13547 [pdf, other]

Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks

Authors: Haoqi Zheng, Qihuang Zhong, Liang Ding, Zhiliang Tian, Xin Niu, Dongsheng Li, Dacheng Tao

Abstract: Text classification tasks often encounter few-shot scenarios with limited labeled data, and addressing data scarcity is crucial. Data augmentation with mixup has been shown to be effective on various text classification tasks. However, most of the mixup methods do not consider the varying degree of learning difficulty in different stages of training and generate new samples with one-hot labels, resulting in model overconfidence. In this paper, we propose a self-evolution learning (SE) based mixup approach for data augmentation in text classification, which can generate more adaptive and model-friendly pseudo samples for the model training. SE focuses on the variation of the model's learning ability. To alleviate the model's overconfidence, we introduce a novel instance-specific label smoothing approach, which linearly interpolates the model's output and one-hot labels of the original samples to generate new soft labels for mixing up. Through experimental analysis, in addition to improving classification accuracy, we demonstrate that SE also enhances the model's generalization ability.

Submitted 27 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.
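
The instance-specific label smoothing described in the abstract above (a linear interpolation between the model's output and the one-hot label, followed by label mixing) can be illustrated with a minimal sketch. The function names, the interpolation weight `alpha`, and the mixup coefficient `lam` are hypothetical choices made for illustration; the abstract does not specify how the authors set or schedule them.

```python
def one_hot(label, num_classes):
    """Return a one-hot vector for an integer class label."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def instance_soft_label(model_probs, label, alpha=0.9):
    """Interpolate the one-hot label with the model's predicted distribution.

    alpha is an assumed weight on the one-hot label (not taken from the paper).
    """
    hard = one_hot(label, len(model_probs))
    return [alpha * h + (1.0 - alpha) * p for h, p in zip(hard, model_probs)]


def mix_labels(soft_a, soft_b, lam):
    """Standard mixup of two label vectors with coefficient lam in [0, 1]."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(soft_a, soft_b)]


# Two samples with their model predictions and gold labels.
soft_a = instance_soft_label([0.5, 0.5], label=0, alpha=0.5)
soft_b = instance_soft_label([0.5, 0.5], label=1, alpha=0.5)
mixed = mix_labels(soft_a, soft_b, lam=0.5)
```

Because both inputs to each interpolation are probability distributions, the resulting soft labels still sum to one, which is why they can be used directly as targets for a cross-entropy loss.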

arXiv:2305.05890 [pdf, other]

CUTS+: High-dimensional Causal Discovery from Irregular Time-series

Authors: Yuxiao Cheng, Lianglong Li, Tingxiong Xiao, Zongren Li, Qin Zhong, Jinli Suo, Kunlun He

Abstract: Causal discovery in time-series is a fundamental problem in the machine learning community, enabling causal reasoning and decision-making in complex scenarios. Recently, researchers have successfully discovered causality by combining neural networks with Granger causality, but the performance of these methods degrades substantially on high-dimensional data because of the highly redundant network design and huge causal graphs. Moreover, the missing entries in the observations further hamper the causal structural learning. To overcome these limitations, we propose CUTS+, which is built on the Granger-causality-based causal discovery method CUTS and improves scalability by introducing a technique called Coarse-to-fine-discovery (C2FD) and leveraging a message-passing-based graph neural network (MPGNN). Compared to previous methods on simulated, quasi-real, and real datasets, we show that CUTS+ largely improves the causal discovery performance on high-dimensional data with different types of irregular sampling.

Submitted 16 August, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

Comments: Submitted to AAAI-24

arXiv:2304.08767 [pdf, other]

Masked Language Model Based Textual Adversarial Example Detection

Authors: Xiaomei Zhang, Zhaoxi Zhang, Qi Zhong, Xufei Zheng, Yanjun Zhang, Shengshan Hu, Leo Yu Zhang

Abstract: Adversarial attacks are a serious threat to the reliable deployment of machine learning models in safety-critical applications. They can misguide current models to predict incorrectly by slightly modifying the inputs. Recently, substantial work has shown that adversarial examples tend to deviate from the underlying data manifold of normal examples, whereas pre-trained masked language models can fit the manifold of normal NLP data. To explore how to use the masked language model in adversarial detection, we propose a novel textual adversarial example detection method, namely Masked Language Model-based Detection (MLMD), which can produce clearly distinguishable signals between normal examples and adversarial examples by exploring the changes in manifolds induced by the masked language model. MLMD features plug-and-play usage (i.e., no need to retrain the victim model) for adversarial defense, and it is agnostic to classification tasks, victim model architectures, and to-be-defended attack methods. We evaluate MLMD on various benchmark textual datasets, widely studied machine learning models, and state-of-the-art (SOTA) adversarial attacks (in total $3 \times 4 \times 4 = 48$ settings). Experimental results show that MLMD can achieve strong performance, with detection accuracy up to 0.984, 0.967, and 0.901 on AG-NEWS, IMDB, and SST-2 datasets, respectively. Additionally, MLMD is superior, or at least comparable to, the SOTA detection defenses in detection accuracy and F1 score. Among many defenses based on the off-manifold assumption of adversarial examples, this work offers a new angle for capturing the manifold change. The code for this work is openly accessible at https://github.com/mlmddetection/MLMDdetection.

Submitted 28 January, 2024; v1 submitted 18 April, 2023; originally announced April 2023.

Comments: 13 pages, 3 figures

arXiv:2304.03898 [pdf, other]

The Short Text Matching Model Enhanced with Knowledge via Contrastive Learning

Authors: Ruiqiang Liu, Qiqiang Zhong, Mengmeng Cui, Hanjie Mai, Qiang Zhang, Shaohua Xu, Xiangzheng Liu, Yanlong Du

Abstract: In recent years, short text matching tasks have been widely applied in the fields of advertising search and recommendation. The difficulty lies in the lack of semantic information and word ambiguity caused by the short length of the text. Previous works have introduced complement sentences or knowledge bases to provide additional feature information. However, these methods do not fully model the interaction between the original sentence and the complement sentence, and do not consider the noise that may arise from introducing external knowledge bases. Therefore, this paper proposes a short text matching model that combines contrastive learning and external knowledge. The model uses a generative model to generate corresponding complement sentences and uses the contrastive learning method to guide the model to obtain a more semantically meaningful encoding of the original sentence. In addition, to avoid noise, we use keywords as the main semantics of the original sentence to retrieve corresponding knowledge words in the knowledge base, and construct a knowledge graph. The graph encoding model is used to integrate the knowledge base information into the model. Our designed model achieves state-of-the-art performance on two publicly available Chinese text matching datasets, demonstrating the effectiveness of our model.

Submitted 19 December, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

Comments: 11 pages, 2 figures

arXiv:2304.02205 [pdf, other]

MoocRadar: A Fine-grained and Multi-aspect Knowledge Repository for Improving Cognitive Student Modeling in MOOCs

Authors: Jifan Yu, Mengying Lu, Qingyang Zhong, Zijun Yao, Shangqing Tu, Zhengshan Liao, Xiaoya Li, Manli Li, Lei Hou, Hai-Tao Zheng, Juanzi Li, Jie Tang

Abstract: Student modeling, the task of inferring a student's learning characteristics through their interactions with coursework, is a fundamental issue in intelligent education. Although recent attempts from knowledge tracing and cognitive diagnosis propose several promising directions for improving the usability and effectiveness of current models, the existing public datasets are still insufficient to meet the need for these potential solutions because they lack complete exercising contexts, fine-grained concepts, and cognitive labels. In this paper, we present MoocRadar, a fine-grained, multi-aspect knowledge repository consisting of 2,513 exercise questions, 5,600 knowledge concepts, and over 12 million behavioral records. Specifically, we propose a framework to guarantee a high-quality and comprehensive annotation of fine-grained concepts and cognitive labels. The statistical and experimental results indicate that our dataset provides the basis for future improvements of existing methods. Moreover, to support convenient usage for researchers, we release a set of tools for data querying, model adaptation, and even the extension of our repository, which are now available at https://github.com/THU-KEG/MOOC-Radar.

Submitted 4 April, 2023; originally announced April 2023.

Comments: Accepted by SIGIR 2023

arXiv:2303.13780 [pdf, other]

Towards Making the Most of ChatGPT for Machine Translation

Authors: Keqin Peng, Liang Ding, Qihuang Zhong, Li Shen, Xuebo Liu, Min Zhang, Yuanxin Ouyang, Dacheng Tao

Abstract: ChatGPT shows remarkable capabilities for machine translation (MT). Several prior studies have shown that it achieves comparable results to commercial systems for high-resource languages, but lags behind in complex tasks, e.g., low-resource and distant-language-pairs translation. However, they usually adopt simple prompts which cannot fully elicit the capability of ChatGPT. In this paper, we aim to further mine ChatGPT's translation ability by revisiting several aspects: temperature, task information, and domain information, and correspondingly propose an optimal temperature setting and two (simple but effective) prompts: Task-Specific Prompts (TSP) and Domain-Specific Prompts (DSP). We show that: 1) The performance of ChatGPT depends largely on temperature, and a lower temperature usually can achieve better performance; 2) Emphasizing the task information can further improve ChatGPT's performance, particularly in complex MT tasks; 3) Introducing domain information can elicit ChatGPT's generalization ability and improve its performance in the specific domain; 4) ChatGPT tends to generate hallucinations for non-English-centric MT tasks, which can be partially addressed by our proposed prompts but still needs to be highlighted for the MT/NLP community. We also explore the effects of advanced in-context learning strategies and find a (negative but interesting) observation: the powerful chain-of-thought prompt leads to word-by-word translation behavior, thus bringing significant translation degradation.

Submitted 20 October, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: EMNLP 2023 (findings)

arXiv:2303.00565 [pdf, other]

AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks

Authors: Hao Sun, Li Shen, Qihuang Zhong, Liang Ding, Shixiang Chen, Jingwei Sun, Jing Li, Guangzhong Sun, Dacheng Tao

Abstract: The sharpness-aware minimization (SAM) optimizer has been extensively explored, as it can generalize better for training deep neural networks by introducing extra perturbation steps to flatten the landscape of deep learning models. Integrating SAM with adaptive learning rate and momentum acceleration, dubbed AdaSAM, has already been explored empirically to train large-scale deep neural networks, but without theoretical guarantee, due to the triple difficulties in analyzing the coupled perturbation step, adaptive learning rate, and momentum step. In this paper, we try to analyze the convergence rate of AdaSAM in the stochastic non-convex setting. We theoretically show that AdaSAM admits a $\mathcal{O}(1/\sqrt{bT})$ convergence rate, which achieves the linear speedup property with respect to mini-batch size $b$. Specifically, to decouple the stochastic gradient steps from the adaptive learning rate and perturbed gradient, we introduce the delayed second-order momentum term to decompose them and make them independent while taking an expectation during the analysis. Then we bound them by showing the adaptive learning rate has a limited range, which makes our analysis feasible. To the best of our knowledge, we are the first to provide the non-trivial convergence rate of SAM with an adaptive learning rate and momentum acceleration. Finally, we conduct experiments on several NLP tasks, which show that AdaSAM could achieve superior performance compared with SGD, AMSGrad, and SAM optimizers.

Submitted 1 March, 2023; originally announced March 2023.

Comments: 18 pages

arXiv:2302.10198 [pdf, other]

Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT

Authors: Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao

Abstract: Recently, ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries. Several prior studies have shown that ChatGPT attains remarkable generation ability compared with existing models. However, the quantitative analysis of ChatGPT's understanding ability has been given little attention. In this report, we explore the understanding ability of ChatGPT by evaluating it on the most popular GLUE benchmark, and comparing it with 4 representative fine-tuned BERT-style models. We find that: 1) ChatGPT falls short in handling paraphrase and similarity tasks; 2) ChatGPT outperforms all BERT models on inference tasks by a large margin; 3) ChatGPT achieves performance comparable to BERT on sentiment analysis and question-answering tasks. Additionally, by combining some advanced prompting strategies, we show that the understanding ability of ChatGPT can be further improved.

Submitted 2 March, 2023; v1 submitted 19 February, 2023; originally announced February 2023.

Comments: Work in progress. Added results of advanced prompting strategies, e.g., CoT. (19 pages)

arXiv:2302.09268 [pdf, other]

Bag of Tricks for Effective Language Model Pretraining and Downstream Adaptation: A Case Study on GLUE

Authors: Qihuang Zhong, Liang Ding, Keqin Peng, Juhua Liu, Bo Du, Li Shen, Yibing Zhan, Dacheng Tao

Abstract: This technical report briefly describes our JDExplore d-team's submission Vega v1 on the General Language Understanding Evaluation (GLUE) leaderboard, where GLUE is a collection of nine natural language understanding tasks, including question answering, linguistic acceptability, sentiment analysis, text similarity, paraphrase detection, and natural language inference. [Method] We investigate several effective strategies and choose their best combination setting as the training recipes. As for model structure, we employ the vanilla Transformer with disentangled attention as the basic block encoder. For self-supervised training, we employ the representative denoising objective (i.e., replaced token detection) in phase 1 and combine the contrastive objective (i.e., sentence embedding contrastive learning) with it in phase 2. During fine-tuning, several advanced techniques such as transductive fine-tuning, self-calibrated fine-tuning, and adversarial fine-tuning are adopted. [Results] According to our submission record (Jan. 2022), with our optimized pretraining and fine-tuning strategies, our 1.3 billion model sets new state-of-the-art on 4/9 tasks, achieving the best average score of 91.3. Encouragingly, our Vega v1 is the first to exceed powerful human performance on the two challenging tasks, i.e., SST-2 and WNLI. We believe our empirically successful recipe with a bag of tricks could shed new light on developing efficient discriminative large language models.

Submitted 18 February, 2023; originally announced February 2023.

Comments: Technical report. arXiv admin note: text overlap with arXiv:2212.01853

arXiv:2302.01439 [pdf, other]

#RoeOverturned: Twitter Dataset on the Abortion Rights Controversy

Authors: Rong-Ching Chang, Ashwin Rao, Qiankun Zhong, Magdalena Wojcieszak, Kristina Lerman

Abstract: On June 24, 2022, the United States Supreme Court overturned landmark rulings made in its 1973 verdict in Roe v. Wade. The justices, by way of a majority vote in Dobbs v. Jackson Women's Health Organization, decided that abortion wasn't a constitutional right and returned the issue of abortion to the elected representatives. This decision triggered multiple protests and debates across the US, especially in the context of the midterm elections in November 2022. Given that many citizens use social media platforms to express their views and mobilize for collective action, and given that online debate has tangible effects on public opinion, political participation, news media coverage, and political decision-making, it is crucial to understand online discussions surrounding this topic. Toward this end, we present the first large-scale Twitter dataset collected on the abortion rights debate in the United States. We present a set of 74M tweets systematically collected over the course of one year from January 1, 2022 to January 6, 2023.

Submitted 2 February, 2023; originally announced February 2023.

Comments: 9 pages, 5 figures

arXiv:2212.01853 [pdf, other]

Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE

Authors: Qihuang Zhong, Liang Ding, Yibing Zhan, Yu Qiao, Yonggang Wen, Li Shen, Juhua Liu, Baosheng Yu, Bo Du, Yixin Chen, Xinbo Gao, Chunyan Miao, Xiaoou Tang, Dacheng Tao

Abstract: This technical report briefly describes our JDExplore d-team's Vega v2 submission on the SuperGLUE leaderboard. SuperGLUE is more challenging than the widely used general language understanding evaluation (GLUE) benchmark, containing eight difficult language understanding tasks, including question answering, natural language inference, word sense disambiguation, coreference resolution, and reasoning. [Method] Instead of arbitrarily increasing the size of a pretrained language model (PLM), our aim is to 1) fully extract knowledge from the input pretraining data given a certain parameter budget, e.g., 6B, and 2) effectively transfer this knowledge to downstream tasks. To achieve goal 1), we propose self-evolution learning for PLMs to wisely predict the informative tokens that should be masked, and supervise the masked language modeling (MLM) process with rectified smooth labels. For goal 2), we leverage the prompt transfer technique to improve the low-resource tasks by transferring the knowledge from the foundation model and related downstream tasks to the target task. [Results] According to our submission record (Oct. 2022), with our optimized pretraining and fine-tuning strategies, our 6B Vega method achieved new state-of-the-art performance on 4/8 tasks, sitting atop the SuperGLUE leaderboard on Oct. 8, 2022, with an average score of 91.3.

Submitted 4 December, 2022; originally announced December 2022.

Comments: Technical report

arXiv:2210.05497 [pdf, other]

Improving Sharpness-Aware Minimization with Fisher Mask for Better Generalization on Language Models

Authors: Qihuang Zhong, Liang Ding, Li Shen, Peng Mi, Juhua Liu, Bo Du, Dacheng Tao

Abstract: Fine-tuning large pretrained language models on a limited training corpus usually suffers from poor generalization. Prior works show that the recently-proposed sharpness-aware minimization (SAM) optimization method can improve the model generalization. However, SAM adds a perturbation to each model parameter equally (but not all parameters contribute equally to the optimization of training), which we argue is sub-optimal and will lead to excessive computation. In this paper, we propose a novel optimization procedure, namely FSAM, which introduces a Fisher mask to improve the efficiency and performance of SAM. In short, instead of adding perturbation to all parameters, FSAM uses the Fisher information to identify the important parameters and formulates a Fisher mask to obtain the sparse perturbation, i.e., making the optimizer focus on these important parameters. Experiments on various tasks in GLUE and SuperGLUE benchmarks show that FSAM consistently outperforms the vanilla SAM by 0.67~1.98 average score among four different pretrained models. We also empirically show that FSAM works well in other complex scenarios, e.g., fine-tuning on generation tasks or limited training data. Encouragingly, when training data is limited, FSAM improves the SAM by a large margin, i.e., up to 15.1.

Submitted 11 October, 2022; originally announced October 2022.

Comments: Accepted by EMNLP 2022 (Findings)
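
The idea of a sparse, Fisher-masked perturbation from the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Fisher information is approximated by the squared gradient of a single batch (the common empirical diagonal approximation), the mask keeps a fixed top fraction of parameters, and the perturbation is normalized by the full gradient norm as in vanilla SAM. The names `fisher_mask`, `sparse_perturbation`, `keep_ratio`, and `rho` are hypothetical, and the paper's actual estimator and masking schedule may differ.

```python
import math


def fisher_mask(grads, keep_ratio):
    """Binary mask keeping the parameters with the largest squared gradients.

    Squared gradients stand in for the diagonal empirical Fisher information
    (an assumption made for this sketch).
    """
    scores = [g * g for g in grads]
    k = max(1, int(round(keep_ratio * len(grads))))
    top = sorted(range(len(grads)), key=lambda i: scores[i], reverse=True)[:k]
    keep = set(top)
    return [1.0 if i in keep else 0.0 for i in range(len(grads))]


def sparse_perturbation(grads, mask, rho=0.05):
    """SAM-style perturbation rho * g / ||g||, zeroed outside the mask."""
    norm = math.sqrt(sum(g * g for g in grads)) or 1.0
    return [rho * g / norm * m for g, m in zip(grads, mask)]


# Flattened gradient of a toy 4-parameter model.
grads = [3.0, 0.0, -4.0, 0.0]
mask = fisher_mask(grads, keep_ratio=0.5)
eps = sparse_perturbation(grads, mask, rho=0.5)
```

In a full optimizer step, `eps` would be added to the weights, the gradient recomputed at the perturbed point, and the weights then restored before the update; only the masked parameters are perturbed, which is where the sparsity comes from.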

arXiv:2208.10160 [pdf, other]

PANDA: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation

Authors: Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao

Abstract: Prompt Transfer (PoT) is a recently-proposed approach to improve prompt-tuning, by initializing the target prompt with the existing prompt trained on similar source tasks. However, such a vanilla PoT approach usually achieves sub-optimal performance, as (i) the PoT is sensitive to the similarity of source-target pair and (ii) directly fine-tuning the prompt initialized with source prompt on target task might lead to forgetting of the useful general knowledge learned from source task. To tackle these issues, we propose a new metric to accurately predict the prompt transferability (regarding (i)), and a novel PoT approach (namely PANDA) that leverages the knowledge distillation technique to alleviate the knowledge forgetting effectively (regarding (ii)). Extensive and systematic experiments on 189 combinations of 21 source and 9 target datasets across 5 scales of PLMs demonstrate that: 1) our proposed metric works well to predict the prompt transferability; 2) our PANDA consistently outperforms the vanilla PoT approach by 2.3% average score (up to 24.1%) among all tasks and model sizes; 3) with our PANDA approach, prompt-tuning can achieve competitive and even better performance than model-tuning in various PLM scales scenarios. We have publicly released our code at https://github.com/WHU-ZQH/PANDA.

Submitted 2 April, 2024; v1 submitted 22 August, 2022; originally announced August 2022.

Comments: Accepted by IEEE TKDE

arXiv:2208.04708 [pdf, other]

Towards a General Pre-training Framework for Adaptive Learning in MOOCs

Authors: Qingyang Zhong, Jifan Yu, Zheyuan Zhang, Yiming Mao, Yuquan Wang, Yankai Lin, Lei Hou, Juanzi Li, Jie Tang

Abstract: Adaptive learning aims to stimulate and meet the needs of individual learners, which requires sophisticated system-level coordination of diverse tasks, including modeling learning resources, estimating student states, and making personalized recommendations. Existing deep learning methods have achieved great success over statistical models; however, they still lack generalization for diverse tasks and suffer from insufficient capacity since they are composed of highly-coupled task-specific architectures and rely on small-scale, coarse-grained recommendation scenarios. To realize the idea of general adaptive systems proposed in pedagogical theory, with the emerging pre-training techniques in NLP, we conduct a practical exploration of applying pre-training to adaptive learning and propose a unified framework based on data observation and learning style analysis, properly leveraging heterogeneous learning elements. Through a series of downstream tasks of Learning Recommendation, Learning Resource Evaluation, Knowledge Tracing, and Dropout Prediction, we find that course structures, text, and knowledge are helpful for modeling and inherently coherent to student non-sequential learning behaviors and that indirectly relevant information included in the pre-training foundation can be shared across downstream tasks to facilitate effectiveness. We finally build a simplified systematic application of adaptive learning and reflect on the insights brought back to pedagogy. The source code and dataset will be released.

Submitted 18 July, 2022; originally announced August 2022.

Comments: 13 pages, 8 figures

arXiv:2206.07992 [pdf, other]

Deconstructing written rules and hierarchy in peer produced software communities

Authors: Mahasweta Chakraborti, Beril Bulat, Qiankun Zhong, Anamika Sen, Seth Frey

Abstract: We employ recent advances in computational institutional analysis and NLP to investigate the systems of authority that are reflected in the written policy documents of the ASF. Our study to decipher the effective similarities or departures of the ASF model from conventional software companies reveals evidence of both flat and bureaucratic governance in a peer production set up, suggesting a complicated relationship between business-based theories of administrative hierarchy and foundational principles of the OSS movement.

Submitted 16 June, 2022; originally announced June 2022.

Comments: 9 pages

ACM Class: H.5.3

arXiv:2205.14912 [pdf, other]

doi 10.1109/TKDE.2023.3341917

E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation

Authors: Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao

Abstract: Sequence-to-sequence (seq2seq) learning is a popular fashion for large-scale pretraining language models. However, the prior seq2seq pretraining models generally focus on reconstructive objectives on the decoder side and neglect the effect of encoder-side supervision, which we argue may lead to sub-optimal performance. To verify our hypothesis, we first empirically study the functionalities of the encoder and decoder in seq2seq pretrained language models, and find that the encoder plays an important but under-exploited role compared with the decoder regarding the downstream performance and neuron activation. Therefore, we propose an encoding-enhanced seq2seq pretraining strategy, namely E2S2, which improves the seq2seq models via integrating more efficient self-supervised information into the encoders. Specifically, E2S2 adopts two self-supervised objectives on the encoder side from two aspects: 1) locally denoising the corrupted sentence (denoising objective); and 2) globally learning better sentence representations (contrastive objective). With the help of both objectives, the encoder can effectively distinguish the noise tokens and capture high-level (i.e., syntactic and semantic) knowledge, thus strengthening the ability of the seq2seq model to accurately achieve conditional generation. On a large diversity of downstream natural language understanding and generation tasks, E2S2 consistently improves the performance of its powerful backbone models, e.g., BART and T5. For example, upon the BART backbone, we achieve +1.1% averaged gain on the general language understanding evaluation (GLUE) benchmark and +1.75% $F_{0.5}$ score improvement on the CoNLL2014 dataset. We also provide in-depth analyses to show the improvement stems from better linguistic representation. We hope that our work will foster future self-supervision research on seq2seq language model pretraining.

Submitted 9 January, 2024; v1 submitted 30 May, 2022; originally announced May 2022.

Comments: Accepted by IEEE TKDE 2023

arXiv:2205.11126 [pdf, other]

KRNet: Towards Efficient Knowledge Replay

Authors: Yingying Zhang, Qiaoyong Zhong, Di Xie, Shiliang Pu

Abstract: The knowledge replay technique has been widely used in many tasks such as continual learning and continuous domain adaptation. The key lies in how to effectively encode the knowledge extracted from previous data and replay it during the current training procedure. A simple yet effective model to achieve knowledge replay is the autoencoder. However, the number of stored latent codes in an autoencoder increases linearly with the scale of data, and the trained encoder is redundant for the replaying stage. In this paper, we propose a novel and efficient knowledge recording network (KRNet) which directly maps an arbitrary sample identity number to the corresponding datum. Compared with an autoencoder, our KRNet requires significantly ($400\times$) less storage cost for the latent codes and can be trained without the encoder sub-network. Extensive experiments validate the efficiency of KRNet, and as a showcase, it is successfully applied in the task of continual learning.

Submitted 23 May, 2022; originally announced May 2022.

Comments: Accepted by ICPR 2022

arXiv:2205.11071 [pdf, other]

Self-distilled Knowledge Delegator for Exemplar-free Class Incremental Learning

Authors: Fanfan Ye, Liang Ma, Qiaoyong Zhong, Di Xie, Shiliang Pu

Abstract: Exemplar-free incremental learning is extremely challenging due to inaccessibility of data from old tasks. In this paper, we attempt to exploit the knowledge encoded in a previously trained classification model to handle the catastrophic forgetting problem in continual learning. Specifically, we introduce a so-called knowledge delegator, which is capable of transferring knowledge from the trained model to a randomly re-initialized new model by generating informative samples. Given the previous model only, the delegator is effectively learned using a self-distillation mechanism in a data-free manner. The knowledge extracted by the delegator is then utilized to maintain the performance of the model on old tasks in incremental learning. This simple incremental learning framework surpasses existing exemplar-free methods by a large margin on four widely used class incremental benchmarks, namely CIFAR-100, ImageNet-Subset, Caltech-101 and Flowers-102. Notably, we achieve comparable performance to some exemplar-based methods without accessing any exemplars.

Submitted 23 May, 2022; originally announced May 2022.

Comments: Accepted by IJCNN 2022

arXiv:2204.12521 [pdf]

doi 10.3390/e24091185

Quantifying the selective, stochastic, and complementary drivers of the institutional evolution in online communities

Authors: Qiankun Zhong, Seth Frey, Martin Hilbert

Abstract: Institutions and cultures evolve adaptively in response to the current environmental incentives, usually. But sometimes institutional change is due to stochastic drives beyond current fitness, including drift, path dependency, blind imitation, and complementary cooperation in fluctuating environments. Disentangling the selective and stochastic components of social system change enables us to identify the key features to organizational development in the long run. Evolutionary approaches provide organizational science abundant theories to demonstrate organizational evolution by tracking particular beneficial or harmful features. We measure these different drivers empirically in institutional evolution among 20,000 Minecraft communities with the help of two of the most applied evolutionary models, the Price equation and the bet-hedging model. As a result, we find strong selection pressure on administrative rules and information rules, suggesting that their positive correlation with community fitness is the main reason for their frequency change. We also find that stochastic drives decrease the average frequency of administrative rules. The result makes sense when explained in light of evolutionary bet-hedging. We show through the bet-hedging result that institutional diversity contributes to the growth and stability of rules related to information, communication, and economic behaviors.

Submitted 21 August, 2022; v1 submitted 26 April, 2022; originally announced April 2022.

Comments: 34 pages, 5 figures

arXiv:2204.07832 [pdf, other]

A Contrastive Cross-Channel Data Augmentation Framework for Aspect-based Sentiment Analysis

Authors: Bing Wang, Liang Ding, Qihuang Zhong, Ximing Li, Dacheng Tao

Abstract: Aspect-based sentiment analysis (ABSA) is a fine-grained sentiment analysis task, which focuses on detecting the sentiment polarity towards the aspect in a sentence. However, it is always sensitive to the multi-aspect challenge, where features of multiple aspects in a sentence will affect each other. To mitigate this issue, we design a novel training framework, called Contrastive Cross-Channel Data Augmentation (C3DA), which leverages an in-domain generator to construct more multi-aspect samples and then boosts the robustness of ABSA models via contrastive learning on these generated data. In practice, given a generative pretrained language model and some limited ABSA labeled data, we first employ some parameter-efficient approaches to perform the in-domain fine-tuning. Then, the obtained in-domain generator is used to generate the synthetic sentences from two channels, i.e., Aspect Augmentation Channel and Polarity Augmentation Channel, which generate the sentence conditioned on a given aspect and polarity respectively. Specifically, our C3DA performs the sentence generation in a cross-channel manner to obtain more sentences, and proposes an Entropy-Minimization Filter to filter low-quality generated samples. Extensive experiments show that our C3DA can outperform those baselines without any augmentations by about 1% on accuracy and Macro-F1. Code and data are released in https://github.com/wangbing1416/C3DA.

Submitted 7 September, 2022; v1 submitted 16 April, 2022; originally announced April 2022.

Comments: COLING 2022

arXiv:2204.01934 [pdf, other]

Attention Distraction: Watermark Removal Through Continual Learning with Selective Forgetting

Authors: Qi Zhong, Leo Yu Zhang, Shengshan Hu, Longxiang Gao, Jun Zhang, Yong Xiang

Abstract: Fine-tuning attacks are effective in removing the embedded watermarks in deep learning models. However, when the source data is unavailable, it is challenging to just erase the watermark without jeopardizing the model performance. In this context, we introduce Attention Distraction (AD), a novel source data-free watermark removal attack, to make the model selectively forget the embedded watermarks by customizing continual learning. In particular, AD first anchors the model's attention on the main task using some unlabeled data. Then, through continual learning, a small number of lures (randomly selected natural images) that are assigned a new label distract the model's attention away from the watermarks. Experimental results from different datasets and networks corroborate that AD can thoroughly remove the watermark with a small resource budget without compromising the model's performance on the main task, which outperforms the state-of-the-art works.

Submitted 4 April, 2022; originally announced April 2022.

Comments: Accepted by ICME2022

arXiv:2202.09769 [pdf, other]

Dynamic Spatial Propagation Network for Depth Completion

Authors: Yuankai Lin, Tao Cheng, Qi Zhong, Wending Zhou, Hua Yang

Abstract: Image-guided depth completion aims to generate dense depth maps with sparse depth measurements and corresponding RGB images. Currently, spatial propagation networks (SPNs) are the most popular affinity-based methods in depth completion, but they still suffer from the representation limitation of the fixed affinity and the over-smoothing during iterations. Our solution is to estimate independent affinity matrices in each SPN iteration, but it is over-parameterized and computationally heavy. This paper introduces an efficient model that learns the affinity among neighboring pixels with an attention-based, dynamic approach. Specifically, the Dynamic Spatial Propagation Network (DySPN) we propose makes use of a non-linear propagation model (NLPM). It decouples the neighborhood into parts according to different distances and recursively generates independent attention maps to refine these parts into adaptive affinity matrices. Furthermore, we adopt a diffusion suppression (DS) operation so that the model converges at an early stage to prevent over-smoothing of dense depth. Finally, in order to decrease the computational cost required, we also introduce three variations that reduce the number of neighbors and attentions needed while still retaining similar accuracy. In practice, our method requires fewer iterations to match the performance of other SPNs and yields better results overall.
DySPN outperforms other state-of-the-art (SoTA) methods on KITTI Depth Completion (DC) evaluation by the time of submission and is able to yield SoTA performance in NYU Depth v2 dataset as well.

Submitted 20 February, 2022; originally announced February 2022.

arXiv:2202.01317 [pdf]

Governing online goods: Maturity and formalization in Minecraft, Reddit, and World of Warcraft communities

Authors: Seth Frey, Qiankun Zhong, Beril Bulat, William D. Weisman, Caitlyn Liu, Stephen Fujimoto, Hannah M. Wang, Charles M. Schweik

Abstract: Building a successful community means governing active populations and limited resources. This challenge often requires communities to design formal governance systems from scratch. But the characteristics of successful institutional designs are unclear. Communities that are more mature and established may have more elaborate formal policy systems. Alternatively, they may require less formalization precisely because of their maturity. Indeed, scholars often downplay the role that formal rules play relative to unwritten rules, norms, and values. But in a community with formal rules, decisions are more consistent, transparent, and legitimate. To understand the relationship of formal institutions to community maturity and governance style, we conduct a large-scale quantitative analysis applying institutional analysis frameworks of self-governance scholar Elinor Ostrom to 80,000 communities across 3 platforms: the sandbox game Minecraft, the MMO game World of Warcraft, and Reddit. We classify communities' written rules to test predictors of institutional formalization. From this analysis we extract two major findings. First, institutional formalization, the size and complexity of an online community's governance system, is generally positively associated with maturity, as measured by age, population size, or degree of user engagement. Second, we find that online communities employ similar governance styles across platforms, strongly favoring "weak" norms to "strong" requirements.
These findings suggest that designers and founders of online communities converge on styles of governance practice that are correlated with successful self-governance. With deeper insights into the patterns of successful self-governance, we can help more communities overcome the challenges of self-governance and create for their members powerful experiences of shared meaning and collective empowerment.

Submitted 2 February, 2022; originally announced February 2022.

Comments: 23 pages. 4 figures

ACM Class: J.4; H.5.3

arXiv:2201.04831 [pdf, other]

doi 10.1109/TKDE.2023.3250499

Knowledge Graph Augmented Network Towards Multiview Representation Learning for Aspect-based Sentiment Analysis

Authors: Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Hua Jin, Dacheng Tao

Abstract: Aspect-based sentiment analysis (ABSA) is a fine-grained task of sentiment analysis. To better comprehend long complicated sentences and obtain accurate aspect-specific information, linguistic and commonsense knowledge are generally required in this task. However, most current methods employ complicated and inefficient approaches to incorporate external knowledge, e.g., directly searching the graph nodes. Additionally, the complementarity between external knowledge and linguistic information has not been thoroughly studied. To this end, we propose a knowledge graph augmented network KGAN, which aims to effectively incorporate external knowledge with explicitly syntactic and contextual information. In particular, KGAN captures the sentiment feature representations from multiple different perspectives, i.e., context-, syntax- and knowledge-based. First, KGAN learns the contextual and syntactic representations in parallel to fully extract the semantic features. Then, KGAN integrates the knowledge graphs into the embedding space, based on which the aspect-specific knowledge representations are further obtained via an attention mechanism. Last, we propose a hierarchical fusion module to complement these multi-view representations in a local-to-global manner. Extensive experiments on five popular ABSA benchmarks demonstrate the effectiveness and robustness of our KGAN. Notably, with the help of the pretrained model of RoBERTa, KGAN achieves a new record of state-of-the-art performance among all datasets.

Submitted 13 March, 2023; v1 submitted 13 January, 2022; originally announced January 2022.

Comments: Accepted by IEEE TKDE 2023

arXiv:2112.04178 [pdf, other]

Topology-aware Convolutional Neural Network for Efficient Skeleton-based Action Recognition

Authors: Kailin Xu, Fanfan Ye, Qiaoyong Zhong, Di Xie

Abstract: In the context of skeleton-based action recognition, graph convolutional networks (GCNs) have been rapidly developed, whereas convolutional neural networks (CNNs) have received less attention. One reason is that CNNs are considered poor in modeling the irregular skeleton topology. To alleviate this limitation, we propose a pure CNN architecture named Topology-aware CNN (Ta-CNN) in this paper. In particular, we develop a novel cross-channel feature augmentation module, which is a combo of map-attend-group-map operations. By applying the module to the coordinate level and the joint level subsequently, the topology feature is effectively enhanced. Notably, we theoretically prove that graph convolution is a special case of normal convolution when the joint dimension is treated as channels. This confirms that the topology modeling power of GCNs can also be implemented by using a CNN. Moreover, we creatively design a SkeletonMix strategy which mixes two persons in a unique manner and further boosts the performance. Extensive experiments are conducted on four widely used datasets, i.e. N-UCLA, SBU, NTU RGB+D and NTU RGB+D 120 to verify the effectiveness of Ta-CNN. We surpass existing CNN-based methods significantly. Compared with leading GCN-based methods, we achieve comparable performance with much less complexity in terms of the required GFLOPs and parameters.

Submitted 8 December, 2021; v1 submitted 8 December, 2021; originally announced December 2021.

Comments: Accepted by AAAI 2022

arXiv:2110.13398 [pdf, other]

Unified Instance and Knowledge Alignment Pretraining for Aspect-based Sentiment Analysis

Authors: Juhua Liu, Qihuang Zhong, Liang Ding, Hua Jin, Bo Du, Dacheng Tao

Abstract: Aspect-based Sentiment Analysis (ABSA) aims to determine the sentiment polarity towards an aspect. Because of the expensive and limited labelled data, the pretraining strategy has become the de-facto standard for ABSA. However, there always exists severe domain shift between the pretraining and downstream ABSA datasets, hindering the effective knowledge transfer when directly finetuning and making the downstream task perform sub-optimally. To mitigate such domain shift, we introduce a unified alignment pretraining framework into the vanilla pretrain-finetune pipeline with both instance- and knowledge-level alignments. Specifically, we first devise a novel coarse-to-fine retrieval sampling approach to select target domain-related instances from the large-scale pretraining dataset, thus aligning the instances between pretraining and target domains (First Stage). Then, we introduce a knowledge guidance-based strategy to further bridge the domain gap at the knowledge level. In practice, we formulate the model pretrained on the sampled instances into a knowledge guidance model and a learner model, respectively. On the target dataset, we design an on-the-fly teacher-student joint fine-tuning approach to progressively transfer the knowledge from the knowledge guidance model to the learner model (Second Stage). Thereby, the learner model can maintain more domain-invariant knowledge when learning new knowledge from the target dataset. In the Third Stage, the learner model is finetuned to better adapt its learned knowledge to the target dataset.
Extensive experiments and analyses on several ABSA benchmarks demonstrate the effectiveness and universality of our proposed pretraining framework. Our source code and models are publicly available at https://github.com/WHU-ZQH/UIKA.

Submitted 26 June, 2023; v1 submitted 26 October, 2021; originally announced October 2021.

Comments: Accepted by IEEE TASLP 2023

arXiv:2107.13118 [pdf, other]

Divide-and-Assemble: Learning Block-wise Memory for Unsupervised Anomaly Detection

Authors: Jinlei Hou, Yingying Zhang, Qiaoyong Zhong, Di Xie, Shiliang Pu, Hong Zhou

Abstract: Reconstruction-based methods play an important role in unsupervised anomaly detection in images. Ideally, we expect a perfect reconstruction for normal samples and poor reconstruction for abnormal samples. Since the generalizability of deep neural networks is difficult to control, existing models such as autoencoder do not work well. In this work, we interpret the reconstruction of an image as a divide-and-assemble procedure. Surprisingly, by varying the granularity of division on feature maps, we are able to modulate the reconstruction capability of the model for both normal and abnormal samples. That is, finer granularity leads to better reconstruction, while coarser granularity leads to poorer reconstruction. With proper granularity, the gap between the reconstruction error of normal and abnormal samples can be maximized. The divide-and-assemble framework is implemented by embedding a novel multi-scale block-wise memory module into an autoencoder network. Besides, we introduce adversarial learning and explore the semantic latent representation of the discriminator, which improves the detection of subtle anomaly. We achieve state-of-the-art performance on the challenging MVTec AD dataset. Remarkably, we improve the vanilla autoencoder model by 10.1% in terms of the AUROC score.

Submitted 27 July, 2021; originally announced July 2021.

Comments: accepted by ICCV 2021

arXiv:2107.00316 [pdf, other]

Leveraging Domain Agnostic and Specific Knowledge for Acronym Disambiguation

Authors: Qiwei Zhong, Guanxiong Zeng, Danqing Zhu, Yang Zhang, Wangli Lin, Ben Chen, Jiayu Tang

Abstract: An obstacle to scientific document understanding is the extensive use of acronyms which are shortened forms of long technical phrases. Acronym disambiguation aims to find the correct meaning of an ambiguous acronym in a given text. Recent efforts attempted to incorporate word embeddings and deep learning architectures, and achieved significant effects in this task. In general domains, kinds of fine-grained pretrained language models have sprung up, thanks to the large-scale corpora which can usually be obtained through crowdsourcing. However, these models based on domain agnostic knowledge might achieve insufficient performance when directly applied to the scientific domain. Moreover, obtaining large-scale high-quality annotated data and representing high-level semantics in the scientific domain is challenging and expensive. In this paper, we consider both the domain agnostic and specific knowledge, and propose a Hierarchical Dual-path BERT method coined hdBERT to capture the general fine-grained and high-level specific representations for acronym disambiguation. First, the context-based pretrained models, RoBERTa and SciBERT, are elaborately involved in encoding these two kinds of knowledge respectively. Second, a multi-layer perceptron is devised to integrate the dual-path representations simultaneously and output the prediction. With the widely adopted SciAD dataset containing 62,441 sentences, we investigate the effectiveness of hdBERT.
The experimental results exhibit that the proposed approach outperforms state-of-the-art methods among various evaluation metrics. Specifically, its macro F1 achieves 93.73%.

Submitted 1 July, 2021; originally announced July 2021.

Comments: Second Place Solution, Accepted to SDU@AAAI-21

arXiv:2105.03567 [pdf, other]

Multimodal and Contrastive Learning for Click Fraud Detection

Authors: Weibin Li, Qiwei Zhong, Qingyang Zhao, Hongchun Zhang, Xiaonan Meng

Abstract: Advertising click fraud detection plays one of the vital roles in current E-commerce websites as advertising is an essential component of its business model. It aims at, given a set of corresponding features, e.g., demographic information of users and statistical features of clicks, predicting whether a click is fraudulent or not in the community. Recent efforts attempted to incorporate attributed behavior sequence and heterogeneous network for extracting complex features of users and achieved significant effects on click fraud detection. In this paper, we propose a Multimodal and Contrastive learning network for Click Fraud detection (MCCF). Specifically, motivated by the observations on differences of demographic information, behavior sequences and media relationship between fraudsters and genuine users on E-commerce platform, MCCF jointly utilizes wide and deep features, behavior sequence and heterogeneous network to distill click representations. Moreover, these three modules are integrated by contrastive learning and collaboratively contribute to the final predictions. With the real-world datasets containing 2.54 million clicks on Alibaba platform, we investigate the effectiveness of MCCF. The experimental results show that the proposed approach is able to improve AUC by 7.2% and F1-score by 15.6%, compared with the state-of-the-art methods.

Submitted 7 May, 2021; originally announced May 2021.

Comments: Accepted to DeMal@WWW 2021

arXiv:2104.13636 [pdf, ps, other]

Point Cloud Learning with Transformer

Authors: Qi Zhong, Xian-Feng Han

Abstract: Remarkable performance from Transformer networks in Natural Language Processing promotes the development of these models in dealing with computer vision tasks such as image recognition and segmentation. In this paper, we introduce a novel framework, called Multi-level Multi-scale Point Transformer (MLMSPT) that works directly on the irregular point clouds for representation learning. Specifically, a point pyramid transformer is investigated to model features with diverse resolutions or scales we defined, followed by a multi-level transformer module to aggregate contextual information from different levels of each scale and enhance their interactions, while a multi-scale transformer module is designed to capture the dependencies among representations across different scales. Extensive evaluation on public benchmark datasets demonstrates the effectiveness and the competitive performance of our methods on 3D shape classification and segmentation tasks.

Submitted 24 October, 2022; v1 submitted 28 April, 2021; originally announced April 2021.

Comments: 10 pages, 4 figures

arXiv:2103.10685 [pdf, other]

doi 10.1145/3447548.3467418

Controllable Generation from Pre-trained Language Models via Inverse Prompting

Authors: Xu Zou, Da Yin, Qingyang Zhong, Ming Ding, Hongxia Yang, Zhilin Yang, Jie Tang

Abstract: Large-scale pre-trained language models have demonstrated strong capabilities of generating realistic text. However, it remains challenging to control the generation results. Previous approaches such as prompting are far from sufficient, which limits the usage of language models. To tackle this challenge, we propose an innovative method, inverse prompting, to better control text generation. The core idea of inverse prompting is to use generated text to inversely predict the prompt during beam search, which enhances the relevance between the prompt and the generated text and provides better controllability. Empirically, we pre-train a large-scale Chinese language model to perform a systematic study using human evaluation on the tasks of open-domain poem generation and open-domain long-form question answering. Our results show that our proposed method substantially outperforms the baselines and that our generation quality is close to human performance on some of the tasks. Narrators can try our poem generation demo at https://pretrain.aminer.cn/apps/poetry.html, while our QA demo can be found at https://pretrain.aminer.cn/app/qa. For researchers, the code is provided in https://github.com/THUDM/InversePrompting.

Submitted 9 November, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

Comments: Slightly different from the KDD version

arXiv:2103.08958 [pdf, other]

Modulating Localization and Classification for Harmonized Object Detection

Authors: Taiheng Zhang, Qiaoyong Zhong, Shiliang Pu, Di Xie

Abstract: Object detection involves two sub-tasks, i.e. localizing objects in an image and classifying them into various categories. For existing CNN-based detectors, we notice the widespread divergence between localization and classification, which leads to degradation in performance. In this work, we propose a mutual learning framework to modulate the two tasks. In particular, the two tasks are forced to learn from each other with a novel mutual labeling strategy. Besides, we introduce a simple yet effective IoU rescoring scheme, which further reduces the divergence. Moreover, we define a Spearman rank correlation-based metric to quantify the divergence, which correlates well with the detection performance. The proposed approach is general-purpose and can be easily injected into existing detectors such as FCOS and RetinaNet. We achieve a significant performance gain over the baseline detectors on the COCO dataset.

Submitted 25 March, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

Comments: Accepted by ICME 2021

arXiv:2101.03700 [pdf, other]

AT-BERT: Adversarial Training BERT for Acronym Identification Winning Solution for SDU@AAAI-21

Authors: Danqing Zhu, Wangli Lin, Yang Zhang, Qiwei Zhong, Guanxiong Zeng, Weilin Wu, Jiayu Tang

Abstract: Acronym identification focuses on finding the acronyms and the phrases that have been abbreviated, which is crucial for scientific document understanding tasks. However, the limited size of manually annotated datasets hinders further improvement for the problem. Recent breakthroughs of language models pre-trained on large corpora clearly show that unsupervised pre-training can vastly improve the performance of downstream tasks. In this paper, we present an Adversarial Training BERT method named AT-BERT, our winning solution to acronym identification task for Scientific Document Understanding (SDU) Challenge of AAAI 2021. Specifically, the pre-trained BERT is adopted to capture better semantic representation. Then we incorporate the FGM adversarial training strategy into the fine-tuning of BERT, which makes the model more robust and generalized. Furthermore, an ensemble mechanism is devised to involve the representations learned from multiple BERT variants. Assembling all these components together, the experimental results on the SciAI dataset show that our proposed approach outperforms all other competitive state-of-the-art methods.

Submitted 12 January, 2021; v1 submitted 11 January, 2021; originally announced January 2021.

Comments: Accepted to SDU @ AAAI 2021, 8 pages, 3 figures

arXiv:2009.04597 [pdf]

Institutional Similarity Drives Cultural Similarity among Online Communities

Authors: Qiankun Zhong, Seth Frey

Abstract: Understanding online communities requires an appreciation of both structure and culture. But basic questions remain difficult to pose. How do these facets interact and drive each other? Using data on the membership and governance styles of 5,000 small-scale online communities, we construct empirical measures for cross-server similarities in institutional structure and culture to explore the influence of institutional environment on their culture, and the influence of culture on their institutional environment. To establish the influence of culture and institutions on each other, we construct networks of communities, linking those that are more similar either in their members or governance. We then use network analysis to assess the causal relationships between shared culture and institutions. Our result shows that while effects in both directions are evident, there is a much stronger role for institutions on culture than culture on institutions. These processes are evident within administrative and informational type rules.

Submitted 9 September, 2020; originally announced September 2020.

Comments: 39 pages, 8 figures

MSC Class: H.5.3; J.4; K.6.4

arXiv:2007.14690 [pdf, other]

Dynamic GCN: Context-enriched Topology Learning for Skeleton-based Action Recognition

Authors: Fanfan Ye, Shiliang Pu, Qiaoyong Zhong, Chao Li, Di Xie, Huiming Tang

Abstract: Graph Convolutional Networks (GCNs) have attracted increasing interest for the task of skeleton-based action recognition. The key lies in the design of the graph structure, which encodes skeleton topology information. In this paper, we propose Dynamic GCN, in which a novel convolutional neural network named Context-encoding Network (CeN) is introduced to learn skeleton topology automatically. In particular, when learning the dependency between two joints, contextual features from the remaining joints are incorporated in a global manner. CeN is extremely lightweight yet effective, and can be embedded into a graph convolutional layer. By stacking multiple CeN-enabled graph convolutional layers, we build Dynamic GCN. Notably, as a merit of CeN, dynamic graph topologies are constructed for different input samples as well as graph convolutional layers of various depths. Besides, three alternative context modeling architectures are well explored, which may serve as a guideline for future research on graph topology learning. CeN brings only ~7% extra FLOPs for the baseline model, and Dynamic GCN achieves better performance with $2\times$ to $4\times$ fewer FLOPs than existing methods. By further combining static physical body connections and motion modalities, we achieve state-of-the-art performance on three large-scale benchmarks, namely NTU-RGB+D, NTU-RGB+D 120 and Skeleton-Kinetics.

Submitted 29 July, 2020; originally announced July 2020.

Comments: Accepted by ACM MM 2020
