Showing 1–34 of 34 results for author: Chao, L S

  1. arXiv:2406.03450  [pdf, other]

    cs.CL cs.AI

    What is the Best Way for ChatGPT to Translate Poetry?

    Authors: Shanshan Wang, Derek F. Wong, Jingming Yao, Lidia S. Chao

    Abstract: Machine translation (MT) has historically faced significant challenges when applied to literary works, particularly in the domain of poetry translation. The advent of Large Language Models such as ChatGPT holds potential for innovation in this field. This study examines ChatGPT's capabilities in English-Chinese poetry translation tasks, utilizing targeted prompts and small sample scenarios to asce…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 19 pages, 1 figure. The paper has been accepted by ACL 2024 (Main Conference)

  2. arXiv:2406.00839  [pdf, other]

    cs.CL cs.AI

    FOCUS: Forging Originality through Contrastive Use in Self-Plagiarism for Language Models

    Authors: Kaixin Lan, Tao Fang, Derek F. Wong, Yabo Xu, Lidia S. Chao, Cecilia G. Zhao

    Abstract: Pre-trained Language Models (PLMs) have shown impressive results in various Natural Language Generation (NLG) tasks, such as powering chatbots and generating stories. However, an ethical concern arises due to their potential to produce verbatim copies of paragraphs from their training data. This is problematic as PLMs are trained on corpora constructed by human authors. As such, there is a pressin…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 16 pages, 8 figures. The paper has been accepted by ACL 2024 (Findings), with Kaixin Lan and Tao Fang contributing equally, and Derek F. Wong serving as the corresponding author

  3. arXiv:2405.04286  [pdf, other]

    cs.CL

    Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore

    Authors: Junchao Wu, Runzhe Zhan, Derek F. Wong, Shu Yang, Xuebo Liu, Lidia S. Chao, Min Zhang

    Abstract: The efficacy of a large language model (LLM) generated text detector depends substantially on the availability of sizable training data. White-box zero-shot detectors, which require no such data, are nonetheless limited by the accessibility of the source model of the LLM-generated text. In this paper, we propose a simple but effective black-box zero-shot detection approach, predicated on the obs…

    Submitted 7 May, 2024; originally announced May 2024.

  4. arXiv:2405.02925  [pdf, other]

    cs.CL

    A Two-Stage Prediction-Aware Contrastive Learning Framework for Multi-Intent NLU

    Authors: Guanhua Chen, Yutong Yao, Derek F. Wong, Lidia S. Chao

    Abstract: Multi-intent natural language understanding (NLU) presents a formidable challenge due to the model confusion arising from multiple intents within a single utterance. While previous works train the model contrastively to increase the margin between different multi-intent labels, they are less suited to the nuances of multi-intent NLU. They ignore the rich information between the shared intents, whi…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: LREC-COLING 2024

  5. arXiv:2404.18413  [pdf, other]

    cs.CV cs.AI

    3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset

    Authors: Xinyu Ma, Xuebo Liu, Derek F. Wong, Jun Rao, Bei Li, Liang Ding, Lidia S. Chao, Dacheng Tao, Min Zhang

    Abstract: Multimodal machine translation (MMT) is a challenging task that seeks to improve translation quality by incorporating visual information. However, recent studies have indicated that the visual information provided by existing MMT datasets is insufficient, causing models to disregard it and overestimate their capabilities. This issue presents a significant obstacle to the development of MMT researc…

    Submitted 29 April, 2024; originally announced April 2024.

  6. arXiv:2404.16766  [pdf, other]

    cs.CL cs.AI

    Prefix Text as a Yarn: Eliciting Non-English Alignment in Foundation Language Model

    Authors: Runzhe Zhan, Xinyi Yang, Derek F. Wong, Lidia S. Chao, Yue Zhang

    Abstract: While supervised fine-tuning (SFT) has been a straightforward approach for tailoring the output of a foundation large language model (LLM) to specific preferences, concerns have been raised about the depth of this alignment, with some critiques suggesting it is merely "superficial". We critically examine this hypothesis within the scope of cross-lingual generation tasks, proposing that the effective…

    Submitted 25 April, 2024; originally announced April 2024.

  7. arXiv:2403.11621  [pdf, other]

    cs.CL

    Let's Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large Language Model

    Authors: Haoyun Xu, Runzhe Zhan, Derek F. Wong, Lidia S. Chao

    Abstract: Large Language Models (LLMs) are composed of neurons that exhibit various behaviors and roles, which become increasingly diversified as models scale. Recent studies have revealed that not all neurons are active across different datasets, and this sparsity correlates positively with the task-specific ability, leading to advancements in model pruning and training efficiency. Traditional fine-tuning…

    Submitted 18 March, 2024; originally announced March 2024.

  8. arXiv:2310.14724  [pdf, other]

    cs.CL cs.AI

    A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions

    Authors: Junchao Wu, Shu Yang, Runzhe Zhan, Yulin Yuan, Derek F. Wong, Lidia S. Chao

    Abstract: The powerful ability of large language models (LLMs) to understand, follow, and generate complex language has caused LLM-generated text to flood many areas of our daily lives at an incredible speed and to be widely accepted by humans. As LLMs continue to expand, there is an imperative need to develop detectors that can detect LLM-generated text. This is crucial to mitigate potential misuse of LLMs…

    Submitted 19 April, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

  9. arXiv:2310.08908  [pdf, other]

    cs.CL

    Human-in-the-loop Machine Translation with Large Language Model

    Authors: Xinyi Yang, Runzhe Zhan, Derek F. Wong, Junchao Wu, Lidia S. Chao

    Abstract: The large language model (LLM) has garnered significant attention due to its in-context learning mechanisms and emergent capabilities. The research community has conducted several pilot studies to apply LLMs to machine translation tasks and evaluate their performance from diverse perspectives. However, previous research has primarily focused on the LLM itself and has not explored human interventio…

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted to MT Summit 2023

  10. arXiv:2305.01951  [pdf, other]

    cs.CL

    Can LMs Generalize to Future Data? An Empirical Analysis on Text Summarization

    Authors: Chi Seng Cheang, Hou Pong Chan, Derek F. Wong, Xuebo Liu, Zhaocong Li, Yanming Sun, Shudong Liu, Lidia S. Chao

    Abstract: Recent pre-trained language models (PLMs) achieve promising results in existing abstractive summarization datasets. However, existing summarization benchmarks overlap in time with the standard pre-training corpora and finetuning datasets. Hence, the strong performance of PLMs may rely on the parametric knowledge that is memorized during pre-training and fine-tuning. Moreover, the knowledge memoriz…

    Submitted 2 November, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: Accepted at EMNLP 2023

  11. arXiv:2304.01746  [pdf, other]

    cs.CL

    Is ChatGPT a Highly Fluent Grammatical Error Correction System? A Comprehensive Evaluation

    Authors: Tao Fang, Shu Yang, Kaixin Lan, Derek F. Wong, Jinpeng Hu, Lidia S. Chao, Yue Zhang

    Abstract: ChatGPT, a large-scale language model based on the advanced GPT-3.5 architecture, has shown remarkable potential in various Natural Language Processing (NLP) tasks. However, there is currently a dearth of comprehensive studies exploring its potential in the area of Grammatical Error Correction (GEC). To showcase its capabilities in GEC, we design zero-shot chain-of-thought (CoT) and few-shot CoT set…

    Submitted 4 April, 2023; originally announced April 2023.

  12. arXiv:2212.04262  [pdf, other]

    cs.CL cs.AI cs.LG

    ConsistTL: Modeling Consistency in Transfer Learning for Low-Resource Neural Machine Translation

    Authors: Zhaocong Li, Xuebo Liu, Derek F. Wong, Lidia S. Chao, Min Zhang

    Abstract: Transfer learning is a simple and powerful method that can be used to boost model performance of low-resource neural machine translation (NMT). Existing transfer learning methods for NMT are static, which simply transfer knowledge from a parent model to a child model once via parameter initialization. In this paper, we propose a novel transfer learning method for NMT, namely ConsistTL, which can c…

    Submitted 8 December, 2022; originally announced December 2022.

    Comments: Accepted to EMNLP 2022

  13. arXiv:2210.09683  [pdf, other]

    cs.CL

    Alibaba-Translate China's Submission for WMT 2022 Metrics Shared Task

    Authors: Yu Wan, Keqin Bao, Dayiheng Liu, Baosong Yang, Derek F. Wong, Lidia S. Chao, Wenqiang Lei, Jun Xie

    Abstract: In this report, we present our submission to the WMT 2022 Metrics Shared Task. We build our system based on the core idea of UNITE (Unified Translation Evaluation), which unifies source-only, reference-only, and source-reference-combined evaluation scenarios into one single model. Specifically, during the model pre-training phase, we first apply the pseudo-labeled data examples to continuously pre…

    Submitted 17 February, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

    Comments: WMT 2022 Metrics Shared Task

  14. Attention Mechanism with Energy-Friendly Operations

    Authors: Yu Wan, Baosong Yang, Dayiheng Liu, Rong Xiao, Derek F. Wong, Haibo Zhang, Boxing Chen, Lidia S. Chao

    Abstract: The attention mechanism has become the dominant module in natural language processing models. It is computationally intensive and depends on massive power-hungry multiplications. In this paper, we rethink variants of the attention mechanism from the perspective of energy consumption. After reaching the conclusion that the energy costs of several energy-friendly operations are far less than their multiplication c…

    Submitted 28 April, 2022; originally announced April 2022.

    Comments: Findings@ACL2022

  15. arXiv:2204.13352  [pdf, other]

    cs.CL

    RoBLEURT Submission for the WMT2021 Metrics Task

    Authors: Yu Wan, Dayiheng Liu, Baosong Yang, Tianchi Bi, Haibo Zhang, Boxing Chen, Weihua Luo, Derek F. Wong, Lidia S. Chao

    Abstract: In this paper, we present our submission to the Shared Metrics Task: RoBLEURT (Robustly Optimizing the training of BLEURT). After investigating the recent advances of trainable metrics, we identify several aspects of vital importance for obtaining a well-performing metric model: 1) jointly leveraging the advantages of source-included model and reference-only model, 2) continuously pre-training the model…

    Submitted 28 April, 2022; originally announced April 2022.

    Comments: WMT2021 Metrics Shared Task

  16. UniTE: Unified Translation Evaluation

    Authors: Yu Wan, Dayiheng Liu, Baosong Yang, Haibo Zhang, Boxing Chen, Derek F. Wong, Lidia S. Chao

    Abstract: Translation quality evaluation plays a crucial role in machine translation. According to the input format, it is mainly separated into three tasks, i.e., reference-only, source-only and source-reference-combined. Recent methods, despite their promising results, are specifically designed and optimized on one of them. This limits the convenience of these methods, and overlooks the commonalities amon…

    Submitted 28 April, 2022; originally announced April 2022.

    Comments: ACL2022

  17. arXiv:2111.04079  [pdf, other]

    cs.CL

    Variance-Aware Machine Translation Test Sets

    Authors: Runzhe Zhan, Xuebo Liu, Derek F. Wong, Lidia S. Chao

    Abstract: We release 70 small and discriminative test sets for machine translation (MT) evaluation called variance-aware test sets (VAT), covering 35 translation directions from WMT16 to WMT20 competitions. VAT is automatically created by a novel variance-aware filtering method that filters the indiscriminative test instances of the current MT test sets without any human labor. Experimental results show tha…

    Submitted 7 November, 2021; originally announced November 2021.

    Comments: Accepted to NeurIPS 2021 Datasets and Benchmarks Track

  18. arXiv:2110.01811  [pdf, other]

    cs.CL cs.LG

    On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation

    Authors: Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Shuming Shi, Zhaopeng Tu

    Abstract: Pre-training (PT) and back-translation (BT) are two simple and powerful methods to utilize monolingual data for improving the model performance of neural machine translation (NMT). This paper takes the first step to investigate the complementarity between PT and BT. We introduce two probing tasks for PT and BT respectively and find that PT mainly contributes to the encoder module while BT brings m…

    Submitted 5 October, 2021; originally announced October 2021.

    Comments: Accepted to Findings of EMNLP 2021

  19. arXiv:2107.14402  [pdf, other]

    cs.CL cs.AI

    Difficulty-Aware Machine Translation Evaluation

    Authors: Runzhe Zhan, Xuebo Liu, Derek F. Wong, Lidia S. Chao

    Abstract: The high-quality translation results produced by machine translation (MT) systems still pose a huge challenge for automatic evaluation. Current MT evaluation pays the same attention to each sentence component, while the questions of real-world examinations (e.g., university examinations) have different difficulties and weightings. In this paper, we propose a novel difficulty-aware MT evaluation me…

    Submitted 29 July, 2021; originally announced July 2021.

    Comments: Accepted to ACL 2021

  20. arXiv:2107.08212  [pdf, other]

    cs.CL cs.LG

    On the Copying Behaviors of Pre-Training for Neural Machine Translation

    Authors: Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Shuming Shi, Zhaopeng Tu

    Abstract: Previous studies have shown that initializing neural machine translation (NMT) models with pre-trained language models (LMs) can speed up model training and boost model performance. In this work, we identify a critical side-effect of pre-training for NMT, which is due to the discrepancy between the training objectives of LM-based pre-training and NMT. Since the LM objective learns to re…

    Submitted 17 July, 2021; originally announced July 2021.

    Comments: Accepted to Findings of ACL 2021

  21. arXiv:2103.02262  [pdf, other]

    cs.CL cs.LG

    Meta-Curriculum Learning for Domain Adaptation in Neural Machine Translation

    Authors: Runzhe Zhan, Xuebo Liu, Derek F. Wong, Lidia S. Chao

    Abstract: Meta-learning has been sufficiently validated to be beneficial for low-resource neural machine translation (NMT). However, we find that meta-trained NMT fails to improve the translation performance of the domain unseen at the meta-training stage. In this paper, we aim to alleviate this issue by proposing a novel meta-curriculum learning for domain adaptation in NMT. During meta-training, the NMT f…

    Submitted 3 March, 2021; originally announced March 2021.

    Comments: Accepted to AAAI 2021

  22. arXiv:2012.14768  [pdf, other]

    cs.CL cs.LG

    Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning

    Authors: Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Zhaopeng Tu

    Abstract: Encoder layer fusion (EncoderFusion) is a technique to fuse all the encoder layers (instead of the uppermost layer) for sequence-to-sequence (Seq2Seq) models, which has proven effective on various NLP tasks. However, it is still not entirely clear why and when EncoderFusion should work. In this paper, our main contribution is to take a step further in understanding EncoderFusion. Many of previous…

    Submitted 18 March, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

    Comments: Accepted to ICLR 2021

  23. arXiv:2012.03477  [pdf, other]

    cs.CL

    Document Graph for Neural Machine Translation

    Authors: Mingzhou Xu, Liangyou Li, Derek F. Wong, Qun Liu, Lidia S. Chao

    Abstract: Previous works have shown that contextual information can improve the performance of neural machine translation (NMT). However, most existing document-level NMT methods only consider a small number of previous sentences. How to make use of the whole document as global context is still a challenge. To address this issue, we hypothesize that a document can be represented as a graph that connects rele…

    Submitted 14 September, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

    Comments: Accepted by EMNLP2021

  24. Self-Paced Learning for Neural Machine Translation

    Authors: Yu Wan, Baosong Yang, Derek F. Wong, Yikai Zhou, Lidia S. Chao, Haibo Zhang, Boxing Chen

    Abstract: Recent studies have proven that the training of neural machine translation (NMT) can be facilitated by mimicking the learning process of humans. Nevertheless, the achievements of this kind of curriculum learning rely on the quality of an artificial schedule drawn up with handcrafted features, e.g., sentence length or word rarity. We ameliorate this procedure in a more flexible manner by proposing se…

    Submitted 13 October, 2020; v1 submitted 9 October, 2020; originally announced October 2020.

    Comments: Accepted by EMNLP2020

  25. arXiv:2006.02014  [pdf, other]

    cs.CL cs.LG

    Norm-Based Curriculum Learning for Neural Machine Translation

    Authors: Xuebo Liu, Houtim Lai, Derek F. Wong, Lidia S. Chao

    Abstract: A neural machine translation (NMT) system is expensive to train, especially in high-resource settings. As NMT architectures become deeper and wider, this issue gets worse and worse. In this paper, we aim to improve the efficiency of training an NMT system by introducing a novel norm-based curriculum learning method. We use the norm (aka length or module) of a word embedding as a measure of 1) the d…

    Submitted 2 June, 2020; originally announced June 2020.

    Comments: Accepted to ACL 2020

  26. Unsupervised Neural Dialect Translation with Commonality and Diversity Modeling

    Authors: Yu Wan, Baosong Yang, Derek F. Wong, Lidia S. Chao, Haihua Du, Ben C. H. Ao

    Abstract: As a special machine translation task, dialect translation has two main characteristics: 1) the lack of a parallel training corpus; and 2) similar grammar on the two sides of the translation. In this paper, we investigate how to exploit the commonality and diversity between dialects so as to build unsupervised translation models that access only monolingual data. Specifically, we leverage…

    Submitted 11 December, 2019; originally announced December 2019.

    Comments: AAAI 2020

  27. arXiv:1906.03100  [pdf, other]

    cs.CL cs.AI

    Shared-Private Bilingual Word Embeddings for Neural Machine Translation

    Authors: Xuebo Liu, Derek F. Wong, Yang Liu, Lidia S. Chao, Tong Xiao, Jingbo Zhu

    Abstract: Word embedding is central to neural machine translation (NMT), which has attracted intensive research interest in recent years. In NMT, the source embedding plays the role of the entrance while the target embedding acts as the terminal. These layers occupy most of the model parameters for representation learning. Furthermore, they indirectly interface via a soft-attention mechanism, which makes th…

    Submitted 7 June, 2019; originally announced June 2019.

    Comments: Accepted to ACL 2019

  28. arXiv:1906.01787  [pdf, other]

    cs.CL cs.LG

    Learning Deep Transformer Models for Machine Translation

    Authors: Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, Lidia S. Chao

    Abstract: Transformer is the state-of-the-art model in recent machine translation evaluations. Two strands of research are promising to improve models of this kind: the first uses wide networks (a.k.a. Transformer-Big) and has been the de facto standard for the development of the Transformer system, and the other uses deeper language representation but faces the difficulty arising from learning deep network…

    Submitted 4 June, 2019; originally announced June 2019.

    Comments: Accepted by ACL 2019

  29. arXiv:1906.00592  [pdf, other]

    cs.CL cs.AI

    Assessing the Ability of Self-Attention Networks to Learn Word Order

    Authors: Baosong Yang, Longyue Wang, Derek F. Wong, Lidia S. Chao, Zhaopeng Tu

    Abstract: Self-attention networks (SANs) have attracted a lot of interest due to their high parallelization and strong performance on a variety of NLP tasks, e.g., machine translation. Lacking a recurrence structure such as that of recurrent neural networks (RNNs), SANs are assumed to be weak at learning positional information of words for sequence modeling. However, neither this speculation has been empirica…

    Submitted 3 June, 2019; originally announced June 2019.

    Comments: ACL 2019

  30. arXiv:1904.03107  [pdf, other]

    cs.CL cs.AI

    Convolutional Self-Attention Networks

    Authors: Baosong Yang, Longyue Wang, Derek Wong, Lidia S. Chao, Zhaopeng Tu

    Abstract: Self-attention networks (SANs) have drawn increasing interest due to their high parallelization in computation and flexibility in modeling dependencies. SANs can be further enhanced with multi-head attention by allowing the model to attend to information from different representation subspaces. In this work, we propose novel convolutional self-attention networks, which offer SANs the abilities to…

    Submitted 5 April, 2019; originally announced April 2019.

    Comments: NAACL 2019

  31. arXiv:1902.05766  [pdf, other]

    cs.CL

    Context-Aware Self-Attention Networks

    Authors: Baosong Yang, Jian Li, Derek Wong, Lidia S. Chao, Xing Wang, Zhaopeng Tu

    Abstract: Self-attention models have shown their flexibility in parallel computation and their effectiveness in modeling both long- and short-term dependencies. However, they calculate the dependencies between representations without considering the contextual information, which has proven useful for modeling dependencies among neural representations in various natural language tasks. In this work, we focus on i…

    Submitted 15 February, 2019; originally announced February 2019.

    Comments: AAAI 2019

  32. arXiv:1810.13320   

    cs.CL

    Convolutional Self-Attention Network

    Authors: Baosong Yang, Longyue Wang, Derek F. Wong, Lidia S. Chao, Zhaopeng Tu

    Abstract: Self-attention network (SAN) has recently attracted increasing interest due to its fully parallelized computation and flexibility in modeling dependencies. It can be further enhanced with multi-headed attention mechanism by allowing the model to jointly attend to information from different representation subspaces at different positions (Vaswani et al., 2017). In this work, we propose a novel conv…

    Submitted 8 April, 2019; v1 submitted 31 October, 2018; originally announced October 2018.

    Comments: The latest version of this paper has been uploaded to another link: arXiv:1904.03107

  33. arXiv:1810.10182  [pdf, other]

    cs.CL cs.AI

    Modeling Localness for Self-Attention Networks

    Authors: Baosong Yang, Zhaopeng Tu, Derek F. Wong, Fandong Meng, Lidia S. Chao, Tong Zhang

    Abstract: Self-attention networks have proven to be of profound value for their strength in capturing global dependencies. In this work, we propose to model localness for self-attention networks, which enhances the ability to capture useful local context. We cast localness modeling as a learnable Gaussian bias, which indicates the center and scope of the local region that should receive more attention. The bias is…

    Submitted 24 October, 2018; originally announced October 2018.

    Comments: EMNLP 2018

  34. arXiv:1707.05114  [pdf, ps, other]

    cs.CL

    Towards Bidirectional Hierarchical Representations for Attention-Based Neural Machine Translation

    Authors: Baosong Yang, Derek F. Wong, Tong Xiao, Lidia S. Chao, Jingbo Zhu

    Abstract: This paper proposes a hierarchical attentional neural translation model which focuses on enhancing source-side hierarchical representations by covering both local and global semantic information using a bidirectional tree-based encoder. To maximize the predictive likelihood of target words, a weighted variant of an attention mechanism is used to balance the attentive information between lexical an…

    Submitted 17 July, 2017; originally announced July 2017.

    Comments: Accepted for publication at EMNLP 2017