Showing 1–50 of 52 results for author: Wong, D F

  1. arXiv:2407.08733  [pdf, other]

    cs.CL

    Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist

    Authors: Zihao Zhou, Shudong Liu, Maizhen Ning, Wei Liu, Jindong Wang, Derek F. Wong, Xiaowei Huang, Qiufeng Wang, Kaizhu Huang

    Abstract: Exceptional mathematical reasoning ability is one of the key features that demonstrate the power of large language models (LLMs). How to comprehensively define and evaluate the mathematical abilities of LLMs, and even reflect the user experience in real-world scenarios, has emerged as a critical issue. Current benchmarks predominantly concentrate on problem-solving capabilities, which presents a s…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 35 pages, 10 figures, preprint

  2. arXiv:2406.11432  [pdf, other]

    cs.CV cs.AI

    AnyTrans: Translate AnyText in the Image with Large Scale Models

    Authors: Zhipeng Qian, Pei Zhang, Baosong Yang, Kai Fan, Yiwei Ma, Derek F. Wong, Xiaoshuai Sun, Rongrong Ji

    Abstract: This paper introduces AnyTrans, an all-encompassing framework for the task-Translate AnyText in the Image (TATI), which includes multilingual text translation and text fusion within images. Our framework leverages the strengths of large-scale models, such as Large Language Models (LLMs) and text-guided diffusion models, to incorporate contextual cues from both textual and visual elements during tr…

    Submitted 17 June, 2024; originally announced June 2024.

  3. arXiv:2406.07054  [pdf, other]

    cs.CL cs.AI

    CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation

    Authors: Renhao Li, Minghuan Tan, Derek F. Wong, Min Yang

    Abstract: In recent years, instruction fine-tuning (IFT) on large language models (LLMs) has garnered considerable attention to enhance model performance on unseen tasks. Attempts have been made on automatic construction and effective selection for IFT data. However, we posit that previous methods have not fully harnessed the potential of LLMs for enhancing data quality. The responses within IFT data could…

    Submitted 11 June, 2024; originally announced June 2024.

  4. arXiv:2406.03450  [pdf, other]

    cs.CL cs.AI

    What is the Best Way for ChatGPT to Translate Poetry?

    Authors: Shanshan Wang, Derek F. Wong, Jingming Yao, Lidia S. Chao

    Abstract: Machine translation (MT) has historically faced significant challenges when applied to literary works, particularly in the domain of poetry translation. The advent of Large Language Models such as ChatGPT holds potential for innovation in this field. This study examines ChatGPT's capabilities in English-Chinese poetry translation tasks, utilizing targeted prompts and small sample scenarios to asce…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 19 pages, 1 figure. The paper has been accepted by ACL 2024 (Main Conference)

  5. arXiv:2406.00839  [pdf, other]

    cs.CL cs.AI

    FOCUS: Forging Originality through Contrastive Use in Self-Plagiarism for Language Models

    Authors: Kaixin Lan, Tao Fang, Derek F. Wong, Yabo Xu, Lidia S. Chao, Cecilia G. Zhao

    Abstract: Pre-trained Language Models (PLMs) have shown impressive results in various Natural Language Generation (NLG) tasks, such as powering chatbots and generating stories. However, an ethical concern arises due to their potential to produce verbatim copies of paragraphs from their training data. This is problematic as PLMs are trained on corpora constructed by human authors. As such, there is a pressin…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 16 pages, 8 figures. The paper has been accepted by ACL 2024 (Findings), with Kaixin Lan and Tao Fang contributing equally, and Derek F. Wong serving as the corresponding author

  6. arXiv:2405.14039  [pdf, other]

    cs.CL cs.AI cs.LG

    Trajectory Volatility for Out-of-Distribution Detection in Mathematical Reasoning

    Authors: Yiming Wang, Pei Zhang, Baosong Yang, Derek F. Wong, Zhuosheng Zhang, Rui Wang

    Abstract: Real-world data deviating from the independent and identically distributed (i.i.d.) assumption of in-distribution training data poses security threats to deep networks, thus advancing out-of-distribution (OOD) detection algorithms. Detection methods in generative language models (GLMs) mainly focus on uncertainty estimation and embedding distance measurement, with the latter proven to be most effe…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 27 pages, 6 figures, 12 tables

  7. arXiv:2405.04286  [pdf, other]

    cs.CL

    Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore

    Authors: Junchao Wu, Runzhe Zhan, Derek F. Wong, Shu Yang, Xuebo Liu, Lidia S. Chao, Min Zhang

    Abstract: The efficacy of a large language model (LLM) generated text detector depends substantially on the availability of sizable training data. White-box zero-shot detectors, which require no such data, are nonetheless limited by the accessibility of the source model of the LLM-generated text. In this paper, we propose a simple but effective black-box zero-shot detection approach, predicated on the obs…

    Submitted 7 May, 2024; originally announced May 2024.

  8. arXiv:2405.02925  [pdf, other]

    cs.CL

    A Two-Stage Prediction-Aware Contrastive Learning Framework for Multi-Intent NLU

    Authors: Guanhua Chen, Yutong Yao, Derek F. Wong, Lidia S. Chao

    Abstract: Multi-intent natural language understanding (NLU) presents a formidable challenge due to the model confusion arising from multiple intents within a single utterance. While previous works train the model contrastively to increase the margin between different multi-intent labels, they are less suited to the nuances of multi-intent NLU. They ignore the rich information between the shared intents, whi…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: LREC-COLING 2024

  9. arXiv:2404.18413  [pdf, other]

    cs.CV cs.AI

    3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset

    Authors: Xinyu Ma, Xuebo Liu, Derek F. Wong, Jun Rao, Bei Li, Liang Ding, Lidia S. Chao, Dacheng Tao, Min Zhang

    Abstract: Multimodal machine translation (MMT) is a challenging task that seeks to improve translation quality by incorporating visual information. However, recent studies have indicated that the visual information provided by existing MMT datasets is insufficient, causing models to disregard it and overestimate their capabilities. This issue presents a significant obstacle to the development of MMT researc…

    Submitted 29 April, 2024; originally announced April 2024.

  10. arXiv:2404.16766  [pdf, other]

    cs.CL cs.AI

    Prefix Text as a Yarn: Eliciting Non-English Alignment in Foundation Language Model

    Authors: Runzhe Zhan, Xinyi Yang, Derek F. Wong, Lidia S. Chao, Yue Zhang

    Abstract: While supervised fine-tuning (SFT) has been a straightforward approach for tailoring the output of foundation large language model (LLM) to specific preferences, concerns have been raised about the depth of this alignment, with some critiques suggesting it is merely "superficial". We critically examine this hypothesis within the scope of cross-lingual generation tasks, proposing that the effective…

    Submitted 25 April, 2024; originally announced April 2024.

  11. arXiv:2403.11621  [pdf, other]

    cs.CL

    Let's Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large Language Model

    Authors: Haoyun Xu, Runzhe Zhan, Derek F. Wong, Lidia S. Chao

    Abstract: Large Language Models (LLMs) are composed of neurons that exhibit various behaviors and roles, which become increasingly diversified as models scale. Recent studies have revealed that not all neurons are active across different datasets, and this sparsity correlates positively with the task-specific ability, leading to advancements in model pruning and training efficiency. Traditional fine-tuning…

    Submitted 18 March, 2024; originally announced March 2024.

  12. arXiv:2402.16705  [pdf, other]

    cs.CL cs.AI cs.LG

    SelectIT: Selective Instruction Tuning for Large Language Models via Uncertainty-Aware Self-Reflection

    Authors: Liangxin Liu, Xuebo Liu, Derek F. Wong, Dongfang Li, Ziyi Wang, Baotian Hu, Min Zhang

    Abstract: Instruction tuning (IT) is crucial to tailoring large language models (LLMs) towards human-centric interactions. Recent advancements have shown that the careful selection of a small, high-quality subset of IT data can significantly enhance the performance of LLMs. Despite this, common approaches often rely on additional models or data sets, which increases costs and limits widespread adoption. In…

    Submitted 26 February, 2024; originally announced February 2024.

  13. arXiv:2402.07616  [pdf, other]

    cs.CL cs.AI

    Anchor-based Large Language Models

    Authors: Jianhui Pang, Fanghua Ye, Derek Fai Wong, Xin He, Wanshun Chen, Longyue Wang

    Abstract: Large language models (LLMs) predominantly employ decoder-only transformer architectures, necessitating the retention of keys/values information for historical tokens to provide contextual information and avoid redundant computation. However, the substantial size and parameter volume of these LLMs require massive GPU memory. This memory demand increases with the length of the input text, leading t…

    Submitted 1 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: The paper has been accepted by the ACL2024 conference. Work was done when Jianhui Pang and Fanghua Ye were interning at Tencent AI Lab

  14. arXiv:2401.12794  [pdf, other]

    cs.CL

    Benchmarking LLMs via Uncertainty Quantification

    Authors: Fanghua Ye, Mingming Yang, Jianhui Pang, Longyue Wang, Derek F. Wong, Emine Yilmaz, Shuming Shi, Zhaopeng Tu

    Abstract: The proliferation of open-source Large Language Models (LLMs) from various institutions has highlighted the urgent need for comprehensive evaluation methods. However, current evaluation platforms, such as the widely recognized HuggingFace open LLM leaderboard, neglect a crucial aspect -- uncertainty, which is vital for thoroughly assessing LLMs. To bridge this gap, we introduce a new benchmarking…

    Submitted 25 April, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: 25 pages, preprints

  15. arXiv:2401.08350  [pdf, other]

    cs.CL

    Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models

    Authors: Jianhui Pang, Fanghua Ye, Longyue Wang, Dian Yu, Derek F. Wong, Shuming Shi, Zhaopeng Tu

    Abstract: The evolution of Neural Machine Translation (NMT) has been significantly influenced by six core challenges (Koehn and Knowles, 2017), which have acted as benchmarks for progress in this field. This study revisits these challenges, offering insights into their ongoing relevance in the context of advanced Large Language Models (LLMs): domain mismatch, amount of parallel data, rare word prediction, t…

    Submitted 17 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: 17 pages. Longyue Wang is the Corresponding Author

  16. arXiv:2310.14724  [pdf, other]

    cs.CL cs.AI

    A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions

    Authors: Junchao Wu, Shu Yang, Runzhe Zhan, Yulin Yuan, Derek F. Wong, Lidia S. Chao

    Abstract: The powerful ability to understand, follow, and generate complex language emerging from large language models (LLMs) makes LLM-generated text flood many areas of our daily lives at an incredible speed and is widely accepted by humans. As LLMs continue to expand, there is an imperative need to develop detectors that can detect LLM-generated text. This is crucial to mitigate potential misuse of LLMs…

    Submitted 19 April, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

  17. arXiv:2310.08908  [pdf, other]

    cs.CL

    Human-in-the-loop Machine Translation with Large Language Model

    Authors: Xinyi Yang, Runzhe Zhan, Derek F. Wong, Junchao Wu, Lidia S. Chao

    Abstract: The large language model (LLM) has garnered significant attention due to its in-context learning mechanisms and emergent capabilities. The research community has conducted several pilot studies to apply LLMs to machine translation tasks and evaluate their performance from diverse perspectives. However, previous research has primarily focused on the LLM itself and has not explored human interventio…

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted to MT Summit 2023

  18. arXiv:2305.19847  [pdf, other]

    cs.CL cs.AI

    How Does Pretraining Improve Discourse-Aware Translation?

    Authors: Zhihong Huang, Longyue Wang, Siyou Liu, Derek F. Wong

    Abstract: Pretrained language models (PLMs) have produced substantial improvements in discourse-aware neural machine translation (NMT), for example, improved coherence in spoken language translation. However, the underlying reasons for their strong performance have not been well explained. To bridge this gap, we introduce a probing task to interpret the ability of PLMs to capture discourse relation knowledg…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: Interspeech 2023

  19. arXiv:2305.01951  [pdf, other]

    cs.CL

    Can LMs Generalize to Future Data? An Empirical Analysis on Text Summarization

    Authors: Chi Seng Cheang, Hou Pong Chan, Derek F. Wong, Xuebo Liu, Zhaocong Li, Yanming Sun, Shudong Liu, Lidia S. Chao

    Abstract: Recent pre-trained language models (PLMs) achieve promising results in existing abstractive summarization datasets. However, existing summarization benchmarks overlap in time with the standard pre-training corpora and finetuning datasets. Hence, the strong performance of PLMs may rely on the parametric knowledge that is memorized during pre-training and fine-tuning. Moreover, the knowledge memoriz…

    Submitted 2 November, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: Accepted at EMNLP 2023

  20. arXiv:2305.01181  [pdf, other]

    cs.CL

    A Paradigm Shift: The Future of Machine Translation Lies with Large Language Models

    Authors: Chenyang Lyu, Zefeng Du, Jitao Xu, Yitao Duan, Minghao Wu, Teresa Lynn, Alham Fikri Aji, Derek F. Wong, Siyou Liu, Longyue Wang

    Abstract: Machine Translation (MT) has greatly advanced over the years due to the developments in deep neural networks. However, the emergence of Large Language Models (LLMs) like GPT-4 and ChatGPT is introducing a new phase in the MT domain. In this context, we believe that the future of MT is intricately tied to the capabilities of LLMs. These models not only offer vast linguistic understandings but also…

    Submitted 1 April, 2024; v1 submitted 1 May, 2023; originally announced May 2023.

    Comments: Accepted to LREC-COLING 2024

  21. arXiv:2304.01746  [pdf, other]

    cs.CL

    Is ChatGPT a Highly Fluent Grammatical Error Correction System? A Comprehensive Evaluation

    Authors: Tao Fang, Shu Yang, Kaixin Lan, Derek F. Wong, Jinpeng Hu, Lidia S. Chao, Yue Zhang

    Abstract: ChatGPT, a large-scale language model based on the advanced GPT-3.5 architecture, has shown remarkable potential in various Natural Language Processing (NLP) tasks. However, there is currently a dearth of comprehensive study exploring its potential in the area of Grammatical Error Correction (GEC). To showcase its capabilities in GEC, we design zero-shot chain-of-thought (CoT) and few-shot CoT set…

    Submitted 4 April, 2023; originally announced April 2023.

  22. arXiv:2303.12723  [pdf, other]

    cs.CV

    AdaOPC: A Self-Adaptive Mask Optimization Framework For Real Design Patterns

    Authors: Wenqian Zhao, Xufeng Yao, Ziyang Yu, Guojin Chen, Yuzhe Ma, Bei Yu, Martin D. F. Wong

    Abstract: Optical proximity correction (OPC) is a widely-used resolution enhancement technique (RET) for printability optimization. Recently, rigorous numerical optimization and fast machine learning are the research focus of OPC in both academia and industry, each of which complements the other in terms of robustness or efficiency. We inspect the pattern distribution on a design layer and find that differe…

    Submitted 15 March, 2023; originally announced March 2023.

  23. A High-Performance Accelerator for Super-Resolution Processing on Embedded GPU

    Authors: Wenqian Zhao, Qi Sun, Yang Bai, Wenbo Li, Haisheng Zheng, Bei Yu, Martin D. F. Wong

    Abstract: Recent years have witnessed impressive progress in super-resolution (SR) processing. However, its real-time inference requirement sets a challenge not only for the model design but also for the on-chip implementation. In this paper, we implement a full-stack SR acceleration framework on embedded GPU devices. The special dictionary learning algorithm used in SR models was analyzed in detail and acc…

    Submitted 15 March, 2023; originally announced March 2023.

  24. arXiv:2303.08435  [pdf, other]

    cs.CV cs.LG eess.IV

    Physics-Informed Optical Kernel Regression Using Complex-valued Neural Fields

    Authors: Guojin Chen, Zehua Pei, Haoyu Yang, Yuzhe Ma, Bei Yu, Martin D. F. Wong

    Abstract: Lithography is fundamental to integrated circuit fabrication, necessitating large computation overhead. The advancement of machine learning (ML)-based lithography models alleviates the trade-offs between manufacturing process expense and capability. However, all previous methods regard the lithography system as an image-to-image black box mapping, utilizing network parameters to learn by rote mapp…

    Submitted 9 April, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: Accepted by DAC23

  25. arXiv:2302.08975  [pdf, other]

    cs.CL

    Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors

    Authors: Keqin Bao, Yu Wan, Dayiheng Liu, Baosong Yang, Wenqiang Lei, Xiangnan He, Derek F. Wong, Jun Xie

    Abstract: Fine-grained information on translation errors is helpful for the translation evaluation community. Existing approaches can not synchronously consider error position and type, failing to integrate the error information of both. In this paper, we propose Fine-Grained Translation Error Detection (FG-TED) task, aiming at identifying both the position and the type of translation errors on given source…

    Submitted 17 February, 2023; originally announced February 2023.

  26. arXiv:2212.10179  [pdf, other]

    cs.CL

    Toward Human-Like Evaluation for Natural Language Generation with Error Analysis

    Authors: Qingyu Lu, Liang Ding, Liping Xie, Kanjian Zhang, Derek F. Wong, Dacheng Tao

    Abstract: The state-of-the-art language model-based automatic metrics, e.g. BARTScore, benefiting from large-scale contextualized pre-training, have been successfully used in a wide range of natural language generation (NLG) tasks, including machine translation, text summarization, and data-to-text. Recent studies show that considering both major errors (e.g. mistranslated tokens) and minor errors (e.g. imp…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: work in progress

  27. arXiv:2212.04262  [pdf, other]

    cs.CL cs.AI cs.LG

    ConsistTL: Modeling Consistency in Transfer Learning for Low-Resource Neural Machine Translation

    Authors: Zhaocong Li, Xuebo Liu, Derek F. Wong, Lidia S. Chao, Min Zhang

    Abstract: Transfer learning is a simple and powerful method that can be used to boost model performance of low-resource neural machine translation (NMT). Existing transfer learning methods for NMT are static, which simply transfer knowledge from a parent model to a child model once via parameter initialization. In this paper, we propose a novel transfer learning method for NMT, namely ConsistTL, which can c…

    Submitted 8 December, 2022; originally announced December 2022.

    Comments: Accepted to EMNLP 2022

  28. arXiv:2210.10049  [pdf, other]

    cs.CL

    Alibaba-Translate China's Submission for WMT 2022 Quality Estimation Shared Task

    Authors: Keqin Bao, Yu Wan, Dayiheng Liu, Baosong Yang, Wenqiang Lei, Xiangnan He, Derek F. Wong, Jun Xie

    Abstract: In this paper, we present our submission to the sentence-level MQM benchmark at Quality Estimation Shared Task, named UniTE (Unified Translation Evaluation). Specifically, our systems employ the framework of UniTE, which combined three types of input formats during training with a pre-trained language model. First, we apply the pseudo-labeled data examples for the continuously pre-training phase.…

    Submitted 17 February, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

    Comments: WMT 2022 QE Shared Task. arXiv admin note: text overlap with arXiv:2210.09683

  29. arXiv:2210.09683  [pdf, other]

    cs.CL

    Alibaba-Translate China's Submission for WMT 2022 Metrics Shared Task

    Authors: Yu Wan, Keqin Bao, Dayiheng Liu, Baosong Yang, Derek F. Wong, Lidia S. Chao, Wenqiang Lei, Jun Xie

    Abstract: In this report, we present our submission to the WMT 2022 Metrics Shared Task. We build our system based on the core idea of UNITE (Unified Translation Evaluation), which unifies source-only, reference-only, and source-reference-combined evaluation scenarios into one single model. Specifically, during the model pre-training phase, we first apply the pseudo-labeled data examples to continuously pre…

    Submitted 17 February, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

    Comments: WMT 2022 Metrics Shared Task

  30. Attention Mechanism with Energy-Friendly Operations

    Authors: Yu Wan, Baosong Yang, Dayiheng Liu, Rong Xiao, Derek F. Wong, Haibo Zhang, Boxing Chen, Lidia S. Chao

    Abstract: Attention mechanism has become the dominant module in natural language processing models. It is computationally intensive and depends on massive power-hungry multiplications. In this paper, we rethink variants of attention mechanism from the energy consumption aspects. After reaching the conclusion that the energy costs of several energy-friendly operations are far less than their multiplication c…

    Submitted 28 April, 2022; originally announced April 2022.

    Comments: Findings@ACL2022

  31. arXiv:2204.13352  [pdf, other]

    cs.CL

    RoBLEURT Submission for the WMT2021 Metrics Task

    Authors: Yu Wan, Dayiheng Liu, Baosong Yang, Tianchi Bi, Haibo Zhang, Boxing Chen, Weihua Luo, Derek F. Wong, Lidia S. Chao

    Abstract: In this paper, we present our submission to Shared Metrics Task: RoBLEURT (Robustly Optimizing the training of BLEURT). After investigating the recent advances of trainable metrics, we conclude several aspects of vital importance to obtain a well-performed metric model by: 1) jointly leveraging the advantages of source-included model and reference-only model, 2) continuously pre-training the model…

    Submitted 28 April, 2022; originally announced April 2022.

    Comments: WMT2021 Metrics Shared Task

  32. UniTE: Unified Translation Evaluation

    Authors: Yu Wan, Dayiheng Liu, Baosong Yang, Haibo Zhang, Boxing Chen, Derek F. Wong, Lidia S. Chao

    Abstract: Translation quality evaluation plays a crucial role in machine translation. According to the input format, it is mainly separated into three tasks, i.e., reference-only, source-only and source-reference-combined. Recent methods, despite their promising results, are specifically designed and optimized on one of them. This limits the convenience of these methods, and overlooks the commonalities amon…

    Submitted 28 April, 2022; originally announced April 2022.

    Comments: ACL2022

  33. arXiv:2111.04079  [pdf, other]

    cs.CL

    Variance-Aware Machine Translation Test Sets

    Authors: Runzhe Zhan, Xuebo Liu, Derek F. Wong, Lidia S. Chao

    Abstract: We release 70 small and discriminative test sets for machine translation (MT) evaluation called variance-aware test sets (VAT), covering 35 translation directions from WMT16 to WMT20 competitions. VAT is automatically created by a novel variance-aware filtering method that filters the indiscriminative test instances of the current MT test sets without any human labor. Experimental results show tha…

    Submitted 7 November, 2021; originally announced November 2021.

    Comments: Accepted to NeurIPS 2021 Datasets and Benchmarks Track

  34. arXiv:2110.01811  [pdf, other]

    cs.CL cs.LG

    On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation

    Authors: Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Shuming Shi, Zhaopeng Tu

    Abstract: Pre-training (PT) and back-translation (BT) are two simple and powerful methods to utilize monolingual data for improving the model performance of neural machine translation (NMT). This paper takes the first step to investigate the complementarity between PT and BT. We introduce two probing tasks for PT and BT respectively and find that PT mainly contributes to the encoder module while BT brings m…

    Submitted 5 October, 2021; originally announced October 2021.

    Comments: Accepted to Findings of EMNLP 2021

  35. arXiv:2107.14402  [pdf, other]

    cs.CL cs.AI

    Difficulty-Aware Machine Translation Evaluation

    Authors: Runzhe Zhan, Xuebo Liu, Derek F. Wong, Lidia S. Chao

    Abstract: The high-quality translation results produced by machine translation (MT) systems still pose a huge challenge for automatic evaluation. Current MT evaluation pays the same attention to each sentence component, while the questions of real-world examinations (e.g., university examinations) have different difficulties and weightings. In this paper, we propose a novel difficulty-aware MT evaluation me…

    Submitted 29 July, 2021; originally announced July 2021.

    Comments: Accepted to ACL 2021

  36. arXiv:2107.08212  [pdf, other]

    cs.CL cs.LG

    On the Copying Behaviors of Pre-Training for Neural Machine Translation

    Authors: Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Shuming Shi, Zhaopeng Tu

    Abstract: Previous studies have shown that initializing neural machine translation (NMT) models with the pre-trained language models (LM) can speed up the model training and boost the model performance. In this work, we identify a critical side-effect of pre-training for NMT, which is due to the discrepancy between the training objectives of LM-based pre-training and NMT. Since the LM objective learns to re…

    Submitted 17 July, 2021; originally announced July 2021.

    Comments: Accepted to Findings of ACL 2021

  37. arXiv:2106.05546  [pdf, other]

    cs.CL

    Progressive Multi-Granularity Training for Non-Autoregressive Translation

    Authors: Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, Dacheng Tao, Zhaopeng Tu

    Abstract: Non-autoregressive translation (NAT) significantly accelerates the inference process via predicting the entire target sequence. However, recent studies show that NAT is weak at learning high-mode of knowledge such as one-to-many translations. We argue that modes can be divided into various granularities which can be learned from easy to hard. In this study, we empirically show that NAT models are…

    Submitted 11 June, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: ACL 2021, Short Findings

  38. arXiv:2106.00903  [pdf, other]

    cs.CL cs.AI

    Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation

    Authors: Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, Dacheng Tao, Zhaopeng Tu

    Abstract: Knowledge distillation (KD) is commonly used to construct synthetic data for training non-autoregressive translation (NAT) models. However, there exists a discrepancy on low-frequency words between the distilled and the original data, leading to more errors on predicting low-frequency words. To alleviate the problem, we directly expose the raw data into NAT by leveraging pretraining. By analyzing…

    Submitted 26 April, 2022; v1 submitted 1 June, 2021; originally announced June 2021.

    Comments: ACL 2021

  39. arXiv:2103.02262  [pdf, other]

    cs.CL cs.LG

    Meta-Curriculum Learning for Domain Adaptation in Neural Machine Translation

    Authors: Runzhe Zhan, Xuebo Liu, Derek F. Wong, Lidia S. Chao

    Abstract: Meta-learning has been sufficiently validated to be beneficial for low-resource neural machine translation (NMT). However, we find that meta-trained NMT fails to improve the translation performance of the domain unseen at the meta-training stage. In this paper, we aim to alleviate this issue by proposing a novel meta-curriculum learning for domain adaptation in NMT. During meta-training, the NMT f…

    Submitted 3 March, 2021; originally announced March 2021.

    Comments: Accepted to AAAI 2021

  40. arXiv:2012.14768  [pdf, other]

    cs.CL cs.LG

    Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning

    Authors: Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Zhaopeng Tu

    Abstract: Encoder layer fusion (EncoderFusion) is a technique to fuse all the encoder layers (instead of the uppermost layer) for sequence-to-sequence (Seq2Seq) models, which has proven effective on various NLP tasks. However, it is still not entirely clear why and when EncoderFusion should work. In this paper, our main contribution is to take a step further in understanding EncoderFusion. Many of previous…

    Submitted 18 March, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

    Comments: Accepted to ICLR 2021

  41. arXiv:2012.14583  [pdf, other]

    cs.CL

    Understanding and Improving Lexical Choice in Non-Autoregressive Translation

    Authors: Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, Dacheng Tao, Zhaopeng Tu

    Abstract: Knowledge distillation (KD) is essential for training non-autoregressive translation (NAT) models by reducing the complexity of the raw data with an autoregressive teacher model. In this study, we empirically show that as a side effect of this training, the lexical choice errors on low-frequency words are propagated to the NAT model from the teacher model. To alleviate this problem, we propose to…

    Submitted 27 January, 2021; v1 submitted 28 December, 2020; originally announced December 2020.

    Comments: ICLR 2021

  42. arXiv:2012.03477  [pdf, other]

    cs.CL

    Document Graph for Neural Machine Translation

    Authors: Mingzhou Xu, Liangyou Li, Derek F. Wong, Qun Liu, Lidia S. Chao

    Abstract: Previous works have shown that contextual information can improve the performance of neural machine translation (NMT). However, most existing document-level NMT methods only consider a few number of previous sentences. How to make use of the whole document as global contexts is still a challenge. To address this issue, we hypothesize that a document can be represented as a graph that connects rele…

    Submitted 14 September, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

    Comments: Accepted by EMNLP2021

  43. Self-Paced Learning for Neural Machine Translation

    Authors: Yu Wan, Baosong Yang, Derek F. Wong, Yikai Zhou, Lidia S. Chao, Haibo Zhang, Boxing Chen

    Abstract: Recent studies have shown that the training of neural machine translation (NMT) can be facilitated by mimicking the learning process of humans. Nevertheless, the success of this kind of curriculum learning relies on the quality of an artificial schedule drawn up with handcrafted features, e.g. sentence length or word rarity. We ameliorate this procedure in a more flexible manner by proposing se…

    Submitted 13 October, 2020; v1 submitted 9 October, 2020; originally announced October 2020.

    Comments: Accepted by EMNLP 2020

  44. Modeling Voting for System Combination in Machine Translation

    Authors: Xuancheng Huang, Jiacheng Zhang, Zhixing Tan, Derek F. Wong, Huanbo Luan, Jingfang Xu, Maosong Sun, Yang Liu

    Abstract: System combination is an important technique for combining the hypotheses of different machine translation systems to improve translation performance. Although early statistical approaches to system combination have been proven effective in analyzing the consensus between hypotheses, they suffer from the error propagation problem due to the use of pipelines. While this problem has been alleviated…

    Submitted 14 July, 2020; originally announced July 2020.

    Comments: Accepted by the main track of IJCAI 2020; the sole copyright holder is IJCAI (International Joint Conferences on Artificial Intelligence), all rights reserved. https://www.ijcai.org/Proceedings/2020/511

  45. arXiv:2006.02014

    cs.CL cs.LG

    Norm-Based Curriculum Learning for Neural Machine Translation

    Authors: Xuebo Liu, Houtim Lai, Derek F. Wong, Lidia S. Chao

    Abstract: A neural machine translation (NMT) system is expensive to train, especially in high-resource settings. As NMT architectures become deeper and wider, this issue gets worse. In this paper, we aim to improve the efficiency of training an NMT system by introducing a novel norm-based curriculum learning method. We use the norm (aka length or module) of a word embedding as a measure of 1) the d…

    Submitted 2 June, 2020; originally announced June 2020.

    Comments: Accepted to ACL 2020

  46. Unsupervised Neural Dialect Translation with Commonality and Diversity Modeling

    Authors: Yu Wan, Baosong Yang, Derek F. Wong, Lidia S. Chao, Haihua Du, Ben C. H. Ao

    Abstract: As a special machine translation task, dialect translation has two main characteristics: 1) a lack of parallel training corpora; and 2) similar grammar on the two sides of the translation. In this paper, we investigate how to exploit the commonality and diversity between dialects to build unsupervised translation models that access only monolingual data. Specifically, we leverage…

    Submitted 11 December, 2019; originally announced December 2019.

    Comments: AAAI 2020

  47. arXiv:1906.03100

    cs.CL cs.AI

    Shared-Private Bilingual Word Embeddings for Neural Machine Translation

    Authors: Xuebo Liu, Derek F. Wong, Yang Liu, Lidia S. Chao, Tong Xiao, Jingbo Zhu

    Abstract: Word embedding is central to neural machine translation (NMT), which has attracted intensive research interest in recent years. In NMT, the source embedding plays the role of the entrance while the target embedding acts as the terminal. These layers occupy most of the model parameters for representation learning. Furthermore, they indirectly interface via a soft-attention mechanism, which makes th…

    Submitted 7 June, 2019; originally announced June 2019.

    Comments: Accepted to ACL 2019

  48. arXiv:1906.01787

    cs.CL cs.LG

    Learning Deep Transformer Models for Machine Translation

    Authors: Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, Lidia S. Chao

    Abstract: Transformer is the state-of-the-art model in recent machine translation evaluations. Two strands of research are promising to improve models of this kind: the first uses wide networks (a.k.a. Transformer-Big) and has been the de facto standard for the development of the Transformer system, and the other uses deeper language representation but faces the difficulty arising from learning deep network…

    Submitted 4 June, 2019; originally announced June 2019.

    Comments: Accepted by ACL 2019

  49. arXiv:1906.00592

    cs.CL cs.AI

    Assessing the Ability of Self-Attention Networks to Learn Word Order

    Authors: Baosong Yang, Longyue Wang, Derek F. Wong, Lidia S. Chao, Zhaopeng Tu

    Abstract: Self-attention networks (SAN) have attracted a lot of interest due to their high parallelization and strong performance on a variety of NLP tasks, e.g. machine translation. Due to the lack of a recurrence structure such as that in recurrent neural networks (RNN), SAN is considered weak at learning positional information of words for sequence modeling. However, neither this speculation has been empirica…

    Submitted 3 June, 2019; originally announced June 2019.

    Comments: ACL 2019

  50. arXiv:1810.13320   

    cs.CL

    Convolutional Self-Attention Network

    Authors: Baosong Yang, Longyue Wang, Derek F. Wong, Lidia S. Chao, Zhaopeng Tu

    Abstract: Self-attention network (SAN) has recently attracted increasing interest due to its fully parallelized computation and flexibility in modeling dependencies. It can be further enhanced with a multi-headed attention mechanism by allowing the model to jointly attend to information from different representation subspaces at different positions (Vaswani et al., 2017). In this work, we propose a novel conv…

    Submitted 8 April, 2019; v1 submitted 31 October, 2018; originally announced October 2018.

    Comments: The latest version of this paper has been uploaded to another link: arXiv:1904.03107