Skip to main content

Showing 1–18 of 18 results for author: D'Haro, L F

  1. arXiv:2406.05967  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

    Authors: David Romero, Chenyang Lyu, Haryo Akbarianto Wibowo, Teresa Lynn, Injy Hamed, Aditya Nanda Kishore, Aishik Mandal, Alina Dragonetti, Artem Abzaliev, Atnafu Lambebo Tonja, Bontu Fufa Balcha, Chenxi Whitehouse, Christian Salamea, Dan John Velasco, David Ifeoluwa Adelani, David Le Meur, Emilio Villa-Cueva, Fajri Koto, Fauzan Farooqui, Frederico Belcavello, Ganzorig Batnasan, Gisela Vallejo, Grainne Caulfield, Guido Ivetta, Haiyue Song , et al. (50 additional authors not shown)

    Abstract: Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recen… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  2. arXiv:2405.14646  [pdf, other

    cs.CL

    Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models

    Authors: Yiming Chen, Chen Zhang, Danqing Luo, Luis Fernando D'Haro, Robby T. Tan, Haizhou Li

    Abstract: The automatic evaluation of natural language generation (NLG) systems presents a long-lasting challenge. Recent studies have highlighted various neural metrics that align well with human evaluations. Yet, the robustness of these evaluators against adversarial perturbations remains largely under-explored due to the unique challenges in obtaining adversarial data for different NLG evaluation tasks.… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: ACL24 Finding

  3. arXiv:2402.09030  [pdf, other

    cs.RO

    Awareness in robotics: An early perspective from the viewpoint of the EIC Pathfinder Challenge "Awareness Inside''

    Authors: Cosimo Della Santina, Carlos Hernandez Corbato, Burak Sisman, Luis A. Leiva, Ioannis Arapakis, Michalis Vakalellis, Jean Vanderdonckt, Luis Fernando D'Haro, Guido Manzi, Cristina Becchio, Aïda Elamrani, Mohsen Alirezaei, Ginevra Castellano, Dimos V. Dimarogonas, Arabinda Ghosh, Sofie Haesaert, Sadegh Soudjani, Sybert Stroeve, Paul Verschure, Davide Bacciu, Ophelia Deroy, Bahador Bahrami, Claudio Gallicchio, Sabine Hauert, Ricardo Sanz , et al. (6 additional authors not shown)

    Abstract: Consciousness has been historically a heavily debated topic in engineering, science, and philosophy. On the contrary, awareness had less success in raising the interest of scholars in the past. However, things are changing as more and more researchers are getting interested in answering questions concerning what awareness is and how it can be artificially generated. The landscape is rapidly evolvi… ▽ More

    Submitted 14 February, 2024; originally announced February 2024.

  4. arXiv:2312.15407  [pdf, other

    cs.CL

    A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators

    Authors: Chen Zhang, Luis Fernando D'Haro, Yiming Chen, Malu Zhang, Haizhou Li

    Abstract: Automatic evaluation is an integral aspect of dialogue system research. The traditional reference-based NLG metrics are generally found to be unsuitable for dialogue assessment. Consequently, recent studies have suggested various unique, reference-free neural metrics that better align with human evaluations. Notably among them, large language models (LLMs), particularly the instruction-tuned varia… ▽ More

    Submitted 20 January, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

    Comments: An extended version of AAAI-2024 camera-ready paper (appendix included, 16 pages)

  5. arXiv:2310.08958  [pdf, other

    cs.CL

    xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark

    Authors: Chen Zhang, Luis Fernando D'Haro, Chengguang Tang, Ke Shi, Guohua Tang, Haizhou Li

    Abstract: Recent advancements in reference-free learned metrics for open-domain dialogue evaluation have been driven by the progress in pre-trained language models and the availability of dialogue data with high-quality human annotations. However, current studies predominantly concentrate on English dialogues, and the generalization of these metrics to other languages has not been fully examined. This is la… ▽ More

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP-2023 Findings

  6. arXiv:2306.12794  [pdf, other

    cs.CL

    Overview of Robust and Multilingual Automatic Evaluation Metrics for Open-Domain Dialogue Systems at DSTC 11 Track 4

    Authors: Mario Rodríguez-Cantelar, Chen Zhang, Chengguang Tang, Ke Shi, Sarik Ghazarian, João Sedoc, Luis Fernando D'Haro, Alexander Rudnicky

    Abstract: The advent and fast development of neural networks have revolutionized the research on dialogue systems and subsequently have triggered various challenges regarding their automatic evaluation. Automatic evaluation of open-domain dialogue systems as an open challenge has been the center of the attention of many researchers. Despite the consistent efforts to improve automatic metrics' correlations w… ▽ More

    Submitted 13 September, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

  7. arXiv:2212.08992  [pdf, other

    cs.CL

    PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment

    Authors: Chen Zhang, Luis Fernando D'Haro, Qiquan Zhang, Thomas Friedrichs, Haizhou Li

    Abstract: Chatbots are expected to be knowledgeable across multiple domains, e.g. for daily chit-chat, exchange of information, and grounding in emotional situations. To effectively measure the quality of such conversational agents, a model-based automatic dialogue evaluation metric (ADEM) is expected to perform well across multiple domains. Despite significant progress, an ADEM that works well in one domai… ▽ More

    Submitted 17 December, 2022; originally announced December 2022.

    Comments: Currently under review at TASLP, upload to arxiv for easy cross-reference

  8. arXiv:2210.13832  [pdf, other

    cs.CL

    FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation

    Authors: Chen Zhang, Luis Fernando D'Haro, Qiquan Zhang, Thomas Friedrichs, Haizhou Li

    Abstract: Recent model-based reference-free metrics for open-domain dialogue evaluation exhibit promising correlations with human judgment. However, they either perform turn-level evaluation or look at a single dialogue quality dimension. One would expect a good evaluation metric to assess multiple quality dimensions at the dialogue level. To this end, we are motivated to propose a multi-dimensional dialogu… ▽ More

    Submitted 29 October, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

    Comments: EMNLP-2022, 20 pages

  9. arXiv:2203.10012  [pdf, ps, other

    cs.CL cs.AI cs.LG

    Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges

    Authors: Shikib Mehri, Jinho Choi, Luis Fernando D'Haro, Jan Deriu, Maxine Eskenazi, Milica Gasic, Kallirroi Georgila, Dilek Hakkani-Tur, Zekang Li, Verena Rieser, Samira Shaikh, David Traum, Yi-Ting Yeh, Zhou Yu, Yizhe Zhang, Chen Zhang

    Abstract: This is a report on the NSF Future Directions Workshop on Automatic Evaluation of Dialog. The workshop explored the current state of the art along with its limitations and suggested promising directions for future work in this important and very rapidly changing area of research.

    Submitted 18 March, 2022; originally announced March 2022.

    Comments: Report from the NSF AED Workshop (http://dialrc.org/AED/)

  10. arXiv:2112.07194  [pdf, other

    cs.CL

    MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation

    Authors: Chen Zhang, Luis Fernando D'Haro, Thomas Friedrichs, Haizhou Li

    Abstract: Chatbots are designed to carry out human-like conversations across different domains, such as general chit-chat, knowledge exchange, and persona-grounded conversations. To measure the quality of such conversational agents, a dialogue evaluator is expected to conduct assessment across domains as well. However, most of the state-of-the-art automatic dialogue evaluation metrics (ADMs) are not designe… ▽ More

    Submitted 16 January, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

    Comments: AAAI-2022 Preprint (corrected the missing citation issue.)

  11. arXiv:2111.02110  [pdf, other

    cs.CL cs.HC

    Automatic Evaluation and Moderation of Open-domain Dialogue Systems

    Authors: Chen Zhang, João Sedoc, Luis Fernando D'Haro, Rafael Banchs, Alexander Rudnicky

    Abstract: The development of Open-Domain Dialogue Systems (ODS)is a trending topic due to the large number of research challenges, large societal and business impact, and advances in the underlying technology. However, the development of these kinds of systems requires two important characteristics:1) automatic evaluation mechanisms that show high correlations with human judgements across multiple dialogue… ▽ More

    Submitted 23 December, 2021; v1 submitted 3 November, 2021; originally announced November 2021.

  12. arXiv:2110.01895  [pdf, other

    cs.CL

    Investigating the Impact of Pre-trained Language Models on Dialog Evaluation

    Authors: Chen Zhang, Luis Fernando D'Haro, Yiming Chen, Thomas Friedrichs, Haizhou Li

    Abstract: Recently, there is a surge of interest in applying pre-trained language models (Pr-LM) in automatic open-domain dialog evaluation. Pr-LMs offer a promising direction for addressing the multi-domain evaluation challenge. Yet, the impact of different Pr-LMs on the performance of automatic metrics is not well-understood. This paper examines 8 different Pr-LMs and studies their impact on three typical… ▽ More

    Submitted 2 November, 2021; v1 submitted 5 October, 2021; originally announced October 2021.

    Comments: Accepted by IWSDS2021 (Long Paper)

  13. arXiv:2106.01112  [pdf, other

    cs.CL

    DynaEval: Unifying Turn and Dialogue Level Evaluation

    Authors: Chen Zhang, Yiming Chen, Luis Fernando D'Haro, Yan Zhang, Thomas Friedrichs, Grandee Lee, Haizhou Li

    Abstract: A dialogue is essentially a multi-turn interaction among interlocutors. Effective evaluation metrics should reflect the dynamics of such interaction. Existing automatic metrics are focused very much on the turn-level quality, while ignoring such dynamics. To this end, we propose DynaEval, a unified automatic evaluation framework which is not only capable of performing turn-level evaluation, but al… ▽ More

    Submitted 6 June, 2021; v1 submitted 2 June, 2021; originally announced June 2021.

    Comments: ACL-IJCNLP 2021 (Main conference, Long paper)

  14. arXiv:2011.06486  [pdf, ps, other

    cs.CL

    Overview of the Ninth Dialog System Technology Challenge: DSTC9

    Authors: Chulaka Gunasekara, Seokhwan Kim, Luis Fernando D'Haro, Abhinav Rastogi, Yun-Nung Chen, Mihail Eric, Behnam Hedayatnia, Karthik Gopalakrishnan, Yang Liu, Chao-Wei Huang, Dilek Hakkani-Tür, Jinchao Li, Qi Zhu, Lingxiao Luo, Lars Liden, Kaili Huang, Shahin Shayandeh, Runze Liang, Baolin Peng, Zheng Zhang, Swadheen Shukla, Minlie Huang, Jianfeng Gao, Shikib Mehri, Yulan Feng , et al. (14 additional authors not shown)

    Abstract: This paper introduces the Ninth Dialog System Technology Challenge (DSTC-9). This edition of the DSTC focuses on applying end-to-end dialog technologies for four distinct tasks in dialog systems, namely, 1. Task-oriented dialog Modeling with unstructured knowledge access, 2. Multi-domain task-oriented dialog, 3. Interactive evaluation of dialog, and 4. Situated interactive multi-modal dialog. This… ▽ More

    Submitted 12 November, 2020; originally announced November 2020.

  15. arXiv:1910.07150  [pdf, other

    cs.CL cs.AI cs.LG

    Joint Learning of Word and Label Embeddings for Sequence Labelling in Spoken Language Understanding

    Authors: Jiewen Wu, Luis Fernando D'Haro, Nancy F. Chen, Pavitra Krishnaswamy, Rafael E. Banchs

    Abstract: We propose an architecture to jointly learn word and label embeddings for slot filling in spoken language understanding. The proposed approach encodes labels using a combination of word embeddings and straightforward word-label association from the training data. Compared to the state-of-the-art methods, our approach does not require label embeddings as part of the input and therefore lends itself… ▽ More

    Submitted 15 October, 2019; originally announced October 2019.

    Comments: Accepted for publication at ASRU 2019

  16. arXiv:1901.03461  [pdf, ps, other

    cs.CL

    Dialog System Technology Challenge 7

    Authors: Koichiro Yoshino, Chiori Hori, Julien Perez, Luis Fernando D'Haro, Lazaros Polymenakos, Chulaka Gunasekara, Walter S. Lasecki, Jonathan K. Kummerfeld, Michel Galley, Chris Brockett, Jianfeng Gao, Bill Dolan, Xiang Gao, Huda Alamari, Tim K. Marks, Devi Parikh, Dhruv Batra

    Abstract: This paper introduces the Seventh Dialog System Technology Challenges (DSTC), which use shared datasets to explore the problem of building dialog systems. Recently, end-to-end dialog modeling approaches have been applied to various dialog tasks. The seventh DSTC (DSTC7) focuses on developing technologies related to end-to-end dialog systems for (1) sentence selection, (2) sentence generation and (… ▽ More

    Submitted 10 January, 2019; originally announced January 2019.

    Comments: This paper is presented at NIPS2018 2nd Conversational AI workshop

  17. arXiv:1711.01714  [pdf, other

    cs.CV

    End-to-End Video Classification with Knowledge Graphs

    Authors: Fang Yuan, Zhe Wang, Jie Lin, Luis Fernando D'Haro, Kim Jung Jae, Zeng Zeng, Vijay Chandrasekhar

    Abstract: Video understanding has attracted much research attention especially since the recent availability of large-scale video benchmarks. In this paper, we address the problem of multi-label video classification. We first observe that there exists a significant knowledge gap between how machines and humans learn. That is, while current machine learning approaches including deep neural networks largely f… ▽ More

    Submitted 5 November, 2017; originally announced November 2017.

    Comments: 9 pages, 5 figures

  18. arXiv:1706.05461  [pdf, other

    cs.CV

    Truly Multi-modal YouTube-8M Video Classification with Video, Audio, and Text

    Authors: Zhe Wang, Kingsley Kuan, Mathieu Ravaut, Gaurav Manek, Sibo Song, Yuan Fang, Seokhwan Kim, Nancy Chen, Luis Fernando D'Haro, Luu Anh Tuan, Hongyuan Zhu, Zeng Zeng, Ngai Man Cheung, Georgios Piliouras, Jie Lin, Vijay Chandrasekhar

    Abstract: The YouTube-8M video classification challenge requires teams to classify 0.7 million videos into one or more of 4,716 classes. In this Kaggle competition, we placed in the top 3% out of 650 participants using released video and audio features. Beyond that, we extend the original competition by including text information in the classification, making this a truly multi-modal approach with vision, a… ▽ More

    Submitted 9 July, 2017; v1 submitted 16 June, 2017; originally announced June 2017.

    Comments: 8 pages, Accepted to CVPR'17 Workshop on YouTube-8M Large-Scale Video Understanding