We're thrilled to announce that a research paper by our Senior NLP Scientist Adam King has been accepted to the 25th Annual Conference of the European Association for Machine Translation! 🎉
Adam's research focuses on training more performant multilingual classification models using translated training data, pushing the boundaries of what's possible in machine translation. 🌐✨ The conference will be held at the University of Sheffield from June 24-27.
Check out his research paper here: https://lnkd.in/g3zZDcDR #EAMT2024
Partial or full automation of translation quality evaluation (TQE) is the Holy Grail. If algorithms could reliably flag the segments most likely to require human post-editing, machine translation (MT) could be used at scale for a much broader range of content.
Maria Stasimioti from Slator reports on promising research that combines Large Language Models (LLMs) with human annotations so that the LLM receives feedback quite similar to that of human evaluators. The idea is to train the model until it is ready to categorise errors using the Multidimensional Quality Metrics (MQM) framework and to derive a score from that fine-grained auto-generated feedback.
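To make the scoring step concrete, here is a minimal sketch (my own illustration, not from the paper) of how MQM-style error annotations are typically converted into a segment score: each error carries a severity weight, the weighted penalties are summed, normalised per 100 words, and subtracted from a maximum score. The weights below follow common MQM practice (minor=1, major=5, critical=10); the exact scheme used in the paper may differ.

```python
# Illustrative MQM-style scoring: severity weights and normalisation
# follow common practice, not necessarily the paper's exact setup.
SEVERITY_WEIGHTS = {"neutral": 0, "minor": 1, "major": 5, "critical": 10}

def mqm_score(errors, word_count, max_score=100.0):
    """Convert (category, severity) annotations into a segment score.

    Penalties are summed, normalised per 100 words, and subtracted
    from the maximum score.
    """
    penalty = sum(SEVERITY_WEIGHTS[sev] for _, sev in errors)
    return max_score - penalty * (100.0 / max(word_count, 1))

# Example: two minor errors and one major error in a 50-word segment.
errors = [("fluency", "minor"), ("accuracy", "minor"), ("accuracy", "major")]
print(mqm_score(errors, word_count=50))  # 100 - 7 * 2 = 86.0
```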
I find this rather exciting.
Will read more tonight (it's a 19-page paper available on arXiv: https://lnkd.in/ejkU6M9b).
https://ow.ly/GS6U50PFkJV
Last week at AACL-IJCNLP 2023, Jay Gala, Pranjal Chitale and I delivered a tutorial on "Massively Multilingual Machine Translation for Related Languages". If you are interested in this but could not attend, we are making everything available:
https://lnkd.in/djf46rQa
The GitHub repo contains our slides, the recorded talk, and all the papers we consulted while preparing the tutorial. We are happy to present this tutorial again upon request, so please feel free to reach out to us.
We hope that this helps motivate further research into language relatedness for massively multilingual machine translation. A big thanks to Prof Kurohashi for motivating us to submit a tutorial application. Also special thanks to Varun Gumma for their feedback.
This tutorial is a part of the series of tutorials on:
a. NMT (https://lnkd.in/dnTbgMPW) and
b. Multilingual Machine Translation (https://lnkd.in/d6tepmwu)
Feel free to take a look and reach out if you have any questions.
🌟 Excited to share insights from my research into #GenAI for Machine Translation! 🚀🔍
In my Master's dissertation, I've explored the dynamic world of Neural Machine Translation (NMT), focusing specifically on the intricate nuances of spatial language in TED Talk subtitles. 🌍💬
Research Questions:
🔹 How do leading open-source Large Language Models (#LLMs) like Llama 2, Gemma, and Mistral compare against established NMT giants such as Google and DeepL?
🔹 What potential lies in leveraging #GenAI for machine translation in terms of accuracy, fluency, and post-editing needs when translating spatial prepositions like "across", "through", "into", and "onto", which pose particular challenges from English to Portuguese?
🔹 How does human translation compare? Is it always "the gold standard"?
To achieve this, I'm examining how these systems tackle issues of prepositional semantics such as crosslinguistic variation, polysemy, and idiomatic expressions, all intrinsic to spatial language. 🧠💭
🔶Evaluation Metrics:
To evaluate the effectiveness of these systems, I'm analyzing a range of established metrics such as BLEU, METEOR, BERTScore, and COMET☄️. Additionally, I'm using human evaluation to assess these scores' accuracy in capturing spatial preposition nuances and overall translation quality.
Stay tuned for forthcoming insights and revelations from my research journey!
🎓✨ #LLMs #GenAI #MachineTranslation #NeuralMT #SpatialLanguage #ResearchInsights #LinkedInLearning
[🤖] "We show that Claude 3 Opus, a large language model (LLM) released by Anthropic in March 2024, exhibits stronger machine translation competence than other LLMs. Though we find evidence of data contamination with Claude on FLORES-200, we curate new benchmarks that corroborate the effectiveness of Claude for low-resource machine translation into English. We find that Claude has remarkable resource efficiency – the degree to which the quality of the translation model depends on a language pair’s resource level. Finally, we show that advancements in LLM translation can be compressed into traditional neural machine translation (NMT) models. Using Claude to generate synthetic data, we demonstrate that knowledge distillation advances the state-of-the-art in Yoruba-English translation, meeting or surpassing strong baselines like NLLB-54B and Google Translate."
“Some uses [of machine translation] are low-risk, others much higher. You have to consider the context in which these tools are used and their impact.”
– Dr. Benoît Dubreuil, Quebec’s French language commissioner
Read more:
Machine translation: a game changer in science – The English language may be king in science, but neural machine translation could well put an end to its dominance.
https://lnkd.in/emz6eSH4
🚀 We're thrilled to share the preprint of our paper on multilingual Community Question-Answering (CQA) portals that we've been working on. Our research addresses the challenges of language barriers, particularly when translating noisy questions.
Paper Link: https://lnkd.in/d9AfNjBB
🔑 Our approach centers around "Reference-Free Domain Adaptation for Translation of Noisy Questions with Question-Specific Rewards." In this paper, we introduce several key contributions:
Reference-Free Training: We've devised a novel methodology for fine-tuning Neural Machine Translation (NMT) systems using only source-side data. This means we no longer rely on synthetic target data, making our method more robust.
Adequacy and Fluency Balance: To ensure our translations maintain both adequacy and fluency, we employ a combination of BERTScore and Masked Language Model (MLM) Score.
Impressive Results: Our model outperforms the traditional Maximum Likelihood Estimation (MLE) based fine-tuning approach by a remarkable 1.9 BLEU points.
Open Source Initiative: In the spirit of collaboration and advancing the field, we've made our code and datasets publicly available. You can access them here: https://lnkd.in/dsXTZjsH
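For readers wanting a feel for the adequacy/fluency balance described above, here is a hedged Python sketch of combining the two signals into a single training reward. The two scorer functions are toy stand-ins (the paper uses actual BERTScore and masked-language-model scores); only the weighted-combination structure is the point.

```python
# Sketch of an adequacy + fluency reward, with toy stand-in scorers.
# The real system would use BERTScore (adequacy) and an MLM
# pseudo-log-likelihood (fluency) in place of these placeholders.

def adequacy_score(source: str, translation: str) -> float:
    """Toy stand-in for a BERTScore-style semantic similarity.

    Reference-free, so it compares the translation against the
    source side; here, a simple Jaccard word overlap.
    """
    src, hyp = set(source.lower().split()), set(translation.lower().split())
    return len(src & hyp) / max(len(src | hyp), 1)

def fluency_score(translation: str) -> float:
    """Toy stand-in for an MLM score squashed to [0, 1]."""
    # Crude proxy: prefer outputs near a "typical" length of 10 tokens.
    return 1.0 / (1.0 + abs(len(translation.split()) - 10) / 10.0)

def question_reward(source: str, translation: str, alpha: float = 0.5) -> float:
    """Weighted combination used as the training reward."""
    return alpha * adequacy_score(source, translation) + \
           (1 - alpha) * fluency_score(translation)
```

Weighting the two terms (here via a hypothetical `alpha`) is what keeps the model from trading fluency for adequacy, or vice versa, during reference-free fine-tuning.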
Published at: Findings of EMNLP 2023
📚 If you're curious to delve deeper into the details, feel free to check out the full paper: "Reference-Free Domain Adaptation for Translation of Noisy Questions with Question-Specific Rewards," authored by Baban Gain, Ramakrishna Appicharla, Soumya Chennabasavaraj, Nikesh Garera, Asif Ekbal, and Muthusamy Chelliah.
Link: https://lnkd.in/d9AfNjBB #AI #NMT #Research #EMNLP2023 #LanguageTechnology #Community #QuestionAnswering #Innovation #OpenSource #Collaboration
Call for Papers: Humor and Artificial Intelligence Panel
We invite 20-minute presentations on AI-based technology for generating, processing, or analyzing humor, for our dedicated panel that kicks off ISHS's 2024 webinar series.
Application areas include, but are not limited to:
* human–computer interaction
* computer-mediated communication
* intelligent writing assistants
* conversational agents
* machine and computer-assisted translation
* digital humanities
* natural language processing
Hey LinkedIn family!
I'm absolutely thrilled to announce that my latest blog post on Medium, "BLEU score," is now live!
This is part three, the final installment of our "Evaluation RAG for LLM Models" series.
Trust me, it's as fun and easy to understand as a piece of cake 🍰!
In this post, you'll discover:
1- What is BLEU score?
2- How does it work?
3- Examples of good vs. bad translations
Think of BLEU score as the ultimate spelling bee for translations!
And to make it even better, we've included a simple example to help explain everything.
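For readers who want the mechanics right away, here is a simplified, self-contained Python version of sentence-level BLEU (my own sketch; real toolkits like sacreBLEU add smoothing, tokenization rules, and corpus-level aggregation):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped
    n-gram precisions times a brevity penalty (no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each n-gram's count by how often it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        precisions.append(overlap / max(sum(cand_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0  # any empty precision zeroes the geometric mean
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(bleu("the cat is on the mat", "the cat is on the mat"))  # 1.0
```

An exact match scores 1.0, and any candidate sharing no 4-grams with the reference scores 0.0, which is why unsmoothed BLEU is harsh on short or very free translations.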
So, let's keep learning and improving together!
🔗 Read the full blog post on Medium, here: https://lnkd.in/eHFV7S3z
Remember, the journey of a thousand miles begins with a single step.
So, let's begin! 🚀✨
PS:
😜 Please let me know which topic you would like me to cover in the next blog.
#AI #MachineLearning #NLP #BLEUScore #LLM #TechBlog #ArtificialIntelligence #DataScience #MediumBlog #LearningTogether
Our findings paper from the 1st Shared Task on Multi-lingual Multi-task Information Retrieval at the MRL 2023 Workshop.
Abstract:
Our new shared task introduces a novel benchmark to assess the ability of multilingual LLMs to comprehend and produce language under sparse settings, particularly in scenarios with underresourced languages, with an emphasis on the ability to capture logical, factual, or causal relationships within lengthy text contexts. The shared task consists of two sub-tasks crucial to information retrieval: Named Entity Recognition (NER) and Reading Comprehension (RC), in 7 data-scarce languages: Azerbaijani, Igbo, Indonesian, Swiss German, Turkish, Uzbek and Yorùbá, which previously lacked annotated resources in information retrieval tasks. Our evaluation of leading LLMs reveals that, despite their competitive performance, they still have notable weaknesses such as producing output in the non-target language or providing counterfactual information that cannot be inferred from the context. As more advanced models emerge, the benchmark will remain essential for supporting fairness and applicability in information retrieval systems.
https://lnkd.in/gx-WpqtF