Annie Lee’s Post

Assistant Professor at OntarioTechU

Way to go, team!

David Ifeoluwa Adelani

DeepMind Academic Fellow @UCL

I'm very happy to share our new benchmark, IrokoBench, a human-translated benchmark dataset for 16 African languages covering:
- Natural language inference (AfriXNLI)
- Maths reasoning (AfriMGSM)
- Multi-choice QA (AfriMMLU)

Paper: https://lnkd.in/e_arMaYg
Data: https://lnkd.in/eFwkXYfj

The project is funded by Lacuna Fund with Masakhane NLP.

We cover 18 languages in our benchmark: 16 native African languages (amh, ewe, hau, ibo, kin, lin, lug, orm, sna, sot, swa, twi, wol, xho, yor, and zul) and two European languages (eng and fra), translated from the MGSM, MMLU and XNLI datasets.

We provide zero-shot and few-shot evaluation of 14 LLMs (open-weight and closed models) using lm-eval in two settings:
1) in-language
2) translate-test (where the test sets were automatically translated to English using NLLB-200-3B)

GPT-4o is the best across all tasks for native African languages. However, its performance is worse for eng and fra, where GPT-4-Turbo is better by more than +9.0. Aya-101 is the best open model, but in the translate-test setting LLaMa 3 70B is better since it's more English-centric.

In few-shot evaluation, LLaMa 3 70B benefits significantly from few-shot examples for AfriMMLU and AfriXNLI but not for AfriMGSM, since it's only able to reason effectively on maths in English. GPT-4o consistently improves with additional few-shot examples.

This is a great collaboration with many authors: Jessica Ojo, Israel Abebe Azime, Jesujoba Alabi, Millicent Ochieng, Sara Hooker, Andiswa Bukula, Annie Lee, Happy Buzaaba, Blessing Sibanda, Jonathan Mukiibi, Salomey Osei, Salomon Kabongo KABENAMUALU, Foutse Yuehgoh, Rooweither Mabuya, Shamsuddeen Hassan Muhammad, PhD, sokhar samb, Mmasibidi Setaka, Lolwethu Ndolela, Nkiruka Odu, Tadesse Kebede, Xuanli He, Pontus Stenetorp.

A big thank you to OpenAI, Cohere For AI and Oracle for the compute/API credits.
