Salomey Osei

Bilbao, Basque Country (Euskadi), Spain
2K followers, 500+ connections

About

Salomey is a Ghanaian machine learning and AI researcher from the Ashanti Region. She is…

Experience and Education

  • University of Deusto

Volunteer Experience

  • African Institute for Mathematical Sciences (AIMS)

    Student Assistant

    African Institute for Mathematical Sciences (AIMS)

    - 1 year 7 months

    Science and Technology


    Community Volunteer (supervised by the African Institute for Mathematical Sciences, Cameroon,
    and the Mastercard Foundation)
    Saker Baptist College, St Anne Secondary School, Government High School (Limbe, Cameroon,
    2017-2019)
    * Teaching and mentoring students in the mathematical sciences and preparing lecture notes
    for teachers.
    ** Computer training for non-teaching staff of the University of Buea, Cameroon.

  • Women in Machine Learning

    Volunteer

    Women in Machine Learning

    - 1 month

    Science and Technology

    * Assist breakout session leaders by encouraging participant interaction and taking notes on the discussions, potentially to be shared with participants later.
    ** Direct attendees to the appropriate Zoom links for each event.
    *** Direct attendees reporting code of conduct violations to the WiML and D&I chairs.

  • Volunteer

    ICML

    - 1 month

    Science and Technology

    * Monitoring the #helpdesk Rocket.Chat channel, answering questions, recording to-dos, and helping with important and urgent issues.

  • The 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)

    Volunteer

    The 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)

    - 1 month

    Science and Technology

    As an on-call volunteer:
    * Monitor three channels (#presenter-helpdesk, #helpdesk and #incidents) and the Google email groups to assist conference participants.

  • Ghana NLP

    Team Lead

    Ghana NLP

    - Present, 4 years 4 months

    Science and Technology

    Team Lead for Unsupervised Methods

  • PyCon Africa

    Moderator

    PyCon Africa

    - 1 month

    Science and Technology

    Session moderator for the conference.

  • Women Promoting Science to the Younger Generation

    Tutor

    Women Promoting Science to the Younger Generation

    - 1 month

    Science and Technology

    A two-day Python workshop organized by Women Promoting Science to the Younger Generation.

  • UofT AI

    Mentor

    UofT AI

    - Present, 3 years 11 months

    Science and Technology

    ProjectX is a three-month remote machine learning research competition
    running from September to November 2020. This year's focus is
    climate change, with 22 teams made up of 120 undergraduate students with
    previous research experience from 22 universities across the Americas and
    Europe. Winners present their research at the University of Toronto's annual AI conference in
    February.

    Research subtopics include:

    Infectious Disease
    Weather and Natural Disaster Prediction
    Emissions and Energy Efficiency

    Mentors provide advice and answer questions about the research process and the research topics.

  • Volunteer

    Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020)

    - Present, 3 years 9 months

    Science and Technology

    The purpose of the Neural Information Processing Systems annual meeting is to foster the exchange of research on neural information processing systems in their biological, technological, mathematical, and theoretical aspects.

  • Women in Machine Learning

    Volunteer

    Women in Machine Learning

    - Present, 3 years 9 months

    Science and Technology

    Poster Mentor Guide - WiML 2020:

    To encourage more engagement and further the professional development of poster presenters within the virtual format of WiML this year, we are introducing a poster buddy program. This initiative will pair up experienced researchers with WiML poster presenters based on research interests. We hope to provide poster presenters with valuable feedback about their work and encourage dialog.
    Responsibilities
    Read the poster of your assigned mentee prior to the poster session
    Attend the poster session and find your mentee’s poster
    Ask your mentee to present their poster to you
    Ask questions and provide constructive feedback (when appropriate) to your mentee
    Complete a post-workshop feedback form to help us improve the process

    Volunteer Tasks:

    Super-volunteer:

    You are the contact point between volunteers and organizers. The super-volunteers will check in the volunteers on the day of the workshop and update them on ongoing tasks and responsibilities. If volunteers have issues, they will be asked to contact super-volunteers first.


  • Black in AI

    Co-organiser 2020

    Black in AI

    - Present, 3 years 10 months

    Science and Technology

    Black in AI is a place for sharing ideas, fostering collaborations and discussing initiatives to increase the presence of Black people in the field of Artificial Intelligence.

    Role:

    Organize the Black in AI workshop co-located with NeurIPS 2020.

  • Volunteer

    Society for Artificial Intelligence and Statistics

    - 1 month

    Science and Technology

    Visit the various poster sessions at the conference. Monitor Rocket.Chat or Zoom for technical questions, inappropriate language, harassment, and advertising (not allowed).

  • Deep Learning Indaba

    Diversity, Safety and Inclusion Chair

    Deep Learning Indaba

    - Present, 1 year 3 months

    Science and Technology

  • Women in Machine Learning and Data Science Accra Ghana

    Co-Chair

    Women in Machine Learning and Data Science Accra Ghana

    - Present, 4 years 9 months

    Science and Technology

Publications

  • Contextual Text Embeddings for Twi

    Published at AfricaNLP @ EACL 2021

    Transformer-based language models have been changing the modern Natural Language Processing (NLP) landscape for high-resource languages such as English, Chinese and Russian. However, this technology does not yet exist for any Ghanaian language. In this paper, we introduce the first such models for Twi or Akan, the most widely spoken Ghanaian language. The specific contribution of this research work is the development of several pretrained transformer language models for the Akuapem and Asante dialects of Twi, paving the way for advances in application areas such as Named Entity Recognition (NER), Neural Machine Translation (NMT), Sentiment Analysis (SA) and Part-of-Speech (POS) tagging. Specifically, we introduce four different flavours of ABENA (A BERT model Now in Akan), fine-tuned on a set of Akan corpora, and BAKO (BERT with Akan Knowledge Only), trained from scratch. We open-source the models through the Hugging Face model hub and demonstrate their use via a simple sentiment classification example.

  • English-Twi Parallel Corpus for Machine Translation

    AfricaNLP @ EACL 2021

    We present a parallel machine translation training corpus for English and Akuapem Twi of 25,421 sentence pairs. We used a transformer-based translator to generate initial translations in Akuapem Twi, which were later verified and corrected where necessary by native speakers to eliminate any occurrence of translationese. In addition, 697 higher-quality crowdsourced sentences are provided for use as an evaluation set for downstream Natural Language Processing (NLP) tasks. The typical use case for the larger human-verified dataset is further training of machine translation models in Akuapem Twi. The higher-quality 697-sentence crowdsourced dataset is recommended as a test set for English-to-Twi and Twi-to-English machine translation models. Furthermore, the Twi part of the crowdsourced data may also be used for other tasks, such as representation learning and classification. We fine-tune the transformer translation model on the training corpus and report benchmarks on the crowdsourced test set.

  • NLP for Ghanaian Languages

    AfricaNLP @ EACL 2021

    NLP Ghana is an open-source non-profit organization aiming to advance the development and adoption of state-of-the-art NLP techniques and digital language tools for Ghanaian languages and problems. In this paper, we first present the motivation for and necessity of the organization's efforts by introducing some popular Ghanaian languages and presenting the state of NLP in Ghana. We then present the NLP Ghana organization and outline its aims, scope of work, some of the methods employed, and the contributions made thus far to the NLP community in Ghana.

  • MasakhaNER: Named Entity Recognition for African Languages

    AfricaNLP @ EACL 2021

    We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a variety of stakeholders. We detail characteristics of the languages to help researchers understand the challenges that these languages pose for NER. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. We release the data, code, and models in order to inspire future research on African NLP.

  • Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

    AfricaNLP @ EACL 2021

    With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages. However, to date there has been no systematic analysis of the quality of these publicly available datasets, or whether the datasets actually contain content in the languages they claim to represent. In this work, we manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4), and audit the correctness of language codes in a sixth (JW300). We find that lower-resource corpora have systematic issues: at least 15 corpora are completely erroneous, and a significant fraction contains less than 50% sentences of acceptable quality. Similarly, we find 82 corpora that are mislabeled or use nonstandard/ambiguous language codes. We demonstrate that these issues are easy to detect even for non-speakers of the languages in question, and supplement the human judgements with automatic analyses. Inspired by our analysis, we recommend techniques to evaluate and improve multilingual corpora and discuss the risks that come with low-quality data releases.

  • Graph Deep Learning for Long Range Forecasting

    EGU 2021/Copernicus Meetings

    Deep learning-based models have recently been shown to be competitive with, or even outperform, state-of-the-art long-range forecasting models, such as those for projecting the El Niño-Southern Oscillation (ENSO). However, current deep learning models are based on convolutional neural networks, which are difficult to interpret and can fail to model large-scale dependencies, such as teleconnections, that are particularly important for long-range projections. Hence, we propose to explicitly model large-scale dependencies with Graph Neural Networks (GNNs) to enhance explainability and improve the predictive skill of long-lead-time forecasts.

    In preliminary experiments focusing on ENSO, our GNN model outperforms previous state-of-the-art machine learning based systems for forecasts up to 6 months ahead. The explicit modeling of information flow via edges makes our model more explainable, and it is indeed shown to learn a sensible graph structure from scratch that correlates with the ENSO anomaly pattern for a given number of lead months.

  • The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

    GEM - ACL 2021 Workshop

    We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent Anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the data for which we are organizing a shared task at our ACL 2021 Workshop and to which we invite the entire NLG community to participate.

  • Graph Neural Networks for Improved El Niño Forecasting

    Proposed at the NeurIPS 2020 Workshop on Tackling Climate Change with Machine Learning

    Deep learning-based models have recently outperformed state-of-the-art seasonal forecasting models, such as those for predicting the El Niño-Southern Oscillation (ENSO). However, current deep learning models are based on convolutional neural networks, which are difficult to interpret and can fail to model large-scale atmospheric patterns called teleconnections. Hence, we propose the application of spatiotemporal Graph Neural Networks (GNNs) to forecast ENSO at longer lead times, with finer granularity and better predictive skill than current state-of-the-art methods. The explicit modeling of information flow via edges may also allow for more interpretable forecasts. Preliminary results are promising and outperform state-of-the-art systems for projections 1 and 3 months ahead.

  • Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages

    Findings of EMNLP 2020

    Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), which plays a crucial role in information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets, MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released at https://github.com/masakhane-io/masakhane-mt.

    Other authors
    • Wilhelmina Nekoto, Vukosi Marivate, Tshinondiwa Matsila, Timi Fasubaa and others (Masakhane)

Honors and Awards

  • Society for Artificial Intelligence and Statistics

    AISTATS

    Since its inception in 1985, AISTATS has been an interdisciplinary gathering of researchers at the intersection of artificial intelligence, machine learning, statistics, and related areas.

    The Society organises annual meetings where researchers gather to share their ideas. The next meeting will be held in 2021.

  • Deep Learning + Reinforcement Learning Summer School

    CIFAR and MILA

    Each year, the DLRL Summer School attracts thousands of graduate students, postdocs and industry professionals from more than 20 countries to explore the latest artificial intelligence (AI) techniques, build research networks and start collaborative discussions. Only 300 of the most promising early career researchers across the globe are accepted. This year, due to the global pandemic, the DLRL Summer School will take place virtually from August 3 - 7, 2020.

    The DLRL Summer School is hosted by CIFAR and Mila, with participation and support from Amii and the Vector Institute. The DLRL Summer School is a part of both the CIFAR Learning in Machines & Brains program and CIFAR Pan-Canadian AI Strategy’s National Program of Activities. The summer school plays a significant role in supporting Canada’s early leadership in machine learning.

  • The 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)

    ACL

    Tutorials on Interpretability and Analysis in Neural NLP (cutting-edge) and Commonsense Reasoning for Natural Language Processing (introductory), plus NLP-related discussions, workshops and socials.

  • Programming Language Design and Implementation (PLDI 2020) Virtual Conference

    PLDI

    PLDI 2020 - SIGPLAN Conference on Programming Language Design and Implementation: London, United Kingdom (Moved online due to COVID-19)
    General Chair: Alastair F. Donaldson
    Program Chair: Emina Torlak

    History:

    Programming Language Design and Implementation (PLDI) is one of ACM SIGPLAN's most important conferences. The precursor of PLDI was the Symposium on Compiler Optimization, held July 27–28, 1970 at the University of Illinois at Urbana-Champaign and chaired by Robert S. Northcote. That conference included papers by Frances E. Allen, John Cocke, Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. The first conference in the current PLDI series took place in 1979 under the name SIGPLAN Symposium on Compiler Construction in Denver, Colorado. The next Compiler Construction conference took place in 1982 in Boston, Massachusetts.

    Reference: Wikipedia

  • IEEE ICASSP 2020 45th International Conference on Acoustics, Speech, and Signal Processing

    IEEE ICASSP

    ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The 2020 conference featured world-class presentations by internationally renowned speakers, cutting-edge session topics and provided a fantastic opportunity to network with like-minded professionals from around the world.

  • Online International Conference in Actuarial Science, Data Science and Finance (OICA)

    OICA

    COVID-19 is going to change the financial markets and the insurance-reinsurance network.
    As actuarial science, data science and finance researchers, we cannot easily help in the current phase, as we are not medical doctors or epidemiologists. But we can help mitigate and manage the financial and risk management consequences by carrying out relevant research in 2020-2021. To do this, we need to learn NOW from experts and decision makers what the research challenges are.

  • WiML ICLR 2020 Registration Fee Funding

    Women in Machine Learning (WiML)

    Award from the WiML ICLR 2020 Registration Fee Funding Committee covering ICLR 2020 registration.

  • Women in Machine Learning Conference

    Yielding Accomplished African Women

    Full funding for the Women in Machine Learning Conference

  • African Master's in Machine Intelligence (AMMI) Scholarship

    African Master's in Machine Intelligence (AMMI)

    Master's program in Machine Intelligence taught by leading researchers in the field, with opportunities to intern at research institutions such as Google, Facebook, DeepMind, MILA, IVADO, Vector Institute and Qualcomm.

  • Black in AI 2019 Travel Grant for NeurIPS

    Black in AI

    Full travel grant to attend the 3rd Black in AI Workshop, co-located with NeurIPS 2019.

  • Ghana Data Science Summit 2019 (IndabaX Ghana)

    Ghana Data Science Summit (IndabaX Ghana) Committee

    The first Ghana Data Science Summit (IndabaX Ghana)

  • African Institute for Mathematical Sciences | Next Einstein Initiative Scholarship

    African Institute for Mathematical Sciences (AIMS)

    The African Institute for Mathematical Sciences (AIMS) is a pan-African network of Centres of Excellence for post-graduate training, research and public engagement in mathematics with a mission to lead the transformation of Africa through innovative scientific training, technological advances, breakthrough discoveries, strategic foresight, and innovative policy design.

  • MasterCard Foundation Scholarship

    MasterCard Foundation

    MasterCard Foundation Scholar at AIMS Cameroon for the academic year 2017/2019
