Thursday, Jul 25, 2024

Ganga-1B — pre-trained Hindi AI model developed at IITGN

This project aims to develop pocket-size, open-source large language models for Indic languages, says Prof Mayank Singh

Indian Institute of Technology Gandhinagar (File Photo)

The Lingo Research Group at the Indian Institute of Technology Gandhinagar (IITGN) has developed Ganga-1B, a Hindi artificial intelligence (AI) model billed as “a breakthrough in language models”. Named after the longest river flowing through the country, Ganga-1B is the first pre-trained Hindi model developed by an academic research laboratory.

“The initiative strives to achieve performance in understanding and generating text in Indian languages. The first milestone of which is the release of the Ganga-1B model, trained on an extensive monolingual Hindi language dataset,” said Professor Mayank Singh, assistant professor (Computer Science and Engineering) and head of IITGN’s Lingo Research Group.

The Ganga-1B model was trained on publicly available Hindi-language data, including news articles, web documents, books, government publications, educational materials and quality-filtered social media conversations.


“The unity project aims to develop pocket size open-source Large Language Models (LLMs) for Indic languages, created and trained from scratch from Indian data. This initiative will propel the Indian open-source community to build LLMs and chatbots that can be trained and deployed under resource-constrained scenarios,” Professor Mayank Singh told The Indian Express.

Ganga-1B, which was downloaded by more than 600 people in less than 48 hours following the announcement, took nearly 1.5 years to develop, using open-source data from various websites.


The research team has been working on models for other languages, including Gujarati, Urdu, Tamil, Telugu and Marathi. It is also exploring the use of AI in e-governance for regional languages, as well as an education LLM to support school students and teachers.

Native speakers have further curated the dataset to ensure high quality.


First uploaded on: 09-07-2024 at 05:29 IST