The Lingo Research Group at the Indian Institute of Technology Gandhinagar (IITGN) has developed a Hindi artificial intelligence (AI) model, Ganga-1B, which it describes as “a breakthrough in language models”. Named after the longest river flowing through the country, Ganga-1B is the first pre-trained Hindi model developed by an academic research laboratory.
“The initiative strives to achieve performance in understanding and generating text in Indian languages. The first milestone of which is the release of the Ganga-1B model, trained on an extensive monolingual Hindi language dataset,” said Professor Mayank Singh, assistant professor (Computer Science and Engineering) and head of IITGN’s Lingo Research Group.
The Ganga-1B model was trained on publicly available Hindi-language data, including news articles, web documents, books, government publications, educational materials and quality-filtered social media conversations.
“The unity project aims to develop pocket size open-source Large Language Models (LLMs) for Indic languages, created and trained from scratch from Indian data. This initiative will propel the Indian open-source community to build LLMs and chatbots that can be trained and deployed under resource-constrained scenarios,” Professor Mayank Singh told The Indian Express.
Ganga-1B, which was downloaded by over 600 people in less than 48 hours following the announcement, took nearly 1.5 years to develop and was built using open-source data from various websites.
The research team has been working on models for other languages, including Gujarati, Urdu, Tamil, Telugu and Marathi. It is also exploring the use of AI in e-governance for regional languages and working on an education LLM to support school students and teachers.
Native Indian speakers have further curated the dataset to ensure high quality.