Narrow Transformer: Starcoder-Based Java-LM For Desktop
Authors:
Kamalkumar Rathinasamy,
Balaji A J,
Ankush Kumar,
Gagan Gayari,
Harshini K,
Rajab Ali Mondal,
Sreenivasa Raghavan K S,
Swayam Singh
Abstract:
This paper presents NT-Java-1.1B, an open-source specialized code language model built on StarCoderBase-1.1B and designed for coding tasks in Java. NT-Java-1.1B achieves state-of-the-art performance, surpassing its base model and the majority of other models of similar size on the MultiPL-E Java code benchmark. While there have been studies on extending large, generic pre-trained models to improve proficiency in specific programming languages like Python, similar investigations on small code models for other programming languages are lacking. Large code models require specialized hardware like GPUs for inference, highlighting the need for research into building small code models that can be deployed on developer desktops. This paper addresses this research gap by focusing on the development of a small Java code model, NT-Java-1.1B, and its quantized versions, which perform comparably to open models of around 1.1B parameters on the MultiPL-E Java code benchmark, making them ideal for desktop deployment. This paper establishes the foundation for specialized models across languages and sizes for a family of NT Models.
Submitted 4 July, 2024;
originally announced July 2024.
EnterpriseEM: Fine-tuned Embeddings for Enterprise Semantic Search
Authors:
Kamalkumar Rathinasamy,
Jayarama Nettar,
Amit Kumar,
Vishal Manchanda,
Arun Vijayakumar,
Ayush Kataria,
Venkateshprasanna Manjunath,
Chidambaram GS,
Jaskirat Singh Sodhi,
Shoeb Shaikh,
Wasim Akhtar Khan,
Prashant Singh,
Tanishq Dattatray Ige,
Vipin Tiwari,
Rajab Ali Mondal,
Harshini K,
S Reka,
Chetana Amancharla,
Faiz ur Rahman,
Harikrishnan P A,
Indraneel Saha,
Bhavya Tiwary,
Navin Shankar Patel,
Pradeep T S,
Balaji A J
, et al. (2 additional authors not shown)
Abstract:
Enterprises grapple with the significant challenge of managing proprietary unstructured data, which hinders efficient information retrieval. This has led to the emergence of AI-driven information retrieval solutions designed to extract relevant insights in response to employee inquiries. These solutions often leverage pre-trained embedding models and generative models as foundational components. While pre-trained embeddings may exhibit proximity or disparity based on their original training objectives, they might not fully align with the unique characteristics of enterprise-specific data, which leaves them poorly matched to the retrieval goals of enterprise environments. In this paper, we propose a methodology to fine-tune pre-trained embedding models specifically for enterprise environments. By adapting the embeddings to better suit the retrieval tasks prevalent in enterprises, we aim to enhance the performance of information retrieval solutions. We discuss the process of fine-tuning, its effect on retrieval accuracy, and the potential benefits for enterprise information management. Our findings demonstrate the efficacy of fine-tuned embedding models in improving the precision and relevance of search results in enterprise settings.
Submitted 18 May, 2024;
originally announced June 2024.