“Well, I would like to write about Dr. Abhay Kumar Alok. He is a very good researcher. He is a diamond in the dust. He is a self-motivated and highly skilled person. He is working on unsupervised clustering. He is a very good coder. He is an expert in C, Java and MATLAB. His specialization is machine learning. My experience with him was very fruitful. He is a knowledgeable person with a sound mathematical background. As a researcher, faculty member and software developer, he is fit for all three roles. I wish him every success in his future endeavors.”
About
Activity
-
Starting with a service-based company, my brother Kapil Ahuja spent four years learning and working hard before finally getting placed at…
Liked by Dr. Abhay Alok
-
Accelerating LLMs by 2x with Graph-structured Speculative Decoding. Researchers have found a way to make speculative decoding up to 2x faster by…
Liked by Dr. Abhay Alok
-
A nice tutorial about Semantic Search in the NLP course by Lewis Tunstall from Hugging Face. This course helps me to teach students in Egypt about…
Liked by Dr. Abhay Alok
Experience & Education
Licenses & Certifications
Publications
-
Multi-objective semi-supervised clustering of tissue samples for cancer diagnosis
Springer/Soft Computing
In the domain of bioinformatics, the clustering of gene expression profiles of different tissue samples over different experimental conditions has gained importance with the invention of micro-array based technology. This study also has some impact on cancer diagnosis. The proper classification of cancer tissue samples generated using the micro-array technology helps in detecting cancers in an automated way. In the current paper we have developed a semi-supervised clustering technique for proper partitioning of these gene expression data sets. Semi-supervised clustering is a combination of unsupervised and supervised classification techniques. It uses some amount of supervised information and a large collection of unsupervised data. Here a multi-objective based semi-supervised clustering technique is developed for solving the cancer tissue classification problem. Different combinations of objective functions are used. As the supervised information we assume that class labels of 10 % data are available. The proposed technique is evaluated for three open source benchmark cancer data sets (brain tumor data set, adult malignancy and small round blood cell tumors). Two classification quality measures, viz., Adjusted Rand Index and Classification Accuracy are used to measure the goodness of the obtained partitionings. Obtained results are compared with several state-of-the-art clustering techniques. Moreover, significant gene markers have been identified and demonstrated visually from the clustering solutions obtained.
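The Adjusted Rand Index named in the abstract above can be illustrated with a short sketch. This is a minimal pure-Python version of the standard pair-counting formula, not the authors' code; the function name and the handling of degenerate partitions are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand Index between two labelings of the same points.

    Standard contingency-table formula; 1.0 means identical partitions
    (up to renaming of the cluster labels).
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must have the same length")

    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        return 1.0  # assumption: fewer than two points counts as perfect agreement

    # Number of point pairs placed together in both, in the true,
    # and in the predicted partition, respectively.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())

    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # assumption: degenerate case, e.g. one cluster in both partitions
    return (sum_cells - expected) / (max_index - expected)
```

For example, adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]) evaluates to 1.0, because the index ignores how the clusters are named.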
-
Use of Semi-supervised Clustering and Feature Selection Techniques for Gene-Expression Data
IEEE/ IEEE Journal of Biomedical and Health Informatics
Studying the patterns hidden in gene expression data helps to understand the functionality of genes. In general, clustering techniques are widely used for the identification of natural partitionings from the gene expression data. In order to put constraints on dimensionality, feature selection is the key issue because not all features are important from clustering point of view. Moreover some limited amount of supervised information can help to fine-tune the obtained clustering solution. In this paper the problem of simultaneous feature selection and semi-supervised clustering is formulated as a multi-objective optimization task. A modern simulated annealing based multiobjective optimization technique namely AMOSA is utilized as the background optimization methodology. Here features and cluster centers are represented in the form of a string and the assignment of points to different clusters is done using a point symmetry based distance. Six optimization criteria based on several internal and external cluster validity indices are utilized. In order to generate the supervised information, a popular clustering technique, Fuzzy C-mean, is utilized. Appropriate subset of features, proper number of clusters and the proper partitioning are determined using the search capability of AMOSA. The effectiveness of this proposed semi-supervised clustering technique, Semi-FeaClustMOO, is demonstrated on five publicly available benchmark gene expression data sets. Comparison results with the existing techniques for gene expression data clustering again reveal the superiority of the proposed technique. Statistical and biological significance tests have also been carried out.
-
Multi-objective semi-supervised clustering for automatic pixel classification from remote sensing imagery
Springer/ Soft Computing
Classifying the pixels of satellite images into homogeneous regions is a very challenging task as different regions have different types of land covers. Some land covers contain more regions, while some contain relatively smaller regions (e.g., bridges, roads). In satellite image segmentation, no prior information is available about the number of clusters. Here, in this paper, we have solved this problem using the concepts of semi-supervised clustering which utilizes the property of unsupervised and supervised classification. Three cluster validity indices are utilized, which are simultaneously optimized using AMOSA, a modern multiobjective optimization technique based on the concepts of simulated annealing. The first two cluster validity indices, symmetry distance based Sym-index, and Euclidean distance based I-index, are based on unsupervised properties. The last one is a supervised information based cluster validity index, Minkowski index. For supervised information, initially fuzzy C-mean clustering technique is used. Thereafter, based on the highest membership values of the data points to their respective clusters, randomly 10 % data points with their class labels are chosen. The effectiveness of this proposed semi-supervised clustering technique is demonstrated on three satellite image data sets of different cities of India. Results are also compared with existing clustering techniques.
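The Minkowski index mentioned in the abstract above is an external validity measure computed from pairs of points. The sketch below assumes the commonly used pair-counting definition, sqrt((n01 + n10) / (n11 + n10)); the abstract does not state the exact formulation used in the paper, and the function name is hypothetical.

```python
from itertools import combinations
from math import sqrt


def minkowski_score(true_labels, pred_labels):
    """Minkowski score between a reference labeling and an obtained one.

    n11: pairs grouped together in both labelings
    n10: pairs grouped together only in the reference labeling
    n01: pairs grouped together only in the obtained labeling
    Lower is better; 0.0 means the two partitions agree on every pair.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must have the same length")

    n11 = n10 = n01 = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            n11 += 1
        elif same_true:
            n10 += 1
        elif same_pred:
            n01 += 1

    if n11 + n10 == 0:
        # The score is undefined when the reference has no co-clustered pair.
        raise ValueError("reference labeling has no pair in the same cluster")
    return sqrt((n01 + n10) / (n11 + n10))
```

In a semi-supervised setting like the one described, such a score would be evaluated only on the points whose labels are assumed known (here, the 10 % subset).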
-
A new semi-supervised clustering technique using multi-objective optimization
Applied Intelligence/ Springer
Semi-supervised clustering techniques have been proposed in the literature to overcome the problems associated with unsupervised and supervised classification. They consider a small amount of labeled data and the whole data distribution during the process of clustering a data set. In this paper, a new approach towards semi-supervised clustering is implemented using a multiobjective optimization (MOO) framework. Four objective functions are optimized using the search capability of a multiobjective simulated annealing based technique, AMOSA. These objective functions are based on some unsupervised and supervised information. The first three objective functions represent, respectively, the goodness of the partitioning in terms of Euclidean distance, total symmetry present in the clusters and the cluster connectedness. For the last objective function, we have considered different external cluster validity indices, including adjusted rand index, rand index, a newly developed min-max distance based MMI index, NMMI index and Minkowski Score. Results show that the proposed semi-supervised clustering technique can effectively detect the appropriate number of clusters as well as the appropriate partitioning from the data sets having either well-separated clusters of any shape or symmetrical clusters with or without overlaps. Twenty-four artificial and five real-life data sets have been used in the evaluation. We develop five different versions of the Semi-GenClustMOO clustering technique by varying the external cluster validity indices. Obtained partitioning results are compared with another recently developed multiobjective semi-supervised clustering technique, Mock-Semi. At the end of the paper the effectiveness of the proposed Semi-GenClustMOO clustering technique is shown in segmenting one remote sensing satellite image of a part of the city of Kolkata.
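When several objective functions are optimized at once, as in the abstract above, candidate solutions are usually compared by Pareto dominance rather than by a single score. The sketch below shows only that comparison. It is not AMOSA, it assumes every objective is to be minimized, and both function names are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized).

    a dominates b when it is no worse in every objective and strictly
    better in at least one.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated(solutions):
    """Return the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]
```

With two objectives, non_dominated([(1, 4), (2, 2), (3, 3), (4, 1)]) keeps (1, 4), (2, 2) and (4, 1); the point (3, 3) is dropped because (2, 2) is better in both objectives.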
-
Semi-supervised clustering for gene-expression data in multiobjective optimization framework
Springer/ International Journal of Machine Learning and Cybernetics
Studying the patterns hidden in gene expression data helps to understand the functionality of genes. But due to the large volume of genes and the complexity of biological networks it is difficult to study the resulting mass of data which often consists of millions of measurements. In order to reveal natural structures and to identify interesting patterns from the given gene expression data set, clustering techniques are applied. Semi-supervised classification is a new direction of machine learning. It requires huge unlabeled data and a few labeled data. Semi-supervised classification in general performs better than unsupervised classification. But to the best of our knowledge there are no works for solving gene expression data clustering problem using semi-supervised classification techniques. In the current paper we have made an attempt to solve the gene expression data clustering problem using a multiobjective optimization based semi-supervised classification technique with the aim to attain good quality partitions by using few labeled data. In order to generate the labeled data, initially Fuzzy C-means clustering technique is applied. In order to automatically determine the partitioning, multiple cluster centers corresponding to a cluster are encoded in the form of a string. In order to compute the quality of the obtained partitioning, values of five objective functions are computed. The effectiveness of this proposed semi-supervised clustering technique is demonstrated on five publicly available benchmark gene expression data sets. Comparison results with the existing techniques for gene expression data clustering prove that the proposed method is the most effective one. Statistical and biological significance tests have also been carried out.
-
A min-max distance based external cluster validity index: MMI
IEEE
Evaluating a given clustering result is a very difficult problem in the real world. Cluster validity indices are developed for this purpose. There are two different types of cluster validity indices available: external and internal. External cluster validity indices utilize some supervised information and internal cluster validity indices utilize the intrinsic structure of the data. In this paper a new external cluster validity index, MMI, has been implemented based on Max-Min distance among data points and prior information based on structure of the data. A new probabilistic approach has been implemented to find the correct correspondence between the true and obtained clustering. Genetic K-means algorithm (GAK-means) and single linkage have been used as the underlying clustering techniques. Results of the proposed index for identifying the appropriate number of clusters are shown for five artificial and two real-life data sets. GAK-means and single linkage clustering techniques are used as the underlying partitioning techniques with the number of clusters varied over a range. The MMI index is then used to determine the appropriate number of clusters. The performance of MMI is compared with existing external cluster validity indices, adjusted rand index (ARI) and rand index (RI). It works well for two-class and multi-class data sets.
-
Feature selection and semi-supervised clustering using multiobjective optimization
Springer Plus
In this paper we have coupled feature selection problem with semi-supervised clustering. Semi-supervised clustering
utilizes the information of unsupervised and supervised learning in order to overcome the problems related to them.
But in general all the features present in the data set may not be important for clustering purpose. Thus appropriate
selection of features from the set of all features is very much relevant from clustering point of view. In this paper we
have solved the problem of automatic feature selection and semi-supervised clustering using multiobjective
optimization. A recently created simulated annealing based multiobjective optimization technique titled archived
multiobjective simulated annealing (AMOSA) is used as the underlying optimization technique. Here features and
cluster centers are encoded in the form of a string. We assume that for each data set for 10% data points class level
information are known to us. Two internal cluster validity indices reflecting different data properties, an external cluster
validity index measuring the similarity between the obtained partitioning and the true labelling for 10% data points
and a measure counting the number of features present in a particular string are optimized using the search capability
of AMOSA. AMOSA is utilized to detect the appropriate subset of features, appropriate number of clusters as well as the
appropriate partitioning from any given data set. The effectiveness of the proposed semi-supervised feature selection
technique as compared to the existing techniques is shown for seven real-life data sets of varying complexities.
Honors & Awards
-
Technical program committee of International Conference
-
Reviewer for international conferences including ICACCI 2015, IEEE SPICES 2015, ICCME 2015, VisioNet 2015, Confluence 2013 and CIMTA 2013.
-
Organising member of workshop on Optimization Technique for Language technology
IIT Bombay, Coling Conference
It was held under the auspices of COLING 2012, the 25th International Conference.
-
Organizing member of Indo-Australia workshop on optimization Technique for Human Language Technology
India-Australia
The aim of the workshop is to bring together the communities who are working in the areas of: evolutionary computation, optimization techniques, machine learning, language technology/Natural Language Processing, information retrieval, text mining. The workshop will be a starting platform to explore the possibilities of interdisciplinary research works that will focus on developing optimization based methods on the above fields within the context of human language technology. Almost all the research and development activities in human language technology rely on the high level of performance to satisfy the users' intended needs, and have to deal with many objectives and parameters. For example, in Information Retrieval, it is often necessary to optimize the recall and precision parameters. In automatic summarisation, it is desired to optimize different objective functions like similarity to user query, ROUGE metric, important sentence score, and difference in length between the scored sentence and the desired sentence and many others. Other examples of optimization in NLP include parsing, machine translation, and computational models of language acquisition.
-
MHRD Scholarship
Govt of India
Received a four-year MHRD scholarship (2011-2015) for completion of the Doctorate in Philosophy.
-
MHRD Scholarship
Govt Of India
Received a two-year MHRD scholarship (2008-2010) for completion of the Masters in Technology.
-
Reviewer of SCI Journal
IEEE/Springer/PLOSONE
Reviewer for IEEE/ACM Transactions on Computational Biology, Intelligent Service Robotics (JIST), Springer, PLOS One Journal, IJMLC, Environment and Earth Sciences.
Languages
-
English
Professional working proficiency
-
Hindi
Full professional proficiency
-
Sanskrit
Limited working proficiency
-
Bhojpuri
Professional working proficiency
-
Bengali
Elementary proficiency
-
Maithili
Elementary proficiency
Organizations
-
IEEE
Student Member
- Present
Recommendations received
1 person has recommended Dr. Abhay
More activity by Dr. Abhay
-
IIT Bombay has signed an MoU with the Centre for Railway Information System (CRIS), an IT wing of Indian Railways. The collaboration aims to solve…
Liked by Dr. Abhay Alok
-
🚀 GPT-4o mini... OpenAI's most cost-efficient small model yet! GPT-4o mini, with its improved performance and drastically reduced costs, is…
Liked by Dr. Abhay Alok
-
🌟 Exciting Collaboration Announcement! 🌟 We are thrilled to announce the successful signing of a Memorandum of Understanding (MoU) between Indian…
Liked by Dr. Abhay Alok
-
Learn how to improve customer experiences and safeguard data with AI and a unified cloud infrastructure.
Liked by Dr. Abhay Alok
-
Dive deep into advanced chunking strategies that transform large text documents into coherent, searchable units. Join me for an exciting workshop on…
Liked by Dr. Abhay Alok
-
During a recent talk a person asked: "It seems there is a glass ceiling that avoids to get into Data Science if you don’t have a PhD. Is it true?"…
Liked by Dr. Abhay Alok
-
Nvidia & Mistral! have just launched Mistral NeMo 12B, an exceptional model licensed under Apache 2.0. Here’s what makes it standout: - It…
Liked by Dr. Abhay Alok
-
I think 12B is the new 7B for on device LLMs. Mistral x NVIDIA collab dropped the Mistral-NeMo 12B open weights LLM with 128K context length. This is…
Liked by Dr. Abhay Alok
-
Introducing Jude Bellingham, our EA SPORTS #FC25 Standard Edition Cover Star. “I played this game with my brother all the time growing up, and I’ve…
Liked by Dr. Abhay Alok
-
AI + on-device = ❤️ There is a new small language model family called SmolLM with 135M, 360M, and 1.7B parameters. It's trained also on a new…
Liked by Dr. Abhay Alok
-
The next chapter of the world’s game ⚽. Reveal trailer out now! 👇 #FC25 #WeAreEA #EASPORTS
Liked by Dr. Abhay Alok