Dr. Abhay Alok

Hyderabad, Telangana, India
2K followers 500+ connections

About

Working on core critical problems such as enhancing user adoption metrics, customer lifetime…

Experience & Education

  • Electronic Arts (EA)

Publications

  • Multi-objective semi-supervised clustering of tissue samples for cancer diagnosis

    Springer/Soft Computing

    In the domain of bioinformatics, the clustering of gene expression profiles of different tissue samples over different experimental conditions has gained importance with the invention of microarray-based technology. This study also has some impact on cancer diagnosis. The proper classification of cancer tissue samples generated using microarray technology helps in detecting cancers in an automated way. In the current paper we have developed a semi-supervised clustering technique for proper partitioning of these gene expression data sets. Semi-supervised clustering is a combination of unsupervised and supervised classification techniques. It uses a small amount of supervised information and a large collection of unlabeled data. Here a multi-objective semi-supervised clustering technique is developed for solving the cancer tissue classification problem. Different combinations of objective functions are used. As supervised information, we assume that class labels are available for 10% of the data. The proposed technique is evaluated on three open source benchmark cancer data sets (brain tumor data set, adult malignancy and small round blood cell tumors). Two classification quality measures, viz., Adjusted Rand Index and Classification Accuracy, are used to measure the goodness of the obtained partitionings. The obtained results are compared with several state-of-the-art clustering techniques. Moreover, significant gene markers have been identified and demonstrated visually from the clustering solutions obtained.
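
    The two quality measures named in this abstract can be illustrated with a minimal Python sketch. This is not the paper's code: the function names are hypothetical, and the majority-vote mapping from clusters to classes used for classification accuracy is an assumption made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand Index computed from the contingency table of two labelings."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points: the labelings trivially agree
    cells = Counter(zip(true_labels, pred_labels))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)


def classification_accuracy(true_labels, pred_labels):
    """Accuracy after mapping each cluster to its majority true class (an assumption)."""
    members = {}
    for t, p in zip(true_labels, pred_labels):
        members.setdefault(p, []).append(t)
    # For each cluster, count the points that belong to its most common class.
    correct = sum(Counter(ts).most_common(1)[0][1] for ts in members.values())
    return correct / len(true_labels)
```

    Both functions ignore the actual label names, so a partitioning that matches the true classes under a different numbering still scores 1.0.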

  • Use of Semi-supervised Clustering and Feature Selection Techniques for Gene-Expression Data

    IEEE/ IEEE Journal of Biomedical and Health Informatics

    Studying the patterns hidden in gene expression data helps to understand the functionality of genes. In general, clustering techniques are widely used for the identification of natural partitionings from gene expression data. In order to put constraints on dimensionality, feature selection is the key issue because not all features are important from a clustering point of view. Moreover, a limited amount of supervised information can help to fine-tune the obtained clustering solution. In this paper the problem of simultaneous feature selection and semi-supervised clustering is formulated as a multi-objective optimization task. A modern simulated annealing based multiobjective optimization technique, namely AMOSA, is utilized as the background optimization methodology. Here features and cluster centers are represented in the form of a string and the assignment of points to different clusters is done using a point symmetry based distance. Six optimization criteria based on several internal and external cluster validity indices are utilized. In order to generate the supervised information, a popular clustering technique, Fuzzy C-means, is utilized. An appropriate subset of features, the proper number of clusters and the proper partitioning are determined using the search capability of AMOSA. The effectiveness of the proposed semi-supervised clustering technique, Semi-FeaClustMOO, is demonstrated on five publicly available benchmark gene expression data sets. Comparison results with the existing techniques for gene expression data clustering again reveal the superiority of the proposed technique. Statistical and biological significance tests have also been carried out.

  • Multi-objective semi-supervised clustering for automatic pixel classification from remote sensing imagery

    Springer/ Soft Computing

    Classifying the pixels of satellite images into homogeneous regions is a very challenging task as different regions have different types of land covers. Some land covers contain more regions, while some contain relatively smaller regions (e.g., bridges, roads). In satellite image segmentation, no prior information is available about the number of clusters. In this paper, we have solved this problem using the concepts of semi-supervised clustering, which utilizes the properties of both unsupervised and supervised classification. Three cluster validity indices are utilized, which are simultaneously optimized using AMOSA, a modern multiobjective optimization technique based on the concepts of simulated annealing. The first two cluster validity indices, the symmetry distance based Sym-index and the Euclidean distance based I-index, are based on unsupervised properties. The last one is a supervised information based cluster validity index, the Minkowski index. To obtain the supervised information, the fuzzy C-means clustering technique is applied first. Thereafter, based on the highest membership values of the data points to their respective clusters, 10% of the data points are chosen randomly together with their class labels. The effectiveness of the proposed semi-supervised clustering technique is demonstrated on three satellite image data sets of different cities of India. Results are also compared with existing clustering techniques.
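
    The supervised index mentioned in this abstract can be illustrated with a short Python sketch. It assumes the commonly used pair-counting definition of the Minkowski score, in which 0 is a perfect match and lower is better; the function name is hypothetical and this is not the paper's implementation.

```python
from itertools import combinations
from math import sqrt


def minkowski_score(true_labels, pred_labels):
    """Pair-counting Minkowski score: sqrt((n01 + n10) / (n11 + n10)).

    n11: pairs placed together in both the true and the obtained partitioning
    n10: pairs together only in the true partitioning
    n01: pairs together only in the obtained partitioning
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("labelings must have the same length")
    n11 = n10 = n01 = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            n11 += 1
        elif same_true:
            n10 += 1
        elif same_pred:
            n01 += 1
    if n11 + n10 == 0:
        raise ValueError("the true partitioning has no pair in a common cluster")
    return sqrt((n01 + n10) / (n11 + n10))
```

    The loop visits every pair of points, so this form suits only the labeled subset (for example the 10% sample), not every pixel of an image.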

  • A new semi-supervised clustering technique using multi-objective optimization

    Applied Intelligence/ Springer

    Semi-supervised clustering techniques have been proposed in the literature to overcome the problems associated with unsupervised and supervised classification. They consider a small amount of labeled data together with the whole data distribution while clustering a data set. In this paper, a new approach to semi-supervised clustering is implemented using a multiobjective optimization (MOO) framework. Four objective functions are optimized using the search capability of a multiobjective simulated annealing based technique, AMOSA. These objective functions are based on unsupervised and supervised information. The first three objective functions represent, respectively, the goodness of the partitioning in terms of Euclidean distance, the total symmetry present in the clusters and the cluster connectedness. For the last objective function, we have considered different external cluster validity indices, including adjusted Rand index, Rand index, a newly developed min-max distance based MMI index, NMMI index and Minkowski Score. Results show that the proposed semi-supervised clustering technique can effectively detect the appropriate number of clusters as well as the appropriate partitioning from data sets having either well-separated clusters of any shape or symmetrical clusters with or without overlaps. Twenty-four artificial and five real-life data sets have been used in the evaluation. We develop five different versions of the Semi-GenClustMOO clustering technique by varying the external cluster validity index. The obtained partitioning results are compared with another recently developed multiobjective semi-supervised clustering technique, Mock-Semi. At the end of the paper, the effectiveness of the proposed Semi-GenClustMOO clustering technique is shown in segmenting a remote sensing satellite image of a part of the city of Kolkata.
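
    When several objective functions are optimized at once, candidate solutions are usually compared by Pareto dominance rather than by a single score. The Python sketch below shows only that comparison. It is not AMOSA or Semi-GenClustMOO; the function names are hypothetical, and it assumes every objective is to be minimized.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(solutions):
    """Keep the objective vectors that no other vector in the list dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical (objective_1, objective_2) values for four candidate partitionings.
candidates = [(1, 4), (2, 2), (3, 3), (4, 1)]
front = non_dominated(candidates)  # (3, 3) is dominated by (2, 2) and is dropped
```

    An objective that should be maximized can be negated before it is passed to these functions.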

  • Semi-supervised clustering for gene-expression data in multiobjective optimization framework

    Springer/ International Journal of Machine Learning and Cybernetics

    Studying the patterns hidden in gene expression data helps to understand the functionality of genes. But due to the large volume of genes and the complexity of biological networks, it is difficult to study the resulting mass of data, which often consists of millions of measurements. In order to reveal natural structures and to identify interesting patterns in a given gene expression data set, clustering techniques are applied. Semi-supervised classification is a new direction of machine learning. It requires a large amount of unlabeled data and a small amount of labeled data. Semi-supervised classification in general performs better than unsupervised classification. But to the best of our knowledge there are no works that solve the gene expression data clustering problem using semi-supervised classification techniques. In the current paper we have made an attempt to solve the gene expression data clustering problem using a multiobjective optimization based semi-supervised classification technique, with the aim of attaining good quality partitions by using a small amount of labeled data. In order to generate the labeled data, the Fuzzy C-means clustering technique is applied first. In order to automatically determine the partitioning, multiple cluster centers corresponding to a cluster are encoded in the form of a string. In order to compute the quality of the obtained partitioning, the values of five objective functions are computed. The effectiveness of the proposed semi-supervised clustering technique is demonstrated on five publicly available benchmark gene expression data sets. Comparison results with the existing techniques for gene expression data clustering prove that the proposed method is the most effective one. Statistical and biological significance tests have also been carried out.

  • A min-max distance based external cluster validity index: MMI

    IEEE

    Evaluating a given clustering result is a very difficult problem in the real world. Cluster validity indices are developed for this purpose. There are two types of cluster validity indices: external and internal. External cluster validity indices utilize some supervised information, and internal cluster validity indices utilize the intrinsic structure of the data. In this paper a new external cluster validity index, MMI, has been implemented based on the Max-Min distance among data points and prior information based on the structure of the data. A new probabilistic approach has been implemented to find the correct correspondence between the true and obtained clustering. Genetic K-means algorithm (GAK-means) and single linkage are used as the underlying clustering techniques, with the number of clusters varied over a range. The MMI index is then used to determine the appropriate number of clusters. Results of the proposed index for identifying the appropriate number of clusters are shown for five artificial and two real-life data sets. The performance of MMI is compared with existing external cluster validity indices, adjusted Rand index (ARI) and Rand index (RI). It works well for two-class and multi-class data sets.

  • Feature selection and semi-supervised clustering using multiobjective optimization

    Springer Plus

    In this paper we have coupled feature selection problem with semi-supervised clustering. Semi-supervised clustering
    utilizes the information of unsupervised and supervised learning in order to overcome the problems related to them.
    But in general all the features present in the data set may not be important for clustering purpose. Thus appropriate
    selection of features from the set of all features is very much relevant from clustering point of view. In this paper we
    have solved the problem of automatic feature selection and semi-supervised clustering using multiobjective
    optimization. A recently created simulated annealing based multiobjective optimization technique titled archived
    multiobjective simulated annealing (AMOSA) is used as the underlying optimization technique. Here features and
    cluster centers are encoded in the form of a string. We assume that, for each data set, class label information is
    known to us for 10% of the data points. Two internal cluster validity indices reflecting different data properties, an external cluster
    validity index measuring the similarity between the obtained partitioning and the true labelling for 10% data points
    and a measure counting the number of features present in a particular string are optimized using the search capability
    of AMOSA. AMOSA is utilized to detect the appropriate subset of features, appropriate number of clusters as well as the
    appropriate partitioning from any given data set. The effectiveness of the proposed semi-supervised feature selection
    technique as compared to the existing techniques is shown for seven real-life data sets of varying complexities.

Honors & Awards

  • Technical program committee of International Conference

    Reviewer for international conferences including ICACCI 2015, IEEE SPICES 2015, ICCME 2015, VisioNet 2015, Confluence 2013 and CIMTA 2013.

  • Organising member of workshop on Optimization Technique for Language technology

    IIT Bombay, Coling Conference

    Held in conjunction with COLING 2012, the 24th International Conference on Computational Linguistics.

  • Organizing member of Indo-Australia workshop on optimization Technique for Human Language Technology

    India-Australia

    The aim of the workshop is to bring together the communities who are working in the areas of: evolutionary computation, optimization techniques, machine learning, language technology/Natural Language Processing, information retrieval, text mining. The workshop will be a starting platform to explore the possibilities of interdisciplinary research works that will focus on developing optimization based methods on the above fields within the context of human language technology. Almost all the research and development activities in human language technology rely on the high level of performance to satisfy the users' intended needs, and have to deal with many objectives and parameters. For example, in Information Retrieval, it is often necessary to optimize the recall and precision parameters. In automatic summarisation, it is desired to optimize different objective functions like similarity to user query, ROUGE metric, important sentence score, and difference in length between the scored sentence and the desired sentence and many others. Other examples of optimization in NLP include parsing, machine translation, and computational models of language acquisition.
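
    The information retrieval example above treats recall and precision as quantities to be optimized together. A minimal Python sketch of how the two are computed for a single query follows; the function name and the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against a set of relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents returned, three truly relevant, two of them found.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
```

    Returning more documents tends to raise recall and lower precision, which is why the two are natural candidates for a multi-objective formulation.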

  • MHRD Scholarship

    Govt of India

    Received a four-year MHRD scholarship (2011-2015) for completion of a Doctorate in Philosophy.

  • MHRD Scholarship

    Govt of India

    Received a two-year MHRD scholarship (2008-2010) for completion of a Master's in Technology.

  • Reviewer of SCI Journal

    IEEE/Springer/PLOSONE

    Reviewer for IEEE/ACM Transactions on Computational Biology, Intelligent Service Robotics (JIST), Springer, PLOS ONE journal, IJMLC, Environment and Earth Sciences.

Languages

  • English

    Professional working proficiency

  • Hindi

    Full professional proficiency

  • Sanskrit

    Limited working proficiency

  • Bhojpuri

    Professional working proficiency

  • Bengali

    Elementary proficiency

  • Maithili

    Elementary proficiency

Organizations

  • IEEE

    Student Member

    - Present
