Dipankar Mazumdar, M.Sc 🥑

Toronto, Ontario, Canada
10K followers 500+ connections

About

Dipankar is currently a Staff Data Engineering Advocate whose primary focus is helping…

Experience & Education

  • Onehouse

Volunteer Experience

  • General Secretary

    Jorhat Engineering College

    - 1 year 2 months

    Poverty Alleviation

    1. Objectives:
    Donation of clothes to the needy:
    The types of clothing to be donated are:
    i) Used clothes: any type of used clothing (M/F)
    ii) School material: shirts and pants that can be used as school uniforms (white and blue or black), salwars, or skirts
    iii) Blankets and shawls

    Distribution of Packet lunch boxes:

  • Student Mentor

    Dalhousie University

    - 1 month

    Education

    Volunteered as a mentor for new graduate students at the Faculty of Computer Science, Dalhousie University.

Publications

  • The Data Lakehouse: Data Warehousing and More

    arXiv

    Relational Database Management Systems designed for Online Analytical Processing (RDBMS-OLAP) have been foundational to democratizing data and enabling analytical use cases such as business intelligence and reporting for many years. However, RDBMS-OLAP systems present some well-known challenges. They are primarily optimized only for relational workloads, lead to proliferation of data copies which can become unmanageable, and since the data is stored in proprietary formats, it can lead to vendor lock-in, restricting access to engines, tools, and capabilities beyond what the vendor offers.

    This paper discusses how a data lakehouse, a new architectural approach, achieves the same benefits of an RDBMS-OLAP and cloud data lake combined, while also providing additional advantages. We take today’s data warehousing and break it down into implementation-independent components, capabilities, and practices. We then take these aspects and show how a lakehouse architecture satisfies them. Then, we go a step further and discuss what additional capabilities and benefits a lakehouse architecture provides over an RDBMS-OLAP.

  • Random Forest Similarity Maps: A Scalable Visual Representation for Global and Local Interpretation

    MDPI Electronics

    Machine Learning prediction algorithms have made significant contributions in today’s world, leading to increased usage in various domains. However, as ML algorithms surge, the need for transparent and interpretable models becomes essential. Visual representations have shown to be instrumental in addressing such an issue, allowing users to grasp models’ inner workings. Despite their popularity, visualization techniques still present visual scalability limitations, mainly when applied to analyze popular and complex models, such as Random Forests (RF). In this work, we propose Random Forest Similarity Map (RFMap), a scalable interactive visual analytics tool designed to analyze RF ensemble models. RFMap focuses on explaining the inner working mechanism of models through different views describing individual data instance predictions, providing an overview of the entire forest of trees, and highlighting instance input feature values. The interactive nature of RFMap allows users to visually interpret model errors and decisions, establishing the necessary confidence and user trust in RF models and improving performance.

Courses

  • Deep Learning

    CSCI6516

  • Human Computer Interaction

    CSCI6610

  • Machine Learning for Big Data

    CSCI6515

  • Visual Analytics

    CSCI6612

  • Visualization

    CSCI6406

Projects

  • Qlik Sense 'Scatter-Pie Chart' using Extension API (SaaS)

    - Present

    This project focuses on building an out-of-the-box visualization chart, the 'Scatter-Pie plot', using the Qlik Sense Extension API.
    Technology stack: D3.js, Qlik Sense APIs

  • Full Stack Machine Learning using Plotly's Dash

    - Present

    The project is an experimental effort to develop a full-stack machine learning solution using Plotly's Dash, which is based on Python Flask. The goal was to use Python as the main programming language instead of JavaScript to design a dashboard for a cannabis dataset. This visual analytics solution uses a machine learning algorithm (Random Forest) in the back end to make predictions for the three types of cannabis strains. A word embedding is also generated using the Gensim package. The tool also lets users enter any keyword related to the dataset and see the top 5 similar words.

    PS: The dashboard is a bit slow when it first loads since my Heroku dynos are limited. So, kindly give it some time.

    Visualization: Plotly
    Machine Learning Algorithm: Random Forest
    Framework: Flask
    Programming Language: Python

  • Multivariate data representation using RadViz

    - Present

    With the ever-increasing dimensionality of complex datasets, representing high-dimensional data has always been a concern. Although the field of visualization has made considerable advances in representing multidimensional datasets effectively, problems such as difficult user interpretation and visual clutter still hurt usability. This project aims to address these problems with a radial visualization called RadViz. RadViz places dimensions as anchors on the perimeter of a circle, allowing multidimensional datasets to be projected into a low-dimensional space. To improve user interpretability of a high-dimensional dataset, this project will use machine learning techniques in the back end and principles proposed by Alexander et al. and Hyunwoo Han et al.

    Technologies:
    Python Flask framework
    Backend: Machine Learning clustering algorithm
    Visual Representations: D3.js
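
    The RadViz projection described above can be sketched in a few lines of Python. This is a minimal illustration of the standard RadViz mapping only, not the project's actual code: the function name `radviz_project` and the min-max normalization step are assumptions, and the clustering back end and the D3.js rendering are left out.

```python
import math


def radviz_project(rows):
    """Project rows of numeric features to 2-D points with a basic RadViz mapping.

    Hypothetical helper for illustration; not taken from the project.
    """
    n_dims = len(rows[0])
    # One anchor per dimension, evenly spaced on the unit circle.
    anchors = [
        (math.cos(2 * math.pi * j / n_dims), math.sin(2 * math.pi * j / n_dims))
        for j in range(n_dims)
    ]
    # Min-max normalize each dimension to [0, 1] (an assumed preprocessing step).
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]

    points = []
    for row in rows:
        weights = [
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ]
        total = sum(weights)
        if total == 0:
            # A row with all-minimum values has no pull toward any anchor.
            points.append((0.0, 0.0))
            continue
        # Each point is the weighted average of the anchor positions.
        x = sum(w * ax for w, (ax, _) in zip(weights, anchors)) / total
        y = sum(w * ay for w, (_, ay) in zip(weights, anchors)) / total
        points.append((x, y))
    return points


points = radviz_project([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]])
```

    A row dominated by one dimension lands near that dimension's anchor, while a row with equal normalized values in every dimension lands at the center of the circle.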

  • Building an advanced Visualization extension using Qlik's Nebula.js.

    The focus of this project is to develop an out-of-the-box visualization object, a Parallel Coordinate Plot (PCP), using Qlik's open-source solution, Nebula.js.

    Technology stack: Nebula.js, D3.js, Qlik Sense SaaS.

  • Multi-class text classification and Feature engineering using Deep Learning techniques

    The objective of the project is to classify multi-class newspaper topics in the Reuters corpus dataset. To achieve this, the project is divided into two major components:
    1. Feature engineering using deep learning techniques
    2. Building a multi-layer neural network (LSTM) to perform classification

  • Sarcasm Detection with an Automated Machine Learning platform.

    Recent years have seen tremendous growth in the use of social media platforms by people voicing their opinions on a variety of topics. Some comments are ambiguous and can be misunderstood, which may lead to conflict between readers and the author. A similar pattern appears in newspaper headlines. We come across many news headlines throughout the day, and while some are straightforward, others are sarcastic in nature. These may seem offensive to certain groups of readers who miss the subtle humor added to the headlines. There is therefore a need to separate headlines or comments by their nature: sarcastic or non-sarcastic. The “Sarcasm Detector” system addresses this problem by providing an AutoML UI that lets any user, technical or non-technical, bring in their comments or headlines and predict whether they are sarcastic. The system also lets users explore visual insights and get hands-on experience with different types of classification models in machine learning.

  • Cipher Cracking using Distributed Computing

    The project develops a cipher-cracking system that takes a hash value (cipher text) from the user and searches candidate passwords until it finds the actual password.
    Language used: Java
    Operating system: Windows XP or higher, Linux
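
    The idea of splitting such a search across machines can be sketched as follows. This is a hedged illustration in Python rather than the project's Java code, and everything specific in it is an assumption: the SHA-256 hash, the tiny alphabet, the function names, and the round-robin partitioning by candidate index are all hypothetical choices made only to show the pattern.

```python
import hashlib
from itertools import product


def candidates(alphabet, max_len):
    """Yield every string over `alphabet` with length 1..max_len, in a fixed order."""
    for length in range(1, max_len + 1):
        for combo in product(alphabet, repeat=length):
            yield "".join(combo)


def search_partition(target_hex, alphabet, max_len, worker_id, n_workers):
    """Check only the candidates assigned to one worker (hypothetical scheme).

    Candidate number i is assigned to worker i % n_workers, so the workers
    cover the whole search space without overlapping.
    """
    for index, candidate in enumerate(candidates(alphabet, max_len)):
        if index % n_workers != worker_id:
            continue
        digest = hashlib.sha256(candidate.encode("utf-8")).hexdigest()
        if digest == target_hex:
            return candidate
    return None


# Toy run: four simulated workers search a three-letter alphabet.
target = hashlib.sha256(b"cab").hexdigest()
results = [search_partition(target, "abc", 3, w, 4) for w in range(4)]
```

    In this toy run each simulated worker is called in turn; in a real distributed setup each partition would run on a separate node and report back when it finds a match.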

  • Online Book shopping system

    It provides the facility of purchasing various categories of books online.

    Other creators
    • Madhusmita Kalita
  • Website for Computer Science students of Jorhat Eng. College

    The website jeccse.in is a support site designed for the Computer Science and Engineering students of Jorhat Engineering College.

    The purpose of this website is to help the C.S. students with all the required study materials for a particular semester. The course materials are presented in the form of e-books and video lectures by NPTEL.

Honors & Awards

  • Pat on the Back

    CGI

    Received the Pat on the Back award for significant performance in a quarter.

  • General Merit Scholarship in Class X, XII

    GOVT OF ASSAM

    Received the General Merit Scholarship from the Government of Assam for scoring 80% marks in Classes X and XII.

Test Scores

  • AMCAT Employability Assessment Test

    Score: AMCAT Percentile

    Verbal: 91.9%
    Aptitude: 88%
    Reasoning: 78%
    Computer Programming: 100%

Languages

  • English

    Full professional proficiency

  • Hindi

    Full professional proficiency

  • Assamese

    Full professional proficiency
