About
Articles by Dipankar
-
Visualizing Multivariate dataset using Radviz.
By Dipankar Mazumdar, M.Sc 🥑
-
Interactive Data Visualization using D3.js
By Dipankar Mazumdar, M.Sc 🥑
Activity
-
We are looking for AI engineers to work on advanced capabilities across every part of the LLM application stack! 🦙 1. Using LLMs for better…
Liked by Dipankar Mazumdar, M.Sc 🥑
-
If you want to grow quickly as a data engineer or data analyst, then you need to bring up fewer problems and provide more solutions. I've heard this…
Liked by Dipankar Mazumdar, M.Sc 🥑
-
Out of everything I've written about data, this is by far the article I'm most proud of, and I think can help the most…
Liked by Dipankar Mazumdar, M.Sc 🥑
Experience & Education
Licenses & Certifications
Volunteer Experience
-
General Secretary
Jorhat Engineering College
- 1 year 2 months
Poverty Alleviation
1. Objectives:
Donation of clothes to the needy:
The types of clothes to be donated would be:
i) Used clothes: any type of used clothes (M/F)
ii) School material: shirts and pants that can be used as school dress (white and blue or black), salwars or skirts
iii) Blankets, shawls
Distribution of Packet lunch boxes:
-
Student Mentor
Dalhousie University
- 1 month
Education
Volunteered as a mentor for new graduate students at the Faculty of Computer Science, Dalhousie University.
Publications
-
The Data Lakehouse: Data Warehousing and More
arXiv
Relational Database Management Systems designed for Online Analytical Processing (RDBMS-OLAP) have been foundational to democratizing data and enabling analytical use cases such as business intelligence and reporting for many years. However, RDBMS-OLAP systems present some well-known challenges. They are primarily optimized only for relational workloads, lead to proliferation of data copies which can become unmanageable, and since the data is stored in proprietary formats, it can lead to vendor lock-in, restricting access to engines, tools, and capabilities beyond what the vendor offers.
This paper discusses how a data lakehouse, a new architectural approach, achieves the same benefits of an RDBMS-OLAP and cloud data lake combined, while also providing additional advantages. We take today’s data warehousing and break it down into implementation-independent components, capabilities, and practices. We then take these aspects and show how a lakehouse architecture satisfies them. Then, we go a step further and discuss what additional capabilities and benefits a lakehouse architecture provides over an RDBMS-OLAP.
-
Random Forest Similarity Maps: A Scalable Visual Representation for Global and Local Interpretation
MDPI Electronics
Machine Learning prediction algorithms have made significant contributions in today’s world, leading to increased usage in various domains. However, as ML algorithms surge, the need for transparent and interpretable models becomes essential. Visual representations have shown to be instrumental in addressing such an issue, allowing users to grasp models’ inner workings. Despite their popularity, visualization techniques still present visual scalability limitations, mainly when applied to analyze popular and complex models, such as Random Forests (RF). In this work, we propose Random Forest Similarity Map (RFMap), a scalable interactive visual analytics tool designed to analyze RF ensemble models. RFMap focuses on explaining the inner working mechanism of models through different views describing individual data instance predictions, providing an overview of the entire forest of trees, and highlighting instance input feature values. The interactive nature of RFMap allows users to visually interpret model errors and decisions, establishing the necessary confidence and user trust in RF models and improving performance.
Courses
-
Deep Learning
CSCI6516
-
Human Computer Interaction
CSCI6610
-
Machine learning for Big data
CSCI6515
-
Visual Analytics
CSCI6612
-
Visualization
CSCI6406
Projects
-
Qlik Sense 'Scatter-Pie Chart' using Extension API(SaaS)
- Present
This project focuses on building an out-of-the-box Visualization chart 'Scatter-Pie plot' using Qlik Sense Extension API.
Technology stack: D3.js, Qlik Sense APIs
-
Full Stack Machine Learning using Plotly's Dash
- Present
The project is an experimental effort to develop a full-stack Machine Learning solution using Plotly's Dash, which is based on Python Flask. The goal was to use Python as the main programming language instead of JavaScript to design a dashboard for a Cannabis dataset. This Visual Analytics solution uses a Machine Learning algorithm (Random Forest) in the back end to make predictions for the three types of Cannabis strains. A word embedding is also generated using the Gensim package. The tool also lets users enter any keyword related to the dataset and see the top 5 similar words.
PS: The dashboard is a bit slow when it first loads since my Heroku dynos are limited, so kindly give it some time.
Visualization: Plotly
Machine Learning Algorithm: Random Forest
Framework: Flask
Programming Language: Python
-
Multivariate data representation using RadViz
- Present
With the ever-increasing dimensionality of complex datasets, representing high-dimensional data has always been a concern. Although there have been considerable advances in the field of Visualization for the effective representation of multidimensional datasets, problems such as difficult user interpretation and visual clutter hurt usability. This project aims to address these problems by proposing a radial visualization called RadViz. RadViz places dimensions as anchors on the perimeter of a circle, allowing multidimensional datasets to be projected to a low-dimensional space. To improve user interpretability of a high-dimensional dataset, this project will use Machine Learning techniques in the backend and principles from the work of Alexander et al. and Hyunwoo Han et al.
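As a rough illustration of the anchor-based projection described above (not the project's actual code), here is a minimal sketch in plain Python. It assumes min-max normalised features and anchors spaced evenly around the unit circle; the function name `radviz_project` is hypothetical.

```python
import math


def radviz_project(rows):
    """Project rows of numeric features to 2D points inside the unit circle.

    Assumption: every row has the same number of features (dimensions).
    """
    if not rows:
        return []
    n_dims = len(rows[0])

    # One anchor per dimension, spaced evenly on the perimeter of the circle.
    anchors = [
        (math.cos(2 * math.pi * j / n_dims), math.sin(2 * math.pi * j / n_dims))
        for j in range(n_dims)
    ]

    # Min-max normalise each column to [0, 1]; a constant column maps to 0.
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]

    points = []
    for row in rows:
        weights = [(v - lo) / span for v, lo, span in zip(row, lows, spans)]
        total = sum(weights)
        if total == 0:
            # No pull from any anchor: place the point at the centre.
            points.append((0.0, 0.0))
            continue
        # Each point is the weighted average of the anchor positions.
        x = sum(w * ax for w, (ax, _) in zip(weights, anchors)) / total
        y = sum(w * ay for w, (_, ay) in zip(weights, anchors)) / total
        points.append((x, y))
    return points
```

A row dominated by a single feature lands near that feature's anchor, while a row with equal normalised values across all features lands at the centre of the circle.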
Technologies:
Python Flask framework
Backend: Machine Learning clustering algorithm
Visual Representations: D3.js
-
Building an advanced Visualization extension using Qlik's Nebula.js.
-
The focus of this project is to develop an out-of-the-box Visualization object, a Parallel Coordinate Plot (PCP), using Qlik's open-source solution, Nebula.js.
Technology stack: Nebula.js, D3.js, Qlik Sense SaaS.
-
Multi-class text classification and Feature engineering using Deep Learning techniques
-
The objective of the project is to classify multi-class newspaper topics for the Reuters corpus dataset. To achieve this, the project is divided into two major components:
1. Feature engineering using Deep learning techniques
2. Building up a multi-layer Neural network to perform classification (LSTM).
-
Sarcasm Detection with an Automated Machine Learning platform.
-
Recent years have seen tremendous growth in the use of social media platforms by people voicing their opinions on a variety of topics. Some of these comments are ambiguous and may be misunderstood, leading to conflict between readers and the author. A similar pattern is seen in newspaper headlines. We come across many news headlines throughout the day, and while some are very straightforward, others can be sarcastic in nature. These may seem offensive to certain groups of readers who miss the subtle humor added to the headlines. There is therefore a need to separate headlines or comments by their nature: sarcastic or non-sarcastic. The “Sarcasm Detector” system addresses this problem by providing an AutoML UI that lets any user, technical or non-technical, bring in their comments or headlines and predict whether they are sarcastic. The system also lets users explore visual insights and gain hands-on experience with different types of Machine Learning classification models.
-
Online Book shopping system
-
It provides the facility of purchasing various categories of books online.
-
Website for Computer Science students of Jorhat Eng. College
-
The website jeccse.in is a support site designed for the Computer Science and Engineering students of Jorhat Engineering College.
The purpose of this website is to help the C.S. students with all the required study materials for a particular semester. The course materials are presented in the form of e-books and video lectures by NPTEL.
Honors & Awards
-
Pat on the Back
CGI
Received the Pat on the Back award for significant performance in a quarter.
-
General Merit Scholarship in Class X, XII
GOVT OF ASSAM
Received the General Merit Scholarship from the Government of Assam for scoring 80% marks in Class X and XII.
Test Scores
-
AMCAT Employability Assessment Test
Score: AMCAT Percentile
Verbal: 91.9%
Aptitude: 88%
Reasoning: 78%
Computer Programming: 100%
Languages
-
English
Full professional proficiency
-
Hindi
Full professional proficiency
-
Assamese
Full professional proficiency
Recommendations received
3 people have recommended Dipankar
More activity by Dipankar
-
The lakehouse architecture combines the best of data lakes and data warehouses on one platform to power all your analytics — from AI to BI.
Liked by Dipankar Mazumdar, M.Sc 🥑