This week, I was discussing Bayes’ theorem with one of the scientists on my team. It is one of my favorite statistical theorems, and it is often counterintuitive. Here is an example: Imagine a rare disease X, affecting 1 in 1,000 people. Assume there is a diagnostic test that is 99% accurate at detecting the disease when it is present, with a 2% false positive rate. Let’s say someone (Person A) takes the test, and it comes back positive. What is the likelihood that they actually have the disease?

At first glance, one might think that the probability of having the disease, given a positive test result, would be high, especially since the test is 99% accurate with only a 2% false positive rate. But is that the case? Bayes’ theorem gives the exact answer. Here is the number crunching:

Prob (Having the disease X) = P(A) = 0.1%
Prob (Not having the disease X) = P(A’) = 1 − P(A) = 99.9%
Prob (Getting a positive test result, when the disease is present) = P(B∣A) = 99%
Prob (Getting a positive test result, when the disease is not present) = P(B∣A’) = 2%

From the law of total probability:
Prob (Getting a positive test result) = P(B) = P(B∣A) × P(A) + P(B∣A′) × P(A′) ≈ 2.1%

From Bayes’ theorem:
Prob (Having the disease, given a positive test) = P(A∣B) = P(B∣A) × P(A) / P(B) ≈ 0.0472 = 4.7%

So, despite the positive test result, the likelihood that Person A actually has the disease X is only about 4.7% (in other words, the likelihood that Person A does not have the disease X, even with a positive test result, is quite high, at about 95.3%). This is what makes Bayes’ theorem so fascinating! #DataScience #Statistics #BayesTheorem
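For anyone who wants to double-check the arithmetic, here is a quick sketch in plain Python (not part of the original post; the numbers simply mirror the example above):

```python
# Sanity check of the Bayes' theorem example above (no libraries needed).
p_disease = 0.001               # P(A): prevalence of disease X (1 in 1,000)
p_no_disease = 1 - p_disease    # P(A')
p_pos_given_disease = 0.99      # P(B|A): test detects the disease when present
p_pos_given_no_disease = 0.02   # P(B|A'): false positive rate

# Law of total probability: P(B)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_no_disease * p_no_disease)

# Bayes' theorem: P(A|B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(f"P(positive test)      = {p_pos:.4f}")              # ~0.0210
print(f"P(disease | positive) = {p_disease_given_pos:.4f}")  # ~0.0472
```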
Rahil Kwatra’s Post
More Relevant Posts
-
Leading in Corporate Health with specialization in Occupational Health and Wellbeing | Health data analytics specialist
"The authors believe that were able to achieve a good accuracy for each of the four diseases. furthermore, in future we can add more disease and combine multiple method along with questionnaire to make this process more robust and stronger". From my point of view I think this article was written following an interesting methodology at least in the use of the many algorhitms and choice of stpes to build a predicitve machine learning model.
-
Starting at 12 noon: Melody Huang presents "Towards Credible Causal Inference under Real-World Complications: Sensitivity Analysis for Generalizability" at this week's Workshop in Applied Statistics.
-
Hi Connections... #ML_Project1: Heart disease prediction using a supervised machine learning algorithm. Successfully completed my first Machine Learning project using the K-Nearest Neighbors (KNN) algorithm. Steps implemented:
1️⃣ Importing the dataset and necessary libraries
2️⃣ Data preparation
3️⃣ Separating X as input features and y as the output label
4️⃣ Splitting the data into training and testing sets
5️⃣ Normalization using StandardScaler
6️⃣ Model creation using the KNN algorithm
7️⃣ Performance evaluation using a classification report
Algorithm: #knn IDE: #googlecolab Guide: #Sabir K #Machinelearning #Datascience #KNN #HeartDiseasePrediction
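Not from the original post, but here is a minimal sketch of what such a KNN pipeline might look like with scikit-learn; the file name heart.csv and the "target" column are placeholders:

```python
# Minimal KNN heart-disease pipeline sketch; "heart.csv" and "target" are placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

df = pd.read_csv("heart.csv")                  # steps 1-2: load and prepare data
X = df.drop(columns=["target"])                # step 3: input features
y = df["target"]                               # step 3: output label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)   # step 4: train/test split

scaler = StandardScaler()                      # step 5: normalization
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=5)      # step 6: model creation
knn.fit(X_train, y_train)

print(classification_report(y_test, knn.predict(X_test)))  # step 7: evaluation
```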
-
Hey Connections, As part of my first machine learning project, I implemented the K-Nearest Neighbors classification algorithm for the prediction of heart disease under the guidance of Sabir K. Steps implemented:
1. Dataset and library importation.
2. Data preparation.
3. Separating input and output labels.
4. Splitting the data into training and testing data.
5. Normalization.
6. Model creation using the KNN algorithm.
7. Performance evaluation.
-
The core idea of Bayesian statistics is to update one’s beliefs after being exposed to new evidence, treating everything as a random variable. Learn the differences between Bayesian and Frequentist analysis and how to choose the best option for your next test in our blog post. https://lnkd.in/dgCFuAht
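As a small illustration of "updating beliefs after new evidence" (not taken from the linked blog post), here is a minimal Beta-Binomial sketch; the prior parameters and observed counts are made up purely for illustration:

```python
# Minimal Beta-Binomial sketch of Bayesian updating (illustrative numbers only).
from scipy import stats

# Prior belief about a conversion rate: Beta(2, 8), i.e. roughly "around 20%".
prior_a, prior_b = 2, 8

# New evidence from a test: 30 conversions out of 100 visitors.
successes, failures = 30, 70

# Conjugate update: the posterior is also a Beta distribution.
post_a, post_b = prior_a + successes, prior_b + failures
posterior = stats.beta(post_a, post_b)

print(f"Posterior mean: {posterior.mean():.3f}")
print(f"95% credible interval: {posterior.interval(0.95)}")
```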
-
The Bayesian prior and the Frequentist null hypothesis have an interesting connection, depicted in the picture below. On the right side of the figure is the observed data. On the left side is a list of possible latent causes. The null hypothesis says that A = B, but there is a range of other hypotheses, such as A > 2B, A > 3B, A > 4B, and so on; together these form the set of alternative hypotheses. While the Frequentist only calculates the chance of observing the observed data by assuming the null hypothesis, the Bayesian calculates the chance of observing the observed data under the full range of possible underlying causes. The prior is the probability distribution over all of these possible causes. #statistics #statisticalsignificance #bayesian
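A rough sketch of the distinction described above, with made-up conversion data: the frequentist evaluates the likelihood of the data under the single assumed cause A = B, while the Bayesian averages the likelihood over a prior on all possible causes (independent uniform priors are assumed here purely for illustration):

```python
# Illustrative sketch: likelihood under the null vs. likelihood averaged over a prior.
import numpy as np
from scipy import stats

# Observed data: conversions out of 100 visitors for variants A and B (made up).
n_a, n_b = 100, 100
k_a, k_b = 60, 50

# Frequentist view: probability of the data assuming the null hypothesis A = B,
# using the pooled rate as the single assumed cause.
p_pooled = (k_a + k_b) / (n_a + n_b)
lik_null = stats.binom.pmf(k_a, n_a, p_pooled) * stats.binom.pmf(k_b, n_b, p_pooled)

# Bayesian view: average the likelihood over a prior on all possible causes,
# here independent uniform priors on the two underlying rates (Monte Carlo).
rng = np.random.default_rng(0)
p_a = rng.uniform(size=100_000)
p_b = rng.uniform(size=100_000)
lik_marginal = np.mean(stats.binom.pmf(k_a, n_a, p_a) * stats.binom.pmf(k_b, n_b, p_b))

print(f"Likelihood of the data under the null (A = B): {lik_null:.6f}")
print(f"Marginal likelihood averaged over the prior:   {lik_marginal:.6f}")
```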
-
Hey Viewers! 👋 Thrilled to share a snapshot of my recent mini project in the world of machine learning. 🌐📊 I took a deep dive into healthcare analytics, focusing on predicting heart disease using a Naive Bayes classifier. 🩺✨
🔍 Project Highlights:
✔️ Implemented a Naive Bayes classifier for heart disease prediction.
✔️ Explored data pre-processing techniques for optimal model performance.
✔️ Delved into feature selection and fine-tuning for accuracy.
#MachineLearning #MLMiniProject #LearningJourney
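For readers curious what such a setup can look like, here is a minimal sketch (not the author's notebook) using scikit-learn's GaussianNB with simple univariate feature selection; the file and column names are placeholders:

```python
# Minimal Naive Bayes heart-disease sketch with simple feature selection.
# "heart.csv" and the "target" column are placeholder names.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report

df = pd.read_csv("heart.csv")
X, y = df.drop(columns=["target"]), df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Keep the 8 features most associated with the label, then fit Gaussian Naive Bayes.
model = make_pipeline(SelectKBest(f_classif, k=8), GaussianNB())
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```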
-
Aspiring Data Analyst proficient in python, Statistics, Machine Learning, SQL, Tableau, Power Bi , Numpy , Pandas, Matplotlib
I'm very excited to share my first Machine Learning project: Kidney Disease Classification dataset. I performed the following steps on the dataset:
1. Data preprocessing
2. Handling outliers
3. Encoding
4. Feature scaling
5. Train and test split
6. Multiple classification methods ➡ (Logistic Regression, Decision Tree, Random Forest, Bagging Classifier, AdaBoost Classifier, Gradient Boosting Classifier, XGB Classifier, Support Vector Classifier, KNN, Gaussian NB) ➡ (Voting Classifier)
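As an illustration of the final voting step, here is a minimal sketch of a soft-voting ensemble over a few of the listed models; this is not the author's notebook, and the dataset file and label column names are placeholders:

```python
# Minimal soft-voting ensemble sketch over a few of the listed models.
# "kidney_disease_clean.csv" and "classification" are placeholder names.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("kidney_disease_clean.csv")   # assumes an already preprocessed file
X, y = df.drop(columns=["classification"]), df["classification"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

voting = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ],
    voting="soft",   # average predicted probabilities across the base models
)
voting.fit(X_train, y_train)
print("Test accuracy:", voting.score(X_test, y_test))
```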
-
I'm absolutely delighted to share my latest Data Science project, in which my team and I addressed a critical problem: imbalanced datasets. We used the "Personal Key Indicators of Heart Disease" dataset (obtained from Kaggle) as our case study, and implemented various techniques to handle its class imbalance, such as:
- Random Over-Sampling & Under-Sampling
- Synthetic Minority Over-Sampling Technique (SMOTE)
- Tomek Links Under-Sampling
... and more.
We tested these techniques with multiple machine learning algorithms, such as K-Nearest Neighbors, Random Forest, XGBoost, and more, then compared the results of each technique using evaluation metrics such as Recall, Precision, and F1 score, in order to determine which technique gives the most accurate and realistic results for our case study. This amazing project wouldn't have seen the light without the effort of my great team: Zeyad Usf, Ziad Elsayed, Ziad Muhammad, and of course our mentor Doaa Mahmoud Abdel_aty, to whom we would like to express our deepest gratitude for giving us the opportunity to undertake the project, and for her immense guidance, support, and timely engagement.
Project's Notebook on Kaggle: https://lnkd.in/d32CJuqq
Project's Link on GitHub: https://lnkd.in/dRcXBWmp
Feel free to give us your feedback. #machinelearning #datascience
Techniques for Handling Imbalanced Data + Modeling
kaggle.com
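For context on the SMOTE step mentioned above, here is a minimal sketch of oversampling only the training split with imbalanced-learn; the file name heart_2020_cleaned.csv and the HeartDisease column are assumptions based on the public Kaggle dataset and may differ from the team's actual setup:

```python
# Minimal SMOTE sketch with imbalanced-learn; file and column names are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE

df = pd.read_csv("heart_2020_cleaned.csv")
X = pd.get_dummies(df.drop(columns=["HeartDisease"]))   # simple encoding of categoricals
y = (df["HeartDisease"] == "Yes").astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Oversample the minority class in the training split only; the test set stays untouched.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_res, y_res)
print(classification_report(y_test, clf.predict(X_test)))   # compare Recall, Precision, F1
```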
-
Following on from our recent ‘Prior Elicitation for Clinical Research’ webinar, we’re pleased to share the recording and slide deck, which can be downloaded here: https://lnkd.in/gh79Back. You will also be given access to additional resources that provide valuable insights from our expert Statisticians on the applications of Bayesian Statistics in clinical trials. #Bayesian #Phastar #Statistics
Presentations on demand | Phastar Knowledge Centre
info.phastar.com