This week, I was discussing Bayes’ theorem with one of the scientists on my team. It is one of my favorite statistical theorems, and it is often counterintuitive. Here is an example: Imagine a rare disease X, affecting 1 in 1,000 people. Assume there is a diagnostic test that is 99% accurate at detecting the disease when it is present, with a 2% false positive rate. Let’s say someone (Person A) takes the test, and it comes back positive. What is the likelihood that they actually have the disease?

At first glance, one might think that the probability of having the disease, given a positive test result, would be high, especially since the test is 99% accurate with only a 2% false positive rate. But is that the case? Bayes’ theorem gives the exact answer. Here is the number crunching:

Prob (Having the disease X) = P(A) = 0.1%
Prob (Not having the disease X) = P(A’) = 1 − P(A) = 99.9%
Prob (Getting a positive test result, when the disease is present) = P(B∣A) = 99%
Prob (Getting a positive test result, when the disease is not present) = P(B∣A’) = 2%

From the law of total probability:
Prob (Getting a positive test result) = P(B) = P(B∣A) × P(A) + P(B∣A′) × P(A′) ≈ 2.1%

From Bayes’ theorem:
Prob (Having the disease, given a positive test) = P(A∣B) = P(B∣A) × P(A) / P(B) ≈ 0.0472 = 4.7%

So, despite the positive test result, the likelihood that Person A actually has the disease X is only about 4.7% (in other words, the likelihood that Person A does not have the disease X, even with a positive test result, is quite high, at about 95.3%). This is what makes Bayes’ theorem so fascinating! #DataScience #Statistics #BayesTheorem
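For anyone who wants to double-check the arithmetic, here is a quick sketch in plain Python (not part of the original post; the numbers simply mirror the example above):

```python
# Sanity check of the Bayes' theorem example above (no libraries needed).
p_disease = 0.001               # P(A): prevalence of disease X (1 in 1,000)
p_no_disease = 1 - p_disease    # P(A')
p_pos_given_disease = 0.99      # P(B|A): test detects the disease when present
p_pos_given_no_disease = 0.02   # P(B|A'): false positive rate

# Law of total probability: P(B)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_no_disease * p_no_disease)

# Bayes' theorem: P(A|B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(f"P(positive test)      = {p_pos:.4f}")              # ~0.0210
print(f"P(disease | positive) = {p_disease_given_pos:.4f}")  # ~0.0472
```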
Rahil Kwatra’s Post
More Relevant Posts
-
Leading in Corporate Health with specialization in Occupational Health and Wellbeing | Health data analytics specialist
"The authors believe that were able to achieve a good accuracy for each of the four diseases. furthermore, in future we can add more disease and combine multiple method along with questionnaire to make this process more robust and stronger". From my point of view I think this article was written following an interesting methodology at least in the use of the many algorhitms and choice of stpes to build a predicitve machine learning model.
-
Starting at 12 noon: Melody Huang presents "Towards Credible Causal Inference under Real-World Complications: Sensitivity Analysis for Generalizability" at this week's Workshop in Applied Statistics.
-
Hi Connections... #ML_Project1: Heart disease prediction using a supervised machine learning algorithm. Successfully completed my first Machine Learning project using the K-Nearest Neighbors (KNN) algorithm. Steps implemented:
1️⃣ Importing the dataset and necessary libraries
2️⃣ Data preparation
3️⃣ Separating X as input features and y as the output label
4️⃣ Splitting the data into training and testing sets
5️⃣ Normalization using StandardScaler
6️⃣ Model creation using the KNN algorithm
7️⃣ Performance evaluation using a classification report
Algorithm: #knn IDE: #googlecolab Guide: #Sabir K #Machinelearning #Datascience #KNN #HeartDiseasePrediction
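Not from the original post, but here is a minimal sketch of what such a KNN pipeline might look like with scikit-learn; the file name heart.csv and the "target" column are placeholders:

```python
# Minimal KNN heart-disease pipeline sketch; "heart.csv" and "target" are placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

df = pd.read_csv("heart.csv")                  # steps 1-2: load and prepare data
X = df.drop(columns=["target"])                # step 3: input features
y = df["target"]                               # step 3: output label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)   # step 4: train/test split

scaler = StandardScaler()                      # step 5: normalization
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=5)      # step 6: model creation
knn.fit(X_train, y_train)

print(classification_report(y_test, knn.predict(X_test)))  # step 7: evaluation
```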
-
Hey Connections, As part of my first machine learning project, I implemented the K-Nearest Neighbors classification algorithm for the prediction of heart disease under the guidance of Sabir K. Steps implemented:
1. Dataset and library importation.
2. Data preparation.
3. Separating input and output labels.
4. Splitting the data into training and testing data.
5. Normalization.
6. Model creation using the KNN algorithm.
7. Performance evaluation.
-
The core idea of Bayesian statistics is to update one’s beliefs after being exposed to new evidence, treating everything as a random variable. Learn the differences between Bayesian and Frequentist analysis and how to choose the best option for your next test in our blog post. https://lnkd.in/dgCFuAht
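As a small illustration of "updating beliefs after new evidence" (not taken from the linked blog post), here is a minimal Beta-Binomial sketch; the prior parameters and observed counts are made up purely for illustration:

```python
# Minimal Beta-Binomial sketch of Bayesian updating (illustrative numbers only).
from scipy import stats

# Prior belief about a conversion rate: Beta(2, 8), i.e. roughly "around 20%".
prior_a, prior_b = 2, 8

# New evidence from a test: 30 conversions out of 100 visitors.
successes, failures = 30, 70

# Conjugate update: the posterior is also a Beta distribution.
post_a, post_b = prior_a + successes, prior_b + failures
posterior = stats.beta(post_a, post_b)

print(f"Posterior mean: {posterior.mean():.3f}")
print(f"95% credible interval: {posterior.interval(0.95)}")
```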
-
The Bayesian prior and the Frequentist null hypothesis have an interesting connection, depicted in the picture below. On the right side of the figure is the observed data. On the left side is a list of possible latent causes. The null hypothesis says that A = B, but there is a range of other hypotheses, such as A > 2B, A > 3B, A > 4B, and so on; together these form the set of alternative hypotheses. While the Frequentist only calculates the chance of observing the observed data by assuming the null hypothesis, the Bayesian calculates the chance of observing the observed data under the full range of possible underlying causes. The prior is the probability distribution over all of these possible causes. #statistics #statisticalsignificance #bayesian
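A rough sketch of the distinction described above, with made-up conversion data: the frequentist evaluates the likelihood of the data under the single assumed cause A = B, while the Bayesian averages the likelihood over a prior on all possible causes (independent uniform priors are assumed here purely for illustration):

```python
# Illustrative sketch: likelihood under the null vs. likelihood averaged over a prior.
import numpy as np
from scipy import stats

# Observed data: conversions out of 100 visitors for variants A and B (made up).
n_a, n_b = 100, 100
k_a, k_b = 60, 50

# Frequentist view: probability of the data assuming the null hypothesis A = B,
# using the pooled rate as the single assumed cause.
p_pooled = (k_a + k_b) / (n_a + n_b)
lik_null = stats.binom.pmf(k_a, n_a, p_pooled) * stats.binom.pmf(k_b, n_b, p_pooled)

# Bayesian view: average the likelihood over a prior on all possible causes,
# here independent uniform priors on the two underlying rates (Monte Carlo).
rng = np.random.default_rng(0)
p_a = rng.uniform(size=100_000)
p_b = rng.uniform(size=100_000)
lik_marginal = np.mean(stats.binom.pmf(k_a, n_a, p_a) * stats.binom.pmf(k_b, n_b, p_b))

print(f"Likelihood of the data under the null (A = B): {lik_null:.6f}")
print(f"Marginal likelihood averaged over the prior:   {lik_marginal:.6f}")
```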
-
Hey Viewers! 👋 Thrilled to share a snapshot of my recent mini project in the world of machine learning. 🌐📊 I took a deep dive into healthcare analytics, focusing on predicting heart disease using a Naive Bayes classifier. 🩺✨
🔍 Project Highlights:
✔️ Implemented a Naive Bayes classifier for heart disease prediction.
✔️ Explored data pre-processing techniques for optimal model performance.
✔️ Delved into feature selection and fine-tuning for accuracy.
#MachineLearning #MLMiniProject #LearningJourney
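For readers curious what such a setup can look like, here is a minimal sketch (not the author's notebook) using scikit-learn's GaussianNB with simple univariate feature selection; the file and column names are placeholders:

```python
# Minimal Naive Bayes heart-disease sketch with simple feature selection.
# "heart.csv" and the "target" column are placeholder names.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report

df = pd.read_csv("heart.csv")
X, y = df.drop(columns=["target"]), df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Keep the 8 features most associated with the label, then fit Gaussian Naive Bayes.
model = make_pipeline(SelectKBest(f_classif, k=8), GaussianNB())
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```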
-
Aspiring Data Analyst proficient in python, Statistics, Machine Learning, SQL, Tableau, Power Bi , Numpy , Pandas, Matplotlib
I'm very excited to share my first Machine Learning project: Kidney Disease Classification dataset. I performed the following steps on the dataset:
1. Data preprocessing
2. Handling outliers
3. Encoding
4. Feature scaling
5. Train and test split
6. Multiple classification methods ➡ (Logistic Regression, Decision Tree, Random Forest, Bagging Classifier, AdaBoost Classifier, Gradient Boosting Classifier, XGB Classifier, Support Vector Classifier, KNN, Gaussian NB) ➡ (Voting Classifier)
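As an illustration of the final voting step, here is a minimal sketch of a soft-voting ensemble over a few of the listed models; this is not the author's notebook, and the dataset file and label column names are placeholders:

```python
# Minimal soft-voting ensemble sketch over a few of the listed models.
# "kidney_disease_clean.csv" and "classification" are placeholder names.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("kidney_disease_clean.csv")   # assumes an already preprocessed file
X, y = df.drop(columns=["classification"]), df["classification"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

voting = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ],
    voting="soft",   # average predicted probabilities across the base models
)
voting.fit(X_train, y_train)
print("Test accuracy:", voting.score(X_test, y_test))
```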
-
I'm absolutely delighted to share my latest Data Science project, in which my team and I addressed a critical problem: imbalanced datasets. We used the "Personal Key Indicators of Heart Disease" dataset (obtained from Kaggle) as our case study, and implemented various techniques to handle its class imbalance, such as:
- Random Over-Sampling & Under-Sampling
- Synthetic Minority Over-Sampling Technique (SMOTE)
- Tomek Links Under-Sampling
... and more.
We tested these techniques with multiple machine learning algorithms, such as K-Nearest Neighbors, Random Forest, XGBoost, and more, then compared the results of each technique using evaluation metrics such as Recall, Precision, and F1 score, in order to determine which technique gives the most accurate and realistic results for our case study. This amazing project wouldn't have seen the light without the effort of my great team: Zeyad Usf, Ziad Elsayed, Ziad Muhammad, and of course our mentor Doaa Mahmoud Abdel_aty, to whom we would like to express our deepest gratitude for giving us the opportunity to undertake the project, and for her immense guidance, support, and timely engagement.
Project's Notebook on Kaggle: https://lnkd.in/d32CJuqq
Project's Link on GitHub: https://lnkd.in/dRcXBWmp
Feel free to give us your feedback. #machinelearning #datascience
Techniques for Handling Imbalanced Data + Modeling
kaggle.com
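For context on the SMOTE step mentioned above, here is a minimal sketch of oversampling only the training split with imbalanced-learn; the file name heart_2020_cleaned.csv and the HeartDisease column are assumptions based on the public Kaggle dataset and may differ from the team's actual setup:

```python
# Minimal SMOTE sketch with imbalanced-learn; file and column names are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE

df = pd.read_csv("heart_2020_cleaned.csv")
X = pd.get_dummies(df.drop(columns=["HeartDisease"]))   # simple encoding of categoricals
y = (df["HeartDisease"] == "Yes").astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Oversample the minority class in the training split only; the test set stays untouched.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_res, y_res)
print(classification_report(y_test, clf.predict(X_test)))   # compare Recall, Precision, F1
```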
-
Following on from our recent ‘Prior Elicitation for Clinical Research’ webinar, we’re pleased to share the recording and slide deck, which can be downloaded here: https://lnkd.in/gh79Back. You will also be given access to additional resources that provide valuable insights from our expert Statisticians on the applications of Bayesian Statistics in clinical trials. #Bayesian #Phastar #Statistics
Presentations on demand | Phastar Knowledge Centre
info.phastar.com