An Application of
Ordinary Least Squares
Regression & Stage-2
Regression to Remove
Endogeneity Issues in
Causal Inference
Author: Anthony Mok
Date: 18 Nov 2023
Email: xxiaohao@yahoo.com
FACTORS INFLUENCING
MEDICAL EXPENSES
AGENDA
• Endogeneity Issues in Causal Inference
• Ordinary Least Squares (OLS) Regression & Stage-2 Regression
• Relationship Between OLS Regression/Stage-2 Regression and Difference in Differences and Interaction Term
• Project’s Primary Goals
• Context
• Dataset & Modelling Strategies
• Findings & Conclusions
ENDOGENEITY ISSUES IN CAUSAL INFERENCE
In causal inference, endogeneity issues arise when the variable presumed to cause an effect (the independent variable) is itself influenced by the outcome variable (the dependent variable) or by other unobserved factors
Reverse Causality
A situation where the independent variable is influenced by the dependent variable, making it impossible to tell which one truly causes the other without further analysis
Unobserved Factors
These are like hidden players in the causal game. They influence both the independent and dependent variables, but you can't directly measure them
Both problems create a tangled web of relationships that makes it difficult to isolate the true causal effect of the independent variable on the dependent variable
OLS REGRESSION & STAGE-2 REGRESSION
In regression terms, endogeneity means that an independent variable is correlated with the error term, whether because it is influenced by the outcome variable (dependent variable) or by other unobserved factors
Ordinary Least Squares (OLS) Regression
A general-purpose statistical method used to estimate the linear relationship between a dependent variable and one or more independent variables. It fits a straight line to the data points by minimising the sum of the squared residuals (the vertical distances between the data points and the regression line)
Stage-2 Regression
A specific statistical technique, often used in instrumental variable (IV) regression, deployed to address endogeneity issues. It is more commonly known as two-stage least squares (2SLS)
First stage
An instrumental variable (correlated with the endogenous independent variable but not with the error term) is used to predict the endogenous variable
Second stage
The predicted values from the first stage replace the endogenous variable as an independent variable in the regression of the dependent variable
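The two stages above can be sketched in a few lines of Python. This is a minimal illustration on simulated data, not the deck's actual model: the helper names (`ols`, `two_stage_least_squares`), the data-generating process, and the true slope of 2.0 are all assumptions made for the example.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients for y ~ X (X must already contain a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def two_stage_least_squares(y, x_endog, z):
    """Manual 2SLS with one endogenous regressor and one instrument (hypothetical helper)."""
    n = len(y)
    const = np.ones(n)
    # First stage: predict the endogenous variable from the instrument.
    Z = np.column_stack([const, z])
    x_hat = Z @ ols(Z, x_endog)
    # Second stage: regress the outcome on the predicted values.
    X2 = np.column_stack([const, x_hat])
    return ols(X2, y)

# Simulated data (assumption): u is an unobserved factor that drives both x and y.
rng = np.random.default_rng(0)
n = 5000
u = rng.normal(size=n)                       # unobserved confounder
z = rng.normal(size=n)                       # instrument: affects x, unrelated to u
x = 0.8 * z + u + rng.normal(size=n)         # endogenous independent variable
y = 1.0 + 2.0 * x - 3.0 * u + rng.normal(size=n)

naive = ols(np.column_stack([np.ones(n), x]), y)
iv = two_stage_least_squares(y, x, z)
print("OLS slope :", round(naive[1], 3))     # biased by the unobserved factor
print("2SLS slope:", round(iv[1], 3))        # should land near the true value 2.0
```

Running the second stage by hand like this gives the right coefficients, but the standard errors from a plain second-stage OLS are not valid; a dedicated IV routine should be used when inference matters.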
STAGE-2 REGRESSION & DID – THE CONNECTIONS
Difference in Differences & Stage-2 Regression are separate techniques used in causal inference
Difference in Differences (DID)
A research design & estimation technique used to isolate the causal effect of a treatment/policy intervention by comparing changes over time between a Treatment Group & a Control Group
Stage-2 Regression
2-stage regression is a statistical technique used to address endogeneity issues in regression models
Although distinct, DID and 2-stage regression can be used sequentially in certain situations
• For example, apply DID to compare the change in test scores for programme participants before and after the programme relative to the change in test scores for non-participants over the same period
• Recognise that self-selection might still create endogeneity issues
• Within the DID framework, use 2-stage regression with an instrumental variable (e.g., distance to the programme) to address this endogeneity and obtain more reliable estimates of the programme's true effect
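The link between DID and the interaction term mentioned in the agenda can be shown with a small sketch. It assumes simulated test scores, a true programme effect of 5 points, and hypothetical variable names (`treated`, `post`, `score`); none of these come from the project's dataset.

```python
import numpy as np

# Simulated test scores (assumption): treated = programme participant, post = after the programme.
rng = np.random.default_rng(1)
n = 4000
treated = rng.integers(0, 2, size=n)
post = rng.integers(0, 2, size=n)
true_effect = 5.0
score = (60.0 + 3.0 * treated + 2.0 * post
         + true_effect * treated * post
         + rng.normal(scale=2.0, size=n))

# DID as a regression: the coefficient on the interaction term is the DID estimate.
X = np.column_stack([np.ones(n), treated, post, treated * post])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
did_regression = beta[3]

# DID from the four group means: (change for treated) minus (change for control).
def cell_mean(t, p):
    return score[(treated == t) & (post == p)].mean()

did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print("DID via interaction term:", round(did_regression, 3))
print("DID via group means     :", round(did_means, 3))
```

The two numbers agree because the regression with both indicators and their interaction reproduces the four group means exactly. If participation were self-selected, `treated` would be endogenous and this is where an instrument would come in.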
PROJECT’S PRIMARY GOALS
To identify factors influencing medical expenses given the variables while removing endogeneity issues
CONTEXT
Good health insurance is one that can cover a maximum amount of medical expenses so that people don't have to worry about paying medical bills
As a health insurance company, the company saw its sales fall significantly over time, which is causing concern
It is the firm's intention to analyse factors that determine medical expenses in order to improve their sales in the coming fiscal year
By conducting the study, they will have a better understanding of their customers' needs and be able to develop their marketing strategies accordingly
DATASET & MODELLING STRATEGIES
Starting from the dataset, the analysis proceeds in three steps:
1. OLS REGRESSION
• Observe outcomes from OLS regression using the independent variables
2. STAGE-2 REGRESSION
• Observe outcomes from Stage - 1 Regression with the endogenous variable as the target variable
• Observe Stage - 2 Regression using the predicted endogenous variable
3. INSIGHTS
• Form insights from the results of the OLS regression and Stage - 2 Regression
SOCIAL SECURITY INCOME (SSI) RATIO
Social Security Income is provided to senior citizens
The SSI ratio is calculated from multiple parameters, such as years of earnings, AIME (Average Indexed Monthly Earnings), individual assets, and number of dependents
Considering these parameters, the governing body decides the ratio of SSI to be provided to each individual
This final value is provided in the dataset as ssiratio, which can be used directly for further analysis
OLS REGRESSION WITH INDEPENDENT VARIABLES
Of the four independent variables, only 'illnesses' and 'healthinsu' have p-values below 0.05, so their relationships with 'logmedexpense' are statistically significant and unlikely to be due to chance. Because the dependent variable is the log of medical expenses, the coefficients are in log units: an additional illness is associated with a 0.44-unit increase in 'logmedexpense', while having health insurance is associated with a 0.07-unit increase. Since the model is linear and additive, these effects sum: a patient who has one additional illness and also holds valid medical insurance would see a total increase of 0.5156 units in 'logmedexpense'. This reading holds the other variables constant and assumes there is no strong multicollinearity between the two variables
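Since the outcome is 'logmedexpense', the coefficients can be turned into approximate percentage changes in medical expenses. The sketch below assumes the log is a natural log, which the slides do not state; the function name is hypothetical and the coefficients are the rounded values quoted above.

```python
import math

def pct_change_from_log_coef(coef):
    """Percentage change in the outcome implied by a coefficient on a natural-log outcome."""
    return (math.exp(coef) - 1.0) * 100.0

# Coefficients as quoted on the slide (rounded); natural log is an assumption.
coefficients = {
    "illnesses (one additional illness)": 0.44,
    "healthinsu (insured vs not insured)": 0.07,
    "both together": 0.5156,
}

for name, coef in coefficients.items():
    print(f"{name}: {pct_change_from_log_coef(coef):.1f}% change in medical expenses")
```

Under that assumption, the 0.44 coefficient corresponds to roughly a 55% rise in expenses per additional illness, which is a much larger effect than "0.44 units" suggests at first glance.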
STAGE - 1 REGRESSION
• In a Linear Regression Analysis, the residual is the
difference between the observed value and the
predicted value of the dependent variable
• For this regression, the residual value of 0.4544
means that the predicted value of this
observation is 0.4544 units less than the
observed value.
• In other words, the model under-predicted the
value of the dependent variable for this
observation by this amount
Stage - 1 Regression with the endogenous variable as target variable
STAGE - 1 REGRESSION
• Residuals are used to assess how well the model fits the data. When the residuals are randomly distributed around zero, it suggests that the model is a good fit for the data
• However, the histogram of the residuals (left) does not show values centred on zero. Instead, the model either over-predicted or under-predicted the value of the dependent variable for most of the 10,089 observations; such patterns in the residuals suggest that the model may not be a good fit for the data
• In addition, the average predicted value across all 10,089 observations is 0.38, which is not close to the observed values of the dependent variable. This again suggests that the model is not a good fit for the data
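The residual checks described above can be reproduced with a short sketch. The data here are simulated, and the choice of a 0/1 target is an assumption made for illustration (the slides do not say how the first-stage target is coded); `residual_summary` is a hypothetical helper.

```python
import numpy as np

def residual_summary(observed, predicted):
    """Summarise residuals (observed minus predicted) from a fitted model."""
    residuals = observed - predicted
    return {
        "mean": float(residuals.mean()),
        "std": float(residuals.std(ddof=1)),
        "share_under_predicted": float((residuals > 0).mean()),  # prediction below observed
        "share_over_predicted": float((residuals < 0).mean()),   # prediction above observed
    }

# Simulated first stage (assumption): a 0/1 target regressed on one instrument.
rng = np.random.default_rng(2)
n = 10_000
z = rng.uniform(0.0, 1.0, size=n)
target = (rng.uniform(size=n) < 0.5 - 0.2 * z).astype(float)

X = np.column_stack([np.ones(n), z])
beta, *_ = np.linalg.lstsq(X, target, rcond=None)
predicted = X @ beta

summary = residual_summary(target, predicted)
counts, edges = np.histogram(target - predicted, bins=10)
print(summary)
print("histogram counts:", counts)
```

Two points are worth keeping in mind when reading such output. With an intercept in the model, OLS residuals average to zero by construction, so the mean alone says little about fit. And when the target only takes the values 0 and 1, the residuals fall into two clusters on either side of zero, so a histogram that is not bell-shaped around zero may reflect the coding of the target rather than a poor instrument.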
STAGE - 1 REGRESSION
• When the Stage – 1 Regression model is not a good fit for the data, it means that the model is not accurately capturing the relationship between the independent and dependent variables
• There are several possible reasons for this, such as omitted variables, an incorrect functional form, or an invalid instrument. In such cases, the estimates produced by the model may not accurately reflect the true relationship between the variables
• To improve the fit of the model, one could include additional relevant variables, change the functional form of the model, or use a different instrument so that the first stage satisfies the conditions of relevance and exogeneity
• However, since no additional information is provided in the project, improving the model is infeasible
STAGE - 2 REGRESSION
The statistics suggest that an additional illness and an additional unit of income would increase medical expenses by 0.449 units and 0.098 units, respectively. Conversely, an additional unit of age and having health insurance would lower medical expenses by 0.012 units and 0.852 units, respectively. All four independent variables have p-values below 0.05, which suggests that these relationships are statistically significant and unlikely to be due to chance
Stage - 2 Regression using predicted endogenous variable
INSIGHTS FROM OLS & STAGE - 2 REGRESSION
• In the Stage – 1 analysis, the endogenous variable is regressed on the Instrumental Variable. Since the p-value for the Instrumental Variable is less than 0.05, the Instrumental Variable is significantly related to the endogenous variable
• This is known as the relevance condition for an instrumental variable: the instrument is correlated with the endogenous variable and can be used to predict it
• If the first-stage F-statistic were calculated, using tools like R or Python, the strength or weakness of the instrument could be assessed further
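As a rough sketch of how the first-stage F-statistic could be obtained in Python, the function below compares the first stage with and without the instrument. The function name and the simulated data are assumptions for illustration; a common rule of thumb treats an F-statistic below about 10 as a sign of a weak instrument.

```python
import numpy as np

def first_stage_f(x_endog, instruments, controls=None):
    """F-statistic for the joint significance of the instruments in the first stage."""
    n = len(x_endog)
    const = np.ones((n, 1))
    exog = const if controls is None else np.column_stack([const, controls])
    instruments = np.asarray(instruments).reshape(n, -1)
    full = np.column_stack([exog, instruments])

    def ssr(X):
        # Sum of squared residuals from regressing x_endog on X.
        beta, *_ = np.linalg.lstsq(X, x_endog, rcond=None)
        resid = x_endog - X @ beta
        return float(resid @ resid)

    q = instruments.shape[1]            # number of instruments being tested
    df_resid = n - full.shape[1]        # residual degrees of freedom of the full model
    ssr_restricted, ssr_full = ssr(exog), ssr(full)
    return ((ssr_restricted - ssr_full) / q) / (ssr_full / df_resid)

# Simulated comparison (assumption): one relevant and one nearly irrelevant instrument.
rng = np.random.default_rng(3)
n = 2000
z_strong = rng.normal(size=n)
z_weak = rng.normal(size=n)
x = 0.5 * z_strong + 0.01 * z_weak + rng.normal(size=n)

f_strong = first_stage_f(x, z_strong)
f_weak = first_stage_f(x, z_weak)
print("F with relevant instrument  :", round(f_strong, 1))
print("F with irrelevant instrument:", round(f_weak, 1))
```

With a single instrument this F-statistic equals the square of the instrument's t-statistic in the first stage, so a significant p-value and a large F tell the same story; the F-statistic simply makes the strength easier to compare against a threshold.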
The Linear Regression results suggest that people with health insurance would experience a 0.075-unit increase in medical expenses, while the 2-Stage results suggest that they would experience a 0.852-unit decrease. A one-unit increase in SSI Ratio is associated with a 0.1998-unit decrease in health insurance. Because the two estimates point in opposite directions, an endogeneity problem is suspected
PRODUCT | RESEARCH-PRESENTATION-1.1.pptx
 
From Signals to Solutions: Effective Strategies for CDR Analysis in Fraud Det...
From Signals to Solutions: Effective Strategies for CDR Analysis in Fraud Det...From Signals to Solutions: Effective Strategies for CDR Analysis in Fraud Det...
From Signals to Solutions: Effective Strategies for CDR Analysis in Fraud Det...
 
Training on CSPro and step by steps.pptx
Training on CSPro and step by steps.pptxTraining on CSPro and step by steps.pptx
Training on CSPro and step by steps.pptx
 
Accounting and Auditing Laws-Rules-and-Regulations
Accounting and Auditing Laws-Rules-and-RegulationsAccounting and Auditing Laws-Rules-and-Regulations
Accounting and Auditing Laws-Rules-and-Regulations
 
SFBA Splunk Usergroup meeting July 17, 2024
SFBA Splunk Usergroup meeting July 17, 2024SFBA Splunk Usergroup meeting July 17, 2024
SFBA Splunk Usergroup meeting July 17, 2024
 
Acid Base Practice Test 4- KEY.pdfkkjkjk
Acid Base Practice Test 4- KEY.pdfkkjkjkAcid Base Practice Test 4- KEY.pdfkkjkjk
Acid Base Practice Test 4- KEY.pdfkkjkjk
 
Data Storytelling Final Project for MBA 635
Data Storytelling Final Project for MBA 635Data Storytelling Final Project for MBA 635
Data Storytelling Final Project for MBA 635
 
CT AnGIOGRAPHY of pulmonary embolism.pptx
CT AnGIOGRAPHY of pulmonary embolism.pptxCT AnGIOGRAPHY of pulmonary embolism.pptx
CT AnGIOGRAPHY of pulmonary embolism.pptx
 
Data Analytics for Decision Making By District 11 Solutions
Data Analytics for Decision Making By District 11 SolutionsData Analytics for Decision Making By District 11 Solutions
Data Analytics for Decision Making By District 11 Solutions
 
Selcuk Topal Arbitrum Scientific Report.pdf
Selcuk Topal Arbitrum Scientific Report.pdfSelcuk Topal Arbitrum Scientific Report.pdf
Selcuk Topal Arbitrum Scientific Report.pdf
 
How AI is Revolutionizing Data Collection.pdf
How AI is Revolutionizing Data Collection.pdfHow AI is Revolutionizing Data Collection.pdf
How AI is Revolutionizing Data Collection.pdf
 
Getting Started with Interactive Brokers API and Python.pdf
Getting Started with Interactive Brokers API and Python.pdfGetting Started with Interactive Brokers API and Python.pdf
Getting Started with Interactive Brokers API and Python.pdf
 
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion dataTowards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
 

Ordinary Least Square Regression & Stage-2 Regression - Factors Influencing Medical Expenses

  • 1. FACTORS INFLUENCING MEDICAL EXPENSES: An Application of Ordinary Least Squares Regression & Stage-2 Regression to Remove Endogeneity Issues in Causal Inference. Author: Anthony Mok. Date: 18 Nov 2023. Email: xxiaohao@yahoo.com
  • 2. AGENDA: Endogeneity Issues in Causal Inference; Ordinary Least Squares (OLS) Regression & Stage-2 Regression; Relationship Between OLS Regression/Stage-2 Regression and Difference in Differences and Interaction Term; Project’s Primary Goals; Context; Dataset & Modelling Strategies; Findings & Conclusions
  • 3. ENDOGENEITY ISSUES IN CAUSAL INFERENCE. In causal inference, endogeneity issues arise when the variable presumed to cause an effect (the independent variable) is itself influenced by the outcome variable (the dependent variable) or by other unobserved factors. Reverse Causality: a situation where the independent variable is influenced by the dependent variable, making it impossible to tell which one truly causes the other without further analysis. Unobserved Factors: these are like hidden players in the causal game. They influence both the independent and dependent variables, but cannot be measured directly. Both create a tangled web of relationships that makes it difficult to isolate the true causal effect of the independent variable on the dependent variable.
  • 4. OLS REGRESSION & STAGE-2 REGRESSION. In causal inference, endogeneity issues arise when the variable presumed to cause an effect (the independent variable) is itself influenced by the outcome variable (the dependent variable) or by other unobserved factors; equivalently, the independent variable in a regression is correlated with the error term. Ordinary Least Squares (OLS) Regression: a general-purpose statistical method used to estimate the linear relationship between a dependent variable and one or more independent variables. It fits a straight line to the data points by minimising the sum of the squared residuals (the vertical distances between the data points and the regression line). Stage-2 Regression: a specific statistical technique, often used in instrumental variable (IV) regression, deployed to address endogeneity issues. First stage: an instrumental variable (correlated with the endogenous independent variable but not with the error term) is used to predict the endogenous variable. Second stage: the predicted values from the first stage replace the endogenous variable as an independent variable in the regression on the dependent variable.
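The two stages described on this slide can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not the deck's own analysis: the variable names (`z`, `u`, `x`, `y`), the coefficients, and the helper `ols` are all assumptions made for the example. The unobserved factor `u` drives both `x` and `y`, so plain OLS is biased, while the two-stage estimate recovers a slope close to the true value of 2.0.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data; every name and number here is an illustrative assumption.
z = rng.normal(size=n)                       # instrument: affects x, not the error
u = rng.normal(size=n)                       # unobserved factor affecting x and y
x = 0.8 * z + u + rng.normal(size=n)         # endogenous independent variable
y = 2.0 * x - 3.0 * u + rng.normal(size=n)   # outcome; true effect of x is 2.0


def ols(X, target):
    """Least-squares coefficients for target ~ X (X already holds a constant)."""
    return np.linalg.lstsq(X, target, rcond=None)[0]


const = np.ones(n)

# Naive OLS: x is correlated with the error term (-3u + noise), so it is biased.
beta_ols = ols(np.column_stack([const, x]), y)

# First stage: predict the endogenous variable from the instrument.
Z = np.column_stack([const, z])
x_hat = Z @ ols(Z, x)

# Second stage: regress the outcome on the first-stage predictions.
beta_2sls = ols(np.column_stack([const, x_hat]), y)

print(f"OLS slope:       {beta_ols[1]:.3f}")
print(f"Two-stage slope: {beta_2sls[1]:.3f}")
```

Running the second stage by hand like this gives the right point estimate but not the right standard errors, because the second regression treats the predicted values as if they were observed data. A dedicated IV routine is needed for inference.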
  • 5. STAGE-2 REGRESSION & DID: THE CONNECTIONS. Difference in Differences and Stage-2 Regression are separate techniques used in causal inference. Difference in Differences (DID): a research design and estimation technique used to isolate the causal effect of a treatment or policy intervention by comparing changes over time between a Treatment Group and a Control Group. Stage-2 Regression: 2-stage regression is a statistical technique used to address endogeneity issues in regression models. Although distinct, DID and 2-stage regression can be used sequentially in certain situations. For example, apply DID to compare the change in test scores for programme participants before and after the programme relative to the change in test scores for non-participants over the same period. Recognise that self-selection into the programme might still create endogeneity issues. Within the DID framework, use 2-stage regression with an instrumental variable (e.g., distance to the programme) to address this endogeneity and obtain more reliable estimates of the programme's true effect.
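The test-score example on this slide, and the agenda's link between DID and the interaction term, can be illustrated with a small sketch. The data below are synthetic and the names (`treated`, `post`, `score`, `group_mean`) and effect sizes are assumptions for the example only. It computes the DID estimate twice: once from the four group means, and once as the coefficient on the `treated * post` interaction in an OLS regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000

# Synthetic test scores; all names and magnitudes are illustrative assumptions.
treated = rng.integers(0, 2, size=n)   # 1 = programme participant
post = rng.integers(0, 2, size=n)      # 1 = observed after the programme
true_effect = 5.0
score = (60 + 3 * treated + 2 * post
         + true_effect * treated * post
         + rng.normal(size=n))


def group_mean(t, p):
    """Mean score for one treatment-group / period cell."""
    return score[(treated == t) & (post == p)].mean()


# DID from group means: change for participants minus change for non-participants.
did_means = ((group_mean(1, 1) - group_mean(1, 0))
             - (group_mean(0, 1) - group_mean(0, 0)))

# The same estimate as the interaction coefficient in an OLS regression.
X = np.column_stack([np.ones(n), treated, post, treated * post])
coef = np.linalg.lstsq(X, score, rcond=None)[0]
did_regression = coef[3]

print(f"DID from means:       {did_means:.3f}")
print(f"DID from interaction: {did_regression:.3f}")
```

The two numbers agree because the regression with both dummies and their interaction is saturated: it fits each of the four cell means exactly. The sketch assigns participation at random, so it does not show the self-selection problem that the instrumental variable is meant to address.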
  • 6. PROJECT’S PRIMARY GOALS. To identify factors influencing medical expenses, given the variables, while removing endogeneity issues.
  • 7. CONTEXT. Good health insurance is insurance that covers the maximum amount of medical expenses, so that people do not have to worry about paying medical bills. A health insurance company has seen its sales fall significantly over time, which is causing concern. The firm intends to analyse the factors that determine medical expenses in order to improve its sales in the coming fiscal year. By conducting the study, it will gain a better understanding of its customers' needs and be able to develop its marketing strategies accordingly.
  • 8. DATASET & MODELLING STRATEGIES. Starting from the dataset: (1) OLS REGRESSION: observe outcomes from the OLS regression using the independent variables. (2) STAGE-2 REGRESSION: observe outcomes from the Stage-1 regression with the endogenous variable as the target variable, then observe the Stage-2 regression using the predicted endogenous variable. (3) INSIGHTS: form insights from the results of the OLS regression and the Stage-2 regression.
  • 9. SOCIAL SECURITY INCOME (SSI) RATIO. Social Security Income is provided to senior citizens. The SSI Ratio is calculated from multiple parameters, such as years of earning, AIME (Average Indexed Monthly Earnings), individual assets, and number of dependents. Considering these parameters, the governing body decides the ratio of SSI to be provided to each individual. This final value is provided in the dataset as ssiratio and can be used directly for further analysis.
  • 10. OLS REGRESSION WITH INDEPENDENT VARIABLES. Of the four control (also known as independent) variables, only 'illnesses' and 'healthinsu' have p-values below 0.05. These are statistically significant: their relationships with 'logmedexpense' are unlikely to be due to chance. An additional illness is associated with a 0.44-unit increase in 'logmedexpense', while having health insurance is associated with a 0.07-unit increase. Since these are independent variables, we assume that there is no multicollinearity between the two. This would mean that a patient who has an additional illness and also holds valid medical insurance would experience a total 0.5156-unit increase in 'logmedexpense'.
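Because the dependent variable is named 'logmedexpense', the "units" quoted on this slide are probably log points rather than currency. The sketch below shows how such a coefficient converts to an approximate percentage change in medical expenses. It assumes the variable is the natural log of medical expenses, which the slides do not confirm, and the function name `log_points_to_percent` is introduced here only for illustration.

```python
import math


def log_points_to_percent(coef):
    """Percentage change in the outcome implied by a coefficient on a
    natural-log dependent variable: 100 * (exp(coef) - 1)."""
    return 100.0 * (math.exp(coef) - 1.0)


# Combined coefficient quoted on the slide (one extra illness plus insurance).
combined = 0.5156
print(f"{combined} log points is about {log_points_to_percent(combined):.1f}% "
      "higher medical expenses")
```

For small coefficients the percentage is close to 100 times the coefficient, but the gap widens as the coefficient grows, so the exact conversion is worth applying to the larger estimates.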
  • 11. STAGE-1 REGRESSION (with the endogenous variable as the target variable). In a linear regression analysis, the residual is the difference between the observed value and the predicted value of the dependent variable. For this regression, the residual value of 0.4544 means that the predicted value for this observation is 0.4544 units less than the observed value. In other words, the model under-predicted the value of the dependent variable for this observation by this amount.
  • 12. STAGE-1 REGRESSION. Residuals are used to assess how well the model fits the data. When the residuals are randomly distributed around zero, it suggests that the model is a good fit for the data. However, the histogram of the residuals (on the left of the slide) does not show the values distributed around zero. In fact, the model mostly over-predicted or under-predicted the value of the dependent variable for the 10,089 observations; there are patterns in the residuals, which may suggest that the model is not a good fit for the data. Moreover, the average predicted value across all 10,089 observations is 0.38, which is not close to the observed values of the dependent variable. This again suggests that the model is not a good fit for the data.
  • 13. STAGE-1 REGRESSION. When the Stage-1 regression model is not a good fit for the data, the model is not accurately capturing the relationship between the independent and dependent variables. There are several possible causes, such as omitted variables, an incorrect functional form, or an invalid instrument. In such cases, the estimates produced by the model may not accurately reflect the true relationship between the variables. To improve the fit, one could include additional relevant variables, change the functional form of the model, or use a different instrument, so that the first stage satisfies the conditions of relevance and exogeneity. However, since no additional information is provided in the project, improving the model is infeasible.
  • 14. STAGE-2 REGRESSION (using the predicted endogenous variable). The statistics suggest that an additional unit of illness and an additional unit of income would increase medical expenses by 0.449 units and 0.098 units, respectively. Conversely, an additional unit of age and having health insurance would lower medical expenses by 0.012 units and 0.852 units, respectively. All four independent variables have p-values below 0.05, which suggests that these relationships are statistically significant and unlikely to be due to chance.
  • 15. INSIGHTS FROM OLS & STAGE-2 REGRESSION. In the Stage-1 analysis, the endogenous variable is regressed on the instrumental variable. At this stage, since the p-value for the instrumental variable is less than 0.05, the instrumental variable is significantly related to the endogenous variable. This is known as the relevance condition for an instrumental variable: the instrument is correlated with the endogenous variable and can be used to predict it. If the F-statistic were calculated, using tools like R or Python, the strength or weakness of the instrument could be further determined. The linear regression results suggest that people with health insurance would experience a 0.075-unit increase in medical expenses, while the 2-stage results suggest that people with health insurance would experience a 0.852-unit decrease in medical expenses. SSI Ratio is associated with a change of -0.1998 units in health insurance. The two estimates point in opposite directions, so an endogeneity problem is suspected.
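The first-stage F-statistic mentioned on this slide can be computed in Python by comparing the first-stage regression with and without the instrument. The sketch below is one hedged way to do it with NumPy only. The function `first_stage_f`, its argument names, and the synthetic data are assumptions for illustration and do not use the deck's dataset.

```python
import numpy as np


def first_stage_f(endog, instruments, controls):
    """F-statistic for the null that all instrument coefficients are zero
    in the first-stage regression of `endog` on controls and instruments."""
    n = len(endog)
    restricted = np.hstack([np.ones((n, 1)), controls])   # without instruments
    unrestricted = np.hstack([restricted, instruments])   # with instruments

    def rss(X):
        beta = np.linalg.lstsq(X, endog, rcond=None)[0]
        resid = endog - X @ beta
        return float(resid @ resid)

    q = instruments.shape[1]                 # number of excluded instruments
    df_resid = n - unrestricted.shape[1]     # residual degrees of freedom
    rss_r, rss_u = rss(restricted), rss(unrestricted)
    return ((rss_r - rss_u) / q) / (rss_u / df_resid)


# Synthetic illustration; names and coefficients are assumptions.
rng = np.random.default_rng(2)
n = 2000
controls = rng.normal(size=(n, 2))
instrument = rng.normal(size=(n, 1))
endog = -0.5 * instrument[:, 0] + 0.3 * controls[:, 0] + rng.normal(size=n)

f_stat = first_stage_f(endog, instrument, controls)
print(f"first-stage F = {f_stat:.1f}")
```

A common rule of thumb treats a first-stage F below about 10 as a sign of a weak instrument. With a single instrument, this F equals the square of the instrument's t-statistic in the first stage.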