J Clin Epidemiol. 2022 Feb;142:264-267.
doi: 10.1016/j.jclinepi.2021.08.001. Epub 2021 Aug 8.

Tutorial on directed acyclic graphs


Jean C Digitale et al. J Clin Epidemiol. 2022 Feb.

Abstract

Directed acyclic graphs (DAGs) are an intuitive yet rigorous tool to communicate about causal questions in clinical and epidemiologic research and inform study design and statistical analysis. DAGs are constructed to depict prior knowledge about biological and behavioral systems related to specific causal research questions. DAG components portray who receives treatment or experiences exposures; mechanisms by which treatments and exposures operate; and other factors that influence the outcome of interest or which persons are included in an analysis. Once assembled, DAGs - via a few simple rules - guide the researcher in identifying whether the causal effect of interest can be identified without bias and, if so, what must be done either in study design or data analysis to achieve this. Specifically, DAGs can identify variables that, if controlled for in the design or analysis phase, are sufficient to eliminate confounding and some forms of selection bias. DAGs also help recognize variables that, if controlled for, bias the analysis (e.g., mediators or factors influenced by both exposure and outcome). Finally, DAGs help researchers recognize insidious sources of bias introduced by selection of individuals into studies or failure to completely observe all individuals until study outcomes are reached. DAGs, however, are not infallible, largely owing to limitations in prior knowledge about the system in question. In such instances, several alternative DAGs are plausible, and researchers should assess whether results differ meaningfully across analyses guided by different DAGs and be forthright about uncertainty. DAGs are powerful tools to guide the conduct of clinical research.
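
As a minimal sketch of what "directed" and "acyclic" mean in practice, a causal diagram can be written as a mapping from each variable to its direct causes and then checked for cycles. The variable names and arrows below are hypothetical and are not taken from the article; the check relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical diagram written as {variable: set of its direct causes}.
# The names are illustrative only and do not come from the article.
parents = {
    "confounder": set(),
    "exposure": {"confounder"},
    "mediator": {"exposure"},
    "outcome": {"exposure", "mediator", "confounder"},
}


def is_acyclic(graph):
    """Return True if no variable is, directly or indirectly, a cause of itself."""
    try:
        # Iterating static_order() raises CycleError when a directed cycle exists.
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


# Adding an arrow from outcome back to confounder creates a cycle.
cyclic = dict(parents, confounder={"outcome"})

print(is_acyclic(parents))  # True
print(is_acyclic(cyclic))   # False
```

A diagram that fails this check is not a DAG, so the path rules described in the figure captions would not apply to it as drawn.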


Figures

Figure 1:
Directed acyclic graph illustrating key concepts and terms.
  1. Paths are sequences of arrows, of any direction, connecting two variables and may be causal or non-causal.

  2. Paths are causal if each variable causes the subsequent variable (all the arrows point in the same direction).

  3. Paths are non-causal if the arrows do not all point in the same direction. They contain confounders and/or colliders.

  4. Confounding occurs because of common (shared) causes (e.g., C) of E and D. To estimate the effect of E on D, it is necessary to control for such common causes or other variables along the non-causal path. For example, control for either C or G would be adequate to eliminate the confounding due to C. G may be preferable, for example if it is easier to obtain a high-quality measurement of G.

  5. Mediators (e.g., M) are caused by E and, in turn, cause D. They should not be controlled for to estimate the total effect of E on D.

  6. Colliders (e.g., S) are so named because they have two arrows pointing into them. Colliders on a path block that path unless they are conditioned on (e.g., by controlling for them) or a consequence of the collider is conditioned on.

  7. Analyses should not adjust for, stratify on, or in any way condition on descendants of D (e.g., Z).

  8. Instrumental variables (e.g., I) are variables related to the exposure of interest that have no association with the outcome except through the exposure. Instrumental variables analysis (a technique common in the economics literature) can be used to derive effect estimates when there is intractable confounding of E and D.

  9. Effect modifiers (e.g., J) are variables that cause D and modify the effect of other causes of D, such as E. If E and J both cause D, then J modifies the effect of E on D on at least one effect-measure scale (additive or multiplicative).
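
Rules 1 to 6 above can be sketched in a few lines of plain Python. The figure itself is not reproduced here, so the edge list below is an assumption that is merely consistent with the caption (C causes E through G and also causes D; M mediates; S is a collider; Z descends from D; I causes E; J causes D). All function names are hypothetical helpers, and the sketch does not address rules 7 to 9.

```python
# Assumed arrows as (cause, effect) pairs. This layout is a guess that fits the
# caption; it is not the authors' drawing.
EDGES = [("C", "G"), ("G", "E"), ("C", "D"), ("E", "D"), ("E", "M"), ("M", "D"),
         ("E", "S"), ("D", "S"), ("D", "Z"), ("I", "E"), ("J", "D")]


def descendants(node):
    """All variables reachable from node by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for cause, effect in EDGES:
            if cause == current and effect not in found:
                found.add(effect)
                stack.append(effect)
    return found


def paths(start, end, visited=None):
    """Yield every path from start to end, ignoring arrow direction (rule 1)."""
    visited = visited or [start]
    here = visited[-1]
    if here == end:
        yield visited
        return
    neighbours = {b for a, b in EDGES if a == here} | {a for a, b in EDGES if b == here}
    for nxt in sorted(neighbours):
        if nxt not in visited:
            yield from paths(start, end, visited + [nxt])


def is_causal(path):
    """Rule 2: every arrow points from one variable to the next along the path."""
    return all((a, b) in EDGES for a, b in zip(path, path[1:]))


def is_blocked(path, conditioned):
    """Rules 4 to 6: decide whether conditioning on a set of variables blocks a path."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        if (prev, mid) in EDGES and (nxt, mid) in EDGES:
            # Collider: blocks unless it or one of its descendants is conditioned on.
            if not ({mid} | descendants(mid)) & conditioned:
                return True
        elif mid in conditioned:
            # Non-collider (confounder or mediator): blocks when conditioned on.
            return True
    return False


def open_noncausal_paths(conditioned):
    """Non-causal paths from E to D that remain open given the conditioning set."""
    return [p for p in paths("E", "D")
            if not is_causal(p) and not is_blocked(p, conditioned)]


print(open_noncausal_paths(set()))       # [['E', 'G', 'C', 'D']]
print(open_noncausal_paths({"G"}))       # []
print(open_noncausal_paths({"C", "S"}))  # [['E', 'S', 'D']]
```

Under these assumed arrows, controlling for either C or G closes the confounding path, while additionally conditioning on the collider S opens a new non-causal path. Calling `is_blocked(["E", "M", "D"], {"M"})` returns `True`, which is why adjusting for the mediator removes part of the total effect.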

Figure 2:
Examples of settings in which controlling for or restricting on a variable can introduce bias. A box around a variable denotes conditioning on that variable. Panel a) Depiction of controlling for a mediating variable. Stroke severity is a consequence of stroke and adjusting for it blocks one pathway through which stroke causes functional decline, attenuating the estimated effect size. Panel b) Depiction of selection bias in a study estimating the effect of the number of sexual partners on cervical cancer. Here, to be included in the study, participants had to have sought care at an STI clinic. Because seeking care at an STI clinic is influenced by both the exposure and the outcome (i.e., it is a collider), the estimate of the causal effect of interest will be biased.
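
The selection bias in panel b can be illustrated with an exact calculation. The sketch below assumes, purely for illustration, that the exposure has no effect on the outcome (true odds ratio of 1) and that clinic attendance becomes more likely with either the exposure or the outcome. Every probability is a made-up value, not an estimate from the article, and the function names are hypothetical.

```python
# Hypothetical probabilities chosen for illustration; none come from the article.
P_EXPOSED = 0.5  # P(E = 1)
P_OUTCOME = 0.1  # P(D = 1), independent of E by construction (true odds ratio = 1)


def p_selected(e, d):
    """Assumed probability of attending the clinic given exposure e and outcome d."""
    return 0.05 + 0.30 * e + 0.40 * d


def joint(e, d, selected_only):
    """P(E=e, D=d), or P(E=e, D=d, selected) when selected_only is True."""
    p = (P_EXPOSED if e else 1 - P_EXPOSED) * (P_OUTCOME if d else 1 - P_OUTCOME)
    return p * p_selected(e, d) if selected_only else p


def odds_ratio(selected_only):
    """Exposure-outcome odds ratio in everyone, or among clinic attendees only."""
    def cell(e, d):
        return joint(e, d, selected_only)
    return (cell(1, 1) * cell(0, 0)) / (cell(1, 0) * cell(0, 1))


print(round(odds_ratio(False), 3))  # 1.0
print(round(odds_ratio(True), 3))   # 0.238
```

With these numbers, restricting to clinic attendees turns a null association into an apparent inverse one. The direction and size of the distortion depend entirely on the assumed selection probabilities.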


