JAMIA Open. 2023 Aug 16;6(3):ooad067.
doi: 10.1093/jamiaopen/ooad067. eCollection 2023 Oct.

Who is pregnant? Defining real-world data-based pregnancy episodes in the National COVID Cohort Collaborative (N3C)

Sara E Jones et al. JAMIA Open.

Abstract

Objectives: To define pregnancy episodes and estimate gestational age within electronic health record (EHR) data from the National COVID Cohort Collaborative (N3C).

Materials and methods: We developed a comprehensive approach, named Hierarchy and rule-based pregnancy episode Inference integrated with Pregnancy Progression Signatures (HIPPS), and applied it to EHR data in the N3C (January 1, 2018-April 7, 2022). HIPPS combines: (1) an extension of a previously published pregnancy episode algorithm, (2) a novel algorithm to detect gestational age-specific signatures of a progressing pregnancy for further episode support, and (3) pregnancy start date inference. Clinicians performed validation of HIPPS on a subset of episodes. We then generated pregnancy cohorts based on gestational age precision and pregnancy outcomes for assessment of accuracy and comparison of COVID-19 and other characteristics.

Results: We identified 628 165 pregnant persons with 816 471 pregnancy episodes, of which 52.3% were live births, 24.4% were other outcomes (stillbirth, ectopic pregnancy, abortions), and 23.3% had unknown outcomes. Clinician validation showed 98.8% agreement with HIPPS-identified episodes. We were able to estimate start dates to within 1 week of precision for 475 433 (58.2%) episodes. Incident COVID-19 occurred during pregnancy in 62 540 (7.7%) episodes.

Discussion: HIPPS provides measures of support for pregnancy-related variables such as gestational age and pregnancy outcomes based on N3C data. Gestational age precision allows researchers to find time to events with reasonable confidence.

Conclusion: We have developed a novel and robust approach for inferring pregnancy episodes and gestational age that addresses data inconsistency and missingness in EHR data.

Keywords: COVID-19; algorithms; electronic health records; gestational age; pregnancy.

Conflict of interest statement

K.R.B. and S.L. are employees of Palantir Technologies. Y.K. and L.L. are employees of Sema4. M.N.L. is Managing Director of IPQ Analytics, LLC.

Figures

Figure 1.
Common scenarios of pregnancy episodes in N3C illustrate how the EHR provides an incomplete picture of care. Some visits were not recorded (occurred in another healthcare system), and yet others were likely recorded inconsistently or inaccurately. Some routine visits may have occurred in another healthcare system or may not have occurred at all, potentially due to healthcare disruption caused by the pandemic.
Figure 2.
Overview of HIPPS. The inputs (A) of our composite algorithm are all individuals and the full set of OMOP concepts in N3C. From these, we identify pregnant persons (B) and identify pregnancy-specific concepts (C). Our HIPPS algorithm (D) comprises the Hierarchy-based Inference of Pregnancy (HIP) algorithm (E), Get gestational timing concepts (F), Pregnancy Progression Signature (PPS) Algorithm (G), Merge episodes (H), and Estimated Start Date (ESD) Algorithm (I). The output is a dataset with pregnancy-related data, enriched with COVID covariates. The following are the individual steps within each panel.
(1) From the 12 million (M) patients in N3C, we identified 4M that were both female and of reproductive age (15–55 years).
(2) Of these, we identified 633K possibly pregnant persons who matched at least one concept in an initial set of ultrasound and pregnancy outcome concepts from Matcho et al.
(3) To develop an enriched set of concepts specific for pregnancy, we then assessed concept frequency among the initial cohort of possibly pregnant persons and chose 1417 concepts that were present in at least 1000 individuals and were 10X (determined empirically via distribution analysis) more frequent among possibly pregnant persons relative to controls (all other patients in N3C).
(4) We utilized Athena and Atlas to expand and curate the concepts from Matcho et al. related to pregnancy outcomes and gestational age.
(5–7) For each of the 628K pregnant persons selected for the HIPPS approach based on the presence of at least one of the 930 concepts in their records, we inferred pregnancy episodes with the Hierarchy-based Inference of Pregnancy (HIP) Algorithm. Gestation-based and outcome-based episodes were first defined before combining them for each pregnant person. Any overlapping episodes were merged as a single episode and any outcomes were reclassified if the gestational age information did not align with the outcome.
(8) We then used the start dates of the HIP episodes to discover concepts that occur during a specific time during pregnancy from the 1417 pregnancy-specific concepts derived from Step 3. Concepts with a standard deviation of <1.5 months were kept for clinicians to vet and assign minimum and maximum months of when these concepts most likely occur during pregnancy.
(9) The Pregnancy Progression Signature (PPS) Algorithm was applied to both (A) validate HIP algorithm predictions and (B) provide further evidence of pregnancy in the data using a separate set of concepts and logic: iterations of comparisons of each concept with other gestational timing concepts for each person were performed, followed by checks of whether the actual difference in dates was within the minimum and maximum plausible expected months for the concepts. If at least one comparison between concepts evaluated to “TRUE”, the algorithm continued building the pregnancy episode. If all evaluated to “FALSE”, then a new pregnancy episode was begun if a minimum permissible retry period of 60 days was also met.
(10) For each PPS episode, outcomes were added if they occurred within 10 months or until the next episode, whichever was earlier.
(11) We then combined HIP and PPS episodes and checked which HIP episodes and outcomes were supported by PPS.
(12) Lastly, pregnancy start dates were calculated using the Estimated Start Date (ESD) Algorithm using the gestational timing concepts from Step 8. The result of the HIPPS approach is a dataset that can be enriched to include other variables of interest related to pregnancy and, in our case, COVID-19.
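The pairwise comparison logic of step 9 can be illustrated with a minimal sketch. This is not the authors' implementation: the `TimingRecord` type, the 30.4-day month, the way the expected range between two concepts is derived from their minimum and maximum months, and the choice to keep an unsupported record in the current episode when the 60-day retry period is not met are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date

DAYS_PER_MONTH = 30.4  # assumption: average month length
RETRY_DAYS = 60        # minimum permissible retry period from the caption


@dataclass(frozen=True)
class TimingRecord:
    """Hypothetical record: a gestational timing concept seen on a date."""
    when: date
    min_month: float  # earliest month of pregnancy the concept plausibly occurs
    max_month: float  # latest month of pregnancy the concept plausibly occurs


def agrees(earlier: TimingRecord, later: TimingRecord) -> bool:
    """True if the actual gap in months fits the plausible expected gap."""
    actual = (later.when - earlier.when).days / DAYS_PER_MONTH
    lowest = later.min_month - earlier.max_month
    highest = later.max_month - earlier.min_month
    return lowest <= actual <= highest


def build_episodes(records):
    """Group records into episodes using the TRUE/FALSE comparison rule."""
    episodes = []
    for rec in sorted(records, key=lambda r: r.when):
        if not episodes:
            episodes.append([rec])
            continue
        current = episodes[-1]
        supported = any(agrees(prev, rec) for prev in current)
        gap_days = (rec.when - current[-1].when).days
        if not supported and gap_days >= RETRY_DAYS:
            episodes.append([rec])   # all comparisons FALSE and retry period met
        else:
            current.append(rec)      # assumption: otherwise keep building
    return episodes


records = [
    TimingRecord(date(2020, 1, 15), 1, 2),
    TimingRecord(date(2020, 3, 15), 3, 4),
    TimingRecord(date(2021, 2, 1), 1, 2),
]
print([len(e) for e in build_episodes(records)])  # [2, 1]
```

In the example, the second record falls 60 days after the first, which fits the plausible 1 to 3 month gap, so the episode continues. The third record fits no earlier record and arrives well past the retry period, so it opens a new episode.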
Figure 3.
Inference of pregnancy episodes. (A) Definition of Hierarchy-based Inference of Pregnancy (HIP) algorithm: episodes were inferred by both an initial set of gestational timing markers (gestation-based episodes) and outcomes (outcome-based episodes), shown in steps 1–6. Both types of episodes were then merged and quality checked (steps 7–9). (B) Definition of Pregnancy Progression Signature (PPS) algorithm: we leveraged further, empirically derived, gestational timing markers based on low standard deviation distance from HIP algorithm start dates across any pregnancy outcome category, followed by clinician curation of expected gestational timing ranges (N = 74 concepts). The patient records were first scanned for this new set of gestational timing concepts (step 1), followed by detailed iterations across the patient data making comparisons between pairs of concepts to determine whether to extend or start a new episode (steps 2–3). To provide further clarity, equations representing these iterations and detailed application to an example patient can be found in Figure S1. Where present, outcomes were appended to the end of the progression signatures to derive the full recorded episode (step 4). Finally, we check for any overlap of HIP and PPS episodes, then remove episodes that overlap with more than 2 episodes, prioritizing the episodes with the closest end dates. The resulting pairs of overlapping episodes are merged using a union of the 2 source episodes (eg, taking the earliest date and the latest date from both) (step 5). Note that for both algorithms these panels provide the full algorithm definition; examples are used purely to provide clarity.
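The union merge in step 5 can be sketched as follows. The `Episode` tuple and the function names are hypothetical, and the sketch covers only the final union of one HIP and one PPS episode, not the earlier removal of episodes that overlap with more than 2 others.

```python
from datetime import date
from typing import Optional, Tuple

Episode = Tuple[date, date]  # hypothetical representation: (start, end)


def overlaps(a: Episode, b: Episode) -> bool:
    """Two closed date intervals overlap if each starts before the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


def merge_pair(hip: Episode, pps: Episode) -> Optional[Episode]:
    """Union of two overlapping episodes: earliest start and latest end."""
    if not overlaps(hip, pps):
        return None
    return (min(hip[0], pps[0]), max(hip[1], pps[1]))


hip_episode = (date(2020, 1, 10), date(2020, 9, 30))
pps_episode = (date(2020, 1, 3), date(2020, 10, 5))
merged = merge_pair(hip_episode, pps_episode)
# merged spans 2020-01-03 to 2020-10-05
```

Taking the union rather than the intersection keeps every record that either algorithm attributed to the pregnancy, at the cost of a slightly wider recorded window.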
Figure 4.
Inferring pregnancy start dates using the Estimated Start Date (ESD) algorithm. The 74 concepts determined to be gestational timing specific were split into 2 types: Gestational Range 3 months (GR3m) and Gestational Week (GW). GR3m indicated concepts that have a possible gestational timing span of >1 week but <3 months at the point they occur during pregnancy (eg, estriol can be tested between months 3.75–5.5). GW indicated concepts of type “Gestation period, X weeks,” denoted as point estimates since week-level was the smallest unit of precision for gestational timing in this study. Start dates or plausible start date ranges from these concepts were determined by extrapolating backwards using gestational timing information, and outliers were removed using the 1.5*IQR on each concept type and the overlap of intersecting start date ranges and point estimates (steps 1–5b). The most logical start dates were then assigned based on concept types present, and levels of precision of start date estimate were added (step 6).
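As a rough illustration of the backward extrapolation and 1.5*IQR filtering for GW concepts, consider the sketch below. It is an assumption-laden simplification: the function names and the (date, week) input shape are hypothetical, quartiles come from Python's `statistics.quantiles` with its default method (the study's quartile method is not stated), and GR3m ranges and the overlap checks are not modeled.

```python
from datetime import date, timedelta
from statistics import quantiles


def start_estimates(gw_records):
    """Extrapolate backwards: start = record date minus gestational weeks.

    gw_records is a list of (record_date, gestational_week) pairs.
    """
    return [when - timedelta(weeks=weeks) for when, weeks in gw_records]


def drop_iqr_outliers(starts):
    """Drop start dates outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    if len(starts) < 4:  # assumption: too few points to judge outliers
        return list(starts)
    ordinals = [d.toordinal() for d in starts]
    q1, _, q3 = quantiles(ordinals, n=4)
    spread = 1.5 * (q3 - q1)
    return [d for d in starts if q1 - spread <= d.toordinal() <= q3 + spread]


# "Gestation period, 10 weeks" recorded on 2020-03-11 implies a 2020-01-01 start.
print(start_estimates([(date(2020, 3, 11), 10)]))  # [datetime.date(2020, 1, 1)]
```

A single inconsistent record, such as a gestational week entered against the wrong visit, yields a start date far from the rest and is filtered out before a start date is assigned.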
Figure 5.
HIPPS results. (A) Histogram of the number of outcome concepts per episode by outcome category. (B) Outcome concordance scores by outcome category. An outcome concordance score of 2 has an outcome within the expected term duration and is supported by both HIP and PPS. An outcome concordance score of 1 has an outcome within the expected term duration. An outcome concordance score of 0 does not have an outcome within the expected term duration. (C) Histogram of episodes with week-level resolution only (N = 563 471) by outcome category of recorded pregnancy lengths (start and end dates of records for pregnancy that occur within the EHR data) in weeks and (D) inferred pregnancy lengths (pregnancy start and end estimated using HIPPS) in weeks. (E) Histogram of episodes with week-level resolution only (N = 563 471) by outcome category. The number of outcome concepts was determined from the outcome date to 28 days after. (F) Proportion of episodes by outcome category and by start date precision level for baseline and Estimated Start Date Algorithm. The baseline method obtained the start dates using only the week-level or GW concepts within an episode without any removal of outliers and assigned precision based on the maximum start date difference between GW concepts.
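The outcome concordance score in panel B reduces to a small decision rule, sketched here with hypothetical boolean inputs. How each flag is computed from the records is not shown in the caption and is not modeled.

```python
def outcome_concordance(within_expected_term: bool,
                        supported_by_hip: bool,
                        supported_by_pps: bool) -> int:
    """Return 0, 1, or 2 following the definitions in panel B."""
    if not within_expected_term:
        return 0  # outcome falls outside the expected term duration
    if supported_by_hip and supported_by_pps:
        return 2  # within expected term and supported by both algorithms
    return 1      # within expected term only


print(outcome_concordance(True, True, True))    # 2
print(outcome_concordance(True, True, False))   # 1
print(outcome_concordance(False, True, True))   # 0
```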
Figure 6.
Demographics and outcomes of pregnant persons before and during the COVID-19 pandemic, stratified by week-level resolution (A), month-level resolution (B), and all patients (C). See Table S7 for source data. Note that COVID negative (COVID-) includes pregnant persons without any results in their records.
