Abstract
Introduction
Stigma and bias related to race and other minoritized statuses may underlie disparities in pregnancy and birth outcomes. One emerging method to identify bias is the study of stigmatizing language in the electronic health record. The objective of our study was to develop natural language processing (NLP) methods that accurately and automatically identify two types of stigmatizing language in labor and birth notes: marginalizing language and its complement, power/privilege language.
Methods
We analyzed notes for all birthing people at > 20 weeks’ gestation admitted for labor and birth at two hospitals during 2017. We preprocessed the note text, using term frequency-inverse document frequency (TF-IDF) values as model inputs, and tested machine learning classification algorithms to identify marginalizing and power/privilege language in clinical notes. The algorithms assessed were Decision Trees, Random Forest, and Support Vector Machines. We also applied a feature importance evaluation method, information gain (InfoGain), to identify the words most strongly associated with each language category.
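The two feature steps named in the Methods, TF-IDF weighting and information gain, can be illustrated with a minimal sketch. This is not the study's pipeline: the example phrases and labels are invented for illustration, the tokenizer is a plain whitespace split, and the TF-IDF variant (relative term frequency multiplied by the natural log of inverse document frequency) is an assumption, since the abstract does not specify one. Information gain is computed here as the reduction in label entropy once a term's presence or absence in a note is known.

```python
import math
from collections import Counter

# Invented toy snippets and labels (1 = marginalizing language, 0 = not).
# These are NOT drawn from the study's clinical notes.
texts = [
    "patient refuses to comply with monitoring",
    "patient is noncompliant and refuses exam",
    "patient resting comfortably with partner",
    "patient tolerating labor well",
]
labels = [1, 1, 0, 0]


def tokenize(text):
    """Assumed tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def tfidf(corpus):
    """Return one {term: weight} dict per document (tf * ln(N / df))."""
    tokenized = [tokenize(t) for t in corpus]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            term: (count / len(toks)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def entropy(ys):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(ys)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(ys).values())


def info_gain(term, corpus, ys):
    """Reduction in label entropy from knowing whether `term` is present."""
    present = [y for text, y in zip(corpus, ys) if term in tokenize(text)]
    absent = [y for text, y in zip(corpus, ys) if term not in tokenize(text)]
    n = len(ys)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(ys) - conditional


vectors = tfidf(texts)
vocab = sorted({term for text in texts for term in tokenize(text)})
ranked = sorted(vocab, key=lambda t: info_gain(t, texts, labels), reverse=True)
print(ranked[:3])
```

In this toy corpus a word that appears in every note ("patient") receives a TF-IDF weight of zero and carries no information gain, while a word that appears only in the notes labeled 1 ("refuses") ranks first. The resulting weight vectors are the kind of input a decision tree, random forest, or support vector machine classifier would then be trained on.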
Results
For marginalizing language, Decision Trees yielded the best classification, with an F-score of 0.73. For power/privilege language, Support Vector Machines performed best, achieving an F-score of 0.91. These results indicate that the selected machine learning methods can classify both language categories in clinical notes.
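For readers less familiar with the metric, the sketch below shows how an F-score is derived from a classifier's confusion counts. It assumes the balanced F1 variant (the harmonic mean of precision and recall), which the abstract does not state explicitly, and the counts are invented for illustration rather than taken from the study's results.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion counts.

    tp: notes correctly flagged; fp: notes wrongly flagged;
    fn: notes with the target language that were missed.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
precision, recall, f1 = precision_recall_f1(tp=30, fp=10, fn=20)
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.75 0.6 0.67
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both high, which makes it a stricter summary than accuracy when the target language is rare in the notes.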
Conclusion
We identified well-performing machine learning methods to automatically detect stigmatizing language in clinical notes. To our knowledge, this is the first study to use NLP performance metrics to evaluate how well machine learning methods discern stigmatizing language. Future studies should further refine and evaluate NLP methods, including more recent deep learning algorithms.
Significance
Traditional informatics methods include natural language processing, and these methods have been increasingly applied to the study of public health problems using electronic health records.
What this Study Adds?
We identified well-performing machine learning methods to automatically identify stigmatizing language in labor and birth clinical notes. These methods have not been applied to labor and birth clinical notes and have the potential to be a powerful tool in examining perinatal health inequities.
![](https://cdn.statically.io/img/media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10995-023-03857-4/MediaObjects/10995_2023_3857_Fig1_HTML.png)
Acknowledgements
This project was supported by funding from the Columbia University Data Science Institute Seed Funds Program and a grant (GBMF9048) from the Gordon and Betty Moore Foundation.
Author information
Contributions
Author contributions are as follows: Conceptualization (VB, MT), Analysis (DS, AD, BRI, MT), Original draft (VB), Revised draft (VB, DS, HM, AD, BRI, KC, MT), Funding (VB, MT, KC).
Ethics declarations
Conflict of Interest
The authors have no conflicts of interest to disclose.
Human Subjects
Human subjects approval for this study was received from the Institutional Review Board at Columbia Irving Medical Center, AAAT9870.
Data Sharing
No new data were generated for this analysis; therefore, there are no data to share.
About this article
Cite this article
Barcelona, V., Scharp, D., Moen, H. et al. Using Natural Language Processing to Identify Stigmatizing Language in Labor and Birth Clinical Notes. Matern Child Health J 28, 578–586 (2024). https://doi.org/10.1007/s10995-023-03857-4
DOI: https://doi.org/10.1007/s10995-023-03857-4