Currently submitted to: JMIR Medical Informatics

Date Submitted: Jun 16, 2024
Open Peer Review Period: Jul 2, 2024 - Aug 27, 2024
(currently open for review and needs more reviewers - can you help?)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Research on entity recognition of Chinese medical case text for stroke disease by integrating data enhancement and loss weighting

  • Jiawei Zhou
  • Tong Su
  • Xiufeng Liu

ABSTRACT

Background:

In recent years, artificial intelligence technology has developed rapidly and has become increasingly intertwined with Chinese medicine. Its powerful data processing and pattern recognition capabilities are widely used for in-depth mining of Chinese medicine information.

Objective:

To explore in depth the theoretical knowledge of Chinese medicine contained in Chinese medical cases, this paper investigates named entity recognition under the corpus characteristics of Chinese medical cases and addresses the model performance degradation and low classification accuracy caused by sample imbalance.

Methods:

Data enhancement methods are introduced to increase the diversity of the original samples, and loss-weighting methods are introduced to reduce the weight of the majority class and increase the weight of the minority class. The contextual semantic information of words is extracted using the BERT two-layer bi-directional Transformer structure to produce a feature representation of the text, which is then passed to the BiLSTM-WCRF model to carry out the downstream task of named entity recognition.
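The loss-weighting idea described above can be illustrated with a minimal sketch. It is not the authors' WCRF layer: it assumes a simple inverse-frequency weighting scheme and a per-token negative log-likelihood, and the label names (`B-SYM`, `B-HERB`) and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(label_sequences):
    """Weight each label by total / (num_labels * count).

    Frequent labels (typically "O") get a weight below 1,
    rare entity labels get a weight above 1.
    """
    counts = Counter(label for seq in label_sequences for label in seq)
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


def weighted_nll(probs, gold, weights):
    """Weighted negative log-likelihood for a single token.

    probs maps each label to the model's predicted probability.
    """
    return -weights[gold] * math.log(probs[gold])


# Toy corpus (hypothetical labels): "O" dominates, entity labels are rare.
corpus = [
    ["O", "O", "O", "B-SYM", "O", "O"],
    ["O", "B-HERB", "O", "O", "O", "O"],
]
weights = inverse_frequency_weights(corpus)

# The same prediction confidence costs more when the gold label is rare.
probs = {"O": 0.5, "B-SYM": 0.5}
loss_majority = weighted_nll(probs, "O", weights)
loss_minority = weighted_nll(probs, "B-SYM", weights)
```

With these weights, an error on a rare entity label contributes more to the training loss than an equally confident error on the majority label, which is the effect the loss-weighting step aims for.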

Results:

The experiments show that, with the data enhancement method introduced, the Macro-F1 value of the BERT-BiLSTM-CRF(EDA) model is 10.1% higher than that of the BiLSTM-CRF(EDA) model; with the loss weighting method introduced on top of EDA, the Macro-F1 value of the BERT-BiLSTM-WCRF(EDA) model is 6.8% higher than that of the BiLSTM-WCRF(EDA) model.
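For readers unfamiliar with the metric, Macro-F1 is the unweighted mean of the per-label F1 scores, so rare labels count as much as frequent ones. The sketch below computes it at the token level; this is an assumption made for illustration, since the abstract does not say whether the reported scores are token-level or entity-level, and the labels and predictions are hypothetical.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical token labels: the model misses the single rare "HERB" token.
gold = ["O", "O", "O", "O", "SYM", "HERB"]
pred = ["O", "O", "O", "O", "SYM", "O"]

accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
score = macro_f1(gold, pred)
```

In this toy case five of six tokens are correct, yet the missed minority label pulls Macro-F1 well below the accuracy, which is why the metric suits imbalanced label sets.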

Conclusions:

The introduction of both data augmentation and loss weighting methods can mitigate overfitting while improving both the overall performance of the model and entity recognition for individual labels.

Clinical Trial:

None


 Citation

Please cite as:

Zhou J, Su T, Liu X

Research on entity recognition of Chinese medical case text for stroke disease by integrating data enhancement and loss weighting

JMIR Preprints. 16/06/2024:63275

DOI: 10.2196/preprints.63275

URL: https://preprints.jmir.org/preprint/63275

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.
