JMIR Preprints #63275: Research on entity recognition of Chinese medical case text for stroke disease by integrating data enhancement and loss weighting

Research on entity recognition of Chinese medical case text for stroke disease by integrating data enhancement and loss weighting

Jiawei Zhou;
Tong Su;
Xiufeng Liu

ABSTRACT

Background:

In recent years, artificial intelligence technology has developed rapidly and has become increasingly integrated with Chinese medicine. Its powerful data processing and pattern recognition capabilities are now widely used for in-depth mining of Chinese medicine information.

Objective:

To explore the theoretical knowledge of Chinese medicine contained in Chinese medical cases, this paper studies named entity recognition suited to the corpus characteristics of those cases, and addresses the degraded model performance and low classification accuracy caused by sample imbalance.

Methods:

Data enhancement methods were introduced to increase the diversity of the original samples, and loss-weighting methods were introduced to reduce the weight of the majority class and increase the weight of the minority class. The contextual semantic information of words was extracted using the BERT two-layer bi-directional Transformer structure to produce a feature representation of the text, which was then passed to a BiLSTM-WCRF model to carry out the downstream task of named entity recognition.
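The loss-weighting idea in the Methods can be illustrated with a minimal sketch. The abstract does not state the exact weighting scheme or how it enters the CRF loss, so the inverse-frequency formula, the token-level (non-CRF) loss, the function names, and the label names below are all assumptions made for illustration only.

```python
import math
from collections import Counter


def inverse_frequency_weights(label_sequences):
    """Assumed scheme: weight = total / (num_labels * count).

    Frequent labels (typically "O") get weights below 1,
    rare entity labels get weights above 1.
    """
    counts = Counter(label for seq in label_sequences for label in seq)
    total = sum(counts.values())
    num_labels = len(counts)
    return {label: total / (num_labels * count) for label, count in counts.items()}


def weighted_nll(gold_labels, predicted_probs, weights):
    """Weighted average negative log-likelihood over tokens.

    predicted_probs[i] maps each label to its predicted probability for
    token i. This is a token-level stand-in, not a CRF sequence loss.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for gold, probs in zip(gold_labels, predicted_probs):
        w = weights[gold]
        weighted_sum += -w * math.log(probs[gold])
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical toy corpus: "O" dominates, entity labels are rare.
toy_labels = [["O", "O", "O", "B-SYM"], ["O", "O", "B-HERB", "O"]]
toy_weights = inverse_frequency_weights(toy_labels)
```

With these weights, an error on a rare entity label contributes more to the loss than an equally confident error on the majority label, which is the effect the Methods describe.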

Results:

The experiments show that, with the data enhancement method introduced, the Macro-F1 value of the BERT-BiLSTM-CRF(EDA) model is 10.1% higher than that of the BiLSTM-CRF(EDA) model. With the loss weighting method introduced on top of EDA, the Macro-F1 value of the BERT-BiLSTM-WCRF(EDA) model is 6.8% higher than that of the BiLSTM-WCRF(EDA) model.
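For readers unfamiliar with the metric, Macro-F1 averages the per-label F1 scores with equal weight, so rare labels count as much as frequent ones. The sketch below computes it at the token level; the abstract does not say whether the authors score tokens or whole entities, so that choice, the function name, and the label names are assumptions.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical example: the model predicts the majority label everywhere.
gold_tags = ["O", "O", "O", "B-SYM"]
pred_tags = ["O", "O", "O", "O"]
score = macro_f1(gold_tags, pred_tags)
```

In this toy case token accuracy is 0.75, yet the missed minority label pulls Macro-F1 well below that, which is why the metric suits imbalanced label sets.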

Conclusions:

Introducing both data augmentation and loss weighting can mitigate overfitting while improving the model's overall performance as well as entity recognition for individual labels.

Clinical Trial: None

Citation

Please cite as:

Zhou J, Su T, Liu X

Research on entity recognition of Chinese medical case text for stroke disease by integrating data enhancement and loss weighting

JMIR Preprints. 16/06/2024:63275

DOI: 10.2196/preprints.63275

URL: https://preprints.jmir.org/preprint/63275

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Currently submitted to: JMIR Medical Informatics

Date Submitted: Jun 16, 2024

Open Peer Review Period: Jul 2, 2024 - Aug 27, 2024
