forked from luozhouyang/deepseg

Chinese word segmentation in tensorflow

License

Notifications You must be signed in to change notification settings

hitdxh/deepseg

 
 


deepseg

Chinese word segmentation in TensorFlow.

Architecture

The model has a simple architecture with three components:

  • Embedding: a word embedding layer
  • BiLSTM: a bidirectional LSTM layer
  • CRF: a conditional random field layer
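To illustrate what the CRF layer adds on top of the BiLSTM, here is a minimal pure-Python sketch of Viterbi decoding, the standard way to find the best-scoring tag sequence in a linear-chain CRF. This is not the repository's code: the function name `viterbi`, the hand-set transition scores, and the emission scores are all assumptions made up for the example. In a trained model the emission scores would come from the BiLSTM and the transition scores would be learned.

```python
# Illustrative sketch only: scores below are invented, not model output.
TAGS = ["B", "M", "E", "S"]

# Assumption: hand-written set of tag transitions that form valid words.
ALLOWED = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
           ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}

# transitions[i][j] is the score of moving from tag i to tag j.
# Forbidden transitions get a large negative score.
transitions = [[0.0 if (a, b) in ALLOWED else -1e4 for b in TAGS] for a in TAGS]


def viterbi(emissions, transitions):
    """Return the highest-scoring tag index sequence.

    emissions: one list of per-tag scores for each input position.
    transitions: square matrix of tag-to-tag scores.
    """
    num_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for a path that ends in tag j at this step.
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        score = new_score
        backpointers.append(pointers)
    # Follow the back pointers from the best final tag.
    best = max(range(num_tags), key=lambda i: score[i])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


# Made-up emission scores for three characters, in the order B, M, E, S.
emissions = [
    [0.9, 0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.8],
    [0.1, 0.0, 0.2, 1.0],
]
best_tags = [TAGS[i] for i in viterbi(emissions, transitions)]
print(best_tags)
```

Picking the highest-scoring tag at each position independently would give S, E, S here, which is not a valid sequence because E cannot follow S. Decoding over whole sequences avoids this by taking the transition scores into account.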

Segmentation can be treated as a sequence tagging problem. Each token of the input sequence is assigned one of only a few tags:

  • B: beginning of a token
  • M: middle of a token
  • E: end of a token
  • S: a single character that forms a token on its own
  • O: outside of any tag
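As an illustration of the tag scheme, the sketch below converts an already segmented sentence into one tag per character, which is how training labels for such a tagger are commonly prepared. It assumes that each input position is a single character; the function name `words_to_tags` and the example sentence are hypothetical and not taken from the repository.

```python
def words_to_tags(words):
    """Convert a list of segmented words into one tag per character."""
    tags = []
    for word in words:
        if len(word) == 1:
            # A single character that stands alone.
            tags.append("S")
        elif len(word) > 1:
            # First character, any middle characters, last character.
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


# Hypothetical example sentence, already split into words.
words = ["我", "喜欢", "自然语言"]
chars = [ch for word in words for ch in word]
tags = words_to_tags(words)
print(list(zip(chars, tags)))
```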

We train the model to tag every input sequence, and then we post-process the tagged result to obtain the final segmentation.
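The post-processing step can be sketched as follows. This is a minimal illustration, not the repository's implementation: the function name `tags_to_words` is hypothetical, and skipping positions tagged O is an assumption about how that tag is used.

```python
def tags_to_words(chars, tags):
    """Rebuild words from characters and their predicted tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "O":
            # Assumption: O marks positions that carry no text, so skip them.
            continue
        if tag in ("B", "S") and current:
            # A new word starts, so flush any unfinished one.
            words.append(current)
            current = ""
        current += ch
        if tag in ("E", "S"):
            # The word is complete.
            words.append(current)
            current = ""
    if current:
        # Keep a trailing word even if the tagger never emitted E.
        words.append(current)
    return words


# Hypothetical characters and tags for illustration.
chars = list("我喜欢自然语言")
tags = ["S", "B", "E", "B", "M", "M", "E"]
segmented = tags_to_words(chars, tags)
print(" ".join(segmented))
```

Flushing unfinished words instead of raising an error is a design choice that keeps the output usable when the predicted tag sequence is not perfectly well formed.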

Training

Assuming the hyperparameter file is at deepseg/example_params.json, run:

python -m deepseg.runner \
    --params_file=deepseg/example_params.json \
    --mode=train

Eval

python -m deepseg.runner \
    --params_file=deepseg/example_params.json \
    --mode=eval

Predict

python -m deepseg.runner \
    --params_file=deepseg/example_params.json \
    --mode=predict

Train and eval

python -m deepseg.runner \
    --params_file=deepseg/example_params.json \
    --mode=train_and_eval

Export

You may want to export the model in SavedModel format and serve it with TensorFlow Serving. To do so, run:

python -m deepseg.runner \
    --params_file=deepseg/example_params.json \
    --mode=export

