ParakeetRebeccaRosario/examples/text_frontend
TianYuan e8991c973c restructure frontend example 2021-08-16 08:31:37 +00:00

Chinese Text Frontend Example

This example evaluates the Chinese text frontend, covering grapheme-to-phoneme conversion (g2p) and text normalization.

G2P

For g2p, we use BZNSYP's phone labels as the ground truth. Silence tokens are removed from both the labels and the predicted phones before scoring.

Download BZNSYP from its official website and extract it. The rest of this example assumes the dataset is located at ~/datasets/BZNSYP.

We use word error rate (WER) as the evaluation criterion.
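As a rough illustration of the metric, the sketch below computes WER over phone sequences as the Levenshtein distance divided by the reference length, after dropping silence tokens. It is not taken from test_g2p.py: the names `SILENCE_TOKENS`, `strip_silence`, `edit_distance`, and `wer` are hypothetical, the silence token set is an assumption, and the sample phone strings are made up for the example.

```python
from typing import List, Sequence

# Assumed silence tokens; the set actually used by the example scripts may differ.
SILENCE_TOKENS = {"sil", "sp", "sp1"}


def strip_silence(phones: Sequence[str]) -> List[str]:
    """Drop silence tokens so they do not count toward the error rate."""
    return [p for p in phones if p not in SILENCE_TOKENS]


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Error rate of hyp against ref, ignoring silence tokens."""
    ref = strip_silence(ref)
    hyp = strip_silence(hyp)
    if not ref:
        raise ValueError("reference is empty after removing silence tokens")
    return edit_distance(ref, hyp) / len(ref)


# Made-up label and prediction: one tone error in eight phones.
reference = "sil n i3 h ao3 sp1 sh iii4 j ie4 sil".split()
predicted = "n i3 h ao2 sh iii4 j ie4".split()
print(wer(reference, predicted))  # 0.125
```

An average over a test set could then be taken per utterance or pooled over all phones; which of the two the reported figure uses is not stated here.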

Text Normalization

For text normalization, the test data is in data/textnorm_test_cases.txt. Each line contains raw_data and normed_data separated by |.

We use character error rate (CER) as the evaluation criterion.
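The sketch below shows one way a test case in this format could be parsed and scored. It is an illustration under stated assumptions, not the code in test_textnorm.py: `parse_case`, `edit_distance`, and `cer` are hypothetical names, the sample line is invented rather than taken from data/textnorm_test_cases.txt, and the normalizer output is hard-coded.

```python
from typing import Sequence, Tuple


def parse_case(line: str) -> Tuple[str, str]:
    """Split one test line into (raw_data, normed_data) at the first '|'."""
    raw, normed = line.rstrip("\n").split("|", 1)
    return raw, normed


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance; with str inputs it operates on characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of hyp against the reference string ref."""
    if not ref:
        raise ValueError("reference string is empty")
    return edit_distance(ref, hyp) / len(ref)


# Invented test line in the raw_data|normed_data format.
raw, expected = parse_case("共3人|共三人\n")
# Stand-in for a normalizer's output on `raw`, with one extra character.
hypothesis = "共三个人"
print(raw, expected, cer(expected, hypothesis))
```

Splitting at the first | only keeps any later | characters in the normalized text; whether the real test file can contain such characters is not stated here.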

Start

Run the command below to get the test results.

./run.sh

The avg WER of g2p is: 0.027124048652822204

The avg CER of text normalization is: 0.0061629764893859846