| Corpus | Sentences | zh-CN tokens | en tokens |
|---|---|---|---|
| OpenSubtitles v2024 | 22,394,812 | 142,825,593 | 42,942,584 |
| CCAligned v1 | 15,181,417 | 155,906,399 | 42,397,658 |
| KDE4 v2 | 139,666 | 656,843 | 412,836 |
| MDN_Web_Docs v2023-09-25 | 12,864 | 37,560 | 120,885 |
| GNOME v1 | 78 | 187 | 85 |
| Ubuntu v14.10 | 0 | 0 | 0 |
| Total | 37,728,837 | 299,426,582 | 85,874,048 |
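As a sanity check, you can reproduce the Total row by summing each column over the six corpora. The following is a minimal Python sketch. The names `COUNTS`, `column_totals` and `abbreviate` are hypothetical. The K/M formatting with one decimal place is an assumption inferred from the abbreviated figures in the original listing (e.g. 37.7M, 139.7K).

```python
# Per-corpus counts from the table: (sentences, zh-CN tokens, en tokens).
COUNTS = {
    "OpenSubtitles v2024": (22_394_812, 142_825_593, 42_942_584),
    "CCAligned v1": (15_181_417, 155_906_399, 42_397_658),
    "KDE4 v2": (139_666, 656_843, 412_836),
    "MDN_Web_Docs v2023-09-25": (12_864, 37_560, 120_885),
    "GNOME v1": (78, 187, 85),
    "Ubuntu v14.10": (0, 0, 0),
}


def column_totals(counts):
    """Sum each column (sentences, zh-CN tokens, en tokens) across all corpora."""
    return tuple(sum(column) for column in zip(*counts.values()))


def abbreviate(n):
    """Format a count with one decimal and a K/M suffix (assumed listing style)."""
    if n >= 1_000_000:
        return f"{n / 1_000_000:.1f}M"
    if n >= 1_000:
        return f"{n / 1_000:.1f}K"
    return str(n)


totals = column_totals(COUNTS)
print(totals)                           # (37728837, 299426582, 85874048)
print([abbreviate(t) for t in totals])  # ['37.7M', '299.4M', '85.9M']
```

The abbreviated totals match the 37.7M / 299.4M / 85.9M figures originally given for the Total row. This confirms that the row is a plain column sum over all listed corpora, including the empty Ubuntu entry.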