| Corpus | sentences | am tokens | tokens | sample | bilingual | monolingual |
|---|---|---|---|---|---|---|
| NLLB v1 | 101,523 | 1,473,899 | 629,755 | | | |
| GNOME v1 | 51,760 | 109,698 | 97,474 | | | |
| XLEnt v1.2 | 36,657 | 95,475 | 96,902 | | | |
| bible-uedin v1 | 6,397 | 156,033 | 82,547 | | | |
| TED2020 v1 | 116 | 1,448 | 1,095 | | | |
| MultiCCAligned v1.1 | 28 | 264 | 199 | | | |
| wikimedia v20230407 | 5 | 201 | 217 | | | |
| Tatoeba v2023-04-12 | 3 | 6 | 9 | | | |
| Ubuntu v14.10 | 0 | 0 | 0 | | | |
| translatewiki v2025-01-01 | 0 | 0 | 0 | | | |
| Total | 196.5K | 1.8M | 908.2K | | | |