The following table lists alignments between subtitles in the same language. The collection often contains several alternative subtitle files for the same movie, and many of them are identical or nearly identical. We have processed them all and sorted the results in various ways; the resulting files are linked in the table for each language. Here is an explanation of the different columns:
Some alignment files exist in XCES format only (standoff annotation of sentence alignment), while others are also available in TMX format, which makes it easier to inspect the actual sentence pairs. Because standoff annotation refers to sentences rather than containing their text, you will also need the corpus, linked in the first column, if you use the XCES alignment files.
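As a rough illustration of what "standoff" means here, the following minimal sketch reads sentence links from an XCES-style alignment document. The element and attribute names (`cesAlign`, `linkGrp`, `link`, `fromDoc`, `toDoc`, `xtargets`) are assumed to follow the layout commonly seen in XCES alignment files, and the sample content, document paths, and the `read_links` helper are hypothetical. Check them against the actual files before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the style of an XCES alignment file.
# The document paths and sentence IDs are made up for illustration.
SAMPLE_XCES = """
<cesAlign version="1.0">
  <linkGrp targType="s" fromDoc="en/2001/1/a.xml.gz" toDoc="en/2001/1/b.xml.gz">
    <link id="SL1" xtargets="1;1" />
    <link id="SL2" xtargets="2 3;2" />
    <link id="SL3" xtargets=";3" />
  </linkGrp>
</cesAlign>
"""


def read_links(xces_text):
    """Return (from_doc, to_doc, source_ids, target_ids) for every link.

    Assumes each <link> carries an 'xtargets' attribute of the form
    "src1 src2;tgt1 tgt2", where either side may be empty.
    """
    root = ET.fromstring(xces_text.strip())
    links = []
    for group in root.iter("linkGrp"):
        from_doc = group.get("fromDoc")
        to_doc = group.get("toDoc")
        for link in group.iter("link"):
            # Split the attribute into the source and target halves.
            src, _, tgt = link.get("xtargets", "").partition(";")
            links.append((from_doc, to_doc, src.split(), tgt.split()))
    return links


if __name__ == "__main__":
    for from_doc, to_doc, src_ids, tgt_ids in read_links(SAMPLE_XCES):
        print(from_doc, src_ids, "<->", to_doc, tgt_ids)
```

The links contain only sentence identifiers, so recovering the text of each pair means looking those identifiers up in the corpus documents named by `fromDoc` and `toDoc`.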
Please cite the following article if you use any part of the corpus in your own work:
Jörg Tiedemann, 2016, Finding Alternative Translations in a Large Corpus of Movie Subtitles.
In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016)