opus2multi
opus2multi [OPTIONS] xmldir pivot [lang-ids]*
Combine the sentence alignments of several language pairs into one multilingual alignment, using a pivot language as the intermediate language that links all other languages.
OUTPUT: sentence alignment files for all languages together with the pivot language
<xmldir> should be the path to the XML directory that contains the sentence alignment files for each individual language pair (e.g. xmldir/en-fr.xml.gz)
<pivot> is the language ID of the pivot language (e.g. en)
<lang-ids> are the language IDs of the other languages to be combined in the multilingual corpus
SYNOPSIS
# Combine all sentence alignments via Swedish
# for German, English, Spanish and French.
# The alignment units will cover the same Swedish (pivot) sentences.
opus2multi /path/to/OPUS/corpus/RF/xml sv de en es fr

# shortcut without full path to xml-dir
# (requires OPUS in some standard directory)
opus2multi RF sv de en es fr

# use intralingual links (for pivot language) to extend the data set
# (useful for OpenSubtitles-corpora)
opus2multi -a OpenSubtitles2016 sv de en es fr
OPTIONS
-e ................. keep segments with empty links in any of the languages
-i pivot-links ..... intralingual pivot link file
-a ................. same as -i but read intralingual links from xmldir/../alt/
-s nr .............. max number of sentences in an alignment unit
-h ................. this help
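To make the pivot idea concrete, here is a minimal Python sketch of how bilingual links could be joined on the pivot side. It is not the code of opus2multi; it only illustrates the principle under several assumptions. The in-memory link format, the function name `combine_via_pivot`, and the parameters `keep_empty` and `max_sents` are hypothetical stand-ins for the alignment files and for the -e and -s options. The sketch also assumes that only units with identical pivot-side sentence groups are joined, and that the -s limit applies to each language side separately; the actual tool may handle both differently.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory form of one bilingual alignment file:
# a list of (pivot_sentence_ids, other_sentence_ids) pairs.
Link = Tuple[Tuple[str, ...], Tuple[str, ...]]


def combine_via_pivot(
    bitexts: Dict[str, List[Link]],
    keep_empty: bool = False,         # loosely mirrors -e (assumption)
    max_sents: Optional[int] = None,  # loosely mirrors -s (assumption)
) -> List[Dict[str, Tuple[str, ...]]]:
    """Join bilingual links that share the same pivot-side sentence group."""
    # Index every language's links by their pivot-side sentence IDs.
    indexed = {
        lang: {src: trg for src, trg in links if src}
        for lang, links in bitexts.items()
    }

    # Collect pivot-side groups in order of first appearance.
    pivot_groups: List[Tuple[str, ...]] = []
    seen = set()
    for links in bitexts.values():
        for src, _ in links:
            if src and src not in seen:
                seen.add(src)
                pivot_groups.append(src)

    units = []
    for src in pivot_groups:
        unit = {"pivot": src}
        for lang, table in indexed.items():
            # A missing link is treated as an empty link for that language.
            unit[lang] = table.get(src, ())
        sides = list(unit.values())
        if not keep_empty and any(len(side) == 0 for side in sides):
            continue  # drop units with an empty side unless asked to keep them
        if max_sents is not None and any(len(side) > max_sents for side in sides):
            continue  # drop units that exceed the per-side sentence limit
        units.append(unit)
    return units


# Toy data: pivot-de and pivot-fr links (sentence IDs are made up).
example_links = {
    "de": [(("s1",), ("s1",)), (("s2",), ("s2", "s3")), (("s3",), ())],
    "fr": [(("s1",), ("s1",)), (("s2",), ("s2",))],
}

strict_units = combine_via_pivot(example_links)
all_units = combine_via_pivot(example_links, keep_empty=True)
short_units = combine_via_pivot(example_links, max_sents=1)
```

In this toy data, the unit for pivot sentence s3 has no German or French counterpart, so it only survives when `keep_empty` is set, and the unit for s2 is dropped when `max_sents=1` because its German side contains two sentences.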
Last modified on Nov 16, 2017, 8:25:22 PM