SYNOPSIS
# convert sentence aligned bitexts to factored moses input
# (requires XML::Parser)
opus2moses [OPTIONS] < sentence-align-file.xml
OPTIONS:
-s srcfactors ......... source language factors besides surface words (separated by ':')
-t trgfactors ......... the same for the target language
                        factors must be attributes of <w> tags
                        (except 'word', which is the word itself)
-d dir ................ home directory of the OPUS subcorpus
-n file-pattern ....... skip bitext files that match pattern (e.g. ep-00-1*)
-i .................... inverse selection (only files matching file pattern)
-e src-data-file ...... output file for source language data (default = src)
-f trg-data-file ...... output file for target language data (default = trg)
-p sentence-pair-file . stores sentence ID pairs of the extracted pairs
-l .................... convert to lower case
-1 .................... 1:1 links only
-x max ................ maximum sentence length (in number of words)
-r .................... process untokenized (raw) XML (no length filtering)
-M .................... read all sentences into memory for each linked document
                        before extracting linked sentences (for non-monotonic links)
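
EXAMPLE (illustrative sketch, not part of opus2moses)
The -s/-t options describe factors as attributes of <w> tags, with 'word' standing for the token text. The Python sketch below shows what that mapping could look like for a single sentence. It is not the script's actual code: the attribute names 'pos' and 'lem', the sample sentence, and the helper name factored_sentence are assumptions made for illustration. The '|' separator between factors is the one used by Moses factored input.

```python
import xml.etree.ElementTree as ET


def factored_sentence(s_elem, factors):
    """Join the <w> tokens of one <s> element into a factored line."""
    tokens = []
    for w in s_elem.iter("w"):
        values = []
        for name in factors:
            # 'word' is the token text itself; every other factor is read
            # from the attribute of the same name on the <w> tag.
            if name == "word":
                values.append(w.text or "")
            else:
                values.append(w.get(name, ""))
        # Moses factored input separates the factors of a token with '|'.
        tokens.append("|".join(values))
    return " ".join(tokens)


# Hypothetical tokenized sentence; 'pos' and 'lem' are assumed attribute names.
SAMPLE = (
    '<s id="1">'
    '<w pos="DT" lem="the">The</w> '
    '<w pos="NN" lem="house">house</w>'
    "</s>"
)

sentence = ET.fromstring(SAMPLE)
# A factor specification is split on ':' as in the -s and -t options.
line = factored_sentence(sentence, "word:pos:lem".split(":"))
print(line)
```

With the factor list reduced to 'word' alone, the same helper returns the plain tokenized sentence.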
Last modified on Nov 16, 2017, 8:20:52 PM
