wiki:Tools/Opus2Moses

SYNOPSIS

        # convert sentence-aligned bitexts to factored Moses input
        # (requires XML::Parser)

        opus2moses [OPTIONS] < sentence-align-file.xml
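
The sketch below illustrates what reading a sentence alignment file can look like. It is not how opus2moses itself works. It assumes the OPUS-style layout in which each `<linkGrp>` names the two documents (`fromDoc`, `toDoc`) and each `<link>` lists source and target sentence IDs in `xtargets`, separated by `;`. The sample data, the document paths, and the helper `read_links` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a sentence alignment file (layout is an assumption).
SAMPLE = """<cesAlign version="1.0">
 <linkGrp targType="s" fromDoc="en/doc1.xml" toDoc="sv/doc1.xml">
  <link id="SL1" xtargets="s1;s1" />
  <link id="SL2" xtargets="s2 s3;s2" />
  <link id="SL3" xtargets=";s3" />
 </linkGrp>
</cesAlign>"""


def read_links(xml_text):
    """Yield (from_doc, to_doc, src_ids, trg_ids) for every <link> element."""
    root = ET.fromstring(xml_text)
    for grp in root.iter("linkGrp"):
        for link in grp.iter("link"):
            # xtargets holds "source ids;target ids", each side space-separated
            src, _, trg = link.get("xtargets", "").partition(";")
            yield grp.get("fromDoc"), grp.get("toDoc"), src.split(), trg.split()


links = list(read_links(SAMPLE))

# Keep only links with exactly one sentence on each side (compare option -1).
one_to_one = [l for l in links if len(l[2]) == 1 and len(l[3]) == 1]

for from_doc, to_doc, src_ids, trg_ids in one_to_one:
    print(from_doc, src_ids, to_doc, trg_ids)
```

Only the first link in the sample survives the 1:1 filter; the 2:1 link and the link with an empty source side are dropped.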

OPTIONS:

        -s srcfactors ......... source language factors besides surface words (separated by ':')
        -t trgfactors ......... the same for the target language (separated by ':')
                                factors must be attributes of <w> tags
                                (except 'word', which is the word itself)
        -d dir ................ home directory of the OPUS subcorpus
        -n file-pattern ....... skip bitext files that match the pattern (e.g. ep-00-1*)
        -i .................... invert the selection (keep only files that match the pattern)
        -e src-data-file ...... output file for source language data (default = src)
        -f trg-data-file ...... output file for target language data (default = trg)

        -p sentence-pair-file . file in which to store the sentence ID pairs of the extracted pairs
        -l .................... convert to lower case
        -1 .................... 1:1 links only
        -x max ................ maximum sentence length (in number of words)

        -r .................... process untokenized (raw) XML (no length filtering)

        -M .................... read all sentences into memory for each linked document
                                before extracting linked sentences (for non-monotonic links)
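
To make the factor options concrete, here is a minimal sketch of how factors taken from `<w>` attributes can be combined into Moses' factored token format, where the factors of one token are joined with `|`. It is an illustration under stated assumptions, not the opus2moses implementation. The attribute names `lem` and `pos`, the sample sentence, the helper `factored`, and the `UNK` placeholder for a missing attribute are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical tokenized sentence; attribute names 'lem' and 'pos' are assumptions.
SENTENCE = (
    '<s id="s1">'
    '<w lem="the" pos="DT">The</w>'
    '<w lem="cat" pos="NN">cats</w>'
    '</s>'
)


def factored(sentence_xml, factors):
    """Return one line of factored tokens for a ':'-separated factor list."""
    tokens = []
    for w in ET.fromstring(sentence_xml).iter("w"):
        values = []
        for name in factors.split(":"):
            if name == "word":
                # 'word' is the element text, not an attribute
                values.append(w.text or "")
            else:
                # every other factor is read from an attribute of <w>;
                # the 'UNK' fallback is an assumption made for this sketch
                values.append(w.get(name, "UNK"))
        tokens.append("|".join(values))
    return " ".join(tokens)


print(factored(SENTENCE, "word:lem:pos"))  # prints: The|the|DT cats|cat|NN
```

In this sketch the surface word is listed explicitly as the factor `word`; whether opus2moses always emits it first is not stated above.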
Last modified on Nov 16, 2017, 8:20:52 PM