
opus-index

Script for converting parallel corpora from OPUS and indexing them with CWB (the IMS Open Corpus Workbench)

OPTIONS

          -a lang.... list of aligned languages (optional, space separated)
          -o ........ overwrite existing data (deletes the entire data directory!)
          -y ........ assume yes (don't prompt before deleting the data directory!)
          -s ........ skip conversion via recode (used for OO)
          -m dir .... directory for temporary data (otherwise /tmp/BITEXTINDEXER...)
          -i depth .. minimum depth for finding alignment file (default: 0)
          -u pattern  allowed structural patterns
          -p pattern  allowed positional patterns
          -U pattern  disallowed structural patterns
          -P pattern  disallowed positional patterns
          -M ........ skip creating monolingual index files
          -A ........ skip creating alignment files
          -k ........ keep temporary file for CWB encoding
          -e enc .... use character encoding enc
          -C ........ convert only (don't run indexing and registration)
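The script's own argument handling is not shown on this page. As a hedged illustration of how the flags above combine, the following Python sketch parses the same option letters with the standard `getopt` module. The setting names (`aligned_langs`, `tmp_dir`, `min_depth`, ...) are hypothetical, and the sketch only mirrors the documented option syntax, not what opus-index actually does with each setting. Note that `-a` takes a single argument, so a space-separated language list has to be quoted on the shell command line.

```python
import getopt

# Hypothetical option string mirroring the documented flags;
# a trailing ":" marks a flag that takes an argument.
OPTSTRING = "a:oysm:i:u:p:U:P:MAke:C"

# Boolean switches and the (illustrative) setting each one turns on.
SWITCHES = {
    "-o": "overwrite",
    "-y": "assume_yes",
    "-s": "skip_recode",
    "-M": "skip_monolingual",
    "-A": "skip_alignment",
    "-k": "keep_temp",
    "-C": "convert_only",
}

# Flags that take a string argument and the setting that stores it.
VALUED = {
    "-m": "tmp_dir",
    "-u": "allowed_structural",
    "-p": "allowed_positional",
    "-U": "disallowed_structural",
    "-P": "disallowed_positional",
    "-e": "encoding",
}


def parse_args(argv):
    """Parse opus-index style options into a settings dict (illustrative only)."""
    opts, rest = getopt.getopt(argv, OPTSTRING)
    settings = {name: False for name in SWITCHES.values()}
    settings.update({name: None for name in VALUED.values()})
    settings["aligned_langs"] = []
    settings["min_depth"] = 0  # documented default for -i
    for flag, value in opts:
        if flag in SWITCHES:
            settings[SWITCHES[flag]] = True
        elif flag in VALUED:
            settings[VALUED[flag]] = value
        elif flag == "-a":
            # One argument carrying a space-separated list of languages.
            settings["aligned_langs"] = value.split()
        elif flag == "-i":
            settings["min_depth"] = int(value)
    return settings, rest
```

For example, `parse_args(["-a", "de fr", "-o", "-y", "-i", "1"])` corresponds to passing `-a "de fr" -o -y -i 1` in a shell: two aligned languages, overwrite without prompting, and a minimum search depth of 1.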
Last modified on Nov 16, 2017, 8:30:59 PM