opus-index
Script for converting and indexing parallel corpora from OPUS using CWB
OPTIONS
-a lang.... list of aligned languages (optional, space separated)
-o ........ overwrite existing data (deletes entire data directory!!)
-y ........ assume yes (don't prompt before deleting the data directory!)
-s ........ skip conversion via recode (used for OO)
-m dir .... directory for temporary data (default: /tmp/BITEXTINDEXER...)
-i depth .. minimum depth for finding the alignment file (default: 0)
-u pattern allowed structural patterns
-p pattern allowed positional patterns
-U pattern disallowed structural patterns
-P pattern disallowed positional patterns
-M ........ skip creating monolingual index files
-A ........ skip creating alignment files
-k ........ keep temp file for cwb encoding
-e enc .... use character encoding enc
-C ........ convert only (don't run indexing and registering)
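
The options above can be combined on one command line. The following is a minimal sketch in Python of how the option part of such a call might be assembled. The helper name build_opus_index_options is hypothetical, only a subset of the flags is covered, and it is an assumption that the space-separated language list for -a is passed as a single argument. The positional arguments of opus-index are not listed here, so they are left out.

```python
def build_opus_index_options(aligned_langs=None, overwrite=False,
                             assume_yes=False, tmp_dir=None,
                             encoding=None, convert_only=False):
    """Assemble the option part of an opus-index command line (hypothetical helper)."""
    opts = []
    if aligned_langs:
        # Assumption: the space-separated list is passed as one argument.
        opts += ["-a", " ".join(aligned_langs)]
    if overwrite:
        # -o deletes the entire data directory.
        opts.append("-o")
    if assume_yes:
        # -y suppresses the prompt before the data directory is deleted.
        opts.append("-y")
    if tmp_dir:
        opts += ["-m", tmp_dir]
    if encoding:
        opts += ["-e", encoding]
    if convert_only:
        # -C stops after conversion, without indexing and registering.
        opts.append("-C")
    return opts


# Example values only; "de", "fr" and "utf8" are illustrative.
options = build_opus_index_options(aligned_langs=["de", "fr"],
                                   encoding="utf8", convert_only=True)
print(options)
```

Returning a list rather than a single string keeps the quoted language list intact if the result is later handed to something like subprocess.run together with the script name and its remaining arguments.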
Last modified on Nov 16, 2017, 8:30:59 PM
