opus-index
Script for converting and indexing parallel corpora from OPUS using CWB
OPTIONS
-a lang.... list of aligned languages (optional, space separated)
-o ........ overwrite existing data (deletes entire data directory!!)
-y ........ assumes yes (doesn't prompt before deleting data dir!)
-s ........ skip conversion via recode (used for OO)
-m dir .... directory for temporary data (otherwise /tmp/BITEXTINDEXER...)
-i depth .. min depth for finding alignment file (0 otherwise)
-u pattern  allowed structural patterns
-p pattern  allowed positional patterns
-U pattern  disallowed structural patterns
-P pattern  disallowed positional patterns
-M ........ skip creating monolingual index files
-A ........ skip creating alignment files
-k ........ keep temp file for cwb encoding
-e enc .... use character encoding enc
-C ........ convert only (don't run indexing and registering)
Last modified on Nov 16, 2017, 8:30:59 PM