DownloadTools – OPUS

Pre-processing, Tagging and Parsing

OPUS collects tools and language-specific models for segmentation, tagging and parsing. The language-specific models can be downloaded from the table below; the tools themselves are available here:

Hunpos tagger: http://code.google.com/p/hunpos/downloads/list
MaltParser: http://maltparser.org/download.html (use version 1.4.1, the version the models below were trained with)
MElt tagger for French: https://gforge.inria.fr/frs/download.php/27240/
SVMTool tagger (with pretrained models for English, Spanish and Catalan): http://www.lsi.upc.edu/~nlp/SVMTool/#DOWNLOAD
ZPar statistical parser with language-specific features for Chinese and English: http://sourceforge.net/projects/zpar/

To keep tagging and parsing consistent, the same tools have been used for most of the languages: the Hunpos tagger (Péter Halácsy, András Kornai, Csaba Oravecz, 2007, Hunpos - an open source trigram tagger) and MaltParser (Joakim Nivre and Johan Hall, 2005, MaltParser: A language-independent system for data-driven dependency parsing). For some languages, alternative taggers and/or parsers are used.
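
As a rough illustration of how the two default tools can be chained, the sketch below builds the command lines for hunpos-tag and MaltParser and converts tagger output into CoNLL-X rows for the parser. It is only a sketch under assumptions: the file names (english.hunpos.model, malt.jar, the model name "english") are hypothetical placeholders, and copying the tag into both the CPOSTAG and POSTAG columns is an assumption that may not match what a given model expects.

```python
import subprocess
from typing import List


def hunpos_command(model_path: str) -> List[str]:
    # hunpos-tag takes the model file as its argument and reads one token
    # per line from stdin, with an empty line between sentences.
    return ["hunpos-tag", model_path]


def malt_command(jar: str, model: str, infile: str, outfile: str) -> List[str]:
    # MaltParser in parse mode: -c names the trained model (configuration),
    # -i / -o are the input and output files, -m selects the mode.
    return ["java", "-jar", jar, "-c", model,
            "-i", infile, "-o", outfile, "-m", "parse"]


def tagged_to_conll(tagged: str) -> str:
    """Turn 'token<TAB>tag' lines into 10-column CoNLL-X rows.

    Assumption: the tag is written to both CPOSTAG and POSTAG, and all
    remaining columns are left as '_' for the parser to fill in.
    """
    sentences, current = [], []
    for line in tagged.splitlines():
        fields = line.split("\t")
        if not fields[0].strip():
            # An empty line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = fields[0], fields[1]
        row = [str(len(current) + 1), token, "_", tag, tag,
               "_", "_", "_", "_", "_"]
        current.append("\t".join(row))
    if current:
        sentences.append(current)
    # CoNLL-X separates sentences with a blank line.
    return "".join("\n".join(s) + "\n\n" for s in sentences)


def tag_and_parse(tokens: str, tagger_model: str, jar: str, parser_model: str,
                  conll_in: str = "tagged.conll",
                  conll_out: str = "parsed.conll") -> None:
    # Runs the external tools; requires hunpos-tag and Java to be installed.
    tagged = subprocess.run(hunpos_command(tagger_model), input=tokens,
                            capture_output=True, text=True, check=True).stdout
    with open(conll_in, "w", encoding="utf-8") as fh:
        fh.write(tagged_to_conll(tagged))
    subprocess.run(malt_command(jar, parser_model, conll_in, conll_out),
                   check=True)
```

A call such as tag_and_parse("The\ndog\nbarks\n\n", "english.hunpos.model", "malt.jar", "english") would then tag and parse one sentence, provided the downloaded model files are in the working directory.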

Click on a language name for more information on the models available for this language.

Language	Tokenizer	Sentence splitter	Tagger(s)	Parser(s)
Catalan			SVMTool	malt-1.4.1
Czech			hunpos	malt-1.4.1
Chinese	zpar	zpar	zpar	zpar
Danish			hunpos	malt-1.4.1
Dutch				malt
English			hunpos	malt-1.4.1
French			MElt	malt-1.4.1
German			hunpos	malt-1.4.1
Hungarian			hunpos
Italian			TextPro	malt-1.4.1
Portuguese			hunpos	malt-1.4.1
Russian			hunpos	malt-1.4.1
Slovene			hunpos	malt-1.4.1
Spanish			SVMTool	malt-1.4.1
Swedish			hunpos	malt-1.4.1
Turkish				malt-1.4.1

Other tools

Language guesser: textcat with pre-trained language models

Last modified on Feb 28, 2012, 11:35:46 AM
