Pre-processing, Tagging and Parsing

Tools and language-specific models for segmentation, tagging and parsing have been collected for the OPUS corpus. The language-specific models can be downloaded from the table below; the tools are available for download here:

To keep tagging and parsing consistent, the same tools have been used for most of the languages: the Hunpos tagger (Péter Halácsy, András Kornai and Csaba Oravecz, 2007, "Hunpos - an open source trigram tagger") and MaltParser (Joakim Nivre and Johan Hall, 2005, "MaltParser: A language-independent system for data-driven dependency parsing"). For some languages, alternative taggers and/or parsers are used.

Click on a language name for more information on the models available for this language.

Language    Tokenizer  Sentence splitter  Tagger(s)  Parser(s)
Catalan     -          -                  SVMTool    malt-1.4.1
Czech       -          -                  hunpos     malt-1.4.1
Chinese     zpar       zpar               zpar       zpar
Danish      -          -                  hunpos     malt-1.4.1
Dutch       -          -                  -          malt
English     -          -                  hunpos     malt-1.4.1
French      -          -                  MElt       malt-1.4.1
German      -          -                  hunpos     malt-1.4.1
Hungarian   -          -                  hunpos     -
Italian     -          -                  TextPro    malt-1.4.1
Portuguese  -          -                  hunpos     malt-1.4.1
Russian     -          -                  hunpos     malt-1.4.1
Slovene     -          -                  hunpos     malt-1.4.1
Spanish     -          -                  SVMTool    malt-1.4.1
Swedish     -          -                  hunpos     malt-1.4.1
Turkish     -          -                  -          malt-1.4.1
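
For scripted use, the table above could be encoded as a small lookup structure. The following is only a sketch: the names Tools, TOOLS and languages_using are hypothetical and not part of any OPUS tool, and the assignment of each entry to the tagger or parser column is an assumption based on what each tool is generally used for.

```python
from typing import Dict, List, NamedTuple, Optional


class Tools(NamedTuple):
    """One table row; None marks a cell with no entry (hypothetical type)."""
    tokenizer: Optional[str] = None
    sentence_splitter: Optional[str] = None
    tagger: Optional[str] = None
    parser: Optional[str] = None


# Assumption: column placement follows the usual role of each tool
# (hunpos, SVMTool, MElt, TextPro as taggers; malt as parser).
TOOLS: Dict[str, Tools] = {
    "Catalan": Tools(tagger="SVMTool", parser="malt-1.4.1"),
    "Czech": Tools(tagger="hunpos", parser="malt-1.4.1"),
    "Chinese": Tools("zpar", "zpar", "zpar", "zpar"),
    "Danish": Tools(tagger="hunpos", parser="malt-1.4.1"),
    "Dutch": Tools(parser="malt"),
    "English": Tools(tagger="hunpos", parser="malt-1.4.1"),
    "French": Tools(tagger="MElt", parser="malt-1.4.1"),
    "German": Tools(tagger="hunpos", parser="malt-1.4.1"),
    "Hungarian": Tools(tagger="hunpos"),
    "Italian": Tools(tagger="TextPro", parser="malt-1.4.1"),
    "Portuguese": Tools(tagger="hunpos", parser="malt-1.4.1"),
    "Russian": Tools(tagger="hunpos", parser="malt-1.4.1"),
    "Slovene": Tools(tagger="hunpos", parser="malt-1.4.1"),
    "Spanish": Tools(tagger="SVMTool", parser="malt-1.4.1"),
    "Swedish": Tools(tagger="hunpos", parser="malt-1.4.1"),
    "Turkish": Tools(parser="malt-1.4.1"),
}


def languages_using(tool: str) -> List[str]:
    """Return the languages, in table order, that list `tool` in any column."""
    return [lang for lang, row in TOOLS.items() if tool in row]


if __name__ == "__main__":
    print(languages_using("SVMTool"))
    print(TOOLS["French"].tagger)
```

A caller would typically check TOOLS[language].tagger or .parser for None before looking for a model to download, since not every language has an entry in every column.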

Other tools

Last modified on Feb 28, 2012, 11:35:46 AM