

OPUS is now part of the Nordic Language Processing Laboratory (NLPL), a collaborative project that integrates resources and infrastructure for research in computational linguistics and language technology on the high-performance computing systems of the Nordic countries. The corpus is now hosted at CSC, the national scientific infrastructure provider of Finland, and its resources are directly available to users of CSC's services. The OPUS server runs in that environment, but the data sets and tools can also be accessed directly from the taito shell. The core data is also available on abel, the Norwegian cluster provided by Sigma2.

If you have access to these systems, you can read the data directly from the file system:

  • on taito: /projappl/nlpl/data/OPUS/
  • on abel: /projects/nlpl/data/OPUS/ (only raw XML data)
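Here is a minimal sketch that assumes a POSIX shell. The hypothetical function find_opus_root prints whichever of the two paths listed above exists on the current machine, so the same script runs on either taito or abel. It only lists the top-level directory, because this page does not describe the layout beneath OPUS/.

```shell
# Hypothetical helper: print the first existing directory among the
# arguments. With no arguments, try the taito path first, then the abel path.
# Returns non-zero (and prints a message to stderr) if none exists.
find_opus_root() {
  [ "$#" -gt 0 ] || set -- /projappl/nlpl/data/OPUS /projects/nlpl/data/OPUS
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  echo "OPUS data directory not found" >&2
  return 1
}

# Usage: list the top-level contents of the OPUS data directory, if present.
if root=$(find_opus_root); then
  ls "$root"
fi
```

On abel, only the raw XML data will be present under the returned directory.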

On both systems you can also use tools that are packaged for working with the data (and for other NLPL-related activities). More information is available on the NLPL wiki. The basic tools for working with OPUS data can be loaded with the module nlpl-opus:

module load nlpl-opus
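Some scripts may also run on machines other than taito or abel. For those, the hypothetical helper load_opus_tools below is a sketch that loads nlpl-opus only when the environment-modules `module` command is present. `module list` is a standard subcommand of Environment Modules and Lmod. Whether `module` is defined in non-interactive batch shells depends on the site setup, and that is an assumption here.

```shell
# Hypothetical helper: load the OPUS tools if the `module` command exists
# (it is usually a shell function, which `command -v` also detects).
load_opus_tools() {
  if command -v module >/dev/null 2>&1; then
    # Load the package, then show the loaded modules as a quick check.
    module load nlpl-opus && module list
  else
    echo "module command not found; run this on taito or abel" >&2
    return 1
  fi
}
```

In a job script you might call `load_opus_tools || exit 1` before any step that needs the OPUS tools.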

This gives you access to essential tools that make it easier to read and process the data sets.

Last modified on Feb 15, 2020, 8:23:57 PM