The OPUS ecosystem

Tools for finding and processing OPUS data sets:

Managing OPUS:

Machine translation with OPUS-MT:


Please cite the following LREC 2012 paper when using OPUS, and acknowledge corpus-specific references as specified in the resource-specific information and documentation.

Links to other resources


OPUS and related resources and tools have been partially supported by various projects, including:

  • LetsMT! - A Platform for Online Sharing of Training Data and Building User Tailored Machine Translation (EU ICT PSP)
  • MeMAD - Methods for Managing Audiovisual Data (EU Horizon 2020)
  • NLPL - the Nordic Language Processing Laboratory (NeIC)
  • EOSC-nordic - the European Open Science Cloud within the Nordic and Baltic countries (EU Horizon 2020)
  • ELG - the European Language Grid (EU Horizon 2020)
  • FoTran - Found in Translation (EU ERC)
  • HPLT - High-Performance Language Technologies (EU Horizon)

OPUS is hosted by CSC, the IT Center for Science in Finland, and draws heavily on the HPC resources provided by CSC. OPUS is also part of NLPL, the Nordic Language Processing Laboratory. Last but not least, OPUS would not be possible without the many contributions from the community, including aligned data sets and tools for creating and processing parallel corpora.