Find the corpus you are looking for

The corpora are listed below by name. The ELRA and ELRC links take you to their entire collections.

ada83: Ada 83 manuals
ALT: 20k Myanmar-English parallel sentences
Anuvaad: links for popular Indian languages
Bianet: Translated Turkish articles (tr, ku, en)
bible-uedin: Collection of Bible translations
Books: A collection of translated literature
CAPES: Thesis and dissertation abstracts
CCAligned: Parallel documents from Common Crawl
CCMatrix: Parallel sentences from Common Crawl
ChuBiCo: resources for the Chuvash language
DGT: A collection of EU TMs provided by the JRC
DocHPLT
DOGC: Documents from the Catalan Government
ECB: European Central Bank corpus
ECDC: European Centre for Disease Prevention corpus
EhuHac: Hizkuntzen Arteko Corpusa
EiTB-ParCC: Parallel Corpus of Comparable News
Elhuyar: foundation Elhuyar corpus
ELITR-ECA: European Court of Auditors documents
ELRA Collection
ELRC Collection
EMEA: European Medicines Agency documents
EOPC
EUbookshop: documents from the EU bookshop
EUconst: The European constitution
Europarl: European Parliament Proceedings
EuroPat: Parallel corpus of patents
FFR: Fon and French sentences
Finlex: Legislative and other judicial information of Finland
fiskmo: Data from the fiskmö project
giga-fren: French-English Giga-Word Corpus
GlobalVoices: News stories in various languages
GNOME: GNOME localization files
GoURMET: Parallel data from web crawls
HPLT: HPLT web crawled parallel sentences
hrenWaC: Croatian-English Parallel Web Corpus
IITB: IIT Bombay English-Hindi corpus
infopankki: infopankki.fi via the Open Data API
InterdialectCorpus
JESC: Japanese-English Subtitle Corpus
Joshua-IPC: Indian-language from Wikipedia pages corpus
JParaCrawl: English-Japanese parallel corpus
JRC-Acquis: legislative EU texts
KDE4: KDE4 localization files (v.2)
KDEdoc: the KDE manual corpus
KFTT: Kyoto Free Translation Task corpus
LinguaTools-WikiTitles: bilingual titles of Wikipedia articles
liv4ever: Livonian 4-lingual parallel corpus
MaCoCu
MBS: Belgisch Staatsblad corpus
MDN_Web_Docs: MDN web docs
memat: Xhosa/English parallel data
MIZAN: A large Persian-English corpus
MontenegrinSubs: Montenegrin movie subtitles
Mozilla-I10n
MultiCCAligned: Pivot-based Bitexts from CCAligned
MultiHPLT: HPLT web crawled parallel sentences
MultiMaCoCu
MultiParaCrawl: Non-English Bitexts from ParaCrawl
MultiUN: Translated UN documents
NeuLab-TedTalks: TED talk subtitles
News-Commentary: News Commentaries
NLLB: based on Meta AI metadata
Nunavut_Hansard
OfisPublik: Breton - French parallel texts
OpenOffice: the OpenOffice.org corpus
OpenSubtitles: translated subtitles
ParaCrawl: Parallel corpora from Web Crawls
ParaCrawl-Bonus
ParIce: English-Icelandic parallel corpus
PHP: the PHP manual corpus
pmindia: parallel corpus containing 13 Indian languages
QED: subtitles for educational videos and lectures
RF: Declarations of Government Policy by the Swedish Government
Salome: translations of Oscar Wilde’s Salomé
Samanantar: Largest Indic corpora collection
sardware: the sardware corpus
SCB_MT_EN_TH
SciELO: Articles from SciELO
SETIMES: A parallel corpus of the Balkan languages
SPC: Stockholm Parallel Corpora
StanfordNLP-NMT: StanfordNLP-NMT
SUMMA: corpus from SUMMA project
Tanzil: A collection of Quran translations
Tatoeba: A DB of translated sentences
TED2013: TED talk subtitles
TED2020: a crawl of nearly 4000 TED/TEDX transcripts
TedTalks: Croatian-English parallel corpus
TEP: The Tehran English-Persian subtitle corpus
tico-19: Translation Initiative for COVID-19
TildeMODEL: Multilingual Open Data for European Languages
tldr-pages
translatewiki
Ubuntu: Ubuntu localization files
UNPC: The United Nations Parallel Corpus
WikiMatrix: Parallel sentences extracted from Wikipedia
wikimedia: wikimedia article translation system
Wikipedia: translated sentences from Wikipedia
WikiSource: small en-sv sample only
WikiTitles: parallel wikipedia titles
WMT-News: A parallel corpus of News Test Sets
XhosaNavy: South African Navy parallel corpus
XLEnt: CCAligned, CCMatrix, and WikiMatrix parallel sentences