Find the corpus you are looking for

The corpora are listed below by name. The ELRC and ELRA links lead to their entire collections.

ada83: Ada 83 manuals
ALT: 20k Myanmar-English parallel sentences
Anuvaad: links for popular Indian languages
Bianet: Translated Turkish articles (tr, ku, en)
bible-uedin: Collection of Bible translations
Books: A collection of translated literature
CAPES: Thesis and dissertation abstracts
CCAligned: Parallel documents from Common Crawl
CCMatrix: Parallel sentences from Common Crawl
ChuBiCo: resources for the Chuvash language
DGT: A collection of EU TMs provided by the JRC
DocHPLT
DOGC: Documents from the Catalan Government
ECB: European Central Bank corpus
ECDC: European Centre for Disease Prevention corpus
EhuHac: Hizkuntzen Arteko Corpusa
EiTB-ParCC: Parallel Corpus of Comparable News
Elhuyar: foundation Elhuyar corpus
ELITR-ECA: European Court of Auditors documents
ELRA Collection
ELRC Collection
EMEA: European Medicines Agency documents
EOPC
EUbookshop: documents from the EU bookshop
EUconst: The European constitution
Europarl: European Parliament Proceedings
EuroPat: Parallel corpus of patents
FFR: Fon and French sentences
Finlex: Legislative and other judicial information of Finland
fiskmo: Data from the fiskmö project
giga-fren: French-English Giga-Word Corpus
GlobalVoices: News stories in various languages
GNOME: GNOME localization files
GoURMET: Parallel data from web crawls
HPLT: HPLT web crawled parallel sentences
hrenWaC: Croatian-English Parallel Web Corpus
IITB: IIT Bombay English-Hindi corpus
infopankki: infopankki.fi via the Open Data API
InterdialectCorpus
JESC: Japanese-English Subtitle Corpus
Joshua-IPC: Indian-language from Wikipedia pages corpus
JParaCrawl: English-Japanese parallel corpus
JRC-Acquis: legislative EU texts
KDE4: KDE4 localization files (v.2)
KDEdoc: the KDE manual corpus
KFTT: Kyoto Free Translation Task corpus
LinguaTools-WikiTitles: bilingual titles of Wikipedia articles
liv4ever: Livonian 4-lingual parallel corpus
MaCoCu
MBS: Belgisch Staatsblad corpus
MDN_Web_Docs: MDN web docs
memat: Xhosa/English parallel data
MIZAN: A large Persian-English corpus
MontenegrinSubs: Montenegrin movie subtitles
Mozilla-I10n
MultiCCAligned: Pivot-based Bitexts from CCAligned
MultiHPLT: HPLT web crawled parallel sentences
MultiMaCoCu
MultiParaCrawl: Non-English Bitexts from ParaCrawl
MultiUN: Translated UN documents
NeuLab-TedTalks: TED talk subtitles
News-Commentary: News Commentaries
NLLB: based on Meta AI metadata
Nunavut_Hansard
OfisPublik: Breton - French parallel texts
OpenOffice: the OpenOffice.org corpus
OpenSubtitles: translated subtitles
ParaCrawl: Parallel corpora from Web Crawls
ParaCrawl-Bonus
ParIce: English-Icelandic parallel corpus
PHP: the PHP manual corpus
pmindia: parallel corpus containing 13 Indian languages
QED: subtitles for educational videos and lectures
RF: Declarations of Government Policy by the Swedish Government
Salome: translations of Oscar Wilde’s Salomé
Samanantar: Largest Indic corpora collection
sardware: the sardware corpus
SCB_MT_EN_TH
SciELO: Articles from SciELO
SETIMES: A parallel corpus of the Balkan languages
SPC: Stockholm Parallel Corpora
StanfordNLP-NMT: StanfordNLP-NMT
SUMMA: corpus from SUMMA project
Tanzil: A collection of Quran translations
Tatoeba: A DB of translated sentences
TED2013: TED talk subtitles
TED2020: a crawl of nearly 4000 TED/TEDX transcripts
TedTalks: Croatian-English parallel corpus
TEP: The Tehran English-Persian subtitle corpus
tico-19: Translation Initiative for COVID-19
TildeMODEL: Multilingual Open Data for European Languages
tldr-pages
translatewiki
Ubuntu: Ubuntu localization files
UNPC: The United Nations Parallel Corpus
WikiMatrix: Parallel sentences extracted from Wikipedia
wikimedia: wikimedia article translation system
Wikipedia: translated sentences from Wikipedia
WikiSource: small en-sv sample only
WikiTitles: parallel wikipedia titles
WMT-News: A parallel corpus of News Test Sets
XhosaNavy: South African Navy parallel corpus
XLEnt: CCAligned, CCMatrix, and WikiMatrix parallel sentences