Copyright:
@InProceedings{roussis-EtAl:2022:LREC2,
  author    = {Roussis, Dimitrios and Papavassiliou, Vassilis and Prokopidis, Prokopis and Piperidis, Stelios and Katsouros, Vassilis},
  title     = {SciPar: A Collection of Parallel Corpora from Scientific Abstracts},
  booktitle = {Proceedings of the Thirteenth Language Resources and Evaluation Conference},
  month     = {June},
  year      = {2022},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {2652--2657},
  url       = {https://aclanthology.org/2022.lrec-1.284}
}

Please acknowledge the original sources and providers of the data and also cite the following article if you use any part of the corpus in your own work: J. Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012)
Languages | Bitexts | Number of files | Number of tokens | Sentence fragments |
---|---|---|---|---|
3 | 3 | 6 | 24.41M | 1.14M |
A note on formats: TMX files contain only unique translation units. Moses downloads include all non-empty alignment units including duplicates. Token counts for each language also include duplicate sentences and documents.
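
The difference between the two formats can be illustrated with a minimal sketch that reduces Moses-style aligned lines to unique translation units. It assumes that a Moses download is a pair of plain-text files in which line i of the source file is aligned with line i of the target file, and that "unique" means an exact match on the stripped (source, target) pair; the deduplication actually used to build the TMX files may differ. The function name `unique_pairs` and the sample sentences are hypothetical.

```python
from typing import Iterable, List, Tuple


def unique_pairs(src_lines: Iterable[str], tgt_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return aligned (source, target) pairs with exact duplicates removed.

    Assumption: uniqueness is judged on the whitespace-stripped pair.
    First occurrences are kept, in their original order.
    """
    seen = set()
    result = []
    for src, tgt in zip(src_lines, tgt_lines):
        pair = (src.strip(), tgt.strip())
        # Skip alignment units where both sides are empty.
        if not pair[0] and not pair[1]:
            continue
        # Skip pairs that have already been emitted.
        if pair in seen:
            continue
        seen.add(pair)
        result.append(pair)
    return result


# Hypothetical sample data standing in for the two Moses files.
src = ["Hello world.\n", "Good morning.\n", "Hello world.\n"]
tgt = ["Bonjour le monde.\n", "Bonjour.\n", "Bonjour le monde.\n"]
pairs = unique_pairs(src, tgt)
print(len(src), "aligned lines,", len(pairs), "unique pairs")
```

With real files, the two lists would be replaced by open file handles for the source-language and target-language files; counts computed this way would be expected to be lower than the Moses line counts whenever the corpus contains repeated sentences.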