The MERLIN corpus is a written learner corpus for Czech, German, and Italian that has been designed to illustrate the Common European Framework of Reference for Languages (CEFR) with authentic learner data. The corpus contains learner texts produced in standardized language certifications covering CEFR levels A1-C1. The MERLIN annotation scheme includes a wide range of language characteristics that provide researchers with concrete examples of learner performance and progress across multiple proficiency levels.
The MERLIN corpus is available from
The following file bundles are available:
merlin-docs-v1.0.zip
contains documentation about the MERLIN transcription, rating, and annotation processes.
[ERCC download] [GitLab download] [Source code repository]
merlin-text-v1.0.zip
contains human-readable plain text versions of the learner texts, metadata, and target hypotheses. No further manual or automatic annotation is included.
[ERCC download] [GitLab download] [Source code repository]
merlin-metadata-v1.0.zip
contains the metadata, CEFR ratings, indicators, and complexity measures.
[ERCC download] [GitLab download] [Source code repository]
merlin-tasks-v1.0.zip
contains information about the tasks included in the MERLIN corpus.
[ERCC download] [GitLab download] [Source code repository]
merlin-paula-v1.0.zip
contains the complete corpus in PAULA format with all metadata and annotation.
[ERCC download] [GitLab download] [Source code repository]
merlin-annis-v1.0.zip
contains the complete corpus in ANNIS format with all metadata and annotation.
[ERCC download] [GitLab download] [Source code repository]
merlin-exmaralda-v1.0.zip
contains the MERLIN corpus in EXMARaLDA format with all annotation that can be displayed in a grid/table format. The metadata and dependency annotation are not included.
[ERCC download] [GitLab download] [Source code repository]
merlin-solr-v1.0.zip
contains the Solr <add><doc/></add> XML files for adding documents to the Solr index as used in the simple search interface of the MERLIN platform.
[ERCC download] [GitLab download] [Source code repository]
merlin-relannis-v1.0.zip
contains the complete corpus in relANNIS format with all metadata and annotation.
[ERCC download] [GitLab download] [Source code repository]
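
Since each bundle is an ordinary zip archive, it can help to check what a download contains before unpacking it. The following is a minimal Python sketch, not part of the MERLIN distribution: it counts the files in an archive by extension. The function name summarize_bundle and the local path merlin-text-v1.0.zip are assumptions for illustration, and the sketch assumes nothing about the internal directory layout of the bundles.

```python
import zipfile
from collections import Counter
from pathlib import Path


def summarize_bundle(zip_path):
    """Return a Counter mapping file extensions to the number of
    archive members with that extension (hypothetical helper)."""
    counts = Counter()
    with zipfile.ZipFile(zip_path) as bundle:
        for info in bundle.infolist():
            if info.is_dir():
                continue  # skip directory entries, count files only
            suffix = Path(info.filename).suffix.lower() or "(no extension)"
            counts[suffix] += 1
    return counts


if __name__ == "__main__":
    # Assumed local path: adjust to wherever the bundle was saved.
    path = Path("merlin-text-v1.0.zip")
    if path.exists():
        for suffix, n in summarize_bundle(path).most_common():
            print(f"{suffix}\t{n}")
    else:
        print(f"{path} not found; download it first.")
```

The same call works for any of the bundles listed above, since it relies only on the zip container and not on the corpus format inside it.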