The MERLIN corpus is a written learner corpus for Czech, German, and Italian that has been designed to illustrate the Common European Framework of Reference for Languages (CEFR) with authentic learner data. The corpus contains learner texts produced in standardized language certifications covering CEFR levels A1-C1. The MERLIN annotation scheme includes a wide range of language characteristics that provide researchers with concrete examples of learner performance and progress across multiple proficiency levels.
The MERLIN corpus is available from
The following file bundles are available:
merlin-docs-v1.1.zip contains documentation about the MERLIN transcription, rating, and annotation processes.
[ERCC download] [GitLab download] [Source code repository]
merlin-text-v1.1.zip contains human-readable plain text versions of the learner texts, metadata, and target hypotheses. No further manual or automatic annotation is included.
[ERCC download] [GitLab download] [Source code repository]
merlin-metadata-v1.1.zip contains the metadata, CEFR ratings, indicators, and complexity measures.
[ERCC download] [GitLab download] [Source code repository]
merlin-tasks-v1.1.zip contains information about the tasks included in the MERLIN corpus.
[ERCC download] [GitLab download] [Source code repository]
merlin-paula-v1.1.zip contains the complete corpus in PAULA format with all metadata and annotation.
[ERCC download] [GitLab download] [Source code repository]
merlin-relannis-v1.1.1.zip contains the complete corpus in relANNIS format with all metadata and annotation.
[ERCC download] [GitLab download] [Source code repository]
merlin-annis-v1.1.zip (discontinued).
[Source code repository]
merlin-exmaralda-v1.1.zip contains the MERLIN corpus in EXMARaLDA format with all annotation that can be displayed in a grid/table format. The metadata and dependency annotation are not included.
[ERCC download] [GitLab download] [Source code repository]
merlin-solr-v1.1.zip contains the Solr <add><doc/></add> XML files for adding documents to the Solr index used by the simple search interface of the MERLIN platform.
[ERCC download] [GitLab download] [Source code repository]
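
For readers unfamiliar with the Solr <add><doc/></add> update format mentioned for the Solr bundle, the following is a minimal sketch of how such an XML message can be assembled with the Python standard library. The function name `build_solr_add` and the field names (`id`, `language`, `cefr_level`) are assumptions chosen for illustration; they are not taken from the MERLIN files, whose actual schema may differ.

```python
import xml.etree.ElementTree as ET


def build_solr_add(docs):
    """Serialize a list of field dictionaries into a Solr <add> update message.

    Each dictionary becomes one <doc> element, and each key/value pair
    becomes a <field name="..."> child of that <doc>.
    """
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical field names and values, for illustration only.
example = build_solr_add(
    [{"id": "text-0001", "language": "German", "cefr_level": "B1"}]
)
print(example)
```

A message of this shape is what Solr's XML update handler accepts for indexing; the files in the bundle are already in this form, so a sketch like this is only useful for understanding or regenerating them.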