KoKo German L1 Learner Corpus v2

Name: KoKo German L1 Learner Corpus v2
License: https://gitlab.inf.unibz.it/commul/var/eurac-licenses/-/raw/v1.0/EULA-CLARIN-ACA-BY-NC-NORED.md

Authors: Abel, Andrea; Glaznieks, Aivars; Culy, Chris

Item identifier: http://hdl.handle.net/20.500.12124/11

Project URL: http://www.korpus-suedtirol.it/KoKo.html

Is Based On: https://gitlab.inf.unibz.it/commul/koko/data/bundle/-/tags/v2

Referenced by: http://apples.jyu.fi/article/abstract/305
http://www.lrec-conf.org/proceedings/lrec2014/pdf/934_Paper.pdf

Date issued: 2012-12

Type: corpus

Size: 1503 texts, 950,000 tokens

Language(s): German

Description: The KoKo Corpus is an error-annotated learner corpus of L1 German speakers. It was created to investigate and describe the writing skills of German-speaking secondary-school pupils at the end of their school career by analysing authentic texts produced in classrooms. The corpus consists of 1503 argumentative essays with manually performed transcription annotations and linguistic error annotations. Error annotation covers the orthographic level only. Transcription annotations reflect surface features of the text, such as the graphical arrangement and self-corrections.

The corpus-building process was guided by two goals:
1. describe writing skills at the transition from secondary school to university;
2. determine external factors that may influence the distribution of writing skills, such as region, sociolinguistic factors (gender, age), socio-economic factors, and language-related biographical factors (L1, preferred variety of German, reading and writing habits, etc.).

The pupils were selected from three German-speaking areas: North Tyrol (Austria), South Tyrol (Italy), and Thuringia (Germany). Classes were sampled randomly, with the size of the city in which the school was located (small vs. medium vs. big) and the type of school (general education vs. education specific to a particular profession) as sampling strata. Since the data were collected during regular courses, the typical composition of secondary-school classes in the three regions is represented in the corpus as a whole. Most of the participants are German native speakers (n=1319, 82.7%).

Person-related metadata provides information about:
- writer's L1
- writer's gender
- type of school the essay comes from
- location of the school the essay comes from
- grade attended at data collection

In addition, the corpus is automatically annotated, including tokenisation, sentence splitting, POS-tagging and lemmatisation.

Publisher: Institute for Applied Linguistics, Eurac Research

Subject(s): learner corpus; German varieties; students in secondary school; argumentative essays

Collection(s): Eurac Research: Learner Language

This item is replaced by a newer submission:

http://hdl.handle.net/20.500.12124/12

Files in this item

Download all files in item (8.1 MB)

This item is for Academic Use and is licensed under:
CLARIN ACADEMIC END-USER LICENCE (ACA-BY-NC-NORED 1.0)

Name: README.html
Size: 6.4 KB
Format: HTML
Description: File bundle descriptions.
MD5: 9c00f173f336be41603c94bacf69692b

Name: CHANGELOG.html
Size: 1.29 KB
Format: HTML
Description: CHANGELOG
MD5: a778d9ac379c3a9e9cff649ac1b7b54b

Name: docs-v2.zip
Size: 164.9 KB
Format: application/zip
Description: documentation.
MD5: 694e1dd4965014b8a236d89e0db8d490

Name: xmlmind-v2.zip
Size: 3.65 MB
Format: application/zip
Description: transcribed corpus in the KoKo XML format, done with XMLmind.
MD5: 27f2b334d016b6367c1f2e04b537055d

Name: vrt-v2.zip
Size: 4.28 MB
Format: application/zip
Description: transcribed corpus in a text file with structural annotations and metadata.
MD5: 96a874cef4c9d8221004ef94f0d43a60
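
The vrt bundle is described as a text file with structural annotations and metadata. As a minimal sketch, assuming the files follow the common verticalized-text layout (one token per line with tab-separated attributes, and XML-like tags on their own lines for structures), the following Python function counts tokens and structural elements. The layout, the function name `summarise_vrt`, and the tag and column names in the usage example are assumptions for illustration and are not taken from the corpus documentation; docs-v2.zip should be consulted for the actual format.

```python
import re
from collections import Counter

# Matches a line that consists of a single XML-like tag, e.g. <s>, </s>, <text id="1">.
TAG = re.compile(r"^<(/?)([A-Za-z_][\w.-]*)(\s[^>]*)?/?>$")


def summarise_vrt(lines):
    """Count token lines and opening structural tags in VRT-style input.

    Assumption: every non-empty line is either a tag line or a token line.
    """
    tokens = 0
    opened = Counter()
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip empty lines
        if line.startswith("<?") or line.startswith("<!--"):
            continue  # skip XML declarations and comments
        match = TAG.match(line)
        if match:
            if not match.group(1):  # count opening tags only
                opened[match.group(2)] += 1
            continue
        tokens += 1  # anything else is treated as one token line
    return tokens, opened
```

A usage sketch with invented sample lines (the `text` and `s` element names and the three columns are hypothetical):

```python
sample = ['<text id="t1">', "<s>", "Das\tART\tdie", "ist\tVAFIN\tsein", "</s>", "</text>"]
tokens, structures = summarise_vrt(sample)
print(tokens, dict(structures))
```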

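
Each file in this item is listed with an MD5 checksum, which can be used to confirm that a download is complete and unaltered. The following is a minimal sketch in Python using only the standard library. The checksums are copied from the file listing; the helper names `md5_of` and `verify` and the assumption that the files sit in a single local directory under their listed names are illustrative.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing of this item.
EXPECTED_MD5 = {
    "README.html": "9c00f173f336be41603c94bacf69692b",
    "CHANGELOG.html": "a778d9ac379c3a9e9cff649ac1b7b54b",
    "docs-v2.zip": "694e1dd4965014b8a236d89e0db8d490",
    "xmlmind-v2.zip": "27f2b334d016b6367c1f2e04b537055d",
    "vrt-v2.zip": "96a874cef4c9d8221004ef94f0d43a60",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each listed file name to True (match), False (mismatch) or None (not found)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, ok in verify().items():
        print(f"{name}: {labels[ok]}")
```

A mismatch usually points to an interrupted or corrupted download, in which case the file should be fetched again.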
