KoKo German L1 Learner Corpus v3

Name: KoKo German L1 Learner Corpus v3
License: https://gitlab.inf.unibz.it/commul/var/eurac-licenses/-/raw/v1.0/EULA-CLARIN-ACA-BY-NC-NORED.md

Abel, Andrea; Glaznieks, Aivars; Culy, Chris; Nicolas, Lionel; Stemle, Egon W.

dc.contributor.author	Abel, Andrea
dc.contributor.author	Glaznieks, Aivars
dc.contributor.author	Culy, Chris
dc.contributor.author	Nicolas, Lionel
dc.contributor.author	Stemle, Egon W.
dc.date.accessioned	2019-09-19T14:27:45Z
dc.date.available	2019-09-19T14:27:45Z
dc.date.issued	2014-12
dc.identifier.uri	http://hdl.handle.net/20.500.12124/12
dc.description	The KoKo Corpus is an error-annotated learner corpus of L1 German speakers. It was created to investigate and describe the writing skills of German-speaking secondary-school pupils at the end of their school career by analysing authentic texts produced in classrooms. The corpus consists of 1503 argumentative essays with manual transcription annotations and linguistic error annotations. Transcription annotations reflect surface features of the text, such as the graphical arrangement and self-corrections. Error annotations cover the orthographic level (including punctuation errors), and a selection of the texts (n=597) also contains error annotations on the grammatical level. The corpus building process was guided by two goals: (1) to describe writing skills at the transition from secondary school to university, and (2) to determine external factors that may influence the distribution of writing skills, such as the region and sociolinguistic (gender, age), socio-economic, and language-related biographical factors (L1, preferred variety of German, reading and writing habits, etc.). The pupils were selected from three German-speaking areas: North Tyrol (Austria), South Tyrol (Italy), and Thuringia (Germany). Classes were sampled randomly, using the size of the cities in which the schools were located (small vs. medium vs. big) and the type of school (general education vs. education specific to a particular profession) as strata. Since data were collected during regular courses, the typical composition of secondary-school classes in the three regions is represented in the whole corpus. Most of the participants are German native speakers (n=1319, 82.7%).
Person-related metadata provides information about the writer's L1, the writer's gender, the type of school the essay comes from, the location of the school the essay comes from, and the grade attended at data collection. In addition, the corpus is automatically annotated, including tokenisation, sentence splitting, POS-tagging, and lemmatisation.
dc.language.iso	deu
dc.publisher	Institute for Applied Linguistics, Eurac Research
dc.relation.isbasedon	https://gitlab.inf.unibz.it/commul/koko/data/bundle/-/tags/v3
dc.relation.isreferencedby	http://www.lrec-conf.org/proceedings/lrec2014/pdf/934_Paper.pdf
dc.relation.replaces	http://hdl.handle.net/20.500.12124/11
dc.relation.isreplacedby	http://hdl.handle.net/20.500.12124/77
dc.rights	CLARIN ACADEMIC END-USER LICENCE (ACA-BY-NC-NORED 1.0)
dc.rights.uri	https://gitlab.inf.unibz.it/commul/var/eurac-licenses/-/raw/v1.0/EULA-CLARIN-ACA-BY-NC-NORED.md
dc.rights.label	ACA
dc.source.uri	http://www.korpus-suedtirol.it/KoKo.html
dc.subject	learner corpus
dc.subject	German varieties
dc.subject	students in secondary school
dc.subject	argumentative essays
dc.title	KoKo German L1 Learner Corpus v3
dc.type	corpus
metashare.ResourceInfo#ContentInfo.mediaType	text
hidden	false
hasMetadata	false
has.files	yes
branding	Learner Language
demo.uri	https://commul.eurac.edu/annis/koko
contact.person	Corpus Manager clarin@eurac.edu Eurac Research CLARIN Centre (ERCC)
size.info	1503 texts
size.info	950,000 tokens
files.size	148995332
files.count	7
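
The dc.description field above states that classes were sampled randomly, with city size and school type as strata. The following is a minimal sketch of that general technique (stratified random sampling), not the procedure the corpus builders actually used, which this record does not document. The class records, the labels in `CITY_SIZES` and `SCHOOL_TYPES`, and the `per_stratum` parameter are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical stratum labels, loosely following the description above.
CITY_SIZES = ("small", "medium", "big")
SCHOOL_TYPES = ("general", "vocational")

# Hypothetical sampling frame: four classes per (city size, school type) cell.
CLASSES = [
    {"id": f"{size}-{kind}-{i}", "city_size": size, "school_type": kind}
    for size in CITY_SIZES
    for kind in SCHOOL_TYPES
    for i in range(4)
]


def stratified_sample(classes, per_stratum, seed=0):
    """Draw up to `per_stratum` classes at random from each stratum.

    A stratum is one combination of city size and school type.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Group the sampling frame by stratum.
    strata = defaultdict(list)
    for school_class in classes:
        key = (school_class["city_size"], school_class["school_type"])
        strata[key].append(school_class)

    # Sample without replacement inside each stratum.
    sample = []
    for key in sorted(strata):
        members = strata[key]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample
```

With the hypothetical frame above, `stratified_sample(CLASSES, per_stratum=2)` returns two classes from each of the six strata, so no combination of city size and school type is left out of the draw.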

Files in this item

Download all files in item (142.09 MB)

This item is for Academic Use and licensed under:
CLARIN ACADEMIC END-USER LICENCE (ACA-BY-NC-NORED 1.0)

Name: README.html
Size: 6.86 KB
Format: HTML
Description: File bundle descriptions.
MD5: 7276506887ce8fda05bcb39ccba78238

Name: CHANGELOG.html
Size: 1.58 KB
Format: HTML
Description: CHANGELOG
MD5: 7df479e4300575e81829741ac887af8b

Name: docs-v3.zip
Size: 322.87 KB
Format: application/zip
Description: documentation.
MD5: 4e1c56cd4c53fe71586c649e9d114fe7

Name: xmlmind-v3.zip
Size: 3.65 MB
Format: application/zip
Description: transcribed corpus in the KoKo XML format, done with XMLmind.
MD5: 71ef290d7ad6521341fdfd23de99349a

Name: vrt-v3.zip
Size: 4.28 MB
Format: application/zip
Description: transcribed corpus in a text file with structural annotations and metadata.
MD5: febed956eea474c42c7b7c734b03dd8c

Name: mmax-v3.zip
Size: 98.45 MB
Format: application/zip
Description: corpus in MMax2 format with all annotation.
MD5: d5d441df7550f216f460397a18ad0276

Name: annis-v3.zip
Size: 35.39 MB
Format: application/zip
Description: complete corpus in ANNIS format with all metadata and annotation.
MD5: b96f6c1a13ce04716d682122edcbba8a
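
The MD5 values listed for each file can be used to check that a download arrived intact. Below is a minimal sketch using Python's standard `hashlib` module. The checksums are copied from the file list above; the function names and the assumption that all files sit in one local directory under their listed names are illustrative only.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file list of this item.
EXPECTED_MD5 = {
    "README.html": "7276506887ce8fda05bcb39ccba78238",
    "CHANGELOG.html": "7df479e4300575e81829741ac887af8b",
    "docs-v3.zip": "4e1c56cd4c53fe71586c649e9d114fe7",
    "xmlmind-v3.zip": "71ef290d7ad6521341fdfd23de99349a",
    "vrt-v3.zip": "febed956eea474c42c7b7c734b03dd8c",
    "mmax-v3.zip": "d5d441df7550f216f460397a18ad0276",
    "annis-v3.zip": "b96f6c1a13ce04716d682122edcbba8a",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Map each listed file name to True (match), False (mismatch),
    or None (file not found in `directory`)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = md5_of_file(path) == expected
        else:
            results[name] = None
    return results
```

Calling `verify_downloads("koko-v3")` (a hypothetical directory name) returns one entry per listed file. Reading in chunks keeps memory use flat for the larger archives such as mmax-v3.zip. MD5 here guards against corrupted or truncated downloads only; it is not a protection against deliberate tampering.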
