dc.contributor.author Abel, Andrea
dc.contributor.author Glaznieks, Aivars
dc.contributor.author Culy, Chris
dc.contributor.author Nicolas, Lionel
dc.contributor.author Stemle, Egon W.
dc.date.accessioned 2024-06-14T20:11:29Z
dc.date.available 2024-06-14T20:11:29Z
dc.date.issued 2024-06
dc.identifier.uri http://hdl.handle.net/20.500.12124/77
dc.description The KoKo Corpus is an error-annotated learner corpus of L1 German speakers. It was created to investigate and describe the writing skills of German-speaking secondary-school pupils at the end of their school career by analysing authentic texts produced in classrooms. The corpus consists of 1503 argumentative essays with manual transcription annotations and linguistic error annotations. Transcription annotations reflect surface features of the text, such as the graphical arrangement and self-corrections. All texts are error-annotated on the orthographic level (including punctuation errors); a selection is additionally error-annotated on the grammatical level (ANNIS sub-corpus KoKo_4_gram, n=597) and on the lexical level (ANNIS sub-corpus KoKo_4_lex, n=980). The corpus-building process was guided by two goals: 1. to describe writing skills at the transition from secondary school to university; 2. to determine external factors that may influence the distribution of writing skills, such as region and sociolinguistic (gender, age), socio-economic, and language-related biographical factors (L1, preferred variety of German, reading and writing habits, etc.). The pupils were selected from three German-speaking areas: North Tyrol (Austria), South Tyrol (Italy), and Thuringia (Germany). Classes were sampled randomly, using the size of the cities in which the schools were located (small vs. medium vs. big) and the type of school (general education vs. education specific to a particular profession) as strata. Since data were collected during regular courses, the typical composition of secondary-school classes in the three regions is represented in the corpus as a whole. Most of the participants are German native speakers (n=1319, 82.7%).
Person-related metadata provides information about:
- writer’s L1
- writer’s gender
- type of school the essay comes from
- location of the school the essay comes from
- grade attended at data collection
In addition, the corpus is automatically annotated, including tokenisation, sentence splitting, POS tagging, and lemmatisation.
dc.language.iso deu
dc.publisher Institute for Applied Linguistics, Eurac Research
dc.relation.isbasedon https://gitlab.inf.unibz.it/commul/koko/data/bundle/-/tags/v4
dc.relation.isreferencedby http://www.lrec-conf.org/proceedings/lrec2014/pdf/934_Paper.pdf
dc.relation.replaces http://hdl.handle.net/20.500.12124/12
dc.rights CLARIN ACADEMIC END-USER LICENCE (ACA-BY-NC-NORED 1.0)
dc.rights.uri https://gitlab.inf.unibz.it/commul/var/eurac-licenses/-/raw/v1.0/EULA-CLARIN-ACA-BY-NC-NORED.md
dc.rights.label ACA
dc.source.uri http://www.korpus-suedtirol.it/KoKo.html
dc.subject learner corpus
dc.subject German varieties
dc.subject students in secondary school
dc.subject argumentative essays
dc.title KoKo German L1 Learner Corpus 4
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
hidden false
hasMetadata false
has.files yes
branding Learner Language
demo.uri https://commul.eurac.edu/annis/koko
contact.person Corpus Manager clarin@eurac.edu Eurac Research CLARIN Centre (ERCC)
size.info 1503 texts
size.info 950,000 tokens
files.size 389420767
files.count 6


 Files in this item

 Total download size: 371.38 MB
 This item is for Academic Use and licensed under: CLARIN ACADEMIC END-USER LICENCE (ACA-BY-NC-NORED 1.0)
 Licence conditions: Redistribution Not Permitted, Attribution Required, Noncommercial
Name            Size       Format           Description                                   MD5
README.html     433.41 KB  HTML             README                                        0e3cd6a7e944b38990a14fc6b5870a31
CHANGELOG.html  429.01 KB  HTML             CHANGELOG                                     829cb8f1ded7a09727d37a51ffd027c1
docs-v4.zip     531.2 KB   application/zip  documentation.                                802a2c732063223bb9c85ce0a083265a
xmlmind-v4.zip  3.7 MB     application/zip  transcribed corpus in the KoKo XML.           85633d58d1c781a2d48739cb4945ced0
mmax-v4.zip     107.34 MB  application/zip  corpus in MMax2 format with all annotations.  601e262d7c9404c16188ed075893cdf0
annis-v4.zip    258.97 MB  application/zip  the individual corpora in ANNIS format.       32369d6fb68d4b470ae4908f7901e65b
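
The MD5 checksums listed for the six files can be used to confirm that a download completed without corruption. The following is a minimal sketch in Python using only the standard library. The function names `md5_of_file` and `check_downloads`, and the assumption that the files were saved under their listed names in a single local directory, are illustrative and not part of this record.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file list of this record.
EXPECTED_MD5 = {
    "README.html": "0e3cd6a7e944b38990a14fc6b5870a31",
    "CHANGELOG.html": "829cb8f1ded7a09727d37a51ffd027c1",
    "docs-v4.zip": "802a2c732063223bb9c85ce0a083265a",
    "xmlmind-v4.zip": "85633d58d1c781a2d48739cb4945ced0",
    "mmax-v4.zip": "601e262d7c9404c16188ed075893cdf0",
    "annis-v4.zip": "32369d6fb68d4b470ae4908f7901e65b",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Hypothetical download location: the current working directory.
    for name, status in check_downloads(".").items():
        print(f"{name}: {status}")
```

A file reported as `mismatch` should be downloaded again. MD5 is suitable here only as an integrity check against transfer errors, not as a security guarantee.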
