dc.contributor.author Glaznieks, Aivars
dc.contributor.author Frey, Jennifer-Carmen
dc.contributor.author Abel, Andrea
dc.contributor.author Vettori, Chiara
dc.contributor.author Nicolas, Lionel
dc.date.accessioned 2021-05-05T15:52:10Z
dc.date.available 2021-05-05T15:52:10Z
dc.date.issued 2021-05-05
dc.identifier.uri http://hdl.handle.net/20.500.12124/26
dc.description The Kolipsi-1 L2 is a written learner corpus of German and Italian L2 speakers from South Tyrol (Italy). It was developed as a by-product of the KOLIPSI project “South-Tyrolean pupils and the second language: a linguistic and socio-psychological investigation”. In addition, data from L1 pupils were collected exclusively for the creation of a native speaker reference corpus. The data collection took place in autumn 2007 and was based on two standardized tests for written production. The two tasks consisted of (1) writing an e-mail to a friend retelling a given event at the supermarket based on a picture story (narrative text genre) and (2) writing a letter to a friend discussing holiday plans (argumentative text genre). For both tasks, a time limit of 30 minutes was set and no additional reference material was allowed. CEFR levels have been assigned to all L2 learner texts, providing a holistic score as well as evaluations of coherence, lexis, grammar and sociolinguistic appropriateness.
Person-related metadata provides information about:
- the writer's language background, including L1(s), the L1(s) of mother and father, and a self-declared language group affiliation
- the writer's age, gender and socio-economic status
- the writer's district of residence and whether they live in an urban or rural environment
- the language, location and type of school the writer attended
- whether the writer passed the local bilinguality exam or not
- an anonymous identifier for the writer's school class and L2 teacher to account for class effects
All texts have been transcribed manually, adding transcription annotations that reflect surface features of the text, such as the graphical arrangement, and that include error annotation on the orthographic level. In addition, all texts were automatically annotated with tokenisation, sentence splitting, POS-tagging and lemmatization, using an orthographically corrected target version of the corpus.
Kolipsi-1 L2 belongs to the Kolipsi Corpus Family, a series of related learner corpora collected in South Tyrolean upper secondary schools. The corpora of the Kolipsi Corpus Family contain Italian and German learner texts that were collected in the course of the KOLIPSI project in 2007/2008 (Kolipsi-1) and a follow-up study in 2014/2015 (Kolipsi-2). The aim of both corpus studies was to analyse the second language competences of South-Tyrolean pupils from upper secondary schools (aged 16 to 18), and to contextualize the results by commenting on crucial sociolinguistic and psychosocial aspects that influence these competences. The results of the follow-up study are meant to be compared with the results of the original KOLIPSI project.
dc.language.iso deu
dc.language.iso ita
dc.publisher Institute for Applied Linguistics, Eurac Research
dc.rights CLARIN ACADEMIC END-USER LICENCE (ACA-BY-NC-NORED 1.0)
dc.rights.uri https://gitlab.inf.unibz.it/commul/var/eurac-licenses/-/raw/v1.0/EULA-CLARIN-ACA-BY-NC-NORED.md
dc.rights.label ACA
dc.source.uri https://www.porta.eurac.edu/lci/kolipsi-family/
dc.subject L2
dc.subject Learner corpora
dc.subject South Tyrol
dc.subject argumentative essay
dc.subject students
dc.subject high school
dc.subject upper secondary school
dc.subject picture story
dc.subject opinion text
dc.title Kolipsi-1 Corpus v1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
hidden false
hasMetadata false
has.files yes
branding Learner Language
demo.uri https://commul.eurac.edu/annis/kolipsi
contact.person Aivars Glaznieks porta@eurac.edu Institute for Applied Linguistics, Eurac Research
contact.person Jennifer-Carmen Frey porta@eurac.edu Institute for Applied Linguistics, Eurac Research
size.info 2426 texts
size.info 500 000 tokens
files.size 102328150
files.count 8
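
The size fields above can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming that `files.size` is given in bytes and that the "97.59 MB" shown on the download link of this record uses binary megabytes (1 MB = 2**20 bytes). The function names are illustrative and not part of the repository. The token count of 500 000 is presumably rounded, so the average text length is only a rough estimate.

```python
# Values copied from the record fields size.info, files.size.
FILES_SIZE_BYTES = 102328150   # files.size (assumed to be bytes)
TEXT_COUNT = 2426              # size.info: 2426 texts
TOKEN_COUNT = 500_000          # size.info: 500 000 tokens (presumably rounded)


def bytes_to_binary_mb(size_in_bytes):
    """Convert a byte count to binary megabytes (1 MB = 2**20 bytes)."""
    return size_in_bytes / 2**20


def average_tokens_per_text(tokens, texts):
    """Rough average text length in tokens."""
    return tokens / texts


if __name__ == "__main__":
    print(f"total size: {bytes_to_binary_mb(FILES_SIZE_BYTES):.2f} MB")
    print(f"average length: {average_tokens_per_text(TOKEN_COUNT, TEXT_COUNT):.0f} tokens per text")
```

Under these assumptions the byte count rounds to 97.59 MB, which matches the download size stated for the item.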


 Files in this item

 Download all files in item (97.59 MB)
This item is Academic Use and licensed under: CLARIN ACADEMIC END-USER LICENCE (ACA-BY-NC-NORED 1.0)
Redistribution Not Permitted; Attribution Required; Noncommercial

Name: README.html
Size: 8.73 KB
Format: HTML
MD5: b490276ad1fe7f8de8d9cae1bd31cb13

Name: CHANGELOG.html
Size: 1.8 KB
Format: HTML
MD5: 94bf32ec6c1bc7197a94d1a481414fa8

Name: docs-v1.0.zip
Size: 785.7 KB
Format: application/zip
Description: documentation on the corpus, such as transcription guidelines, annotation guidelines, task instructions and proficiency level descriptors
MD5: 164d2f6086598a08a7582fbc4d24a22d

Name: metadata-v1.0.zip
Size: 449.99 KB
Format: application/zip
Description: metadata on the corpus, the texts, tasks and authors in tab-separated format
MD5: 9c8e4de9e0e16e6d0da940447c7910f1

Name: txt-v1.0.zip
Size: 3.85 MB
Format: application/zip
Description: plain text versions of the corpus in original and corrected target form
MD5: 620d904f3c2f1df8f4f05f327b875152

Name: xmlmind-v1.0.zip
Size: 6.13 MB
Format: application/zip
Description: transcribed corpus in a custom XML format with inline annotations as described in docs
MD5: 408bf984afc882bef30c9b0476eeeb07

Name: mmax2-v1.0.zip
Size: 27 MB
Format: application/zip
Description: corpus version with stand-off annotations produced using the annotation tool MMAX2
MD5: 18a24fc8c620175527826807a356bc80

Name: annis-v1.0.zip
Size: 59.39 MB
Format: application/zip
Description: complete corpus in ANNIS format with all metadata and annotation
MD5: 9b66a67ae7d190febdcce967d8723f24
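
The MD5 values listed for the files of this item can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch using only the Python standard library. The checksums are copied from the listing; the function names and the local directory name `kolipsi-1` are hypothetical and chosen for illustration only.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "README.html": "b490276ad1fe7f8de8d9cae1bd31cb13",
    "CHANGELOG.html": "94bf32ec6c1bc7197a94d1a481414fa8",
    "docs-v1.0.zip": "164d2f6086598a08a7582fbc4d24a22d",
    "metadata-v1.0.zip": "9c8e4de9e0e16e6d0da940447c7910f1",
    "txt-v1.0.zip": "620d904f3c2f1df8f4f05f327b875152",
    "xmlmind-v1.0.zip": "408bf984afc882bef30c9b0476eeeb07",
    "mmax2-v1.0.zip": "18a24fc8c620175527826807a356bc80",
    "annis-v1.0.zip": "9b66a67ae7d190febdcce967d8723f24",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory, expected=EXPECTED_MD5):
    """Return {filename: status}; status is 'ok', 'mismatch' or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # "kolipsi-1" is a hypothetical local download directory.
    for name, status in check_downloads("kolipsi-1").items():
        print(f"{status:8} {name}")
```

Reading in chunks keeps memory use low for the larger archives. MD5 is suitable here only as an integrity check against accidental corruption, not as a security guarantee.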
