
 
dc.contributor.author Glaznieks, Aivars
dc.contributor.author Frey, Jennifer-Carmen
dc.contributor.author Nicolas, Lionel
dc.contributor.author Abel, Andrea
dc.contributor.author Vettori, Chiara
dc.date.accessioned 2021-05-05T15:33:00Z
dc.date.available 2021-05-05T15:33:00Z
dc.date.issued 2021-05-05
dc.identifier.uri http://hdl.handle.net/20.500.12124/30
dc.description The Kolipsi-2 Corpus is a written learner corpus of German and Italian L2 speakers from South Tyrol (Italy). It was developed as a by-product of the KOLIPSI II project, a replication of the KOLIPSI project on “South-Tyrolean pupils and the second language: a linguistic and socio-psychological investigation”, conducted 7 years after the original study. The data for this second edition were collected in spring 2014 using two standardized written production tests, which were aligned with the original tasks of the KOLIPSI study. The first task remained the same in both editions, while the second task was slightly adapted. The two tasks were (1) writing an e-mail to a friend retelling a given event at the supermarket based on a picture story (narrative text genre) and (2) writing an e-mail about negative aspects of social-media chats, prompted by a letter to the editor in a youth magazine (argumentative text genre). Each task had a time limit of 25 minutes, and no additional reference material was allowed. CEFR levels have been assigned to all L2 learner texts, providing a holistic score as well as evaluations of coherence, sociolinguistic appropriateness, lexical accuracy, lexical diversity, grammar and orthography.
Person-related metadata provides information about:
- the writer's language background, including L1(s), the L1(s) of mother and father, a self-declared language group affiliation, and the predominant language spoken in the area where the writer resides
- the writer's results from an additional language test in the L2 (DIALANG test)
- the writer's competence in the local German dialect (for students with L1 Italian only)
- the writer's age, gender and socio-economic status
- whether the writer lives in an urban or rural environment
- the language, location and type of school the writer attended
- an anonymous identifier for the writer's school class to account for class effects
All texts have been transcribed manually, with transcription annotations that reflect surface features of the text, such as the graphical arrangement, and that include error annotation on the orthographic level. In addition, all texts were automatically annotated with tokenisation, sentence splitting, POS tagging and lemmatisation, using an orthographically corrected target version of the corpus.
The Kolipsi-2 Corpus belongs to the Kolipsi Corpus Family, a series of related learner corpora collected in South Tyrolean upper secondary schools. The corpora of the Kolipsi Corpus Family contain Italian and German learner texts that were collected in the course of the KOLIPSI project in 2007/2008 (Kolipsi-1) and a follow-up study in 2014/2015 (Kolipsi-2). The aim of both corpus studies was to analyse the second language competences of South-Tyrolean pupils from upper secondary schools (aged 16 to 18) and to contextualize the results by considering the sociolinguistic and psychosocial factors that influence these competences. The results of the follow-up study are intended to be compared with those of the original KOLIPSI project.
dc.language.iso ita
dc.language.iso deu
dc.publisher Institute for Applied Linguistics, Eurac Research
dc.relation.isreplacedby http://hdl.handle.net/20.500.12124/66
dc.rights CLARIN ACADEMIC END-USER LICENCE (ACA-BY-NC-NORED 1.0)
dc.rights.uri https://gitlab.inf.unibz.it/commul/var/eurac-licenses/-/raw/v1.0/EULA-CLARIN-ACA-BY-NC-NORED.md
dc.rights.label ACA
dc.source.uri https://www.porta.eurac.edu/lci/kolipsi-family/
dc.subject L2 corpora
dc.subject learner corpus
dc.subject student essay
dc.subject argumentative essay
dc.subject picture story
dc.subject South Tyrol
dc.title Kolipsi-2 Corpus v1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding Learner Language
demo.uri https://commul.eurac.edu/annis/kolipsi
contact.person Aivars Glaznieks porta@eurac.edu Institute for Applied Linguistics, Eurac Research
contact.person Jennifer-Carmen Frey porta@eurac.edu Institute for Applied Linguistics, Eurac Research
size.info 2763 texts
size.info 500 000 tokens
files.size 92121492
files.count 8
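
The files.size value above is given in bytes, while the download listing of this record shows the total as 87.85 MB. The short sketch below checks that the two figures agree if "MB" is read as binary megabytes (1024 * 1024 bytes); the helper name bytes_to_mib is illustrative and not part of the repository.

```python
# Value of the files.size field in this record, in bytes.
FILES_SIZE_BYTES = 92_121_492


def bytes_to_mib(num_bytes: int) -> float:
    """Convert a byte count to binary megabytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 ** 2)


size_mib = round(bytes_to_mib(FILES_SIZE_BYTES), 2)
print(f"{FILES_SIZE_BYTES} bytes = {size_mib} MiB")
```

Read as decimal megabytes (10^6 bytes), the same byte count would be about 92.12 MB, so the listing most likely uses the binary unit.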


 Files in this item

 Download all files in item (87.85 MB)
This item is for Academic Use and licensed under: CLARIN ACADEMIC END-USER LICENCE (ACA-BY-NC-NORED 1.0)
Redistribution Not Permitted, Attribution Required, Noncommercial
Name README.html
Size 8.77 KB
Format HTML
MD5 00a5a29016d6b978c5de09890eadd2a0

Name CHANGELOG.html
Size 1.8 KB
Format HTML
MD5 78719d1737eeabecb44b6a42e3f96ad7

Name docs-v1.0.zip
Size 641.61 KB
Format application/zip
Description documentation on the corpus such as transcription guidelines, annotation guidelines and task instructions or proficiency level descriptors.
MD5 9e47c3d828e545db6724a29f25d7f542

Name metadata-v1.0.zip
Size 342.1 KB
Format application/zip
Description metadata on the corpus, the texts, tasks and authors in tab-separated format.
MD5 4e2dccc2b49e183576078a5d77188429

Name txt-v1.0.zip
Size 4.16 MB
Format application/zip
Description original and corrected plain text versions of the corpus
MD5 3150528525304390148b013a7473c116

Name xmlmind-v1.0.zip
Size 5.92 MB
Format application/zip
MD5 9991eed302ac8d08102ab3577b4c8487

Name mmax2-v1.0.zip
Size 22.29 MB
Format application/zip
Description corpus version with stand-off annotations produced using the annotation tool MMAX2
MD5 4c68fd8b0285b62dec39432be5439df0

Name annis-v1.0.zip
Size 54.51 MB
Format application/zip
Description complete corpus in ANNIS format with all metadata and annotation
MD5 166516f5f2e34c846222d7d370e230b0
