Kolipsi-1 Corpus v1.1

Name: Kolipsi-1 Corpus v1.1
License: https://gitlab.inf.unibz.it/commul/var/eurac-licenses/-/raw/v1.0/EULA-CLARIN-ACA-BY-NC-NORED.md

Glaznieks, Aivars; Frey, Jennifer-Carmen; Abel, Andrea; Vettori, Chiara; Nicolas, Lionel

dc.contributor.author	Glaznieks, Aivars
dc.contributor.author	Frey, Jennifer-Carmen
dc.contributor.author	Abel, Andrea
dc.contributor.author	Vettori, Chiara
dc.contributor.author	Nicolas, Lionel
dc.date.accessioned	2023-02-15T09:07:59Z
dc.date.available	2023-02-15T09:07:59Z
dc.date.issued	2023-02-15
dc.identifier.uri	http://hdl.handle.net/20.500.12124/64
dc.description	Kolipsi-1 L2 is a written learner corpus of German and Italian L2 speakers from South Tyrol (Italy). It was developed as a by-product of the KOLIPSI project “South-Tyrolean pupils and the second language: a linguistic and socio-psychological investigation”. In addition, data from L1 pupils were collected exclusively for the creation of a native speaker reference corpus. The data collection took place in autumn 2007 and is based on two standardized tests for written production. The two tasks consisted of (1) writing an e-mail to a friend retelling a given event at the supermarket based on a picture story (narrative text genre) and (2) writing a letter to a friend discussing holiday plans (argumentative text genre). For both tasks a time limit of 30 minutes was set and no additional reference material was allowed. CEFR levels have been assigned to all L2 learner texts, providing a holistic score as well as evaluations of coherence, lexis, grammar and sociolinguistic appropriateness.
Person-related metadata provides information about:
- the writer's language background, including L1(s), the L1(s) of mother and father, and a self-declared language group affiliation
- the writer's age, gender and socio-economic status
- the writer's district of residence and whether they live in an urban or rural environment
- the language, location and type of school the writer attended
- whether the writer passed the local bilinguality exam or not
- an anonymous identifier for the writer's school class and L2 teacher to account for class effects
All texts have been transcribed manually, adding transcription annotations that reflect surface features of the text, such as the graphical arrangement, and that include error annotation on the orthographic level. In addition, all texts were automatically annotated with tokenisation, sentence splitting, POS tagging and lemmatization, based on an orthographically corrected target version of the corpus.
Kolipsi-1 L2 belongs to the Kolipsi Corpus Family, a series of related learner corpora collected in South Tyrolean upper secondary schools. The corpora of the Kolipsi Corpus Family contain Italian and German learner texts that were collected in the course of the KOLIPSI project in 2007/2008 (Kolipsi-1) and a follow-up study in 2014/2015 (Kolipsi-2). The aim of both corpus studies was to analyse the second-language competences of South Tyrolean pupils in upper secondary schools (aged 16 to 18) and to contextualize the results by commenting on the crucial sociolinguistic and psychosocial aspects that influence them. The results of the follow-up study are intended to be compared with those of the original KOLIPSI project.
dc.language.iso	deu
dc.language.iso	ita
dc.publisher	Institute for Applied Linguistics, Eurac Research
dc.relation.replaces	http://hdl.handle.net/20.500.12124/26
dc.rights	CLARIN ACADEMIC END-USER LICENCE (ACA-BY-NC-NORED 1.0)
dc.rights.uri	https://gitlab.inf.unibz.it/commul/var/eurac-licenses/-/raw/v1.0/EULA-CLARIN-ACA-BY-NC-NORED.md
dc.rights.label	ACA
dc.source.uri	https://www.porta.eurac.edu/lci/kolipsi-family/
dc.subject	L2
dc.subject	Learner corpora
dc.subject	South Tyrol
dc.subject	argumentative essay
dc.subject	students
dc.subject	high school
dc.subject	upper secondary school
dc.subject	picture story
dc.subject	opinion text
dc.title	Kolipsi-1 Corpus v1.1
dc.type	corpus
metashare.ResourceInfo#ContentInfo.mediaType	text
hidden	false
hasMetadata	false
has.files	yes
branding	Learner Language
demo.uri	https://commul.eurac.edu/annis/kolipsi
contact.person	Aivars Glaznieks porta@eurac.edu Institute for Applied Linguistics, Eurac Research
contact.person	Jennifer-Carmen Frey porta@eurac.edu Institute for Applied Linguistics, Eurac Research
size.info	2426 texts
size.info	500 000 tokens
files.size	104671336
files.count	8
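
The record above is a flat list of field/value pairs in which some fields repeat (for example `dc.contributor.author` and `dc.language.iso`). As a minimal sketch, assuming each field is separated from its value by a single tab as it appears on this page, the record could be read into a dictionary of lists. The function name `parse_record` and the shortened `sample` are illustrative and not part of the repository.

```python
from collections import defaultdict


def parse_record(text):
    """Parse 'field<TAB>value' lines into a dict that keeps repeated fields."""
    record = defaultdict(list)
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # lines without a tab (e.g. continuation lines) are skipped
            record[key.strip()].append(value.strip())
    return dict(record)


# A shortened excerpt of the record shown above (assumed tab-separated).
sample = "\n".join([
    "dc.contributor.author\tGlaznieks, Aivars",
    "dc.contributor.author\tFrey, Jennifer-Carmen",
    "dc.language.iso\tdeu",
    "dc.language.iso\tita",
    "files.size\t104671336",
    "files.count\t8",
])

record = parse_record(sample)

# files.size is given in bytes; dividing by 1024**2 yields the size in MiB.
size_mib = int(record["files.size"][0]) / 1024**2

print(record["dc.language.iso"])  # ['deu', 'ita']
print(f"{size_mib:.2f} MiB")      # 99.82 MiB
```

Keeping every value in a list avoids silently dropping the second language code or the additional authors.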

Files in this item

This item is for Academic Use and is licensed under:
CLARIN ACADEMIC END-USER LICENCE (ACA-BY-NC-NORED 1.0)

Name: README.html
Size: 8.81 KB
Format: HTML
MD5: bd18d0ec6e59ec6b30b707ff68cff550

Name: CHANGELOG.html
Size: 2.51 KB
Format: HTML
Description: Changelog for corpus versions
MD5: 8cbae60227a971753893deca00fb8e6e

Name: docs-v1.1.zip
Size: 785.87 KB
Format: application/zip
Description: documentation such as transcription and annotation guidelines and task prompts
MD5: 514bfb41e15ee919b6b8243ea1a1ac7f

Name: txt-v1.1.zip
Size: 3.84 MB
Format: application/zip
Description: plain text versions of the corpus in original and corrected target form
MD5: a7e369d34dab70300b8643c1583e1bd1

Name: xmlmind-v1.1.zip
Size: 6.38 MB
Format: application/zip
Description: XML files with the students' transcribed texts and manual annotations, using the stylesheets present in the folder and the XML editor “XMLmind”
MD5: c3ef4a8de950d43a8beef253cf856f88

Name: mmax2-v1.1.zip
Size: 27.68 MB
Format: application/zip
Description: corpus version with stand-off annotations produced using the annotation tool MMAX2
MD5: e9aeacdd0c78f4eba25d365ddc5d2dbd

Name: annis-v1.1.zip
Size: 60.99 MB
Format: application/zip
Description: complete corpus in ANNIS format with all metadata and annotation
MD5: 510701397fba03646405cff827eb5a62

Name: metadata-v1.1.zip
Size: 156.36 KB
Format: application/zip
Description: metadata on the corpus, the texts, tasks and authors in tab-separated format
MD5: df3c7c0f5fd9cf99ba4a51432f89fd7c
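
The MD5 values listed for each file can be used to check that a download is complete and unaltered. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of` and `verify` and the download directory are assumptions for illustration, while the checksums are copied from the listing above.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing above.
EXPECTED_MD5 = {
    "README.html": "bd18d0ec6e59ec6b30b707ff68cff550",
    "CHANGELOG.html": "8cbae60227a971753893deca00fb8e6e",
    "docs-v1.1.zip": "514bfb41e15ee919b6b8243ea1a1ac7f",
    "txt-v1.1.zip": "a7e369d34dab70300b8643c1583e1bd1",
    "xmlmind-v1.1.zip": "c3ef4a8de950d43a8beef253cf856f88",
    "mmax2-v1.1.zip": "e9aeacdd0c78f4eba25d365ddc5d2dbd",
    "annis-v1.1.zip": "510701397fba03646405cff827eb5a62",
    "metadata-v1.1.zip": "df3c7c0f5fd9cf99ba4a51432f89fd7c",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(download_dir):
    """Map each listed file to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(download_dir) / name
        results[name] = (md5_of(path) == expected) if path.exists() else None
    return results
```

Calling `verify("downloads")` on a hypothetical folder holding the eight files would return a dictionary that shows at a glance which files match, differ, or are absent.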
