dc.contributor.author Glaznieks, Aivars
dc.contributor.author Frey, Jennifer-Carmen
dc.contributor.author Stopfner, Maria
dc.contributor.author Zanasi, Lorenzo
dc.contributor.author Nicolas, Lionel
dc.date.accessioned 2020-07-06T10:24:27Z
dc.date.available 2020-07-06T10:24:27Z
dc.date.issued 2020-12-18
dc.identifier.uri http://hdl.handle.net/20.500.12124/25
dc.description LEONIDE is a longitudinal corpus of student essays documenting the language competences and writing development of lower secondary school students in three different languages. The corpus contains 2,512 texts from 163 pupils, who participated in the project “One school, many languages” conducted in eight schools in the officially multilingual Italian province of South Tyrol / Alto Adige (Zanasi & Stopfner, 2018). The aim of the project was to document the development of the pupils' plurilingual linguistic and communicative skills by collecting oral and written language samples in Italian, German and English, in order to obtain a global view of their individual linguistic repertoire. LEONIDE contains all the texts written by the participating students during the course of the project; the overall size of the corpus amounts to ca. 240,000 tokens. The texts were collected over the span of 3 consecutive years (2015-2018) in public middle schools (i.e. lower secondary school, grade 6 to grade 8). The pupils were 11 years old at the beginning of the data collection and 13 years old at the end. In each grade, two written texts were collected that differ with respect to genre: the first text was elicited using a picture story re-telling task; the second text is an opinion text on different aspects related to the pupils’ lives and public discourse. For each genre and each grade, the corpus provides texts in the three languages German, Italian and English. In order to reflect the school system of the Province of South Tyrol / Alto Adige, about half of the texts were collected in four schools in which German is the main language of teaching and Italian is taught as L2. The other half of the texts were collected in four schools in which Italian is the main language of teaching and German is taught as L2. In all schools, English is taught as L3 (i.e. as a foreign language at school).
Subdivided by language, the corpus contains 844 Italian, 833 German and 835 English texts.
Manual annotation: The corpus is fully anonymised and annotated with target hypotheses correcting orthography errors in the text as well as annotations on structural elements (paragraphs, line breaks, bullet points, symbols or emoticons etc.), foreign word insertions and transcript surface features (e.g. deletions, corrections or insertions of the student, unreadable or ambiguous items).
Automatic annotation: Automatic linguistic annotation includes sentence splitting, tokenisation, lemmatisation and part-of-speech tagging.
Text metadata: The corpus provides a series of relevant person-related metadata (e.g. age, gender, first language(s), school and possible special needs of the students) as well as task-related metadata (e.g. task year, text genre, etc.).
Usage: As the corpus documents the development of plurilingual competences of individual learners over a period of three years, it will allow both quantitative research on the characteristics of young learners’ language over a relatively long period and investigations of the development of individuals, taking into account a wide range of person-related metadata. In addition, it allows contrastive analyses of the young learners’ progress in their L1, L2 and L3.
Availability: The corpus will be available for corpus queries via an ANNIS search interface and as a download for academic purposes (ACA-BY-NC-NORED 1.0) on the Eurac Research Clarin Centre by the end of 2020.
References:
Zanasi, L. & Stopfner, M. (2018). Rilevare, osservare, consultare. Metodi e strumenti per l’analisi del plurilinguismo nella scuola secondaria di primo grado. In C. M. Coonan, A. Bier & E. Ballarin (Eds.), La didattica delle lingue nel nuovo millennio. Le sfide dell’internazionalizzazione (pp. 135-148). Edizioni Ca’Foscari. http://doi.org/10.30687/978-88-6969-227-7/009
Glaznieks, A., Frey, J.-C., Stopfner, M., Zanasi, L. & Nicolas, L. (accepted). LEONIDE: A longitudinal trilingual corpus of young learners of Italian, German and English. International Journal of Learner Corpus Research.
dc.language.iso deu
dc.language.iso ita
dc.language.iso eng
dc.publisher Institute for Applied Linguistics, Eurac Research
dc.relation.isbasedon https://gitlab.inf.unibz.it/commul/leonide/data/bundle/-/tags/v1.1
dc.relation.isreferencedby https://doi.org/10.1075/ijlcr.21004.gla
dc.rights CLARIN ACADEMIC END-USER LICENCE (ACA-BY-NC-NORED 1.0)
dc.rights.uri https://gitlab.inf.unibz.it/commul/var/eurac-licenses/-/raw/v1.0/EULA-CLARIN-ACA-BY-NC-NORED.md
dc.rights.label ACA
dc.source.uri http://sms-project.eurac.edu/
dc.subject multilingualism
dc.subject evaluation
dc.subject language competences
dc.subject learner corpus
dc.subject L1
dc.subject L2
dc.subject student essays
dc.subject picture story
dc.subject opinion texts
dc.subject argumentative essay
dc.title LEONIDE - Longitudinal Learner Corpus in Italiano, Deutsch and English 1.1
dc.type corpus
dc.description.version 1.1
metashare.ResourceInfo#ContentInfo.mediaType text
hidden false
hasMetadata false
has.files yes
branding Learner Language
demo.uri https://commul.eurac.edu/annis/leonide
contact.person Corpus Manager clarin@eurac.edu Eurac Research
contact.person Aivars Glaznieks porta@eurac.edu Institute for Applied Linguistics, Eurac Research
sponsor Internal, Autonomous province of South Tyrol/Alto Adige - One school many languages Other
size.info 2510 texts
size.info 240 000 tokens
files.size 46893327
files.count 9


 Files in this item

This item is for Academic Use and licensed under: CLARIN ACADEMIC END-USER LICENCE (ACA-BY-NC-NORED 1.0)
Licence conditions: Redistribution Not Permitted, Attribution Required, Noncommercial
README.html
Size: 8.44 KB
Format: HTML
MD5: 929b961cadf4e4609e913331537748a0

CHANGELOG.html
Size: 2.65 KB
Format: HTML
Description: version changes
MD5: defd0493ac00b313c048624f8b507b7e

docs-v1.1.zip
Size: 301.9 KB
Format: application/zip
Description: documentation (transcription guidelines, annotation guidelines, file format)
MD5: bbeb7302f85ba8f601af0555309cbfa8

metadata-v1.1.zip
Size: 29.25 KB
Format: application/zip
Description: metadata for corpus, texts and authors in tab-separated format
MD5: acbf8691622aa3755c8e1c31ee103135

txt-v1.1.zip
Size: 3.34 MB
Format: application/zip
Description: plain text versions of corpus
MD5: d4cd2a2aad407a253ba2b49621de6e58

pepper-xml-v1.1.zip
Size: 6.52 MB
Format: application/zip
Description: transcribed and cleaned corpus in the Pepper XML format
MD5: 1845f7e8bb799b61b69194bfd6c4ffe9

pepper-xml-lines-v1.1.zip
Size: 6.65 MB
Format: application/zip
Description: transcribed and cleaned corpus in the Pepper XML format including original line endings
MD5: d0130e847465da1bd46bea104cd80faa

transcanno-tei-v1.1.zip
Size: 1.88 MB
Format: application/zip
Description: the transcribed corpus in a TEI XML format exported from the transcription software Transc&Anno
MD5: 68d299b780283eeed7151100a443661b

annis-v1.1.zip
Size: 25.99 MB
Format: application/zip
Description: complete corpus in ANNIS format with metadata and annotation
MD5: 03395c264b6e4d06da1f26d864b6e827
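
The MD5 values listed for each file can be used to check that a download arrived intact. The following is a minimal sketch in Python, not part of the corpus distribution: the checksums are copied from the listing in this record, while the function names (`md5_of`, `verify_downloads`) and the local directory name `leonide-v1.1` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing of this record.
EXPECTED_MD5 = {
    "README.html": "929b961cadf4e4609e913331537748a0",
    "CHANGELOG.html": "defd0493ac00b313c048624f8b507b7e",
    "docs-v1.1.zip": "bbeb7302f85ba8f601af0555309cbfa8",
    "metadata-v1.1.zip": "acbf8691622aa3755c8e1c31ee103135",
    "txt-v1.1.zip": "d4cd2a2aad407a253ba2b49621de6e58",
    "pepper-xml-v1.1.zip": "1845f7e8bb799b61b69194bfd6c4ffe9",
    "pepper-xml-lines-v1.1.zip": "d0130e847465da1bd46bea104cd80faa",
    "transcanno-tei-v1.1.zip": "68d299b780283eeed7151100a443661b",
    "annis-v1.1.zip": "03395c264b6e4d06da1f26d864b6e827",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Map each listed file name to 'ok', 'mismatch' or 'missing'."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # "leonide-v1.1" is a hypothetical local download directory.
    for name, status in verify_downloads("leonide-v1.1").items():
        print(f"{status:8} {name}")
```

A `mismatch` status suggests an incomplete or corrupted download, in which case the file should be fetched again. MD5 is used here only because it is the checksum this record publishes.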
