LEONIDE - Learner Corpus in Italiano, Deutsch & English v1.0

Description

LEONIDE is a collection of longitudinal learner data produced in three languages: Italian, German and English (Longitudinal lEarner cOrpus iN Italiano, Deutsch, English). The data was collected in the project “One school, many languages”, conducted in eight schools in the officially multilingual Italian province of South Tyrol – Alto Adige, with the aim of documenting the development of the plurilingual competences of lower secondary school pupils and obtaining a global view of their individual linguistic repertoires.

LEONIDE contains around 2,500 texts from 163 pupils who participated in the project, amounting to around 240,000 tokens overall. The corpus covers written productions from different years of the pupils’ lower secondary school career, from different genres (opinion texts and picture story retellings) and in different languages (German, Italian and English). Subdivided by language, the corpus contains 850 Italian, 849 German and 844 English texts.

All texts carry manual transcription annotations and manual linguistic error annotations. Transcription annotations reflect surface features of the text, such as the graphical arrangement and self-corrections. Error annotations relate to the orthographic level only.

Person-related metadata provides information about:

- writer’s L1(s)
- writer’s gender
- writer’s age
- writer’s language assessment scores for each of the three languages
- writer’s school id (for class effects)

In addition, the corpus is automatically annotated with tokenisation, sentence splitting, POS tagging and lemmatisation.

References:

Zanasi, L. & Stopfner, M. (2018). Rilevare, osservare, consultare. Metodi e strumenti per l’analisi del plurilinguismo nella scuola secondaria di primo grado. In C. M. Coonan, A. Bier & E. Ballarin (Eds.), La didattica delle lingue nel nuovo millennio. Le sfide dell’internazionalizzazione.
Edizioni Ca’Foscari, pp. 135-148.

Files

LEONIDE is available from the Eurac Research Clarin Centre (ERCC) and from an on-premise GitLab installation. It is also ready to search in ANNIS on the Eurac Research ANNIS installation. For further information visit https://www.porta.eurac.edu/?page_id=9 or write to porta@eurac.edu.

The following file bundles are available:

- docs-v1.0.zip contains documentation. [ERCC download] [GitLab download] [Source code repository]
- transcanno-tei-v1.0.zip contains the transcribed corpus in a TEI XML format exported from the transcription software Transc&Anno. [ERCC download] [GitLab download] [Source code repository]
- pepper-xml-v1.0.zip contains the transcribed corpus in the Pepper XML format, used for conversion to other file formats such as ANNIS. [ERCC download] [GitLab download] [Source code repository]
- annis-v1.0.zip contains the complete corpus in ANNIS format with all metadata and annotation. [ERCC download] [GitLab download] [Source code repository]

LICENSE

LEONIDE is available under the CLARIN ACADEMIC END-USER LICENCE ACA-BY-NC-NORED (Text file, PDF file). Any code or scripts in the repositories are licensed under their respective LICENSE files.
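
Example: taking a first look at an XML bundle

The internal layout of the XML bundles listed under Files (for example transcanno-tei-v1.0.zip) is described in the documentation bundle, not here. As a starting point, the following minimal Python sketch tallies the element names found in every .xml file of a downloaded archive, which gives a quick overview of the annotation elements that are actually present. The function name count_xml_elements is hypothetical, and the sketch assumes only that the archive is a regular zip file containing well-formed XML files; it does not rely on any particular TEI or Pepper element names.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter


def count_xml_elements(zip_source):
    """Tally element names across all .xml files in a zip archive.

    zip_source is a file path or a binary file-like object.
    Hypothetical helper: it makes no assumption about the corpus schema.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            # Skip directories and non-XML members such as README files.
            if not name.lower().endswith(".xml"):
                continue
            with archive.open(name) as handle:
                tree = ET.parse(handle)
            for element in tree.iter():
                # ElementTree writes namespaced tags as "{uri}local";
                # keep only the local part so the tally stays readable.
                counts[element.tag.rsplit("}", 1)[-1]] += 1
    return counts
```

Calling count_xml_elements("transcanno-tei-v1.0.zip").most_common(10) on a locally downloaded copy would then list the ten most frequent element names, which can be checked against the documentation in docs-v1.0.zip before writing any schema-specific processing.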