ERCC Open: Various

http://hdl.handle.net/20.500.12124/37
Submissions dealing with various themes not otherwise covered (from other institutions).
Thu, 21 May 2026 19:33:31 GMT

LegISTyr test set
http://hdl.handle.net/20.500.12124/104
Alber, Marlies; Chiocchetti, Elena; Ralli, Natascia; Stanizzi, Isabella
LegISTyr is a machine translation test set for evaluating the quality of legal terminology translation from Italian to South Tyrolean German, a minor standard variety of German. It covers specific legal subdomains or legal translation issues: 1) standardised terminology, 2) occupational health and safety, 3) subsidised housing, 4) family law, 5) criminal and criminal procedure law, 6) homonyms, 7) abbreviated forms, 8) gender-inclusive writing strategies. Each subset contains at least 250 examples, i.e. five examples for each term or twenty examples for each inclusive writing strategy. The total number of examples is 2067. The example sentences in the test set showcase single-word and multi-word terms from the Italian legal system, together with their correct, standardised or non-standardised South Tyrolean German target hypothesis. It also lists other (less) acceptable variants used in South Tyrol and, where available, equivalent terms from other German-speaking legal systems (mainly Austria, Germany, Switzerland). The legal subdomain is specified for each example in every subset, except for the last subset on gender-inclusive writing. This subset contains examples for different strategies used in Italian but no target hypotheses, as there may be several acceptable ones. LegISTyr can be used, for example, to assess the success of terminology enforcement strategies when machine translating legal and administrative texts from Italian into German as well as the influence of major varieties of legal German on translations into a minor standard variety.
Mon, 07 Jul 2025 00:00:00 GMT

„One school, many languages“: A Teacher Questionnaire for Research on Plurilingual Education
http://hdl.handle.net/20.500.12124/103
Guarda, Marta; Colombo, Sabrina; Stawinoga, Agnieszka Elzbieta; Flarer, Heidi
This questionnaire was used in the spring of 2021 by the research team of the project „One school, many languages / A lezione con più lingue / Sprachenvielfalt macht Schule“ (SMS 2.0) of Eurac Research in the context of a cross-sectional, explorative study on plurilingual education. Plurilingual education is understood here as any approach in which two or more languages are strategically used for teaching and learning with the aim of encouraging students to gain increased awareness of and appreciation for linguistic diversity, and to leverage the resources of their repertoires to enhance their overall learning (Guarda 2023). The study in which this questionnaire was developed involved a selected sample of teachers of all subjects working at all levels of education – from primary to upper secondary school – in the Italian province of South Tyrol, a historically multilingual area where three official languages (German, Italian and Ladin) now coexist with the new forms of multilingualism brought by recent migration flows. The aim of the study was to explore whether and how plurilingual education was implemented by the questionnaire respondents, as well as to identify their formative needs with regard to its implementation. Based on this, the main research questions that informed the study are as follows:
• RQ1. How frequently, if at all, did the questionnaire respondents implement plurilingual didactic activities (PDAs) before the outbreak of the Covid-19 pandemic?
• RQ2. If the respondents did implement PDAs, what kind of activities did they conduct in their classes?
• RQ3. If the respondents did implement PDAs, which languages and/or varieties did they involve?
• RQ4. If the respondents did not implement any PDAs in their classes, what reasons did they provide?
• RQ5. What were the respondents’ formative needs with regard to plurilingual education and its implementation?
Since the study took place at a time when schools in Italy were still dealing with the Covid-19 pandemic, it was hypothesised that the frequency or conditions of PDA implementation had been affected by the emergency situation. This aspect was taken into account while designing the questionnaire, and this is also why the research questions reported above refer to experiences before the outbreak of the pandemic. The questionnaire includes 55 items distributed across four sections, which collect information about the school the respondents were working in, their experience, if any, with plurilingual education, their formative needs and their biodata. The questionnaire can be adapted to inform the design and administration of future questionnaires aimed at a deeper understanding of plurilingual education through the experiences and perspectives of schoolteachers, both in South Tyrol and in other increasingly multilingual contexts. Users acknowledge and agree that the survey is provided “as is,” without warranty of any kind, and that users assume all risks and liabilities arising from or relating to its and recipient subsidiaries’ use of and reliance upon the survey. Eurac Research makes no representations or warranties of any kind whatsoever, express or implied, at law or in equity, in connection with or with respect to the survey, including any representations or warranties in regard to quality, performance, or noninfringement.
If interested, researchers can read more about the questionnaire, as well as about the findings of the study in which the questionnaire was administered, in the following publications:
Guarda, M., Colombo, S. & Flarer, H. (2022). Plurilinguismo: uno studio esplorativo sulla didattica plurilingue. Bolzano: Eurac Research. https://sms-project.eurac.edu/report-didattica-plurilingue/?lang=it ISBN: 978-88-98857-77-7.
Guarda, M., Colombo, S. & Flarer, H. (2022). Mehrsprachigkeit: Eine explorative Studie zur Mehrsprachigkeitsdidaktik. https://sms-project.eurac.edu/bericht-mehrsprachigkeitsdidaktik/?lang=de ISBN: 978-88-98857-76-0.
Guarda, M. (2023). Plurilingual education through the teachers’ eyes: insights from South Tyrol. In: Fusco, F., Marcato, C. & Oniga, R. (eds.) Proceedings of the Third International Colloquium on Plurilingualism, 252-269. Udine: FORUM.
Mon, 16 Jun 2025 00:00:00 GMT

HELLO CAMPANIA! Ghana Collection
http://hdl.handle.net/20.500.12124/90
Di Salvo, Margherita; Cataldo, Violetta; Maffia, Marta; Asienda, Hannaora Marlene
The HELLO CAMPANIA! Ghana collection contains 12 sociolinguistic interviews collected with 4 first-generation migrants and 8 second-generation migrants living in Naples. It also contains 9 language portraits.
Tue, 03 Dec 2024 00:00:00 GMT

HELLO CAMPANIA! Sri Lanka Collection
http://hdl.handle.net/20.500.12124/87
Di Salvo, Margherita; Cataldo, Violetta; Maffia, Marta; Noschese, Maria Paola
The corpus consists of 48 audio files for a total of 20:38 of recordings (public) and their corresponding transcriptions in ELAN (upon request). This collection includes 15 language portraits.
The collection is organized in four bundles:
- 1G_audio: contains all the audio files collected with 1st generation migrants (30 files)
- 1G_portrait: contains the language portraits collected with 1st generation migrants (13 files)
- 2G_audio: contains all the audio files collected with 2nd generation migrants (18 files)
- 2G_portrait: contains the language portraits collected with 2nd generation migrants (2 files)
Wed, 27 Nov 2024 00:00:00 GMT

HELLO CAMPANIA! Ukraina Collection
http://hdl.handle.net/20.500.12124/97
Di Salvo, Margherita; Noschese, Maria Paola; Cataldo, Violetta; Maffia, Marta
The Ukrainian collection contains data for 26 first-generation (G1) speakers, 19 females and 6 males. The collection contains three folders for each group: the sociolinguistic interview and a language portrait.
Fri, 10 Jan 2025 00:00:00 GMT

HELLO CAMPANIA! Bangladesh Collection
http://hdl.handle.net/20.500.12124/88
Di Salvo, Margherita; Cataldo, Violetta; Noschese, Maria Paola
The collection contains 11 interviews with 1st generation Bangladeshi migrants in Naples. It also contains language portraits of the migrants.
Thu, 28 Nov 2024 00:00:00 GMT

German Summary Corpus (GerSumCo) v1.0.0
http://hdl.handle.net/20.500.12124/81
Wedig, Helena; Strobl, Carola
The GerSumCo (German Summary Corpus) is a learner corpus comprising syntheses written by L2 German writers (CEFR B2/C1) and writers of L1 German. The corpus has been created with the objective of conducting a comparative analysis of the academic writing of L1 German and L2 German students.
The two subcorpora (L1 and L2) contain a total of 286 texts (178 L1 and 108 L2), written by 286 students at 14 universities and language schools in Germany (Bamberg, Bochum, Dresden, Hamburg, Hildesheim, Kiel, Leipzig, Magdeburg, Osnabrück, Potsdam, Trier, Wuppertal), Poland (Gdansk) and China (Hangzhou). The texts were collected between 2022 and 2024 as part of a PhD research project about a contrastive interlanguage analysis using GerSumCo and Beldeko to identify L1-dependent features in cohesion in L2/L1 German.
The metadata files (Meta_GerSumCo_L1 & Meta_GerSumCo_L2) contain the following information:
- Up to three L1s of the writers
- Up to three L2s of the writers
- Collection date
- Topic
- Whether the text was written as homework or in class
- Group of students the texts belonged to
The file names contain the following information:
- Whether the text is part of the L1 or L2 subcorpus
- Topic
The summaries consist of 230 words on average. The texts were either produced in class on computers or as homework, within a 60-minute time frame. Students were permitted to use online dictionaries, but no AI-based aids. They were required to summarise two texts on one of four topics related to language variation in German: Kiezdeutsch, Mundartdebatte in der Schweiz, Viadrinisch and Varianten-Wörterbuch des Deutschen. This version contains the TXT files of the texts and the CSV files containing the manual annotations of the texts with token ID, sentence ID, source text form, target form, automatically annotated lemma, POS (STTS) and simple UPOS part-of-speech tag.
Mon, 01 Jan 2024 00:00:00 GMT

KONTATTO v1.0
http://hdl.handle.net/20.500.12124/78
Dal Negro, Silvia; Ciccolone, Simone; Ducceschi, Luca; Franzini, Greta
Kontatto is a corpus of transcribed and annotated spoken data collected by Silvia Dal Negro at the Free University of Bozen/Bolzano.
It consists of almost 150,000 orthographic words divided into 55 recordings involving 97 different speakers for a total of 18 hours of speech. The corpus is multilingual and contains a variety of spontaneously occurring code-mixing patterns. However, language distribution is not even: 80.4% of the corpus consists of Tyrolean words, 11.5% of Italian words, 2.6% of the words were classified as Trentino, another 0.8% involve other languages (e.g. Ladin, English, etc.) and, finally, 4.7% of the words are not confidently attributable to any language in particular (e.g. proper names, widespread loanwords, some interjections, etc.).
This repository contains the Kontatto-MT corpus subset. The data was collected using a collaborative Map Task, during which two speakers and an interviewer interacted to navigate a physical map in order to reach a given destination. This subcorpus documents a variety of languages and dialects in the Dolomite region, including (some) Tyrolean and Trentino dialects, Italian, Cimbrian and Ladin, usually combined in the same dialogue. At present it consists of 35,453 tokens, 73% classified as local German dialect.
Kontatto was created within the scope of two projects financed by the Autonomous Province of Bozen-Bolzano: “Italiano-tedesco: aree storiche di contatto in Sudtirolo e Trentino” (2011-2014) and “Germanico-Romanzo: discorsi e strutture in contatto nell’area dolomitica” (2016-2019). Over the years, many research assistants and students have contributed to the annotation of the data: Katrin Tartarotti, Mara Leonardi, Marta Ghilardi, Nicole Giaier, Adriana Rasa, Lucia Rossaro, Luigi Parisi and Jay Hevelone. The CLARIN deposit was prepared by Greta Franzini and Luca Ducceschi of Eurac Research.
Mon, 10 Jun 2024 00:00:00 GMT

e-LIS: Electronic Bilingual Dictionary Italian Sign Language (LIS) – Italian v1.0
http://hdl.handle.net/20.500.12124/75
Vettori, Chiara; Zanoni, Claudio; Felice, Mauro; Stanizzi, Isabella; Baj, Claudio; Battagin, Alessandra; Consolati, Marco; Valente, Maddalena; Franzini, Greta; Stemle, Egon W.
Legacy files of the former Electronic Bilingual Dictionary Italian Sign Language (LIS) - Italian, the first prototype of an online Italian Sign Language reference dictionary (2004-2008). Data includes 2677 videos with definitions and examples for 294 Italian lemmas.
Sun, 01 Jan 2006 00:00:00 GMT

VinKo (Varieties in Contact) Corpus v1.2
http://hdl.handle.net/20.500.12124/74
Rabanus, Stefan; Kruijt, Anne; Tagliani, Marta; Tomaselli, Alessandra; Padovan, Andrea; Alber, Birgit; Cordin, Patrizia; Zamparelli, Roberto; Vogt, Barbara Maria
VinKo is a spoken corpus based on crowd-sourced audio recordings that has been designed to provide relevant linguistic information about the minority languages and dialects spoken in the area between Innsbruck and the Po Valley. The corpus contains audio recordings from local languages and varieties spoken in the regions Trentino-Alto Adige/Südtirol, Veneto, and Friuli-Venezia Giulia, with particular focus on the so-called 'language contact' between Germanic (Cimbrian, Mòcheno, Tyrolean, Saurano, and Sappadino) and Romance (Ladin, Trentino and Veneto dialects). The data collection took place from June 2017 to May 2023.
Sun, 01 Jan 2023 00:00:00 GMT