dc.contributor.author Frey, Jennifer-Carmen
dc.contributor.author Glaznieks, Aivars
dc.contributor.author Stemle, Egon W.
dc.date.accessioned 2019-03-07T17:58:26Z
dc.date.available 2019-03-07T17:58:26Z
dc.date.issued 2019-03-07
dc.identifier.uri http://hdl.handle.net/20.500.12124/7
dc.description The DiDi corpus has an overall size of around 600,000 tokens gathered from 136 South Tyrolean Facebook users who participated in the DiDi project. It consists of 11,102 Facebook wall posts, 6,507 wall comments and 22,218 private messages. All messages were written by the participants during the year 2013. Please read the full description of the corpus for further details. Please also consult the description of the data collection method and the full description of the DiDi project and its research questions. As every participant could offer either his/her private messages, his/her texts on the wall or both, the corpus comprises wall posts and wall comments from 130 profiles and private messages from 56 profiles; 50 participants granted access to both types of data. The wall posts and comments are freely accessible. For privacy reasons, access to the private messages is restricted. Access to the private messages can be granted for scientific research only, after signing a non-disclosure agreement. If you are interested in the data for scientific purposes, please contact the research team. All texts were anonymised to guarantee that the participants' identity cannot be inferred from the texts. The anonymisation covered person names, group names, geographical names and adjectival references, institution names, hyperlinks, email addresses, phone numbers, bank account numbers, servers, postal codes and other private information. Please read the anonymisation document for the anonymisation keys. The corpus offers a wide range of research opportunities for linguists who are interested in CMC in general, and more specifically in multilingual language use, the use of regional varieties, and code switching, code shifting and code mixing phenomena. Access to the DiDi corpus: https://commul.eurac.edu/annis/didi
dc.language.iso deu
dc.language.iso ita
dc.language.iso eng
dc.language.iso lad
dc.publisher Institute for Applied Linguistics, Eurac Research
dc.relation.isbasedon https://gitlab.inf.unibz.it/commul/didi/data-bundle/-/tags/v1.0.0
dc.relation.isreferencedby http://www.eurac.edu/en/research/autonomies/commul/Documents/DiDi/NLP4CMC-2015_DiDi_paper.pdf
dc.relation.isreferencedby http://www.eurac.edu/en/research/autonomies/commul/Documents/DiDi/didi_clic-it2016_FINAL.pdf
dc.rights CLARIN ACADEMIC END-USER LICENCE (ACA-BY-NC-NORED 1.0)
dc.rights.uri https://gitlab.inf.unibz.it/commul/var/eurac-licenses/-/raw/v1.0/EULA-CLARIN-ACA-BY-NC-NORED.md
dc.rights.label ACA
dc.source.uri http://www.eurac.edu/didi
dc.subject Facebook
dc.subject Social Media
dc.subject Computer-mediated Communication
dc.subject Chat
dc.subject Status Updates
dc.subject Comment
dc.subject Social Networking Sites
dc.subject Multilingualism
dc.subject Dialect
dc.subject South Tyrol
dc.subject Instant Messaging
dc.subject CMC
dc.title DIDI - The DiDi Corpus of South Tyrolean CMC 1.0.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
hidden false
hasMetadata false
has.files yes
branding CMC & WaC
demo.uri https://commul.eurac.edu/annis/didi
contact.person Corpus Manager clarin@eurac.edu Eurac Research CLARIN Centre (ERCC)
sponsor Autonome Provinz Bozen - Südtirol, Abteilung Bildungsförderung, Universität und Forschung, Landesgesetz vom 13. Dezember 2006, Nr. 14 ''Forschung und Innovation'' / Provincia autonoma di Bolzano - Alto Adige, Ripartizione Diritto allo studio, università e ricerca scientifica, Legge provinciale 13 dicembre 2006, n. 14 ''Ricerca e innovazione'' x Digital Natives - Digital Immigrants. Schreiben auf Social Network Sites: Eine korpusunterstützte Sprachbeobachtung des aktuellen Sprachgebrauchs in Südtirol unter besonderer Berücksichtigung des Alters nationalFunds
size.info 23000 texts
size.info 370000 tokens
files.size 70312349
files.count 6


 Files in this item

 Download all files in item (67.06 MB)
This item is for Academic Use and licensed under: CLARIN ACADEMIC END-USER LICENCE (ACA-BY-NC-NORED 1.0)
Conditions: Redistribution Not Permitted, Attribution Required, Noncommercial
Name: README.html
Size: 4.61 KB
Format: HTML
Description: File bundle descriptions.
MD5: 8497ad24f6b18a57530f1bedee23889c

Name: CHANGELOG.html
Size: 1.69 KB
Format: HTML
Description: CHANGELOG
MD5: 6db487a62477b9bcf98430be7cfb4a65

Name: data-docs-v1.0.0.zip
Size: 765.28 KB
Format: application/zip
Description: Documentation (German: DE and English: EN) on annotation layers, anonymization, and metadata.
MD5: 959501efa48434b8172bf9fb98a941f2

Name: data-annis-v1.0.0.zip
Size: 34.35 MB
Format: application/zip
Description: Complete corpus in ANNIS format with all metadata and annotation.
MD5: a00c2e4c91ac0e47bf0fb2e39bedcbe2

Name: data-didijson-v1.0.0.zip
Size: 5.31 MB
Format: application/zip
Description: Complete corpus in didijson dumps with all metadata and annotation.
MD5: ac0793514eb9b7455e66c00f50de8e5d

Name: data-didixml-v1.0.0.zip
Size: 26.64 MB
Format: application/zip
Description: Complete corpus in didixml format with all metadata and annotation.
MD5: 18116af35ccb9042b6724070bc4bd1a0
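
The MD5 values listed for each file can be used to check that a download arrived intact. The following is a minimal sketch, assuming the six files were saved under their listed names in one local directory. The function names `md5_of_file` and `verify_download` are hypothetical helpers for illustration and are not part of the repository or the corpus tooling.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "README.html": "8497ad24f6b18a57530f1bedee23889c",
    "CHANGELOG.html": "6db487a62477b9bcf98430be7cfb4a65",
    "data-docs-v1.0.0.zip": "959501efa48434b8172bf9fb98a941f2",
    "data-annis-v1.0.0.zip": "a00c2e4c91ac0e47bf0fb2e39bedcbe2",
    "data-didijson-v1.0.0.zip": "ac0793514eb9b7455e66c00f50de8e5d",
    "data-didixml-v1.0.0.zip": "18116af35ccb9042b6724070bc4bd1a0",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(directory):
    """Compare each listed file in `directory` with its expected MD5.

    Returns a dict mapping file name to True (match), False (mismatch),
    or None (file not found in the directory).
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = (md5_of_file(path) == expected)
        else:
            results[name] = None
    return results
```

Calling `verify_download("didi-download")`, where `didi-download` is a placeholder for the local download directory, returns one entry per listed file. Any entry that is `False` suggests a corrupted or incomplete download that should be fetched again.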
