DiDi Corpus v1.0.0

Description

The DiDi Corpus is a corpus of South Tyrolean computer-mediated communication (CMC) data. The corpus comprises around 370,000 tokens from Facebook wall posts and comments on wall posts, as well as socio-demographic data about the participants. All data was automatically annotated with language information (DE, IT, EN and others), and manually normalised and anonymised. In addition, semi-automatic token-level annotations cover part of speech and CMC phenomena (e.g. emoticons, emojis, and iteration of graphemes and punctuation). The anonymised corpus without the private messages is freely available for researchers.

Files

The DiDi Corpus is available from the Eurac Research CLARIN Centre (ERCC) and the on-premise GitLab installation, and is also ready to search in ANNIS on the Eurac Research ANNIS installation. For further information, visit http://www.eurac.edu/didi or write to linguistics@eurac.edu.

The following file bundles are available:

data-annis-v1.0.0.zip contains the complete corpus in ANNIS format with all metadata and annotation. [ERCC download] [GitLab download] [Source code repository]

data-didijson-v1.0.0.zip contains the complete corpus as didijson dumps with all metadata and annotation. [ERCC download] [GitLab download] [Source code repository]

data-didixml-v1.0.0.zip contains the complete corpus in didixml format with all metadata and annotation. [ERCC download] [GitLab download] [Source code repository]

data-docs-v1.0.0.zip contains documentation in German (DE) and English (EN) about annotation layers, anonymisation, and metadata. [ERCC download] [GitLab download] [Source code repository]

LICENSE

The DiDi Corpus is available under the CLARIN ACADEMIC END-USER LICENCE ACA-BY-NC-NORED (Text file, PDF file). Any code or scripts in the repositories are licensed under their respective LICENSE files.
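
Usage note for the file bundles listed under Files: this description does not document the internal layout of the archives, so a reasonable first step after downloading one is to inspect its contents before writing any parser. The following is a minimal sketch using only the Python standard library. The helper name `summarize_bundle` is hypothetical, and the local path in the usage comment assumes the bundle was saved to the current directory; nothing here relies on the didijson or didixml schemas.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_bundle(zip_path):
    """Return (file_count, Counter of lower-cased file extensions) for a zip archive.

    Hypothetical helper: it only lists archive members and makes no
    assumption about the formats stored inside the bundle.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Skip directory entries so that only real files are counted.
        names = [info.filename for info in archive.infolist() if not info.is_dir()]
    extensions = Counter(
        PurePosixPath(name).suffix.lower() or "(none)" for name in names
    )
    return len(names), extensions


# Example usage (assumes the bundle was downloaded to the working directory):
#   count, extensions = summarize_bundle("data-didijson-v1.0.0.zip")
#   print(count, extensions.most_common())
```

The extension summary shows which file types a bundle actually contains, which helps when choosing between the ANNIS, didijson, and didixml variants.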