What's New

corpus
Description:
The CANOLA Corpus is a visually annotated English web corpus for training classification engines to remove boilerplate from unseen web pages. It was harvested, annotated and evaluated by the tools and infrastructure of the ...
 This item contains 2 files (10.83 MB).
 
Publicly Available
corpus
Description:
The CANOLA Corpus is a visually annotated English web corpus for training classification engines to remove boilerplate from unseen web pages. It was harvested, annotated and evaluated by the tools and infrastructure of the ...
 This item contains 1 file (9.24 MB).
 
Publicly Available
corpus
Description:
The DiDi corpus contains around 600,000 tokens gathered from 136 South Tyrolean Facebook users who participated in the DiDi project. It consists of 11,102 Facebook wall posts, 6,507 wall comments and 22,218 ...
 This item contains 6 files (67.06 MB).
 
Academic Use, Attribution Required, Noncommercial

Most Viewed Items

Top Last Week
corpus
Description:
The DiDi corpus contains around 600,000 tokens gathered from 136 South Tyrolean Facebook users who participated in the DiDi project. It consists of 11,102 Facebook wall posts, 6,507 wall comments and 22,218 ...
 This item contains 6 files (67.06 MB).
 
Academic Use, Attribution Required, Noncommercial
corpus
Description:
The Paisà corpus is a large collection of Italian web texts, licensed under Creative Commons (Attribution-ShareAlike and Attribution-Noncommercial-ShareAlike). It was created in the context of the project PAISÀ. All ...
 This item contains 4 files (2.36 GB).
 
Publicly Available
corpus
Description:
The MERLIN corpus is a written learner corpus for Czech, German, and Italian, designed to illustrate the Common European Framework of Reference for Languages (CEFR) with authentic learner data. The corpus ...
 This item contains 10 files (174.81 MB).
 
Publicly Available