Files in this item
This item is publicly available and licensed under: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
- Name: paisa.raw.utf8.gz
- Size: 521.58 MB
- Format: application/gzip
- Description: raw cleaned web texts
- MD5: d7804d4d9af31ddaec5bfa7409926f2e
- Name: paisa.annotated.CoNLL.utf8.gz
- Size: 1.84 GB
- Format: application/gzip
- Description: cleaned and linguistically annotated web texts in CoNLL format
- MD5: 9d49fd1e86c9e6de3a6cb67a6c10a2f2
- Name: lemma-WITHOUTnumberssymbols-frequencies-paisa.txt.gz
- Size: 6.94 MB
- Format: application/gzip
- Description: lemma frequencies, restricted to lemmas composed only of letters and the following three symbols: . - '
- MD5: 6d3959478ad4c5fecfe9c9cc305c68af
- Name: lemma-frequencies-paisa.txt.gz
- Size: 9.53 MB
- Format: application/gzip
- Description: lemma frequencies
- MD5: ea27fe186efc59410d5ea39c4130315b
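
The MD5 values listed above can be used to check that a download completed without corruption. The following is a minimal Python sketch, assuming the files were saved locally under the names shown in the listing; the helper names `md5_of_file` and `matches_listing` are illustrative and not part of the repository.

```python
import hashlib
import os

# MD5 values copied from the file listing above.
EXPECTED_MD5 = {
    "paisa.raw.utf8.gz": "d7804d4d9af31ddaec5bfa7409926f2e",
    "paisa.annotated.CoNLL.utf8.gz": "9d49fd1e86c9e6de3a6cb67a6c10a2f2",
    "lemma-WITHOUTnumberssymbols-frequencies-paisa.txt.gz": "6d3959478ad4c5fecfe9c9cc305c68af",
    "lemma-frequencies-paisa.txt.gz": "ea27fe186efc59410d5ea39c4130315b",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that the larger archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path):
    """Compare a local file against the MD5 listed for its file name.
    Raises KeyError if the file name is not one of the listed files."""
    expected = EXPECTED_MD5[os.path.basename(path)]
    return md5_of_file(path) == expected
```

For example, `matches_listing("paisa.raw.utf8.gz")` returns `True` only if the local copy has the digest given in the listing. A mismatch usually means the download was truncated or altered and should be repeated.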