The corpus of the 1990s contains 995,800 tokens in total and consists of the following types of text:

Type | Beginning of file name | Number of tokens | Per cent of corpus |
---|---|---|---|
Newspapers | (several, see below) | 384,800 | 39 % |
Fiction | ilu | 611,000 | 61 % |

The newspaper texts are divided among the following titles:

Newspaper title | Beginning of file name | Number of tokens | Per cent of newspapers |
---|---|---|---|
Edasi, Postimees | ed, pm | 37,700 | 9.8 % |
Eesti Ekspress | ee | 40,000 | 10.4 % |
Kultuurileht, Reede, Sirp | kl, re, si | 70,100 | 18.2 % |
Maaleht | ml | 67,300 | 17.5 % |
Pärnu Postimees | pp | 24,700 | 6.4 % |
Rahva Hääl | rh | 85,000 | 22.1 % |
Õhtuleht | ol | 36,200 | 9.4 % |
Äripäev | ap | 23,500 | 6.1 % |
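
The percentages in both tables can be reproduced from the token counts. The following is a minimal sketch, not part of the corpus tooling: the names `NEWSPAPER_TOKENS`, `NEWSPAPER_TOTAL`, `FICTION_TOTAL`, `percent` and `shares` are illustrative assumptions, and the counts are copied from the tables above.

```python
# Token counts copied from the newspaper table (all names here are illustrative).
NEWSPAPER_TOKENS = {
    "Edasi, Postimees": 37_700,
    "Eesti Ekspress": 40_000,
    "Kultuurileht, Reede, Sirp": 70_100,
    "Maaleht": 67_300,
    "Pärnu Postimees": 24_700,
    "Rahva Hääl": 85_000,
    "Õhtuleht": 36_200,
    "Äripäev": 23_500,
}

# Totals as stated in the first table.
NEWSPAPER_TOTAL = 384_800
FICTION_TOTAL = 611_000


def percent(part, whole, digits=1):
    """Return part as a percentage of whole, rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


def shares(counts, total, digits=1):
    """Map each label to its percentage share of `total`."""
    return {label: percent(n, total, digits) for label, n in counts.items()}


if __name__ == "__main__":
    corpus_total = NEWSPAPER_TOTAL + FICTION_TOTAL
    print("Newspapers:", percent(NEWSPAPER_TOTAL, corpus_total, 0), "% of corpus")
    print("Fiction:", percent(FICTION_TOTAL, corpus_total, 0), "% of corpus")
    for title, share in shares(NEWSPAPER_TOKENS, NEWSPAPER_TOTAL).items():
        print(f"{title}: {share} %")
```

The per-title counts as listed add up to 384,500, slightly below the 384,800 given for newspapers as a whole, and the listed percentages add up to 99.9 %. Both gaps are presumably due to rounding of the published figures; that explanation is an assumption, not something the tables state.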