The corpus of the 1890s, 348,000 tokens in total, consists of the following text classes:
Text class | File name beginning | Number of tokens | Percent of corpus |
---|---|---|---|
Newspapers | aja | 193,000 | 55 % |
Fiction | ilu | 155,000 | 45 % |

Newspaper texts come from the following titles:
Newspaper | File name beginning | Number of tokens | Percent of newspaper tokens | Percent of corpus |
---|---|---|---|---|
Eesti Postimees | epo | 36,600 | 19 % | 11 % |
Olewik | ole | 33,400 | 17 % | 10 % |
Postimees | pos | 48,000 | 25 % | 14 % |
Ristirahwa pühapäewa leht | rip | 2,100 | 1 % | 1 % |
Sakala | sak | 5,300 | 3 % | 2 % |
Walgus | val | 60,500 | 31 % | 17 % |
Wirmaline | vir | 7,100 | 4 % | 2 % |
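
The percentage columns can be recomputed from the token counts, and the file name beginnings can be used to look up which newspaper a file belongs to. The following is only a minimal sketch: the counts are the rounded figures from the table, the function names `shares` and `title_for` are hypothetical, and it assumes that the file name beginning is exactly the first three characters of a file name, which the table does not state.

```python
# Token counts per newspaper, copied from the table above (rounded figures).
NEWSPAPER_TOKENS = {
    "epo": ("Eesti Postimees", 36_600),
    "ole": ("Olewik", 33_400),
    "pos": ("Postimees", 48_000),
    "rip": ("Ristirahwa pühapäewa leht", 2_100),
    "sak": ("Sakala", 5_300),
    "val": ("Walgus", 60_500),
    "vir": ("Wirmaline", 7_100),
}
CORPUS_TOKENS = 348_000  # total size of the 1890s corpus, as stated in the text


def shares(prefix):
    """Return (percent of newspaper tokens, percent of corpus tokens), rounded."""
    newspaper_total = sum(count for _, count in NEWSPAPER_TOKENS.values())
    _, count = NEWSPAPER_TOKENS[prefix]
    return (
        round(100 * count / newspaper_total),
        round(100 * count / CORPUS_TOKENS),
    )


def title_for(file_name):
    """Look up the newspaper title from the start of a file name.

    Assumption: the file name beginning is the first three characters.
    Returns None when the beginning is not a known newspaper code.
    """
    entry = NEWSPAPER_TOKENS.get(file_name[:3].lower())
    return entry[0] if entry else None


if __name__ == "__main__":
    for prefix, (title, count) in NEWSPAPER_TOKENS.items():
        of_newspapers, of_corpus = shares(prefix)
        print(f"{title} ({prefix}): {count} tokens, "
              f"{of_newspapers} % of newspapers, {of_corpus} % of corpus")
```

Because the published counts are rounded to the nearest hundred, a recomputed percentage could in principle differ by one point from a figure derived from exact counts.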