The corpus of the 1910s (401 695 tokens in total) consists of the following text classes:
Text class | File name prefix | Number of tokens | Share of corpus |
---|---|---|---|
Newspapers | aja | 214 131 | 53 % |
Fiction | ilu | 187 564 | 47 % |
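As a quick consistency check, the shares in the table can be recomputed from the token counts. The following is a minimal Python sketch using only the figures above. It is not part of any corpus tooling.

```python
# Token counts per text class, copied from the table above.
TOKENS = {"Newspapers": 214_131, "Fiction": 187_564}

total = sum(TOKENS.values())
assert total == 401_695  # matches the stated corpus size

for text_class, count in TOKENS.items():
    share = 100 * count / total  # percentage of all tokens in the corpus
    print(f"{text_class}: {count} tokens, {share:.1f} %")
```

With one decimal place this gives 53.3 % and 46.7 %, which round to the 53 % and 47 % shown in the table.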
Newspaper texts come from the following titles:
Newspaper | File name prefix |
---|---|
Olewik | ow |
Päewaleht | pl |
Postimees | pm |
Tallinna Teataja | tt |
Wirulane | wi |
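The prefixes in the two tables can be combined to sort corpus files by text class and newspaper title. The sketch below assumes a hypothetical naming layout, `<class prefix>_<newspaper code>_<rest>` (for example `aja_pm_1915_0001.txt`). That layout is not stated above, so the split logic should be adjusted to the real file names. The function name `classify` is likewise only illustrative.

```python
from typing import Optional, Tuple

# Text-class and newspaper codes, copied from the tables above.
TEXT_CLASSES = {"aja": "Newspapers", "ilu": "Fiction"}
NEWSPAPERS = {
    "ow": "Olewik",
    "pl": "Päewaleht",
    "pm": "Postimees",
    "tt": "Tallinna Teataja",
    "wi": "Wirulane",
}


def classify(filename: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (text class, newspaper title) for a corpus file name.

    ASSUMPTION: names look like "<class>_<newspaper code>_<rest>";
    fiction files carry no newspaper code. Unknown codes yield None.
    """
    text_class = TEXT_CLASSES.get(filename[:3])  # class prefix is the first 3 chars
    newspaper = None
    parts = filename.split("_")
    if text_class == "Newspapers" and len(parts) > 1:
        newspaper = NEWSPAPERS.get(parts[1])  # assumed 2nd underscore-separated field
    return text_class, newspaper


print(classify("aja_pm_1915_0001.txt"))  # ('Newspapers', 'Postimees')
print(classify("ilu_example.txt"))       # ('Fiction', None)
```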