The corpus of the 1910s, 401 695 tokens in total, consists of the following text classes:
| Text class | File name beginning | Number of tokens | Per cent of corpus |
|---|---|---|---|
| Newspapers | aja | 214 131 | 53 % |
| Fiction | ilu | 187 564 | 47 % |
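
The totals and percentages in the table above can be rechecked from the two token counts. The following is a minimal sketch: the counts are copied from the table, while the names `token_counts` and `share_percent` are illustrative and not part of the corpus tooling.

```python
# Token counts copied from the text-class table above.
token_counts = {"Newspapers": 214_131, "Fiction": 187_564}

# The corpus size is the sum over all text classes.
total = sum(token_counts.values())


def share_percent(count: int, total: int) -> int:
    """Return the share of `count` in `total`, rounded to a whole per cent."""
    return round(100 * count / total)


shares = {name: share_percent(n, total) for name, n in token_counts.items()}

print(f"Total: {total} tokens")
for name, pct in shares.items():
    print(f"{name}: {pct} %")
```

Rounding to whole per cents is an assumption based on how the table reports its figures.
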
Newspaper texts come from the following titles:
| Newspaper | File name beginning |
|---|---|
| Olewik | ow |
| Päewaleht | pl |
| Postimees | pm |
| Tallinna Teataja | tt |
| Wirulane | wi |