The 1960s corpus (333,000 tokens in total) consists of the following text classes:
Text class | File name prefix | Number of tokens | Percent of corpus |
---|---|---|---|
Newspapers | aja | 201,000 | 60 % |
Fiction | ilu | 132,000 | 40 % |

Newspaper texts come from the following titles:
Newspaper | File name prefix | Number of tokens | Percent of newspaper texts | Percent of corpus |
---|---|---|---|---|
Edasi | ed | 30,000 | 15 % | 9 % |
Kodumaa | km | 8,100 | 4 % | 2 % |
Noorte Hääl | nh | 24,400 | 12 % | 7 % |
Punane Täht | pt | 4,200 | 2 % | 1 % |
Rahva Hääl | rh | 102,600 | 51 % | 31 % |
Sirp ja Vasar | sv | 14,600 | 7 % | 4 % |
Õhtuleht | ol | 17,100 | 9 % | 5 % |
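
The percentage columns can be rechecked from the token counts. The following is a minimal sketch, assuming each percentage is the token count divided by the relevant total and rounded to the nearest whole number; the names `newspaper_tokens`, `CORPUS_TOKENS`, `percent`, and `shares` are illustrative and not part of the corpus distribution.

```python
# Token counts per newspaper, copied from the table above.
newspaper_tokens = {
    "Edasi": 30_000,
    "Kodumaa": 8_100,
    "Noorte Hääl": 24_400,
    "Punane Täht": 4_200,
    "Rahva Hääl": 102_600,
    "Sirp ja Vasar": 14_600,
    "Õhtuleht": 17_100,
}

# Total size of the 1960s corpus, as stated in the text.
CORPUS_TOKENS = 333_000


def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Sum of all newspaper tokens; this should equal the "Newspapers" row.
newspaper_total = sum(newspaper_tokens.values())

# For each title: (percent of newspaper texts, percent of whole corpus).
shares = {
    title: (percent(count, newspaper_total), percent(count, CORPUS_TOKENS))
    for title, count in newspaper_tokens.items()
}

for title, (of_newspapers, of_corpus) in shares.items():
    print(f"{title}: {of_newspapers} % of newspaper texts, {of_corpus} % of corpus")
```

Because every figure is rounded independently, a column of rounded percentages need not sum to exactly 100.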