The corpus of the 1950s (308,000 tokens in total) consists of the following text classes:
Text class | File name prefix | Number of tokens | Percent of corpus |
---|---|---|---|
Newspapers | aja | 242,400 | 79 % |
Fiction | ilu | 66,000 | 21 % |
The newspaper texts come from the following titles:
Newspaper | File name prefix | Number of tokens | Percent of newspapers | Percent of corpus |
---|---|---|---|---|
Edasi | ed | 31,900 | 13 % | 10 % |
Noorte Hääl | nh | 32,800 | 14 % | 11 % |
Rahva Hääl | rh | 109,200 | 45 % | 35 % |
Sirp ja Vasar | sv | 16,400 | 7 % | 5 % |
Talurahvaleht | tl | 11,400 | 5 % | 4 % |
Õhtuleht | ol | 39,500 | 16 % | 13 % |
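
The percentage columns can be recomputed from the token counts. The following is a minimal sketch, assuming each share is the token count divided by the relevant total and rounded to the nearest whole percent. The names `NEWSPAPER_TOKENS`, `percent_share` and `share_table` are hypothetical and are not part of the corpus tooling.

```python
# Token counts copied from the newspaper table above.
NEWSPAPER_TOKENS = {
    "Edasi": 31_900,
    "Noorte Hääl": 32_800,
    "Rahva Hääl": 109_200,
    "Sirp ja Vasar": 16_400,
    "Talurahvaleht": 11_400,
    "Õhtuleht": 39_500,
}
NEWSPAPER_TOTAL = 242_400  # newspaper row of the text-class table
CORPUS_TOTAL = 308_000     # corpus size stated in the opening sentence


def percent_share(tokens: int, total: int) -> int:
    """Return tokens as a whole-number percentage of total (assumed rounding)."""
    return round(100 * tokens / total)


def share_table(counts: dict[str, int], newspaper_total: int, corpus_total: int) -> dict[str, tuple[int, int]]:
    """Map each title to (percent of newspapers, percent of corpus)."""
    return {
        name: (percent_share(n, newspaper_total), percent_share(n, corpus_total))
        for name, n in counts.items()
    }


if __name__ == "__main__":
    shares = share_table(NEWSPAPER_TOKENS, NEWSPAPER_TOTAL, CORPUS_TOTAL)
    for name, (of_newspapers, of_corpus) in shares.items():
        print(f"{name}: {of_newspapers} % of newspapers, {of_corpus} % of corpus")
```

The per-title counts listed above sum to 241,200, slightly less than the 242,400 given for the newspaper class. The rounded percentages come out the same with either figure as the denominator.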