The 1970s corpus contains 425,600 tokens in total and consists of the following text classes:
Text class | File name beginning | Number of tokens | Per cent of corpus |
---|---|---|---|
Newspapers | (several, see below) | 168,500 | 40 % |
Fiction | ilu | 257,100 | 60 % |
The newspaper texts are drawn from the following titles:
Newspaper | File name beginning | Number of tokens | Per cent of newspapers | Per cent of corpus |
---|---|---|---|---|
Edasi | ed | 27,000 | 16 % | 6 % |
Kodumaa | km | 10,600 | 6 % | 2.5 % |
Noorte Hääl | nh | 37,000 | 22 % | 9 % |
Punane Täht | pt | 3,800 | 2 % | 1 % |
Rahva Hääl | rh | 60,500 | 36 % | 14 % |
Sirp ja Vasar | sv | 21,500 | 13 % | 5 % |
Õhtuleht | ol | 8,000 | 5 % | 2 % |
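
The percentage columns can be re-derived from the token counts. The following is a minimal sketch in Python, not part of the corpus tooling: the names `CORPUS_TOTAL`, `NEWSPAPER_TOTAL`, `newspaper_tokens` and `share` are illustrative, and the figures are copied from the tables above. Note that the per-title counts add up to 168,400, which is 100 fewer than the stated newspaper total of 168,500; this is presumably an effect of rounding the individual counts.

```python
# Totals as stated in the tables above (assumed to be rounded figures).
CORPUS_TOTAL = 425_600
NEWSPAPER_TOTAL = 168_500

# Token counts per newspaper title, copied from the table above.
newspaper_tokens = {
    "Edasi": 27_000,
    "Kodumaa": 10_600,
    "Noorte Hääl": 37_000,
    "Punane Täht": 3_800,
    "Rahva Hääl": 60_500,
    "Sirp ja Vasar": 21_500,
    "Õhtuleht": 8_000,
}


def share(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole


# For each title: (per cent of newspapers, per cent of corpus).
shares = {
    title: (share(n, NEWSPAPER_TOTAL), share(n, CORPUS_TOTAL))
    for title, n in newspaper_tokens.items()
}

for title, (of_news, of_corpus) in shares.items():
    print(f"{title:<14} {of_news:5.1f} % of newspapers  {of_corpus:5.1f} % of corpus")
```

Rounding the computed values to whole per cents (or to one decimal place for Kodumaa's share of the corpus) gives figures that can be compared with the table.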