
Corpus of Written Estonian: the 1970s

Corpus statistics: The 1970s

The 1970s corpus contains a total of 425,600 tokens and consists of the following text classes:

Text class    File name beginning     Number of tokens    Per cent of corpus
Newspapers    (several, see below)    168,500             40 %
Fiction       ilu                     257,100             60 %

Newspaper texts are from the following titles:

Newspaper        File name beginning    Number of tokens    Per cent of newspapers    Per cent of corpus
Edasi            ed                     27,000              16 %                      6 %
Kodumaa          km                     10,600              6 %                       2.5 %
Noorte Hääl      nh                     37,000              22 %                      9 %
Punane Täht      pt                     3,800               2 %                       1 %
Rahva Hääl       rh                     60,500              36 %                      14 %
Sirp ja Vasar    sv                     21,500              13 %                      5 %
Õhtuleht         ol                     8,000               5 %                       2 %
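
The percentage columns can be recomputed from the token counts. The following is a minimal sketch in Python, not part of the corpus distribution. The names `newspaper_tokens`, `share` and `shares` are made up for this illustration. The token counts are copied from the tables above, where they appear to be rounded to the nearest hundred.

```python
# Token counts copied from the tables above (rounded in the source).
CORPUS_TOTAL = 425_600      # whole 1970s corpus
NEWSPAPER_TOTAL = 168_500   # newspaper text class

# Keys are the file name beginnings listed in the newspaper table.
newspaper_tokens = {
    "ed": 27_000,   # Edasi
    "km": 10_600,   # Kodumaa
    "nh": 37_000,   # Noorte Hääl
    "pt": 3_800,    # Punane Täht
    "rh": 60_500,   # Rahva Hääl
    "sv": 21_500,   # Sirp ja Vasar
    "ol": 8_000,    # Õhtuleht
}


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


# For each title: (per cent of newspapers, per cent of corpus).
shares = {
    prefix: (share(n, NEWSPAPER_TOTAL), share(n, CORPUS_TOTAL))
    for prefix, n in newspaper_tokens.items()
}

for prefix, (of_news, of_corpus) in shares.items():
    print(f"{prefix}: {of_news:.1f} % of newspapers, {of_corpus:.1f} % of corpus")

# Sum of the per-title counts, for comparison with NEWSPAPER_TOTAL.
listed_sum = sum(newspaper_tokens.values())
print(f"sum of listed titles: {listed_sum}")
```

The per-title counts as listed add up to 168,400, which is 100 tokens short of the stated newspaper total of 168,500. The difference is presumably a result of rounding in the published figures.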

Last modified: December 14, 2018, 17:11:58.