
Corpus statistics: the 1980s

The corpus of the 1980s, 1 million tokens in total, consists of the following text classes:

Text class               File name prefix   Number of tokens   Per cent of corpus
Newspapers               tat                175,000            17.5 %
Documents                tdt                12,000             1.2 %
Encyclopædias            tnt                20,000             2.0 %
Essays and biographies   tet                90,000             9.0 %
Hobby texts              tht                75,000             7.5 %
Fiction                  tkt                250,000            25.0 %
Popular science          tpt                150,000            15.0 %
Propaganda               tot                60,000             6.0 %
Religion                 trt                8,000              0.8 %
Science                  ttt                160,000            16.0 %
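
The percentages in the table above follow directly from the token counts. The sketch below recomputes them and looks up a text class from the beginning of a file name. The names TOKENS, class_shares and text_class_of are illustrative assumptions, not part of the corpus distribution, and the three-letter prefix length is inferred from the table.

```python
# Token counts per text class, copied from the table above.
# Keys are the file name prefixes listed there.
TOKENS = {
    "tat": ("Newspapers", 175_000),
    "tdt": ("Documents", 12_000),
    "tnt": ("Encyclopædias", 20_000),
    "tet": ("Essays and biographies", 90_000),
    "tht": ("Hobby texts", 75_000),
    "tkt": ("Fiction", 250_000),
    "tpt": ("Popular science", 150_000),
    "tot": ("Propaganda", 60_000),
    "trt": ("Religion", 8_000),
    "ttt": ("Science", 160_000),
}


def class_shares(tokens):
    """Return each class's share of the corpus in per cent, rounded to one decimal."""
    total = sum(count for _, count in tokens.values())
    return {
        prefix: round(100 * count / total, 1)
        for prefix, (_, count) in tokens.items()
    }


def text_class_of(file_name):
    """Hypothetical helper: map a file name to its text class via its first three letters."""
    entry = TOKENS.get(file_name[:3])
    return entry[0] if entry else None


if __name__ == "__main__":
    shares = class_shares(TOKENS)
    for prefix, (name, count) in TOKENS.items():
        print(f"{name:<24}{prefix:<6}{count:>9,}{shares[prefix]:>7.1f} %")
```

Any file name passed to text_class_of is hypothetical; only its first three letters are used, and an unknown prefix yields None.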

Last modified: December 19 2018 18:11:07.