Corpus of Written Estonian: the 1970s

Corpus statistics: The 1970s

The corpus of the 1970s contains a total of 425,600 tokens and consists of the following text classes:

Text class	Filename beginning	Number of tokens	Per cent of corpus
Newspapers	(several, see below)	168,500	40 %
Fiction	ilu	257,100	60 %

Newspaper texts are from the following titles:

Newspaper	Filename beginning	Number of tokens	Per cent of newspapers	Per cent of corpus
Edasi	ed	27,000	16 %	6 %
Kodumaa	km	10,600	6 %	2.5 %
Noorte Hääl	nh	37,000	22 %	9 %
Punane Täht	pt	3,800	2 %	1 %
Rahva Hääl	rh	60,500	36 %	14 %
Sirp ja Vasar	sv	21,500	13 %	5 %
Õhtuleht	ol	8,000	5 %	2 %
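
The percentages in the newspaper table can be rechecked from the token counts. The following is a minimal sketch, not part of the corpus tooling: the names NEWSPAPER_TOKENS, share and newspaper_shares are hypothetical, and the totals are the ones stated on this page.

```python
# Token counts copied from the newspaper table above.
NEWSPAPER_TOKENS = {
    "Edasi": 27_000,
    "Kodumaa": 10_600,
    "Noorte Hääl": 37_000,
    "Punane Täht": 3_800,
    "Rahva Hääl": 60_500,
    "Sirp ja Vasar": 21_500,
    "Õhtuleht": 8_000,
}
NEWSPAPER_TOTAL = 168_500  # newspaper total as stated in the text-class table
CORPUS_TOTAL = 425_600     # corpus total as stated in the introduction


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def newspaper_shares():
    """Map each title to (per cent of newspapers, per cent of corpus)."""
    return {
        name: (share(count, NEWSPAPER_TOTAL), share(count, CORPUS_TOTAL))
        for name, count in NEWSPAPER_TOKENS.items()
    }


if __name__ == "__main__":
    for name, (of_news, of_corpus) in newspaper_shares().items():
        print(f"{name}\t{of_news:.1f} %\t{of_corpus:.1f} %")
```

The per-title counts listed above add up to 168,400, which is 100 tokens short of the stated newspaper total of 168,500. This is presumably because the published counts are rounded, so the sketch uses the stated total as the denominator.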

Last modified: December 14 2018 17:11:58.