Corpus statistics: The 1890s

The 1890s corpus, 348,000 tokens in total, consists of the following text classes:

Text class	File name prefix	Number of tokens	Per cent of corpus
Newspapers	aja	193,000	55 %
Fiction	ilu	155,000	45 %

Newspaper texts come from the following titles:

Newspaper	File name prefix	Number of tokens	Per cent of newspaper tokens	Per cent of corpus
Eesti Postimees	epo	36,600	19 %	11 %
Olewik	ole	33,400	17 %	10 %
Postimees	pos	48,000	25 %	14 %
Ristirahwa pühapäewa leht	rip	2,100	1 %	1 %
Sakala	sak	5,300	3 %	2 %
Walgus	val	60,500	31 %	17 %
Wirmaline	vir	7,100	4 %	2 %
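
The percentages in the tables above can be rechecked from the token counts. The following is a minimal sketch, assuming the counts are taken exactly as printed (they appear to be rounded) and that each percentage is rounded to a whole number; the names CORPUS_TOTAL, NEWSPAPER_TOKENS and share are illustrative and do not come from the corpus documentation.

```python
# Token counts copied from the tables above; the published figures
# appear to be rounded, so this is only a consistency check.
CORPUS_TOTAL = 348_000

# Keys are the file name prefixes listed for each newspaper.
NEWSPAPER_TOKENS = {
    "epo": 36_600,  # Eesti Postimees
    "ole": 33_400,  # Olewik
    "pos": 48_000,  # Postimees
    "rip": 2_100,   # Ristirahwa pühapäewa leht
    "sak": 5_300,   # Sakala
    "val": 60_500,  # Walgus
    "vir": 7_100,   # Wirmaline
}


def share(part, whole):
    """Return part/whole as a percentage rounded to a whole number."""
    return round(100 * part / whole)


newspaper_total = sum(NEWSPAPER_TOKENS.values())

if __name__ == "__main__":
    print("prefix\ttokens\t% of newspapers\t% of corpus")
    for prefix, tokens in NEWSPAPER_TOKENS.items():
        print(
            f"{prefix}\t{tokens}\t"
            f"{share(tokens, newspaper_total)}\t"
            f"{share(tokens, CORPUS_TOTAL)}"
        )
    print(f"all newspapers\t{newspaper_total}\t100\t"
          f"{share(newspaper_total, CORPUS_TOTAL)}")
```

Summing the per-title counts gives the newspaper total from the first table, so the two tables can be checked against each other.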

Last modified: December 14, 2018, 22:43:43.