Corpus statistics: the 1980s

The 1980s corpus, a total of 1 million tokens, consists of the following text classes:

Text class	File name prefix	Number of tokens	Per cent of corpus
Newspapers	tat	175,000	17.5 %
Documents	tdt	12,000	1.2 %
Encyclopædias	tnt	20,000	2.0 %
Essays and biographies	tet	90,000	9.0 %
Hobby texts	tht	75,000	7.5 %
Fiction	tkt	250,000	25.0 %
Popular science	tpt	150,000	15.0 %
Propaganda	tot	60,000	6.0 %
Religion	trt	8,000	0.8 %
Science	ttt	160,000	16.0 %
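The per cent column is each class's token count divided by the corpus total of 1,000,000 tokens. The minimal Python sketch below reproduces those shares and maps a file name to its text class by its three-letter prefix. It assumes that every file name starts with one of the prefixes listed above. The helper names `share_percent` and `classify_file` and the example file name are hypothetical and are not part of the corpus distribution.

```python
# Token counts per text class, copied from the table above.
# Assumption: every corpus file name begins with its class's three-letter prefix.
CLASSES = {
    "tat": ("Newspapers", 175_000),
    "tdt": ("Documents", 12_000),
    "tnt": ("Encyclopædias", 20_000),
    "tet": ("Essays and biographies", 90_000),
    "tht": ("Hobby texts", 75_000),
    "tkt": ("Fiction", 250_000),
    "tpt": ("Popular science", 150_000),
    "tot": ("Propaganda", 60_000),
    "trt": ("Religion", 8_000),
    "ttt": ("Science", 160_000),
}

# Corpus size as the sum of all class counts (1,000,000 tokens).
TOTAL = sum(tokens for _, tokens in CLASSES.values())


def share_percent(prefix: str) -> float:
    """Hypothetical helper: per cent of the corpus for the class with this prefix."""
    _, tokens = CLASSES[prefix]
    return round(100 * tokens / TOTAL, 1)


def classify_file(filename: str) -> str:
    """Hypothetical helper: map a file name to its text class via its prefix."""
    prefix = filename[:3]
    if prefix not in CLASSES:
        raise ValueError(f"unknown text-class prefix: {prefix!r}")
    return CLASSES[prefix][0]


if __name__ == "__main__":
    print(f"Total tokens: {TOTAL:,}")
    for prefix, (name, tokens) in CLASSES.items():
        print(f"{name:<24}{prefix}  {tokens:>9,}  {share_percent(prefix):5.1f} %")
    # Hypothetical file name, used only to illustrate the prefix lookup.
    print(classify_file("tkt_example.txt"))
```

Because the ten class counts add up to exactly 1,000,000, each share is the token count divided by 10,000. The rounded shares therefore sum to 100.0 %.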

Last modified: December 19 2018 18:11:07.