The 1980s corpus, 1 million tokens in total, consists of the following text classes:
| Text class | File name prefix | Number of tokens | Percentage of corpus |
|---|---|---|---|
| Newspapers | tat | 175,000 | 17.5 % |
| Documents | tdt | 12,000 | 1.2 % |
| Encyclopædias | tnt | 20,000 | 2.0 % |
| Essays and biographies | tet | 90,000 | 9.0 % |
| Hobby texts | tht | 75,000 | 7.5 % |
| Fiction | tkt | 250,000 | 25.0 % |
| Popular science | tpt | 150,000 | 15.0 % |
| Propaganda | tot | 60,000 | 6.0 % |
| Religion | trt | 8,000 | 0.8 % |
| Science | ttt | 160,000 | 16.0 % |
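
The table can be turned into a small lookup for checking the totals or classifying files by name. The following is a minimal sketch, assuming that each file name starts with the three-letter prefix listed in the table. The names `TEXT_CLASSES`, `text_class` and `shares`, and the example file name `tkt012.txt`, are hypothetical and not part of the corpus distribution.

```python
# Text classes keyed by file name prefix; token counts are taken from the table.
TEXT_CLASSES = {
    "tat": ("Newspapers", 175_000),
    "tdt": ("Documents", 12_000),
    "tnt": ("Encyclopædias", 20_000),
    "tet": ("Essays and biographies", 90_000),
    "tht": ("Hobby texts", 75_000),
    "tkt": ("Fiction", 250_000),
    "tpt": ("Popular science", 150_000),
    "tot": ("Propaganda", 60_000),
    "trt": ("Religion", 8_000),
    "ttt": ("Science", 160_000),
}


def text_class(file_name: str) -> str:
    """Return the text class of a file, assuming its name starts with the prefix."""
    prefix = file_name[:3].lower()
    return TEXT_CLASSES[prefix][0]


def shares() -> dict:
    """Return each class's share of the corpus in per cent, rounded to one decimal."""
    total = sum(count for _, count in TEXT_CLASSES.values())
    return {
        prefix: round(100 * count / total, 1)
        for prefix, (_, count) in TEXT_CLASSES.items()
    }


# Sum of all class sizes; the table states 1 million tokens in total.
total_tokens = sum(count for _, count in TEXT_CLASSES.values())
```

For example, `text_class("tkt012.txt")` looks up the prefix `tkt`, and `shares()["tkt"]` recomputes that class's percentage from the token counts rather than copying it from the table.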