| The Balanced Corpus of Estonian | Grammatical number | Journalism | Journalism % | Fiction | Fiction % | Science | Science % |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 4 313 246 | sg | 1 446 576 | 78.5% | 1 228 388 | 81% | 1 638 282 | 73.5% |
| 1 281 723 | pl | 396 106 | 21.5% | 291 318 | 19% | 594 299 | 26.5% |
When interpreting these numbers, it should be noted that although all the sub-corpora are the same size (5 million words), they do not contain the same number of nominals (see Table 2). Scientific texts contain the largest number of nominals, whereas journalistic texts contain the largest number of verbs. The absolute frequencies of the singular and plural forms in each sub-corpus may therefore be less informative than their proportions of the total number of nominals in that sub-corpus.
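The normalisation described here can be illustrated with a minimal Python sketch. It assumes that the total number of nominals in a sub-corpus equals the sum of its singular and plural counts in the table above, which is consistent with the percentages adding up to 100% but is not stated explicitly. The names `counts` and `number_proportions` are hypothetical and serve only as illustration.

```python
# Singular and plural counts per sub-corpus, copied from the table above.
counts = {
    "Journalism": {"sg": 1_446_576, "pl": 396_106},
    "Fiction": {"sg": 1_228_388, "pl": 291_318},
    "Science": {"sg": 1_638_282, "pl": 594_299},
}


def number_proportions(sub_counts):
    """Return each grammatical number's share (in %) of the sub-corpus total.

    Assumption: the total number of nominals is the sum of the sg and pl counts.
    """
    total = sum(sub_counts.values())
    return {number: 100 * n / total for number, n in sub_counts.items()}


proportions = {name: number_proportions(c) for name, c in counts.items()}

for name, shares in proportions.items():
    total = sum(counts[name].values())
    print(f"{name}: {total} nominals, sg {shares['sg']:.1f}%, pl {shares['pl']:.1f}%")
```

Under this assumption the computed shares agree with the rounded percentages in the table, and they can be compared across sub-corpora even though the totals differ.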