The Balanced Corpus of Estonian
Why do we need this corpus?
The Balanced Corpus of Estonian has been compiled to enable comparison of the three main text classes of the written language: fiction, journalistic writing and scientific writing.
What does this corpus consist of?
The corpus consists of:
- 5 million words of journalistic texts, which are described in more detail on the page „Newspapers in the Balanced Corpus“ and appear under the label „Ajalehed“ in the corpus query.
- 5 million words of fiction, which is described in more detail on the page „Fiction in the Balanced Corpus“ and appears under the label „Ilukirjandus“ in the corpus query.
- 5 million words of scientific texts, which are described in more detail on the page „Scientific texts in the Balanced Corpus“ and appear under the label „Teadus“ in the corpus query.
The Balanced Corpus of Estonian is a subcorpus of the larger Mixed Corpus of Estonian.
Last modified: January 21, 2019, 19:06:56.