These texts form part of the reference corpus of Estonian. Their collection and processing were financed by the national program "Estonian language and national culture".
The texts have been semi-automatically downloaded from the internet and converted from PDF to SGML (TEI) format. The conversion programs were written and conversions made by Kaarel Veskis and Heiki-Jaan Kaalep.
One file contains the issues of the journal published during one year. The non-textual material - tables, graphs, pictures - has been omitted, as well as the English summaries and lists of references.
The text has been divided into paragraphs as in the original PDF file. The sentences have been annotated automatically. Every file begins with a teiHeader that contains a description of the file, the tags used, etc.
<div0> stands for a whole year of the issues, <div1> stands for one issue and <div2> stands for an article or other text in an issue.
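The division hierarchy described above can be checked with a short script. The following is only a minimal sketch in Python, not part of the corpus tooling: the sample text, the function names count_divisions and articles_per_issue, and the assumption that each division starts with an explicit opening tag are all illustrative. It scans for opening tags only, so it does not depend on end tags being present.

```python
import re

# Hypothetical sample; real corpus files may carry attributes and other tags.
SAMPLE = """<div0>
<div1>
<div2><p>First article.</p></div2>
<div2><p>Second article.</p></div2>
</div1>
<div1>
<div2><p>Third article.</p></div2>
</div1>
</div0>"""

# Matches an opening <div0>, <div1> or <div2> tag, with or without attributes.
DIV_OPEN = re.compile(r"<div([012])\b[^>]*>", re.IGNORECASE)


def count_divisions(sgml_text):
    """Count opening tags per level: year (div0), issue (div1), text (div2)."""
    counts = {"div0": 0, "div1": 0, "div2": 0}
    for match in DIV_OPEN.finditer(sgml_text):
        counts["div" + match.group(1)] += 1
    return counts


def articles_per_issue(sgml_text):
    """Return the number of <div2> elements found inside each <div1>."""
    per_issue = []
    for match in DIV_OPEN.finditer(sgml_text):
        level = match.group(1)
        if level == "1":
            # A new issue starts; begin counting its texts from zero.
            per_issue.append(0)
        elif level == "2" and per_issue:
            per_issue[-1] += 1
    return per_issue


if __name__ == "__main__":
    print(count_divisions(SAMPLE))
    print(articles_per_issue(SAMPLE))
```

A regular expression is used here instead of an XML parser because SGML files are not guaranteed to be well-formed XML; for anything beyond simple counting, a real SGML parser would be the safer choice.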
The opening quotation mark is the entity “, the closing quotation mark is the entity ”, and the single quote is '. Information about rendition (bold, italic, etc.) is given only if it applies to the whole paragraph. The possible rendition alternatives for a paragraph are the following:
The SGML files contain the entities listed in this table