This subcorpus contains texts from the internet archive of the journal Agraarteadus, www.eau.ee/~aps/index.pp?AGRAARTEADUS (ca. 298 000 words altogether). The corpus contains the issues from the years 2001-2006, except for some articles from the years 2002-2003.
The corpus may be used free of charge, for non-commercial purposes only.
Mark-up and annotation conform to the TEI Guidelines. One file contains one year’s issues of the journal.
Every file begins with a header <teiheader> that contains information about the file size, the tags used, etc.
The rest of the file is structured as follows:
<div0> contains one year’s issues of the journal, e.g. <div0 type='aasta'><head>Agraarteadus 2001</head>
<div1> is one issue of the journal, e.g. <div1 type='number'><head>2001 Nr 2</head>
<div2> is an article, e.g. <div2 type='artikkel'><head>EESTI HOLSTEINI GENEETILISE SELEKTSIOONIEDU MAJANDUSLIK VÄÄRTUS</head>
Also marked up are paragraphs <p>, sentences <s>, headlines <head> and authors <bibl><author>.
Non-textual material has been omitted, and its place is marked with <gap desc='description_of_the_omitted_material'>. By non-textual material we mean pictures (photos, drawings, diagrams, etc.), tables, lists of references, etc. Longer non-Estonian passages, usually the English summaries of the articles, have also been omitted.
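The nesting described above (div0 for a year, div1 for an issue, div2 for an article) can be walked with a few lines of Python. The sketch below is only an illustration: it assumes that a file can be parsed as well-formed XML, which may not hold for the actual corpus files, and it runs on a made-up miniature sample. The wrapper element <doc>, the author name, the two sentences and the gap description are placeholders, not corpus data.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature file that follows the structure described above.
# <doc>, the author, the sentences and the gap description are placeholders.
SAMPLE = """<doc>
<teiheader><note>header information</note></teiheader>
<div0 type='aasta'><head>Agraarteadus 2001</head>
<div1 type='number'><head>2001 Nr 2</head>
<div2 type='artikkel'><head>EESTI HOLSTEINI GENEETILISE SELEKTSIOONIEDU MAJANDUSLIK VÄÄRTUS</head>
<bibl><author>A. Autor</author></bibl>
<p><s>Esimene lause.</s> <s>Teine lause.</s></p>
<gap desc='tabel'/>
</div2>
</div1>
</div0>
</doc>"""


def list_articles(xml_text):
    """Return one dict per article (div2) with its year, issue and basic counts."""
    root = ET.fromstring(xml_text)
    rows = []
    for year in root.iter("div0"):
        year_title = (year.findtext("head") or "").strip()
        for issue in year.iter("div1"):
            issue_title = (issue.findtext("head") or "").strip()
            for article in issue.iter("div2"):
                rows.append({
                    "year": year_title,
                    "issue": issue_title,
                    "title": (article.findtext("head") or "").strip(),
                    "authors": [a.text.strip() for a in article.iter("author") if a.text],
                    "sentences": sum(1 for _ in article.iter("s")),
                    "gaps": [g.get("desc") for g in article.iter("gap")],
                })
    return rows


if __name__ == "__main__":
    for row in list_articles(SAMPLE):
        print(row["year"], "|", row["issue"], "|", row["title"])
        print("  authors:", row["authors"])
        print("  sentences:", row["sentences"], "omitted:", row["gaps"])
```

If the real files turn out not to be well-formed XML, a more tolerant parser would be needed in place of xml.etree.ElementTree.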
In the version of the corpus that can be accessed via our corpus query, all mark-up except the <gap> tags used for the omitted material has been deleted.
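A comparable reduction can be approximated on the marked-up files with a regular expression. This is a minimal sketch under the assumption that tags never contain a literal ">" inside attribute values; it is not the procedure actually used to build the query version, and the function name and sample string are made up for illustration.

```python
import re

# Matches any opening, closing or self-closing tag whose name is not "gap".
NON_GAP_TAG = re.compile(r"</?(?!gap\b)[A-Za-z][^>]*>")


def strip_markup_keep_gap(text):
    """Delete every tag except <gap ...>, leaving the text content in place."""
    return NON_GAP_TAG.sub("", text)


if __name__ == "__main__":
    # Placeholder sentence content, not taken from the corpus.
    sample = "<p><s>Tere.</s> <gap desc='tabel'> <s>Veel.</s></p>"
    print(strip_markup_keep_gap(sample))
```

The negative lookahead (?!gap\b) is what spares the <gap> tags; a tag whose name merely begins with "gap" would still be removed.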