This subcorpus contains texts from the popular science magazine Horisont, 260 000 words in total. It covers issues from the years 1996-2003: 230 articles in 7 files. The texts were taken from the website http://www.horisont.ee (09.10.2003).
The corpus may be used freely, but for non-commercial purposes only.
Mark-up and annotation conform to the TEI Guidelines.
Every file begins with a header <teiheader> that contains information about the file size, the tags used, etc.
The rest of the file is structured as follows:
the top-level division <div0>, e.g. <div0 type='ajakiri'> <head>Horisont, 1996</head>
one article in the journal with <div1>, e.g. <div1 type='artikkel'> <head>OLEN NOOR VÕI OLEN VANA?</head>
paragraphs <p> and sentences <s>, headlines <head> and authors <bibl><author>.
Omitted non-textual material is marked with <gap desc='description_of_the_omitted_material'>. By non-textual material we mean pictures (photos, drawings, diagrams, etc.), tables, etc.
SGML files contain the entities listed in this table.
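The element structure described above can be walked with a tolerant tag parser. The following is a minimal sketch, not an official tool of the corpus: it uses Python's standard html.parser module (an assumption; any SGML-aware parser would do) to collect the headline of every <div1 type='artikkel'>. The class name ArticleTitleParser, the function article_titles and the sample string are hypothetical, and the closing tags in the sample are assumed rather than taken from a real corpus file.

```python
from html.parser import HTMLParser


class ArticleTitleParser(HTMLParser):
    """Collect the <head> text of every <div1 type='artikkel'> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._expect_title = False  # entered an article, its <head> not seen yet
        self._in_title = False      # currently inside that <head>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "div1" and dict(attrs).get("type") == "artikkel":
            self._expect_title = True
        elif tag == "head" and self._expect_title:
            self._expect_title = False
            self._in_title = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_title:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "head" and self._in_title:
            self._in_title = False
            # Collapse runs of whitespace inside the headline.
            self.titles.append(" ".join("".join(self._buffer).split()))


def article_titles(sgml_text):
    """Return the headlines of all articles found in sgml_text."""
    parser = ArticleTitleParser()
    parser.feed(sgml_text)
    parser.close()
    return parser.titles


# Hypothetical fragment modelled on the examples above; closing tags are assumed.
sample = """
<div0 type='ajakiri'>
<head> Horisont , 1996 </head>
<div1 type='artikkel'>
<head> OLEN NOOR VÕI OLEN VANA ? </head>
<p><s> ... </s></p>
</div1>
</div0>
"""

print(article_titles(sample))  # prints ['OLEN NOOR VÕI OLEN VANA ?']
```

The <head> of the <div0> is skipped on purpose, because only a headline that directly follows an article's <div1> is recorded. SGML entities that the parser does not recognise are passed through unchanged, so real files may need an extra entity-replacement step.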