
Reference corpus: (SL) Õhtuleht

Content

This corpus contains the issues of the daily newspaper Õhtuleht / SL Õhtuleht from 6 March 1997 to 31 December 2007: 3,344 issues in total, comprising 45,572,699 tokens.

The corpus may be used free of charge, for non-commercial purposes only.

These texts form part of the planned reference corpus of Estonian. Their collection and processing were financed by the national programme "Estonian Language and National Culture".

Sources

The texts originate from http://www.ohtuleht.ee

The texts have been semi-automatically downloaded from the internet and converted from HTML to SGML (TEI). The conversion programs were written by Krista Liin.

Each file contains one issue. Non-textual parts (e.g. pictures, comic strips) have been omitted, as have TV programmes, all advertisements, hyperlinks, tables (e.g. sports results, currency exchange rates), etc. Omitted material (excluding pictures) is replaced by a tag <gap desc='description_of_omission'>.
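
As an illustration of how the omission markers might be handled when processing an issue file, the following is a minimal Python sketch, not part of the corpus tools. It assumes that the gap tag is written with straight single or double quotes around the desc value and has no closing tag; the function names, the sample text and the description values in it are hypothetical.

```python
import re
from collections import Counter

# Assumption: omissions look like <gap desc='...'> or <gap desc="...">.
# SGML element names are case-insensitive, so matching ignores case.
GAP_RE = re.compile(r"<gap\s+desc\s*=\s*([\"'])(.*?)\1\s*>", re.IGNORECASE)


def count_gaps(sgml_text):
    """Return a Counter mapping each omission description to its frequency."""
    return Counter(match.group(2) for match in GAP_RE.finditer(sgml_text))


def strip_gaps(sgml_text):
    """Return the text with all gap tags removed."""
    return GAP_RE.sub("", sgml_text)


# Hypothetical fragment of one issue; real files follow the corpus markup.
sample = (
    "<p>Esimene lõik.</p>\n"
    "<gap desc='table'>\n"
    "<p>Teine lõik.</p>\n"
    '<gap desc="advertisement">\n'
    "<gap desc='table'>\n"
)

gap_counts = count_gaps(sample)
for description, count in gap_counts.most_common():
    print(description, count)
```

Counting the descriptions gives a rough picture of how much material of each kind was left out of an issue, while strip_gaps can be used when only the running text is needed.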

Markup

(See the Estonian page)

SGML-entities

The SGML files contain the entities listed in this table.


Last modified: December 21 2018 18:16:25.