The corpus was created at the University of Tokyo under the guidance of Prof. Kazuto Matsumura.
It consists of transcripts of the sessions of the Asutaw Kogu (Constituent Assembly) held in 1919-1920, totalling approximately 2 million words.
The corpus may be used freely, but for non-commercial purposes only.
Format: XML, UTF-8 encoded.
The text is divided into paragraphs (<p>) and sentences (<s>). The sentences are numbered.
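A minimal sketch of how the paragraph and sentence structure could be read with Python's standard xml.etree.ElementTree module. It assumes that each <s> element sits inside a <p> element and that the sentence number is stored in an attribute called "n"; the attribute name, the root element <text> and the sample sentences are hypothetical and should be checked against the actual corpus files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the described layout: <p> paragraphs that
# contain numbered <s> sentences. The root element <text> and the attribute
# "n" are assumptions; the sentences are placeholders, not corpus data.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<text>
  <p>
    <s n="1">First sentence.</s>
    <s n="2">Second sentence.</s>
  </p>
  <p>
    <s n="3">Third sentence.</s>
  </p>
</text>"""


def load(path):
    """Parse a corpus file; ET.parse reads bytes and honours the UTF-8 declaration."""
    return ET.parse(path).getroot()


def iter_sentences(root):
    """Yield (paragraph_index, sentence_number, text) for every <s> in every <p>."""
    for p_idx, p in enumerate(root.iter("p"), start=1):
        for s in p.iter("s"):
            # "n" is an assumed name for the sentence-number attribute;
            # get() returns None if the attribute is absent.
            text = "".join(s.itertext()).strip()
            yield p_idx, s.get("n"), text


if __name__ == "__main__":
    # Encode to bytes so the XML declaration's encoding is applied by the parser.
    root = ET.fromstring(SAMPLE.encode("utf-8"))
    for p_idx, n, text in iter_sentences(root):
        print(p_idx, n, text)
```

For a real file, `load("some_file.xml")` (a hypothetical path) would replace the in-memory sample. Joining `itertext()` rather than reading `.text` keeps the sentence text intact even if a sentence contains further inline markup.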