Nijmegen Corpus of Casual Czech

The Nijmegen Corpus of Casual Czech contains 30 hours of high-quality recordings of 60 Czech speakers conversing among friends. The speech has been orthographically transcribed, and the transcriptions are stored in Transcriber XML files.
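For readers unfamiliar with the Transcriber format, the following is a minimal sketch of how time-aligned segments could be read from such a file with Python's standard library. The sample document, the speaker ids, and the helper name `read_segments` are made up for illustration and are not taken from the corpus. The sketch assumes the usual Transcriber layout (`Turn` elements containing `Sync` time stamps followed by text) and ignores other elements such as events or comments, so the actual corpus files may need additional handling.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature Transcriber document; NOT taken from the corpus.
SAMPLE_TRS = """<Trans version="1" audio_filename="example">
  <Speakers>
    <Speaker id="spk1" name="A"/>
    <Speaker id="spk2" name="B"/>
  </Speakers>
  <Episode>
    <Section type="report" startTime="0" endTime="5.0">
      <Turn speaker="spk1" startTime="0" endTime="2.5">
        <Sync time="0"/>
        ahoj jak se máš
      </Turn>
      <Turn speaker="spk2" startTime="2.5" endTime="5.0">
        <Sync time="2.5"/>
        dobře
        <Sync time="4.0"/>
        a ty
      </Turn>
    </Section>
  </Episode>
</Trans>
"""


def read_segments(trs_text):
    """Return a list of (start_time, speaker_id, text) tuples.

    Assumption: the text of each segment directly follows its <Sync/>
    element, so it is available as that element's tail.
    """
    root = ET.fromstring(trs_text)
    segments = []
    for turn in root.iter("Turn"):
        speaker = turn.get("speaker")
        for sync in turn.iter("Sync"):
            # Collapse the indentation and line breaks around the text.
            text = " ".join((sync.tail or "").split())
            if text:
                segments.append((float(sync.get("time")), speaker, text))
    return segments


for start, speaker, text in read_segments(SAMPLE_TRS):
    print(f"{start:6.2f}  {speaker}  {text}")
```

To read a file from disk instead of a string, `ET.parse(path).getroot()` can replace the `ET.fromstring` call.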

The corpus is available to academic researchers. If you would like to obtain a copy of the corpus, please contact Rian Zondervan by e-mail (

A detailed description of the corpus is provided in:

  • L. Kočková-Amortová, P. Pollák, J. Rajnoha, & M. Ernestus (2014). The Nijmegen corpus of casual Czech. In Proceedings of LREC 2014: 9th International Conference on Language Resources and Evaluation, pages 365-370.

This project is funded by a European Young Investigator Award to Mirjam Ernestus and by two Czech grants (GACR 102/08/0707 and CTU SGS 14/191/OHK3/3T/13). The corpus was recorded at the Phonetic Institute of Charles University in Prague in autumn 2008. The orthographic transcription was carried out at the Faculty of Electrical Engineering of the Czech Technical University in Prague under the direction of Petr Pollák and Josef Rajnoha.