Nijmegen Corpus of Casual Czech

The Nijmegen Corpus of Casual Czech contains 30 hours of high-quality recordings of 60 Czech speakers conversing among friends. The speech has been orthographically transcribed, and the transcriptions are stored in Transcriber XML files.
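As a rough illustration of how such transcription files might be read, the sketch below pulls speaker turns out of a Transcriber-style XML document using Python's standard library. It assumes the element and attribute names commonly used by the Transcriber format (`Turn`, `Sync`, `speaker`, `startTime`, `endTime`); the files in this corpus may differ in detail. The function name `read_turns` and the sample transcript are invented for the example and are not taken from the corpus.

```python
import io
import xml.etree.ElementTree as ET


def read_turns(source):
    """Return (speaker, start, end, text) tuples from a Transcriber-style file.

    `source` may be a file path or an open file object.
    """
    root = ET.parse(source).getroot()
    turns = []
    for turn in root.iter("Turn"):
        # The transcribed words sit between <Sync/> markers inside the turn;
        # itertext() gathers them, and split/join normalises the whitespace.
        text = " ".join("".join(turn.itertext()).split())
        turns.append((
            turn.get("speaker", ""),
            float(turn.get("startTime", "0")),
            float(turn.get("endTime", "0")),
            text,
        ))
    return turns


# Invented sample, only to show the assumed structure.
SAMPLE = """<Trans>
<Episode>
<Section type="report" startTime="0" endTime="4.5">
<Turn speaker="spk1" startTime="0" endTime="2.1">
<Sync time="0"/>
ahoj
<Sync time="1.2"/>
jak se máš
</Turn>
<Turn speaker="spk2" startTime="2.1" endTime="4.5">
<Sync time="2.1"/>
dobře
</Turn>
</Section>
</Episode>
</Trans>"""

for speaker, start, end, text in read_turns(io.StringIO(SAMPLE)):
    print(speaker, start, end, text)
```

For a real file, a path can be passed instead of the in-memory sample, for example `read_turns("recording.trs")`, where the file name is hypothetical.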

The corpus is available to academic researchers. To obtain access, send a request by e-mail to the Radboud University Faculty of Arts data officer at dataofficer@let.ru.nl. The data officer will send you a data use agreement. Once the agreement has been signed, you will be granted access to the corpus.

A detailed description of the corpus is provided in:

L. Kočková-Amortová, P. Pollák, J. Rajnoha, & M. Ernestus (2014). The Nijmegen corpus of casual Czech. In Proceedings of LREC 2014: 9th International Conference on Language Resources and Evaluation, pages 365-370.

This project was funded by a European Young Investigator Award to Mirjam Ernestus and by two Czech grants (GACR 102/08/0707 and CTU SGS 14/191/OHK3/3T/13). The corpus was recorded at the Phonetic Institute of Charles University in Prague in autumn 2008. The orthographic transcription was carried out at the Faculty of Electrical Engineering of the Czech Technical University in Prague under the direction of Petr Pollák and Josef Rajnoha.