Nijmegen Corpus of Casual French

The Nijmegen Corpus of Casual French contains 35 hours of high-quality recordings featuring 46 French speakers conversing among friends. The speech has been orthographically annotated by professional transcribers. The transcriptions are stored in Transcriber XML, Praat TextGrid, and ELAN .eaf formats.

The corpus is available to academic researchers. To obtain access, send a request by e-mail to the Radboud University Faculty of Arts data officer at dataofficer@let.ru.nl. The data officer will send you a data use agreement. Once the agreement has been signed, you will be granted access to the corpus.

A detailed description of the corpus is provided in:

Torreira, F., Adda-Decker, M., and Ernestus, M. (2010). The Nijmegen Corpus of Casual French. Speech Communication, 52:201-221.

This project was funded by a European Young Investigator Award to Mirjam Ernestus. The corpus was recorded by Francisco Torreira at the Laboratoire de Phonétique et Phonologie (UMR7018) in Paris in November 2008 as part of his dissertation work at Radboud University Nijmegen. The orthographic transcription was carried out in collaboration with Martine Adda-Decker (CNRS-LIMSI, France).