Nijmegen Corpus of Casual Spanish

Corpus collection

Work on the corpus began in March 2008. A group of university students was hired at the Universidad Politécnica de Madrid to act as confederates. Each confederate brought two friends to the recording session. The confederate and both friends met the following requirements:

  • They knew the two other participants in the recording well.
  • They were of the same sex as the two other participants in the recording.
  • They were university students in Madrid.
  • They had been raised in the Madrid region.
  • They reported not suffering from any pathology related to speech or hearing.

The recordings took place in a sound-attenuated booth at the Universidad Politécnica de Madrid, in sessions of around 90 minutes per group of participants. Each of the two naïve speakers in a conversation was recorded on a separate audio channel of a stereo signal, while the confederate was recorded separately in a mono audio stream. The participants were seated so that only the two naïve speakers were filmed at all times. This is illustrated in the following image:

[Image: example]
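Because each naïve speaker occupies one channel of the stereo signal, the two voices can be separated by deinterleaving the samples. The following is a minimal sketch of that step, not a description of the corpus's own tooling. It assumes the recordings are available as uncompressed PCM WAV files, which the text above does not state. The function names `split_stereo` and `split_stereo_wav` are hypothetical, and the mapping of left and right channel to a particular speaker is not specified here and would have to be checked against the corpus documentation.

```python
import wave
from typing import Tuple


def split_stereo(frames: bytes, sample_width: int = 2) -> Tuple[bytes, bytes]:
    """Deinterleave stereo PCM frames into (left, right) mono byte strings.

    Assumption: samples are interleaved as L, R, L, R, ... with
    `sample_width` bytes per sample, as in an uncompressed WAV file.
    """
    frame_size = 2 * sample_width
    if len(frames) % frame_size != 0:
        raise ValueError("byte length is not a whole number of stereo frames")
    left = bytearray()
    right = bytearray()
    for start in range(0, len(frames), frame_size):
        middle = start + sample_width
        left += frames[start:middle]                 # first sample of the frame
        right += frames[middle:start + frame_size]   # second sample of the frame
    return bytes(left), bytes(right)


def split_stereo_wav(source) -> Tuple[int, int, bytes, bytes]:
    """Read a stereo WAV file and return (sample_rate, sample_width, left, right).

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 2:
            raise ValueError("expected a two-channel (stereo) file")
        sample_rate = wav.getframerate()
        sample_width = wav.getsampwidth()
        frames = wav.readframes(wav.getnframes())
    left, right = split_stereo(frames, sample_width)
    return sample_rate, sample_width, left, right
```

Each returned byte string holds one speaker's samples and can be written back out as a mono file with the same sample rate and sample width.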

Casual speech was elicited in three parts. In Part 1, we pretended that the confederate's microphone was not working properly and asked the confederate to leave the room. This created an unexpected situation in which the naïve speakers could not be certain whether the recording had begun. The conversation that the two naïve speakers then held was recorded for 20 minutes. Part 2, which lasted around 35 minutes on average, consisted of a free conversation between the confederate and the two friends. In Part 3, participants chose three questions from a list of general-interest questions and negotiated a common position for their group. Part 3 also lasted 35 minutes on average.