Nijmegen Corpus of Spanish English

Corpus contents

The Nijmegen Corpus of Spanish English (NCSE) was created to provide high-quality sound recordings of English spoken by Spanish speakers in conversation with Dutch speakers of English, in both a formal and an informal speech situation. Its core characteristics are:

  • The NCSE contains about 38.5 hours of speech.
    • The informal speech recordings total about 25 hours, of which about 15 hours were produced by the Spanish speakers.
    • The formal speech recordings total about 13 hours, of which about 9.5 hours were produced by the Spanish speakers.
  • The Spanish speakers in the NCSE produced 229,415 word tokens and 6,411 word types.
  • The NCSE contains high-quality recordings captured with head-mounted microphones in a sound-attenuated room.
  • The NCSE contains speech from 34 speakers (17 male and 17 female). All speakers were recorded both in an informal, peer-to-peer conversation and in a formal interview.
  • The NCSE contains large amounts of speech data for every speaker. The average total duration of the recordings of both speech situations (the formal and informal parts together) is just under 70 minutes per speaker.
  • The NCSE also contains video data, capturing a frontal view of the Spanish speakers and a side view of the Dutch speakers.
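
The figures in the list above can be cross-checked with a little arithmetic. The following Python sketch is only an illustration: it uses the rounded values quoted above, the function name summarize is hypothetical, and the derived numbers are approximations rather than official corpus statistics.

```python
# Rounded figures quoted in the list above (all durations are approximate).
TOTAL_HOURS = 38.5            # total speech in the corpus
INFORMAL_SPANISH_HOURS = 15.0 # informal speech by the Spanish speakers
FORMAL_SPANISH_HOURS = 9.5    # formal speech by the Spanish speakers
SPEAKERS = 34                 # speakers recorded in both situations
TOKENS = 229_415              # word tokens produced by the Spanish speakers
TYPES = 6_411                 # word types produced by the Spanish speakers


def summarize():
    """Derive a few summary statistics from the quoted figures."""
    spanish_hours = INFORMAL_SPANISH_HOURS + FORMAL_SPANISH_HOURS
    return {
        # Average recording time per speaker, both situations together.
        "minutes_per_speaker": TOTAL_HOURS * 60 / SPEAKERS,
        # Speech produced by the Spanish speakers, in hours.
        "spanish_hours": spanish_hours,
        # Share of the total speech produced by the Spanish speakers.
        "spanish_share": spanish_hours / TOTAL_HOURS,
        # Type-token ratio of the Spanish speakers' speech.
        "type_token_ratio": TYPES / TOKENS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.3f}")
```

With these inputs, the per-speaker average comes out just under 70 minutes, in line with the figure stated above. Note that the informal and formal totals (about 25 and 13 hours) sum to slightly less than 38.5 hours because the quoted durations are rounded.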

[Screenshot: the informal speech situation]

[Screenshot: the formal speech situation]