Nijmegen Corpus of Spanish English

Corpus transcription

The corpus was orthographically transcribed in Praat. The Praat TextGrids have separate tiers for the two speakers and one tier for remarks, for example about background noise. The speech was manually segmented into chunks of a few seconds, delimited by natural pauses in the speech signal. The transcriptions use standard American English spelling. Contractions, such as don't, were written in full (do not). Certain speech tokens, for instance Spanish or Dutch words and truncated words, were marked with additional symbols (for example '*' for Spanish words and '-' for truncated words). Frequently recurring non-speech sounds, such as breaths and laughter, were transcribed between square brackets, for example [breath] and [laughter].
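As a rough illustration of how these conventions could be processed automatically, the following Python sketch labels the tokens of one transcribed chunk. It is not part of the corpus tooling. The function names are hypothetical, the example utterance is invented, and the position of the markers is an assumption: the sketch treats '*' attached to the start or end of a token as the Spanish marker and a trailing '-' as the truncation marker, which may differ from the actual corpus files.

```python
import re

# A token is either a bracketed non-speech sound, e.g. [breath],
# or a run of non-whitespace characters.
TOKEN_RE = re.compile(r"\[[^\]]+\]|\S+")


def classify_token(token):
    """Return a coarse label for a single transcription token."""
    if token.startswith("[") and token.endswith("]"):
        return "non-speech"
    # Assumption: '*' is attached to the start or end of a Spanish word.
    if token.startswith("*") or token.endswith("*"):
        return "spanish"
    # Assumption: '-' is attached to the end of a truncated word.
    if len(token) > 1 and token.endswith("-"):
        return "truncated"
    return "word"


def classify_chunk(text):
    """Split one transcribed chunk into (token, label) pairs."""
    return [(tok, classify_token(tok)) for tok in TOKEN_RE.findall(text)]


if __name__ == "__main__":
    # Invented example, not taken from the corpus.
    chunk = "I do not know *vale [laughter] it was a diff- a difficult day [breath]"
    for tok, label in classify_chunk(chunk):
        print(f"{tok}\t{label}")
```

A labelling step like this could be used, for instance, to exclude non-speech sounds and truncated words before counting word frequencies. The marker handling would need to be adjusted to the conventions actually found in the TextGrid files.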

The following screenshot illustrates the transcription of a short stretch of speech uttered in the informal situation:

The following screenshot illustrates the transcription of a short stretch of speech uttered in the formal situation: