Onsets and Vowel Identity

 

The importance of onsets in auditory grouping is illustrated here with the stimuli originally introduced by Darwin and Sutherland (1984) and later studied by Roberts and Holmes (2006, 2007). The sound examples used here are courtesy of Brian Roberts.

The following figure illustrates the basic setup. It shows the spectra of three sounds, each with the same pitch (and therefore the same set of harmonic frequencies) but with different relative levels of the partials. As a result, the peak of the spectral envelope lies at a different frequency in each sound, as indicated to the right of the figure. This peak is at the first formant frequency, F1 (see Chapter 4 of the book). The sounds also have spectral content at higher frequencies, but it is irrelevant to the rest of this illustration.
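
As a rough sketch of how such a family of sounds can be built, the Python code below computes harmonic amplitudes under a single spectral peak at F1 and then sums the harmonics. The 125 Hz fundamental, the resonance-shaped envelope with its 100 Hz bandwidth, and the helper names `harmonic_amplitudes` and `synthesize` are all assumptions made for illustration; the original stimuli were not necessarily generated this way.

```python
import math

def harmonic_amplitudes(f0, f1, n_harmonics, bandwidth=100.0):
    """Linear amplitudes of harmonics 1..n_harmonics of f0.

    Assumed envelope: one resonance-shaped peak at f1,
    1 / sqrt(1 + ((f - f1) / bandwidth) ** 2). Only F1 is modelled;
    higher formants are ignored.
    """
    return [1.0 / math.sqrt(1.0 + ((k * f0 - f1) / bandwidth) ** 2)
            for k in range(1, n_harmonics + 1)]

def synthesize(f0, amps, duration=0.05, fs=16000):
    """Sum of cosine harmonics of f0 with the given linear amplitudes."""
    n = round(duration * fs)
    return [sum(a * math.cos(2 * math.pi * (k + 1) * f0 * i / fs)
                for k, a in enumerate(amps))
            for i in range(n)]

# Same F0 (same harmonic frequencies), different F1: only the relative levels change.
for f1 in (375.0, 500.0):
    amps = harmonic_amplitudes(125.0, f1, n_harmonics=8)
    strongest = 125.0 * (1 + max(range(len(amps)), key=amps.__getitem__))
    print(f"F1 = {f1:.0f} Hz -> strongest harmonic at {strongest:.0f} Hz")
```

Moving F1 changes which harmonics dominate without changing any harmonic frequency, which is why the pitch stays the same while the vowel quality changes.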

Click on the radio button marked 'original' and then on 'start' to hear these sounds. Moving the diagonal slider changes the location of F1. These sounds are heard as a (rather artificial) vowel: when the first formant is low, the vowel is judged to be /I/-like, and when it is high, the vowel is more /e/-like. In between, the vowel is ambiguous. To experience this demonstration best, position the slider so that the vowel identity is ambiguous.

 

Next, try the modified sound examples. The first modification increases the level of the harmonic at 500 Hz. This shifts the perceived first formant frequency upward, so sounds that were previously judged to be /I/ take on a more /e/-like quality. To hear these stimuli, press the '500 Hz louder' radio button, and switch back and forth between the original and the modified sounds.
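
One way to see why boosting the 500 Hz harmonic pushes the percept towards /e/ is to use a crude proxy for the perceived F1: the amplitude-weighted mean frequency (centroid) of the harmonics in the F1 region. The sketch below assumes F0 = 125 Hz, an F1 of 425 Hz, a resonance-shaped envelope, a 200 to 800 Hz analysis region and a 6 dB boost; none of these values, nor the centroid measure itself, is taken from the original studies. It only illustrates the direction of the shift.

```python
import math

F0 = 125.0  # assumed fundamental, so 500 Hz is the 4th harmonic

def harmonic_amplitudes(f1, n_harmonics=8, bandwidth=100.0):
    """Map harmonic frequency -> linear amplitude under an assumed single peak at f1."""
    return {k * F0: 1.0 / math.sqrt(1.0 + ((k * F0 - f1) / bandwidth) ** 2)
            for k in range(1, n_harmonics + 1)}

def f1_centroid(amps, lo=200.0, hi=800.0):
    """Crude F1 proxy: amplitude-weighted mean frequency of harmonics in [lo, hi] Hz."""
    region = {f: a for f, a in amps.items() if lo <= f <= hi}
    return sum(f * a for f, a in region.items()) / sum(region.values())

def boost(amps, freq, gain_db):
    """Copy of amps with the harmonic at freq raised by gain_db decibels."""
    out = dict(amps)
    out[freq] *= 10 ** (gain_db / 20)
    return out

original = harmonic_amplitudes(f1=425.0)
louder = boost(original, 500.0, gain_db=6.0)
print(f"centroid: {f1_centroid(original):.1f} Hz -> {f1_centroid(louder):.1f} Hz")
```

Because 500 Hz lies above the centroid of the unmodified sound in this sketch, giving it more weight can only pull the estimate upward. The size of the shift here depends entirely on the assumed values; the size of the perceptual effect is an empirical matter.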

The second modification starts the 500 Hz harmonic before the rest of the sound. Because of its earlier onset, the 500 Hz harmonic fuses less with the rest of the sound, so its contribution to the vowel is reduced and the perceived quality moves back towards /I/. To hear these stimuli, press the '500 Hz louder, early onset' radio button.
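
The early-onset condition amounts to giving one component a different on/off schedule from the others. The sketch below builds such a stimulus from individually gated sinusoids with raised-cosine ramps. The 200 ms lead, the 50 ms vowel duration, the 10 ms ramps, the sample rate, the flat vowel amplitudes and the function names are hypothetical choices for illustration, not the parameters of the original stimuli.

```python
import math

def gated_tone(freq, amp, onset, offset, total, fs=16000, ramp=0.01):
    """Sinusoid present from onset to offset (s), with raised-cosine on/off ramps."""
    out = []
    for i in range(round(total * fs)):
        t = i / fs
        if t < onset or t >= offset:
            out.append(0.0)
            continue
        gain = 1.0
        if t - onset < ramp:        # rising ramp after onset
            gain = 0.5 - 0.5 * math.cos(math.pi * (t - onset) / ramp)
        elif offset - t < ramp:     # falling ramp before offset
            gain = 0.5 - 0.5 * math.cos(math.pi * (offset - t) / ramp)
        out.append(gain * amp * math.sin(2 * math.pi * freq * t))
    return out

def early_onset_stimulus(amps, lead=0.2, dur=0.05, fs=16000, early_freq=500.0):
    """Vowel harmonics start at `lead`; the harmonic at early_freq starts at 0.

    All components end together, so the only difference is the onset time.
    """
    total = lead + dur
    mix = [0.0] * round(total * fs)
    for freq, amp in amps.items():
        onset = 0.0 if freq == early_freq else lead
        comp = gated_tone(freq, amp, onset, total, total, fs)
        mix = [m + c for m, c in zip(mix, comp)]
    return mix

vowel = {125.0 * k: 1.0 for k in range(1, 9)}  # hypothetical flat amplitudes
stimulus = early_onset_stimulus(vowel)
```

During the first `lead` seconds the stimulus contains only the 500 Hz harmonic; after that it contains the full vowel, including the same 500 Hz harmonic continuing without a break.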

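Finally, a 1000 Hz tone is added that starts together with the 500 Hz tone and stops at the onset of the rest of the vowel. This 'captor tone' is expected to fuse with the leading portion of the 500 Hz tone. Because the captor stops at vowel onset, the lead-in and the captor can be heard as a separate sound that ends when the vowel begins, leaving the continuing 500 Hz harmonic free to group with the vowel again. The captor is therefore expected to reverse the effect of the early onset of the 500 Hz tone and to move the perceived quality of the sounds back towards /e/. To hear these stimuli, press the '500 Hz, early onset, captured' radio button.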
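
The four demonstration conditions can be summarised as schedules of (frequency, amplitude, onset, offset) components. In the sketch below, the condition names, the 125 Hz fundamental, the 200 ms lead, the 50 ms vowel, the 6 dB boost, the flat vowel amplitudes and the captor amplitude are all hypothetical, and on/off ramps are omitted for brevity. Rendering the schedules shows that, from vowel onset on, the three modified conditions produce identical samples: they differ only in what happens before the vowel starts.

```python
import math

F0 = 125.0              # assumed fundamental (Hz)
LEAD, DUR = 0.2, 0.05   # assumed lead time and vowel duration (s)
BOOST = 10 ** (6 / 20)  # assumed +6 dB on the 500 Hz harmonic

def schedule(condition, amps):
    """(freq, amp, onset, offset) tuples for 'original', 'louder', 'early' or 'captured'."""
    end = LEAD + DUR
    comps = []
    for f, a in amps.items():
        onset = LEAD
        if f == 500.0 and condition != "original":
            a *= BOOST
            if condition in ("early", "captured"):
                onset = 0.0  # 500 Hz harmonic leads the vowel
        comps.append((f, a, onset, end))
    if condition == "captured":
        comps.append((1000.0, 1.0, 0.0, LEAD))  # captor tone: stops at vowel onset
    return comps

def render(comps, fs=16000):
    """Sum of sinusoids, each present while onset <= t < offset (no ramps)."""
    end = max(off for _, _, _, off in comps)
    return [sum(a * math.sin(2 * math.pi * f * (i / fs))
                for f, a, on, off in comps if on <= i / fs < off)
            for i in range(round(end * fs))]

vowel = {F0 * k: 1.0 for k in range(1, 9)}  # hypothetical flat amplitudes
signals = {c: render(schedule(c, vowel)) for c in ("louder", "early", "captured")}
start = round(LEAD * 16000)
print(signals["louder"][start:] == signals["early"][start:] == signals["captured"][start:])  # prints True
```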

The remarkable aspect of this experiment is that the vowel is the same in all three modified stimuli, yet its perceived quality changes with the context in which it appears. As discussed in Chapter 6 of the book, the reason for this context effect is unclear: it may be due to the properties of neurons in early stages of the auditory system, or it may reflect a high-level interpretation of the sounds by a 'scene analyzer'.