Pitch is that quality of sound which allows us to play musical melodies. Naive introductory neuroscience texts tend to equate pitch with frequency. However, most real-world sounds contain many frequency components, but have either only one clear pitch, or no pitch at all. Chapter 3 of "Auditory Neuroscience" describes the characteristics of sound that determine pitch, outlines how musical pitch is treated in Western classical music, and describes how the brain is thought to extract physical cues to the periodicity of a sound to create the subjective percept of pitch. The following collection of web pages provides supplementary materials to accompany that chapter.
Many different sounds have the same pitch - you can play the same melody with a flute or with a clarinet or with a horn. Here, you can hear the same melody played on three computer-generated instruments.
The melodies in these examples share the same sequence of pitches, and have about the same sound level. The property of sounds that is different between them is called 'timbre' - the timbre of the flute is different from that of the clarinet and from that of the horn.
Here are the spectra of the three versions:
Note the large differences between the spectra, as well as the similarities. The sequence of pitches is to a large extent apparent in the line of the fundamental (the lowest red band). However, the number and relative strengths of the higher harmonics vary substantially between the three instruments, resulting in their unique timbre.
The main determinant of pitch is sound periodicity. A sound is periodic when it is composed of consecutive repetitions of a single short segment (the 'period'). The following figure (fig. 3.1 "Auditory Neuroscience") shows examples of periodic sounds:
Only a small number of repetitions of a period are required to generate the perception of pitch. The following sounds are all composed of the same period, which repeats itself a different number of times in each example. Each such sound is repeated a few times.
A single repeat results in no pitch sensation at all - there is nothing periodic:
With eight repeats, a clear pitch is heard:
Find out yourself what is the minimal number of repeats that is necessary in order to hear pitch!
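If you would like to build such stimuli yourself, the idea can be sketched in a few lines of Python. The period used here (a 10 ms waveform made of two sine components) and the 44.1 kHz sample rate are assumptions chosen for illustration; they are not the stimuli used on this page.

```python
import math

SAMPLE_RATE = 44100      # samples per second (assumed)
PERIOD_SAMPLES = 441     # 441 samples at 44.1 kHz = one 10 ms period (assumed)

def make_period(n=PERIOD_SAMPLES):
    """Return one period of an arbitrary waveform as a list of samples."""
    # Two sine components that each complete whole cycles within the period.
    return [math.sin(2 * math.pi * i / n) + 0.5 * math.sin(2 * math.pi * 3 * i / n)
            for i in range(n)]

def repeat_period(period, n_repeats):
    """Concatenate n_repeats copies of the same period."""
    return period * n_repeats

period = make_period()
sounds = {n: repeat_period(period, n) for n in (1, 2, 4, 8)}
for n, samples in sounds.items():
    print(n, "repeats last", 1000 * len(samples) / SAMPLE_RATE, "ms")
```

Writing each list to a sound file and listening to it is left out here to keep the sketch short.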
Pitch is defined by its perceptual qualities, and therefore has to be determined by the judgment of human listeners. By convention, we use the pitch evoked by pure tones as a yardstick with respect to which we judge the pitch evoked by other sounds. In practical terms, this is performed by matching experiments: A periodic sound whose pitch we want to measure is presented alternately with a pure tone. Listeners are asked to change the frequency of the pure tone until it evokes the same pitch as the periodic sound. The frequency of the matching pure tone then serves as a quantitative measure of the pitch of the tested periodic sound. In such experiments, subjects most often set the pure tone so that its period is equal to the period of the test sound. In the demonstrations below, the sound to be tested is played four times alternately with a pure tone. The test sound is the same at all repetitions, but the period of the pure tone changes from repetition to repetition: it is the same as the period of the test sound in the first repetition, shorter (higher pitch) in the second, longer (lower pitch) in the third, and is again the same as that of the test sound in the last repetition.
Here is the demonstration, with the test sound being a cosine-phase harmonic complex at 400 Hz:
Here is the demonstration, with the test sound being an Iterated Repeated Noise (IRN) with 4 iterations, again at 400 Hz:
Now you are ready to try pitch matching by yourself. In the following gadget, you can select the type of pitch-evoking sound and try to match it by moving the slider:
Regular click trains at a rate of less than about 40 Hz sound like individual regular events, perhaps a bit like machine-gun fire. Click trains with rates faster than about 40 Hz merge into a continuous "buzz", where the pitch of the buzz depends on the click rate: the faster the rate, the higher the pitch.
The two sound examples below illustrate this. The first example consists of a click train, where the rate of the clicks doubles every 3 seconds. During the first 3 seconds, the rate of the clicks is about 10.7 Hz, then it increases to 21.4 Hz, 43 Hz, 86 Hz, 172 Hz, 344 Hz, 689 Hz, 1378 Hz, 2756 Hz, and 5512 Hz. Clear pitch emerges between 43 Hz and 86 Hz, although at 86 Hz there is still 'flutter', and full smoothness of the resulting percept occurs only at rates of a few hundred Hz.
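The rates listed for the first example are roughly what you get if the interval between clicks halves from 4096 samples down to 8 samples at a sample rate of 44.1 kHz. Both the sample rate and the intervals are an assumption inferred from the numbers, not something stated on this page. A short Python check:

```python
SAMPLE_RATE = 44100  # Hz (assumed)

# Inter-click intervals in samples: 4096, 2048, ..., 8 (assumed)
intervals = [4096 // 2**k for k in range(10)]
rates = [SAMPLE_RATE / n for n in intervals]

for n, rate in zip(intervals, rates):
    print(f"{n:5d} samples between clicks -> {rate:8.1f} Hz")
```

Each step halves the interval and therefore doubles the click rate, i.e. raises it by one octave.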
The second example consists of clicks which are presented initially at a rate of about 10 Hz. The rate then continuously speeds up, to a rate of 10,000 Hz, and then slows down again. At the beginning and the end you will hear a rapid train of individual clicks, but in the middle you hear a buzz of rapidly rising, then falling, pitch. The sequence is illustrated in the figure below.
One of the most important classes of sounds that have pitch in the natural environment are voiced speech sounds. However, like many other naturally-produced sounds, these sounds are not strictly periodic. In spite of this, they produce a strong sense of pitch. Sounds that are not strictly periodic but that do evoke pitch are in fact the rule, rather than the exception.
Here is a naturally-produced human vowel:
Elliot and Theunissen addressed this question by calculating the "modulation spectra" of speech as shown here:
The spectrogram of the vowel is presented on the left below, while on the right three consecutive periods are shown superimposed. Each of the periods has a length of about 6.75 ms - the pitch here is about 150 Hz. The periods have been extracted from around time 50 ms in the spectrogram. Note the small, but continuously present, changes in the sound, which can be observed both by following it period by period (right) and at the longer time scale, of 100s of ms, which can be followed in the spectrogram.
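The relation between the period and the pitch quoted above is simply frequency = 1 / period. As a quick check in Python:

```python
period_ms = 6.75                      # period length read off the figure
frequency_hz = 1000.0 / period_ms     # periods per second
print(round(frequency_hz, 1))         # a little under 150 Hz
```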
These micromodulations are crucial for the natural quality of the sound. When the micromodulations are removed, making the sound strictly periodic, it becomes buzzy and artificial, losing much of its speech quality:
Additional information about modulations in speech can be found in 'speech as a "modulated signal"'.
As explained in the section on modes of vibration, most natural sound sources will not emit pure tones, but sounds composed of many, often harmonically related, frequencies. Now, some texts on hearing will tell you that the pitch of a sound is "related to the sound's frequency", but if a sound contains many (possibly harmonically related) frequencies then it may not be at all obvious which of the sound's frequencies determines the pitch.
Take the following example. Here we have a simple melody played in pure tones.
and here we have the same melody played using "complex" tones containing sine waves with the same fundamental frequency as well as 9 additional "higher harmonics" (multiples of the fundamental).
Spectrograms of these sounds are shown in the picture below.
Now it should be obvious that the melodies (and hence the pitches) are the same, even if the overall frequency content, and therefore the "timbre", of these sounds is different. These two sounds do however share the same fundamental frequency: the second sound is literally the sum of the first one plus the contributions of the higher harmonics. So it may be tempting to think that it is the presence of the fundamental frequency that determines the pitch.
The curious thing is that you can take this fundamental frequency out, leaving only the higher harmonics, and the pitch still remains the same. This is what we have done in the following sound (in fact, for good measure, we took out not just the fundamental but the next two harmonics too, so that only harmonics 4 to 10 are left; when listening, turn down the volume to reduce the harmonic distortions produced by the speakers, which would reintroduce the lower harmonics!).
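A numerical sketch may help to show what survives when the fundamental is removed. The Python code below builds a complex from harmonics 4 to 10 only and checks whether the waveform matches a shifted copy of itself. The 300 Hz fundamental and the equal-amplitude cosine components are assumptions made for illustration.

```python
import math

F0 = 300.0                  # example fundamental in Hz (assumed)
HARMONICS = range(4, 11)    # harmonics 4 to 10; the fundamental is absent

def tone(t):
    """Missing-fundamental complex: equal-amplitude cosine harmonics (assumed)."""
    return sum(math.cos(2 * math.pi * k * F0 * t) for k in HARMONICS)

def mismatch(shift_s, n_points=200, span_s=0.01):
    """Largest absolute difference between the tone and a shifted copy of it."""
    times = [span_s * i / n_points for i in range(n_points)]
    return max(abs(tone(t) - tone(t + shift_s)) for t in times)

period = 1.0 / F0
print(mismatch(period))       # essentially zero: the waveform repeats every 1/300 s
print(mismatch(period / 2))   # clearly non-zero: it does not repeat at half that period
```

So the waveform still repeats at the rate of the fundamental even though no component at that frequency is present.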
The first melody and the third melody have no frequency components in common. You can verify that on the spectrograms shown here below to the left. Nevertheless they are 'the same' in that they have the same sequence of pitches.
"Missing fundamental" stimuli, like the third melody, argue strongly against a simple frequency place code in the cochlea as the main cue for perceived pitch. A more reliable cue is the sound's periodicity. The panels on the right in the figure above zoom in on a 10 ms long sound snippet during the first note for each of the melodies here. This note has a fundamental of 300 Hz, so there are three whole cycles in 10 ms. This 300 Hz periodicity (identical waveform patterns repeating themselves at a rate of 300 such patterns per second) is something that the first note in all three of the examples here clearly has in common, even though the corresponding sound in the third melody does not contain a 300 Hz (Fourier) frequency component. It has the right periodicity because all its frequency components repeat exactly after one 300 Hz period (1/300 s), but they do not share any shorter period.
The fact that tone complexes with missing fundamentals can be perceived to have a pitch that is below their lowest frequency component can have counterintuitive consequences.
Consider the tone sequence shown in the spectrogram here:
A 700 Hz pure tone (labelled "A") alternates with a tone complex containing frequencies 800, 1200, 1600 and 2000 Hz (labelled "B"). All frequencies in B are above the frequency of A, but nevertheless when you listen to these stimuli you will find that B sounds lower because the pitch of B is perceived to be that of a missing 400 Hz fundamental.
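The missing fundamental of B can be found as the greatest common divisor of its component frequencies. The Python lines below are only an arithmetic check of the numbers in this example; the function name is hypothetical, and a greatest common divisor is of course not a model of how listeners or neurons extract pitch.

```python
from functools import reduce
from math import gcd

tone_a = [700]                       # pure tone A (Hz)
tone_b = [800, 1200, 1600, 2000]     # components of complex B (Hz)

def common_fundamental(frequencies_hz):
    """Greatest common divisor of integer-valued component frequencies."""
    return reduce(gcd, frequencies_hz)

print(common_fundamental(tone_b))    # 400
print(min(tone_b) > max(tone_a))     # True: every component of B lies above A
```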
Or here is another example:
The sounds in this spectrogram are harmonic complexes with a fundamental frequency that rises from 110 to 220 Hz in semitone steps. So the pitch should be rising from the musical notes A2 to A3. But the harmonic complexes have been bandpass filtered to be 3.5 octaves wide with the lower edge of their passband falling from 880 Hz to 440 Hz. So the frequency components become increasingly lower, but the pitch should get higher, at least in as far as harmonic structure is the dominant pitch cue. Do the pitches sound falling or rising to you?
Here are examples of three sounds that evoke pitch without being strictly periodic. A detailed discussion of these sounds can be found in the pitch chapter of the book.
Each of these sounds is approximately periodic, and their spectra have an approximately periodic structure, reminiscent of the strictly harmonic structure of periodic sounds. In each case, there is a clear period. If the sound is shifted by that period, it best resembles its unshifted version.
Harmonic complex with noise
Iterated repeated noise (IRN)
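The statement above, that each of these sounds best resembles itself when shifted by its period, can be tried out numerically. The Python sketch below generates noise by repeated delay-and-add, which is assumed here to be how such iterated noise is constructed, and then searches for the shift with the highest autocorrelation. The sample rate, the circular delay and the search range are also assumptions made to keep the example short.

```python
import random

SAMPLE_RATE = 16000          # Hz (assumed)
DELAY = SAMPLE_RATE // 400   # 40 samples = 2.5 ms, i.e. a 400 Hz periodicity
ITERATIONS = 4

def iterated_noise(n_samples, delay, iterations, seed=0):
    """Apply delay-and-add repeatedly to Gaussian noise (circular delay for simplicity)."""
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    for _ in range(iterations):
        # Negative indices wrap around, so the delay line is circular.
        signal = [signal[i] + signal[i - delay] for i in range(n_samples)]
    return signal

def autocorrelation(signal, lag):
    """Normalised autocorrelation of the signal at a single lag."""
    num = sum(a * b for a, b in zip(signal, signal[lag:]))
    den = sum(a * a for a in signal)
    return num / den

irn = iterated_noise(8000, DELAY, ITERATIONS)
lags = range(10, 3 * DELAY)
best_lag = max(lags, key=lambda lag: autocorrelation(irn, lag))
print(best_lag, "samples ->", SAMPLE_RATE / best_lag, "Hz")
```

Although the waveform never repeats exactly, the best-matching shift should come out at the delay used to generate the noise.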
Periodic sounds (sounds with waveforms that have a repeated "motif", as in the blue trace shown above) will have Fourier spectra which always must consist solely of "harmonics" of the sound. Harmonics are sine waves with frequencies that are integer multiples of the fundamental frequency, so that a whole number of their cycles fits exactly into one fundamental period. The red lines above are "cosine phase" harmonics of the blue line. When thinking about Fourier spectra, we want to imagine the blue line being made up of a sum of lots of sine waves like the red and green lines, where we might adjust the phase and amplitude of the sine waves as required. The important thing to note here is that, no matter how we would adjust the phase and amplitude of the green line, it could never be part of the mixture needed to make up the blue line. The reason is this: compare the values that the waveforms have at identical points in the period, for example by comparing the points marked by the stippled gray lines. The red lines will always contribute the same values at each cycle (here for example they are always maximal at the points marked out by the gray lines). In contrast, the green line does not "fit" an integer number of cycles into the fundamental period, and the contribution it would make to each cycle of the wave would therefore be different, which would destroy the periodicity of the wave. The green line can therefore not be a Fourier component of the periodic blue sound wave. Nor can any other sine wave whose frequency is not a harmonic of the fundamental frequency of the sound.
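The argument above can be checked with a few lines of Python: a component belongs in the spectrum of a periodic sound only if it has the same value one fundamental period later. The 100 Hz fundamental, the time point and the factor 2.5 for the non-harmonic component are hypothetical values chosen for illustration; they are not read off the figure.

```python
import math

F0 = 100.0        # fundamental frequency in Hz (assumed)
T0 = 1.0 / F0     # fundamental period in seconds

def change_after_one_period(frequency_hz, t=0.0012):
    """How much a cosine of the given frequency changes between t and t + T0."""
    before = math.cos(2 * math.pi * frequency_hz * t)
    after = math.cos(2 * math.pi * frequency_hz * (t + T0))
    return abs(after - before)

for multiple in (1, 2, 3, 2.5):
    change = change_after_one_period(multiple * F0)
    print(multiple, "x F0: change after one period =", round(change, 6))
```

The integer multiples return to the same value after one period, while the component at 2.5 times the fundamental does not and so would destroy the periodicity.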
Harmonic complexes composed of 3 consecutive harmonics are among the simplest periodic sounds. Their periodicity is determined by the spacing between the harmonics. Here is such a complex, composed of harmonics 1 (the fundamental), 2 and 3 of 100 Hz. The top panel shows the spectrum of this sound, and the bottom panel shows a 30 ms long segment of the waveform, consisting of three periods (100 Hz corresponds to a period of 10 ms). The pitch of this sound is very obvious:
These complexes, when built of harmonics of very high harmonic number, still have the same periodicity. However, their pitch is not at their periodicity anymore. Here is a complex composed of harmonics 21, 22 and 23 of 100 Hz (note: you should lower the volume of the computer loudspeakers to avoid generating harmonic distortions that would regenerate the fundamental!):
How high can the harmonic numbers of the components be for a periodicity pitch to appear? Note that this demonstration stretches the capabilities of poor-quality computer speakers. These have a hard time reproducing sounds with frequencies below a few hundred Hz, but generate serious harmonic distortions at frequencies of a few thousand Hz. Thus, to hear the examples with low harmonic numbers you will need to use a high sound volume, while to avoid regenerating the fundamental by harmonic distortions in the examples with high harmonic numbers you will need to use a low sound volume.
Harmonics 1 to 3:
Harmonics 2 to 4:
Harmonics 3 to 5:
Harmonics 4 to 6:
Harmonics 5 to 7:
Harmonics 7 to 9:
Harmonics 9 to 11:
Harmonics 11 to 13:
Harmonics 13 to 15:
Harmonics 17 to 19:
Harmonics 21 to 23:
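For the three-harmonic complexes listed above, the physical periodicity is the same in every case, however high the harmonic numbers. A short Python check, using the greatest common divisor of the component frequencies as a simple stand-in for the waveform's repetition rate:

```python
from functools import reduce
from math import gcd

F0 = 100  # Hz
# Lowest harmonic number of each example in the list above.
first_harmonics = [1, 2, 3, 4, 5, 7, 9, 11, 13, 17, 21]

periodicities = []
for first in first_harmonics:
    components = [n * F0 for n in range(first, first + 3)]
    periodicity = reduce(gcd, components)
    periodicities.append(periodicity)
    print(components, "-> common periodicity", periodicity, "Hz")
```

Whether the pitch you hear still follows that periodicity for the highest harmonic numbers is exactly what the sound examples let you judge for yourself.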
Pitch is determined in most cases by the periodicity of the sound waveform. However, some sounds have other, more subtle periodicities. In some cases, these periodicities may determine the pitch, but in other cases they don't. One such subtle periodicity is illustrated here: the periodicity of the envelope.
Look at the figure below. This sound (blue) was generated as a sum of many harmonics in 'cosine-phase' - this is a complicated way to say that all harmonics peak together at the beginning of the pitch period. The resulting waveform is very peaky, but has fast fluctuations which are smallest at the midpoint between two peaks and which increase in size around the peaks. One could imagine an 'envelope' - a positive waveform that would measure the overall energy of the sound waveform at each moment in time. The envelope (here computed using the 'Hilbert transform' - this is a side issue here) is plotted in gray.
Now, observe the figure below. This sound was generated with the same harmonics, except that only every other harmonic is in cosine-phase. The other half are in 'sine-phase' - this means that instead of peaking at the beginning of the pitch period, they have an upward zero crossing there (remember, harmonics are sine waves!). In contrast with the previous sound, this one has two 'events', with similar shape but opposite polarity, during the pitch period. Its periodicity is nevertheless exactly that of the previous sound, and the evoked pitch is the same. However, the envelope, which measures the overall energy at each moment in time, now has two peaks within each pitch period, and in fact, when using the same method for computing it, has half the period of the waveform. Thus, the envelope periodicity corresponds to a pitch which is twice as high (200 Hz instead of 100 Hz in this case).
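The difference between the two envelopes can be sketched in Python. For a sum of sinusoids, the Hilbert envelope is the magnitude of the corresponding sum of complex exponentials, so no transform routine is needed. The number of harmonics (20) and the choice of which harmonics are in sine phase (the even ones) are assumptions; the text only specifies "many harmonics" and "every other harmonic".

```python
import cmath
import math

F0 = 100.0           # Hz, as in the text
N_HARMONICS = 20     # assumed
T0 = 1.0 / F0        # pitch period in seconds

def envelope(t, phases):
    """Magnitude of the analytic signal of a sum of unit-amplitude harmonics.

    phases[k] is the starting phase of harmonic k: 0 for cosine phase,
    -pi/2 for sine phase, since cos(x - pi/2) = sin(x).
    """
    z = sum(cmath.exp(1j * (2 * math.pi * k * F0 * t + phase))
            for k, phase in phases.items())
    return abs(z)

cosine_phase = {k: 0.0 for k in range(1, N_HARMONICS + 1)}
# Assumed assignment: odd harmonics in cosine phase, even harmonics in sine phase.
alternating_phase = {k: (0.0 if k % 2 else -math.pi / 2)
                     for k in range(1, N_HARMONICS + 1)}

ratios = {}
for name, phases in (("cosine", cosine_phase), ("alternating", alternating_phase)):
    at_start = envelope(0.0, phases)
    at_half = envelope(T0 / 2, phases)
    ratios[name] = at_half / at_start
    print(name, "phase: envelope at half period / envelope at start =",
          round(ratios[name], 3))
```

For the cosine-phase complex the envelope is near zero half a period after the peak, whereas for the alternating-phase complex it is as high there as at the start, which is the second envelope peak per pitch period described above.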
This is important when we study the responses of neurons to periodic sounds. Neurons may well respond to the envelope, rather than to the sound itself. Such neurons cannot encode pitch, even when they are sensitive to periodicity, because they do not give the right answer for alternate-phase harmonic complexes.
This sound is one of those used in the study of Cariani and Delgutte (1996) on the coding of pitch in auditory nerve fibers. It is a so-called single-formant vowel, since its spectral envelope has a single peak in frequency (natural vowels have several such 'formants' - see Chapter 4). See Fig. 3-9 in the book.
Here are two consecutive periods, one pair taken from the beginning (green) and one pair from the middle (orange) of the sound:
The blue arrows span one period of the sound (you can see how similar the two consecutive periods are). In the middle of the sound, the periods are about half as long as in the beginning of the sound.
The gray bars indicate one cycle of the 'fine structure' of each period, which is determined by the formant frequency. In contrast with the pitch periods, they are essentially equal to each other.
Thus, the sound has a pitch that changes over about one octave (it is twice as high in the middle as in the beginning and end of the sound), but its formant frequency remains fixed. In consequence, its timbre remains the same throughout its duration.
Chapter 3 of Auditory Neuroscience discusses the pitch intervals used in Western music in great detail. For convenience, a table of fundamental frequencies for the equal-tempered scale is copied below from http://www.phy.mtu.edu/~suits/notefreqs.html
By convention A4 = 440 Hz
Notes are separated by "semitone" intervals. There are 12 semitones in each octave, and fundamental frequencies are logarithmically spaced, so that each note's fundamental frequency is 2^(1/12) ≈ 1.0595 times the previous frequency.
The wavelength values assume a speed of sound = 345 m/s
("Middle C" is C4 )
|Note||Frequency (Hz)||Wavelength (cm)|
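The entries of such a table follow directly from the conventions stated above (A4 = 440 Hz, a factor of 2^(1/12) per semitone, speed of sound 345 m/s). The Python sketch below recomputes one octave; the note-name list with sharps and the function name are choices made for this illustration and are not taken from the source table.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
A4_HZ = 440.0
SPEED_OF_SOUND_CM_S = 34500.0   # 345 m/s, as stated in the text

def note_frequency(name, octave):
    """Equal-tempered frequency of a note, counted in semitones from A4."""
    semitones_from_a4 = (NOTE_NAMES.index(name) - NOTE_NAMES.index("A")
                         + 12 * (octave - 4))
    return A4_HZ * 2 ** (semitones_from_a4 / 12)

for name in NOTE_NAMES:
    freq = note_frequency(name, 4)
    wavelength_cm = SPEED_OF_SOUND_CM_S / freq
    print(f"{name}4  {freq:8.2f} Hz  {wavelength_cm:7.2f} cm")
```

Changing the octave argument gives the other rows; each octave up doubles the frequency and halves the wavelength.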
This video clip shows a presentation on the Cortical Representation of Complex Sounds given by Jan Schnupp at a symposium of the British Neuroscience Association meeting in Harrogate on April 18th 2011 (video file: http://howyourbrainworks.net/jan/JanBNApitchTalk.flv).