This figure (fig 1.16 of "Auditory Neuroscience") shows spectrograms of the words "hot", "hat", "hit" and "head", spoken once with a high-pitched voice (top) and then again with a lower-pitched voice (bottom). The vowels of these speech sounds are made up of regularly spaced harmonics (the red stripes), which originate from the glottal pulse train. The spacing of the harmonics equals the fundamental frequency of the voice and so determines the pitch of the spoken word. The harmonics are not all of equal intensity. For example, the vowel /a/ has more energy at around 1.8 kHz than either the /o/ or the /i/. Regions of frequency space where speech sounds carry a lot of energy are known as "formants", and these formants arise from resonances in the vocal tract. Speakers change the resonance frequencies by moving their "articulators" (lips, jaw, tongue, soft palate), thereby changing the dimensions of the resonance cavities in the vocal tract.
Top panel: high-pitched voice. Bottom panel: low-pitched voice.
Source: full color version of Figure 1-12 of "Auditory Neuroscience"
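The relationship between harmonics and formants described above can be illustrated with a minimal source-filter sketch in Python. This is not how the figure was produced. It rests on several simplifying assumptions: the glottal source is an idealised train of unit impulses, the vocal tract is reduced to a single two-pole resonator at 1.8 kHz (real vowels have several formants), and the sampling rate (16 kHz), bandwidth (100 Hz) and the two voice pitches (200 Hz and 100 Hz) are arbitrary illustrative values. All function names are hypothetical.

```python
import numpy as np


def pulse_train(f0_hz, duration_s, fs_hz):
    """Idealised glottal source: one unit impulse every 1/f0 seconds."""
    n_samples = int(round(duration_s * fs_hz))
    period = int(round(fs_hz / f0_hz))  # samples between pulses
    x = np.zeros(n_samples)
    x[::period] = 1.0
    return x


def resonator(x, centre_hz, bandwidth_hz, fs_hz):
    """Two-pole resonator standing in for one vocal tract resonance.

    Implements y[n] = x[n] - a1*y[n-1] - a2*y[n-2].
    """
    r = np.exp(-np.pi * bandwidth_hz / fs_hz)   # pole radius
    theta = 2.0 * np.pi * centre_hz / fs_hz     # pole angle
    a1, a2 = -2.0 * r * np.cos(theta), r * r
    y = np.zeros(len(x))
    y1 = y2 = 0.0                               # previous two outputs
    for n, xn in enumerate(x):
        y[n] = xn - a1 * y1 - a2 * y2
        y2, y1 = y1, y[n]
    return y


def magnitude_spectrum(x, fs_hz):
    """Frequencies (Hz) and magnitudes of the one-sided spectrum."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs_hz)
    return freqs, np.abs(np.fft.rfft(x))


def harmonic_frequencies(x, fs_hz, rel_threshold=0.5):
    """Frequencies whose magnitude exceeds a fraction of the maximum."""
    freqs, mags = magnitude_spectrum(x, fs_hz)
    return freqs[mags > rel_threshold * mags.max()]


def strongest_frequency(x, fs_hz):
    """Frequency of the largest spectral component."""
    freqs, mags = magnitude_spectrum(x, fs_hz)
    return freqs[np.argmax(mags)]


if __name__ == "__main__":
    fs = 16000          # assumed sampling rate in Hz
    formant = 1800.0    # single illustrative resonance near 1.8 kHz
    for label, f0 in (("high-pitched", 200.0), ("low-pitched", 100.0)):
        source = pulse_train(f0, duration_s=1.0, fs_hz=fs)
        vowel = resonator(source, formant, bandwidth_hz=100.0, fs_hz=fs)
        spacing = np.diff(harmonic_frequencies(source, fs))
        print(label, "voice: harmonic spacing", spacing[0], "Hz,",
              "strongest harmonic at", strongest_frequency(vowel, fs), "Hz")
```

In this sketch, changing the pulse rate changes the spacing of the harmonics (the pitch) but not the frequency region where the energy is concentrated, because that region is set by the resonator alone. This mirrors what the two panels of the figure show for the same vowel spoken at two pitches.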