"The perceptual organization of speech: Contributions of general and speech-specific factors"
A summary of key results, papers, and posters for EPSRC Research Grant EP/F016484/1 (Brian Roberts & Peter J. Bailey) is presented here:
Purpose of the project
Spoken communication is a fundamental human activity.
However, it is uncommon in everyday life for us to hear the speech of a single
talker in the absence of other background sounds, and so our auditory system is
faced with the challenge of grouping together those sound elements that come
from one source and segregating them from those arising from other sources.
Without a solution to this “auditory scene analysis” (ASA) problem, our
perceptions of speech and other sounds would not correspond to the events producing
them. The fact that we can focus our attention on one person speaking in the
presence of other talkers indicates that our auditory system is generally
successful at grouping together the sound elements from a source in a complex
auditory scene, and segregating them from other sounds, but our understanding
of how this is achieved remains limited. Most ASA research has focused on
relatively simple sounds and has identified a number of general principles for
the grouping of sound elements. However, these principles often seem inadequate
to explain the perceptual grouping of speech, because speech has acoustic
properties that are diverse and rapidly changing. Also, speech is a highly
familiar stimulus, and so our auditory system has had the opportunity to learn
about speech-specific properties that may assist in the successful perceptual
grouping of speech. This project’s aim was to explore how much of our ability to
segregate a talker’s speech from a sound mixture depends on general-purpose
grouping principles, applicable to all sounds, and how much depends on speech-specific
principles.
Published papers (pre-prints)
Included here are downloadable pre-prints of published papers arising from this project:
(1) Roberts, B., Summers, R.J., and Bailey, P.J. (2010). “The perceptual
organization of sine-wave speech under competitive conditions,”
Journal of the Acoustical Society of America, 128, 804-817. Pre-print
(2) Summers, R.J., Bailey, P.J., and Roberts, B. (2010). “Effects of differences in
fundamental frequency on across-formant grouping in speech perception,”
Journal of the Acoustical Society of America, 128, 3667-3677. Pre-print
(3) Roberts, B., Summers, R.J., and Bailey, P.J. (2011). “The intelligibility of noise-vocoded speech: Spectral information available
from across-channel comparison of amplitude envelopes,” Proceedings of the
Royal Society of London Series B: Biological Sciences, 278,
1595-1600. Pre-print
Papers under submission and in preparation
Included here are pre-prints of submitted papers and titles of papers in preparation. In addition, it is anticipated that at least two papers will arise from Marcin Stachurski's research towards the PhD; see also Poster presentations (verbal transformation effect).
(4) Summers, R.J., Bailey, P.J., and Roberts, B. (submitted). "Effects of the rate of formant-frequency variation on the grouping of formants in speech perception," submitted to the Journal of the Association for Research in Otolaryngology (JARO). Pre-print
(5) Roberts, B., Summers, R.J., and Bailey, P.J. (in preparation). "The role of formant-frequency contours in the
perceptual grouping of speech formants: Evidence against speech-specific constraints." Abstract
Poster presentations (formant-competitor paradigm)
Included here are posters containing material from the project which has not yet appeared in published journal articles. See also "papers under submission and in preparation."
(1) Poster by Roberts, Summers, and Bailey (presented in September 2010). The perceptual organization of noise-vocoded speech under competitive conditions
(2) Poster by Roberts, Summers, and Bailey (presented in May 2011).
The role of formant-frequency contours in the perceptual grouping of speech formants
Poster presentations (verbal transformation effect)
Included here are posters containing material from the part of the project relating directly to Marcin Stachurski's research towards the PhD. None of this material has yet appeared in published journal articles. Marcin is currently writing up his thesis.
(1) Poster by Stachurski, Summers, and Roberts (presented in September 2009). Grouping and the Verbal Transformation Effect - The influence of fundamental frequency, ear of presentation, and interaural time-difference cues
(2) Poster by Stachurski, Summers, and Roberts (presented in May 2011).
Grouping and the Verbal Transformation Effect - The influence of formant transitions
Project outcomes summary
Our approach was to generate artificial speech-like stimuli with
precisely controlled properties, particularly the spectral prominences called
formants.
Formants are important because they arise from resonances in the
air-filled cavities of the talker’s vocal tract. Variation in the frequency and
amplitude of a formant is an inevitable consequence of change in the size of
its associated cavity as the tongue, lips, and jaw move when the talker produces
speech. Hence, knowledge of formant frequencies and their change over time is
of great benefit to listeners trying to understand a spoken message, and so
choosing the right set of formants from a mixture is critical for
intelligibility. Simplified versions of target sentences were synthesised and
then mixed with carefully designed “competitors” offering alternative
grouping possibilities for the formants in the target sentence. The impact of
these competitors on listeners’ recognition of the target sentence in the
mixture was measured as the properties of the competitors were manipulated.
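The general idea of mixing a target formant with a competitor can be sketched in a few lines of code. The sketch below is illustrative only: this page does not describe the synthesis method used in the project, so the use of a single frequency-modulated sinusoid as a formant analogue (in the spirit of the sine-wave speech named in paper (1)), the function names, the sample rate, and the example contours are all assumptions.

```python
import math


def synthesize_formant_analogue(freq_contour_hz, amp_contour, sample_rate=16000):
    """Return samples of a sinusoid whose frequency and amplitude follow the
    given per-sample contours (hypothetical helper, for illustration only)."""
    if len(freq_contour_hz) != len(amp_contour):
        raise ValueError("frequency and amplitude contours must have equal length")
    samples = []
    phase = 0.0
    for freq, amp in zip(freq_contour_hz, amp_contour):
        samples.append(amp * math.sin(phase))
        # Accumulate phase so the tone stays continuous as its frequency changes.
        phase += 2.0 * math.pi * freq / sample_rate
    return samples


def mix(target, competitor):
    """Sum two equal-length signals sample by sample."""
    if len(target) != len(competitor):
        raise ValueError("signals must have equal length")
    return [t + c for t, c in zip(target, competitor)]


# Assumed example values: a 50 ms target gliding from 1000 Hz to 1500 Hz,
# mixed with a steady 1200 Hz competitor at the same level.
n = 800  # 50 ms at 16 kHz
target_contour = [1000.0 + 500.0 * i / (n - 1) for i in range(n)]
competitor_contour = [1200.0] * n
flat_amp = [0.5] * n
mixture = mix(
    synthesize_formant_analogue(target_contour, flat_amp),
    synthesize_formant_analogue(competitor_contour, flat_amp),
)
```

Manipulating the competitor then amounts to changing `competitor_contour` (or its amplitude contour) while leaving the target untouched.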
The key findings of the project are:
(a) Modulation of the formant-frequency contour, but not the amplitude contour, is critical for across-formant grouping;
(b) The ability of listeners to reject a competitor formant declines as either the rate or depth of modulation of its frequency contour increases, relative to that of the target sentence;
(c) The impact of a competitor does not depend on whether its pattern of variation in formant frequency is plausibly speech-like;
(d) The ability of listeners to reject a competitor increases as the pitch difference between target and competitor formants increases;
(e) Formant-frequency variation conveys information important for speech intelligibility even in contexts often regarded as conveying information about speech-sound identity mainly through other cues.
In summary, the results of
this project have shown that our ability to segregate a talker’s speech from a
sound mixture depends heavily on general-purpose grouping principles and rather
less on speech-specific principles than has been suggested by some researchers.
The results also suggest approaches by which engineers and computer scientists
might improve the performance of devices such as hearing aids and automatic
speech recognizers when they are operating in noisy environments.
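Finding (b) above refers to the depth of modulation of a competitor's frequency contour. One way to make that notion concrete is sketched below. It assumes, purely for illustration, that depth is expressed as a scaling of the contour's excursions about its geometric mean on a log-frequency scale; this page does not state how depth was defined in the project, and the function name and example values are hypothetical.

```python
import math


def scale_contour_depth(freq_contour_hz, depth):
    """Scale the excursions of a frequency contour about its geometric mean.

    depth = 1.0 leaves the contour unchanged, depth = 0.0 flattens it to a
    constant at the geometric mean, and depth = -1.0 inverts it on a
    log-frequency scale. (Assumed definition, for illustration only.)
    """
    if not freq_contour_hz:
        raise ValueError("contour must not be empty")
    if any(f <= 0 for f in freq_contour_hz):
        raise ValueError("frequencies must be positive")
    log_freqs = [math.log(f) for f in freq_contour_hz]
    log_mean = sum(log_freqs) / len(log_freqs)  # log of the geometric mean
    return [math.exp(log_mean + depth * (lf - log_mean)) for lf in log_freqs]


# Assumed example contour in Hz; its geometric mean is 1000 Hz.
contour = [500.0, 1000.0, 2000.0]
flattened = scale_contour_depth(contour, 0.0)
half_depth = scale_contour_depth(contour, 0.5)
inverted = scale_contour_depth(contour, -1.0)
```

Working on a log-frequency scale keeps upward and downward excursions symmetric in musical-interval terms, which is why the geometric rather than the arithmetic mean is used as the pivot in this sketch.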
Last updated 23 August 2011