Unit 9: Prosody
The contribution of the voice to perceived speaker meaning has long been recognized as very important. In the nineteenth and early twentieth centuries many books were written about how to speak well (‘elocution’), and these included not only instruction in ‘correct’ pronunciation, but also guidance on how to speak effectively. Speaking was seen as an art form, and all those who spoke publicly, including politicians, ministers of religion and actors, were considered to need training in how to do it well. Such skills were, of course, the main focus of the much earlier study of rhetoric, which was concerned both with what was said and how it was said. Elocution handbooks on the other hand were concerned solely with the voice, and much of what they contain in relation to speaking effectively involves what we now refer to as prosody. This, as you have read in Unit A9, has a number of component elements – tempo, voice quality, loudness and pitch, and the most important of these was considered to be pitch or intonation, referred to then as ‘modulation’. An early handbook of the ‘elocutionary art’ (Brewer 1912) claimed that it was only through modulation that it was possible to establish ‘a sympathy between the speaker and his audience’ (1912: 83). The so-called ‘attitudinal function’ of intonation has continued to be seen as primary, but exactly how we convey an attitude in our voices has so far been difficult to determine.
In Unit B9.4 you can read an extract from the work of Carlos Gussenhoven (2004), who has suggested that some effects originate in animal behaviour. His theory of ‘Biological Codes’ builds on earlier work by John Ohala (e.g. REF), who coined the notion of the ‘Frequency Code’ relating specifically to Fundamental Frequency (pitch).1
Recent work at the interface between prosody and pragmatics (e.g. Barth-Weingarten et al. 2009) suggests that meaning can be conveyed both by paralinguistic features of the voice (for example, changes in pitch range over longer stretches of speech) and by linguistic choices (for example, the choice of a rising or falling nucleus). Linguistic choices can trigger prosodic implicatures if there is a mismatch between expected and actual usage.
In this section we suggest two different ways to follow up what you have read about prosody. The first deals with the very elusive attitudinal intonation, so often linked with emotion, and treats these meanings as paralinguistic effects. The second is an examination of a recent innovation in English intonation – ‘Uptalk’ – which seems to reverse the expected phonological choice of nuclear tone associated with statements.
Some sound files are included which illustrate prosodic patterns mentioned in the book.
- 9.1. Emotion and attitude in the voice.
- 9.2. Speaker meaning and linguistic choices: the case of ‘Uptalk’.
- 9.3. Audio Excerpts from the book
Read more...
- Barth-Weingarten, D., N. Dehé and A. Wichmann (eds.) (2009) Where Prosody Meets Pragmatics. Bingley: Emerald.
- Brewer, R.F. (1912) ‘Speech’ in R.D. Blackman (ed.) Voice Speech and Gesture: A practical handbook to the elocutionary art. Edinburgh: John Grant.
- Gussenhoven, C. (2004) The phonology of tone and intonation. Cambridge: Cambridge University Press.
1 Pitch is the term for what we hear, while the fundamental frequency (F0) is the measurable acoustic correlate. It is measured in Hertz.
9.1. Emotion and attitude in the voice
Intonation is known to indicate the speaker’s attitudes and emotions, and many studies, both descriptive and experimental, have been carried out to find out exactly how this is signalled in the voice. It has, however, proved difficult to identify reliable characteristics, and listening tests show that some emotions are easily confused with others unless there is additional contextual information. One of the greatest problems has been the failure to distinguish adequately between emotion on the one hand – how the speaker feels – and attitude on the other hand – the speaker’s stance or behaviour towards an interlocutor. The simplest way to illustrate this is by comparing what is meant by ‘you sound sad’ with ‘he sounded so patronizing’: a person can feel sad on their own, whether they speak or not, while it is not possible to be patronizing on your own. The first is an emotion and the second is an attitude.
The effects of emotion on the voice have been of great interest to those working in speech technology, especially automatic speech recognition. But to get a computer to recognize emotions in a voice it is first necessary for humans to analyse how they do it themselves. The use of emotional labels has been unreliable: for example, what one listener hears as anger another listener may hear as fear. Cowie et al. (2000), referred to in Unit B9.1, have developed a way of tracking perceived emotion in speech that avoids the use of labels and requires listeners to place what they hear in an emotional ‘space’ that has two intersecting dimensions – the active–passive dimension and the positive–negative dimension. According to this, for example, anger would be in the active–negative quarter, while sadness would be in the passive–negative quarter.
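The idea of a two-dimensional emotional space can be made concrete with a short sketch. It is only an illustration, not the instrument developed by Cowie et al.: the function name `emotion_quadrant`, the scale from -1 to 1 on each dimension, and the example coordinates are all assumptions made for the purpose of the example.

```python
def emotion_quadrant(activation: float, evaluation: float) -> str:
    """Name the quarter of a two-dimensional emotion space.

    activation: assumed scale from -1.0 (passive) to 1.0 (active)
    evaluation: assumed scale from -1.0 (negative) to 1.0 (positive)
    """
    vertical = "active" if activation >= 0 else "passive"
    horizontal = "positive" if evaluation >= 0 else "negative"
    return f"{vertical}-{horizontal}"


# Hypothetical listener placements (invented values, not measured data)
placements = {"anger": (0.8, -0.7), "sadness": (-0.6, -0.5)}

for label, (activation, evaluation) in placements.items():
    print(label, "->", emotion_quadrant(activation, evaluation))
```

With these invented coordinates, anger falls in the active-negative quarter and sadness in the passive-negative quarter, as in the description above. Working with positions rather than labels means that two listeners who disagree about a word such as ‘anger’ or ‘fear’ may still turn out to have placed the voice in the same region of the space.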
Such studies, however, assume that the cues to emotion or attitude are in the utterance itself, in other words, that there is something about how a word, phrase or utterance is said that contains the information that hearers use. However, other research suggests that this is not necessarily the case. It seems that sequential relationships are also important in conveying speaker meaning. You can read about this in relation to conversation analysis in Unit A6.
In one interesting study, Cauldwell (2000) edited some recordings of conversation that he had collected. He took two short utterances and played them to listeners, first in isolation and then in context. The impressions reported by the listeners were very different in each case: in isolation most of them thought the utterances sounded angry, but in context only 10 percent of the listeners heard any anger. This suggests that emotions and attitudes that we ‘hear’ in people’s voices are not necessarily in the utterance itself but in the conjunction between the sound of the voice and the sequential context, or even in the words themselves.
TASK
You can carry out your own study if you have access to a simple sound editor.
- Record someone reading the same sentence in several different ways, for example happy, angry, sad or enthusiastic (choose your own adjectives), and then play the recordings to listeners.
You can:
- ask listeners to match a given list of attitudes/emotions with the recordings (perhaps with a few extra ones as distractors), or
- ask listeners for their own descriptions, e.g. ‘How does this person sound?’, or
- ask listeners to place what they hear in a circle like that used by Cowie et al. (described above).
Remember to give some thought to your methodology here.
- How easy would it be to record people who were really angry, or sad, or happy?
- Might there be ethical problems involved?
- Are you sure that the attitude or emotion is not obvious from the words themselves? ‘I am absolutely furious’ is unlikely to sound happy; ‘We had a wonderful time’ is unlikely to sound angry. On the other hand, ‘How did it begin?’ could be said in a number of different contexts with different underlying emotions, and might be a good sentence to try.
When you have designed and carried out your study, either with sentences devised and read for the purpose or using naturally occurring utterances, note any difficulties listeners may have in identifying the intended or inferred attitude. Why might this be?
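If you use the matching version of the task, one simple way to see where listeners had difficulty is to count how often each intended emotion was heard as each label. The sketch below is one possible way of doing this, not a prescribed method: the function names `confusion_table` and `agreement_rate` are hypothetical, and the responses are invented purely for illustration.

```python
from collections import Counter, defaultdict


def confusion_table(responses):
    """Count how often each intended emotion was heard as each label.

    responses: iterable of (intended, perceived) pairs, one per judgement.
    """
    table = defaultdict(Counter)
    for intended, perceived in responses:
        table[intended][perceived] += 1
    return table


def agreement_rate(table, intended):
    """Proportion of judgements that matched the intended emotion."""
    counts = table[intended]
    total = sum(counts.values())
    return counts[intended] / total if total else 0.0


# Invented listener responses, for illustration only
responses = [
    ("angry", "angry"), ("angry", "afraid"), ("angry", "angry"),
    ("sad", "sad"), ("sad", "bored"),
]

table = confusion_table(responses)
for intended, counts in table.items():
    rate = agreement_rate(table, intended)
    print(intended, dict(counts), f"agreement: {rate:.2f}")
```

A table of this kind shows not only how often listeners agreed with the intended emotion but also which labels were confused with which, and that is a useful starting point for asking why.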
Read more...
- Cauldwell, R.T. (2000) ‘Where did the anger go? The role of context in interpreting emotion in speech’ in Proceedings of the ISCA workshop on Speech and Emotion, Belfast, pp. 127–31.
- Cowie, R., E. Douglas-Cowie, S. Savvidou, E. McMahon, M. Sawey and M. Schröder (2000) ‘“Feeltrace”: an instrument for recording perceived emotion in real time’ in R. Cowie, E. Douglas-Cowie and M. Schröder (eds.) Speech and emotion: Proceedings of the ISCA workshop, Belfast, NI: Textflow, pp. 19–24.
- Wichmann, A. (2000) Intonation in Text and Discourse. London: Longman (especially Chapter 6).
- Wichmann, A. and R.T. Cauldwell (2003) ‘Wh-Questions and attitude: The effect of context’ in A. Wilson, P. Rayson and T. McEnery (eds.) Corpus Linguistics by the Lune. Frankfurt: Peter Lang, pp. 291–305.