Harvey, Nigel. 1985. "Vocal Control in Singing." In Musical Structure and Cognition, edited by Peter Howell, Ian Cross, and R. West, 287-332. London: Academic Press.
Outlines singing as a motor skill involving the coordination of three subsystems: exhalation, phonation, and articulation. Description is driven by a computational model, but incorporates special-purpose "peripherals" (e.g., "stretch sensitive mechanoreceptors may be expected to provide information about the various other factors that could be included in the exhalation schema as additional initial condition variables" (298)). Analyzes each of the three subsystems in terms of three component schemas: a "parameter specification schema" that relates muscle contractions to movement outcome; an "outcome specification schema" that relates the movement to the desired auditory experience of the listener; and a "sensation specification schema" that relates the auditory experience of the listener to the auditory and proprioceptive experience of the singer. QUESTION: ARE THERE NOT THREE ANALOGOUS SCHEMAS FOR INSTRUMENT PLAYING? (There certainly are, at least to the extent that the effects of room acoustics are the same for both; see pp. 301-302.) The output of one schema serves as input to the next in cumulative fashion (see diagrams on pp. 303, 312, 319). Through rehearsal of particular songs in particular acoustic environments, the singer precomputes the necessary initial conditions for each vocal phrase based on its requirements of pitch, loudness, length, and so on. Argues for the existence of schemas on the basis of the observation that different motor programs are required to produce the same result when initial conditions change, e.g., different combinations of inspiratory and expiratory muscle activity depending on the amount of breath taken and the subglottal pressure required (295-296). Includes a good bit of detail and relevant citations on the physics and physiology of singing, e.g., reasons why vowels sound different to the singer (bone conduction, the effect of the "singer's yawn" on the middle ear (319)) than they do to the listener.
Believes strongly in Schmidt's "variability of practice" hypothesis: the singer needs to exercise a wide range of initial conditions (both internal and environmental) and outcomes in order to build reliable schemas. Makes the analogy to linear regression with widely scattered as opposed to tightly clustered data points (290-291). Points out that the larynx alone has many degrees of freedom in movement, and that acquiring a sophisticated skill requires initial control and then gradual unfreezing of those degrees until all of the mechanisms can be used in a flexible yet controlled manner (323). Questions whether the desired outcomes are stored as relative or absolute values (324). Proposes a final diagram (326) of "information flow in the singer": listener's desired auditory perception -> auditory input required by performer -> proprioceptive input required by performer -> movement parameters to be inserted into motor program -> song (which then flows back to listener). Discusses convincingly why all of these steps are necessary.
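The regression analogy can be made concrete with a small sketch. This is an illustration added here, not something taken from Harvey's chapter: it assumes an ordinary least-squares fit with independent noise of equal size on every trial, and the practice values, the noise level, and the function name `slope_standard_error` are all hypothetical. Under those assumptions, the uncertainty in the fitted slope shrinks as the practiced initial conditions spread out.

```python
import math


def slope_standard_error(xs, noise_sd):
    """Standard error of a least-squares slope for fixed x values.

    Assumes independent noise with the same standard deviation on every
    observation (the textbook simple-regression setting).
    """
    mean_x = sum(xs) / len(xs)
    # Sum of squared deviations: how widely the practice conditions vary.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return noise_sd / math.sqrt(sxx)


# Hypothetical "initial condition" values sampled during practice.
clustered = [4.8, 4.9, 5.0, 5.1, 5.2]  # narrow, repetitive practice
scattered = [1.0, 3.0, 5.0, 7.0, 9.0]  # varied practice
noise_sd = 1.0                         # assumed trial-to-trial noise

se_clustered = slope_standard_error(clustered, noise_sd)
se_scattered = slope_standard_error(scattered, noise_sd)

print(f"slope uncertainty, clustered practice: {se_clustered:.3f}")
print(f"slope uncertainty, scattered practice: {se_scattered:.3f}")
```

With these made-up numbers the slope is about twenty times less certain under clustered practice, which is the statistical intuition behind the claim that varied practice yields a more reliable schema.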
Commentary: The comparison to a straw-man behaviorist explanation of motor skills in singing is a throwaway (291); it neither adds to the argument nor answers the challenge of ecological models. Author occasionally uses ecological arguments himself to explain certain details, such as the Lombard effect, in which the speaker increases output intensity to maintain constant loudness when background noise is introduced (300). However, he seems predisposed to favor adaptational theories that lead to a general computational ability rather than a specialized one (see discussion of virtual pitch, p. 311). Although he makes occasional reference to other kinds of vocal production (Tibetan chant (Large & Murray 1981), barbershop singing (Hagerman & Sundberg 1980)) and to the need for separate schemas for different styles of singing (309), his central point of departure is Western classical vocal technique (e.g., the avoidance of registral breaks (307)). The way that his model precomputes all initial conditions through practice (302) does not seem to leave enough flexibility for the singer to deal with improvisational contexts. The sensation specification schema assumes that there is a monotonically increasing relationship between the singer's perception of loudness and the listener's (300-301), but does not take into account the disproportionate effect that the singer's formant has on loudness perception in the listener (although this formant is mentioned as a by-product of the positioning of the larynx (308)). The singer does not share this perception, because bone conduction filters the sound in a way that favors the fundamental over the higher frequencies (317).
Mark DeWitt
Harvey argues that a singer's ability to control the processes of exhalation, phonation, and articulation, as well as the ability to coordinate all these processes, stems from the acquisition of abstract internal models of behavior. These are termed schemata and are akin to a mental regression equation that relates some internal state (initial conditions) to a desired output. For each of the aforementioned processes, Harvey suggests that three types of schemata govern vocal control. The parameter specification schema allows basic control over the process at hand (e.g., in exhalation, this schema allows the singer to maintain control over subglottal pressure). The outcome specification schema relates this controlled internal state to the kind of vocal output that the singer will perceive. Thirdly, the sensation specification schema relates the output that the singer perceives to the kind of output that the listener wants to hear. This third schema type relies entirely on the singer's ability to incorporate verbal reports from an "expert listener" into active singing. Similarly, none of the schemata are meant to describe the active process of singing that entails real-time adaptation. In this way, Harvey's approach adopts the perspective of a generalized motor program, where the parameters of movement are specified prior to that movement. I see two main flaws in this approach. First, Harvey contrasts this approach to singing with an "associationist" approach, arguing that the latter would require a singer to store each occasion on which vocal control is exercised, and that the singer would be unable to generalize between these instances. However, it seems that the core of Harvey's schema approach IS associational, as he argues that the process of schema acquisition depends on the acquisition of "data points" that are generated from practice (although not during practice, presumably).
Additionally, the fact that this approach only accounts for modifications that are made following a performance seems a serious shortcoming, especially given the complexity of the approach as it is laid out in the article. The ecologically based approach advocated by Whiting, Vogt & Vereijken (1992) handles this second problem. Given evidence such as the Lombard effect (which Harvey refers to many times), it is odd to think that a successful model of singing could not account for real-time adaptation.
Peter Q. Pfordresher.
cognition/ethnomusicology/physiology.