Scarlet & Grey
Ohio State University
School of Music


Sixty Methodological Potholes

The following table identifies a number of fallacies, problems, biases, and effects that scholars have, over the centuries, recognized as confounding the conduct of good research. Note that some of these "methodological potholes" remain contentious among some scholars.

ad hominem argument

Criticizing the person rather than criticizing the argument.

Advice: Focus on the quality of the argument.

discovery fallacy

Criticizing an idea because of its origin (for example, an idea given in a religious text).

Advice: Criticize the justifications offered in support of an idea rather than how the idea originated.

In 1858, Kekulé was having difficulty deciphering the chemical structure of benzene. One night he went to bed and dreamed of a snake biting its tail. The next morning, he awoke with the insight that perhaps benzene is chemically ring-shaped. Following up on the idea, his subsequent work did indeed establish that the structure of benzene could be explained only by joining the ends of the carbon chain to form a closed ring.

Some people might be uncomfortable with the idea that Kekulé's dream proved seminal in understanding benzene. For most people, dreams are not the preeminent technique for doing research.

However, scholars make a distinction between the context of discovery and the context of legitimation. Kekulé's dream did not establish the structure of benzene. It was Kekulé's subsequent experimental work that was consistent with a ring-structure.

Chemists would be wrong to criticize the idea because it originated in a dream. It simply doesn't matter what the source of an idea is. It was Kekulé's subsequent arguments (context of legitimation) -- the evidence he assembled in favor of a ring-structure -- that convinced other chemists of the idea.

ipse dixit

Appealing to an authority figure in support of an argument.

Advice: Cite published research rather than identifying authority figures. Provide references so others can judge the quality of the supporting research for themselves.

ad baculum argument

An appeal to physical or psychological threat. A surprisingly common form of argument, even in modern times. In his debate with the Scholastics concerning the sun-centered theory of the solar system, Galileo "was shown the instruments of torture" as a way of bringing the debate to a conclusion. The threat is often made as an observation that someone else will cause harm, as in "Your boss will fire you if your views become known." Common in many religious arguments, as when "one should not do (or think) X because you will be punished by God." May also appear as a threat directed at others, as when "The government will deport your friends if you don't stop publishing criticisms." Also pervasive in business, as echoed in the phrase "He twisted my arm." The ad baculum argument has force only insofar as we fear pain or empathize with others -- more than we are motivated by a just argument.

Advice: Do not threaten. Draw attention when others use threats in debates.

egocentric bias

The tendency to assume that other people experience things the same way we do.

Advice: Don't rely exclusively on introspection. Listen carefully to what others report. Carry out a survey or run an experiment in order to observe the behaviors of others. Be wary when generalizing from your own experiences.

cultural bias

The inappropriate application of a concept to people from another culture.

Advice: Talk with culturally knowledgeable people. Carry out cross-cultural experiments. Listen carefully in post-experiment debriefings.

cultural ignorance

The failure to make a distinction that people in another culture readily make.

Advice: Talk with culturally knowledgeable people. Listen carefully in post-experiment debriefings.

over-generalization

The tendency to assume that an experimental result generalizes to a wide variety of real-world situations.

Advice: Be careful. Look for converging evidence. Analyze additional works. Run further experiments.

inertia fallacy

The idea that research consistent with a particular conclusion will "grow" in the future. A subtle fallacy that is evident in such statements as "Research is increasingly showing that ...". Future research is just as likely to over-turn a current theory as to confirm it. As in the stock market, past "trends" are not necessarily indicative of future results. Research results do not have inertia.

Advice: Talk about research results in the past tense ("Research has shown ..." rather than "Research is showing ..."). Avoid "growth" or "band-wagon" metaphors when describing the evidence pertaining to some theory.

relativist fallacy

The belief that no idea, hypothesis, theory or belief is better than another.

Advice: Avoid "absolute" relativism; the world appears to be "relatively relative." Don't mistake relativism for pluralism.

universalist phobia

A prejudice against the possibility of cross-cultural universals.

Advice: Familiarize yourself with music from a variety of cultures. Investigate notions of similarity and difference. Use cross-cultural surveys or experiments where appropriate.

problem of induction

The problem (identified by Hume) that no number of particular observations can establish the truth of some general conclusion.

Advice: Avoid claiming you know the truth. Present your research results as "consistent" or "inconsistent" with a particular theory, hypothesis or interpretation.

How do we learn from observation? The classic response is that we learn through a process dubbed induction. Induction entails making a set of specific observations, and then forming a general principle from these observations. For example, having stubbed my toe on many occasions over the course of my life, I have formed a general conviction that rapid acceleration of my toe into massive objects is likely to evoke pain. We might say that I have learned from experience (although my continued toe-stubbings make me question how well I've learned this lesson).

The 18th-century Scottish philosopher, David Hume, recognized that there are serious difficulties with the concept of induction. Hume noted that no amount of observation could ever resolve the truth of some general statement. For example, no matter how many white swans one observes, an observer would never be justified in concluding that all swans are white. In postmodernist language, one would say that we cannot legitimately raise local observations to the status of global truths.

Several serious attempts have been made by philosophers to resolve the problem of induction. Three of these attempts have been influential in scientific circles: falsificationism, conventionalism and instrumentalism. However these attempts suffer from serious problems of their own. In all three philosophies, the validity of empirical knowledge is preserved by forfeiting any strong claim to absolute truth.

Observation can never be used to "prove" anything. But that doesn't mean that observation is useless; we still manage to learn from observation, even if the process seems mysterious.

In observation-based research, we never claim to prove something. Instead, we can say that the observations are consistent with a particular theory, hypothesis or interpretation.

positivist fallacy

The problem arising when a phenomenon is deemed not to exist because no evidence is available: "Absence of evidence is interpreted as evidence of absence."

Advice: Recognize that not all phenomena leave obvious evidence of their existence.

Some areas of research have little or no available data or evidence. Data-poor fields raise some special methodological concerns, one of which is the positivist fallacy. If a phenomenon leaves no trail of evidence, then there is nothing to study. We may even be tempted to conclude that nothing has happened. In other words, the positivist fallacy is the misconception that absence of evidence may be interpreted as evidence of absence.

Positivism had a marked impact on mid-twentieth century psychology. In particular, the influence of logical positivism was notable in the behaviorists such as B.F. Skinner. The classic example of the positivist fallacy was the penchant of behaviorists to dismiss unobservable mental states as non-existent. For example, because "consciousness" could not be directly observed, for the positivist it must be regarded as an occult or fictional quality with no truth status (Ayer, 1936).

Psychology escaped the excesses of behaviorism with the advent of the cognitive revolution which re-introduced mental states as legitimate topics of investigation. It is perhaps ironic that computer technology played an important role in facilitating the acceptance of mental states. Although not easily observable, computer memories could clearly exist in a number of different states, and these states could significantly affect the ensuing computational behavior.

If it is true that the positivist fallacy tends to arise from data-poor conditions, then it should be possible to observe this same misconception in humanities scholarship -- whenever data is limited. Consider, by way of example, the following argument from the distinguished historical musicologist, Albert Seay. At the beginning of his otherwise fine book on medieval music, Seay provides the following rationale for focusing predominantly on sacred music in preference to secular music:

"Although much music did exist for secular purposes and many musicians satisfied the needs of secular audiences, the Church and its musical opportunities remained the central preoccupation. No better evidence of this emphasis on the religious can be seen than in the relative scarcity of both information and primary source materials for secular music as compared to those for the sacred." (Seay, 1975, p.2)

In other words, Seay is arguing that, with regard to secular medieval music-making, absence of evidence is evidence of absence. Since secular activities generated little documentation, we have almost no idea of the extent and day-to-day pertinence of medieval secular music-making. For illiterate peasants, "do-it-yourself" folk music may have shaped daily musical experience far more than has been supposed. Of course Seay may be entirely right about the relative unimportance of secular music-making, but in basing his argument on the absence of data, he is in the company of the most rabid logical positivist. The positivist fallacy is commonly regarded as a symptom of scientific excess. However, it knows no disciplinary boundaries; it tends to appear whenever pertinent data is scarce.

confirmation bias

The tendency to see events as conforming to a hypothesis while viewing falsifying events as "exceptions".

Advice: Be systematic in your observations.

hindsight bias

The ease with which people confidently interpret or explain any set of existing data.

Advice: Whenever possible, attempt to predict data in advance. Aim to test ideas rather than to look for confirmation.

unfalsifiable hypothesis

The formulation of a theory or hypothesis which cannot be, in principle, falsified.

Advice: Whenever possible, formulate theories, hypotheses or interpretations so they are, in principle, falsifiable. Identify the sorts of observations that would be inconsistent with your views.

The most well-known attempt to resolve the problem of induction was formulated by Karl Popper in 1934. Popper accepted the view that no amount of observation could ever verify that a particular proposition is true. That is, an observer cannot prove that all swans are white. However, Popper argued that one could be certain of falsity. For example, observing a single black swan would allow one to conclude that the claim -- all swans are white -- is false.

Accordingly, Popper endeavored to explain the growth of knowledge as arising by trimming the tree of possible hypotheses using the pruning shears of falsification. Truth is what remains after the falsehoods have been trimmed away.

For Popper, what makes a theory a "scientific theory" is not that the theory is true (since we cannot know this). Rather, what makes a theory "scientific" is that the theory is, in principle, falsifiable. The mark of a good theory, for Popper, is that the theory is stated in a way that admits the possibility of being disproved or falsified.

Accordingly, a theory that claims "all blungs are blue" is a bad theory, not simply because we don't know what a "blung" is, but because as stated, the claim would be impossible to falsify.

A number of music scholars, such as Eugene Narmour and William Poland, have argued that music scholars need, whenever possible, to state their theories in a way that they can, in principle, be falsified.

post-hoc hypothesis

Following data collection, the formulation and testing of additional hypotheses not envisaged before the data was collected.

Advice: Limit. Beware of hindsight bias and multiple tests. Collect new data; analyze new works.

smorgasbord thinking

Sometimes we don't realize that we unconsciously hold a collection of hypotheses for all occasions. Suppose two very different people marry: we explain it by saying "Opposites attract." But if two very similar people marry, we explain it by saying "Birds of a feather flock together."

If a group of people dealing with a problem work inefficiently, we explain it by saying "Too many cooks spoil the broth." But if a group of people excel at a task, we explain it by saying "Two heads are better than one."

If a person rushes into a poor decision, we conclude that one should "look before you leap." But if a fast decision produces a good result, we conclude that "He who hesitates is lost." Similarly, we say "Time waits for no man," but we also say "Haste makes waste."

We say "Cross that bridge when you come to it," but we also say "Don't put off 'til tomorrow what you can do today."

Most of us move easily from one contradictory hypothesis to another with little awareness of what we are doing. If explanations are so easy to come by -- no matter what trend exists in the data -- then no explanation can be trusted.

Consider, for example, two common ideas related to human behavior. Many people believe that there is something to the ancient Greek idea of "catharsis." That is, by watching (for example) a drama portraying a murder, viewers can, in some sense, "purge" any murderous instinct they may have. But many of the same people also believe "Monkey see, monkey do." So what do you make of television violence? Or pornography? If someone thinks pornography is okay, they will tend to argue that it is cathartic. But if someone thinks pornography is bad, they will argue "monkey see, monkey do."

Advice: Don't deceive yourself that you have only one prediction. Write your prediction down before you analyse any data. Ask yourself whether you have a "spare" explanation should the data show a reverse trend; if so, ask yourself what hypothesis you should be testing.

ad-hoc hypothesis

The proposing of a supplementary hypothesis that is intended to explain why a favorite theory failed an experimental test.

Advice: Open to grave abuse. Try to avoid. Test the ad hoc hypothesis in another experiment.

sensitivity syndrome

The tendency to try to interpret every perturbation in a data set; a failure to recognize that data always contains some "noise".

Advice: Use test-retest and other techniques to estimate the margin of error for any collected data. Report chance levels, p values, effect sizes. Beware of hindsight bias.

positive results bias

A bias commonly shown by scholarly journals to publish only studies that demonstrate positive results (i.e., where data and theory agree).

Advice: Seek replications for suspect phenomena. Be aware of possible "bottom-drawer effect".
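A small simulation can make the consequence of this bias concrete. The sketch below is illustrative only: the function name `simulate_literature`, the number of studies, the sample size, and the one-sided z-test with a known standard deviation are all assumptions introduced here. It simulates many studies of an effect whose true size is zero, "publishes" only those that reach significance in the positive direction, and compares the average published effect with the average over all studies.

```python
import random
import statistics


def simulate_literature(n_studies=2000, n_subjects=25, true_effect=0.0, seed=1):
    """Simulate studies of one effect; return (all_effects, published_effects).

    Each study draws n_subjects values from a normal distribution with mean
    true_effect and standard deviation 1. A study is "published" only if its
    mean is significantly above zero (one-sided z-test, z > 1.645).
    """
    rng = random.Random(seed)
    all_effects, published = [], []
    for _ in range(n_studies):
        sample = [rng.gauss(true_effect, 1.0) for _ in range(n_subjects)]
        mean = statistics.fmean(sample)
        # The standard deviation is known to be 1, so the standard error
        # of the mean is 1 / sqrt(n_subjects).
        z = mean * n_subjects ** 0.5
        all_effects.append(mean)
        if z > 1.645:
            published.append(mean)
    return all_effects, published


all_effects, published = simulate_literature()
print(f"mean effect, all studies:       {statistics.fmean(all_effects):+.3f}")
print(f"mean effect, published studies: {statistics.fmean(published):+.3f}")
print(f"proportion published:           {len(published) / len(all_effects):.3f}")
```

Under these assumptions the published studies report a clearly positive average effect even though the true effect is zero, which is why replication of suspect phenomena matters.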

bottom-drawer effect

Unawareness of unpublished negative results of earlier experiments.

Advice: Maintain contact and communicate within a scholarly community. Ask other scholars whether they have carried out a given analysis, survey or experiment. Widely report negative results through informal channels.

head-in-the-sand syndrome

The failure to test important theories, assumptions, or hypotheses that are readily testable.

Advice: Collect pertinent data. Carry out analyses. Do a survey. Run an experiment.

data neglect

The tendency to ignore readily available data when assessing theories, assumptions or hypotheses.

Advice: Don't ignore existing resources. Test your hypotheses using other available data sets.

research hoarding

The failure to make the fruits of your scholarship available for the benefit of others.

Advice: Publish often. Prefer to write short research articles rather than books. Make your data available to others.

I once worked with a colleague who spent 30 years studying the life of Felix Mendelssohn. Mendelssohn was a very prolific letter writer, and my colleague had produced English translations of several thousand letters. The letters include details of trips, concerts, rehearsals, meetings, etc. from which my colleague had produced a detailed day-by-day chronicle of Mendelssohn's life. On such-and-such a day Mendelssohn conducted a concert and made revisions to a particular work. The next day he read a newspaper review and received a letter from his publisher. Etc., etc. The assembled chronicle runs to several hundred pages. None of the translations have been published, and the day-by-day chronicle remains a manuscript.

My colleague likes to tell the story of an experience that took place while he was on sabbatical about 15 years ago. He had travelled to Cambridge University where many of Mendelssohn's letters are archived. As one might expect, my colleague encountered another musicologist who was also studying Mendelssohn's letters. By chance, this other scholar happened to glimpse the computer printout of my colleague's chronicle of Mendelssohn's day-by-day activities.
"That's an absolute goldmine you've got there," she said.
"Yes, I know," responded my colleague proudly.

About an hour later, my colleague returned to his table and discovered that a section of the printout was missing.
"Have you seen my chronicle?" he asked of the other scholar.
"No," she replied with a guilty look on her face.

After several minutes of searching, my colleague noticed the missing printout sitting under a pile of books belonging to the other musicologist. Red-faced, she handed the chronicle back to my colleague.

My colleague likes to tell this story as an example of the low morals to which scholars might stoop. But I think there is a more compelling lesson. To this day, after the passage of some 30 years, my colleague has not yet published his day-by-day chronicle of Mendelssohn's life. My colleague argues that the chronicle is not yet "perfect" and still requires some work reconciling some conflicting dates, etc. My colleague retired about 5 years ago, and still works occasionally on the project.

In the meantime, an entire generation of Mendelssohn scholars has had to work without benefit of his work. Yes, there are likely to be errors in the chronicle. But these errors will be identified more quickly by soliciting the feedback of other Mendelssohn scholars.

Research is a communal activity. All scholars benefit by being in constant dialogue with our peers. We might be tempted to hold back information from our colleagues in order to bask in the glory of some discovery. But we shouldn't wait long before allowing others to benefit from the fruits of our labors. There is a point where our own egos actually impede the development of an area of knowledge.

double-use data

The use of a single data set both to formulate a theory and to "independently" test the theory.

Advice: Avoid. Collect new data.

A pernicious problem plaguing much scholarship is the tendency to use a single data set both to generate the theory and to support the theory. Formally, if observation O is used to formulate theory T, then O cannot be construed as a predicted outcome of T. That is, observation O in no way supports T.

The Theory of Continental Drift arose from observing the suspicious visual fit between the east coasts of the American continents and the west coasts of Europe and Africa. The bulge of north-west Africa appears to fit like a piece of a jig-saw puzzle into the Caribbean gulf. This observation was ridiculed as childish nonsense by geologists in the first part of the twentieth century. Geologists were right to dismiss the similarity of the coast-lines as evidence in support of the theory of continental drift, since this similarity was the origin of the theory in the first place. Plate tectonics gained credence only when independent evidence was gathered consistent with the spreading of the Atlantic sea-bed.
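When collecting new data is not immediately possible, one way to keep theory formation and theory testing apart is to set aside part of an existing data set before looking at it. The sketch below is a minimal illustration of that idea, not a procedure prescribed by this entry: the function name `split_exploration_confirmation`, the 50/50 split, and the invented corpus of numbered melodies are all assumptions.

```python
import random


def split_exploration_confirmation(items, exploration_fraction=0.5, seed=0):
    """Randomly partition items into an exploration set and a reserved set.

    The exploration set may be inspected freely in order to formulate a
    theory. The reserved set is left untouched until the theory is fixed,
    and is then used once to test it.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * exploration_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical corpus: 100 melodies identified only by number.
corpus = [f"melody_{i:03d}" for i in range(100)]
exploration, reserved = split_exploration_confirmation(corpus)
print(len(exploration), len(reserved))
```

Because no item appears in both sets, an observation used to formulate the theory can never be counted again as a confirmed prediction of it.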

skills neglect

The human disposition to resist learning new scholarly methods that may be pertinent to a research problem.

Advice: Resist scholarly laziness. Engage in continuing education. Learn things your peers and teachers don't know.

control failure

The failure to contrast an experimental group with a control group.

Advice: Add a control group.

third variable problem

The presumption that two correlated variables are causally linked; such links may arise through an unknown third variable.

Advice: Avoid interpreting correlation as causality. Carry out an experiment where manipulating variables can test notions of probable causality.

In correlational studies, the researcher can demonstrate that there is a relationship or association between two variables or events. But there is no way to determine whether A causes B or B causes A. Moreover, the researcher cannot dismiss the possibility that A and B are not causally connected. It may be the case that both A and B are caused by an independent third variable. By way of illustration we might note that there is a strong correlation between consumption of ice cream and death by drowning. Whenever ice cream consumption increases there is a concomitant increase in drowning deaths (and vice versa). Of course the likely reason for this correlation is that warm summer days lead people to go swimming and also lead to greater ice cream consumption. In historical disciplines, one can never know whether the association of two events is causal, accidental, or the effect of a third (unidentified) event or factor.
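The ice cream example can be sketched as a simulation. Everything in the block below is invented for illustration: the data are random numbers in which temperature drives both variables, and the helper names `pearson` and `partial_correlation` are assumptions of this sketch. It shows a strong raw correlation between two variables that have no causal link, and shows that the association largely disappears once the third variable is statistically held constant. Note that this check is only available when the third variable has been identified and measured.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear influence of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / ((1 - rxz ** 2) * (1 - ryz ** 2)) ** 0.5


rng = random.Random(42)
# Invented data: temperature drives both variables; neither causes the other.
temperature = [rng.gauss(0, 1) for _ in range(5000)]
ice_cream = [t + rng.gauss(0, 0.5) for t in temperature]
drownings = [t + rng.gauss(0, 0.5) for t in temperature]

r_raw = pearson(ice_cream, drownings)
r_partial = partial_correlation(ice_cream, drownings, temperature)
print(f"raw correlation:                         {r_raw:.2f}")
print(f"correlation controlling for temperature: {r_partial:.2f}")
```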

reification

Falsely concretizing an abstract concept (e.g. regarding spatial representations of pitch structure as mental representations).

Advice: Take care with terminology.

validity problem

When an operational definition of a variable fails to accurately reflect the true theoretical meaning of the variable (See Cozby, p.31).

Advice: Think carefully when forming operational definitions. Use more than one operational definition. Seek converging evidence.

anti-operationalizing problem

The tendency to raise perpetual objections to all operational definitions.

Advice: Propose better operational definitions. Seek converging evidence using several alternative operational definitions.

Most concepts are vaguely defined. For example, what is meant by a "melodic arch"? Ostensibly, a phrase may be considered "arch-shaped" if the pitches go up and then go back down. But surely it's possible for some notes to deviate from this strict criterion, and yet the phrase may still be considered "arch-shaped". Suppose the pitches go: A-B-C-D-E-C-E-D-C-B-A. Does the "C" in the middle mean that there is no arch?

If we are unable to provide a precise definition of a melodic arch, then it would appear that we'd never be able to determine whether most phrases are arch-shaped, or whether there is a tendency to create arch-shaped phrases.

If everyone is unable to agree on a definition of a melodic arch, then, by definition, it is impossible to study the purported phenomenon. One could legitimately claim there is no such thing as a melodic arch.

The best way to address this problem is by proposing several alternative operational definitions of a melodic arch. For each operational definition, we can examine a large number of phrases to determine which phrases conform to the given definition.

Suppose that we provide 5 contrasting definitions of a melodic arch. If we can show that most phrases are arch-shaped, no matter which definition we choose, then it becomes more difficult for someone to claim that arch-shaped phrases are mere figments of our imagination. See The Melodic Arch in Western Folksongs.
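The idea of testing several operational definitions can be sketched in code. The three definitions below are hypothetical examples invented for this illustration; they are not the definitions used in any particular study. Pitches are represented as MIDI note numbers, and the octave chosen for the example phrase is likewise an assumption.

```python
from statistics import fmean


def strict_arch(pitches):
    """Definition 1: pitches rise strictly to a single peak, then fall strictly."""
    peak = pitches.index(max(pitches))
    rising = all(a < b for a, b in zip(pitches[:peak], pitches[1:peak + 1]))
    falling = all(a > b for a, b in zip(pitches[peak:], pitches[peak + 1:]))
    return 0 < peak < len(pitches) - 1 and rising and falling


def thirds_arch(pitches):
    """Definition 2: the middle third is higher on average than both outer thirds."""
    k = len(pitches) // 3
    if k == 0:
        return False
    middle = fmean(pitches[k:len(pitches) - k])
    return middle > fmean(pitches[:k]) and middle > fmean(pitches[-k:])


def central_peak_arch(pitches):
    """Definition 3: every occurrence of the highest pitch lies in the middle
    half of the phrase, and both endpoints are lower than that pitch."""
    n = len(pitches)
    if n < 3:
        return False
    top = max(pitches)
    peaks = [i for i, p in enumerate(pitches) if p == top]
    central = all(0.25 <= i / (n - 1) <= 0.75 for i in peaks)
    return central and pitches[0] < top and pitches[-1] < top


def proportion_arched(phrases, definition):
    """Fraction of phrases that satisfy one operational definition."""
    return sum(definition(p) for p in phrases) / len(phrases)


# The phrase A-B-C-D-E-C-E-D-C-B-A as MIDI note numbers (octave assumed).
phrase = [69, 71, 72, 74, 76, 72, 76, 74, 72, 71, 69]
for definition in (strict_arch, thirds_arch, central_peak_arch):
    print(f"{definition.__name__}: {definition(phrase)}")
```

The example phrase fails the strict definition because of the dip to C, but satisfies the other two. Applying `proportion_arched` to a large collection of phrases under each definition in turn would show whether a conclusion depends on which definition is chosen.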

problem of ecological validity

The problem of generalizing results from controlled experiments to real-world contexts.

Advice: Seek converging evidence between controlled experiments and experiments in real-world settings.

naturalist fallacy

The belief that what IS is what OUGHT to be.

Advice: Imagine desirable alternatives.

presumptive representation

The practice of representing others to themselves. (Natoli, 1997; p.151).

Advice: Exercise care when portraying or summarizing the views of others -- especially when your portrayal causes a disadvantaged group to lose power.

exclusion problem

The tendency to prematurely exclude competing views. (Natoli, 1997; p.151).

Advice: Remember that "no theory is ever truly dead." (Popper)

contradiction blindness

The failure to take contradictions seriously.

Advice: Attend to possible contradictions.

multiple tests

If a statistical test relies on a 0.05 significance level, then, when no real effect exists, a spuriously significant result will occur on average once in every 20 tests performed.

Advice: Avoid excessive numbers of tests for a given data set. Use statistical techniques to compensate for multiple tests. Split large data sets into one or more "reserved sets." Prefer hypothesis testing over open-ended chasing after significance.
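The arithmetic behind this problem is easy to work through. The sketch below assumes that the tests are independent and that every null hypothesis is true; the function names are introduced here for illustration. It computes the probability of obtaining at least one spurious result among a family of tests, and shows the Bonferroni correction, which is one of several statistical techniques for compensating.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


for n in (1, 5, 20, 100):
    rate = familywise_error_rate(0.05, n)
    print(f"{n:3d} tests: P(at least one spurious result) = {rate:.3f}, "
          f"Bonferroni threshold = {bonferroni_alpha(0.05, n):.5f}")
```

With 20 tests at the 0.05 level, the chance of at least one spurious result is roughly 64 percent, even though each individual test has only a 5 percent chance of misleading.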

magnitude blindness

The tendency to become preoccupied with significant results that have a small magnitude of effect.

Advice: Aim to uncover the most important factors first.

regression artifacts

The tendency to interpret regression toward the mean as an experimental phenomenon.

Advice: Don't use extreme values as a sampling criterion. Use a control group (such as scrambling orders) to compare with the experimental group.
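Regression toward the mean can be demonstrated with invented data in which nothing happens between two measurements. In the sketch below, the score model (a stable "true" score plus independent noise on each occasion) and all variable names are assumptions made for illustration. Participants are selected because they scored in the top 10% on the first test, and the same people are then measured again.

```python
import random
import statistics

rng = random.Random(7)
n = 10000
# Invented data: each person has a stable "true" score plus independent
# measurement noise on each of two occasions. No treatment is applied.
true_score = [rng.gauss(0, 1) for _ in range(n)]
test1 = [t + rng.gauss(0, 1) for t in true_score]
test2 = [t + rng.gauss(0, 1) for t in true_score]

# Select the top 10% on the first test (an extreme-value sampling criterion).
cutoff = sorted(test1)[int(0.9 * n)]
selected = [i for i in range(n) if test1[i] >= cutoff]

mean1 = statistics.fmean([test1[i] for i in selected])
mean2 = statistics.fmean([test2[i] for i in selected])
print(f"selected group, first test:  {mean1:.2f}")
print(f"selected group, second test: {mean2:.2f}")
```

The selected group scores lower on the second test although no manipulation took place. Without a control group, that drop could easily be mistaken for the effect of an intervention.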

range restriction effect

Failure to vary an independent variable over a sufficient range of values, with the consequence that the effect size looks small.

Advice: Decide what range of a variable or what effect size is of interest. Run a pilot study.
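The following sketch illustrates the effect with invented data: a dependent variable that rises linearly with an independent variable, plus noise. The numbers, the variable names and the helper `pearson` are all assumptions. The same underlying relationship is measured once over the full range of the independent variable and once over a narrow slice of it.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(3)
# Invented data: y depends linearly on x, with added noise.
x = [rng.uniform(0, 10) for _ in range(5000)]
y = [xi + rng.gauss(0, 2) for xi in x]

# Keep only the observations where x falls in a narrow band.
restricted = [(xi, yi) for xi, yi in zip(x, y) if 4 <= xi <= 6]

r_full = pearson(x, y)
r_restricted = pearson([p[0] for p in restricted], [p[1] for p in restricted])
print(f"correlation over the full range (0 to 10):      {r_full:.2f}")
print(f"correlation over the restricted range (4 to 6): {r_restricted:.2f}")
```

The relationship between the two variables is identical in both cases; only the sampled range differs, yet the measured association is much weaker in the restricted sample.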

ceiling effect

When a task is so easy that the experimental manipulation shows little/no effect.

Advice: Make the task more difficult. Run a pilot study.

floor effect

When a task is so difficult that the experimental manipulation shows little/no effect.

Advice: Make the task easier. Run a pilot study.

sampling bias

Any confound that causes the sample to not be representative of the pertinent population.

Advice: Use random sampling. If there are identifiable sub-groups use a stratified random sample. Where possible, avoid "convenience" or haphazard sampling.
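A proportionally allocated stratified random sample can be sketched as follows. The function name `stratified_sample`, its parameters, and the example population of instrumentalists and vocalists are hypothetical, chosen only to illustrate the technique.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, sample_size, seed=0):
    """Draw a proportionally allocated stratified random sample.

    population  -- list of items
    stratum_of  -- function mapping an item to its sub-group label
    sample_size -- approximate total number of items to draw
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)
    sample = []
    for members in strata.values():
        # Each sub-group contributes in proportion to its share of the population.
        k = round(sample_size * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 700 instrumentalists and 300 vocalists.
population = ([("instrumentalist", i) for i in range(700)]
              + [("vocalist", i) for i in range(300)])
sample = stratified_sample(population, lambda person: person[0], 100)
counts = {label: sum(1 for p in sample if p[0] == label)
          for label in ("instrumentalist", "vocalist")}
print(counts)
```

Within each sub-group the selection is still random; stratification only guarantees that the sub-groups appear in the sample in the same proportions as in the population.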

subsample ignorance

Failure to recognize that sub-groups within a sample respond differently. For example, where responses diverge between males and females, or between vocalists and instrumentalists.

Advice: Use descriptive methods and data exploration methods to examine the experimental results. Use cluster analysis methods where appropriate.

cohort bias or cohort effect

Differences between age groups in a cross-sectional study that are due to generational differences rather than due to the experimental manipulation.

Advice: Use a narrower range of ages. Use a longitudinal design instead of a cross-sectional design.

Suppose we had a theory that listeners tend to become less tolerant of dissonance as they age. In a cross-sectional design, we might randomly select 30 people (say) for each of the ages of 15, 25, 35, 45, 55, 65 and 75. We might present each subject with chords, phrases, passages, or complete musical works, and ask them to rate how unpleasant or annoying they find them.

Further suppose that we found that older listeners rate the passages significantly more unpleasant/annoying than younger listeners. On this basis, we would not be justified in claiming that listeners become less tolerant of dissonance with increasing age. Why? It is possible that music has become more dissonant with successive generations. In other words, the effect may have nothing to do with age, and may be simply attributable to when a person was born.

A better approach to studying this question would employ a longitudinal design. Using this approach, the researcher would follow specific individuals as they age over several decades. One would need to test each subject several times over a long period of time to determine whether they rate the same sounds more unpleasant or annoying as they grow older.
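The difference between the two designs can be made concrete with a toy model. In the sketch below, the function `dissonance_rating`, its coefficients, the testing year 2000 and the birth year 1950 are all invented for illustration. The model is deliberately built so that ratings depend only on when a person was born and never on how old they are.

```python
def dissonance_rating(birth_year, age):
    """Invented model: the rating depends on birth year only, never on age."""
    return 5.0 - 0.03 * (birth_year - 1950)


ages = [15, 25, 35, 45, 55, 65, 75]

# Cross-sectional design: everyone is tested in the same year (2000 here),
# so each age group was born in a different year.
cross_sectional = [dissonance_rating(2000 - age, age) for age in ages]

# Longitudinal design: one cohort (born 1950) is retested at each age.
longitudinal = [dissonance_rating(1950, age) for age in ages]

print("age   cross-sectional   longitudinal")
for age, cs, lg in zip(ages, cross_sectional, longitudinal):
    print(f"{age:3d}   {cs:15.2f}   {lg:12.2f}")
```

The cross-sectional ratings rise steadily with age even though age plays no role in the model, whereas the longitudinal ratings stay flat. In this toy case the apparent age effect in the cross-sectional design is entirely a cohort effect.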

expectancy effect

Any unconscious or conscious cues that convey to the subject how the experimenter wants them to respond. Expecting someone to behave in a particular way has been shown to promote the expected behavior.

Advice: Use standardized interactions with subjects. Use automated data-gathering methods. Use double-blind protocol.

One of the earliest demonstrations of the expectancy effect is found in the famous case of "Clever Hans" -- a horse that appeared to be an utter genius. Here is a description by Robert Rosenthal:

"Hans, it will be remembered, was the clever horse who could solve problems of mathematics and musical harmony with equal skill and grace, simply by tapping out the answers with his hoof. A committee of eminent experts testified that Hans, whose owner made no profit from his horse's talents, was receiving no cues from his questioners. Of course, Pfungst later showed that this was not so, that tiny head and eye movements were Hans' signals to begin and to end his tapping. When Hans was asked a question, the questioner looked at Hans' hoof, quite naturally so, for that was the way for him to determine whether Hans' answer was correct. Then, it was discovered that when Hans approached the correct number of taps, the questioner would inadvertently move his head or eyes upward -- just enough that Hans could discriminate the cue, but not enough that even trained animal observers or psychologists could see it."

When interacting with an experimental subject, there are innumerable subtle cues by which the experimenter may unwittingly communicate what the subject is expected to do. In such circumstances it is better that the experimenter not be present, or that the experimenter not know the purpose of the experiment (double-blind), or that the interaction with the subject is done using automated procedures.

placebo effect

The positive or negative response arising from the subject's belief about the efficacy of some manipulation.

Advice: Use a placebo control group.

Medical practitioners long ago discovered that a patient's belief in the effectiveness of a treatment can have a marked impact on their recovery, or on their experience of pain. Roughly one-third of patients report feeling better after taking a simple "sugar pill".

The placebo effect makes it more difficult to test the efficacy of new drugs in pharmaceutical research. The simple act of injecting someone, or giving them a pill, will cause people to report improvement. Consequently, drug trials must include a "placebo group" who are treated in an identical fashion to the experimental group -- with the exception that any pills or injections use inert substances. A drug is likely to be effective only if the improvement for the experimental group exceeds any improvement among the placebo group.

The placebo effect is not limited to improvement. Simply suggesting that a pill contains a toxin, for example, is likely to make recipients feel sick.

In some cases, the placebo effect can be remarkably large. For example, Marlatt and Rohsenow (1980) carried out a study to compare the effect of alcohol with the psychological effect of believing one is drinking alcohol. The results showed that the belief that one has consumed alcohol has a greater effect on behavior than the alcohol itself.

The placebo effect is an example of the influence of demand characteristics.

demand characteristics

Any aspect of an experiment that might inform subjects of the purpose of the study.

Advice: Control demand characteristics by: (1) using deception (for example, by adding "filler" questions that make it more difficult for subjects to infer the experimental question), (2) debriefing subjects at the end of the experiment, (3) using field observation, (4) avoiding within-subjects designs where all subjects are aware of all the experimental conditions, (5) asking subjects not to discuss the experiment with future participants.

People who participate in studies, surveys or experiments are not "inert" or "neutral". People form their own intuitions about the purpose of a study, and will often respond in a way that reflects their opinion about the hypothesis, rather than their unreflective way of behaving. For example, a person might receive in the mail a survey distributed by a large chemical corporation. Browsing through the survey, the respondent might form the view that the industry is carrying out the survey in the hopes of showing that people are less concerned about pollution than is widely believed. The respondent might be offended by this possibility, and consequently respond to questions in a way that actually exaggerates their views.

Conversely, participants in experiments are often eager to please the researcher. This is especially true in cross-cultural studies. Faced with a European anthropologist, a native Papuan might well respond to questions based on what they think the anthropologist wants to hear.

Some years ago, I carried out an experiment to try to determine whether people are prejudiced against women composers. In the experiment, we provided listeners with copies of concert programs that identified various pieces of contemporary art music and included brief biographical descriptions of the composers -- some of whom were male and some female.

We played brief excerpts from each piece on the program and asked listeners to rate how well they liked them.

Two groups of listeners were used. For one group, we made slight changes to the program so that male and female composers were switched. So some listeners heard an excerpt thinking the composer was a woman, whereas other listeners heard the same excerpt thinking the composer was a man.

We found no difference whatsoever in the ratings according to sex. Some excerpts were rated more highly than others, but it didn't matter whether the composer was a man or a woman.

After the experiment, we carried out brief interviews with our participants. We discovered that the vast majority of our listeners accurately suspected that the purpose of the experiment was to determine the effect of sex on musical ratings. If we had not carried out post-experiment debriefings, we might have wrongly concluded that listeners are not prejudiced by whether a composer is a man or a woman.

A better experiment might have found a much more subtle way (deception) to imply that a given excerpt is written by a man or a woman.

Demand characteristics are also evident in non-experimental situations. For example, after being taught a particular analysis method, music students will naturally tend to assume that an assigned musical work can readily be explicated using that method.

reactivity problem

When the act of measuring something changes the measurement itself. (See Cozby, p.33)

Advice: Use clandestine measurement methods.

history effect

Any change between a pretest measure and posttest measure that is not attributable to the experimental manipulation.

Advice: Isolate subjects from external information. Use post-experiment debriefing to identify possible confounds.

maturation confounds

Any changes in responses due to changes in the subject not related to the experimental manipulation. Examples of maturation changes include increasing boredom, becoming hungry, and (for longer experiments) reduced reaction times, fading beauty, becoming wiser, etc. (See Cozby, p.68)

Advice: Prefer short experiments. Provide breaks. Run a pilot study.

testing effect

In a pretest-posttest design, the effect whereby the pretest itself causes subjects to behave differently. (See Cozby, p.69)

Advice: Use clandestine measurement methods. Use a control group with no manipulation between pre- and post-test.

carry-over effect

When the effects of one treatment are still present when the next treatment is given. (See Cozby, p.281)

Advice: Leave lots of time between treatments. Use between-subjects design.

order effect

In a repeated measures design, the effect that the order of introducing treatment has on the dependent variable.

Advice: Randomize or counter-balance treatment order. Use between-subjects design.

Suppose you play a series of musical excerpts and ask listeners to rate how well they like each excerpt. It turns out that the first musical excerpt will tend to be more highly rated than the subsequent excerpts. This effect is common: people enjoy the first bite from a chocolate bar more than subsequent bites.

In addition, if a "nice" musical excerpt is preceded by an "ugly" musical excerpt, then the "nice" excerpt will typically be rated as more pleasant than if it had been preceded by a less "ugly" excerpt.

These order effects must be overcome whenever one asks people to rate several musical excerpts or stimuli.

One way to avoid order effects is to provide each listener with a unique random ordering of the excerpts. This means that all excerpts have an equal probability of occurring first, or of occurring after an especially "ugly" excerpt. In short, one should avoid playing the excerpts in the same order to each of the listeners.

Alternatively, the experimenter can use all possible orderings. This can be done only if the number of items is small.
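Both strategies, random ordering and using all possible orderings, can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed procedure: the excerpt labels, listener identifiers, and function names are hypothetical, and the sketch relies only on the standard `random` and `itertools` modules. Independent shuffles can coincide by chance, so the random orderings are not guaranteed to be unique; what matters is that every excerpt is equally likely to land in every position.

```python
import itertools
import random


def random_orders(excerpts, listener_ids, seed=None):
    """Give each listener an independently shuffled copy of the excerpts."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    orders = {}
    for listener in listener_ids:
        order = list(excerpts)  # copy so the master list is never altered
        rng.shuffle(order)
        orders[listener] = order
    return orders


def counterbalanced_orders(excerpts, listener_ids):
    """Cycle listeners through every possible ordering (small sets only)."""
    all_orders = list(itertools.permutations(excerpts))  # n! orderings
    return {
        listener: list(all_orders[i % len(all_orders)])
        for i, listener in enumerate(listener_ids)
    }


# Hypothetical labels, for illustration only.
excerpts = ["A", "B", "C"]
listeners = ["s01", "s02", "s03", "s04", "s05", "s06"]

shuffled = random_orders(excerpts, listeners, seed=1)
balanced = counterbalanced_orders(excerpts, listeners)

for listener in listeners:
    print(listener, shuffled[listener], balanced[listener])
```

With three excerpts there are 3! = 6 possible orderings, so six listeners cover each ordering exactly once. With ten excerpts there are 3,628,800 orderings, which is why using all possible orderings is practical only when the number of items is small.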

mortality problem

In a longitudinal study, the bias introduced by some subjects disappearing from the sample.

Advice: Convince subjects to continue; investigate possible differences between continuing and non-continuing subjects.

premature reduction

The tendency to rush into an experiment without first familiarizing yourself with a complex phenomenon.

Advice: Use descriptive and qualitative methods to explore a complex phenomenon. Use explorative information to help form testable hypotheses and to identify plausible confounds that need to be controlled.

spelunking

Exploring a phenomenon without ever testing a proper hypothesis.

Advice: Formulate and test hypotheses.

shifting population problem

The tendency to reconceive a sample as representing a different population than originally conceived.

Advice: Write down in advance what you think the population is.

instrument decay

Changes of measurement over time due to fatigue, increased observational skill, or changes of observational standards.

Advice: Use a pilot study to establish observational standards and develop skill.

reliability problem

When various measures or judgments are inconsistent.

Advice: (1) Train experimenters carefully, (2) pay careful attention to instrumentation, (3) measure reliability, and avoid interpreting effects smaller than the error bars.

hypocrisy

Holding others to a higher methodological standard than oneself.

Advice: Employ higher standards than others.

At first, this advice will seem unfair. Surely, there is no imperative for one researcher (me) to be more rigorous than another researcher. There are two rejoinders to this complaint. First, legitimate criticisms can come from otherwise incompetent or ill-informed people. We may rightly feel angry when another scholar criticizes our work, when their own work is less rigorous. But this does not mean that the criticism is less valid. Second, by following the advice to employ higher standards than others, a "virtuous circle" is established through which the methodological rigor of a discipline is advanced.

It is for these reasons that hypocrisy is here defined as "holding others to a higher methodological standard than oneself" rather than simply "holding others to a higher methodological standard". As defined here, hypocrisy is an error that only I can make. If another person holds me to a higher methodological standard, he/she is not a hypocrite. But if I hold someone else to a higher methodological standard, then I am a fully qualified hypocrite.



This document is available at http://dactyl.som.ohio-state.edu/Music829C/methodological.potholes.html