This page is about the
analysis of speech categorization data, meaning 2-alternative forced-choice
responses to single stimuli drawn from a continuum.
The figures shown are generally taken from the ASA volume of reprinted
articles on Speech Perception. Some reference
is also made to papers about speech perception in dyslexia, as that was the
immediate purpose of the material when first presented in the UCLA Phonetics
Lab.

Look at the example file ID_data.xls: Sheet1 contains a simple made-up
dataset, 9 subjects' responses to 7 stimuli, % response “A” for 10 repetitions
of each stimulus, with the average responses graphed along the continuum.

How to analyze such data?

1. You might think: a simple 1-way
ANOVA (factor = stimulus, with 7 levels). This can be done, but generally
isn't.

It must be a Repeated Measures analysis
because each subject gets all levels of stimuli; and in our lab you almost
certainly must use SPSS (rather than StatView) because responses at the different
levels are almost certain to be correlated. By itself the 1-way ANOVA is not of great
interest, as it is almost certain to show a significant effect of stimulus
on response (otherwise subjects are responding equally to all stimuli); the interest
then would be in the post-hoc tests/planned comparisons, and these have to
be done separately from the RM analysis. (E.g. paired t-tests; the corrected
alpha for 6 comparisons is .05/6 = .0083.) Such tests will tell you which
stimuli are perceived differently from which others, but the research question
is rarely framed that way.
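
The corrected alpha quoted above is the usual Bonferroni division of the overall alpha by the number of comparisons. A minimal Python sketch follows; the function name is illustrative, and taking the 6 comparisons to be the adjacent stimulus pairs is an assumption, since the text does not say which pairs are meant.

```python
def bonferroni_alpha(overall_alpha, n_comparisons):
    """Per-comparison alpha under a Bonferroni correction."""
    return overall_alpha / n_comparisons

# Assumption: the 6 comparisons are the adjacent pairs among the 7 stimuli.
n_stimuli = 7
adjacent_pairs = [(i, i + 1) for i in range(1, n_stimuli)]

corrected = bonferroni_alpha(0.05, len(adjacent_pairs))
print(len(adjacent_pairs), "comparisons, corrected alpha =", round(corrected, 4))
```

Each paired t-test would then be judged against this corrected value rather than against .05.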

For practice, do the SPSS RM ANOVA (using
the file ID_data.xls, opening Sheet 1 in SPSS):

- Tests of within-subject effects on this data file: p < .0001 under all corrections (even Lower Bound, where df = 1 rather than 6).
- Note the tests of contrasts, which look for trends across the levels (our data are fit by odd-order trends, but not by even-order ones).

Note that any ANOVA (RM
or factorial) is treating the stimuli as unordered - not as a continuum but
just as a set of 7 stimuli. That’s probably one reason you never see
this kind of analysis.

2. Often there
are 2 or more continua tested, and the interest is in comparing responses
across these, more than within each continuum.

In the example file ID_data.xls, Sheet2 is as above, plus the same
subjects' responses to a second continuum of 7 stimuli, with responses
to both plotted together. The second set has been constructed to show
a shallower crossover than the first.

Do the ANOVA (sheet "RM"), now with 2 within-subject
factors:

- Effect of Stimulus is still significant; Continuum is not
- Stimulus x Continuum interaction is significant

Suppose the data are the
same, but they come from 2 groups of subjects (either on the same continuum,
such as speakers of 2 languages, or assigned to one of 2 different continua). Now there's a between-subjects (Grouping) factor while
the stimulus factor remains within subjects (RM). (Data sheet "mixed":
Now significance of interaction is weaker.)

(A dyslexia paper that does such an analysis, with between-subjects factor Group
(2 levels) and within-subjects factor Stimulus, is the
Gerrits (2003) ICPhS paper.)

REVIEW: a new SPSS file, pretend_spy_sky.sav: 4 (small) groups of subjects, 2 continua, 8
stimuli each – do the analysis, you’ll see no Group effect.

3. But instead of analyzing responses
to each stimulus, researchers generally derive one or more OVERALL MEASURES of a subject's responses.

In general, there are 2 advantages to finding
an appropriate overall measure:

- the stimuli can be treated as a continuum, in an ordered relation
- there's no more "Stimulus" variable in the analysis, making the analysis that much simpler; especially, if each continuum is given to a different group of subjects, the analysis is entirely factorial

So what are appropriate measures?

Most simply, Repp et al. (1978), *J. Exp. Psych.: Human Perc. and Perf.*
4(4), 621-37 (and Sussman 1993 on lengthened transitions) totaled the
responses to all stimuli on a continuum, lumped together. Such a measure
will characterize a difference between two functions that are shifted along the continuum
(see figure just below). But in our example, and in many cases of interest,
the difference between the two datasets is not a shift; rather, it is how steep
vs. gradient the function is, and a total-responses measure won’t show
that.
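
To make this point concrete, here is a small Python sketch with made-up percentages (not the values in ID_data.xls): a steep and a shallow function that cross 50% at the same stimulus have the same total, while a shifted copy of the steep function does not.

```python
# Made-up identification functions: percent "A" responses to 7 stimuli.
steep = [100, 100, 90, 50, 10, 0, 0]
shallow = [90, 75, 60, 50, 40, 25, 10]
shifted = [100, 100, 100, 90, 50, 10, 0]  # the steep function moved one step

def total_response(percents):
    """Total of the percent responses over the whole continuum."""
    return sum(percents)

for name, data in [("steep", steep), ("shallow", shallow), ("shifted", shifted)]:
    print(name, total_response(data))
```

The totals separate the shifted function from the other two, but cannot tell the steep function from the shallow one.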

The usual two measures are the (category)
**boundary** (the 50% crossover point),
and the crossover **slope**.

- Early, graphical, method of estimating the boundary: graph the responses, estimate where the function crosses y=50%. Here’s an example from Best et al. (1981), *Perc. & Psychophys. 29*, p. 197, who show this line:

- You could even measure the **boundary shift** directly from such a graph. Others graphed the 2 complementary responses, and measured where they crossed.
- Later method: (in effect) smooth the raw data by fitting some kind of curve, and base measurements on the curve rather than the raw data. Kuhl and Miller (1978), *JASA 63*: 905-917, did this fitting by hand, on "probability" paper. Their picture of sample fitted curves (p. 910):

- Kuhl & Miller also calculated a measure of slope from these fitted curves: the "boundary width", or the range of stimuli spanned by the 25% to 75% responses. Note that their data, like my made-up data, differ in slope more than in boundary location (p. 910):

- Still later method: Fit a function, rather than an arbitrary curve, e.g. by some kind of regression. (You could imagine that researchers would then show the fitted functions, and/or report their Goodness of Fit, but they almost never do.) The advantages of fitting a function are that it is reproducible (unlike fitting by eye), and that its values and properties can be calculated rather than estimated.
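
As a rough numerical counterpart to reading values off a graph, the sketch below finds the 50% crossover and a 25%-to-75% boundary width by linear interpolation between adjacent data points. This is only an illustration in Python: the data are made up, the function name is hypothetical, and Kuhl and Miller took their width from hand-fitted curves rather than from interpolated raw data.

```python
def crossing(xs, ys, level):
    """Stimulus value where the function first crosses `level`,
    by linear interpolation between adjacent points (None if it never does)."""
    points = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 == level:
            return float(x0)
        if (y0 - level) * (y1 - level) < 0:  # level lies between y0 and y1
            return x0 + (level - y0) * (x1 - x0) / (y1 - y0)
    if ys[-1] == level:
        return float(xs[-1])
    return None

# Made-up mean percent "A" responses to stimuli 1-7.
stimuli = [1, 2, 3, 4, 5, 6, 7]
percent_a = [100, 100, 90, 60, 20, 0, 0]

boundary = crossing(stimuli, percent_a, 50)
width = abs(crossing(stimuli, percent_a, 25) - crossing(stimuli, percent_a, 75))
print("boundary:", boundary, "boundary width:", width)
```

Because only the two points on either side of each crossing are used, a single noisy point near the crossover can move the estimate a good deal.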

Note that in
all such cases, it is common to exclude the responses
of a subject when they do not have the overall expected shape, for example
if they are random, since in such cases there is no “boundary” or “slope”
to speak of. One criterion for inclusion is that
the responses fall below 25% (or some similar number) at one end of the
continuum, and above 75% at the other end.
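
The inclusion criterion just described can be sketched in Python as follows. The subject labels and proportions are made up, and checking only the two endpoint stimuli is an assumption about how "at one end of the continuum" is applied.

```python
def has_expected_shape(proportions, low=0.25, high=0.75):
    """True if responses are below `low` at one end of the continuum
    and above `high` at the other (endpoint stimuli only)."""
    first, last = proportions[0], proportions[-1]
    return min(first, last) < low and max(first, last) > high

# Made-up proportions of "A" responses per subject, stimuli 1-7.
subjects = {
    "S1": [0.0, 0.1, 0.3, 0.6, 0.9, 1.0, 1.0],  # clear S-shape: kept
    "S2": [0.5, 0.4, 0.6, 0.5, 0.4, 0.6, 0.5],  # near-random: excluded
    "S3": [1.0, 0.9, 0.8, 0.5, 0.3, 0.2, 0.3],  # never falls below .25: excluded
}

included = [name for name, data in subjects.items() if has_expected_shape(data)]
print(included)
```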

4. So, what kind of function to fit?

Categorization data are generally
not quite linear, nor exponential; they are generally more S-shaped, though
very shallow curves can be almost linear. Cubic spline and logistic
functions have this shape, as does the ogive (cumulative frequency) in general.
However, a cubic function has this shape only over part of its range, and
may not be a good overall fit. Here we will use a logistic; because
it can be constrained to fall between 0 and 1, it is the best theoretical
match to the expected shape of the data, which can never go outside that
range.

Stefan Frisch has discussed the use of logistics
more generally (e.g. Frisch, Pierrehumbert, and Broe (2004) *NLLT
22*: 179-228). Here is their example of logistic
functions that differ in "degree of gradience" (p. 183):
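
For readers without the figure, a minimal Python sketch of logistic functions that share a boundary but differ in gradience. The parameterization (a boundary and a steepness value) is a generic textbook one chosen for illustration, not the one used by Frisch et al. or by SPSS.

```python
import math

def logistic(x, boundary, steepness):
    """Logistic function bounded by 0 and 1, equal to 0.5 at `boundary`."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - boundary)))

stimuli = range(1, 8)
curves = {}
for steepness in (0.5, 1.0, 4.0):  # illustrative values: gradient to near-categorical
    curves[steepness] = [logistic(x, 4, steepness) for x in stimuli]
    print(steepness, [round(v, 2) for v in curves[steepness]])
```

All three curves pass through 0.5 at stimulus 4 and stay between 0 and 1; only the sharpness of the crossover differs.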

COMPARING DIFFERENT FITTED CURVES IN SPSS:
The set of datapoints to be fitted with a function must be arranged as a
single variable (i.e. in a column); choose
Regression--Curve Estimation. Use the SPSS file pretend_regression, which has each set of
ID responses as a separate column. Use the first or last columns.
(Logistic, and several of the other curves, require that there be no "0" values in the
data, and so values in the first column have been
modified accordingly; logistic also needs an upper bound set just above the maximum
value in the data; set it at .01 above the max in the data for this exercise,
but see also below.)

Compare the various offered curves for their
fit to these data. Top candidates are Linear, Cubic and Logistic.
The Cubic is actually the best fit (in terms of r²), but it uses more degrees
of freedom to get this better fit, and it goes outside the data range of 0 to
1. Notice that for the first column of data, although the 3 best curves
look different overall, they are almost identical at the y=50% point (and
all are different from the raw data at that point).

A vexed issue in curve fitting: fit the
entire set of responses, or only the sloping part? That is, should
the trailing 0s and 1s at the two ends be eliminated? If they are, there are very few points
to fit (in the perfect case, only 2), and there are often jiggles in the
data that keep some 0s and 1s in. However, because the logistic can
fit the endpoints so well, a fit to the entire set often seems to underestimate slope differences
between datasets.

5. How to get
the boundary and slope values from the SPSS fitted logistic curve.

The slope per se is not returned by the
function, but the term “b1” is related to the slope, with higher values reflecting
shallower curves. We have not tried to recover
the true slope from b1; instead we use b1 directly as our measure of slope.

The boundary must be further computed from
the “b0” term. To be able to do this in the simplest
way, it is crucial that the upper bound parameter for the curve fitting
be set to 1. That in turn requires that the
highest data value be less than 1. Thus you
should change your “0” datapoints to .001 (or similar), and correspondingly,
your “1” datapoints to .999.

As long as you have set
the upper bound to 1, then you can use the following formula, and the b0
and b1 values given for the logistic curve for your data, to solve for x when
y=.5: -ln(b0)/ln(b1). (That’s a minus sign.) Let's see this in Excel (file pretend_regression.xls,
which contains 3 results from "pretend regression.sav").
We can see that every response affects the fitted curve and therefore
the boundary. (This file shows only the original
data; to see the fitted curves from which the
boundaries are obtained, use SPSS.)
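
The same calculation can be sketched outside SPSS and Excel. The Python below assumes that SPSS's Curve Estimation logistic has the form y = 1/(1/u + b0·b1^x), fitted by linear regression of ln(1/y − 1/u) on x, which is my understanding of the SPSS documentation. With u = 1, setting y = .5 gives b0·b1^x = 1, hence x = −ln(b0)/ln(b1), the formula above. The function names and the data are made up for illustration.

```python
import math

def fit_spss_style_logistic(xs, ys, upper=1.0, eps=0.001):
    """Fit y = 1 / (1/upper + b0 * b1**x) by least squares on ln(1/y - 1/upper).
    Returns (b0, b1)."""
    # Move 0s and 1s to .001 and .999, as the text recommends, so the log is defined.
    ys = [min(max(y, eps), upper - eps) for y in ys]
    t = [math.log(1.0 / y - 1.0 / upper) for y in ys]
    n = len(xs)
    mean_x, mean_t = sum(xs) / n, sum(t) / n
    slope = (sum((x - mean_x) * (v - mean_t) for x, v in zip(xs, t))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_t - slope * mean_x
    return math.exp(intercept), math.exp(slope)

def boundary_from_fit(b0, b1):
    """x at which the fitted curve equals .5, valid only when upper bound = 1."""
    return -math.log(b0) / math.log(b1)

stimuli = [1, 2, 3, 4, 5, 6, 7]
steep = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]    # made-up proportions
shallow = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]  # made-up proportions

steep_b0, steep_b1 = fit_spss_style_logistic(stimuli, steep)
shallow_b0, shallow_b1 = fit_spss_style_logistic(stimuli, shallow)
print("steep:   b1 =", round(steep_b1, 3), "boundary =", round(boundary_from_fit(steep_b0, steep_b1), 3))
print("shallow: b1 =", round(shallow_b1, 3), "boundary =", round(boundary_from_fit(shallow_b0, shallow_b1), 3))
```

For these rising functions b1 is below 1, and the shallower dataset yields the higher b1, in line with the use of b1 as a slope measure above.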

Another approach seen
in the literature, more in accord with general statistical practice: If data cannot be fit by a line, try transforming
the data to make them look linear, then fit a line.
Eimas, Cooper & Corbit (1973), *Perc. & Psychophys. 13*, 247-252, and Miller & Liberman (1979), *Perc. &
Psychophys. 26*(6), 457-465, transformed their data to z-scores to fit
a line by linear regression; then the boundary is where z=0.
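
A minimal Python sketch of this transform-then-fit approach. It assumes that "z-scores" means the inverse normal transform of the response proportions, and that 0s and 1s are moved slightly inward before transforming; the cited papers' exact procedures may differ. The data and function name are made up.

```python
from statistics import NormalDist

def z_line_boundary(xs, proportions, eps=0.001):
    """Transform proportions to z-scores, fit a least-squares line,
    and return (boundary, slope), where the boundary is the x at which z = 0."""
    # Assumption: clip 0 and 1, whose z-scores would be infinite.
    clipped = [min(max(p, eps), 1 - eps) for p in proportions]
    zs = [NormalDist().inv_cdf(p) for p in clipped]
    n = len(xs)
    mean_x, mean_z = sum(xs) / n, sum(zs) / n
    slope = (sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_z - slope * mean_x
    return -intercept / slope, slope

stimuli = [1, 2, 3, 4, 5, 6, 7]
proportions = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]  # made-up data

z_boundary, z_slope = z_line_boundary(stimuli, proportions)
print("boundary:", round(z_boundary, 3), "slope (z per step):", round(z_slope, 3))
```

Here the slope of the fitted line is a direct slope measure, in z units per stimulus step.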

Finally, a very popular
approach is probit transformation (seen in lots
of papers, especially from Haskins), but our stats consultants say that
this is essentially like the logistic, formerly more popular, now less so;
and only the logistic fit is easy to do in SPSS. (One
possible difference is that the probit curve can have a steeper slope than
a logistic curve, and that could be a reason to prefer it, for fitting very
categorical responses.)

6. A final topic on analysis of identification data: Using the binomial
distribution to ask, "Is this rate of identification above chance?" For example, given 10 repetitions of a stimulus, and
2 alternative responses "spy" and "sky", clearly 5 "spy" responses is at
chance. But is 6 "spy"
responses above chance? How about 7? And if there are 20 repetitions, is 14 "spy" responses
above chance?

SPSS provides a binomial
test for this. See file 14_of_20.sav which gives 14 "1" responses vs. 6 "0" responses, and do Analyze - Nonparametric
Tests - Binomial. The default level of chance
is .5, and the test shows that these responses are not reliably different
from chance. But 15 "1" responses is above chance. Try different sets of responses for different numbers
of trials.
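
The same tail probabilities can be computed directly from the binomial distribution. The text does not say whether the SPSS output is one- or two-tailed, so this Python sketch (with illustrative function names) computes both; the conclusions for 14 and 15 out of 20 are the same either way.

```python
from math import comb

def upper_tail(k, n, p=0.5):
    """Probability of k OR MORE successes in n trials with chance level p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def two_tailed_at_half(k, n):
    """Two-tailed probability when chance = .5: double the smaller tail, capped at 1."""
    upper = upper_tail(k, n)
    lower = upper_tail(n - k, n)  # by symmetry, P(X <= k) = P(X >= n - k)
    return min(1.0, 2 * min(upper, lower))

for k in (14, 15):
    print(k, "of 20: one-tailed", round(upper_tail(k, 20), 4),
          "two-tailed", round(two_tailed_at_half(k, 20), 4))
```

For 14 of 20 both values exceed .05, and for 15 of 20 both fall below it.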

Note that this SPSS test
is stricter than looking up the same cases in the binomial table in the back of Hays (Table II, pp. 1008-1012). This is because
the SPSS test is testing the probability of 14 OR MORE out of 20, while Hays
is testing only the probability of 14 out of 20. If you sum Hays’s probabilities for 14 and above out of 20,
you get the SPSS result. (Thanks to

For chance=.5, Hays's
table (last column) gives the following numbers as above chance with alpha
.05:

(nothing for fewer than 5 trials)

0 or 5 out of 5

0 or 6 out of 6

0 or 7 out of 7

0 - 1 or 7 - 8 out of 8

0 - 1 or 8 - 9 out of 9

0 - 2 or 8 - 10 out of 10

0 - 2 or 9 - 11 out of 11

0 - 2 or 10 - 12 out of 12

0 - 3 or 10 - 13 out of 13

0 - 3 or 11 - 14 out of 14

0 - 4 or 11 - 15 out of 15

0 - 4 or 12 - 16 out of 16

0 - 5 or 12 - 17 out of 17

0 - 5 or 13 - 18 out of 18

0 - 5 or 14 - 19 out of 19

0 - 6 or 14 - 20 out of 20
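
The list above is consistent with counting a result as above chance when the probability of exactly that count (not that count or more) is below .05, as the description of Hays's table suggests. A Python sketch of that reading follows; it is an inference from the text, not a reproduction of Hays's table itself.

```python
from math import comb

def point_probability(k, n):
    """Probability of exactly k of n when chance = .5."""
    return comb(n, k) / 2**n

def counts_below_alpha(n, alpha=0.05):
    """Counts whose point probability is below alpha."""
    return [k for k in range(n + 1) if point_probability(k, n) < alpha]

for n in (4, 5, 10, 20):
    print(n, counts_below_alpha(n))
```

Under the stricter "k or more" criterion, several of these entries would no longer qualify (14 out of 20, for example).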

*Prepared Spring 2004 by Pat Keating*
