Analysis of identification data


This page is about the analysis of speech categorization data, meaning 2-alternative forced-choice responses to single stimuli drawn from a continuum.  The figures shown are generally taken from the ASA volume of reprinted articles on Speech Perception.  Some reference is also made to papers about speech perception in dyslexia, as that was the immediate purpose of the material when first presented in the UCLA Phonetics Lab.

Look at the example file ID_data.xls: Sheet1 contains a simple made-up dataset, 9 subjects' responses to 7 stimuli, % response “A” for 10 repetitions of each stimulus, with the average responses graphed along the continuum.

How to analyze such data?

1.  You might think: a simple 1-way ANOVA (factor = stimulus, with 7 levels). This can be done, but generally isn't.

It must be a Repeated Measures analysis because each subject responds to every stimulus level; and in our lab you almost certainly must use SPSS (rather than StatView) because responses at the different levels are almost certain to be correlated.  By itself the 1-way ANOVA is not of great interest, as it is almost certain to show a significant effect of stimulus on response (otherwise subjects are responding in the same way to all stimuli); the interest then would be in the post-hoc tests/planned comparisons, and these have to be done separately from the RM analysis.  (E.g. with paired t-tests, the corrected alpha for 6 comparisons is .05/6 = .0083.)  Such tests will tell you which stimuli are perceived differently from which others, but the research question is rarely framed that way.
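As a rough illustration of where a corrected alpha of .0083 comes from, here is a minimal Python sketch of a Bonferroni-style correction.  It assumes (the text does not say so) that the 6 comparisons are between adjacent stimuli on the 7-step continuum; comparing every pair of stimuli would give 21 comparisons instead.  The function and variable names are hypothetical.

```python
from itertools import combinations

def corrected_alpha(n_comparisons, alpha=0.05):
    """Per-comparison alpha after dividing the overall alpha among the comparisons."""
    return alpha / n_comparisons

n_stimuli = 7
# Assumption: only neighbouring stimuli are compared (1 vs 2, 2 vs 3, ...).
adjacent_pairs = [(i, i + 1) for i in range(1, n_stimuli)]
# For contrast: every possible pair of stimuli.
all_pairs = list(combinations(range(1, n_stimuli + 1), 2))

print(len(adjacent_pairs), "comparisons, alpha =", round(corrected_alpha(len(adjacent_pairs)), 4))
print(len(all_pairs), "comparisons, alpha =", round(corrected_alpha(len(all_pairs)), 4))
```

The more comparisons are planned, the stricter each individual test becomes, which is one more reason to prefer a single overall measure per subject.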

For practice, do the SPSS RM ANOVA (using the file ID_data.xls, opening Sheet 1 in SPSS):

Note that any ANOVA (RM or factorial) treats the stimuli as unordered: not as a continuum but just as a set of 7 stimuli. That’s probably one reason you rarely see this kind of analysis.


2.  Often there are 2 or more continua tested, and the interest is in comparing responses across these, more than within each continuum.

In the example file ID_data.xls, Sheet2 is as above, plus same subjects' responses to a second continuum of 7 stimuli, with responses to both plotted together.  The second set has been constructed to show a shallower crossover than the first.

Do the ANOVA (sheet "RM"), now with 2 within-subject factors:

Suppose the data are the same, but they come from 2 groups of subjects (either on the same continuum, such as speakers of 2 languages, or assigned to one of 2 different continua).  Now there's a between-subjects (Grouping) factor while the stimulus factor remains within subjects (RM).  (Data sheet "mixed": Now significance of interaction is weaker.)

(A dyslexia paper that does such an analysis, with a between-subjects factor Group (2 levels) and a within-subjects factor Stimulus, is Gerrits's (2003) ICPhS paper.)

REVIEW: a new SPSS file, pretend_spy_sky.sav: 4 (small) groups of subjects, 2 continua, 8 stimuli each – do the analysis, you’ll see no Group effect.


3.  But instead of analyzing responses to each stimulus, researchers generally derive one or more OVERALL MEASURES of a subject's responses.

In general, there are 2 advantages to finding an appropriate overall measure:

So what are appropriate measures? 

Most simply, Repp et al. (1978), J. Exp. Psych.: Human Perc. and Perf. 4(4), 621-37 (and Sussman 1993 on lengthened transitions) totaled the responses to all stimuli on a continuum, lumped together.  Such a measure will characterize a difference between two functions that are shifted relative to each other along the continuum (see figure just below).  But in our example, and in many cases of interest, that is not how the two datasets differ: they differ in how steep or how gradient the function is, and a total-responses measure won’t show that.
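The limitation of a total-responses measure can be sketched in a few lines of Python.  The three response functions below are made up for illustration (they are not the values in ID_data.xls): a steep function, a shallow one with the same crossover, and a steep one shifted along the continuum.

```python
# Hypothetical % "A" responses to a 7-step continuum.
steep = [0, 0, 0, 50, 100, 100, 100]
shallow = [0, 10, 30, 50, 70, 90, 100]   # same crossover, more gradient
shifted = [0, 0, 0, 0, 50, 100, 100]     # same steepness, later crossover

def total_response(percents):
    """Overall measure: responses summed over all stimuli on the continuum."""
    return sum(percents)

for name, data in (("steep", steep), ("shallow", shallow), ("shifted", shifted)):
    print(name, total_response(data))
```

In this sketch the total separates the shifted function from the other two, but it gives the steep and shallow functions the same value.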

The usual two measures are the (category) boundary (the 50% crossover point), and the crossover slope.

Note that in all such cases, it is common to exclude the responses of a subject when they do not have the overall expected shape, for example if they are random, since in such cases there is no “boundary” or “slope” to speak of.  One criterion for inclusion is that the responses fall below 25% (or some similar number) at one end of the continuum, and above 75% at the other end.
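An inclusion criterion of this kind could be checked as in the following Python sketch.  It assumes that "end of the continuum" means the single first and last stimulus (one could instead average the outermost two), and the function name and example data are hypothetical.

```python
def has_expected_shape(percents, low=25, high=75):
    """True if % "A" is below `low` at one end of the continuum and above `high` at the other."""
    first, last = percents[0], percents[-1]
    rising = first < low and last > high
    falling = last < low and first > high
    return rising or falling

# Made-up subjects: one categorical, one near-random, one who never reaches the low end.
print(has_expected_shape([0, 0, 10, 50, 90, 100, 100]))
print(has_expected_shape([40, 50, 60, 50, 40, 60, 50]))
print(has_expected_shape([30, 40, 50, 70, 90, 100, 100]))
```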


4.  So, what kind of function to fit?

Categorization data are generally neither quite linear nor exponential; they are generally more S-shaped, though very shallow curves can be almost linear.  Cubic spline and logistic functions have this shape, as generally do ogives (cumulative frequency functions).  However, a cubic function has this shape only over part of its range, and may not be a good overall fit.  Here we will use a logistic: because it can be constrained to fall between 0 and 1, it is the best theoretical match to the expected shape of the data, which can never go outside that range.
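To make the shape concrete, here is a small Python sketch of a logistic curve bounded by 0 and 1.  The parameterization (a boundary and a steepness value) is chosen for readability and is an assumption of this sketch; it is not the b0/b1 form that SPSS reports.

```python
import math

def logistic(x, boundary, steepness):
    """Logistic curve that stays between 0 and 1 and crosses .5 at `boundary`."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - boundary)))

stimuli = range(1, 8)
for steepness in (0.5, 1.0, 4.0):   # illustrative values only
    curve = [round(logistic(x, 4, steepness), 2) for x in stimuli]
    print("steepness", steepness, curve)
```

Low steepness values give an almost linear function over the 7 stimuli, while high values approach a step function; in every case the curve stays inside the 0 to 1 range.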

Stefan Frisch has discussed the use of logistics more generally (e.g. Frisch, Pierrehumbert, and Broe (2004) NLLT 22: 179-228).  Here is their example of logistic functions that differ in "degree of gradience" (p. 183):

COMPARING DIFFERENT FITTED CURVES IN SPSS: The set of datapoints to be fitted with a function must be arranged as a single variable (i.e. in a column); choose Regression--Curve Estimation.  Use the SPSS file pretend_regression which has each set of ID responses as a separate column.  Use the first or last columns.  (Logistic, and several of the other curves, requires no "0" values in the data, and so values in the first column have been modified accordingly; logistic needs an upper bound set just above the maximum value in the data; set it at .01 above the max in the data for this exercise, but see also below.)

Compare the various offered curves for their fit to these data.  Top candidates are Linear, Cubic and Logistic.  The Cubic is actually the best fit (in terms of r²), but it uses more degrees of freedom to get this better fit, and goes outside the data range of 0 to 1.  Notice that for the first column of data, although the 3 best curves look different overall, they are almost identical at the y=50% point (and all are different from the raw data at that point).

A vexed issue in curve fitting: fit the entire set of responses, or only the sloping part?  That is, should the trailing 0s and 1s at the two ends be eliminated?  If they are, there are very few points left to fit (in the perfect case, only 2), and there are often jiggles in the data that keep some 0s and 1s in anyway.  On the other hand, because the logistic can fit the endpoints so well, fitting the entire set often seems to underestimate slope differences between datasets.


5.  How to get the boundary and slope values from the SPSS fitted logistic curve.

The slope per se is not returned by the function, but the term “b1” is related to the slope, with higher values reflecting shallower curves.  We have not tried to recover the true slope from b1; instead we use b1 directly as our measure of slope.

The boundary must be further computed from the “b0” term.  To be able to do this in the simplest way, it is crucial that the upper bound parameter for the curve fitting be set to 1.  That in turn requires that the highest data value be less than 1.  Thus you should change your “0” datapoints to .001 (or similar), and correspondingly, your “1” datapoints to .999.

As long as you have set the upper bound to 1, you can use the b0 and b1 values given for the logistic curve for your data to solve for x when y=.5, using the formula x = -ln(b0)/ln(b1).  (That’s a minus sign.)  (This follows if the fitted curve has the form y = 1/(1/u + b0*b1^x), with u the upper bound, as given in the SPSS documentation for Curve Estimation: with u = 1, y = .5 exactly when b0*b1^x = 1.)  Let's see this in Excel (file pretend_regression.xls, which contains 3 results from "pretend regression.sav").  We can see that every response affects the fitted curve and therefore the boundary.  (This file shows only the original data; to see the fitted curves from which the boundaries are obtained, use SPSS.)
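The boundary computation can also be sketched outside SPSS and Excel.  The Python below assumes the curve has the form y = 1/(1/u + b0*b1^x) and obtains b0 and b1 from a straight-line fit to ln(1/y - 1/u); this is one way to get such coefficients and is not necessarily how SPSS estimates them.  The helper names and the two datasets are made up for illustration.

```python
import math

def spss_logistic(x, b0, b1, upper=1.0):
    """Assumed curve form: y = 1 / (1/upper + b0 * b1**x)."""
    return 1.0 / (1.0 / upper + b0 * b1 ** x)

def fit_logistic(xs, props, upper=1.0, eps=0.001):
    """Hypothetical helper: least-squares line through ln(1/y - 1/upper); returns (b0, b1)."""
    # Move 0s and 1s just inside the bounds (0 -> .001, 1 -> .999), as the text recommends.
    ys = [min(max(p, eps), upper - eps) for p in props]
    zs = [math.log(1.0 / y - 1.0 / upper) for y in ys]
    mx, mz = sum(xs) / len(xs), sum(zs) / len(zs)
    slope = sum((x - mx) * (z - mz) for x, z in zip(xs, zs)) / sum((x - mx) ** 2 for x in xs)
    intercept = mz - slope * mx
    return math.exp(intercept), math.exp(slope)

def boundary(b0, b1):
    """x at which y = .5; valid only when the upper bound is 1."""
    return -math.log(b0) / math.log(b1)

stimuli = [1, 2, 3, 4, 5, 6, 7]
steep = [0, 0, 0.1, 0.5, 0.9, 1, 1]                 # made-up proportions of "A" responses
shallow = [0.05, 0.2, 0.35, 0.5, 0.65, 0.8, 0.95]

for name, props in (("steep", steep), ("shallow", shallow)):
    b0, b1 = fit_logistic(stimuli, props)
    print(name, "b1 =", round(b1, 3), "boundary =", round(boundary(b0, b1), 3))
```

Both made-up datasets are symmetric around stimulus 4, so both boundaries come out at 4, while the shallow dataset gets the higher b1, in line with the statement that higher b1 values reflect shallower curves.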


Another approach seen in the literature, more in accord with general statistical practice:  If data cannot be fit by a line, try transforming the data to make them look linear, then fit a line.  Eimas, Cooper & Corbit (1973), Perc. & Psychophys. 13, 247-252, and Miller & Liberman (1979), Perc. & Psychophys. 26(6), 457-465, transformed their data to z-scores to fit a line by linear regression; then the boundary is where z=0.
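A minimal Python sketch of this transform-then-fit approach follows.  It is not taken from the cited papers: the clamping of 0 and 1 proportions (which have no finite z-score), the helper name, and the data are all assumptions made for illustration.

```python
from statistics import NormalDist

def z_score_boundary(xs, props, eps=0.001):
    """Fit a line to z-transformed proportions and return the x at which z = 0."""
    # Assumption: proportions of 0 and 1 are moved to eps and 1 - eps before transforming.
    zs = [NormalDist().inv_cdf(min(max(p, eps), 1 - eps)) for p in props]
    mx, mz = sum(xs) / len(xs), sum(zs) / len(zs)
    slope = sum((x - mx) * (z - mz) for x, z in zip(xs, zs)) / sum((x - mx) ** 2 for x in xs)
    intercept = mz - slope * mx
    return -intercept / slope

stimuli = [1, 2, 3, 4, 5, 6, 7]
centered = [0.02, 0.1, 0.3, 0.5, 0.7, 0.9, 0.98]   # made-up proportions of "A" responses
later = [0.02, 0.02, 0.1, 0.3, 0.5, 0.7, 0.9]      # same shape, crossover one step later

print(round(z_score_boundary(stimuli, centered), 2))
print(round(z_score_boundary(stimuli, later), 2))
```

Because z = 0 corresponds to a proportion of .5, the x value returned is the 50% crossover of the fitted line.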


Finally, a very popular approach is probit transformation (seen in lots of papers, especially from Haskins).  Our stats consultants say that probit is essentially like the logistic, and that it was formerly more popular than it is now; and only the logistic fit is easy to do in SPSS.  (One possible difference is that the probit curve can have a steeper slope than a logistic curve, and that could be a reason to prefer it, for fitting very categorical responses.)



6.  A final topic on analysis of identification data: Using the binomial distribution to ask, "Is this rate of identification above chance?"  For example, given 10 repetitions of a stimulus, and 2 alternative responses "spy" and "sky", clearly 5 "spy" responses is at chance.  But is 6 "spy" responses above chance?  How about 7?  And if there are 20 repetitions, is 14 "spy" responses above chance?


SPSS provides a binomial test for this.  See file 14_of_20.sav which gives 14 "1" responses vs. 6 "0" responses, and do Analyze - Nonparametric Tests - Binomial.  The default level of chance is .5, and the test shows that these responses are not reliably different from chance.  But 15 "1" responses is above chance.  Try different sets of responses for different numbers of trials.  


Note that this SPSS test is stricter than looking up the same cases in the binomial table in the back of Hays (Table II, pp. 1008-1012).  This is because the SPSS test is testing the probability of 14 OR MORE out of 20, while Hays is testing only the probability of 14 out of 20.  If you sum Hays’s probabilities for 14 and above out of 20, you get the SPSS result.  (Thanks to Colin Wilson for figuring this out!) 
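The difference between the probability of exactly 14 and the probability of 14 or more can be checked directly.  The Python sketch below uses only the binomial formula; the function names are hypothetical, and it also prints the doubled tail probability in case a two-tailed value is wanted.

```python
from math import comb

def point_prob(k, n, p=0.5):
    """Probability of exactly k "1" responses in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def upper_tail(k, n, p=0.5):
    """Probability of k OR MORE "1" responses in n trials."""
    return sum(point_prob(i, n, p) for i in range(k, n + 1))

for k in (14, 15):
    tail = upper_tail(k, 20)
    print(k, "of 20: exactly =", round(point_prob(k, 20), 4),
          " or more =", round(tail, 4), " doubled =", round(2 * tail, 4))
```

For 14 out of 20, the probability of exactly 14 is below .05 but the probability of 14 or more is not; for 15 out of 20, the tail probability is below .05 even when doubled.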


For chance=.5, Hays's table (last column) gives the following numbers of responses as different from chance (below or above) with alpha .05:

(nothing for fewer than 5 trials)

0 or 5 out of 5

0 or 6 out of 6

0 or 7 out of 7

0 - 1 or 7 - 8 out of 8

0 - 1 or 8 - 9 out of 9

0 - 2 or 8 - 10 out of 10

0 - 2 or 9 - 11 out of 11

0 - 2 or 10 - 12 out of 12

0 - 3 or 10 - 13 out of 13

0 - 3 or 11 - 14 out of 14

0 - 4 or 11 - 15 out of 15

0 - 4 or 12 - 16 out of 16

0 - 5 or 12 - 17 out of 17

0 - 5 or 13 - 18 out of 18

0 - 5 or 14 - 19 out of 19

0 - 6 or 14 - 20 out of 20
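The list above can be reproduced from the binomial formula if the criterion is that the probability of exactly that number of responses is below .05 (the point probability, not the stricter "or more" probability).  The following Python sketch does this; the function name is hypothetical, and the criterion is only meaningful for small numbers of trials like those listed.

```python
from math import comb

def counts_below_alpha(n, alpha=0.05):
    """Counts k (out of n, chance = .5) whose point probability is below alpha."""
    # Caution: for large n every single count has a small probability,
    # so this criterion should only be used for small n as in the list above.
    return [k for k in range(n + 1) if comb(n, k) / 2 ** n < alpha]

for n in (4, 5, 10, 20):
    print(n, "trials:", counts_below_alpha(n))
```

Using the probability of k or more instead would shrink these ranges; for example, 14 out of 20 would no longer qualify.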


Prepared Spring 2004 by Pat Keating
