Analysis of variance:
This is how we generally test for the presence of one or more effects.
In our experiments we manipulate one or more independent variables, we
control for other independent variables, and we measure one or more dependent
variables. Each independent
variable (or factor) has two or more levels.
Each datum comes from some condition, or combination of the levels of the
factors. For example, suppose
our dependent variable is VOT and our independent variables are consonant
(let’s say 3 levels), following vowel (let’s say 5 levels), and stress
(let’s say 2 levels) – plus the subjects who produce the speech data; and
we control for certain things, such as position in word by having all consonants
word-initial, and voicing by having them all voiceless.
Each single VOT measurement comes from some particular condition such as
“p before i in a stressed syllable” (and from a particular subject).
There is a set of all measurements (from all the subjects) for each condition,
e.g. “p before i in a stressed syllable”; and these can be combined with
other conditions to produce larger sets for all the different levels of
the different factors, e.g. all the “p” data, or all the “stressed” data,
or all the “stressed p” data – any subset defined by your variables.
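As a rough illustration of how conditions and subsets relate, here is a minimal Python sketch. The level names, subject labels, and VOT values are all made up for the example; the text above fixes only the number of levels per factor (3, 5, 2).

```python
from itertools import product

# Level names are illustrative assumptions; only the counts come from the text.
consonants = ["p", "t", "k"]
vowels = ["i", "e", "a", "o", "u"]
stresses = ["stressed", "unstressed"]

# A condition is one combination of levels, taking one level from every factor.
conditions = list(product(consonants, vowels, stresses))
n_conditions = len(conditions)  # 3 * 5 * 2

# Hypothetical measurements: (consonant, vowel, stress, subject, VOT in ms).
measurements = [
    ("p", "i", "stressed", "S1", 62.0),
    ("p", "i", "unstressed", "S1", 48.0),
    ("t", "a", "stressed", "S2", 71.0),
    ("p", "a", "stressed", "S2", 58.0),
]

# Any subset defined by the variables, e.g. all the "stressed p" data.
stressed_p = [m for m in measurements if m[0] == "p" and m[2] == "stressed"]

print(n_conditions)
print(len(stressed_p))
```

The same filtering pattern gives any of the other subsets mentioned above, such as all the "p" data or all the "stressed" data.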
Analysis
of an experiment with one factor is called “1-way”, of an experiment with
two factors “2-way”, etc.
Since you test for an effect of a factor by forming an F-ratio, there is
a separate F-ratio for each factor in your experiment; these test
for main effects. This doesn’t
depend on how many levels each factor has.
So, for example:
1-way ANOVA: F-ratio for 1 factor (regardless of its number of levels)
2-way ANOVA: F-ratios for 2 factors (regardless of their numbers of levels)
(etc.)
In our example above, there would be F-ratios for consonant, vowel, and stress
factors.
F-ratios can also be formed for combinations of factors; these test for
interactions. The consonant x stress interaction asks whether the effect of
the stress factor differs from one consonant to another, or equivalently,
whether the effect of the consonant factor differs from one stress level to
the other.
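To see how many F-ratios a design yields, the effects can be enumerated. This is a small sketch assuming a full factorial analysis of the three-factor VOT example, in which every combination of two or more factors gets its own interaction term; the variable names are made up for illustration.

```python
from itertools import combinations

factors = ["consonant", "vowel", "stress"]

# Main effects: one F-ratio per factor.
main_effects = [(f,) for f in factors]

# Interactions: one F-ratio per combination of two or more factors.
interactions = [
    combo
    for size in range(2, len(factors) + 1)
    for combo in combinations(factors, size)
]

for effect in main_effects + interactions:
    print(" x ".join(effect))
```

With three factors this lists three main effects, three 2-way interactions, and one 3-way interaction.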
DOING A BASIC ANOVA
Before we get to Repeated Measures designs, we’ll review doing a factorial ANOVA.
1. Datafile organization
Look at my sample Excel file factorial.xls, in which the data are in a standard arrangement: each row is a datapoint and each column is a variable (one column per independent variable, plus one for the dependent variable). This gives 24 datapoints for 2 factors each with 2 levels. (This organization assumes that there are 24 different subjects, each providing one datapoint. If there is just one subject providing all the datapoints, then the analysis is exploratory only.)
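The standard arrangement can be sketched in Python as a list of rows. The column names WdStatus, Item, and Rating are the ones used in this handout; the level names, the Subject column, and the empty Rating values are placeholders and are not the contents of factorial.xls.

```python
from itertools import product

wd_status_levels = ["word", "nonword"]  # hypothetical level names
item_levels = ["item1", "item2"]        # hypothetical level names
per_cell = 6                            # 24 datapoints / 4 conditions

rows = []
subject = 0
for wd_status, item in product(wd_status_levels, item_levels):
    for _ in range(per_cell):
        subject += 1  # a different subject for every datapoint
        rows.append({
            "Subject": subject,
            "WdStatus": wd_status,
            "Item": item,
            "Rating": None,  # placeholder for the measured value
        })

print(len(rows))
print(rows[0])
```

Each dictionary corresponds to one row of the spreadsheet, and each key to one column.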
2. Running the analysis
Open the Excel file in
SPSS. Under Analyze, select GLM, then Univariate. Drag your Dependent
Variable (here, Rating) into the top box, then the 2 Independent
Variables into the Fixed Factors box. (By treating Item as a fixed
factor, we're assuming that these are particular words chosen to test
for particular effects, not randomly selected words.) Then click OK.
In the Output window, the first box, Between Subjects Factors,
summarizes your data as SPSS saw them, and you should always
double-check this report. The second box, Tests of Between Subjects
Effects, gives the ANOVA results.
The rows show the two factors (WdStatus, Item), their interaction (WdStatus*Item), and the error term. For each
of these, the columns show the sum of squares
SS, the degrees of freedom df, the mean square MS (= SS/df), the F-ratio (= MS for that row / MS Error),
and Significance (p, the probability of getting an F-ratio at least this large if there were really no effect).
Since the 2 factors each had only 2 levels, their df is 1, and since MS
= SS/df, here the value of MS = the value of SS.
For all three F-ratios, the denominator is the MS Error.
Its df is the number of datapoints in the entire experiment (24), minus 1
for the overall mean, minus the other df (1 + 1 + 1): 24 - 1 - 3 = 20.
Recall
that to report your result you say, e.g., “F(1,20) = 211.892, p < .001”.
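The arithmetic behind such a table can be sketched by hand. The following is a minimal Python illustration for a balanced 2 x 2 design with 6 datapoints per condition. The ratings and level names are made up for the example (they are not the values in factorial.xls), and the p values are omitted because they need an F distribution, which the Python standard library does not provide.

```python
def two_way_anova(cells):
    """Balanced two-way ANOVA on cells: {(wd_status, item): [ratings]}."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(cells[(a_levels[0], b_levels[0])])  # ratings per cell (balanced)
    scores = [x for cell in cells.values() for x in cell]
    total_n = len(scores)

    def mean(xs):
        return sum(xs) / len(xs)

    grand = mean(scores)
    a_means = [mean([x for b in b_levels for x in cells[(a, b)]]) for a in a_levels]
    b_means = [mean([x for a in a_levels for x in cells[(a, b)]]) for b in b_levels]
    cell_means = {key: mean(cell) for key, cell in cells.items()}

    # Sums of squares for the two main effects, the interaction, and error.
    ss = {}
    ss["WdStatus"] = n * len(b_levels) * sum((m - grand) ** 2 for m in a_means)
    ss["Item"] = n * len(a_levels) * sum((m - grand) ** 2 for m in b_means)
    ss_cells = n * sum((m - grand) ** 2 for m in cell_means.values())
    ss["WdStatus*Item"] = ss_cells - ss["WdStatus"] - ss["Item"]
    ss["Error"] = sum(
        (x - cell_means[key]) ** 2 for key, cell in cells.items() for x in cell
    )

    # Degrees of freedom: levels - 1 per factor; error gets what is left.
    df = {"WdStatus": len(a_levels) - 1, "Item": len(b_levels) - 1}
    df["WdStatus*Item"] = df["WdStatus"] * df["Item"]
    df["Error"] = total_n - 1 - sum(df.values())

    table = {}
    for effect in ss:
        table[effect] = {"SS": ss[effect], "df": df[effect], "MS": ss[effect] / df[effect]}
    for effect in ("WdStatus", "Item", "WdStatus*Item"):
        table[effect]["F"] = table[effect]["MS"] / table["Error"]["MS"]
    return table


# Made-up ratings, 6 per condition; level names are hypothetical.
cells = {
    ("word", "item1"): [1, 2, 3, 1, 2, 3],
    ("word", "item2"): [3, 4, 5, 3, 4, 5],
    ("nonword", "item1"): [5, 6, 7, 5, 6, 7],
    ("nonword", "item2"): [7, 8, 9, 7, 8, 9],
}

table = two_way_anova(cells)
for effect, row in table.items():
    print(effect, row)
```

Because each factor has 1 df, MS equals SS in the factor rows, and the error df comes out as 20, matching the "F(1,20)" form of the report.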
Do
the analysis again, selecting only one independent variable, and compare
the results.
3. A twist: “compact” data files
There is another kind of data file organization, in which each row is a subject and each condition (combination of levels of the independent variables) is a column. Look at my sample file compact.xls to see this compact organization of the same data we saw already, but now assuming that there were 6 subjects each of whom provided 4 datapoints. The 4 columns are the 4 conditions (2 levels x 2 levels). This kind of data file cannot be used for a factorial ANOVA in SPSS or R.
To review: structure of these kinds of files

|        | regular     | compact   |
| row    | observation | subject   |
| column | variable    | condition |
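Converting between the two layouts is mechanical. Here is a minimal Python sketch that unpacks compact rows into regular rows. The column names, level names, and ratings are hypothetical, and only two of the six subjects are shown.

```python
# Map each compact column to the factor levels it stands for (hypothetical names).
condition_columns = {
    "word_item1": ("word", "item1"),
    "word_item2": ("word", "item2"),
    "nonword_item1": ("nonword", "item1"),
    "nonword_item2": ("nonword", "item2"),
}

# Compact layout: one row per subject, one column per condition (made-up ratings).
compact = [
    {"Subject": "S1", "word_item1": 1, "word_item2": 3, "nonword_item1": 5, "nonword_item2": 7},
    {"Subject": "S2", "word_item1": 2, "word_item2": 4, "nonword_item1": 6, "nonword_item2": 8},
]


def compact_to_regular(compact_rows, columns):
    """Return one row per observation, with one column per variable."""
    regular_rows = []
    for row in compact_rows:
        for column, (wd_status, item) in columns.items():
            regular_rows.append({
                "Subject": row["Subject"],
                "WdStatus": wd_status,
                "Item": item,
                "Rating": row[column],
            })
    return regular_rows


regular = compact_to_regular(compact, condition_columns)
for row in regular:
    print(row)
```

Each subject row becomes four observation rows, one per condition, so the subject identity is kept as an extra column rather than lost.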
4. Question: Can you do an ANOVA with only one or two subjects?
No,
you need at least a few, probably several, possibly many, subjects.
(See section on Power for info on how many you need.)
The
problem is that the observations from a single subject are not independent
enough to be analyzed by ANOVA. You need enough subjects for subjects to
be the experimental unit (the basis of the comparisons in the test).
When an ANOVA is done with only one subject, as indeed is often seen in
the literature (especially in speech production experiments), then individual trials (tokens, repetitions) are used as the
experimental unit. As these are not independent, the degrees of freedom
used will be too high, which overstates the significance of any
differences and so inflates the rate of Alpha (Type I) errors.