Analysis of variance (ANOVA) is a collection of statistical models, together
with their associated procedures (such as the "variation" among and between
groups), used to analyze the differences between group means. It was developed
by R. A. Fisher. In
the ANOVA setting, the observed variance in a particular variable is
partitioned into components attributable to different sources of variation. In
its simplest form, ANOVA provides a statistical test of whether or not the means
of several groups are equal, and therefore generalizes the t-test to
more than two groups. Because running multiple two-sample t-tests would
increase the chance of committing a type I error, ANOVA is useful for
comparing (testing) three or more means (groups or variables) for statistical
significance.
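The inflation of the type I error rate can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that every pairwise t-test is independent and uses the same significance level; the function name is hypothetical.

```python
from math import comb

def familywise_error_rate(groups: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across all pairwise t-tests.

    Assumption (illustrative only): the pairwise tests are independent.
    """
    pairs = comb(groups, 2)  # number of two-sample comparisons
    return 1 - (1 - alpha) ** pairs

for k in (2, 3, 5, 10):
    print(k, comb(k, 2), round(familywise_error_rate(k), 3))
```

Under this assumption the error rate grows quickly with the number of groups, which is the motivation for a single overall test.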
LOGIC OF ANOVA
The calculations of ANOVA can be characterized as computing a number of means
and variances, dividing two variances and comparing the ratio to a handbook
value to determine statistical significance. Calculating a treatment effect is
then trivial: "the effect of any treatment is estimated by taking the
difference between the mean of the observations which receive the treatment
and the general mean."
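As a minimal sketch of that estimate, the treatment effects can be computed as group means minus the general (grand) mean. The data and treatment labels below are hypothetical.

```python
from statistics import mean

# Hypothetical observations for three treatments (illustrative data only).
data = {
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [8.0, 9.0, 10.0],
}

# General mean over every observation, regardless of treatment.
grand_mean = mean(y for ys in data.values() for y in ys)

# Effect of each treatment: its own mean minus the general mean.
effects = {name: mean(ys) - grand_mean for name, ys in data.items()}
print(grand_mean, effects)
```

With equal group sizes, as here, the estimated effects sum to zero.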
PARTITIONING OF THE SUM OF SQUARES
ANOVA uses traditional standardized terminology. The definitional equation of
sample variance is
$s^2 = \frac{1}{n-1} \sum_i (y_i - \bar{y})^2$, where the divisor
is called the degrees of freedom (DF), the summation is called the sum of
squares (SS), the result is called the mean square (MS) and the squared terms
are deviations from the sample mean. ANOVA estimates three sample variances: a
total variance based on all the observation deviations from the grand mean, an
error variance based on all the observation deviations from their appropriate
treatment means, and a treatment variance. The treatment variance is based on
the deviations of treatment means from the grand mean, the result being
multiplied by the number of observations in each treatment to account for the
difference between the variance of observations and the variance of means.
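Written out, the three variances take the following form. This is a sketch under the assumption of a balanced design with $I$ treatments, $n$ observations per treatment and $n_T = I n$ cases in total; the notation $y_{ij}$ (observation $j$ under treatment $i$), $\bar{y}_i$ (treatment mean) and $\bar{y}$ (grand mean) is introduced here for illustration.

```latex
\begin{aligned}
MS_{\text{Total}} &= \frac{1}{n_T - 1} \sum_{i=1}^{I} \sum_{j=1}^{n} (y_{ij} - \bar{y})^2 \\
MS_{\text{Error}} &= \frac{1}{n_T - I} \sum_{i=1}^{I} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2 \\
MS_{\text{Treatments}} &= \frac{n}{I - 1} \sum_{i=1}^{I} (\bar{y}_i - \bar{y})^2
\end{aligned}
```

The factor $n$ in the last line is the multiplication by the number of observations in each treatment described above.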
The fundamental technique is a partitioning of the total sum of squares SS
into components related to the effects used in the model. For example, the
model for a simplified ANOVA with one type of treatment at different levels is
$SS_{\text{Total}} = SS_{\text{Error}} + SS_{\text{Treatments}}$.
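The partition of the total sum of squares into an error part and a treatment part can be checked numerically. The sketch below uses hypothetical balanced data (three treatments, four observations each); all variable names are illustrative.

```python
from statistics import mean

# Hypothetical balanced data: three treatments, four observations each.
groups = [
    [6.0, 8.0, 4.0, 5.0],
    [8.0, 12.0, 9.0, 11.0],
    [13.0, 9.0, 11.0, 8.0],
]

observations = [y for g in groups for y in g]
grand_mean = mean(observations)

# Deviations of every observation from the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in observations)
# Deviations of every observation from its own treatment mean.
ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)
# Deviations of treatment means from the grand mean, weighted by group size.
ss_treatments = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

print(ss_total, ss_error + ss_treatments)
```

The two printed values agree up to floating-point rounding.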
The number of degrees of freedom DF can be partitioned in a similar way:
$DF_{\text{Total}} = DF_{\text{Error}} + DF_{\text{Treatments}}$.
One of these components (that for error) specifies a chi-squared distribution
which describes the associated sum of squares, while the same is true for
"treatments" if there is no treatment effect.
THE F-TEST
The F-test is used for comparing the factors of the total deviation. For
example, in one-way, or single-factor, ANOVA, statistical significance is
tested for by comparing the F test statistic
$F = \frac{MS_{\text{Treatments}}}{MS_{\text{Error}}} = \frac{SS_{\text{Treatments}} / (I - 1)}{SS_{\text{Error}} / (n_T - I)}$,
where MS is mean square, $I$ is the number of treatments and $n_T$ is the total
number of cases, to the F-distribution with $I - 1$ and $n_T - I$ degrees of
freedom. The F-distribution is a natural candidate because the test statistic
is the ratio of two scaled sums of squares, each of which follows a scaled
chi-squared distribution.
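A minimal sketch of the one-way F statistic follows, computed as the treatment mean square divided by the error mean square. The function name and the data are hypothetical.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, df_treatments, df_error) for a one-way layout."""
    observations = [y for g in groups for y in g]
    grand_mean = mean(observations)
    i = len(groups)          # I: number of treatments
    n_t = len(observations)  # n_T: total number of cases
    ss_treatments = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)
    ms_treatments = ss_treatments / (i - 1)
    ms_error = ss_error / (n_t - i)
    return ms_treatments / ms_error, i - 1, n_t - i

# Hypothetical data: three treatments, four observations each.
f_value, df_num, df_den = one_way_f([
    [6.0, 8.0, 4.0, 5.0],
    [8.0, 12.0, 9.0, 11.0],
    [13.0, 9.0, 11.0, 8.0],
])
print(f_value, df_num, df_den)
```

The returned degrees of freedom are the pair needed to look up the reference F-distribution.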
The expected value of F is
$1 + n\sigma^2_{\text{Treatment}} / \sigma^2_{\text{Error}}$
(where n is the treatment sample size), which is 1 for no treatment effect. As
values of F increase above 1, the evidence is increasingly inconsistent with
the null hypothesis. Two apparent experimental methods of increasing F are
increasing the sample size and reducing the error variance by tight
experimental controls.
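The two ways of increasing F can be illustrated numerically. The sketch below assumes the commonly quoted form $1 + n\sigma^2_{\text{Treatment}} / \sigma^2_{\text{Error}}$ for the expected value; the function name and the variance values are hypothetical.

```python
def expected_f(n: int, var_treatment: float, var_error: float) -> float:
    """Assumed form of the expected F: 1 + n * var_treatment / var_error."""
    return 1 + n * var_treatment / var_error

no_effect = expected_f(5, 0.0, 4.0)       # no treatment effect
baseline = expected_f(5, 2.0, 4.0)        # some treatment effect
larger_sample = expected_f(10, 2.0, 4.0)  # double the treatment sample size
tighter_control = expected_f(5, 2.0, 1.0) # quarter the error variance

print(no_effect, baseline, larger_sample, tighter_control)
```

Both the larger sample and the smaller error variance raise the value above the baseline.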
There are two methods
of concluding the ANOVA hypothesis test, both of which produce the same result:
- The textbook method is to compare the observed value of F with the critical value of F determined from tables. The critical value of F is a function of the degrees of freedom of the numerator and the denominator and the significance level (α). If $F \ge F_{\text{critical}}$, the null hypothesis is rejected.
- The computer method calculates the probability (p-value) of a value of F greater than or equal to the observed value. The null hypothesis is rejected if this probability is less than or equal to the significance level (α).
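Both methods can be sketched for the special case of three treatments, where the numerator has 2 degrees of freedom and the F-distribution's tail probability has a simple closed form. The function names and the observed values are hypothetical; for other degrees of freedom a statistics library (for example the `scipy.stats.f` distribution) would normally be used instead.

```python
def f_tail_probability(f_value: float, df_error: int) -> float:
    """P(F >= f_value) for an F-distribution with 2 and df_error degrees of freedom."""
    return (1 + 2 * f_value / df_error) ** (-df_error / 2)

def f_critical(alpha: float, df_error: int) -> float:
    """Value exceeded with probability alpha under F(2, df_error)."""
    return (df_error / 2) * (alpha ** (-2 / df_error) - 1)

# Hypothetical observed statistic, error degrees of freedom and level.
f_observed, df_error, alpha = 6.87, 9, 0.05

# Textbook method: compare F with the critical value.
f_crit = f_critical(alpha, df_error)
reject_by_table = f_observed >= f_crit

# Computer method: compare the p-value with alpha.
p_value = f_tail_probability(f_observed, df_error)
reject_by_p = p_value <= alpha

print(f_crit, p_value, reject_by_table, reject_by_p)
```

The two decisions always coincide because the tail probability decreases as F grows, so F reaches the critical value exactly when the p-value drops to α.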
The ANOVA F-test is known to be nearly optimal in the sense of minimizing
false negative errors for a fixed rate of false positive errors (i.e.
maximizing power for a fixed significance level). For example, to test the
hypothesis that various medical treatments have exactly the same effect, the
F-test's p-values closely approximate the permutation test's p-values; the
approximation is particularly close when the design is balanced. Such
permutation tests characterize tests with maximum power against all
alternative hypotheses, as observed by Rosenbaum. The ANOVA F-test (of the
null hypothesis that all treatments have exactly the same effect) is
recommended as a practical test, because of its robustness against many
alternative distributions.
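The relationship between the two p-values can be explored with a small exact permutation test. The sketch below assumes a balanced design with exactly three equal-sized groups and hypothetical data without ties; it enumerates every relabelling of the pooled observations and uses a closed-form F tail probability that is valid only when the numerator has 2 degrees of freedom.

```python
from itertools import combinations
from statistics import mean

def f_statistic(groups):
    obs = [y for g in groups for y in g]
    grand = mean(obs)
    ss_t = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_e = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return (ss_t / (len(groups) - 1)) / (ss_e / (len(obs) - len(groups)))

def exact_permutation_p(groups):
    """Share of all relabellings (three equal-sized groups) with F >= observed."""
    pooled = [y for g in groups for y in g]
    size = len(groups[0])
    observed = f_statistic(groups)
    indices = list(range(len(pooled)))
    hits = total = 0
    for a in combinations(indices, size):
        rest = [i for i in indices if i not in a]
        for b in combinations(rest, size):
            c = [i for i in rest if i not in b]
            relabelled = [[pooled[i] for i in part] for part in (a, b, c)]
            total += 1
            # Small relative tolerance guards against rounding in equal cases.
            if f_statistic(relabelled) >= observed * (1 - 1e-9):
                hits += 1
    return hits / total

# Hypothetical data: three treatments, three observations each.
groups = [[4.1, 5.3, 4.8], [6.2, 5.9, 7.1], [7.4, 8.0, 6.6]]
f_obs = f_statistic(groups)
p_permutation = exact_permutation_p(groups)
p_f_test = (1 + 2 * f_obs / 6) ** -3  # closed form for 2 and 6 degrees of freedom
print(p_permutation, p_f_test)
```

With samples this small the permutation p-value can only take a limited set of values, so the comparison is illustrative rather than a demonstration of how close the approximation is.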
EXTENDED LOGIC
ANOVA consists of
separable parts; partitioning sources of variance and hypothesis testing can be
used individually. ANOVA is used to support other statistical tools. Regression
is first used to fit more complex models to data, then ANOVA is used to compare
models with the objective of selecting simple(r) models that adequately
describe the data. "Such models could be fit without any reference to
ANOVA, but ANOVA tools could then be used to make some sense of the fitted
models, and to test hypotheses about batches of coefficients." "We
think of the analysis of variance as a way of understanding and structuring
multilevel models—not as an alternative to regression but as a tool for
summarizing complex high-dimensional inferences ..."
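As one illustration of using ANOVA to compare fitted regression models, the sketch below fits a straight line by least squares and compares it with an intercept-only model through an F ratio built from the two residual sums of squares. The data are hypothetical, and this nested-model comparison is only one simple instance of the idea.

```python
from statistics import mean

# Hypothetical data: response y measured at predictor values x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

x_bar, y_bar = mean(x), mean(y)

# Least-squares fit of the fuller model y = intercept + slope * x.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

# Residual sums of squares for the simple and the fuller model.
rss_simple = sum((yi - y_bar) ** 2 for yi in y)
rss_full = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

df_extra = 1           # one additional coefficient (the slope)
df_resid = len(y) - 2  # residual degrees of freedom of the fuller model
f_stat = ((rss_simple - rss_full) / df_extra) / (rss_full / df_resid)
print(slope, intercept, f_stat)
```

A large ratio suggests the simpler model does not adequately describe the data; a ratio near 1 favors keeping the simpler model.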

