5.3.2 Randomized Block Design - comparison of samples taken in blocks

This statlet compares samples taken from several different populations, where observations are grouped according to a blocking factor. It calculates summary statistics, performs an analysis of variance, and displays the data in various ways. The tabs are:

Input

Scatterplot

Stats

Example

The example data consist of the measured breaking strengths of 32 widgets. Eight widgets were made from each of 4 materials, and the resulting strengths are shown below:

In this example, we will consider each row to represent a separate block (perhaps defining the day on which the widgets were made).

The Oneway ANOVA with Blocking statlet performs a similar analysis to this statlet, with all measurements placed into a single column and additional columns indicating the type of material and block number.
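As a rough illustration of how the two layouts relate, the sketch below stacks a table with one row per block and one column per material into a single column of measurements. The strength values and the field names `strength`, `material`, and `block` are hypothetical; they are not the example data or the statlet's own column names.

```python
# Hypothetical strengths: one row per block, one column per material.
materials = ["A", "B", "C", "D"]
wide = [
    [64.0, 62.5, 55.0, 53.5],  # block 1
    [66.5, 65.0, 57.5, 55.0],  # block 2
    [61.0, 60.5, 52.0, 51.5],  # block 3
]

# Stack into one record per measurement, carrying along the material
# and the block number that the row/column position used to encode.
long_rows = [
    {"strength": value, "material": materials[j], "block": i + 1}
    for i, row in enumerate(wide)
    for j, value in enumerate(row)
]

for record in long_rows[:4]:
    print(record)
```

Both layouts hold the same information; only the way the material and block are recorded differs.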

Input

Enter the names of the columns containing each of the data samples:

Each row is assumed to constitute a separate block.

Scatterplot

This tab shows a plot of the data values by column:

Notice that the strengths of the widgets made from materials A and B appear to be greater than those of widgets made from materials C and D.

Options button

Specify the desired orientation for the plot:

Indicate the axis along which the data values will be displayed.

Stats

This tab displays a table of summary statistics for each column:

Initially, a set of commonly computed statistics is displayed. You can request other statistics by pressing the Options button.

For a discussion of other summary statistics, refer to the One Variable Analysis statlet.

Options button

The Options button permits you to select any of 18 different statistics to compute:

For definitions of each of these statistics, refer to the Glossary.

Boxplot

This tab displays box-and-whisker plots for the different samples:

Box-and-whisker plots are useful graphs for displaying many aspects of samples of numeric data. Their construction is discussed in the One Variable Analysis statlet.

Options button

The Options button permits you to control various aspects of the plot:

These include:

Median notches - if checked, notches are added to the boxes indicating the location of uncertainty intervals for the medians. These intervals are scaled in such a way that if two intervals do not overlap, there is a statistically significant difference between the two population medians at the indicated confidence level.

Mean symbols - if checked, the locations of the sample means are shown as plus signs.

Flagged outside points - if checked, the whiskers extend outward to the most extreme points which are not outside points. All outside points are marked with special symbols. If not checked, the whiskers extend outward to the minimum and maximum values in the samples.

Confidence level - specifies the confidence level for the median notches.

Direction - specifies the orientation of the plot.
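The effect of the Flagged outside points option on the whiskers can be sketched as follows. This is only an illustration under stated assumptions: it takes "outside points" to be values more than 1.5 times the interquartile range beyond the quartiles (a common convention, not confirmed here for this statlet), and it uses Python's "inclusive" quartile method, which may differ from the statlet's. The function name `whisker_ends` and the sample values are hypothetical.

```python
import statistics

def whisker_ends(sample, flag_outside=True, k=1.5):
    """Return (lower whisker, upper whisker, outside points)."""
    if not flag_outside:
        # Option unchecked: whiskers run to the minimum and maximum.
        return min(sample), max(sample), []
    q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr  # assumed 1.5*IQR rule
    inside = [x for x in sample if lo_fence <= x <= hi_fence]
    outside = [x for x in sample if x < lo_fence or x > hi_fence]
    # Option checked: whiskers stop at the most extreme non-outside points.
    return min(inside), max(inside), outside

sample = [51.0, 53.5, 54.0, 55.0, 55.5, 56.0, 57.5, 70.0]  # hypothetical
print(whisker_ends(sample, flag_outside=True))   # (51.0, 57.5, [70.0])
print(whisker_ends(sample, flag_outside=False))  # (51.0, 70.0, [])
```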

ANOVA

This tab displays an analysis of variance table for the data:

The table takes the original variability among the 32 measured breaking strengths and divides it into three components:

A "Between rows" component, which measures the contribution to overall variability due to differences between the levels of the blocking factor.
A "Between columns" component, which measures the contribution to overall variability due to differences between the levels of the experimental factor (material).
A "Residual" component, which measures the variability due to all other factors. This third component is sometimes referred to as the experimental error.

The P-values in the rightmost column of the table are used to determine whether significant differences exist between rows and between columns. P-values below 0.05 indicate statistically significant differences at the 5% significance level. In the above example, there are significant differences between both rows and columns.
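The three-way split of the variability described above can be sketched in a few lines. This is a minimal illustration using the standard sums of squares for a two-way layout without replication, not the statlet's own code; the function name `rbd_anova` and the data are hypothetical. It stops at the F-ratios, since turning them into P-values requires the F distribution, which the Python standard library does not provide.

```python
# Hypothetical data: rows are blocks, columns are materials.
data = [
    [64.0, 62.5, 55.0, 53.5],
    [66.5, 65.0, 57.5, 55.0],
    [61.0, 60.5, 52.0, 51.5],
]

def rbd_anova(table):
    """Split the total sum of squares into rows, columns, and residual."""
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(row[j] for row in table) / r for j in range(c)]

    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    # Residual: what is left after removing both row and column effects.
    ss_resid = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(r)
        for j in range(c)
    )

    df_rows, df_cols = r - 1, c - 1
    df_resid = df_rows * df_cols
    ms_resid = ss_resid / df_resid
    return {
        "ss": {"rows": ss_rows, "columns": ss_cols,
               "residual": ss_resid, "total": ss_total},
        "df": {"rows": df_rows, "columns": df_cols, "residual": df_resid},
        # Each F-ratio compares a mean square with the residual mean square.
        "F": {"rows": (ss_rows / df_rows) / ms_resid,
              "columns": (ss_cols / df_cols) / ms_resid},
    }

result = rbd_anova(data)
for key, value in result.items():
    print(key, value)
```

The three component sums of squares add up to the total, which is a quick check that the decomposition has been computed consistently.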

Means table

This tab displays the mean of each group together with uncertainty intervals:

The uncertainty intervals bound the estimation error in the means using one of several methods, selected by pressing the Options button. The choice of intervals is described in detail in the Completely Randomized Design statlet.

Options button

Select the desired type of uncertainty intervals:

Means plot

This tab plots the group means with uncertainty intervals:

See the discussion under Means table for an explanation of the intervals.

Options button

Select the desired type of uncertainty intervals:

Range tests

This tab indicates which means are significantly different from which others:

The output of this tab is described in the Completely Randomized Design statlet.

Options button

Select the desired procedure and the level of confidence:

Resids vs level

This tab plots the residuals by group:

In a oneway analysis of variance, the residuals are equal to the data values minus the mean of the group from which they come. If the standard deviations within each group are the same, we should see approximately the same scatter amongst the residuals for each material. In this case, there are small differences between the spreads of the residuals in the four groups. We don’t usually start to worry, however, until the standard deviations differ by more than a factor of 3 to 1 between the largest and the smallest. The analysis of variance is known to be reasonably robust at differences of less than this magnitude, which means that the confidence levels stated are approximately correct. The differences here are much less than 3 to 1.
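The 3-to-1 rule of thumb can be checked numerically. The sketch below is an illustration on hypothetical data, not the statlet's procedure. It assumes that, when a blocking factor is present, each residual is the data value minus its row mean, minus its column mean, plus the grand mean, so that both the block and material effects are removed; the function name `rbd_residuals` is made up for this example.

```python
import statistics

# Hypothetical data: rows are blocks, columns are materials.
data = [
    [64.0, 62.5, 55.0, 53.5],
    [66.5, 65.0, 57.5, 55.0],
    [61.0, 60.5, 52.0, 51.5],
    [63.5, 61.0, 56.0, 52.5],
]

def rbd_residuals(table):
    """Residuals after removing row (block) and column (material) effects."""
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(row[j] for row in table) / r for j in range(c)]
    return [
        [table[i][j] - row_means[i] - col_means[j] + grand for j in range(c)]
        for i in range(r)
    ]

resid = rbd_residuals(data)

# Standard deviation of the residuals within each column (material).
spreads = [
    statistics.stdev([row[j] for row in resid]) for j in range(len(resid[0]))
]
ratio = max(spreads) / min(spreads)

print(f"largest/smallest residual SD ratio: {ratio:.2f}")
if ratio < 3:
    print("within the 3-to-1 rule of thumb")
else:
    print("exceeds the 3-to-1 rule of thumb")
```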

Resids vs predict

This tab plots the residuals versus predicted values of strength:

It is useful for detecting possible violations of the assumption of constant within-group variability. Frequently, the variability of measurements increases with their mean. If so, the above plot would show points falling into a funnel-shaped pattern, increasing in spread from left to right. Such heteroscedasticity can often be eliminated by analyzing the logarithms of the data rather than the original data values.
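To see why taking logarithms can help, consider the deliberately exaggerated sketch below. The three groups are hypothetical and constructed so that the spread is exactly proportional to the mean; blocks are ignored to keep the example short. Real data will rarely behave this cleanly.

```python
import math
import statistics

# Hypothetical groups whose spread grows in proportion to their mean.
groups = {
    "low": [9.0, 10.0, 11.0, 10.5, 9.5],
    "mid": [90.0, 100.0, 110.0, 105.0, 95.0],
    "high": [900.0, 1000.0, 1100.0, 1050.0, 950.0],
}

def sd_ratio(samples):
    """Ratio of the largest to the smallest within-group standard deviation."""
    sds = [statistics.stdev(values) for values in samples.values()]
    return max(sds) / min(sds)

# Apply the natural logarithm to every observation.
logged = {
    name: [math.log(v) for v in values] for name, values in groups.items()
}

print(f"SD ratio, original scale: {sd_ratio(groups):.1f}")  # 100.0
print(f"SD ratio, log scale:      {sd_ratio(logged):.1f}")  # 1.0
```

On the original scale the spreads differ by a factor of 100; after the transformation they are essentially equal, because multiplying a group by a constant only shifts its logarithms.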

H₀: all group medians are equal

H_A: not all group medians are equal

A P-value below 0.05, as in the above table, indicates a statistically significant difference between the medians at the 5% significance level.