7.2 Capability Analysis - comparison of data to process specifications

This statlet performs a process capability analysis. The goal of such an analysis is to compare a sample of items obtained from a production facility against established specifications for a product. It will estimate the percent of the product that will meet the specifications and compute several capability indices. In addition, statistical tolerance intervals can be computed to determine new specifications if current limits are not being met.

The statlet is designed to handle specifications of various forms:

Two-sided symmetric specifications such as 250 ± 100 psi.
One-sided specifications such as 16 ounces or greater.
Asymmetric specifications such as 50 to 350, with a target of 150.

The tabs are:

Example

The example data consists of the measured breaking strengths of 100 glass bottles sampled randomly from a day's production:

Specifications for the process require that breaking strength be within the interval 250 ± 100 psi.

Input

The input panel requires you to enter the name of a column containing numeric data and the specifications for the process:

You must enter at least one of the two specification limits. The nominal or target value is optional.

Note: the spinner permits you to select a transformation for the data. If you do, the specification limits will be automatically transformed in a similar manner.

Summary

The analysis summary table shows a detailed comparison of the data to the specification limits:

The output displays several important pieces of information:

Distribution - the assumed distribution, sample size, and sample statistics. Unless you specify otherwise by pressing the Fit tab, the data is assumed to come from a normal distribution.

Observed Beyond Limits - shows the number of data values found outside the specification limits. In this case, all bottles were within the specification.

Predicted Beyond Limits - the area of the fitted distribution below the lower spec limit and above the upper spec limit. This provides an estimate of the proportion of the population which will be out of spec. In this case, the program estimates that about 0.38% of all bottles will be beyond the specification limits, most of which will be on the high side.
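
As a rough illustration of how the predicted percentage arises, the sketch below computes the two tail areas of a fitted normal distribution. The sample mean (264.05) and standard deviation (32.02) are not printed in this section; they are assumptions back-calculated from the tolerance limits and the 2.934 multiplier quoted under Tolerances, so treat the result as approximate.

```python
from statistics import NormalDist

# Assumed sample statistics, back-calculated from the tolerance limits
# (170.1 to 358.0 psi, multiplier 2.934) quoted under Tolerances.
xbar, s = 264.05, 32.02
lsl, usl = 150.0, 350.0  # 250 ± 100 psi

fitted = NormalDist(mu=xbar, sigma=s)

below = fitted.cdf(lsl)        # area below the lower spec limit
above = 1.0 - fitted.cdf(usl)  # area above the upper spec limit
total = below + above

print(f"below LSL: {below:.4%}  above USL: {above:.4%}  total: {total:.4%}")
```

With these assumed statistics the total comes out close to the 0.38% reported above, with nearly all of it in the upper tail.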

Histogram

This tab shows a frequency histogram together with the fitted normal distribution:

The short vertical lines are located at the mean ± 3 standard deviations. Recall that these 3-sigma limits contain 99.73% of the normal distribution. If the data is consistent with the specifications, the 3-sigma limits should be inside the specification limits, which are shown as taller lines. In this case, the lower 3-sigma limit is within the specification, but the upper 3-sigma limit is not.
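
The comparison of 3-sigma limits to specification limits can be sketched in a few lines. The mean and standard deviation used here are assumptions back-calculated from the tolerance limits quoted under Tolerances, not values taken from the statlet's output.

```python
from statistics import NormalDist

# Assumed sample statistics (back-calculated, see lead-in).
xbar, s = 264.05, 32.02
lsl, usl = 150.0, 350.0

lower_3s = xbar - 3 * s
upper_3s = xbar + 3 * s

# Fraction of a normal distribution lying within mean ± 3 sigma.
std = NormalDist()
coverage = std.cdf(3) - std.cdf(-3)

lower_ok = lower_3s >= lsl  # lower 3-sigma limit inside the spec?
upper_ok = upper_3s <= usl  # upper 3-sigma limit inside the spec?

print(f"3-sigma limits: {lower_3s:.1f} to {upper_3s:.1f}, coverage {coverage:.4f}")
print(f"lower inside spec: {lower_ok}, upper inside spec: {upper_ok}")
```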

Also displayed are the values of several common capability indices. You may select the indices to be displayed using the Options button on the Indices tab.

Options button

The Options button allows you to change the number of classes in the histogram:

Indices

This tab displays a number of commonly calculated capability indices:

Capability indices are used to summarize how well a set of data conforms to a specification.

The capability indices are defined as follows:

Cp = (USL - LSL) / 6s

Cpk = min[(xbar - LSL) / 3s , (USL - xbar) / 3s ]

Cr = 1 / Cp

Cpm = (USL - LSL) / 6s'

where s' = a modified sample standard deviation calculated by replacing the sample average with the nominal value

K = (xbar - nominal) / [(USL - LSL) / 2]

The index Cp compares the distance between the specification limits to the 6-sigma range and is only defined for two-sided specifications. Cpk compares the distance between the mean and the nearer spec limit to 3-sigma. In either case, the index must be greater than 1.0 for the process to be capable of keeping virtually all of the product within specification, although most companies prefer values of Cp and Cpk to be greater than 1.33.

Cr, the capability ratio, is the reciprocal of Cp. Cpm is similar to Cp, except it uses a modified definition of the standard deviation which measures the variability of the data around the nominal value rather than the sample mean. This modification takes into account whether or not the distribution is properly centered. If it is not, Cpm will be considerably less than Cp.

K measures how far the mean of the distribution is away from the nominal value and should be close to 0.

For the bottle data, Cp is just slightly greater than 1.0, indicating that the 6-sigma spread is slightly narrower than the distance between the specification limits. However, Cpk is only 0.89, indicating that the 3-sigma limit on the high side falls outside the upper spec limit. The value of K = 0.14 indicates that the mean of the data lies about 14% of the way from the nominal value to the upper specification limit.
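
The definitions above translate directly into code. The following is a minimal sketch, not the statlet's implementation: the function name `capability_indices` is hypothetical, the five data values are made up for illustration, and the n - 1 divisor in s' is an assumption, since the text does not state which divisor is used.

```python
import math
from statistics import mean, stdev

def capability_indices(data, lsl, usl, nominal):
    """Compute Cp, Cpk, Cr, Cpm and K for a two-sided specification."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)  # sample standard deviation (n - 1 divisor)
    # Modified standard deviation: deviations are taken around the nominal
    # value instead of the sample mean. The n - 1 divisor is an assumption.
    s_mod = math.sqrt(sum((x - nominal) ** 2 for x in data) / (n - 1))
    cp = (usl - lsl) / (6 * s)
    cpk = min((xbar - lsl) / (3 * s), (usl - xbar) / (3 * s))
    return {
        "Cp": cp,
        "Cpk": cpk,
        "Cr": 1 / cp,
        "Cpm": (usl - lsl) / (6 * s_mod),
        "K": (xbar - nominal) / ((usl - lsl) / 2),
    }

# Made-up data: mean 3, deliberately off-center from a nominal value of 4.
indices = capability_indices([1, 2, 3, 4, 5], lsl=0, usl=8, nominal=4)
for name, value in indices.items():
    print(f"{name} = {value:.3f}")
```

Because the made-up data are centered below the nominal value, Cpm comes out smaller than Cp and K is negative.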

Also shown are 95% confidence intervals for some of the capability indices. These intervals quantify the uncertainty surrounding the calculated indices with respect to the population from which the data came.
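
The text does not say how the statlet computes these intervals. As one possible illustration, the sketch below uses Bissell's large-sample approximation for Cpk, which is a commonly cited method but an assumption here; the function name `cpk_interval` is hypothetical, and the intervals displayed by the statlet may differ.

```python
import math
from statistics import NormalDist

def cpk_interval(cpk, n, confidence=0.95):
    """Approximate two-sided confidence interval for Cpk (Bissell's formula)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Approximate standard error of the estimated Cpk.
    se = math.sqrt(1 / (9 * n) + cpk ** 2 / (2 * (n - 1)))
    return cpk - z * se, cpk + z * se

# Cpk = 0.89 and n = 100 are the values quoted for the bottle data.
low, high = cpk_interval(0.89, 100)
print(f"approximate 95% interval for Cpk: {low:.2f} to {high:.2f}")
```

Under this approximation the interval for the bottle data straddles 1.0, so a sample of 100 bottles does not settle whether the true Cpk is above or below that threshold.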

Options button

The Options button allows you to select which capability indices will be computed:

This also affects which indices are displayed on the frequency histogram. The confidence level affects the confidence intervals.

Fit

A number of the results displayed above are highly dependent on the assumption that the data come from a normal distribution:

The estimated percentage out of spec is computed from the area of the tails of the normal distribution beyond the specification limits.

Interpretation of the capability indices depends on the fact that the mean ± 3 sigma contains almost all of the area under a normal curve. This is not necessarily true for other distributions.

Before relying on the above statistics, it is important to assess whether the assumption of a normal distribution is reasonable. This tab runs two formal statistical tests to determine whether the selected distribution adequately fits the data:

The two tests performed are:

Chi-squared test - compares the number of observations in each bar of the histogram to the expected number given the fitted distribution.

Kolmogorov-Smirnov test - compares the cumulative distribution of the data to that of the fitted distribution.

In each case, the important result is the P-value for the test. A P-value below 0.05 indicates that the assumed distribution does not adequately fit the data at the 5% significance level.

The bottle data passes both of the tests, although barely. Consequently, there is no reason to reject the use of the normal distribution. If the P-value for either test fell below 0.05, then you might consider transforming the data using a logarithm or square root or selecting a different distribution by pressing the Options button.
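
To make the Kolmogorov-Smirnov comparison concrete, here is a minimal sketch that fits a normal distribution to a sample and measures the largest gap between the empirical and fitted cumulative distributions. The helper name `ks_normal` is hypothetical. The P-value uses the asymptotic Kolmogorov series, which ignores the fact that the mean and standard deviation were estimated from the same data, so it tends to be conservative and need not match the statlet's output.

```python
import math
from statistics import NormalDist, mean, stdev

def ks_normal(data):
    """Return (D, approximate P-value) for a normal fit to the data."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    lam = math.sqrt(n) * d
    if lam < 0.2:
        # The series converges slowly here, and the P-value is 1 to many digits.
        p = 1.0
    else:
        p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                      for k in range(1, 101))
        p = min(1.0, max(0.0, p))
    return d, p

# An idealized sample: 50 evenly spaced quantiles of a standard normal.
ideal = [NormalDist().inv_cdf((i + 0.5) / 50) for i in range(50)]
d_ideal, p_ideal = ks_normal(ideal)

# A clearly non-normal sample: most values identical, a few far away.
lumpy = [1.0] * 45 + [100.0] * 5
d_lumpy, p_lumpy = ks_normal(lumpy)

print(f"ideal: D = {d_ideal:.3f}, P = {p_ideal:.3f}")
print(f"lumpy: D = {d_lumpy:.3f}, P = {p_lumpy:.3g}")
```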

Options button

Several distributions other than the normal can be selected to model the data:

If you select a different distribution, all of the statistics computed above will be adjusted for that distribution.

Prob. plot

This plot is used to determine whether or not a sample of data are approximately distributed according to a normal distribution:

Any major departures from the straight line, which corresponds to the best-fitting normal distribution, would indicate that the data are not normally distributed. The above plot shows only minor departures from normality.
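
The coordinates behind such a plot can be sketched as follows: each sorted observation is paired with the standard normal quantile for its rank. The plotting-position formula (i - 0.375)/(n + 0.25) is one common convention and an assumption here, since the text does not say which the statlet uses; the function name and the five data values are made up for illustration.

```python
from statistics import NormalDist

def normal_plot_points(data):
    """Pair each sorted observation with its expected standard normal quantile."""
    xs = sorted(data)
    n = len(xs)
    std = NormalDist()
    return [(std.inv_cdf((i - 0.375) / (n + 0.25)), x)
            for i, x in enumerate(xs, start=1)]

# Hypothetical breaking strengths, in psi.
points = normal_plot_points([251.0, 239.5, 270.2, 263.1, 288.4])
for z, x in points:
    print(f"{z:+.3f}  {x:.1f}")
```

If the data are close to normal, plotting the observations against the quantiles gives points that fall near a straight line.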

Options button

The Options button allows you to change the orientation of the plot and remove the fitted line if desired:

Tolerances

In situations where the specifications for the product are not set in concrete, it may be useful to ask what reasonable specifications for the process would be based on the observed data. This tab computes normal tolerance limits from the data:

Normal tolerance limits make a statement about a specified proportion of the population at a selected level of confidence. For example, the output above states that:

"We can be 95% confident that 99% of the bottles will fall between 170.1 and 358.0 psi."

The interval is formed by taking the sample mean and adding and subtracting 2.934 times the sample standard deviation. The multiplier takes into account both the spread of the normal distribution and possible sampling errors in estimating the mean and standard deviation.

Provided the data comes from a normal distribution, the normal tolerance limits may be used to establish specifications for the process.
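
The multiplier for a two-sided normal tolerance interval can be approximated without special tables. The sketch below combines Howe's approximation with the Wilson-Hilferty approximation to the chi-squared quantile; both are assumptions chosen to keep the example self-contained, and the statlet may use a different method. The mean and standard deviation are back-calculated from the limits quoted above, so reproducing those limits is by construction; the independent check is the multiplier itself.

```python
import math
from statistics import NormalDist

def chi2_quantile_wh(q, df):
    """Wilson-Hilferty approximation to the chi-squared quantile."""
    z = NormalDist().inv_cdf(q)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3

def tolerance_factor(n, confidence=0.95, proportion=0.99):
    """Howe's approximation to the two-sided normal tolerance factor."""
    z = NormalDist().inv_cdf((1 + proportion) / 2)
    chi2 = chi2_quantile_wh(1 - confidence, n - 1)
    return z * math.sqrt((n - 1) * (1 + 1 / n) / chi2)

k = tolerance_factor(100)  # 100 bottles, 95% confidence, 99% of population

# Assumed sample statistics, back-calculated from the quoted limits.
xbar, s = 264.05, 32.02
lower, upper = xbar - k * s, xbar + k * s

print(f"k = {k:.3f}, limits = {lower:.1f} to {upper:.1f} psi")
```

For n = 100 this approximation gives a multiplier very close to the 2.934 shown in the output.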

Options button

The Options button allows you to specify both the level of confidence and the proportion of the population to be bounded:

You might consider setting the population percentage to a number such as 99.99%. At the selected level of confidence, all but 1 in 10,000 items would then be expected to fall within the resulting limits, provided the data come from a normal distribution.