STATISTICAL COMPUTING ACTIVITY:
PRINCIPAL COMPONENT ANALYSIS FROM DATA
The data set that we will be using is the 1988 Olympic decathlon, given in Table 2.2.
The events are numbered as follows: (1) 100m, (2) long jump, (3) shot put, (4) high jump, (5) 400m, (6) 110m hurdles, (7) discus, (8) pole vault, (9) javelin, (10) 1500m.
Step 1. Download olympic.xls from the web site.
Step 2. Click on the file to open it in Excel. (We need to modify the data first, which is why we are running Excel.)
Step 3. Highlight the last row and delete it by going to Edit/Clear/All. (This athlete is a suspected outlier.)
Step 4. Transform the data by taking negative values for the four running events, i.e., event1, event5, event6, event10. (This way all events are scored in the same direction: small scores reflect poor performance and large scores reflect good performance.)
Step 5. Save the file as modoly.xls and modoly.txt.
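Steps 3 and 4 can also be carried out in R instead of Excel. The following is only a sketch: the data frame below is a small made-up stand-in (four athletes, three events, invented values), not the decathlon data, and the column names event1, event2, event10 are assumed to follow the event numbering above.

```r
# Made-up stand-in for the decathlon table (values are invented).
oly <- data.frame(
  event1  = c(11.25, 10.87, 11.18, 10.62),     # 100m time (smaller is better)
  event2  = c(7.43, 7.45, 7.44, 7.38),         # long jump (larger is better)
  event10 = c(268.95, 273.02, 263.20, 285.11)  # 1500m time (smaller is better)
)

# Step 3: drop the last row (the suspected outlier).
oly <- oly[-nrow(oly), ]

# Step 4: negate the running events so that larger always means better.
# With the full data this vector would also contain event5 and event6.
running <- c("event1", "event10")
oly[running] <- -oly[running]

oly
```

After this, every column is scored in the same direction, which makes the signs of the loadings easier to interpret.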
Or
Step 1. Download modifiedolympic.txt from the web site.
Step 2. Run R
Step 3. Go to the File menu and choose "Change directory" to set the location where you saved the file
> modoly=read.table("modifiedolympic.txt", header=TRUE)
> attach(modoly)
To see the variable names and data type
> modoly
Let us do PCA by using the correlation matrix
> prcomp(modoly, scale=TRUE)
This will give you Table 3.3. Note that R gives you the standard deviations, whereas Table 3.3 gives you the variances.
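To see how the standard deviations reported by R relate to variances, square the sdev component of the result. The sketch below uses simulated data (the matrix X is made up, not the decathlon data) and checks that the squared standard deviations are the eigenvalues of the correlation matrix.

```r
set.seed(42)
# Made-up data: 40 observations on 3 variables.
X <- matrix(rnorm(120), ncol = 3, dimnames = list(NULL, c("a", "b", "c")))

pc <- prcomp(X, scale. = TRUE)   # PCA on the correlation matrix

variances <- pc$sdev^2           # variances of the principal components
eig <- eigen(cor(X))$values      # eigenvalues of the correlation matrix

rbind(variances, eig)            # the two rows agree
sum(variances)                   # equals the number of variables
```

With the decathlon data, the same squaring step applied to the saved prcomp result would give the variance scale used in the table.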
To save the results, type
> modoly.pc=prcomp(modoly, scale=TRUE)
Now try
> summary(modoly.pc)
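The proportions of variance printed by summary() can be reproduced by hand from the standard deviations. This is a minimal sketch on simulated data (X is made up for illustration):

```r
set.seed(7)
X <- matrix(rnorm(150), ncol = 3)   # made-up data: 50 observations, 3 variables

pc <- prcomp(X, scale. = TRUE)

prop_var <- pc$sdev^2 / sum(pc$sdev^2)  # proportion of variance per component
cum_var  <- cumsum(prop_var)            # cumulative proportion

rbind(prop_var, cum_var)
summary(pc)$importance                  # same numbers, rounded by summary()
```

The cumulative proportion is what is usually read off when deciding how many components to keep.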
To get a scree plot
> plot(modoly.pc)
To produce a biplot
> biplot(modoly.pc)
Alternatively, let us do PCA by using princomp
> princomp(modoly, cor=TRUE)
> modoly.pc=princomp(modoly, cor=TRUE)
> summary(modoly.pc, loadings=TRUE)
To get the scree plot
> screeplot(modoly.pc)
To produce a biplot
> biplot(modoly.pc)
To plot the scores on the first two principal components
> plot(modoly.pc$scores[,1], modoly.pc$scores[,2], xlab="PC1", ylab="PC2")
This is Figure 3.3. Note that the signs are arbitrary in PCA.
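The remark about arbitrary signs can be checked directly. The sketch below uses simulated data (X and its column names are made up). It shows that prcomp and princomp give the same loadings up to sign, and that flipping the sign of one component's loadings and scores together leaves the reconstructed data unchanged.

```r
set.seed(1)
# Made-up data: 50 observations on 4 variables, with some correlation added.
X <- matrix(rnorm(200), ncol = 4, dimnames = list(NULL, paste0("v", 1:4)))
X[, 2] <- X[, 1] + 0.5 * X[, 2]

pc_a <- prcomp(X, scale. = TRUE)
pc_b <- princomp(X, cor = TRUE)

load_a <- pc_a$rotation
load_b <- unclass(pc_b$loadings)

# Same loadings up to the sign of each column.
same_up_to_sign <- all.equal(abs(load_a), abs(load_b), check.attributes = FALSE)

# Flip the sign of component 1 in both the loadings and the scores.
flipped <- pc_a
flipped$rotation[, 1] <- -flipped$rotation[, 1]
flipped$x[, 1] <- -flipped$x[, 1]

# Scores times transposed loadings rebuilds the standardized data either way.
same_fit <- all.equal(pc_a$x %*% t(pc_a$rotation),
                      flipped$x %*% t(flipped$rotation))

same_up_to_sign
same_fit
```

So a plot of the scores may appear mirrored relative to a published figure without the analysis being different.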
STATISTICAL COMPUTING ACTIVITY:
PRINCIPAL COMPONENT ANALYSIS FROM A GIVEN COVARIANCE OR CORRELATION MATRIX
First we need to learn how to enter a matrix
> example1=matrix(c(1,2,3,4,5,6), nrow=3, byrow=TRUE)
To see the matrix
> example1
Now try
> example2=matrix(c(1,2,3,4,5,6), nrow=2, byrow=TRUE)
> example2
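The byrow argument matters because R fills a matrix column by column unless told otherwise. A short sketch comparing the two fill orders (the names by_row and by_col are made up for illustration):

```r
values <- c(1, 2, 3, 4, 5, 6)

by_row <- matrix(values, nrow = 3, byrow = TRUE)  # rows are (1,2), (3,4), (5,6)
by_col <- matrix(values, nrow = 3)                # columns are (1,2,3), (4,5,6)

by_row
by_col
```

When a correlation matrix is typed in row by row, byrow=TRUE keeps the entries where they appear on the page.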
Now enter the correlation matrix for the weekly rates of return for Allied Chemical, duPont, Union Carbide, Exxon, and Texaco. (Here ... stands for the remaining entries of the matrix, entered row by row.)
> return=matrix(c(1, .577, .509, ... , .523, 1), nrow=5, byrow=TRUE)
> eigen(return)
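To show what eigen() returns and how to read it as a PCA, here is a sketch on a small 3 by 3 correlation matrix. The entries of R3 are invented for illustration and are not the stock return correlations.

```r
# Invented 3 x 3 correlation matrix (not the stock return data).
R3 <- matrix(c(1,  .6, .5,
               .6, 1,  .4,
               .5, .4, 1), nrow = 3, byrow = TRUE)

e <- eigen(R3)

e$values                  # variances of the principal components
e$vectors                 # loadings, one column per component
e$values / sum(e$values)  # proportion of total variance explained
sqrt(e$values)            # comparable to the standard deviations from prcomp
```

Because the diagonal of a correlation matrix is all ones, the eigenvalues sum to the number of variables.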
STATISTICAL COMPUTING ACTIVITY:
PRINCIPAL COMPONENT ANALYSIS FROM DATA
Step 1. Download olympic.xls from the web site.
Step 2. Run SYSTAT
File
Open
Data
Change Files of Type to All Files (*.*)
Locate the folder that you have saved the file
Select the file
Click Open
Click on line 34
Edit
Cut
To change the signs of event1, event5, event6, event10
Data
=Let
Type negevent1 in the Variable box and -event1 in the Expression box
Repeat this for event5, event6, and event10
Analysis
Multivariate Analysis
Factor Analysis
From the Available variable(s) window select negevent1, event2, event3, event4, negevent5, negevent6, event7, event8, event9, negevent10 by double clicking on them
Make sure that Principal components (PCA) is selected as Method and Correlation is selected as Matrix for extraction
Delete the 1 and type 0 in the Minimum eigenvalue window.
Check extended results
Click on Save tab
Select Factor scores
Check Save data with scores
Click on ... at the end of the Filename and select where you want to save the file
Type a file name, say olympicpca
Click OK
STATISTICAL COMPUTING ACTIVITY:
PRINCIPAL COMPONENT ANALYSIS FROM A GIVEN COVARIANCE OR CORRELATION MATRIX
Over a period of five years, yearly samples of fishermen on 28 lakes in Minnesota were asked to report the time they spent fishing and how many of each type of game fish they caught. Their responses were then converted to a catch rate per hour for
x1=Bluegill, x2=Black crappie, x3=Smallmouth bass, x4=Largemouth bass, x5=Walleye, x6=Northern pike.
Fish caught by the same fisherman live alongside each other, so the data should provide some evidence on how the fish group. The first four fish belong to the centrarchids, the most plentiful family. The walleye is the most popular fish to eat.
Now let us carry out principal component analysis on these data
First we need to enter the correlation matrix into SYSTAT
Utilities
Matrix
Read
Give a name to your matrix, say fish
Since the matrix is symmetric, it is enough to enter the diagonal and lower-diagonal elements of the matrix; the remaining values are entered as ".":
In the keyboard data window, enter the matrix like this:
firstrow; secondrow; thirdrow ... (That is, the elements within a row are separated by commas, and the rows are separated by semicolons.) Here it is:
1, ., ., ., ., .; .49919, 1, ., ., ., .; .2635, .3127, 1, ., ., .; .4653, .3506, .4108, 1, ., .; ...
Check Save matrix. Click on ... and, in the next window, select the folder where you would like to save the file and type a name, say fishcor. Also select Correlation in the last window.
Click OK
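For comparison, the same idea (enter only the diagonal and lower triangle, then fill in the rest by symmetry) can be sketched in R. Only the first four rows are used, because those are the ones written out in full above. The object names rows and fish4 are made up for illustration.

```r
# Lower-triangle rows for x1 to x4, copied from the entries listed above.
rows <- list(
  c(1),
  c(.49919, 1),
  c(.2635, .3127, 1),
  c(.4653, .3506, .4108, 1)
)

p <- length(rows)
fish4 <- matrix(0, nrow = p, ncol = p,
                dimnames = list(paste0("x", 1:p), paste0("x", 1:p)))

# Fill the diagonal and lower triangle row by row.
for (i in seq_len(p)) {
  fish4[i, seq_len(i)] <- rows[[i]]
}

# Mirror the lower triangle into the upper triangle.
fish4[upper.tri(fish4)] <- t(fish4)[upper.tri(fish4)]

fish4
isSymmetric(fish4)
```

A matrix built this way can be passed to eigen() in the same manner as a correlation matrix typed in full.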
Now we will load the file
Edit
Open
Data
Locate fishcor and double click on it.
Analysis
Multivariate Analysis
Perform a PCA using only x1 through x4.
Perform a PCA using all six variables.
Interpret your results.