STATISTICAL COMPUTING ACTIVITY:
HIERARCHICAL CLUSTER ANALYSIS
R carries out hierarchical cluster analysis on a set of dissimilarities, so you
first need to compute the dissimilarities with dist( ).
The general commands are
>hclust(d, method="complete")
(The available methods are ward.D, ward.D2, single, complete, average, mcquitty,
median and centroid; older versions of R call the Ward method simply ward.)
>plot(hclust(d, method="complete"), cex=0.6)
produces the tree; cex controls the font size.
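The general commands above can be tried before any file is downloaded. The following is a minimal sketch that assumes the built-in USArrests data set as a stand-in (it is not the uscrime.txt file used in this activity); the object names d and hc are only illustrative.

```r
# Stand-in data (an assumption): the built-in USArrests data set,
# used here only so that the commands run without uscrime.txt.
d <- dist(USArrests)                  # Euclidean distances between the rows
hc <- hclust(d, method = "complete")  # agglomerative clustering, complete linkage
hc                                    # prints a short summary of the fit

plot(hc, cex = 0.6)                   # dendrogram with smaller labels
plot(hc, cex = 0.6, hang = -1)        # same tree, labels aligned at the bottom
```

Storing the result of hclust( ) in an object avoids recomputing the tree each time it is plotted or cut.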
Step 1. Download uscrime.txt from the web site.
Step 2. Run R.
Step 3. Go to the File menu and choose "Change directory" to select the location
where you saved the file.
>uscrime=read.table("uscrime.txt", header=TRUE)
>uscrime
>hclust(dist(uscrime),method="single")
>plot(hclust(dist(uscrime),method="single"),cex=0.8)
Also try
>plot(hclust(dist(uscrime),method="single"),cex=0.8,hang=-1)
Discuss the differences between these two plots.
Now change the method to complete
>plot(hclust(dist(uscrime),method="complete"),cex=0.8)
and then to average
>plot(hclust(dist(uscrime),method="average"),cex=0.8)
Discuss the differences and interpret the results.
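As an aid to the comparison, the three linkage methods can be fitted side by side and summarised numerically. This is only a sketch: it assumes the built-in USArrests data set as a stand-in for uscrime.txt, and the choice of 4 groups is arbitrary.

```r
# Stand-in data (an assumption): USArrests instead of uscrime.txt
d <- dist(USArrests)

# Fit one tree per linkage method
methods <- c("single", "complete", "average")
trees <- lapply(methods, function(m) hclust(d, method = m))
names(trees) <- methods

# Height of the final merge in each tree
top_height <- sapply(trees, function(tr) max(tr$height))
top_height

# Group sizes when each tree is cut into 4 groups (4 is an arbitrary choice)
sizes <- lapply(trees, function(tr) table(cutree(tr, k = 4)))
sizes
```

Very unequal group sizes under one method and more balanced sizes under another are one concrete difference to look for in the dendrograms.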
Now we will trim (cut) the tree. We do that by selecting a cut value from the
cluster dendrogram (h is the cut value that you select).
> plot(hclust(dist(uscrime),method="complete"),cex=0.8)
Let us cut the tree at height 700
> cutree(hclust(dist(uscrime),method="complete"),h=700)
Interpret the clusters.
Now try
> cutree(hclust(dist(uscrime),method="complete"),h=800)
Suppose you would like to have k=5 groups
> cutree(hclust(dist(uscrime),method="complete"),k=5)
Note that if both are given, the number of groups (k) overrides the height cut-off (h).
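The two ways of cutting a tree can be checked on a small example. The sketch below assumes the built-in USArrests data set as a stand-in for uscrime.txt, and the cut height of 150 is an arbitrary value chosen for that data set, not one taken from this activity.

```r
# Stand-in data (an assumption): USArrests instead of uscrime.txt
hc <- hclust(dist(USArrests), method = "complete")

# Cut at a height: merges made above h = 150 are undone
by_height <- cutree(hc, h = 150)
table(by_height)                 # cluster sizes

# Ask for a fixed number of groups instead
by_count <- cutree(hc, k = 5)
table(by_count)

# When both are supplied, k is used and h is ignored
both <- cutree(hc, k = 5, h = 150)
identical(both, by_count)
```

The result of cutree( ) is a vector of group numbers, one per observation, so table( ) gives the cluster sizes directly.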
Now type
> cutree(hclust(dist(uscrime),method="complete"),k=c(1,2,3,4,5,6,7,8))
Now interpret the output.
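When k is a vector, cutree( ) returns a matrix rather than a single vector. A short sketch of how that output is laid out, again assuming the built-in USArrests data set as a stand-in for uscrime.txt (the name memb is illustrative):

```r
# Stand-in data (an assumption): USArrests instead of uscrime.txt
hc <- hclust(dist(USArrests), method = "complete")

# One column of group labels for each requested number of groups
memb <- cutree(hc, k = 1:8)
dim(memb)        # rows = observations, columns = values of k
head(memb)       # membership of the first few observations

# Number of distinct groups found in each column
n_groups <- apply(memb, 2, function(col) length(unique(col)))
n_groups
```

Reading along a row shows how one observation moves between groups as the tree is cut into more and more pieces.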
STATISTICAL COMPUTING ACTIVITY:
HIERARCHICAL CLUSTER ANALYSIS
Step 1. Download motivation.xls from the course site.
Step 2. Run SYSTAT
File
Open
Data
Change Files of Type to All Files (*.*)
Locate the folder where you saved the file
Select the file
Click Open
First plot the data.
How many clusters do you see?
Now let us carry out agglomerative hierarchical clustering.
Analysis
Multivariate Analysis
Hierarchical
Select X and Y by double clicking on them.
Make sure Linkage shows Single
Check Save cluster identifiers
Click on the button after the text box, locate where you want to save the file,
and give it a name, say motivationcluster
Click OK
Open motivationcluster
Copy cluster column
Open motivation.xls
Paste the cluster column and name it cluster
Graph
Plot
Scatterplot
Select X and Y by double clicking on them.
Click on Symbol and Label tab
Check Display case labels
Select cluster
Click OK
What do you think about the clusters created?
Now, repeat the above steps, but this time use Complete for Linkage. Also try Average. What is your opinion on these results?
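For comparison with the R part of this activity, the same steps (cluster two variables, save the cluster identifiers, plot the points labelled by cluster) can be sketched in R. The data below are simulated and purely hypothetical; they are not the values in motivation.xls, and the choice of k = 2 groups is an assumption made for the illustration.

```r
set.seed(1)

# Simulated two-variable data (hypothetical; not the motivation.xls values):
# two groups of 20 points centred at (0, 0) and (10, 10)
xy <- data.frame(X = c(rnorm(20, 0), rnorm(20, 10)),
                 Y = c(rnorm(20, 0), rnorm(20, 10)))

# Single-linkage clustering on X and Y, then save the cluster identifiers
hc <- hclust(dist(xy), method = "single")
xy$cluster <- cutree(hc, k = 2)

# Scatterplot with each point drawn as its cluster number
plot(xy$X, xy$Y, type = "n", xlab = "X", ylab = "Y")
text(xy$X, xy$Y, labels = xy$cluster)

table(xy$cluster)   # cluster sizes
```

Changing method to "complete" or "average" in the hclust( ) call mirrors changing Linkage in the SYSTAT dialog.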