STATISTICAL COMPUTING ACTIVITY: HIERARCHICAL CLUSTER ANALYSIS

 

 

R carries out hierarchical cluster analysis on a set of dissimilarities, so you first need to compute the dissimilarities with dist().
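As a small illustration of what dist() returns, the sketch below uses three made-up points (hypothetical values, not part of the uscrime data). By default dist() computes Euclidean distances between the rows.

```r
# Three hypothetical points in the plane (illustration only)
x <- rbind(a = c(0, 0), b = c(3, 4), c = c(6, 8))

d <- dist(x)        # Euclidean distance between every pair of rows
d                   # printed as a lower triangle
as.matrix(d)        # the same distances as a full symmetric matrix

# a-b distance is 5 (a 3-4-5 triangle); a-c distance is 10
as.matrix(d)["a", "b"]
```

The object d is what hclust() expects as its first argument.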

The general commands are

>hclust(d, method="complete")  where d is the output of dist( ). (Available methods are: ward, single, complete, average, mcquitty, median, centroid. In R 3.1.0 and later, ward is named ward.D, and ward.D2 is also available.)

>plot(hclust(d, method="complete"), cex=0.6)  produces the tree; cex controls the font size.
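The two general commands can be tried before downloading any file. This is a minimal sketch that assumes R's built-in USArrests data as a stand-in (it is not the uscrime data used in this activity).

```r
# Built-in data set, used here only for illustration (not uscrime.txt)
d  <- dist(USArrests)                  # Euclidean distances between the 50 rows
hc <- hclust(d, method = "complete")   # agglomerative clustering, complete linkage

hc                                     # prints the method and number of objects
plot(hc, cex = 0.6)                    # dendrogram; cex shrinks the leaf labels
```

Storing the result in hc avoids recomputing the clustering each time the tree is plotted or cut.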

 

Step 1. Download uscrime.txt from the web site.

Step 2. Run R

Step 3. Go to the File menu and choose "Change directory" to select the location where you saved the file.

 

>uscrime=read.table("uscrime.txt", header=TRUE)

>uscrime

 

>hclust(dist(uscrime),method="single")

>plot(hclust(dist(uscrime),method="single"),cex=0.8)

 

Also try

>plot(hclust(dist(uscrime),method="single"),cex=0.8, hang=-1)

Discuss the differences between these two plots.
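To compare the two plots directly, they can be drawn side by side. The sketch below assumes the built-in USArrests data as a stand-in for uscrime; the panel titles are my own labels.

```r
# Stand-in data (USArrests), not the uscrime file
hc <- hclust(dist(USArrests), method = "single")

op <- par(mfrow = c(1, 2))                          # two panels in one row
plot(hc, cex = 0.6, main = "default hang")          # labels hang just below each leaf
plot(hc, cex = 0.6, hang = -1, main = "hang = -1")  # negative hang: labels hang down from 0
par(op)                                             # restore the previous layout
```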

 

Now change the method to complete

>plot(hclust(dist(uscrime),method="complete"),cex=0.8)

and then to average

>plot(hclust(dist(uscrime),method="average"),cex=0.8)

 

Discuss the differences among the single, complete, and average linkage trees, and interpret the results.
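One way to line up the three linkage methods is to loop over them and plot each tree in its own panel. This is a sketch on the built-in USArrests data (an assumption; substitute uscrime once it is loaded).

```r
# Stand-in data (USArrests), not the uscrime file
d <- dist(USArrests)
methods <- c("single", "complete", "average")

op <- par(mfrow = c(1, 3))                 # three panels in one row
trees <- lapply(methods, function(m) {
  hc <- hclust(d, method = m)
  plot(hc, cex = 0.5, main = m, xlab = "", sub = "")
  hc                                       # keep the tree for later inspection
})
par(op)
names(trees) <- methods

# Height of the final merge under each linkage
sapply(trees, function(hc) max(hc$height))
```

Because the merge heights are on different scales for different linkages, a cut height chosen for one tree does not carry over to another.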

 

Now we will trim (cut) the tree. We do that by selecting a cut value from the cluster dendrogram (h is the cut value that you select).

> plot(hclust(dist(uscrime),method="complete"),cex=0.8)

Let us cut the tree at height 700

> cutree(hclust(dist(uscrime),method="complete"),h=700)

Interpret the clusters.

Now try

> cutree(hclust(dist(uscrime),method="complete"),h=800)
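The vector returned by cutree() is easier to interpret when it is tabulated and the cut is drawn on the tree. The sketch below assumes the built-in USArrests data, and h = 100 is an arbitrary height chosen for that data, not a recommendation for uscrime.

```r
# Stand-in data (USArrests); h = 100 is an arbitrary illustration
hc <- hclust(dist(USArrests), method = "complete")

groups <- cutree(hc, h = 100)    # cluster number for each observation
table(groups)                    # how many observations fall in each cluster
split(names(groups), groups)     # which observations belong to each cluster

plot(hc, cex = 0.6)
abline(h = 100, lty = 2)         # dashed line marks the cut height
```

Raising h merges clusters, so the number of groups can only stay the same or decrease.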

 

Suppose you would like to have k=5 groups

> cutree(hclust(dist(uscrime),method="complete"),k=5)

Note that if both are supplied, the number of groups (k) overrides the height cut-off (h).
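The statement about k and h can be checked directly. This sketch again assumes the built-in USArrests data; the value h = 100 is arbitrary and is there only to show that it is ignored.

```r
# Stand-in data (USArrests), not the uscrime file
hc <- hclust(dist(USArrests), method = "complete")

by_k <- cutree(hc, k = 5)            # ask for exactly 5 groups
both <- cutree(hc, k = 5, h = 100)   # h is ignored because k is given
identical(by_k, both)                # TRUE

plot(hc, cex = 0.6)
rect.hclust(hc, k = 5, border = "red")   # draw boxes around the 5 groups
```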

Now type

> cutree(hclust(dist(uscrime),method="complete"),k=c(1,2,3,4,5,6,7,8))

Now interpret the output.
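When k is a vector, cutree() returns a matrix with one row per observation and one column per value of k. A sketch on the built-in USArrests data (a stand-in for uscrime) shows one way to read it.

```r
# Stand-in data (USArrests), not the uscrime file
hc <- hclust(dist(USArrests), method = "complete")

memb <- cutree(hc, k = 1:8)      # same as k = c(1,2,3,4,5,6,7,8)
dim(memb)                        # 50 rows (observations) by 8 columns (one per k)
head(memb)                       # cluster number of the first few observations for each k

# Cross-tabulate two columns to see which cluster splits when going from 3 to 4 groups
table(memb[, "3"], memb[, "4"])
```

Reading across a row shows how one observation is reassigned as the tree is cut into more groups.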

 

 

STATISTICAL COMPUTING ACTIVITY: HIERARCHICAL CLUSTER ANALYSIS

 

Step 1. Download motivation.xls from the course site

Step 2. Run SYSTAT

File

            Open

                        Data

 

Change Files of Type to All Files (*.*)

Locate the folder where you saved the file

Select the file

Click Open

 

First plot the data

Graph

            Plot

                        Scatterplot

 

How many clusters do you see?

Now let us carry out agglomerative hierarchical clustering

Analysis

            Multivariate Analysis

                        Cluster Analysis

                                    Hierarchical

Select X and Y by double-clicking on them.

Make sure Linkage shows Single

Check Save cluster identifiers

Click on the ... button after the text box, locate where you want to save the file, and give it a name, say motivationcluster

Click OK

 

Open motivationcluster

Copy the cluster column

Open motivation.xls

Paste the cluster column and name it cluster

 

Graph

            Plot

                        Scatterplot

Select X and Y by double-clicking on them.

Click on Symbol and Label tab

Check Display case labels

Select cluster

Click OK

What do you think about the clusters created?
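For comparison, the same workflow (plot, cluster with single linkage, label each point with its cluster) can be sketched in R. The data below are synthetic and stand in for the X and Y columns; they are not the contents of motivation.xls, and k = 2 is an assumption that suits this toy data only.

```r
set.seed(1)   # synthetic stand-in data; NOT the contents of motivation.xls
xy <- data.frame(X = c(rnorm(20, 0), rnorm(20, 5)),
                 Y = c(rnorm(20, 0), rnorm(20, 5)))

plot(xy$X, xy$Y, xlab = "X", ylab = "Y")            # first look: how many groups?

hc <- hclust(dist(xy[, c("X", "Y")]), method = "single")  # single linkage, as in the SYSTAT run
xy$cluster <- cutree(hc, k = 2)                     # k = 2 is an assumption for this toy data

plot(xy$X, xy$Y, type = "n", xlab = "X", ylab = "Y")
text(xy$X, xy$Y, labels = xy$cluster)               # label each case with its cluster number
```

Changing method to "complete" or "average" repeats the comparison asked for in the SYSTAT steps.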

Now, repeat the above steps, but this time use Complete for Linkage. Also try Average. What is your opinion on these results?