STATISTICAL COMPUTING ACTIVITY: MDS
FROM DISSIMILARITY MATRIX
In R, you can run MDS only on symmetric dissimilarity (distance) matrices, i.e., matrices in which higher values imply greater distances.
Consider three cities A, B, and C, and another group of cities D, E, and F, with the following distance matrices:
    |   A |   B |   C
A   |   0 |     |
B   |  50 |   0 |
C   |  50 |  50 |   0

    |   D |   E |   F
D   |   0 |     |
E   |  50 |   0 |
F   | 100 |  50 |   0
Let us apply classical MDS to these two distance matrices in R.
First enter the matrices:
>abc=matrix(c(0,50,50,50,0,50,50,50,0),nrow=3,
dimnames=list(c("A","B","C"), c("A","B","C")))
>def=matrix(c(0,50,100,50,0,50,100,50,0),nrow=3,
dimnames=list(c("D","E","F"), c("D","E","F")))
Note that we have named the rows and columns for plotting. First, guess how these cities will be plotted.
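Before scaling, it can help to confirm that each matrix really is a valid distance matrix. The following is a minimal sketch; the helper name is_valid_dist is made up for this activity and is not part of R.

```r
# Re-enter the two distance matrices from the tables above.
abc <- matrix(c(0, 50, 50, 50, 0, 50, 50, 50, 0), nrow = 3,
              dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
def <- matrix(c(0, 50, 100, 50, 0, 50, 100, 50, 0), nrow = 3,
              dimnames = list(c("D", "E", "F"), c("D", "E", "F")))

# Hypothetical helper (not part of R): TRUE when m is square, symmetric,
# non-negative, and has zeros on the diagonal.
is_valid_dist <- function(m) {
  nrow(m) == ncol(m) &&
    isSymmetric(unname(m)) &&
    all(m >= 0) &&
    all(diag(m) == 0)
}

is_valid_dist(abc)
is_valid_dist(def)
```

Both calls should return TRUE for the matrices entered above.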
Now we are ready to use classical MDS:
>locabc=cmdscale(abc)
>locabc
>xabc=locabc[,1]
>yabc=locabc[,2]
>plot(xabc,yabc,type="n",xlab="",ylab="",main="cmdscale(abc)")
>text(xabc,yabc,rownames(abc),cex=0.8)
Is the plot meaningful?
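One way to judge the plot is to compare the distances between the fitted points with the distances we entered. This is only a sketch of such a check; the variable name fitted_abc is introduced here for illustration.

```r
# Distance matrix for cities A, B, C, as entered above.
abc <- matrix(c(0, 50, 50, 50, 0, 50, 50, 50, 0), nrow = 3,
              dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Two-dimensional classical MDS solution (k = 2 is the default).
locabc <- cmdscale(abc)

# Euclidean distances between the fitted points, as a full matrix.
fitted_abc <- as.matrix(dist(locabc))
round(fitted_abc, 6)

# Largest absolute difference between fitted and original distances.
max(abs(fitted_abc - abc))
```

If the largest difference is essentially zero, the two-dimensional configuration reproduces the original distances.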
>locdef=cmdscale(def)
>locdef
>xdef=locdef[,1]
>ydef=locdef[,2]
>plot(xdef,ydef,type="n",xlab="",ylab="",main="cmdscale(def)")
>text(xdef,ydef,rownames(def),cex=0.8)
Is the plot meaningful?
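The eigenvalues computed during the scaling give a hint about how many dimensions each set of cities needs. The sketch below asks cmdscale to return them through its eig argument; the names eig_abc and eig_def are chosen here for illustration.

```r
abc <- matrix(c(0, 50, 50, 50, 0, 50, 50, 50, 0), nrow = 3,
              dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
def <- matrix(c(0, 50, 100, 50, 0, 50, 100, 50, 0), nrow = 3,
              dimnames = list(c("D", "E", "F"), c("D", "E", "F")))

# With eig = TRUE, cmdscale returns a list; $eig holds the eigenvalues
# computed during the scaling.
eig_abc <- cmdscale(abc, k = 1, eig = TRUE)$eig
eig_def <- cmdscale(def, k = 1, eig = TRUE)$eig

round(eig_abc, 6)   # two clearly positive eigenvalues
round(eig_def, 6)   # one clearly positive eigenvalue
```

Roughly speaking, each clearly positive eigenvalue corresponds to one dimension that carries information about the distances.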
Now let us get the one-dimensional solution for both cases:
>plot(xabc,type="n",xlab="",main="cmdscale(abc)")
>text(xabc,labels=rownames(abc),cex=0.8)
>plot(xdef,type="n",xlab="",main="cmdscale(def)")
>text(xdef,labels=rownames(def),cex=0.8)
Discuss how meaningful these plots are.
Our goal is to reproduce the observed distance matrix using fewer dimensions, and so reduce the observed complexity of nature. This example shows that fewer dimensions (factors) may give a worse representation of a distance matrix than more dimensions would. For cities A, B, and C, there is no way to arrange the three cities on one line so that the distances are reproduced. Cities D, E, and F, on the other hand, can be arranged in one dimension as follows:
D ------50 miles------ E ------50 miles------ F
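The same point can be checked numerically by fitting a one-dimensional solution to each matrix and measuring how far the fitted distances are from the original ones. This is a sketch only; the helper max_gap_1d is hypothetical and not part of R.

```r
abc <- matrix(c(0, 50, 50, 50, 0, 50, 50, 50, 0), nrow = 3,
              dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
def <- matrix(c(0, 50, 100, 50, 0, 50, 100, 50, 0), nrow = 3,
              dimnames = list(c("D", "E", "F"), c("D", "E", "F")))

# Hypothetical helper: largest absolute gap between the original distances
# and the distances between points of the one-dimensional MDS solution.
max_gap_1d <- function(d) {
  loc <- cmdscale(d, k = 1)
  max(abs(as.matrix(dist(loc)) - d))
}

max_gap_1d(abc)   # clearly above zero: one line cannot hold A, B, C
max_gap_1d(def)   # essentially zero: D, E, F fit on one line
```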
PART 2:
COLA DATA FOR SUBJECT 1
Step 1. Download cola.txt from the web site.
Step 2. Run R
Step 3. Go to the File menu and "Change directory" to the location where you have saved the file
>cola=read.table("cola.txt", header=TRUE)
>loccola=cmdscale(cola)
>loccola
>xcola=loccola[,1]
>ycola=loccola[,2]
>plot(xcola,ycola,type="n",xlab="",ylab="",main="cmdscale(cola)")
>text(xcola,ycola,names(cola),cex=0.8)
Interpret the results.
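When a matrix is read from a file, it is worth checking that it arrived in the expected shape before scaling it. The layout of cola.txt is not shown here, so the sketch below assumes a square table with a header row and no row labels, and uses the D, E, F matrix from Part 1, written to a temporary file, as a stand-in.

```r
# Stand-in for cola.txt (assumed layout: header row, no row labels).
def <- matrix(c(0, 50, 100, 50, 0, 50, 100, 50, 0), nrow = 3,
              dimnames = list(NULL, c("D", "E", "F")))
tmp <- tempfile(fileext = ".txt")
write.table(def, tmp, row.names = FALSE, quote = FALSE)

# Read it back the same way the activity reads cola.txt.
dat <- read.table(tmp, header = TRUE)
m <- as.matrix(dat)

# Checks worth making before calling cmdscale().
nrow(m) == ncol(m)         # square?
isSymmetric(unname(m))     # symmetric?
all(diag(m) == 0)          # zero diagonal?

# One-dimensional solution, labelled by column name as in the plot above.
loc <- cmdscale(m, k = 1)
rownames(loc) <- names(dat)
loc
```

If any of the three checks returns FALSE for your own file, look at how the file was read before interpreting the plot.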
STATISTICAL COMPUTING ACTIVITY: MDS
Step 1. Download morse.xls from the course site
Step 2. Run SYSTAT
File
Open
Data
Change Files of Type to All Files (*.*)
Locate the folder that you have saved the file
Select the file
Click Open
Now we have to let SYSTAT know that this is a similarity matrix
Click on Options and select Similarity, then OK
Select the folder where you want to save the file and name it, say, morsesim.
Save
Open the file that you have saved
Analysis
Scale
Multidimensional Scaling
From the Available variable(s) window select number1, ..., number10 by double clicking on them
Make sure that Square (similarities model) is selected
Click OK
Suppose you had thought that this was a dissimilarity matrix. Let us see what happens:
Step 1. Download morse.xls from the course site
Step 2. Run SYSTAT
File
Open
Data
Change Files of Type to All Files (*.*)
Locate the folder that you have saved the file
Select the file
Click Open
Now we tell SYSTAT to treat this as a dissimilarity matrix
Click on Options and select Dissimilarity, then OK
Select the folder where you want to save the file and name it, say, morsedissim.
Save
Open the file that you have saved
Analysis
Scale
Multidimensional Scaling
From the Available variable(s) window select number1, ..., number10 by double clicking on them
Make sure that Square (similarities model) is selected (IF YOU START WITH THE DATA MATRIX, SELECT RECTANGULAR)
Click OK
Discuss the differences in your interpretation for the similarity and dissimilarity matrices.
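To see why the declaration matters, here is a small R sketch. The similarity values are made up for illustration and do not come from morse.xls. Subtracting from the maximum is just one simple conversion, assumed here; it is not a description of what SYSTAT does internally.

```r
# Made-up similarities on a 0-10 scale (10 = identical); P and Q are very
# similar to each other, and R is unlike both.
sim <- matrix(c(10,  9,  1,
                 9, 10,  1,
                 1,  1, 10), nrow = 3,
              dimnames = list(c("P", "Q", "R"), c("P", "Q", "R")))

# Assumed conversion: high similarity becomes small dissimilarity.
dis <- max(sim) - sim

# Correct use: scale the dissimilarities (one dimension for simplicity).
right <- as.matrix(dist(cmdscale(dis, k = 1)))

# Mistaken use: treat the similarities as if they were distances.
sim0 <- sim
diag(sim0) <- 0
wrong <- as.matrix(dist(cmdscale(sim0, k = 1)))

right["P", "Q"] < right["P", "R"]   # P and Q end up close together
wrong["P", "Q"] > wrong["P", "R"]   # P and Q end up far apart
```

In the mistaken version the most similar pair is placed farthest apart, so the picture is read backwards.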
Carry out the MDS analysis for the data given in Table 5.3, which is stored in country.xls. Note how the data have been entered.