STATISTICAL COMPUTING ACTIVITY: CHAPTERS 1 & 2

 

 

 

Step 1. Download husb.txt from the web site.

Step 2. Run R

Step 3. Go to the File menu and choose "Change directory" to select the folder where you saved the file

 

>husb=read.table("husb.txt", header=TRUE)

>attach(husb)
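The reading step can be tried without the real file. The sketch below writes three made-up rows to a temporary file (the values are invented for illustration and are not taken from husb.txt) and reads them back. It also shows the dataframe$variable form, which works whether or not attach() has been called.

```r
# Hypothetical stand-in for husb.txt: three made-up rows in a temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("husbage wifeage",
             "49 43",
             "25 28",
             "40 30"), tmp)

# header = TRUE tells read.table that the first line holds the variable names.
demo <- read.table(tmp, header = TRUE)

# Without attach(), refer to a variable as dataframe$variable.
demo$husbage
median(demo$wifeage)
```

If read.table() cannot find your own file, getwd() shows the folder R is currently looking in.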

 

To print the data together with the variable names

 

>husb
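Typing the name prints every row, which can be long. A minimal sketch of some shorter ways to look at a data frame follows; the small data frame demo is made up for illustration.

```r
# Made-up data for illustration only; not taken from husb.txt.
demo <- data.frame(husbage = c(49, 25, 40),
                   wifeage = c(43, 28, 30))

names(demo)    # variable names only
str(demo)      # one line per variable: name, data type, first few values
head(demo, 2)  # first two rows
dim(demo)      # number of rows and number of columns
```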

 

Let us get some descriptive statistics

 

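>colMeans(husb)

>var(husb)

>sapply(husb, sd)

(Note that mean() and sd() work on a single variable. For a whole data file, use colMeans() and sapply() as above.)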

>median(husbage)

(Note that you need the variable name, not the data file name.)

>fivenum(wifeage)

(Again, use the variable name, not the data file name.)

>summary(husb)

>cov(husb)

>cor(husb)
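To see how these summaries relate to each other, here is a small sketch on made-up data (the values are invented, not from husb.txt). It checks that the diagonal of var() holds the variances and that a correlation is a covariance divided by the two standard deviations.

```r
# Made-up data for illustration only.
demo <- data.frame(husbage = c(49, 25, 40, 52),
                   wifeage = c(43, 28, 30, 57))

colMeans(demo)     # mean of each variable
sapply(demo, sd)   # standard deviation of each variable
v <- var(demo)     # variance-covariance matrix (same result as cov(demo))
v

# The diagonal of the matrix holds the variances.
all.equal(unname(diag(v)), unname(sapply(demo, var)))

# Correlation = covariance / (sd of x * sd of y).
r <- cov(demo$husbage, demo$wifeage) / (sd(demo$husbage) * sd(demo$wifeage))
all.equal(r, cor(demo$husbage, demo$wifeage))
```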

 

The following command produces Figure 2.1 (a), a simple scatterplot of wife's age versus husband's age.

 

>plot(husbage,wifeage)

 

Note how the arguments are entered. The form is plot(x,y): first the explanatory variable (x), then the response variable (y).
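plot() also accepts a formula, in which the order is the other way round (response first). The sketch below draws the same scatterplot both ways on made-up data; the axis labels are just an illustration.

```r
# Made-up ages for illustration only; not taken from husb.txt.
demo <- data.frame(husbage = c(49, 25, 40, 52, 33),
                   wifeage = c(43, 28, 30, 57, 33))

# plot(x, y): explanatory variable first, response second.
plot(demo$husbage, demo$wifeage)

# Formula form: response ~ explanatory, so the order is reversed.
plot(wifeage ~ husbage, data = demo,
     xlab = "Husband's age", ylab = "Wife's age")
```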

 

Now, let us add a y=x line; any point on this line has husband's age equal to wife's age. This gives Figure 2.1 (b).

 

>abline(0,1)

 

Here the form is abline(intercept,slope).

 

If you would like to add a least squares regression line you need to use

 

>abline(lm(wifeage~husbage))

 

The symbol "~" means "is modeled by" (response on the left, explanatory variable on the right), and "lm" means linear model.
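It can help to look at the fitted intercept and slope before drawing the line. This is a sketch on made-up data; the object name fit is arbitrary.

```r
# Made-up ages for illustration only; not taken from husb.txt.
demo <- data.frame(husbage = c(49, 25, 40, 52, 33),
                   wifeage = c(43, 28, 30, 57, 33))

# Fit wifeage as a linear function of husbage.
fit <- lm(wifeage ~ husbage, data = demo)
coef(fit)               # intercept first, then the slope for husbage

plot(wifeage ~ husbage, data = demo)
abline(fit)             # least squares line taken from the fitted model
abline(0, 1, lty = 2)   # dashed y = x line for comparison
```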

 

Next let us add a small amount of random noise to husbage and wifeage so that points with identical values do not overlap. This is called jittering.

 

> plot(jitter(husbage),jitter(wifeage))

 

This is Figure 2.1 (c).
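To get a feel for how much noise jitter() adds, here is a small sketch on made-up ages. By default the noise is at most one fifth of the smallest gap between distinct values, and the amount argument sets the maximum by hand.

```r
set.seed(1)  # jitter() is random; the seed makes the run repeatable

# Made-up ages with ties; the smallest gap between distinct values is 1.
ages <- c(30, 30, 31, 31, 32, 40)

jittered <- jitter(ages)
jittered
max(abs(jittered - ages))   # no more than 1/5 of the smallest gap, here 0.2

# A larger, hand-picked amount of noise.
wider <- jitter(ages, amount = 0.4)
max(abs(wider - ages))      # no more than 0.4
```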

 

Let us show the marginal distribution of each variable along its axis

 

>rug(jitter(husbage), side=1)

>rug(jitter(wifeage), side=2)

 

In this case side=1 is the horizontal (bottom) axis and side=2 is the vertical (left) axis.

 

Now we have Figure 2.1 (d).
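The whole figure can be tried as one short script. The ages below are simulated, not the husb.txt data, so the picture will only resemble the real one. Note that rug() adds to the plot that is already on the screen, while each new call to plot() starts a fresh one.

```r
set.seed(2)

# Simulated ages for illustration only.
husbage <- round(runif(50, min = 20, max = 60))
wifeage <- round(husbage - 2 + rnorm(50, sd = 3))

# New plot of the jittered ages.
plot(jitter(husbage), jitter(wifeage),
     xlab = "Husband's age", ylab = "Wife's age")

# Added to the current plot: tick marks showing each marginal distribution.
rug(jitter(husbage), side = 1)   # along the horizontal axis
rug(jitter(wifeage), side = 2)   # along the vertical axis
```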

 

To analyze the age difference between husband and wife, let us first compute it

 

>agediff=husbage-wifeage
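The subtraction is done couple by couple. As a quick check on made-up ages (not the husb.txt data), positive values mean the husband is older, negative values mean the wife is older.

```r
# Made-up ages for illustration only.
husbage <- c(49, 25, 40, 52, 33)
wifeage <- c(43, 28, 30, 57, 33)

agediff <- husbage - wifeage
agediff              # 6 -3 10 -5 0

sum(agediff > 0)     # couples where the husband is older
sum(agediff < 0)     # couples where the wife is older
sum(agediff == 0)    # couples of the same age
```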

 

Let us plot husband's age at first marriage versus the age difference

 

>plot(agediff, husagefi)

 

Let us add a line on which there is no age difference, i.e., agediff=0

 

> abline(v=0)

 

Here "v" is for a vertical line and "h" is for a horizontal line.

Now we have Figure 2.2.

 

To produce Figure 2.3 (a) and (b):

 

>plot(husbhei, wifehei) 

>abline(0,1)

 

For Figure 2.4

 

>heidiff=husbhei-wifehei

>plot(agediff,heidiff)

>abline(v=0)

>abline(h=0)

 

Now, tell me: what is the main thing that you did not like in this example? (My brain was fighting with my hands when I was preparing this handout.)

 

Labeling the Scatterplot with Row Names

 

Enter the following data into Excel

 

species        bodywt   brainwt
PotarMonkey      10.0       115
Gorilla         207.0       406
Human            62.0      1320
RhesusMonkey      6.8       179
Chip             52.2       440

 

Save the sheet as a tab-delimited text file named primates.txt

 

To put labels on the points of the scatterplot:

Step 1. Run R

Step 2. Go to the File menu and choose "Change directory" to select the folder where you saved the file

 

>primates=read.table("primates.txt", header=TRUE)

>attach(primates)

>row.names(primates)=species

>plot(bodywt, brainwt)

>text(x=bodywt, y=brainwt, labels=row.names(primates))
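If you would rather skip Excel, the same five rows can be typed directly into R. This is a sketch of one way to do it; the pos and xlim settings and the axis labels are my own additions so that the labels sit beside the points and stay inside the plot.

```r
# The five rows from the table in this handout, typed in directly.
primates <- data.frame(
  species = c("PotarMonkey", "Gorilla", "Human", "RhesusMonkey", "Chip"),
  bodywt  = c(10.0, 207.0, 62.0, 6.8, 52.2),
  brainwt = c(115, 406, 1320, 179, 440),
  stringsAsFactors = FALSE
)
row.names(primates) <- primates$species

# xlim is widened a little so the right-hand label is not cut off.
plot(primates$bodywt, primates$brainwt,
     xlab = "Body weight", ylab = "Brain weight", xlim = c(0, 250))

# pos = 4 writes each label to the right of its point instead of on top of it.
text(x = primates$bodywt, y = primates$brainwt,
     labels = row.names(primates), pos = 4)
```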

 

 

 

From Bivariate to Multivariate

 

To produce a scatterplot matrix you can either use

 

>pairs(husb)

 

Or

 

>plot(husb)

 

To add a smooth curve to each panel below the diagonal

 

> plot(husb, lower.panel=panel.smooth)
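The scatterplot matrix can be tried on simulated data as follows. The three variables below are generated at random and only borrow the names used in this handout. For a data frame whose columns are all numeric, plot() and pairs() draw the same matrix.

```r
set.seed(5)
n <- 40

# Simulated data for illustration only; not the husb.txt data.
demo <- data.frame(husbage = round(runif(n, min = 20, max = 60)))
demo$wifeage <- round(demo$husbage - 2 + rnorm(n, sd = 3))
demo$husbhei <- round(rnorm(n, mean = 1730, sd = 70))

# One scatterplot for every pair of variables;
# the panels below the diagonal also get a smooth curve.
pairs(demo, lower.panel = panel.smooth)
```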

 

 

Coplots

 

To produce a conditional scatterplot of wifeage versus husbage given husbhei, use

 

>coplot(wifeage~husbage|husbhei)

 

To add a lowess smooth to each panel of the coplot

 

> coplot(wifeage~husbage|husbhei, panel=panel.smooth)
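coplot() cuts the conditioning variable into overlapping intervals and draws one panel per interval. The sketch below, on simulated data, uses co.intervals() to show the intervals and the number and overlap arguments to control them. The choice of 4 intervals with 25% overlap is arbitrary.

```r
set.seed(3)
n <- 60

# Simulated data for illustration only; not the husb.txt data.
husbage <- round(runif(n, min = 20, max = 60))
wifeage <- round(husbage - 2 + rnorm(n, sd = 3))
husbhei <- round(rnorm(n, mean = 1730, sd = 70))

# The height intervals that define the panels: one row per interval.
iv <- co.intervals(husbhei, number = 4, overlap = 0.25)
iv

# One panel of wifeage versus husbage for each height interval.
coplot(wifeage ~ husbage | husbhei, number = 4, overlap = 0.25,
       panel = panel.smooth)
```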

 

Quantile-Quantile Plots (Normal Plots)

 

To produce a normal plot use

 

>qqnorm(husbage)

 

To add a reference line through the first and third quartiles

 

> qqline(husbage)
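To practice reading a normal plot, it helps to compare a sample that is roughly normal with one that is not. Both samples below are simulated. Points close to the line suggest normality, while a curved pattern suggests skewness.

```r
set.seed(4)

# Simulated samples for illustration only.
normal_sample <- rnorm(100, mean = 40, sd = 10)   # roughly normal
skewed_sample <- rexp(100, rate = 1 / 40)         # right-skewed

op <- par(mfrow = c(1, 2))   # two plots side by side

qqnorm(normal_sample, main = "Roughly normal")
qqline(normal_sample)

qqnorm(skewed_sample, main = "Right-skewed")
qqline(skewed_sample)

par(op)                      # restore the previous layout

# The plotted coordinates are also available without drawing anything.
qq <- qqnorm(normal_sample, plot.it = FALSE)
```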