STATISTICAL COMPUTING ACTIVITY: (CHAPTER 2 CONTD.)

 

 

BUBBLE PLOTS

 

A scatterplot can display the relationship between only two variables. A third variable can be shown by drawing a circle at each point with a radius proportional to that variable's value. Now, let us produce a bubble plot in R.

 

Step 1. Download husb.txt from the web site.

Step 2. Run R.

Step 3. Go to the File menu and use "Change directory" to select the location where you saved the file.

 

>husb=read.table("husb.txt", header=TRUE)

>attach(husb)

>husb

         

>plot(wifeage, husbage)

(Now, let us add circles to the scatterplot)

>symbols(wifeage, husbage, circles=husagefi, inches=0.2, add=TRUE)
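The same idea can be tried without the data file. The following is a minimal sketch on simulated values: the variables wife, husband, and third are made up for illustration and are not the columns of husb.txt. It also shows one possible variant, assuming you would rather have the circle area (not the radius) proportional to the third variable, which is done by passing square roots to symbols().

```r
# Minimal bubble plot sketch on SIMULATED data (not the husb.txt data).
set.seed(42)
n       <- 25
wife    <- runif(n, 20, 60)                    # made-up x variable
husband <- wife + rnorm(n, mean = 2, sd = 3)   # made-up y variable
third   <- runif(n, 18, 35)                    # made-up third variable

# Radii whose circle AREA (pi * r^2) equals 'third'.
area_radius <- sqrt(third / pi)

# Draw only in an interactive session (for example, the R console).
if (interactive()) {
  plot(wife, husband, xlab = "Wife Age", ylab = "Husband Age")
  # Radius proportional to 'third'; the largest circle is scaled to 0.2 inches.
  symbols(wife, husband, circles = third, inches = 0.2, add = TRUE)
  # Variant: area proportional to 'third', drawn in red for comparison.
  symbols(wife, husband, circles = area_radius, inches = 0.2,
          add = TRUE, fg = "red")
}
```

With radii proportional to the values, large values look more dominant than they are, because the eye tends to compare areas. The square-root variant is one common way to soften that effect.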

 

Interpret the graph. Can you think of any other way of displaying a third variable in a scatterplot?

 

For chiplots, bivariate boxplots, and bivariate density estimate plots, we need to load additional functions into R. To load them, please do the following:

 

Step 1. On the course website click on the LEARNING NOTES.

Step 2. Click on the R Functions for chiplot, bivariate boxplot, and bivariate density plot.

Step 3. Highlight everything and copy.

Step 4. Go back to R and paste.

 

CHIPLOTS

 

Chiplots are designed to help researchers judge whether or not two variables are independent by augmenting the scatterplot with an auxiliary display.

 

>chiplot(husbage, wifeage, vlabs=c("Husband Age", "Wife Age"))

 

INTERPRETATION: In the case of independence, the points will be concentrated in the central region, in the horizontal band indicated on the plot.
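To see where the points and the horizontal band come from, here is a rough sketch of the quantities usually plotted in a chi-plot. It is not the course's chiplot() function: chi_quantities() is a hypothetical helper written for this illustration, the data are simulated, and the band constant 1.78 (often quoted for an approximate 95% band) is an assumption rather than something taken from the course code.

```r
# Sketch of chi-plot quantities; NOT the course's chiplot() function.
# chi_quantities() is a hypothetical helper name used only here.
chi_quantities <- function(x, y) {
  n <- length(x)
  Fx <- Gy <- Hxy <- numeric(n)
  for (i in seq_len(n)) {
    xo <- x[-i]                               # all points except point i
    yo <- y[-i]
    Fx[i]  <- mean(xo <= x[i])                # share of others at or below x[i]
    Gy[i]  <- mean(yo <= y[i])                # share of others at or below y[i]
    Hxy[i] <- mean(xo <= x[i] & yo <= y[i])   # share at or below in both
  }
  # chi: departure of the joint share from the product of the marginal shares
  chi <- (Hxy - Fx * Gy) / sqrt(Fx * (1 - Fx) * Gy * (1 - Gy))
  # lambda: signed distance of point i from the centre of the data
  S <- sign((Fx - 0.5) * (Gy - 0.5))
  lambda <- 4 * S * pmax((Fx - 0.5)^2, (Gy - 0.5)^2)
  # drop undefined values and the most extreme points
  keep <- is.finite(chi) & abs(lambda) < 4 * (1 / (n - 1) - 0.5)^2
  data.frame(lambda = lambda[keep], chi = chi[keep])
}

set.seed(1)
x     <- rnorm(100)
indep <- chi_quantities(x, rnorm(100))               # independent pair
dep   <- chi_quantities(x, x + rnorm(100, sd = 0.3)) # strongly positive pair
band  <- 1.78 / sqrt(100)   # assumed half-width of the approximate 95% band

# Draw only in an interactive session.
if (interactive()) {
  plot(indep$lambda, indep$chi, xlim = c(-1, 1), ylim = c(-1, 1),
       xlab = "lambda", ylab = "chi")
  points(dep$lambda, dep$chi, pch = 3)
  abline(h = c(-band, band), lty = 2)
}
```

Under these assumptions, the independent pair should scatter around zero, mostly inside the dashed band, while the positively related pair should sit well above it.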

 

Suppose that the variables are negatively related. Where do you think the points will be concentrated?

 

Can you say anything about how strong the relationship is by looking at the chiplot? How?

 

Now produce a chiplot for husbhei and wifehei, and interpret it.

 

BIVARIATE BOXPLOTS

 

A bivariate boxplot is a two-dimensional analogue of the box-and-whisker plot. Just like univariate boxplots, bivariate boxplots use robust measures of location, scale, and correlation. They are used to understand the distributional properties of the data and to identify possible outliers. Each plot contains two concentric ellipses: the inner one (the "hinge") includes 50% of the data, and the outer one (the "fence") delineates potential outliers. The graph also shows resistant regression lines of both y on x and x on y. A small acute angle between the two regression lines indicates a correlation that is large in absolute value.

 

>bvbox(cbind(husbage,wifeage), xlab="Husband Age", ylab="Wife Age")

 

If you would like to use nonrobust estimators (means, variances, and correlation coefficient), use the following:

 

>bvbox(cbind(husbage,wifeage), xlab="Husband Age", ylab="Wife Age", method="O")
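The link between the angle of the two regression lines and the correlation can be checked numerically. The sketch below uses ordinary least squares (that is, the nonrobust case) on simulated data; regression_angle() is a hypothetical helper for this illustration and is not part of the course's bvbox() code.

```r
# Sketch: acute angle between the y-on-x and x-on-y least-squares lines.
# regression_angle() is a hypothetical helper; the data are simulated.
regression_angle <- function(x, y) {
  b_yx <- unname(coef(lm(y ~ x))[2])   # slope dy/dx of the y-on-x line
  b_xy <- unname(coef(lm(x ~ y))[2])   # slope dx/dy of the x-on-y line
  # Drawn in the (x, y) plane, the x-on-y line has slope 1 / b_xy.
  angle <- abs(atan(1 / b_xy) - atan(b_yx)) * 180 / pi
  c(r = cor(x, y), angle_deg = angle)
}

set.seed(3)
x      <- rnorm(200, mean = 40, sd = 10)
strong <- regression_angle(x, x + rnorm(200, sd = 3))    # little noise
weak   <- regression_angle(x, x + rnorm(200, sd = 30))   # a lot of noise
print(rbind(strong, weak))
```

The two slopes satisfy b_yx * b_xy = r^2, so the lines coincide only when the correlation is exactly 1 or -1, and they move apart as the correlation weakens.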

 

Now produce a bivariate boxplot for husbhei and wifehei, and interpret it.

 

BIVARIATE DENSITY ESTIMATES

 

From a scatterplot one can see "clusters" (regions with a low or high density of observations) and spot outliers. Bivariate density estimates help researchers with both of these tasks. A plot of a bivariate density estimate can be viewed as a smoothed two-dimensional histogram.

 

To get the bivariate density estimates using a normal kernel:

 

>den1=bivden(husbage, wifeage)
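For intuition about what a normal-kernel estimate computes, here is a small sketch of a product normal-kernel density estimator on simulated data. It is not the course's bivden() function: kernel_density_2d() is a hypothetical helper, the rule-of-thumb bandwidths from bw.nrd() are an assumed choice, and the result names seqx, seqy, and den are chosen only to mirror how den1 is used in this handout.

```r
# Sketch of a product normal-kernel density estimate; NOT the course's bivden().
# kernel_density_2d() is a hypothetical helper; bandwidth choice is an assumption.
kernel_density_2d <- function(x, y, ngrid = 30) {
  n  <- length(x)
  hx <- bw.nrd(x)                       # rule-of-thumb bandwidth for x
  hy <- bw.nrd(y)                       # rule-of-thumb bandwidth for y
  # Evaluation grid, extended by three bandwidths beyond the data range.
  seqx <- seq(min(x) - 3 * hx, max(x) + 3 * hx, length.out = ngrid)
  seqy <- seq(min(y) - 3 * hy, max(y) + 3 * hy, length.out = ngrid)
  # Kx[i, k] = normal kernel weight of observation k at grid point seqx[i]
  Kx <- outer(seqx, x, function(g, xi) dnorm((g - xi) / hx))
  Ky <- outer(seqy, y, function(g, yi) dnorm((g - yi) / hy))
  # den[i, j] = average over observations of the product of the two kernels
  den <- Kx %*% t(Ky) / (n * hx * hy)
  list(seqx = seqx, seqy = seqy, den = den)
}

set.seed(2)
x <- rnorm(150, mean = 40, sd = 10)     # simulated, not the husb.txt data
y <- x + rnorm(150, mean = 2, sd = 4)
d <- kernel_density_2d(x, y)

# A density should integrate to about 1 over the grid (Riemann sum check).
cell  <- diff(d$seqx)[1] * diff(d$seqy)[1]
total <- sum(d$den) * cell
print(total)

# Draw only in an interactive session.
if (interactive()) {
  persp(d$seqx, d$seqy, d$den, xlab = "x", ylab = "y", zlab = "Density",
        theta = 30, phi = 30)
}
```

Each observation contributes a small normal "bump" centred at its own location, and the estimate is the average of those bumps. Wider bandwidths give a smoother surface, narrower ones a bumpier surface.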

 

To construct a perspective plot of the density values:

 

>persp(den1$seqx, den1$seqy, den1$den, xlab="Husband Age", ylab="Wife Age", zlab="Density")

 

To change the viewing direction, you can specify two angles: theta (azimuthal direction) and phi (colatitude).

 

>persp(den1$seqx, den1$seqy, den1$den, xlab="Husband Age", ylab="Wife Age", zlab="Density", theta=30, phi=30)

 

Now, try different viewing directions.

 

To add a contour plot of the density values to a scatterplot, we need to produce the scatterplot first:

 

>plot(husbage, wifeage)

 

Then add the contour plot

 

>contour(den1$seqx, den1$seqy, den1$den, nlevels=20, add=T)

 

Now change the number of cuts, nlevels, to 10.

 

Interpret the graph.

 

Produce a density estimate plot for husbhei and wifehei, and interpret it.