Homework 8 ======================================================== ```{r , echo=FALSE} #This code turns off annoying warnings and messages # from appearing in your output file opts_chunk$set(warning = FALSE, message = FALSE) ``` ```{r , echo=FALSE} # this block loads R packages that may be needed for the analysis. library(knitr) library(MASS) library(nlme) library(car) library(alr3) ``` Your Name ------------------------------ ### Problem 7.5 The Union Bank of Switzerland publishes a report entitled Prices and Earnings Around the Globe on their web site, www.ubs.com. The data in BigMac2003 are taken from their 2003 version for 70 world cities. These data are found in the alr3 library in R. The R command code chunks are approximately labeled to the corresponding problem numbers to help you identify which code blocks correspond to the questions. ```{r twinprep, echo=FALSE,warning=FALSE,message=FALSE} library(alr3) library(car) data(BigMac2003) ``` 7.5.1 a) Draw the scatterplot with BigMac on the vertical axis and FoodIndex on the horizontal axis. Provide a qualitative description of this graph. (1 pt) ```{r , fig.height=6,fig.width=7} with(BigMac2003, plot(BigMac ~ FoodIndex)) ``` * Give description: b) Use an inverse fitted value plot and the Box-Cox method to find a transformation of BigMac so that the resulting scatterplot has a linear mean function. Give a single value of the lambda you suggest based on this evidence. (2 pts) ```{r} modFI <- lm(BigMac ~ FoodIndex, data=BigMac2003) ``` ```{r , fig.height=6,fig.width=7} inverseResponsePlot(modFI, lambda=c(-1,0,0.5,1)) ``` ```{r } boxCox(modFI) ``` ```{r , fig.height=6,fig.width=7} boxCox(modFI) ``` Provide the value of lambda you suggest based on the graphical evidence provided: In the code block below, please specify the value of lambda you suggest. The code will make a summary graph of your suggested transform of BigMac graphed against FoodIndex. ```{r , fig.width=6,fig.width=7} lambda <- c(1) # you supply the correct lambda put # the value inside the vector c(1) instead of the value 1. with(BigMac2003, plot(BigMac^(lambda) ~ FoodIndex)) ``` c) Two of the cities, with very large values for BigMac, are very influential for selecting the transformation. You should do part(b) of this exercise again with those two cities removed. How does the suggested transform change? Do these cities greatly affect the suggested transform? (2 pts) ```{r } Case.names <-row.names(BigMac2003) jon <- subset(BigMac2003, Case.names !="Nairobi" & Case.names != "Karachi") m1 <- lm(BigMac~FoodIndex, data=jon) ``` ```{r , fig.height=6,fig.width=7} inverseResponsePlot(m1, lambda=c(-1,0,0.5,1)) ``` ```{r , fig.height=6,fig.width=7} boxCox(m1) ``` Give answers here: * How does the suggested transform change ? * Do these cities greatly affect the suggested transform? 7.5.2 a) Draw the scatterplot matrix of the variables (BigMac, Rice, Bread), and use the multivariate Box-Cox procedure to decide on normalizing transformations. Big Mac is response. (2pts) ```{r , fig.width=7,fig.height=6} scatterplotMatrix(~ BigMac + Rice + Bread, smooth=FALSE, reg.line=FALSE, data=BigMac2003) ``` ```{r } ans <- powerTransform(cbind(Rice, Bread) ~ 1,data=BigMac2003) summary(ans) ``` ```{r , fig.height=6,fig.width=7} scatterplotMatrix(~ Rice + Bread, transform=TRUE, family="bcPower", smooth=FALSE, reg.line=FALSE, data=BigMac2003) ``` Describe normalizing transformations: b) Test the null hypothesis that lambda = (1,1), against a general alternative, for transforming Rice and Bread. (4 pts) * Hypotheses: NH: $\lambda=(1,1)$ versus AH: $\lambda \neq (1,1)$. * Test Statistic Value: * Pvalue * Conclusion c) Does deleting Karachi and Nairobi change your conclusions about the transforms needed? Why or why not ? (1 pt) ```{r } # Note data frame jon is the subset without Karachi, Nairobi ans <- powerTransform(cbind(Rice, Bread) ~ 1,data=jon) summary(ans) ``` Answer for part c here: 7.5.3 a) Set up the regression using the four terms, log(Bread), log(Bus), log(TeachGI), and Apt^{0.33}, and with response BigMac. (1 pt) ```{r } m2 <- lm(BigMac ~ log2(Bread)+log2(Bus)+log2(TeachGI)+ I(Apt^(.33)),data=BigMac2003) summary(m2) ``` b) Draw the inverse fitted value plot of y-hat vs BigMac. (1 pts) ```{r , fig.width=7,fig.height=6} inverseResponsePlot(m2, lambda=c(-1,0,0.5,1)) ``` c) Estimate the best power transformation. Using the inverse fitted value plot from partb. (1 pt) Give answer for part c here: d) Check on the adequacy of your estimate by refitting the regression model with the transformed response and drawing the inverse fitted value plot again. If transformation was successful, this second inverse fitted value plot should have a linear mean function. (2 pts) ```{r } # Use this code block if you think the right transform is log: m3 <- lm(log2(BigMac) ~ log2(Bread)+log2(Bus)+log2(TeachGI)+ I(Apt^(.33)),data=BigMac2003) ``` ```{r , fig.width=7,fig.height=6} # This is the inverse response plot to use if you are # examining the reasonability of the log transform. inverseResponsePlot(m3,lambda=c(-1,0,0.5,1)) ``` Input your best guess for lambda in the code below, (don't put in zero for a log though), and plot the resulting inverse response plot after your best estimate. If your estimated lambda was a good choice, the inverse response plot in code block m4irp will show no evidence of needing a transform, it will suggest something around lambda of 1.0, which of course means no transform is suggested. ```{r } # put in the value of the best lambda below, if it is not log. lambda<- c(1) #put in best lambda value from prev evidence m4 <- lm(I(BigMac^(lambda)) ~ log2(Bread)+log2(Bus)+log2(TeachGI)+ I(Apt^(.33)),data=BigMac2003) ``` ```{r m4irp, fig.width=7,fig.height=6} inverseResponsePlot(m4,lambda=c(-1,0,0.5,1)) ``` Comment on the final inverse response plot here. Did the transform lambda you provided achieve linearity in this plot? ### Problem 7.6 Use the wool data with Cycles as the response. This is a three-factor experiment with each factor at three levels, for a total of 27 runs. Samples of worsted yarn were with different levels of the three factors were given a cyclic load until the sample failed. The goal is to understand how cycles to failure depends on the factors. This data frame contains the following columns: Len length of specimen (250, 300, 350 mm) Amp amplitude of loading cycle (8, 9, 10 min) Load load (40, 45, 50g) Cycles number of cycles until failure ```{r} data(wool) summary(wool) ``` 7.6.1 a) Draw the scatterplot matrix for these data and summarize the information in the plot. (1 pt) ```{r , fig.height=6,fig.width=7} scatterplotMatrix(~ Cycles + Len + Amp + Load, smooth=FALSE, reg.line=FALSE, data=wool) ``` 7.6.2 a) In R convert all the predictors as factors with three levels, and without transforming Cycles, fit the second order mean function with terms for all main effects and all two-factor interactions. (1 pt) ```{r } wool$Amp <- factor(wool$Amp) wool$Len <- factor(wool$Len) wool$Load <- factor(wool$Load) m1 <- lm(Cycles ~ Len*Amp + Amp*Load + Len*Load, data=wool) summary(m1) ``` b) Summarize the results from the summary command. (1 pt) 7.6.3 a) Fit the first-order mean function consisting of only the main effects. (1 pt) ```{r } memod <- lm(Cycles ~ Len + Amp + Load, data=wool) summary(memod) ``` b) Use the inverse fitted value plot to select a transformation for Cycles, based on the first order mean function in (a). Give the value of lambda that should be used based on this evidence. (2 pts) ```{r , fig.height=6,fig.width=7} inverseResponsePlot(memod) ``` * Answer for 7.6.3 b): What transform is suggested? c) Use the Box-Cox method to select a transformation for Cycles, based on the first order mean function in (a). Give the value that should be used based on this evidence. (2 pts) ```{r , fig.height=6,fig.width=7} boxCox(memod,grid=FALSE) ``` * Answer for 7.6.3 c): What transform is suggested? 7.6.4 a) In the transformed scale, refit the second-order model to observe which of the interactions are required in this scale. Does the transformation lead to a simpler model. (1 pt) ```{r } m1new <- lm(log2(Cycles) ~ Len*Amp + Amp*Load + Len*Load, data=wool) summary(m1new) ``` * Answer for 7.6.4: Using the new transformed scale, discuss the complexity of this model with that produced in 7.6.2 part a.