Type of Data: Quantitative (Numerical)
PACKAGE: BSDA

GENERAL FORM OF R COMMAND:

    z.test(dataframename$variablename, alternative = " ", mu = munull, sigma.x = knownstandarddeviation, conf.level = 0.95)

alternative can be "greater", "less", or "two.sided".

EXAMPLE:

Dataset: We are asked to analyze the average height of individuals of Italian nationality. The variance of the Italian population is known to be 5. Here is the data:

175, 168, 168, 190, 156, 181, 182, 175, 174, 179

We will test the hypothesis that the true mean height is different from 170 and construct a confidence interval. Since the data set is small, we will enter the data directly into R without using a data frame.

    a = c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
    library(BSDA)
    z.test(a, alternative = "two.sided", mu = 170, sigma.x = 2.24, conf.level = 0.95)

Note that we have been given the variance, so we take the square root of 5 (about 2.24) to get the standard deviation.

EXERCISE: Construct a 99% confidence interval for this problem.

GENERAL FORM OF R COMMAND:

    t.test(dataframename$variablename, alternative = " ", mu = munull, conf.level = 0.95)

alternative can be "greater", "less", or "two.sided".

EXAMPLE:

Dataset: We are asked to analyze the average height of individuals of Italian nationality. Here is the data:

175, 168, 168, 190, 156, 181, 182, 175, 174, 179

We will test the hypothesis that the true mean height is different from 170 and construct a confidence interval. Since the data set is small, we will enter the data directly into R without using a data frame.

    a = c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
    t.test(a, alternative = "two.sided", mu = 170, conf.level = 0.95)

EXERCISE: Test the hypothesis that the mean height is more than 170.
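As a cross-check on the z.test example above, the same quantities can be computed by hand in base R. This is only a sketch of the standard one-sample z formulas, not the BSDA implementation. The names mu0, sigma, z_stat, p_value, and ci are chosen here for illustration.

```r
# Heights from the example above
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)

mu0   <- 170       # hypothesized mean
sigma <- sqrt(5)   # known population standard deviation (the variance is 5)
n     <- length(a)

# z statistic and two-sided p-value
z_stat  <- (mean(a) - mu0) / (sigma / sqrt(n))
p_value <- 2 * pnorm(-abs(z_stat))

# 95% confidence interval for the mean
z_crit <- qnorm(0.975)
ci <- mean(a) + c(-1, 1) * z_crit * sigma / sqrt(n)

z_stat
p_value
ci
```

Because this uses sqrt(5) rather than the rounded 2.24, the results may differ slightly in the last digits from the z.test output.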
Type of Data: Quantitative (Numerical)
GENERAL FORM OF R COMMAND:

    t.test(dataframename$variablename1, dataframename$variablename2, paired = TRUE, alternative = " ", mu = munull, conf.level = 0.95)

alternative can be "greater", "less", or "two.sided".

EXAMPLE:

Dataset: In this example we will compare the effect of a new training program on 100 meter runners. The following data gives the time in seconds before and after training for 10 randomly selected athletes.

Before: 12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3
After: 12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1

We will test the hypothesis that the true mean performances before and after training are different and construct a confidence interval for the difference between the mean performances. Since the data set is small, we will enter the data directly into R without using a data frame.

    Before = c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)
    After = c(12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1)
    t.test(Before, After, paired = TRUE, alternative = "two.sided", mu = 0, conf.level = 0.95)

The var.equal argument is not needed here because it has no effect on a paired test.

EXERCISE: Construct a 90% confidence interval for the difference between means.
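A paired t-test is equivalent to a one-sample t-test on the within-pair differences. The sketch below checks this on the runner data from the example. The names d, paired_res, and diff_res are introduced here for illustration only.

```r
Before <- c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)
After  <- c(12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1)

# Within-athlete differences (Before minus After)
d <- Before - After

# Paired test on the two vectors
paired_res <- t.test(Before, After, paired = TRUE,
                     alternative = "two.sided", mu = 0, conf.level = 0.95)

# One-sample test on the differences
diff_res <- t.test(d, alternative = "two.sided", mu = 0, conf.level = 0.95)

# Both calls should report the same t statistic, p-value, and interval
all.equal(as.numeric(paired_res$statistic), as.numeric(diff_res$statistic))
all.equal(paired_res$p.value, diff_res$p.value)
all.equal(as.numeric(paired_res$conf.int), as.numeric(diff_res$conf.int))
```

Looking at d directly is also a quick way to see which athletes improved and which did not.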
Type of Data: Quantitative (Numerical)
PACKAGE: BSDA

GENERAL FORM OF R COMMAND:

    z.test(dataframename$variablename1, dataframename$variablename2, alternative = " ", mu = munull, sigma.x = knownstandarddeviation1, sigma.y = knownstandarddeviation2, conf.level = 0.95)

alternative can be "greater", "less", or "two.sided".

EXAMPLE:

Dataset: We are asked to compare the mean heights of individuals of Italian nationality and German nationality. The population variances for the Italian and German nationalities are known to be 5 and 8.5, respectively. Here is the data:

Italian: 175, 168, 168, 190, 156, 181, 182, 175, 174, 179
German: 185, 169, 173, 173, 188, 186, 175, 174, 179, 180

We will test the hypothesis that the true mean heights for these groups are different and construct a confidence interval for the difference between mean heights. Since the data set is small, we will enter the data directly into R without using a data frame. The standard deviations are the square roots of the variances: about 2.24 and 2.92.

    Italian = c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
    German = c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)
    z.test(Italian, German, alternative = "two.sided", mu = 0, sigma.x = 2.24, sigma.y = 2.92, conf.level = 0.95)

EXERCISE: Construct a 90% confidence interval for the difference between means.

GENERAL FORM OF R COMMAND:

    t.test(dataframename$variablename1, dataframename$variablename2, alternative = " ", mu = munull, var.equal = TRUE/FALSE, conf.level = 0.95)

alternative can be "greater", "less", or "two.sided". If you assume that the population variances are equal, use var.equal = TRUE; otherwise use var.equal = FALSE.

EXAMPLE:

Dataset: We are asked to compare the mean heights of individuals of Italian nationality and German nationality. Here is the data:

Italian: 175, 168, 168, 190, 156, 181, 182, 175, 174, 179
German: 185, 169, 173, 173, 188, 186, 175, 174, 179, 180

We will test the hypothesis that the true mean heights for these groups are different and construct a confidence interval for the difference between mean heights. Since the data set is small, we will enter the data directly into R without using a data frame.

    Italian = c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
    German = c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)
    t.test(Italian, German, alternative = "two.sided", mu = 0, var.equal = FALSE, conf.level = 0.95)

EXERCISE: Construct a 90% confidence interval for the difference between means.
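To see what the var.equal choice changes in practice, the sketch below runs both versions of the two-sample t-test on the height data and compares their degrees of freedom. The names welch and pooled are introduced here for illustration.

```r
Italian <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
German  <- c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)

# Sample variances, a rough guide to whether equal variances is plausible
var(Italian)
var(German)

# Welch test (no equal-variance assumption) and pooled test
welch  <- t.test(Italian, German, alternative = "two.sided", mu = 0,
                 var.equal = FALSE, conf.level = 0.95)
pooled <- t.test(Italian, German, alternative = "two.sided", mu = 0,
                 var.equal = TRUE, conf.level = 0.95)

# Degrees of freedom: the pooled test uses n1 + n2 - 2,
# the Welch test uses an adjusted (usually fractional) value
c(welch = unname(welch$parameter), pooled = unname(pooled$parameter))

# Compare the two confidence intervals
welch$conf.int
pooled$conf.int
```

Because the two samples have the same size, both versions give the same t statistic here; only the degrees of freedom, and therefore the p-value and interval, differ.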
Type of Data: Quantitative (Numerical)
DATA FORMAT: Two columns; response variable (numerical) and treatments (factors) as follows
GENERAL FORM OF R COMMAND:

    aov(response ~ treatment, data = dataframename)

EXAMPLE:

Dataset: donut.csv

Twenty-four batches of donuts were prepared and six were randomly assigned to each of the four fats. The amount of fat absorbed by each batch (in grams) was measured. Upload and load the data into RStudio. Here is the data

Note that the data is not in the format required by aov (one column per fat instead of one response column and one treatment column), so we will stack the data.

    donut <- read.csv(file.choose())
    sdonut = stack(donut)

This will create a data frame with the variable names values and ind.

    sdonut
    summary(donut)
    boxplot(values ~ ind, data = sdonut)
    aov(values ~ ind, data = sdonut)

To see the ANOVA table:

    donut.aov = aov(values ~ ind, data = sdonut)
    summary(donut.aov)
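The stacking step can be tried without donut.csv. The sketch below uses PlantGrowth, a data set that ships with R, as a stand-in (an assumption made for this illustration; it is not the donut data). It is first unstacked into one column per group, which mimics the layout of the donut file, and then stacked back. The names wide, long, and fit are introduced here.

```r
# PlantGrowth: 30 plant weights under three conditions (ctrl, trt1, trt2)
wide <- unstack(PlantGrowth)   # one column per group, like the donut file
head(wide)

# stack() returns two columns: values (response) and ind (group factor)
long <- stack(wide)
str(long)

# One-way ANOVA on the stacked data
fit <- aov(values ~ ind, data = long)
summary(fit)
```

The degrees of freedom in the ANOVA table are the number of groups minus one for ind and the number of observations minus the number of groups for the residuals.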
Type of Data: Quantitative (Numerical)
GENERAL FORM OF R COMMAND:

    wilcox.test(dataframename$variablename1, dataframename$variablename2, alternative = " ", mu = munull, conf.int = TRUE, conf.level = 0.95)

alternative can be "greater", "less", or "two.sided". Use conf.int = TRUE to obtain the confidence interval; without it, conf.level has no effect.

EXAMPLE:

Dataset: We are asked to compare the median heights of individuals of Italian nationality and German nationality. Here is the data:

Italian: 175, 168, 168, 190, 156, 181, 182, 175, 174, 179
German: 185, 169, 173, 173, 188, 186, 175, 174, 179, 180

We will test the hypothesis that the true median heights for these groups are different and construct a confidence interval for the difference between median heights. Since the data set is small, we will enter the data directly into R without using a data frame.

    Italian = c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
    German = c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)
    wilcox.test(Italian, German, alternative = "two.sided", mu = 0, conf.int = TRUE, conf.level = 0.95)

EXERCISE: Construct a 90% confidence interval for the difference between medians.

GENERAL FORM OF R COMMAND:

    wilcox.test(dataframename$variablename1, dataframename$variablename2, paired = TRUE, alternative = " ", mu = munull, conf.int = TRUE, conf.level = 0.95)

alternative can be "greater", "less", or "two.sided".

EXAMPLE:

Dataset: In this example we will compare the effect of a new training program on 100 meter runners. The following data gives the time in seconds before and after training for 10 randomly selected athletes.

Before: 12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3
After: 12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1

We will test the hypothesis that the true median performances before and after training are different and construct a confidence interval for the difference between median performances. Since the data set is small, we will enter the data directly into R without using a data frame.

    Before = c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)
    After = c(12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1)
    wilcox.test(Before, After, paired = TRUE, alternative = "two.sided", mu = 0, conf.int = TRUE, conf.level = 0.95)

wilcox.test has no var.equal argument, so it is omitted here.

EXERCISE: Construct a 90% confidence interval for the difference between medians.
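The role of conf.int can be checked directly. The sketch below runs the two-sample test on the height data with and without conf.int = TRUE. Setting exact = FALSE is an assumption made here to request the normal approximation, since the data contain ties and exact p-values are not available with ties. The names res_no_ci and res_ci are introduced for illustration.

```r
Italian <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
German  <- c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)

# Without conf.int = TRUE the result carries no interval,
# even though conf.level is supplied
res_no_ci <- wilcox.test(Italian, German, alternative = "two.sided",
                         mu = 0, exact = FALSE, conf.level = 0.95)
is.null(res_no_ci$conf.int)

# With conf.int = TRUE the interval and a point estimate are returned
res_ci <- wilcox.test(Italian, German, alternative = "two.sided",
                      mu = 0, exact = FALSE,
                      conf.int = TRUE, conf.level = 0.95)
res_ci$conf.int
res_ci$estimate   # labelled "difference in location" in the output
```

The p-value is the same in both calls; conf.int only adds the interval and the estimate to the result.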
Type of Data: Binary (Two possible outcomes)
GENERAL FORM OF R COMMAND:

    prop.test(x = numberofsuccesses, n = numberofobservations, p = hypothesizedvalue, alternative = " ", correct = TRUE/FALSE, conf.level = 0.95)

alternative can be "greater", "less", or "two.sided". To omit the continuity correction, use correct = FALSE.

EXAMPLE:

Dataset: Suppose that 60% of citizens in Minnesota voted in the last election. In a telephone survey, 85 out of 148 people said that they voted in the current election. Is there evidence that the proportion of voters in the population is less than in the last election? We would also like to construct a 99% confidence interval for the true proportion of voters in the current election.

    prop.test(85, 148, p = 0.6, alternative = "less", correct = FALSE, conf.level = 0.99)

Note that this gives us a one-sided confidence interval. For the two-sided confidence interval use

    prop.test(85, 148, p = 0.6, alternative = "two.sided", correct = FALSE, conf.level = 0.99)
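Without the continuity correction, the X-squared statistic reported by prop.test for one proportion is the square of the usual one-sample z statistic. The sketch below checks this on the voter example. The names p_hat, z, p_less, and res are introduced here for illustration.

```r
x  <- 85     # people who said they voted
n  <- 148    # people surveyed
p0 <- 0.6    # proportion in the last election (null value)

# One-sample z statistic computed by hand
p_hat  <- x / n
z      <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_less <- pnorm(z)   # one-sided p-value for alternative = "less"

res <- prop.test(x, n, p = p0, alternative = "less",
                 correct = FALSE, conf.level = 0.99)

# X-squared equals z^2, and the p-values agree
all.equal(unname(res$statistic), z^2)
all.equal(res$p.value, p_less)
```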
Type of Data: Binary (Two possible outcomes)
GENERAL FORM OF R COMMAND:

    prop.test(c(success1, success2), c(total1, total2), alternative = " ", correct = TRUE/FALSE, conf.level = 0.95)

alternative can be "greater", "less", or "two.sided". To omit the continuity correction, use correct = FALSE.

EXAMPLE:

Dataset: Based on research published by Robert Rutledge, MD, and his colleagues in the Annals of Surgery (1993), in 1916 car accident cases the patients did not use a seat belt and 135 of them died. On the other hand, in 1490 cases the patients used seat belts and 47 of them died. Test the hypothesis that the proportion of cases ending in death is the same for the no seat belt and seat belt groups.

    prop.test(c(135, 47), c(1916, 1490), alternative = "two.sided", correct = FALSE, conf.level = 0.99)
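The two-proportion test without continuity correction gives the same statistic as a chi-square test on the corresponding 2 x 2 table. The sketch below builds that table from the seat belt counts and compares the two results. The names deaths, cases, tab, res, and chi, and the row and column labels, are introduced here for illustration.

```r
deaths <- c(135, 47)      # no seat belt, seat belt
cases  <- c(1916, 1490)

res <- prop.test(deaths, cases, alternative = "two.sided",
                 correct = FALSE, conf.level = 0.99)
res$estimate   # sample proportion of deaths in each group

# Same data as a 2 x 2 table of died / survived counts
tab <- rbind(NoSeatBelt = c(Died = 135, Survived = 1916 - 135),
             SeatBelt   = c(Died = 47,  Survived = 1490 - 47))
chi <- chisq.test(tab, correct = FALSE)

# Both tests report the same X-squared statistic and p-value
all.equal(unname(res$statistic), unname(chi$statistic))
all.equal(res$p.value, chi$p.value)
```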
Type of Data: Qualitative (Categorical)
Data Format: Frequency table
DATA FILE: Data file should look like the following.
GENERAL FORM OF R COMMAND:

    chisq.test(data.matrix, correct = FALSE)

EXAMPLE:

Dataset: Handedness.csv

The table gives the number of left-handed offspring by parental handedness. In the parental handedness labels, the first entry is for the father and the second is for the mother. Click on the file to download it and move it into RStudio.

STEP 1. Create a data file outside of R by using Excel, just like the one given above. Note that you give a variable name to the row categorical variable. In this example it is "Father.Mother". Download and load the file into RStudio. In this case this has already been done.

    Handedness <- read.csv(file.choose(), row.names = "Father.Mother")

STEP 2. Carry out the chi-square test.

    testresult <- chisq.test(Handedness, correct = FALSE)
    testresult

STEP 3. If the result is significant, get the observed and expected counts, residuals, and standardized residuals.

    testresult$observed
    testresult$expected
    testresult$residuals
    testresult$stdres

STEP 4. Produce a graphical display.

    barplot(t(prop.table(Handedness)), beside = TRUE, xlab = "Parental Handedness", ylab = "Proportion", legend = TRUE)
    mosaicplot(Handedness, shade = TRUE, main = "Genetics and Handedness")

NOTE: Proportions can be obtained with prop.table; multiply by 100 for percentages.

    Cell percentages: 100 * prop.table(as.matrix(Handedness))
    Row percentages: 100 * prop.table(as.matrix(Handedness), 1)
    Column percentages: 100 * prop.table(as.matrix(Handedness), 2)

EXERCISE: The data were obtained by asking a large number of people in the UK which of 13 characteristics they would associate with the nationals of the UK's partners in the European Community. Load the following data and analyze it. In this case the row variable's name is "Country".
GENERAL FORM OF R COMMAND:

    chisq.test(data.matrix, correct = FALSE)

EXAMPLE:

Subjects randomly selected by the Pew Research Center were asked about the use of marijuana for medical purposes, and their genders were noted. Here is the frequency table:

              In favor   Oppose   Don't Know
    Men          538       167        29
    Women        557       186        31

STEP 1. Create a data matrix. We will create the matrix from the rows and name the columns.

    Men = c(538, 167, 29)
    Women = c(557, 186, 31)
    data.table = rbind(Men, Women)
    colnames(data.table) = c("In favor", "Oppose", "Don't Know")

STEP 2. Carry out the chi-square test.

    testresult <- chisq.test(data.table, correct = FALSE)
    testresult

STEP 3. If the result is significant, get the observed and expected counts, residuals, and standardized residuals.

    testresult$observed
    testresult$expected
    testresult$residuals
    testresult$stdres

STEP 4. Produce a graphical display.

    barplot(t(prop.table(data.table)), beside = TRUE, xlab = "Gender", ylab = "Proportion", legend = TRUE)
    mosaicplot(data.table, shade = TRUE, main = "The Use of Marijuana")

EXERCISE: The following tables are from a study on Eye-Dominance, Writing Hand, and Throwing Hand relationships. To see the original paper click here. If you would like to determine your dominant eye visit this site. Analyze the tables by using the technique that you have learned in this section.
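The expected counts and the test statistic reported by chisq.test can be reproduced by hand from the row and column totals. The sketch below does this for the marijuana survey table from the example. The names expected_by_hand and chi_by_hand are introduced here for illustration.

```r
Men   <- c(538, 167, 29)
Women <- c(557, 186, 31)
data.table <- rbind(Men, Women)
colnames(data.table) <- c("In favor", "Oppose", "Don't Know")

testresult <- chisq.test(data.table, correct = FALSE)

# Expected count for each cell = row total * column total / grand total
expected_by_hand <- outer(rowSums(data.table), colSums(data.table)) /
  sum(data.table)

# Pearson chi-square statistic = sum of (observed - expected)^2 / expected
chi_by_hand <- sum((data.table - expected_by_hand)^2 / expected_by_hand)

all.equal(expected_by_hand, testresult$expected, check.attributes = FALSE)
all.equal(chi_by_hand, unname(testresult$statistic))

# Degrees of freedom: (rows - 1) * (columns - 1)
testresult$parameter

# Row proportions: share of each answer within each gender
prop.table(data.table, 1)
```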
Type of Data: Two Quantitative (Numerical)
GENERAL FORM OF R COMMAND:

    lm(ResponseVariable ~ ExplanatoryVariable, data = dataframename)

EXAMPLE:

Dataset: faithful

There are two variables in the data set. The first one, called eruptions, is the duration of the geyser eruptions. The second one, called waiting, is the length of the waiting period until the next eruption. We will carry out a simple regression analysis of the waiting intervals (response) on the eruption durations (explanatory).

    lm(waiting ~ eruptions, data = faithful)

To get the detailed analysis:

    waiting.lm = lm(waiting ~ eruptions, data = faithful)
    summary(waiting.lm)

To see the residuals:

    waiting.res = resid(waiting.lm)
    waiting.res

To get the residual plot:

    plot(faithful$eruptions, waiting.res)
    abline(0, 0)

To see the fitted values:

    waiting.fit = fitted(waiting.lm)
    waiting.fit

To find Pearson's correlation:

    cor(faithful$eruptions, faithful$waiting)
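In simple linear regression, the R-squared reported by summary is the square of Pearson's correlation, and the fitted model can be used for prediction. The sketch below illustrates both with the faithful data. The eruption duration of 3.5 minutes is an arbitrary value chosen for illustration, and the names r and new_obs are introduced here.

```r
waiting.lm <- lm(waiting ~ eruptions, data = faithful)

# Intercept and slope of the fitted line
coef(waiting.lm)

# R-squared equals the squared Pearson correlation
r <- cor(faithful$eruptions, faithful$waiting)
all.equal(summary(waiting.lm)$r.squared, r^2)

# Predicted waiting time, with a 95% prediction interval,
# for a hypothetical eruption lasting 3.5 minutes
new_obs <- data.frame(eruptions = 3.5)
predict(waiting.lm, newdata = new_obs, interval = "prediction", level = 0.95)
```

Use interval = "confidence" instead to get an interval for the mean waiting time at that duration rather than for a single new observation.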