Designed by Engin A. Sungur


Lesson 5:

Contents

This fifth lesson covers ...

Objectives

After completing this lesson, you should understand the following topics:

5. From Probability to Inference

5.1 Counts and Proportions

5.2 Sample Means

 


Reading Assignment

Read chapter 5 in Introduction to the Practice of Statistics.

Key Terms

Chapter 5


Chapter 5: Study Questions

  1. What is the Binomial setting?
  2. What is a Binomial random variable?
  3. What is the Binomial table used for?
  4. What is the Normal approximation to the Binomial distribution?
  5. What is the difference between the population mean and the sample mean?
  6. What is the Central Limit Theorem?

Chapter 5: Study Notes


In this chapter we look at two important parameters. (Do you remember what a parameter is? A parameter is a number that describes a population.) These parameters are the population proportion and the population mean. Section 5.1 provides statistical tools for making inferences about the population proportion, while Section 5.2 develops inference tools for the population mean.

To solve the problems in this chapter, you should first decide whether the problem is about proportions or about means.

5.1. COUNTS AND PROPORTIONS

This is what we mean by a count:

X = the number of times an event occurs in a fixed number of trials,

or

X = the count of occurrences of some outcome in a fixed number of observations,

where n = the fixed number of observations (trials).

This is what we mean by the sample proportion: p^ = sample proportion = X/n.

Binomial Setting:

  1. There is a fixed number n of observations (trials).
  2. The observations (trials) are independent.
  3. Each observation has two possible outcomes ("success" and "failure").
  4. The probability of success, p = P("success"), is the same for each observation.
Let X = the number of successes out of the n observations (trials). Then X has a Binomial distribution with parameters n and p.
Here are some examples:

Observation (trial) | Success | Failure | p | n | X
Tossing a fair coin | Head | Tail | 1/2 | Number of tosses | Number of heads in n tosses
Birth of a child | Girl | Boy | Practically 1/2 | Fixed family size | Number of girls in a family of size n
Throwing two dice | A total of 7 dots | Anything else | 6/36 | Number of throws | Number of sevens in n throws
Inspecting a product | Non-defective | Defective | Proportion of non-defectives in the population | Sample size | Number of non-defectives in a sample of size n
Asking a question | Correct answer | Wrong answer | Probability of giving a correct answer | Number of questions asked | Number of correct answers out of n questions



Now, we have to figure out how to find Binomial probabilities.

Using the Table


Table C gives P(X=k). First find n, then p, then k, and read the table entry.
One thing you will notice is that the table does not list every possible value of p. If p is greater than 0.5, you have to switch the definitions of success and failure.
Here is an example:
According to tables provided by the U.S. National Center for Health Statistics in Vital Statistics of the United States, there is about an 80% chance that a person aged 20 will be alive at age 65. Suppose 10 people aged 20 are selected at random. Let us try to use Table C to find the probability that exactly 2 of them will be alive at age 65. Go to Table C and try to find this probability.

Since p=0.8 is not in the table, we need another approach.

Note that X="Number alive at age 65 out of 10" has a Binomial distribution with n=10 and p=0.8.

Now switch the definitions of success and failure. In other words, count the number not alive instead of the number alive.

In this case we work with

Y = "Number not alive at age 65 out of 10".

The value of X automatically determines the value of Y, since Y = 10 - X. For example, if X=2 (2 of the 10 are alive), then Y=8 (8 of the 10 are not alive).

So Y has a Binomial distribution with n=10 but p=0.2,
and P(X=2)=P(Y=8).

Now try to find P(Y=8) by using Table C.
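To cross-check the table lookup, you can compute the Binomial probabilities directly from the formula $P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}$. The Python sketch below is for illustration only. The helper name `binom_pmf` is our own and does not come from the text or from Table C. The sketch evaluates P(X=2) with p=0.8 and P(Y=8) with p=0.2, which shows that switching success and failure gives the same answer.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), using the Binomial formula."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# X = number alive at 65 out of 10; success = "alive", p = 0.8
p_x = binom_pmf(2, 10, 0.8)

# Y = number NOT alive at 65 out of 10; success = "not alive", p = 0.2
p_y = binom_pmf(8, 10, 0.2)

print(f"P(X=2) = {p_x:.7f}")
print(f"P(Y=8) = {p_y:.7f}")
```

Both lines print 0.0000737. Rounded to four decimal places, this is 0.0001.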

If X has a Binomial distribution with parameters n and p, then
Mean of X: $\mu_X = np$
Standard deviation of X: $\sigma_X = \sqrt{np(1-p)}$
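The shortcut formulas $\mu_X = np$ and $\sigma_X = \sqrt{np(1-p)}$ can be checked numerically. The sketch below does this for the survival example (n=10, p=0.8, our choice of illustration). It computes the mean and standard deviation directly from all the Binomial probabilities and compares them with the formulas.

```python
from math import comb, sqrt

n, p = 10, 0.8  # survival example: X ~ Binomial(10, 0.8)

# P(X = k) for every possible count k = 0, 1, ..., n
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Mean and SD computed directly from the distribution
mean_direct = sum(k * pk for k, pk in enumerate(pmf))
var_direct = sum((k - mean_direct) ** 2 * pk for k, pk in enumerate(pmf))

# Mean and SD from the shortcut formulas
mean_formula = n * p
sd_formula = sqrt(n * p * (1 - p))

print(mean_direct, mean_formula)      # both 8, up to floating-point rounding
print(sqrt(var_direct), sd_formula)   # both about 1.265
```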


Now let us try to figure out whether you have ESP (extrasensory perception). (If you do, we will ask you to pick numbers for today's lottery!)

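Using the Formula

First, let us get familiar with the formula for the probability of obtaining k successes out of n trials, that is, P(X=k):

$P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n, \qquad \text{where } \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$
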
Suppose that someone who visits a car dealer purchases a car with probability 0.2. The dealer has 8 cars in stock. On a given day, 10 people visit the dealership. What is the probability that the dealership runs out of stock?

In this case the variable of interest is X = number of purchases by the 10 visitors.

X has a Binomial distribution with n=10 and p=0.2.

The dealership will run out of stock if more than 8 of the 10 visitors decide to purchase a car. Therefore:

P("out of stock")=P(X>8)=P(X=9)+P(X=10)

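Now we can use the formula $P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$ to find P(X=9) and P(X=10).

For P(X=9): n=10, k=9, and p=0.2.

$P(X=9) = \binom{10}{9}(0.2)^9(0.8)^1 = 10 \times 0.000000512 \times 0.8 = 0.000004096$

Similarly, $P(X=10) = \binom{10}{10}(0.2)^{10}(0.8)^0 = 0.0000001024$.

Therefore P(X>8) = 0.000004096 + 0.0000001024 = 0.0000041984, or about 0.0000042.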
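The same calculation takes only a few lines of Python. This sketch assumes nothing beyond the text's setup: 10 independent visitors, each buying with probability 0.2, and "out of stock" defined as more than 8 purchases. The helper `binom_pmf` is a name we chose for illustration.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), from the Binomial formula."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.2  # 10 visitors, each buys with probability 0.2
stock = 8       # cars on the lot

# "Out of stock" as defined in the text: more than 8 of the 10 visitors buy
p_out = sum(binom_pmf(k, n, p) for k in range(stock + 1, n + 1))

print(f"P(X=9)  = {binom_pmf(9, n, p):.10f}")
print(f"P(X=10) = {binom_pmf(10, n, p):.10f}")
print(f"P(X>8)  = {p_out:.10f}")
```

The exact values are 0.000004096, 0.0000001024, and 0.0000041984. Up to floating-point rounding, the printed numbers match them.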


SAMPLE PROPORTIONS

Suppose that we want to get an idea of the proportion of people who really find what they are looking for on the Web. Since we cannot reach everyone who uses the Web, we take a sample of size n and ask each person whether or not they found what they were looking for on the Web.

Here is the notation we are using:

p = the proportion of people who really find what they are looking for on the Web (this is a parameter, and it is unknown).

p^ = the proportion of the n sampled people who found what they were looking for (a statistic; it is an estimator of the unknown p).

X = the number of people in the sample (of n people) who found what they were looking for.

n=sample size.

Note that p^=X/n=(number of successes in a sample of n)/(sample size)

Here "success" means "finding what he or she is looking for on the Web".

We have learned that X has a Binomial distribution with parameters n and p.

How can we find a probability involving a proportion? For example, what is the probability that more than 50% of the people in my sample found what they wanted?

To answer this type of question, we translate p^>0.50 into an event in terms of X and use what we have learned about finding Binomial probabilities.

P(p^>0.50)=P(X/n>0.50)=P{X>n(0.50)}

Suppose that we interviewed 100 people and p=0.80, then

P(p^>0.50)=P(X/100>0.50)=P{X>100(0.50)}=P(X>50)

Observe that if more than 50% of the people in a sample of size 100 found what they wanted, then more than 50 of the 100 found what they were looking for.
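Because n=100 is well beyond what a printed Binomial table covers, a computer can carry out the translation from p^ to X and sum the exact Binomial probabilities. The sketch below uses the text's illustrative values n=100 and p=0.80. It adds P(X=k) over every count k whose proportion k/n exceeds 0.50, which is exactly the event X > 50.

```python
from math import comb

n, p = 100, 0.80   # 100 people interviewed; true proportion 0.80
cutoff = 0.50      # we want P(p_hat > 0.50)

# p_hat > 0.50  <=>  X/n > 0.50  <=>  X > n * 0.50 = 50,
# so add P(X = k) over the counts k = 51, ..., 100
prob = sum(comb(n, k) * p**k * (1 - p) ** (n - k)
           for k in range(n + 1) if k / n > cutoff)

print(f"P(p_hat > {cutoff}) = {prob:.6f}")
```

The result is essentially 1. With p=0.80, a sample of 100 in which half or fewer succeed is extremely unlikely.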

If X has a Binomial distribution with parameters n and p, then
Mean of p^: $\mu_{\hat p} = p$
Standard deviation of p^: $\sigma_{\hat p} = \sqrt{p(1-p)/n}$

Please compare these properties of p^ (the sample proportion) with the properties of X (the count):

If X has a Binomial distribution with parameters n and p, then
Mean of X: $\mu_X = np$
Standard deviation of X: $\sigma_X = \sqrt{np(1-p)}$
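A quick numeric comparison shows how the two sets of formulas relate. Because p^ = X/n, dividing the count by n divides both its mean and its standard deviation by n. The sketch uses the Web-survey values n=100 and p=0.80 from above.

```python
from math import sqrt

n, p = 100, 0.80  # Web-survey numbers from above

# Count X
mean_X = n * p
sd_X = sqrt(n * p * (1 - p))

# Sample proportion p_hat = X / n
mean_phat = p
sd_phat = sqrt(p * (1 - p) / n)

print(mean_X, sd_X)        # about 80 and 4
print(mean_phat, sd_phat)  # 0.8 and about 0.04

# Dividing X by n divides its mean and its standard deviation by n
assert abs(mean_X / n - mean_phat) < 1e-12
assert abs(sd_X / n - sd_phat) < 1e-12
```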


NORMAL APPROXIMATION FOR COUNTS AND PROPORTIONS

If the sample size is large, we cannot use the Binomial tables. In this section we approximate Binomial probabilities for large enough n by using the Normal distribution.

Approximate Sampling Distribution of X

X is approximately Normal with mean np and standard deviation $\sqrt{np(1-p)}$.

Approximate Sampling Distribution of p^

p^ is approximately Normal with mean p and standard deviation $\sqrt{p(1-p)/n}$.

Rule of Thumb:

Use the Normal approximation if np>=10 and n(1-p)>=10.

Here are the steps that might help you carry out the Normal approximation:
  1. Determine whether the problem is about counts or proportions.
  2. Express the probability that you want to find in the form of
    • $P(X \le a)$ or $P(X \ge a)$ for counts, OR
    • $P(\hat p \le a)$ or $P(\hat p \ge a)$ for proportions.

  3. Check that np>=10 and n(1-p)>=10. If not, stop: you cannot use the Normal approximation.
  4. Treat X or p^ as a Normal random variable with the mean and standard deviation given above. Standardize the value by using z-score = (value - mean)/(standard deviation):
    • For counts: $z = \frac{a - np}{\sqrt{np(1-p)}}$
    • For proportions: $z = \frac{a - p}{\sqrt{p(1-p)/n}}$
  5. Use the Normal table to find the probability.
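Steps 3 to 5 can be sketched as a small Python function. The name `normal_approx_at_most` is hypothetical, chosen for this illustration. The function uses Python's `statistics.NormalDist` in place of the printed Normal table. It covers only the $P(\cdot \le a)$ form. For $P(\cdot \ge a)$, subtract its result from 1. Like the steps above, it applies no continuity correction.

```python
from math import sqrt
from statistics import NormalDist

def normal_approx_at_most(a: float, n: int, p: float, proportion: bool = False) -> float:
    """Normal approximation to P(X <= a) for counts or P(p_hat <= a) for proportions.

    Hypothetical helper for illustration; it follows steps 3-5 above.
    """
    # Step 3: rule of thumb
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("Normal approximation not advised: need np >= 10 and n(1-p) >= 10")
    # Step 4: mean and standard deviation, then standardize
    if proportion:
        mean, sd = p, sqrt(p * (1 - p) / n)
    else:
        mean, sd = n * p, sqrt(n * p * (1 - p))
    z = (a - mean) / sd
    # Step 5: standard Normal probability P(Z <= z), in place of the Normal table
    return NormalDist().cdf(z)

# Web-survey numbers (n = 100, p = 0.80): P(X <= 75) and the same event as P(p_hat <= 0.75)
print(round(normal_approx_at_most(75, 100, 0.80), 4))
print(round(normal_approx_at_most(0.75, 100, 0.80, proportion=True), 4))
```

Both calls standardize to z = -1.25 and give about 0.106. This agrees with the Normal table entry of 0.1056, because a count and the matching proportion describe the same event.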

Example (Checking for Undercoverage in Surveys):

One way of checking the effect of undercoverage and other sources of error in a sample survey is to compare the sample with known facts about the population.

The fact that we use in this example is that about 11% of American adults are African American.

The proportion p^ of African Americans in a simple random sample of 1,500 adults should therefore be close to 11%. It is unlikely to be exactly 11% because of sampling variability.

If a national survey contains only 9.2% African Americans, should we suspect that the sampling procedure is somehow underrepresenting African Americans?

  1. This is a proportion problem.
  2. We want to find the probability that a sample contains no more than 9.2% African Americans when the population is 11% African American.

    That is, we want $P(\hat p \le 0.092)$.

  3. np=(1,500)(0.11)=165>10 and n(1-p)=(1,500)(0.89)=1,335>10. Therefore we can use the Normal approximation.
  4. Now we can treat p^ as a Normal random variable with mean p=0.11 and standard deviation

    $\sigma_{\hat p} = \sqrt{\frac{(0.11)(0.89)}{1500}} \approx 0.00808$, so $z = \frac{0.092 - 0.11}{0.00808} \approx -2.23$.

  5. From the Normal table, $P(\hat p \le 0.092) = P(Z \le -2.23) = 0.0129$.

    Conclusion:

    Only 1.29% of all samples would have so few African Americans. Because it is unlikely that a sample would include so few African Americans, we have good reason to suspect that the sampling procedure underrepresents African Americans.
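The arithmetic in steps 3 to 5 can be reproduced in Python. This is a sketch, not part of the original example. It uses `statistics.NormalDist` in place of the Normal table, so its answer is not rounded to a table entry.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1500, 0.11   # sample size; population proportion of African Americans
observed = 0.092    # proportion found in the national survey

# Step 3: rule of thumb
assert n * p >= 10 and n * (1 - p) >= 10

# Step 4: standard deviation of p_hat and the z-score
sd = sqrt(p * (1 - p) / n)
z = (observed - p) / sd

# Step 5: P(p_hat <= 0.092) from the standard Normal distribution
prob = NormalDist().cdf(z)

print(f"sd = {sd:.5f}, z = {z:.2f}, P = {prob:.4f}")
```

The printed values agree with the hand calculation: a standard deviation of about 0.00808, z of about -2.23, and a probability of about 0.0129.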



5.2 THE SAMPLE MEAN




Here are the important things that you should know:

  1. The characteristic
    • Population: the characteristic that you are interested in, X.
    • Sample: you select n individuals from this population and record the value of the characteristic for each one, giving X1, X2, ..., Xn.
  2. The mean
    • Population: the average value (mean) of the characteristic, $\mu$.
    • Sample: to get an idea of the unknown population mean, you average your n observations, $\bar{x} = \frac{X_1 + X_2 + \cdots + X_n}{n}$. The sample mean is a random variable, and its mean is $\mu_{\bar{x}} = \mu$.
  3. The standard deviation
    • Population: the standard deviation of the characteristic, $\sigma$.
    • Sample: the sample mean has standard deviation $\sigma_{\bar{x}} = \sigma / \sqrt{n}$.
  4. A Normal population
    • Population: the characteristic has a Normal distribution. That is, if you recorded the characteristic for every individual in the population and drew a stemplot or histogram, it would look bell-shaped.
    • Sample: $\bar{x}$ has a Normal distribution with the mean and standard deviation given above.
  5. A population of unknown shape
    • Population: you do not have the slightest idea what the shape of the distribution is.
    • Sample: if your sample is large enough, $\bar{x}$ has an approximately Normal distribution with the mean and standard deviation given above.

The result in item 5 is known as the

CENTRAL LIMIT THEOREM
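A small simulation can make the central limit theorem concrete. The sketch below is only an illustration under assumptions of our own: a skewed exponential population with mean 1 and standard deviation 1, samples of size n=30, and 10,000 repetitions with a fixed random seed. It checks that the sample means center on $\mu$, spread out by about $\sigma/\sqrt{n}$, and have roughly the Normal share (about 95%) within 1.96 standard deviations.

```python
import random
from math import sqrt
from statistics import fmean, stdev

random.seed(1)  # fixed seed so the simulation is reproducible

# Hypothetical skewed population (our choice): exponential with mean 1 and SD 1
mu, sigma = 1.0, 1.0
n = 30          # size of each sample
reps = 10_000   # number of simulated samples

# Draw many samples of size n and keep each sample mean
xbars = [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

se = sigma / sqrt(n)  # theoretical SD of the sample mean
print(f"mean of sample means: {fmean(xbars):.3f} (theory {mu})")
print(f"SD of sample means:   {stdev(xbars):.4f} (theory {se:.4f})")

# Bell-shape check: about 95% of a Normal distribution lies within 1.96 SDs of its mean
within = sum(abs(x - mu) <= 1.96 * se for x in xbars) / reps
print(f"share within mu +/- 1.96*SE: {within:.3f} (Normal: about 0.95)")
```

The population itself is far from bell-shaped, yet the sample means behave approximately Normally, as the theorem predicts.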



Here are some steps that you may want to use to apply the central limit theorem:
  1. Calculate the mean and the standard deviation of the sample mean by using the formulas above.
  2. Sketch the Normal curve and locate the mean.
  3. Locate the interval on the sketch from step 2 and shade the area corresponding to the probability that you wish to calculate.
  4. Find the z-scores associated with the values of interest.
  5. Use the Normal table to find the probability.
  6. Check that the calculated value agrees with the shaded area.
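Here is one way to carry out steps 1, 4, and 5 in Python. The numbers are hypothetical and do not come from the text: a population with mean 100 and standard deviation 15, a sample of size 25, and the question P(x̄ > 106). Steps 2, 3, and 6 (sketching, shading, and checking) are still best done by hand.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical numbers (not from the text): population mean 100, SD 15, sample size 25
mu, sigma, n = 100, 15, 25
a = 106  # we want P(x_bar > 106)

# Step 1: mean and SD of the sample mean
mean_xbar = mu
sd_xbar = sigma / sqrt(n)   # 15 / 5 = 3

# Step 4: z-score for the value of interest
z = (a - mean_xbar) / sd_xbar   # (106 - 100) / 3 = 2

# Step 5: upper-tail probability from the standard Normal distribution
prob = 1 - NormalDist().cdf(z)
print(f"sd_xbar = {sd_xbar}, z = {z}, P(x_bar > {a}) = {prob:.4f}")
```

The answer is about 0.0228, the same value the Normal table gives for the area above z = 2.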

    Written Assignment

    Do the following assignment. The problems listed are from "Introduction to the Practice of Statistics". When you have worked on the problems and are ready to turn in your findings, click the assignment link below. It will take you to a template where you can fill in your answers to the questions. When you are finished entering your answers, click the submit button; you will then be given the location of your completed web page. You may check your assignment responses with your browser at any time and submit a revision at any time before the due date of the assignment. The due date is Wednesday, August 4.

    SECTION 5.1. Exercises 5.3, 5.7, 5.8, 5.12, 5.14 (pages 391-394)
    SECTION 5.2. Exercises 5.25, 5.30, 5.40 (pages 408-413)
    CHAPTER EXERCISES. Exercise 5.68 (page 429)

    Lesson Submission 5

    Assignment #5.




    Internet Links

    Each day you go online, be sure to check out the Random Statistical Quote for the Day

    Written Assignment

    Do the following assignment. You will e-mail the answers to...
    1. Complete chapter 5 problems.

    Lesson Submission 5

    Use the e-mail lesson service instructions you received with your course study guide and send your answers to part 2 of this assignment to anon@mnstats.morris.umn.edu
