STUDY GUIDE AND WORKBOOK
STAT/MATH 2501
FALL 2006
ENGIN A. SUNGUR
PROFESSOR OF STATISTICS
TABLE OF CONTENTS
I. Chapter 1
I.1. Case Study
I.2. Historical Remarks
I.3. Review
Set Theory
Basic Set Operations
Additional Definitions on Sets
Simple Laws and Some Extensions on Sets
I.4. Overview
Where We're Going
Reading Probabilistic Statements
I.5. Technical Details
Independent
How to Solve Problems Related to Probability?
Case Study Revisited
I.6. Drunkard's Walk
II. Chapter 2
II.1. Review
Discrete Mathematics
Binomial Theorem
Geometric Series
Maclaurin Series
The Derivative of a Function
Derivatives of the Composite Functions
The Definite Integral of a Function over an Interval
Antidifferentiation
Evaluation of the Indefinite Integrals
Evaluation of the definite integral by using antiderivatives
The Three Elementary Methods of Integration
Integration by Substitution
Integration by Parts
Some Results Involving Limits
Some Results Involving Multivariate Calculus
Some Special Functions
Exponential Function
Gamma Function
Beta Function
References
Introduction to Probability Theory
(Motivation: application of the topics covered in this chapter to real-world problems)
Radelet (1981) studied the effects of racial characteristics on whether individuals convicted of homicide receive the death penalty. The events considered in this study are the selection of a case with a "death penalty verdict", "no death penalty verdict", "white defendant", "black defendant", "white victim", and "black victim". The 326 subjects were defendants in homicide indictments in 20 Florida counties during 1976-1977. The following table gives the number of subjects for each combination of defendant's race, victim's race, and death penalty verdict.
Defendant's Race | Victim's Race | Death Penalty: Yes | Death Penalty: No | Total
White | White | 19 | 132 | 151
White | Black | 0 | 9 | 9
Black | White | 11 | 52 | 63
Black | Black | 6 | 97 | 103
Total | | 36 | 290 | 326
Source: Agresti, A., Categorical Data Analysis, John Wiley & Sons, 1990, pp. 135-138.
The main question one would like to answer is "Is there evidence of racial discrimination in this table?" One might also be interested in the following questions: (i) Is there a relation between the defendant's race and the victim's race? (ii) Is there a relation between the victim's race and the death penalty verdict? (iii) If we control for the victim's race, that is, if we look at the cases with black victims and white victims separately, what is the relation between the defendant's race and the death penalty verdict?
(Motivation)
In the long run, we are all dead.
John Maynard Keynes (England, 1883-1946)
People
talk loosely about chance all the time, without doing any harm. What are the
chances of getting a job? of meeting someone? of rain tomorrow? of passing
Math. 3610? But for scientific purposes, it is necessary to give the word
chance a definite, clear
interpretation. This turns out to be hard, and mathematicians have struggled
with the job for centuries. They have developed several careful and rigorous
theories of chance; but the theories cover just a small range of cases where
people ordinarily speak of chance.
Exercise 1. How would you explain the fact that when you toss a coin the chance of getting heads is one-half? How many possibilities are there when you toss a coin? Can you repeat the experiment, that is, tossing a coin, under the same conditions many times?
Exercise 2. What is the meaning of "the chance of passing Math. 3610 is 0.80"? How many possibilities are there for this experiment? Can you repeat the experiment, that is, taking Math. 3610, under the same conditions many times? Which conditions could change if you repeated the experiment?
The frequentist definition of chance works best for processes that can be repeated over and over again, independently and under the same conditions. Games of chance fall into this category, and in fact much of the frequency theory was developed to solve gambling problems. One of the great masters was Abraham de Moivre, a French Protestant who fled to England to avoid religious persecution. In his book The Doctrine of Chances he included the following letter:
To the Right Honorable the
Lord CARPENTER
My Lord
There are many people in the World who are prepossessed with an
Opinion, that the Doctrine of Chances has a Tendency to promote Play; but they
soon will be undeceived... Your Lordship does easily perceive, that this Doctrine
is so far from encouraging Play, that it is rather a Guard against it, by setting in
a clear light, the Advantages and Disadvantages of those Games wherein Chance
is concerned...
Another use to be made of this Doctrine of Chances is
that it may serve in conjunction with the other parts of the Mathematicks, as a
fit introduction to the Art of Reasoning: it being known by experience that
nothing can contribute more to attaining that Art, than the consideration of a
long Train of Consequences, rightly deduced from undoubted Principles, of which
this Book affords many examples.
Historically, probability had its origin in the gambling room. The Chevalier de Méré, a professional French gambler, asked his friend Blaise Pascal (1623-1662) to solve the following problem: in what proportion should two players of equal skill divide the stakes if they are forced to stop before the game is finished? Pascal wrote to, and began an active correspondence with, Pierre Fermat (1601-1665) about the problem (collaboration). Although Pascal and Fermat agreed on the answer, they gave different proofs (non-uniqueness of the solution). It was in this series of letters, during the year 1654, that they developed the modern theory of probability.
A century earlier the Italian mathematician and gambler (interesting combination!) Girolamo Cardan (1501-1576) wrote The Book on Games of Chance. It is really a complete textbook for gamblers, since it contains many tips on how to cheat successfully. Cardan was also an astrologer. According to legend, he predicted the day of his own death astrologically and, to guarantee the prediction's accuracy, committed suicide on that day. (Of course, that is the most convincing way to be right!) Like Cardan, De Moivre predicted the day of his death. A rather interesting story is told about it. De Moivre was ill, and each day he noticed that he slept 15 minutes longer than he had on the preceding day. Using progressions, he computed that he would die in his sleep on the day after he slept 23 hours and 45 minutes. On the day following a sleep of 24 hours, De Moivre died.
The French mathematician Pierre Simon de Laplace (1749-1827) also contributed much to the historical development of probability. In 1812 he published Théorie analytique des probabilités, in which he described probability as a science that began with games but had wide-ranging applications. In particular, he applied probability theory not to gambling situations but as an aid in astronomy.
Over the course of many years, probability theory has left the gambling rooms and has grown into an important and ever-expanding branch of mathematics.
In this section we look at the basic notions and notation needed for dealing with sets (collections of objects of various kinds, such as numbers or pairs of numbers, called points). You may have dealt with these concepts since grade school, but it is important to review them here so that you have a sound basis for the applications of probability that follow in Chapter 1 of your textbook.
"Set" is an undefined notion (like "point" or "line" in high-school geometry). It is assumed that at least one set exists, and attention is restricted to a universe, or universal set. In probability theory, the universal set corresponds to the sample space S, and sets correspond to events. All operations on sets are taken with respect to S in order to avoid paradoxical situations.
Exercise 4. The Barber of Seville shaves all men who don't shave themselves. Who shaves the Barber of Seville?
There are four basic set operations. We will illustrate their meanings with diagrams called Venn diagrams.
Definition 1. Union (or join, or sum) of sets E and F:
$E \cup F \equiv E + F \equiv \{\omega : \omega \in E \text{ or } \omega \in F\}$
(read: "E union F consists of the points $\omega$ that are elements of E or F").
Definition 2. Intersection (or meet, or product) of sets E and F:
$E \cap F \equiv EF \equiv \{\omega : \omega \in E \text{ and } \omega \in F\}$.
Definition 3. Complement of set E:
$\bar{E} \equiv E' \equiv \{\omega : \omega \in S \text{ and } \omega \notin E\}$.
Definition 4. Difference of sets E and F:
$E - F \equiv E \cap \bar{F} \equiv \{\omega : \omega \in E \text{ and } \omega \notin F\}$.
Additional Definitions on Sets
Definition 5. E is a subset of F, written $E \subset F$, if $\omega \in E \Rightarrow \omega \in F$.
Definition 6. The empty set (or null set), usually written $\emptyset$, is the set containing no points at all.
Definition 7. Sets E and F are said to be mutually exclusive (or disjoint) if $E \cap F = \emptyset$.
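The four operations and two relations defined above correspond directly to Python's built-in set type. The sketch below is only an illustration. The sample space (the faces of one die) and the events E and F are hypothetical choices, not part of the guide.

```python
# Hypothetical sample space for illustration: the six faces of one die.
S = frozenset(range(1, 7))

E = frozenset({1, 2, 3})   # illustrative event "face is at most 3"
F = frozenset({2, 4, 6})   # illustrative event "face is even"

def complement(A, space=S):
    """Complement of A with respect to the sample space."""
    return space - A

union = E | F                 # E ∪ F: points in E or F
intersection = E & F          # E ∩ F (= EF): points in both E and F
difference = E - F            # E - F: points in E but not in F
assert difference == E & complement(F)   # E - F equals E ∩ (complement of F)

def is_subset(A, B):
    """A ⊂ B: every point of A is also a point of B."""
    return A <= B

def mutually_exclusive(A, B):
    """A ∩ B = ∅."""
    return not (A & B)

print(sorted(union), sorted(intersection), sorted(difference))
print(is_subset(intersection, E), mutually_exclusive(E, complement(E)))
```

With these choices the union is {1, 2, 3, 4, 6}, the intersection is {2}, and the difference is {1, 3}. Any event and its complement are mutually exclusive.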
Simple Laws and Some Extensions on Sets
The following theorem states some simple laws for operations on sets.
Theorem on Sets: Let S be the sample space and E, F, and G be events on S. Then
1. $E \cap E = E$ and $E \cup E = E$ (Idempotency Laws).
2. $E \cap F = F \cap E$ and $E \cup F = F \cup E$ (Commutative Laws).
3. $E \cap (F \cap G) = (E \cap F) \cap G$ and $E \cup (F \cup G) = (E \cup F) \cup G$ (Associative Laws).
4. $E \cap (F \cup G) = (E \cap F) \cup (E \cap G)$ and $E \cup (F \cap G) = (E \cup F) \cap (E \cup G)$ (Distributive Laws).
5. $E \cap S = E$, $E \cup S = S$, $E \cap \emptyset = \emptyset$, $E \cup \emptyset = E$, $E \cap \bar{E} = \emptyset$, $E \cup \bar{E} = S$.
6. $\overline{E \cap F} = \bar{E} \cup \bar{F}$ and $\overline{E \cup F} = \bar{E} \cap \bar{F}$.
7. $\overline{(\bar{E})} = E$.
8. F is a subset of E if and only if $E \cap F = F$.
9. F is a subset of E if and only if $E \cup F = E$.
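One way to build confidence in the theorem is to check every law exhaustively on a small sample space. The sketch below assumes S = {1, 2, 3, 4}, an arbitrary choice, and tests all 16 of its events. Passing these checks illustrates the laws. It is not a proof for general sets.

```python
from itertools import chain, combinations

S = frozenset({1, 2, 3, 4})   # small sample space chosen only for illustration

def all_events(space):
    """Every subset of the sample space (its power set)."""
    items = sorted(space)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def comp(A):
    """Complement with respect to S."""
    return S - A

events = all_events(S)
empty = frozenset()
for E in events:
    assert E & E == E and E | E == E                      # law 1
    assert E & S == E and E | S == S                      # law 5
    assert E & empty == empty and E | empty == E
    assert E & comp(E) == empty and E | comp(E) == S
    assert comp(comp(E)) == E                             # law 7
    for F in events:
        assert E & F == F & E and E | F == F | E          # law 2
        assert comp(E & F) == comp(E) | comp(F)           # law 6
        assert comp(E | F) == comp(E) & comp(F)
        assert (F <= E) == (E & F == F)                   # law 8
        assert (F <= E) == (E | F == E)                   # law 9
        for G in events:
            assert E & (F & G) == (E & F) & G             # law 3
            assert E | (F | G) == (E | F) | G
            assert E & (F | G) == (E & F) | (E & G)       # law 4
            assert E | (F & G) == (E | F) & (E | G)
print("all laws hold for", len(events), "events")
```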
Although the terminology given in parentheses is not in widespread use today, it is important to be exposed to it so that you can read other (and older) books and writings.
Exercise 5. Translate the following informal statements and definitions into formal form by using set notation.
Occurrence of two things at the same time: ________
Occurrence of one thing prevents the occurrence of the other: ________
Occurrence of one thing implies the occurrence of the other: ________
Occurrence of either one of the two things: ________
Occurrence of the opposite of one thing: ________
Occurrence of three things at the same time: ________
Occurrence of only one of the two things: ________
Occurrence of one thing given the occurrence of the other: ________
Occurrence of any one of the possibilities: ________
Occurrence of neither of the two things: ________
Occurrence of either one of three distinct things: ________
Today,
probability theory is a well-established branch of mathematics that finds its
applications in every area of scholarly activity from music to physics, and in
daily experience from weather prediction to predicting the risk of a new
medical treatment.
Most important of all, it gives us the tools needed to measure the reliability of the generalizations we make from a sample to the whole population. For example, when you say "The Vikings will win the Super Bowl in 1992-93," is this statement 100% reliable? What is the reliability measure of this statement?
To understand the importance of probability theory better, we should grasp the difference between theory and reality. Reality does not always turn out the same as theory, that is, as we expect. For example, in theory, if you get a graduate degree you should be able to get a well-paid job. But a person with a graduate education may end up with a low-paid job.
Theories
are ideas proposed to explain phenomena in the real world and, as such, are
approximations or models for reality. These models are expressed either in a
verbal form or as mathematical relationships. Whereas the theory of social
change might be expressed verbally in sociology, the theory of heat transfer is
presented in a precise and deterministic mathematical manner in physics.
Neither gives an accurate and unerring explanation of real life. Slight
variations from the mathematically expected can be observed in heat transfer
phenomena and other areas of physics. The deviations cannot be blamed solely
on the measuring instruments, the explanation that one often hears, but are due
in part to a lack of agreement
between theory and reality. Anyone who believes that the physical
scientist completely understands the wonders of this world need only look at history to find a
contradiction. Theories assumed to be the "final" explanation for
nature have been superseded in rapid succession during the past century.
In this course, we shall develop certain models of reality (probabilistic models). We shall try to explain the motivation behind developing them and the uses of the resulting models. We will never claim that a statement is always correct. Instead, we will attach a reliability to it, a type of chance statement about its validity.
Reading Probabilistic Statements
Exercise 6. Translate the following informal statements and definitions into formal form by using the notation used in your text. The informal statements are taken from the Doctrine of Chances.
The chance of something gives the percentage of time it is expected to happen, when the basic process is done over and over again, independently under the same conditions: P(E) = ________
The chances are between 0% and 100%: ________
The chance of something equals 100% minus the chance of the opposite thing: ________
If the occurrence of one thing prevents the occurrence of the other, then the chance of observing these two things together is zero: ________
If the chance of occurrence of one thing is not affected by the occurrence of the other thing, they are called independent: ________
Given the occurrence of one thing, the chance of occurrence of another thing is proportional to the chance of observing these two together, and inversely proportional to the chance of observing the given thing: ________
The chance of observing two things at the same time is equal to the chance of observing one of them times the chance of observing the other given the first: ________
Two things are independent if the chances for the second one given the first are the same, no matter how the first one turns out: ________
The chance of occurrence of one thing is equal to the chance of occurrence of a second thing and that thing together, plus the chance of non-occurrence of the second thing and that thing together: ________
If two things cannot occur together, then the chance of observing either one of them is the sum of the chances of observing each: ________

The following are the basic results that you have learned in this chapter:
$P(\bar{E}) = 1 - P(E)$
$P(E \cup F) = P(E) + P(F) - P(E \cap F) = P(E) + P(F) - P(EF)$
$P(E \mid F) = \dfrac{P(EF)}{P(F)}$, provided $P(F) > 0$
If E and F are mutually exclusive, then $P(EF) = 0$.
If E and F are independent, then $P(EF) = P(E)P(F)$.
If E and F are independent, then $P(E \mid F) = P(E)$ and $P(F \mid E) = P(F)$.
$P(E) = P(EF) + P(E\bar{F})$
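These results can be checked numerically with exact fractions. The sketch below assumes a fair die, so all outcomes are equally likely. The two illustrative events were chosen so that they happen to be independent. `P` and `P_given` are helper names introduced here, not notation from the text.

```python
from fractions import Fraction

S = frozenset(range(1, 7))   # fair die: equally likely outcomes (assumption)

def P(A):
    """Classical probability |A| / |S| for equally likely outcomes."""
    return Fraction(len(A & S), len(S))

def P_given(A, B):
    """Conditional probability P(A | B) = P(AB) / P(B), defined when P(B) > 0."""
    return P(A & B) / P(B)

E = frozenset({2, 4, 6})     # illustrative event "even face"
F = frozenset({1, 2})        # illustrative event "face at most 2"
Fc = S - F                   # complement of F

assert P(S - E) == 1 - P(E)                        # complement rule
assert P(E | F) == P(E) + P(F) - P(E & F)          # addition rule
assert P(E & (S - E)) == 0                         # E and its complement are mutually exclusive
assert P(E) == P(E & F) + P(E & Fc)                # P(E) = P(EF) + P(E F-bar)
# For these events P(EF) = 1/6 = (1/2)(1/3), so E and F are independent:
assert P(E & F) == P(E) * P(F)
assert P_given(E, F) == P(E) and P_given(F, E) == P(F)
print(P(E | F), P_given(E, F))
```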
Exercise 7. Fill out the following table with the formulas that you would use to find the required probability given the information in the first column.
GIVEN | FIND | GENERAL | MUTUALLY EXCLUSIVE | INDEPENDENT
$P(E)$, $P(F)$, $P(E \cup F)$ | $P(EF)$ | $P(EF) = P(E) + P(F) - P(E \cup F)$ | $P(EF) = 0$ | $P(EF) = P(E)P(F)$
$P(E \mid F)$, $P(F)$ | $P(EF)$ | | |
$P(F \mid E)$, $P(E)$ | $P(EF)$ | | |
$P(E \mid F)$, $P(F)$, $P(E)$ | $P(EF)$ | | |
$P(E \mid F)$, $P(E \mid \bar{F})$, $P(F)$ | $P(E)$ | | |
$P(E \mid F)$, $P(E \mid \bar{F})$, $P(F)$ | $P(F \mid E)$ | | |
$P(E)$, $P(F)$, $P(G)$, $P(EF)$, $P(EG)$, $P(FG)$, $P(EFG)$ | $P(E \cup F \cup G)$ | | |
$P(EFG)$, $P(FG)$ | $P(E \mid FG)$ | | |
$P(EFG)$, $P(E)$ | $P(FG \mid E)$ | | |
Independent, Mutually Independent and Pairwise Independent Events
Definition: The events $E_1, E_2, \ldots, E_n$ are said to be mutually independent if
$P(E_1 E_2 \cdots E_n) = P(E_1)P(E_2)\cdots P(E_n)$.
Definition: The events $E_1, E_2, \ldots, E_n$ are said to be pairwise independent if
$P(E_i E_j) = P(E_i)P(E_j)$ for all $i \neq j$.
Definition: The events $E_1, E_2, \ldots, E_n$ are said to be independent if for every subset $E_{i_1}, E_{i_2}, \ldots, E_{i_r}$, $r \le n$, of these events
$P(E_{i_1} E_{i_2} \cdots E_{i_r}) = P(E_{i_1})P(E_{i_2})\cdots P(E_{i_r})$.
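The three definitions differ only in which product rules must hold, and that difference is easy to express in code. The sketch below assumes a finite sample space with equally likely outcomes. The function names are hypothetical. The three fair coin tosses used at the end are an illustrative assumption, not an example from the guide.

```python
from fractions import Fraction
from itertools import combinations, product

def prob(event, space):
    """Equally likely outcomes: P(A) = |A| / |S|."""
    return Fraction(len(event), len(space))

def joint(events, space):
    """Probability that all events in the list occur together."""
    common = set(space)
    for ev in events:
        common &= ev
    return prob(common, space)

def product_of_probs(events, space):
    result = Fraction(1)
    for ev in events:
        result *= prob(ev, space)
    return result

def pairwise_independent(events, space):
    """Product rule for every pair E_i, E_j with i != j."""
    return all(joint([a, b], space) == product_of_probs([a, b], space)
               for a, b in combinations(events, 2))

def mutually_independent(events, space):
    """The guide's definition: product rule for all n events taken together."""
    return joint(events, space) == product_of_probs(events, space)

def independent(events, space):
    """Product rule for every subcollection of size r = 2, ..., n."""
    return all(joint(list(sub), space) == product_of_probs(list(sub), space)
               for r in range(2, len(events) + 1)
               for sub in combinations(events, r))

# Illustration (assumption): three tosses of a fair coin, E_i = "toss i is heads".
S = list(product("HT", repeat=3))
E = [{s for s in S if s[i] == "H"} for i in range(3)]
print(pairwise_independent(E, S), mutually_independent(E, S), independent(E, S))
```

Under the guide's definitions, "independent" requires both of the other two properties, because it checks every pair and also the full collection.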
Exercise 1: Consider the set of events $E_1, E_2, E_3, E_4$. What properties need to be shown to establish that these events are independent?
For random events we consider the notions of mutual independence and pairwise independence. The questions of interest are: (i) Are there sets of random events that are pairwise independent but not mutually independent? (ii) Conversely, are there sets of random events that are mutually independent but not pairwise independent? The following two examples answer these questions.
Example (pairwise independence does not imply mutual independence): (Bernstein, 1928) Suppose that a box contains 4 tickets labeled 112, 121, 211, and 222. The experiment consists of drawing one ticket at random from the box. Let
$E_1$ = "1 occurs in the first place"
$E_2$ = "1 occurs in the second place"
$E_3$ = "1 occurs in the third place".
It follows that
$P(E_1) = P(E_2) = P(E_3) = \frac{2}{4} = \frac{1}{2}$,
$P(E_1E_2) = P(\{112\}) = \frac{1}{4} = P(E_1)P(E_2)$,
$P(E_1E_3) = P(\{121\}) = \frac{1}{4} = P(E_1)P(E_3)$, and
$P(E_2E_3) = P(\{211\}) = \frac{1}{4} = P(E_2)P(E_3)$.
That is, the events $E_1, E_2, E_3$ are pairwise independent. However, $E_1E_2E_3 = \emptyset$. Therefore
$P(E_1E_2E_3) = 0 \neq P(E_1)P(E_2)P(E_3) = \left(\frac{1}{2}\right)^3 = \frac{1}{8}$.
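The computation in this example can be reproduced by counting tickets. Here is a minimal sketch that assumes each of the four tickets is drawn with probability 1/4, as stated:

```python
from fractions import Fraction

tickets = ["112", "121", "211", "222"]   # each drawn with probability 1/4

def P(event):
    return Fraction(len(event), len(tickets))

# E[k] is the event "1 occurs in place k + 1" (string positions 0, 1, 2).
E = [{t for t in tickets if t[k] == "1"} for k in range(3)]

for i, j in [(0, 1), (0, 2), (1, 2)]:
    print(f"P(E{i+1}E{j+1}) = {P(E[i] & E[j])}, "
          f"P(E{i+1})P(E{j+1}) = {P(E[i]) * P(E[j])}")

all_three = E[0] & E[1] & E[2]           # empty: no ticket reads 111
print("P(E1E2E3) =", P(all_three), "but the product is", P(E[0]) * P(E[1]) * P(E[2]))
```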
Exercise 2: A box contains eight tickets, each labeled with a binary number. Two are labeled 111, two are labeled 100, two 010, and two 001. An experiment consists of drawing one ticket at random from the box. Let $E_1$ be the event "the first digit is 1", $E_2$ the event "the second digit is 1", and $E_3$ the event "the third digit is 1". Show that $E_1$, $E_2$, and $E_3$ are pairwise independent but not mutually independent.
Example (mutual independence does not imply pairwise independence):
An experiment consists of tossing two different standard dice, white and black. The sample space S of the outcomes consists of all ordered pairs $(i, j)$, $i, j = 1, 2, \ldots, 6$, that is,
$S = \{11, 12, 13, \ldots, 66\}$.
Each point in S has probability 1/36. Define the following events:
$E_1$ = "first die is 1, 2 or 3"
$E_2$ = "first die is 3, 4 or 5"
$E_3$ = "sum of the faces is 9".
Therefore,
$E_1E_2 = \{31, 32, 33, 34, 35, 36\}$,
$E_1E_3 = \{36\}$,
$E_2E_3 = \{36, 45, 54\}$,
$E_1E_2E_3 = \{36\}$.
It follows that
$P(E_1) = P(E_2) = \frac{1}{2}$, $P(E_3) = \frac{4}{36} = \frac{1}{9}$, and
$P(E_1E_2E_3) = \frac{1}{36} = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{9} = P(E_1)P(E_2)P(E_3)$.
However, the events are not pairwise independent, because
$P(E_1E_2) = \frac{6}{36} = \frac{1}{6} \neq \frac{1}{4} = P(E_1)P(E_2)$,
$P(E_1E_3) = \frac{1}{36} \neq \frac{1}{18} = P(E_1)P(E_3)$,
$P(E_2E_3) = \frac{3}{36} = \frac{1}{12} \neq \frac{1}{18} = P(E_2)P(E_3)$.
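The dice example can be verified the same way, by listing all 36 equally likely ordered pairs. A minimal sketch:

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # ordered pairs (i, j), each with probability 1/36

def P(event):
    return Fraction(len(event), len(S))

E1 = {s for s in S if s[0] in (1, 2, 3)}   # first die is 1, 2 or 3
E2 = {s for s in S if s[0] in (3, 4, 5)}   # first die is 3, 4 or 5
E3 = {s for s in S if sum(s) == 9}         # sum of the faces is 9

# The product rule holds for all three events together ...
print(P(E1 & E2 & E3), P(E1) * P(E2) * P(E3))
# ... but fails for every pair.
for name, a, b in [("E1E2", E1, E2), ("E1E3", E1, E3), ("E2E3", E2, E3)]:
    print(name, P(a & b), P(a) * P(b))
```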
Exercise 3: In Exercise 2, change the number on one ticket from 111 to 110, and the number on another ticket from 100 to 101. Show that $E_1$, $E_2$, and $E_3$ are, this time, mutually independent but not pairwise independent.
How to Solve Problems Related to Probability?
The most important point about solving probability and statistics problems is that the solution method is not unique. The following graphical and tabular techniques are meant to help you understand the problem better:
I. Straightforward logical setup,
II. Table representations,
III. Venn diagrams,
IV. Trees.
A straightforward logical setup gives the fastest solution to the problem, but it requires a deep understanding of the definitions, concepts, theorems, and formulas. The last three techniques help to overcome this difficulty. They help to develop a logical and systematic way of thinking about the solution of the problem. Direct, blind use of formulas can undermine logical thinking and generally leads to an incorrect solution. Whatever technique one prefers, the basic steps are:
(i) Define the experiment.
(ii) Define the sample space (the simple outcomes of the experiment).
(iii) Define the events given in the problem by using capital letters.
(iv) For compound events, use the set notation discussed earlier.
(v) Write down the event whose probability we would like to find.
(vi) Select the formula that gives that probability from the information on the events you defined in (iii) and (iv).
(vii) Use your intuition and prior knowledge about the result to interpret that number. Does it make sense? Is it what you expected? If not, did you carry out your calculations correctly?
Exercise 8: A softball team has three pitchers, A, B, and C, with winning percentages of 0.4, 0.6, and 0.8, respectively. These pitchers pitch with frequency 2, 3, and 5 out of every 10 games, respectively.
(a) What is the probability that the team will win the game?
(b) Suppose that the team won the game. What is the probability that A pitched the game?
Understanding the problem:
What is the experiment? Each game in this problem is the experiment.
What are the possible outcomes of the experiment? First, A, B, or C may pitch the game. So let us label each of these simple events:
A = the event that A will pitch the game
B = the event that B will pitch the game
C = the event that C will pitch the game
Second, the team may win or lose the game, so let us label these outcomes:
W = the event that the team will win the game
L = the event that the team will lose the game
What is the relation between the events? That is, are they mutually exclusive or independent?
Since the team cannot win and lose the game at the same time, W and L are mutually exclusive.
If we assume that only one pitcher pitches the whole game, A, B, and C are also mutually exclusive.
At this point you should be asking yourself the following questions:
Are the events A and W mutually exclusive? If yes, what does it say about pitcher A?
Are the events B and L mutually exclusive? If yes, what does it say about pitcher B?
Are the events A and W independent? If yes, what does it say about pitcher A?
Are the events B and L dependent? If yes, what does it say about pitcher B?
Translating mathematical notation into everyday language plays an important role in understanding and solving probability problems. The following table shows how one statement translates from one language to another.
Language | Form | Statement
Mathematics | Notation | $A \cap W = \emptyset$
Mathematics | Interpretation | The intersection of the two sets A and W is empty.
Statistics | | The sets A and W are mutually exclusive.
English | Symbolic | "Winning the game" and "A pitching the game" cannot occur at the same time.
English | Interpretation | If A pitches, then we know that the team will not win the game. Therefore, A is a terrible pitcher.
Now, fill out the following table for A and W being independent.
Language | Form | Statement
Mathematics | Notation |
Mathematics | Interpretation |
Statistics | |
English | Symbolic |
English | Interpretation |
What are the numbers that are given in the statement of the problem?
First, the problem says A has a winning average of 0.4, so the team wins 40% of the games that A pitches. The team can win a game with A only if A pitches it. In other words, the team wins with probability 0.4 given that A pitches. Therefore,
$P(W \mid A) = 0.40$
$P(W \mid B) = 0.60$
$P(W \mid C) = 0.80$.
Since these pitchers pitch with frequency 2, 3, and 5 out of every 10 games, respectively,
$P(A) = 0.20$
$P(B) = 0.30$
$P(C) = 0.50$.
The first part of the problem asks for $P(W)$. What is the difference between $P(W)$ and any one of the probabilities $P(W \mid A)$, $P(W \mid B)$, and $P(W \mid C)$?
The second part of the problem asks for $P(A \mid W)$. What is the difference between $P(W \mid A) = 0.40$ and $P(A \mid W)$?
What do you think $P(A \mid W) + P(B \mid W) + P(C \mid W)$ will be equal to?
Given the information on $P(A)$, $P(B)$, $P(C)$, $P(W \mid A)$, $P(W \mid B)$, and $P(W \mid C)$, do you expect $P(A \mid W)$ to be greater than $P(C \mid W)$? Why or why not?
Note that the components that affect $P(A \mid W)$ are (i) A must pitch, that is, $P(A)$, and (ii) the team must win when A pitches, that is, $P(W \mid A)$.
Now we will solve the problem by using the four techniques introduced above.
Solution I. To find P(W), one might use the following logical steps:
(i) In which cases does the team win the game? The team wins when A pitches and the team wins, or when B pitches and the team wins, or when C pitches and the team wins. Therefore
"The team wins the game" = "A pitches and the team wins" or "B pitches and the team wins" or "C pitches and the team wins".
In our notation,
$W = (A \cap W) \cup (B \cap W) \cup (C \cap W) = AW \cup BW \cup CW$.
Are the events AW, BW, CW mutually exclusive? Justify your answer.
$P(W) = P(AW \cup BW \cup CW) = P(AW) + P(BW) + P(CW)$ (Why?)
Do we know the values of P(AW), P(BW), and P(CW)?
Can we find P(AW), P(BW), and P(CW) given the information on $P(A)$, $P(B)$, $P(C)$, $P(W \mid A)$, $P(W \mid B)$, and $P(W \mid C)$?
By using the multiplication rule (the definition of conditional probability), we end up with
$P(AW) = P(W \mid A)P(A) = 0.40 \times 0.20 = 0.08$
$P(BW) = P(W \mid B)P(B) = 0.60 \times 0.30 = 0.18$
$P(CW) = P(W \mid C)P(C) = 0.80 \times 0.50 = 0.40$
Note that the multiplication rule for P(AW) can also be used in the following form:
$P(AW) = P(A \mid W)P(W)$. Why did we not use this form?
Is $P(AW) = P(A \mid W)P(W) = P(W \mid A)P(A)$? Justify your answer.
Therefore,
$P(W) = P(AW) + P(BW) + P(CW)$
$= P(W \mid A)P(A) + P(W \mid B)P(B) + P(W \mid C)P(C)$
$= 0.08 + 0.18 + 0.40 = 0.66$
$P(W) = P(W \mid A)P(A) + P(W \mid B)P(B) + P(W \mid C)P(C)$
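The calculation of P(W) in Solution I has two steps: apply the multiplication rule, then add the probabilities of the mutually exclusive pieces AW, BW, and CW. It can be sketched in a few lines. The dictionary names are illustrative. The numbers are those of Exercise 8.

```python
# P(A), P(B), P(C) and the conditional win rates P(W | pitcher) from Exercise 8.
p_pitch = {"A": 0.20, "B": 0.30, "C": 0.50}
p_win_given = {"A": 0.40, "B": 0.60, "C": 0.80}

# Multiplication rule: P(pitcher and W) = P(W | pitcher) P(pitcher).
p_joint_win = {k: p_win_given[k] * p_pitch[k] for k in p_pitch}

# AW, BW, CW are mutually exclusive and their union is W, so their probabilities add.
p_win = sum(p_joint_win.values())
print({k: round(v, 2) for k, v in p_joint_win.items()}, round(p_win, 2))
```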

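The second part of the problem asks for $P(A \mid W)$.
By using the definition of conditional probability, we end up with
$P(A \mid W) = \dfrac{P(AW)}{P(W)} = \dfrac{P(W \mid A)P(A)}{P(W \mid A)P(A) + P(W \mid B)P(B) + P(W \mid C)P(C)}$.
How did we get the second part of the above formula?
Therefore,
$P(A \mid W) = \dfrac{0.08}{0.66} = 0.121$. Similarly, $P(B \mid W) = 0.273$ and $P(C \mid W) = 0.606$.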
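The same numbers follow from Bayes' rule written as a small function. This is a sketch under the assumptions of Exercise 8, where exactly one of A, B, C pitches each game. `posterior` is a hypothetical helper name.

```python
p_pitch = {"A": 0.20, "B": 0.30, "C": 0.50}        # P(A), P(B), P(C)
p_win_given = {"A": 0.40, "B": 0.60, "C": 0.80}    # P(W | A), P(W | B), P(W | C)

def posterior(prior, likelihood):
    """Bayes' rule: P(k | W) = P(W | k) P(k) / sum over j of P(W | j) P(j)."""
    evidence = sum(likelihood[k] * prior[k] for k in prior)   # P(W) by total probability
    return {k: likelihood[k] * prior[k] / evidence for k in prior}

post = posterior(p_pitch, p_win_given)
print({k: round(v, 3) for k, v in post.items()})
```

Because the denominator is the sum of the numerators, the posterior probabilities for A, B, and C always add up to 1.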
We are going to interpret
these results after we discuss all of the solution techniques.
Solution II. Tables. The information given in the problem can be represented by a table. In this table the rows show the result of the game, and the columns show who pitches the game. Each cell gives the probability that the corresponding row and column events occur together.
 | A pitches the game (A) | B pitches the game (B) | C pitches the game (C) | Total
The team wins the game (W) | $P(AW)$ | $P(BW)$ | $P(CW)$ | $P(W)$
The team loses the game ($\bar{W}$) | $P(A\bar{W})$ | $P(B\bar{W})$ | $P(C\bar{W})$ | $P(\bar{W})$
Total | $P(A)$ | $P(B)$ | $P(C)$ | 1
Note that this table gives us some important formulas, such as
$P(AW) + P(BW) + P(CW) = P(W)$
$P(A\bar{W}) + P(B\bar{W}) + P(C\bar{W}) = P(\bar{W})$
If the conditional probabilities are available, the table can be modified by using the multiplication rule as follows:
 | A pitches the game (A) | B pitches the game (B) | C pitches the game (C) | Total
The team wins the game (W) | $P(AW) = P(W \mid A)P(A)$ | $P(BW) = P(W \mid B)P(B)$ | $P(CW) = P(W \mid C)P(C)$ | $P(W)$
The team loses the game ($\bar{W}$) | $P(A\bar{W}) = P(\bar{W} \mid A)P(A)$ | $P(B\bar{W}) = P(\bar{W} \mid B)P(B)$ | $P(C\bar{W}) = P(\bar{W} \mid C)P(C)$ | $P(\bar{W})$
Total | $P(A)$ | $P(B)$ | $P(C)$ | 1
Now, let us fill out the table with the probabilities that are given:
 | A pitches the game (A) | B pitches the game (B) | C pitches the game (C) | Total
The team wins the game (W) | $0.4 \times 0.2 = 0.08$ | $0.6 \times 0.3 = 0.18$ | $0.8 \times 0.5 = 0.40$ | $P(W) = ?$
The team loses the game ($\bar{W}$) | $P(\bar{W} \mid A)P(A)$ | $P(\bar{W} \mid B)P(B)$ | $P(\bar{W} \mid C)P(C)$ | $P(\bar{W}) = ?$
Total | 0.20 | 0.30 | 0.50 | 1
Fill out the rest of the table, and find $P(W)$ and $P(A \mid W)$.
Solution III. This time we will use a Venn diagram. Each region of the diagram corresponds to a relevant event.
Since probability is a special type of measure, just like the area of each region in the diagram, the same type of argument given in Solution I can be used to answer the question.
Solution IV. Trees. When we analyze an experiment that takes place in a sequence of steps, it is often convenient to represent the sample space by the set of all paths through a tree. A probability measure is assigned by considering the conditional probabilities appropriate for the outcome of each step, given all previous outcomes. These weights are assigned to the appropriate branches of the tree, and the weight of a path through the tree is the product of the branch weights along the path. In our experiment, one of the pitchers first pitches the game, and then the team either wins or loses. Therefore, the corresponding tree can be constructed as follows:
Mark the paths on the tree where the team ends up winning the game. By using these paths, write down the formula for finding P(W):
P(W) =
Find the probability that the team will win the game by using the above formula:
P(W) =
Suppose that the team wins the game. Given this information, some of the branches of the tree will be chopped off.
Write down the formula to find $P(A \mid W)$:
$P(A \mid W) =$
Use the formula to find $P(A \mid W)$:
$P(A \mid W) =$
Our original tree gave us the probabilities for the result of the game given the pitcher. The inverse probabilities, that is, the probabilities for the pitcher given the result of the game, are called Bayes probabilities. These inverse probabilities can also be obtained by simply constructing the tree for the two-stage experiment carried out in reverse order.
Fill out the rest of the above reverse tree by using the values of $P(W)$ and $P(\bar{W})$ that you have already found.
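The forward and reverse trees of Solution IV can be represented as nested dictionaries, with each path weight computed as the product of its branch weights. This is a minimal sketch that assumes the two-stage structure described above, with the pitcher first and the result second. The losing branch weights are taken as 1 - P(W | pitcher) by the complement rule, and "L" stands for the loss event.

```python
# Stage 1: who pitches, with branch weights P(A), P(B), P(C).
stage1 = {"A": 0.20, "B": 0.30, "C": 0.50}
# Stage 2: result given the pitcher; the losing weight is 1 - P(W | pitcher).
p_win_given = {"A": 0.40, "B": 0.60, "C": 0.80}
stage2 = {p: {"W": w, "L": 1 - w} for p, w in p_win_given.items()}

# Each path is (pitcher, result); its weight is the product of its branch weights.
paths = {(p, r): stage1[p] * stage2[p][r] for p in stage1 for r in stage2[p]}

# P(W): add the weights of the paths that end in a win.
p_win = sum(w for (p, r), w in paths.items() if r == "W")

# Reverse tree: first the result, then the pitcher given the result (Bayes probabilities).
p_result = {r: sum(w for (p, rr), w in paths.items() if rr == r) for r in ("W", "L")}
reverse = {r: {p: paths[(p, r)] / p_result[r] for p in stage1} for r in p_result}

for (p, r), w in paths.items():
    print(p, r, round(w, 2))
print("P(W) =", round(p_win, 2), "P(A | W) =", round(reverse["W"]["A"], 3))
```

Both trees describe the same six paths with the same weights. Only the order of the two stages, and therefore the branch weights, differs.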
Interpretation of the Results
In this part we will try to interpret our results. Note that if pitcher C pitched all the games, the probability of winning would be 0.80. The probability of winning drops to 0.66 because not all the games are pitched by C and the pitchers are not equally good.
If all the pitchers were equally good, say with winning percentage 0.80, what would be the probability that the team wins?
The winning percentages for pitchers A, B, and C are 0.40, 0.60, and 0.80, respectively, and A, B, and C pitch 20%, 30%, and 50% of the games.
Given the winning percentages for each pitcher, would it be reasonable for A, B, and C to pitch 50%, 30%, and 20% of the games?
If the team wins, the probability that A pitched the game is 0.121. If all the pitchers were equally good, that is, if winning were independent of whether A pitches, what would this probability be?
What type of relation do you expect between $P(C \mid W)$ and $P(C)$? ($P(C \mid W) > P(C)$, $P(C \mid W) < P(C)$, or $P(C \mid W) = P(C)$?)
Now we are ready to answer the questions that we raised in our case study. The table of frequencies can be converted into empirical probabilities as follows. For example, let us find the probability of selecting a case that involves a white defendant, a white victim, and a death penalty verdict. There are 19 cases out of 326 in which all of these events occurred together. Therefore, the chance of observing this event is 19/326 = 0.058.
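Converting the frequency table into empirical probabilities is a matter of counting and dividing. The sketch below stores the counts from the case-study table. It provides hypothetical helpers, `count`, `prob`, and `cond_prob` (names introduced here), for computing joint and conditional relative frequencies. It evaluates only the example just given and leaves the exercises below to you.

```python
from fractions import Fraction

# Counts from the case-study table: (defendant, victim) -> (death penalty yes, no)
counts = {("white", "white"): (19, 132),
          ("white", "black"): (0, 9),
          ("black", "white"): (11, 52),
          ("black", "black"): (6, 97)}
N = sum(yes + no for yes, no in counts.values())   # total number of cases

def count(defendant=None, victim=None, death=None):
    """Number of cases with the given characteristics (None means 'any')."""
    total = 0
    for (d, v), (yes, no) in counts.items():
        if defendant not in (None, d) or victim not in (None, v):
            continue
        total += {None: yes + no, True: yes, False: no}[death]
    return total

def prob(**kw):
    """Empirical (relative-frequency) probability of the described cases."""
    return Fraction(count(**kw), N)

def cond_prob(event, given):
    """P(event | given) as a ratio of counts; event and given must use different keys."""
    return Fraction(count(**event, **given), count(**given))

p = prob(defendant="white", victim="white", death=True)
print(p, float(p))   # 19/326, about 0.058
```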
Define the events of interest for the case study:
A = "receiving a death penalty verdict"
B = $\bar{A}$
C = "white defendant"
D = $\bar{C}$
E = "white victim"
F = $\bar{E}$
What is the meaning of A and C being mutually exclusive?
What is the meaning of A and C being independent?
What is the meaning of C and E being mutually exclusive?
What is the meaning of C and E being independent?
If there is no discrimination, what type of relation do you expect to see between A and C? That is, do they have to be mutually exclusive, or independent, or both?
Which of the following probabilities gives you better information about discrimination: $P(A \mid C)$ or $P(C \mid A)$? Similarly, $P(A \mid D)$ or $P(D \mid A)$?
Construct the marginal table for the defendant's race and the death penalty verdict by ignoring the victim's race.
Defendant's Race | Death Penalty: Yes | Death Penalty: No | Total
White | | |
Black | | |
Total | | |
Find
P(A) =
$P(A \mid C) =$
$P(A \mid D) =$
$P(C \mid A) =$
$P(D \mid A) =$
Is there evidence of racial discrimination?
Construct the same table by controlling for the victim's race. That is, create a table of the defendant's race and the death penalty verdict for white victims, and another for black victims.
For white victims
Defendant's Race | Death Penalty: Yes | Death Penalty: No | Total
White | | |
Black | | |
Total | | |
For black victims
Defendant's Race | Death Penalty: Yes | Death Penalty: No | Total
White | | |
Black | | |
Total | | |
Find
$P(A \mid CE)$ = P("death penalty" | "white victim" and "white defendant") =
$P(A \mid DE) =$
$P(A \mid CF) =$
$P(A \mid DF) =$
Is there evidence of racial discrimination when we control for the race of the victim?
Construct a marginal table for the defendant's race and the victim's race.
Defendant's Race | Victim's Race: White | Victim's Race: Black | Total
White | | |
Black | | |
Total | | |
Find
$P(E \mid C) =$
$P(E \mid D) =$
$P(F \mid C) =$
$P(F \mid D) =$
What type of relation is there between the defendant's race and the victim's race?
I.6. Drunkard's Walk:
Random Walk:
Where is a drunk person going to end up?
Motivation: Scenario
Most
of the problems we encounter in daily living may be modeled probabilistically.
The probability models can help us to understand the nature of the problems
that we are facing better. Elementary probability models assume that the
possible outcomes for each experiment are the same and the outcome of one
experiment does not influence the outcomes of the other experiments
probabilistically. In some cases these assumptions are not valid. For example;
*
how much money you are going to have on your saving account next year depends
on how much money you have on this account this year,
*
next yearÕs interest and unemployment rates will be affected by the this yearÕs
rates,
*
a gamblerÕs fortune at a game depends on the gamblerÕs fortune at the previous
game (GamblerÕs Ruin Problem)
*
the population size of an endangered species in year 2000 depends on the
population of this species in previous years.
This leads to the big question: how do we model a phenomenon in which knowledge of the outcomes of previous experiments influences our predictions for the outcome of the next experiment?
Learning Objectives
Learn how to model chance processes in which knowledge of previous outcomes influences predictions for future experiments. After completing this activity you should be able to construct a probability model for cases where, given the past and the present, the future outcome depends only on the outcome observed in the present. You should also be able to see how different tools, such as tree diagrams, graphs, matrices, and tables, can be used to simplify the problem.
We will be learning about Markov chains, which are a type of stochastic process. A stochastic process is a collection of random variables indexed by a set T. One can represent a stochastic process as {X(t), t ∈ T}, where for each t ∈ T, X(t) is a random variable. The index t is often interpreted as time and, as a result, we refer to X(t) as the state of the process at time t. T is called the index set of the process. If the index set is countable, the stochastic process is a discrete-time process. If T is an interval of the real line, the stochastic process is said to be a continuous-time process. A discrete-time process is usually written X_{n} instead of X(t). The state space S is the set of all possible values of the random variables X(t) of the stochastic process. A realization of a stochastic process is called a sample path of the process.
As a warm-up activity, consider the Mouse in a Maze. A hungry mouse starts in cell 1, while a cat is hiding patiently in cell 7 and the cheese is in cell 9. The mouse moves from cell to cell. In the absence of learning, it chooses the next cell to visit at random from the adjoining cells. Assume that once the mouse finds the piece of cheese or the cat, it will understandably stay there forever.
Here is a sample path for this example:
Step (n) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
X_{n}    | 1 | 2 | 3 | 2 | 5 | 4 | 5 | 8 | 7
What does it mean when one says
* X_{0}=1
* X_{5}=4
* X_{8}=7
What is the state space S of this process?
Why is this a stochastic process in the first place?
Study the following matrix, which is called a one-step transition probability matrix. What ideas do you have about how this matrix was constructed and what it means?
Study the following matrix, which is called an incidence matrix. What ideas do you have about how this matrix was constructed and what it means?
Study the following graph, which is called a state diagram. What ideas do you have about how this graph was constructed and what it means?
The following matrix, I^2, is obtained by multiplying the matrix I by itself. If you do not know how to multiply two matrices, do not worry; just tell us what you think this matrix represents.
The following matrix, I^3, is obtained by multiplying three copies of the matrix I together. What do you think this matrix represents?
The following matrix, I^4, is obtained by multiplying four copies of the matrix I together. What do you think this matrix represents?
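Because the maze figure and the matrices are not reproduced in this copy, the sketch below makes an explicit assumption: the cells 1 to 9 form a 3 × 3 grid numbered row by row (1 2 3 / 4 5 6 / 7 8 9) with a door between every pair of horizontally or vertically adjacent cells. The sample path in the text is consistent with that layout. Under this assumption the sketch builds the incidence matrix I, the one-step transition matrix (cat and cheese cells absorbing, otherwise each adjoining cell equally likely), and the powers of I, whose (i, j) entry counts the routes of exactly n moves from cell i to cell j. All names are illustrative.

```python
from fractions import Fraction

# ASSUMPTION: 3 x 3 grid, cells numbered row by row, every interior door open.
CELLS = list(range(1, 10))
CAT, CHEESE = 7, 9

def neighbours(c):
    """Cells that share a door with cell c under the assumed layout."""
    r, k = divmod(c - 1, 3)                      # 0-based row and column
    steps = ((-1, 0), (1, 0), (0, -1), (0, 1))
    return [3 * (r + dr) + (k + dk) + 1
            for dr, dk in steps if 0 <= r + dr < 3 and 0 <= k + dk < 3]

def matmul(A, B):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Incidence matrix: entry (i, j) is 1 if a door joins cell i and cell j.
I = [[1 if j in neighbours(i) else 0 for j in CELLS] for i in CELLS]

# One-step transition matrix: cat and cheese cells are absorbing;
# elsewhere the mouse picks one adjoining cell with equal probability.
P = []
for i in CELLS:
    if i in (CAT, CHEESE):
        P.append([Fraction(int(i == j)) for j in CELLS])
    else:
        nb = neighbours(i)
        P.append([Fraction(1, len(nb)) if j in nb else Fraction(0) for j in CELLS])

# Powers of I count routes of exactly n moves (absorption is ignored here).
I2 = matmul(I, I)
I3 = matmul(I2, I)
I4 = matmul(I3, I)

path = [1, 2, 3, 2, 5, 4, 5, 8, 7]               # the sample path from the text
print(all(P[a - 1][b - 1] > 0 for a, b in zip(path, path[1:])))
print(I2[0])                                     # 2-move routes starting in cell 1
```

Closing doors amounts to editing `neighbours`, so the "cat on vacation" variants can be explored the same way.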
Suppose that the cat is on vacation and you are given the following incidence matrix.
Draw the maze (some of the doors between the cells are now closed).
Tell us where the cheese is.
Complete the following state diagram.
Do you think that drawing the above graph would help you answer the last two questions?
Now, let us challenge ourselves. (I do not have the slightest idea at this point whether this can be solved or not.) This time it is the cheese's turn to take a vacation. You are given the following two-step incidence matrix.
Draw the maze.
Find where the cat is.
Learning Activity
Suppose that a person is walking along a four-block stretch of Park Avenue. The person starts at corner x and, with probability 1/2, walks one block to the right or, with probability 1/2, walks one block to the left. When the person comes to the next corner, she/he again chooses her/his direction at random. The person continues until she/he reaches corner 4, which is a bar, or corner 0, which is home. If the person reaches either home or the bar, she/he stays there. Here is a graphical representation of the problem.
1. You will need a coin or some other way of simulating an event with probability 1/2. Pick a starting point 1, 2, or 3. Toss the coin. If it shows heads, move to the right; if it shows tails, move to the left. Continue tossing the coin until you reach the bar or home.
For example, if you start at 1 and get the sequence H H T H T, your position at each step will be: 2 3 2 3 2.
Since you have not yet reached the bar or home, you should continue tossing the coin.
2. a. Starting from 1, repeat the above experiment 10 times, fill out the following table, and record the number of times you ended up at home and at the bar.
Coin Toss Sequence    | Position      | Destination
H H T H T T T         | 2 3 2 3 2 1 0 | Home
                      |               |
                      |               |
                      |               |
                      |               |
                      |               |
                      |               |
                      |               |
                      |               |
                      |               |
                      |               |
Total number of homes |               |
Total number of bars  |               |
b. Starting from 2, repeat the above experiment 10 times, fill out the following table, and record the number of times you ended up at home and at the bar.
Coin Toss Sequence    | Position      | Destination
H H T H T T T         | 3 4 4 4 4 4 4 | Bar
                      |               |
                      |               |
                      |               |
                      |               |
                      |               |
                      |               |
                      |               |
                      |               |
                      |               |
                      |               |
Total number of homes |               |
Total number of bars  |               |
c. Estimate your chance of reaching home starting from 1 by using the results of the experiments. What about your chance of reaching the bar?
Estimate your chance of reaching home starting from 2 by using the results of the experiments. What about your chance of reaching the bar?
Compare your results and explain why these estimates are different.
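Ten trials give only rough estimates. As a sketch (function names and the seed are arbitrary choices of ours), a computer can repeat the same coin-toss experiment many times, so the 10-trial estimates can be compared with estimates based on 100,000 trials:

```python
import random

HOME, BAR = 0, 4

def walk(start, rng):
    """Toss a fair coin until absorbed: heads = one block right, tails = one block left."""
    pos = start
    while pos not in (HOME, BAR):
        pos += 1 if rng.random() < 0.5 else -1
    return pos

def estimate_home(start, trials, seed=2006):
    """Fraction of simulated walks from `start` that end at home."""
    rng = random.Random(seed)
    return sum(walk(start, rng) == HOME for _ in range(trials)) / trials

for start in (1, 2, 3):
    print(start, estimate_home(start, 10), estimate_home(start, 100_000))
```

With many trials the estimates settle near 3/4, 1/2 and 1/4 for starting points 1, 2 and 3, the classical gambler's-ruin values (4 − start)/4.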
3. Tree Diagrams
a. Consider the following tree, which represents all possible outcomes of tossing a coin 4 times. Follow the individual branches and determine where the drunk person will be if she/he starts at 3.
b. Since the drunk person stays put once she/he reaches either home or the bar, eliminate the unnecessary branches of the tree diagram given above.
c. Is it possible that, starting from block 3, this drunk person will never reach home or the bar? Give an example.
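The tree in part a can also be followed mechanically. This sketch (our own illustration) walks all 16 branches of the 4-toss tree from corner 3 and stops moving once home or the bar is reached, which is exactly the pruning asked for in part b:

```python
from collections import Counter
from itertools import product

HOME, BAR = 0, 4

def follow(start, tosses):
    """Follow one branch of the tree; once home or the bar is reached, stop moving."""
    pos = start
    for toss in tosses:
        if pos in (HOME, BAR):
            break                          # absorbed: the remaining tosses do not matter
        pos += 1 if toss == "H" else -1
    return pos

branches = list(product("HT", repeat=4))   # all 16 branches of the 4-toss tree
ends = Counter(follow(3, b) for b in branches)
print(sorted(ends.items()))                # (final position, number of branches)
```

Ten branches end at the bar and two each end at corner 3, corner 1 and home; dividing by 16 gives 0.625, 0.125, 0.125 and 0.125, the same values as the row for state 3 of the 4-step matrix M4 in the MINITAB session.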
4. From Trees to Matrices
As you can see, representing this problem compactly with trees is difficult: you would have to draw tree diagrams with an infinite number of branches for each starting point. Now let us search for a better way of summarizing this problem.
a. Suppose that we saw this person at one of the blocks and the person had enough time to take only one step before we came back. Where will this person be when we come back? What factors affect your answer?
b.
c. Fill out the following table: cross out the impossible steps, and for the remaining ones write down the corresponding outcome of the coin toss experiment.
                   Future Position (position after one step)
Present Position | 0 | 1 | 2 | 3 | 4
0                |   |   |   |   |
1                |   |   |   |   |
2                |   |   |   |   |
3                |   |   |   |   |
4                |   |   |   |   |
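Replace each outcome of the experiment with the probability that matches it. Now we can represent the table of probabilities in a compact form as follows (rows: present position 0 to 4; columns: future position 0 to 4):
$$P = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 & 0 \\ 0 & 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$
Interpret this matrix by referring to the table in c. This matrix is called the one-step transition matrix. States 0 and 4 are impossible to leave; such states are called absorbing states. In this problem the states that are not absorbing (1, 2, and 3) are called transient.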
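The rules "heads: one block right, tails: one block left, stay put at home or the bar" translate row by row into the one-step matrix. A minimal Python sketch; the function name and the `last_corner` parameter are illustrative (4 corners is the Park Avenue case), and a state is flagged absorbing when its diagonal entry is 1:

```python
from fractions import Fraction

def drunkard_matrix(last_corner=4):
    """One-step transition matrix on corners 0..last_corner (0 = home, last = bar)."""
    size = last_corner + 1
    P = [[Fraction(0)] * size for _ in range(size)]
    for i in range(size):
        if i in (0, last_corner):
            P[i][i] = Fraction(1)            # absorbing: stays home / at the bar
        else:
            P[i][i - 1] = Fraction(1, 2)     # tails: one block to the left
            P[i][i + 1] = Fraction(1, 2)     # heads: one block to the right
    return P

P = drunkard_matrix()
absorbing = [i for i in range(len(P)) if P[i][i] == 1]
for row in P:
    print([str(x) for x in row])
print("absorbing states:", absorbing)
```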
5. Markov Property
Suppose that the drunk person was at block 1, then 2, then 1, then 2, and is now at block 3. You want to know where the person will be after the next step. Do you need to know that the person was at corners 1, 2, 1, 2 to determine where the person will be after leaving block 3?
Suppose that at present the person is at block 3. Does it make any difference whether this is the person's first, second, or tenth visit to block 3 when determining where the person will be after the next step?
6. What Will Happen After Two or More Steps?
Suppose that the drunk person is at block 1. Where could this person be after two steps? Construct a tree diagram to show the possibilities. As an example, we have given below a tree diagram for the drunk person starting at block 1.
Fill out the following table: cross out the impossible steps, and for the remaining ones write down the corresponding outcome of the coin toss experiment. Note that this time we have to toss the coin two times.
                   Position after two steps
Present Position | 0  | 1 | 2        | 3 | 4
0                |    |   |          |   |
1                |    |   |          |   |
2                | TT | - | TH or HT | - | HH
3                |    |   |          |   |
4                |    |   |          |   |
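Replace each outcome of the experiment with the probability that matches it. Now we can represent the table of probabilities in a compact form as follows:
$$P^{(2)} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 1/2 & 1/4 & 0 & 1/4 & 0 \\ 1/4 & 0 & 1/2 & 0 & 1/4 \\ 0 & 1/4 & 0 & 1/4 & 1/2 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$
Interpret this matrix, which is called the two-step transition matrix. Verify that
$$P^{(2)} = P \cdot P = P^{2}.$$
In fact, if you want to know what will happen probabilistically after n steps, the n-step transition matrix will be:
$$P^{(n)} = P^{n}.$$
By using a calculator or MINITAB, find the 4-step transition matrix and interpret it.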
MTB > read c1-c5
DATA> 1 0 0 0 0
DATA> 0.5 0 0.5 0 0
DATA> 0 0.5 0 0.5 0
DATA> 0 0 0.5 0 0.5
DATA> 0 0 0 0 1
DATA> end
5 ROWS READ
MTB > copy c1-c5 m1
MTB > print m1
MATRIX M1
1.0 0.0 0.0 0.0 0.0
0.5 0.0 0.5 0.0 0.0
0.0 0.5 0.0 0.5 0.0
0.0 0.0 0.5 0.0 0.5
0.0 0.0 0.0 0.0 1.0
MTB > multiply m1 m1 m2
MTB > print m2
MATRIX M2
1.00 0.00 0.00 0.00 0.00
0.50 0.25 0.00 0.25 0.00
0.25 0.00 0.50 0.00 0.25
0.00 0.25 0.00 0.25 0.50
0.00 0.00 0.00 0.00 1.00
MTB > multiply m1 m2 m3
MTB > print m3
MATRIX M3
1.000 0.000 0.000 0.000 0.000
0.625 0.000 0.250 0.000 0.125
0.250 0.250 0.000 0.250 0.250
0.125 0.000 0.250 0.000 0.625
0.000 0.000 0.000 0.000 1.000
MTB > multiply m1 m3 m4
MTB > print m4
MATRIX M4
1.000 0.000 0.000 0.000 0.000
0.625 0.125 0.000 0.125 0.125
0.375 0.000 0.250 0.000 0.375
0.125 0.125 0.000 0.125 0.625
0.000 0.000 0.000 0.000 1.000
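If MINITAB is not available, the same computation can be sketched in plain Python. This is our own equivalent of the session above, not a MINITAB feature: exact fractions avoid rounding, and `matmul(P, result)` mirrors `multiply m1 m3 m4`.

```python
from fractions import Fraction

h = Fraction(1, 2)
# The same one-step matrix that is read into c1-c5 and copied to M1
# (rows and columns are states 0, 1, 2, 3, 4).
M1 = [[1, 0, 0, 0, 0],
      [h, 0, h, 0, 0],
      [0, h, 0, h, 0],
      [0, 0, h, 0, h],
      [0, 0, 0, 0, 1]]

def matmul(A, B):
    """Ordinary matrix product A B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def n_step(P, n):
    """P^(n) = P^n for n >= 1, built as P(P(...P)) like the MINITAB session."""
    result = P
    for _ in range(n - 1):
        result = matmul(P, result)
    return result

M4 = n_step(M1, 4)
for row in M4:
    print([float(x) for x in row])
```

Trying `n_step(M1, 50)` shows the probability of still wandering shrinking toward 0, with rows 1, 2 and 3 approaching (3/4, 0, 0, 0, 1/4), (1/2, 0, 0, 0, 1/2) and (1/4, 0, 0, 0, 3/4).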
Ongoing Assessment and Wrap-up
7. Discuss the implications of the Markov property. List the possible tools that can be used to model a problem. Which one of these tools is the most powerful in your opinion? Why?
8. Give some examples where this process can be used in the real world.
Extensions
9. (Forecasting the weather) The chance of rain tomorrow depends only on today's weather. If it rains today, then it will rain tomorrow with probability 0.7, and if it does not rain today, then it will rain tomorrow with probability 0.4. Form the one-step transition matrix.
Find the two-step transition probability matrix. What is the probability that it will rain two days from today given that it is raining today?
Find the four-step transition probability matrix and interpret it. Do you notice any interesting pattern in the probabilities?
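To check your hand calculations, here is a short sketch in exact arithmetic. The state order (0 = rain, 1 = no rain) is our own labeling choice; the two- and four-step matrices are obtained as P·P and P²·P².

```python
from fractions import Fraction

# State 0 = rain, state 1 = no rain (our labeling).
P = [[Fraction(7, 10), Fraction(3, 10)],
     [Fraction(4, 10), Fraction(6, 10)]]

def matmul(A, B):
    """Ordinary matrix product A B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

P2 = matmul(P, P)
P4 = matmul(P2, P2)
print("P(rain in two days | rain today) =", float(P2[0][0]))
for row in P4:
    print([round(float(x), 4) for x in row])
```

Notice how close the two rows of the four-step matrix already are.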
Answers:
10. (War of long distance companies) It has been observed that customers switch from one long distance company to another according to the following transition matrix.
Find the two-step transition matrix and interpret the one- and two-step transition matrices. Find the probability that a customer who uses MCI today will again be using MCI two months from today, assuming that a customer can switch companies once a month.
Find the twelve-step transition matrix. Interpret the matrix.
Answer:
11. (Gossip spreading) If a member of a community is entrusted with some news, she/he will tell it to her/his friends truly and without any changes with probability p. The probability that she/he will twist it around and tell the opposite, 1 − p, is really negligible: the probability of such misconduct is less than one in a billion. Therefore, if a member of the community receives news, she/he will tell it to another member truly and unchanged with probability p, and she/he will twist it and tell its opposite with probability 1 − p. Let
T = "the true statement about the young person's dog"
N = "the negation of the true statement about the young person's dog".
Construct the one-step transition matrix. Interpret the matrix. Find the ten-, twenty-, and one-hundred-step transition probability matrices. Do you see anything interesting? How are the probabilities changing? Can you say that, in general, only half of the people in this community are telling the truth?
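With the text's p (within a billionth of 1), the 100-step matrix is still almost the identity, so the trend is hard to see. The sketch below therefore uses a hypothetical p = 0.9 (try p = 1 - 1e-9 as well). It computes n-step matrices by repeated squaring; for this symmetric two-state chain the diagonal entries of the n-step matrix equal 1/2 + (2p − 1)^n / 2, which the test checks.

```python
def matmul(A, B):
    """Ordinary matrix product A B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def n_step(P, n):
    """P^n by repeated squaring, so n = 100 (or far larger) needs few multiplications."""
    result = [[1.0, 0.0], [0.0, 1.0]]           # P^0 = identity
    while n > 0:
        if n % 2 == 1:
            result = matmul(result, P)
        P = matmul(P, P)
        n //= 2
    return result

p = 0.9   # HYPOTHETICAL value, far from the text's p, so the trend shows quickly
P = [[p, 1 - p],        # rows and columns in the order T, N
     [1 - p, p]]
for n in (10, 20, 100):
    print(n, [[round(x, 4) for x in row] for row in n_step(P, n)])
```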
12. Revisit the Mouse in a Maze example and place the cat so that the chance that the mouse ends up with the cheese is lower than the chance that the cat catches the mouse.
I.7. A Markov Chain
Activity
MARKOV CHAINS
Objective: To learn how to set up a transition probability matrix and construct an appropriate state diagram.
Activity:
1. Consider a farmer using an old tractor. The tractor
is often in the repair shop but it always takes only one day to get it running
again. The first day out of the shop it always works but on any given day
thereafter, independently of its previous history, there is a 10% chance of it
not working and thus being sent back to the shop.
Let X_{0}, X_{1},
... be random variables denoting the daily condition of the tractor, where a
one denotes the working condition and a zero denotes the failed condition.
Explain in words
"X_{n}=1"
"X_{n}=0"
Write the one-step transition probability matrix
P=
Here is the state diagram
for this Markov chain:
2. A salesperson lives in town a and is responsible for towns a, b, and c. Each week s/he is required to visit a different town. When s/he is in her/his home town, it makes no difference which town s/he visits next, so s/he flips a coin: if it shows heads s/he goes to b, and if tails s/he goes to c. However, after spending a week away from home s/he has a slight preference for going home, so when s/he is in either town b or c, s/he flips two coins. If two heads occur, then s/he goes to the other town; otherwise s/he goes to a.
Define
X_{0}=
X_{1}=
X_{16}=
Explain in words
"X_{n}=a"
"X_{n}=b"
"X_{n}=c"
Write the one-step transition probability matrix
P=
Construct a state diagram for this Markov chain:
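To check your matrix, the sketch below (names are illustrative) lists the equally likely coin outcomes described in the text and turns each town's outcomes into one row of the one-step transition matrix:

```python
from fractions import Fraction
from itertools import product

TOWNS = "abc"

def next_town_distribution(town):
    """Turn the coin rules in the text into one row of the transition matrix."""
    if town == "a":
        # one fair coin: heads -> b, tails -> c
        outcomes = ["b" if coin == "H" else "c" for coin in "HT"]
    else:
        other = "c" if town == "b" else "b"
        # two fair coins: two heads -> the other town, anything else -> home (a)
        outcomes = [other if coins == ("H", "H") else "a"
                    for coins in product("HT", repeat=2)]
    row = {t: Fraction(0) for t in TOWNS}
    for t in outcomes:                       # the coin outcomes are equally likely
        row[t] += Fraction(1, len(outcomes))
    return row

P = {town: next_town_distribution(town) for town in TOWNS}
for town in TOWNS:
    print(town, [str(P[town][t]) for t in TOWNS])
```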
3. Let {X_{n}; n=0, 1, ...} be a Markov chain
with state space {1, 2, 3, 4} and transition probabilities given by
P=
Construct a state diagram
for this Markov chain:
Starting from state 4, is it possible never to come back to state 4? Give an example.
What happens if you start in state 1?
4. This example illustrates that n need not refer to time. Consider a page of text and represent vowels by zeros and consonants by ones. Thus the page becomes a string of zeros and ones. It has been indicated that the sequence of vowels and consonants in the Samoan language forms a Markov chain, where a vowel always follows a consonant and a vowel follows another vowel with probability 0.51.
Write the one-step transition probability matrix
P=
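Given a real page coded as zeros and ones, the one-step probabilities could be estimated by counting adjacent pairs: the fraction of 0's followed by 0 estimates P(vowel → vowel), and so on. No Samoan text is included here, so the sketch first generates a hypothetical 0/1 string from the stated probabilities (position in the string plays the role of n) and then recovers them by counting.

```python
import random
from collections import Counter

VOWEL, CONSONANT = 0, 1

def simulate_text(length, seed=0):
    """HYPOTHETICAL 0/1 'text' generated from the stated chain."""
    rng = random.Random(seed)
    s = [VOWEL]                                   # assumption: the text starts with a vowel
    for _ in range(length - 1):
        if s[-1] == CONSONANT:
            s.append(VOWEL)                       # a vowel always follows a consonant
        else:
            s.append(VOWEL if rng.random() < 0.51 else CONSONANT)
    return s

def estimate_transitions(seq):
    """Estimate P[i][j] as the fraction of symbols i immediately followed by j."""
    pairs = Counter(zip(seq, seq[1:]))
    return [[pairs[(i, j)] / (pairs[(i, 0)] + pairs[(i, 1)]) for j in (0, 1)]
            for i in (0, 1)]

est = estimate_transitions(simulate_text(50_000))
print([[round(x, 3) for x in row] for row in est])
```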
ASSESSMENT
Each morning an individual
leaves her house and goes for a run. She is equally likely to leave either from
her front or back door. Upon leaving the house, she chooses a pair of running
shoes (or goes running barefoot if there are no shoes at the door from which
she departed). On her return she is equally likely to enter, and leave her
running shoes, either by the front or back door. Suppose that she owns a total
of 5 pairs of running shoes.
Let X_{n} denote the number of
pairs of shoes at the door the runner departs from at the beginning of day n.
Write the one-step transition probability matrix
P=
Random Variables
For this chapter we need some mathematical facts about functions. These facts will help us understand the properties of the probabilistic models that we are going to develop. A function f from a set D to a set E is a correspondence that assigns to each element of D a unique element of E.
Exercise 1. Suppose that a coin is tossed three times, leading to the sample space S={HHH,HTT,THT,TTH,THH,HTH,HHT,TTT}. Define X = number of heads observed when we toss the coin three times. Is X a function? If yes, what are D and E?
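A Python dictionary is a convenient picture of a function: every key (an element of D) is mapped to exactly one value. The sketch below is our own illustration for Exercise 1; note that several outcomes may share a value, which a function allows.

```python
from collections import Counter

S = ["HHH", "HTT", "THT", "TTH", "THH", "HTH", "HHT", "TTT"]   # sample space from Exercise 1

# X assigns to each outcome in S exactly one number: its count of heads.
X = {outcome: outcome.count("H") for outcome in S}

print(X)
print(sorted(set(X.values())))        # the values X actually takes
print(Counter(X.values()))            # how many outcomes share each value
```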
To
be able to understand and derive the properties of the discrete probabilistic
models we need the following facts from discrete mathematics.
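Binomial Theorem: The quantity (a+b)^n can be expressed in the form
$$(a+b)^{n} = \sum_{i=0}^{n} \binom{n}{i} a^{i} b^{n-i},$$
where
$$\binom{n}{i} = \frac{n!}{i!\,(n-i)!}, \quad i = 0, \ldots, n.$$
For example, we have
$(a+b)^{0} = 1$
$(a+b)^{1} = a+b$
$(a+b)^{2} = a^{2}+2ab+b^{2}$
$(a+b)^{3} = a^{3}+3a^{2}b+3ab^{2}+b^{3}$.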
Note: In probability we will use the binomial theorem not to expand (a+b)^n but to evaluate sums of the form $\sum_{i=0}^{n}\binom{n}{i}a^{i}b^{n-i}$. For example, can you figure out the value of the following sum?
= ? = Hint:
a=?=
, b=?= 
What
about the following sum?
= ? = Maybe
we should look for a better way of evaluating this sum? 
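A quick numerical check of the binomial theorem. The values n = 10 and p = 0.3 are hypothetical; taking a = p and b = 1 − p is one illustration of the kind of sum meant above, since (p + (1 − p))^n = 1.

```python
from math import comb

def binomial_sum(n, a, b):
    """Right-hand side of the binomial theorem: sum of C(n, i) a^i b^(n-i)."""
    return sum(comb(n, i) * a ** i * b ** (n - i) for i in range(n + 1))

n, p = 10, 0.3                                   # hypothetical values
print(binomial_sum(n, 2, 5), (2 + 5) ** n)       # integers: exact agreement
print(binomial_sum(n, p, 1 - p))                 # (p + (1 - p))^n = 1, up to rounding
```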
Geometric Series: Let a ≠ 0. The series
$$\sum_{i=0}^{\infty} ar^{i} = a + ar + ar^{2} + ar^{3} + \cdots + ar^{n} + \cdots$$
is called a geometric series, which
(i) converges and has the sum $\frac{a}{1-r}$ if |r| < 1;
(ii) diverges if |r| ≥ 1.
In probability, the geometric series is used for a discrete distribution called the geometric distribution. Can you find the value of the following sum?
= ? = Note that the lower limit of the sum starts from 1, not 0 as in the geometric series. Maybe we should define i = n − 1. Then, what is a = ? = , r = ? =
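A numerical sketch of the convergence claim: partial sums approach a/(1 − r) when |r| < 1 and grow without bound when |r| ≥ 1. The values of a and r are hypothetical.

```python
def geometric_partial_sum(a, r, n_terms):
    """a + a r + a r^2 + ... + a r^(n_terms - 1)."""
    return sum(a * r ** i for i in range(n_terms))

a, r = 0.3, 0.7                        # hypothetical values with |r| < 1
for n in (5, 20, 100):
    print(n, geometric_partial_sum(a, r, n))
print("a / (1 - r) =", a / (1 - r))
print("r = 1.1:", [round(geometric_partial_sum(a, 1.1, n), 1) for n in (5, 20, 100)])
```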
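Maclaurin Series: One of the Maclaurin series used in probability is the following:
$$e^{a} = \sum_{i=0}^{\infty} \frac{a^{i}}{i!} = 1 + a + \frac{a^{2}}{2!} + \frac{a^{3}}{3!} + \cdots.$$
By using the Maclaurin series, can you find the value of the following sum?
$$\sum_{i=0}^{\infty} \frac{\lambda^{i} e^{-\lambda}}{i!} = ? =$$
What about the following sum, which is used to find the expected value of a Poisson random variable?
$$\sum_{i=0}^{\infty} i\,\frac{\lambda^{i} e^{-\lambda}}{i!} = ? =$$
Note that $\frac{i}{i!} = \frac{1}{(i-1)!}$ for i ≥ 1, so take one of the λ's out of the summation and ...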
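A numerical check, for a hypothetical λ = 2.5 and truncating the infinite sums at 60 terms (the neglected terms are far below rounding error): the Maclaurin partial sum matches e^λ, the Poisson probabilities add to 1, and their weighted sum gives λ.

```python
from math import exp, factorial

def maclaurin_exp(a, n_terms=60):
    """Partial sum 1 + a + a^2/2! + ... of the Maclaurin series of e^a."""
    return sum(a ** i / factorial(i) for i in range(n_terms))

lam = 2.5                                          # hypothetical Poisson parameter
pmf = [lam ** i * exp(-lam) / factorial(i) for i in range(60)]
mean = sum(i * p for i, p in enumerate(pmf))
print(maclaurin_exp(lam), exp(lam))
print(sum(pmf))                                    # e^{-lam} e^{lam} = 1
print(mean)                                        # equals lam
```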
The Derivative of a Function:
One way of understanding the behavior of a function f(x) is to look at its derivative. With each function f(x) we associate another function, called "the derivative of f(x)". This derivative is a formula for the rate at which f(x) changes as x changes, and it measures the steepness of the graph of f(x). Here are some of the formulas for derivatives that you should know (g(x) and h(x) are differentiable functions, and a and b are arbitrary constants):
function f(x)  | its derivative f'(x)
$ax+b$         | $a$
$ax^{n}+b$     | $nax^{n-1}$
$e^{ax}+b$     | $ae^{ax}$
$e^{g(x)}+b$   | $g'(x)e^{g(x)}$
$a\ln(x)+b$    | $a/x$
$h(x)g(x)$     | $h'(x)g(x)+h(x)g'(x)$
Table 1. Table of Derivatives for Some Functions.
A derivative of an integral, viewed as a function of its upper limit of integration, can be found by using the following relation (f(x) is a "nicely behaved" function):
$$\frac{d}{dx}\int_{a}^{x} f(t)\,dt = f(x).$$
Derivatives of the Composite Functions: The Chain Rule
If f = f(u) is a function of the input variable u and u = u(x) is a function of the input variable x, then
$$\frac{d}{dx} f(u(x)) = \frac{df}{du}\,\frac{du}{dx}.$$
This
rule is useful in finding derivatives of functions with a complicated
relationship.
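The chain rule can be checked numerically with a central-difference approximation. The sketch uses the hypothetical example f(u) = e^u with u = g(x) = x², for which the rule gives d/dx e^{g(x)} = g'(x) e^{g(x)} = 2x e^{x²}.

```python
from math import exp

def derivative(f, x, h=1e-6):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def inner(x):          # u = g(x) = x^2 (hypothetical example)
    return x ** 2

def outer(u):          # f(u) = e^u
    return exp(u)

x0 = 0.7
numeric = derivative(lambda x: outer(inner(x)), x0)
chain_rule = exp(inner(x0)) * (2 * x0)     # (df/du at u = g(x0)) * (du/dx at x0)
print(numeric, chain_rule)
```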
Exercise
2. Find the derivatives of the
following functions
(a) F(x)=1  le
(b) M(t)=
(c)
M(t)=
(d) F(x) = dz.
The Definite Integral of a Function over an Interval: The definite integral of a function over an interval of x-values, say a ≤ x ≤ b, is a number that measures the amount of f(x) aggregated over that interval.
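Facts:
(i) If f(a) exists, then $\int_{a}^{a} f(x)\,dx = 0$.
(ii) If f is integrable and f(x) ≥ 0 for every x in [a,b], then the area A of the region under the graph of f from a to b is
$$A = \int_{a}^{b} f(x)\,dx.$$
(iii) For a < c < b,
$$\int_{a}^{b} f(x)\,dx = \int_{a}^{c} f(x)\,dx + \int_{c}^{b} f(x)\,dx.$$
(iv) If f(x) ≥ 0 and f is integrable on [a,b], then
$$\int_{a}^{b} f(x)\,dx \ge 0.$$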
Antidifferentiation is the reverse of differentiation; that is, antidifferentiation is the process of finding a formula for a quantity when you know its rate of change. The process of antidifferentiation is commonly called integration or indefinite integration. The absence of limits at the top and bottom of the integration symbol indicates that an antiderivative, not a definite integral, is called for.
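$\int u\,dv = uv - \int v\,du$
$\int u^{n}\,du = \frac{u^{n+1}}{n+1} + C, \quad n \ne -1$
$\int \frac{du}{u} = \ln|u| + C$
$\int e^{u}\,du = e^{u} + C$
$\int a^{u}\,du = \frac{a^{u}}{\ln a} + C, \quad a > 0,\ a \ne 1$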
= tan+ C 
= ln + C 
= ln + C 
Table 2. Table of Integrals
Evaluation of the Indefinite Integrals: By reversing the table of derivatives (Table 1), we obtain a list of antiderivatives of many basic types of functions. More complicated functions need to be re-expressed in a simpler form by algebraic manipulations or substitutions.
Evaluation of the definite integral by using antiderivatives: this can be done by taking the difference of the values of an antiderivative of f(x) (if one can be determined) at x = b and x = a. That is,
$$\int_{a}^{b} f(x)\,dx = F(x)\Big|_{a}^{b} = F(b) - F(a),$$
where
$$F(x) = \int f(x)\,dx.$$
Note that for the definite integral the constant in the antiderivative disappears, since (F(b) + C) − (F(a) + C) = F(b) − F(a).
The Three Elementary Methods of Integration. Given an integral to evaluate whose formula is not in the table of integrals, what should we do? Often we can obtain the result we seek by using one of the three elementary methods of integration from calculus. Do you remember them? 1) Ask your roommate; 2) ask the instructor; and 3) look it up. That is only partly facetious. There are extensive tables of integrals in the CRC Handbook and in other sources, and "Mathematica" is a powerful piece of computer software that will readily evaluate integrals far more difficult than anything we are likely to encounter. Nonetheless, the three elementary methods are useful tools with which we should be familiar. They are 1) integration by substitution; 2) integration by parts; and 3) integration by partial fractions. In this course we will mostly use the first two methods, which are explained below.
Integration by Substitution.
Indefinite Integral
I will explain the method for the following trivial integral,
$$\int \frac{1}{3x+5}\,dx.$$
Step 1. Find a u = u(x) such that, when we plug in u, the unfamiliar integral transforms into one we recognize.
Step 2. Find the relation between dx and du, and replace the former by the latter.
Step 3. Find the integral (antiderivative) of the recognizable form.
Step 4. Plug u(x) back into your result.
We know that $\int \frac{du}{u} = \ln|u| + C$, so a reasonable choice (recognize u(x)) is u(x) = 3x+5, and (relation between dx and du) du = 3dx ⇒ dx = (1/3)du. Replacing the former by the latter and finding the antiderivative,
$$\int \frac{1}{3x+5}\,dx = \frac{1}{3}\int \frac{du}{u} = \frac{1}{3}\ln|u| + C,$$
and putting back u(x),
$$\int \frac{1}{3x+5}\,dx = \frac{1}{3}\ln|3x+5| + C.$$
Definite Integral
I will explain the method for the following trivial integral,
$$\int_{0}^{2} \frac{1}{3x+5}\,dx.$$
Step 1. Find a u = u(x) such that, when we plug in u, the unfamiliar integral transforms into one we recognize.
Step 2. Find the relation between dx and du, and replace the former by the latter.
Step 3. Find the new limits in terms of u.
Step 4. Find the integral (antiderivative) of the recognizable form.
Step 5. Find the value of the definite integral.
We know that $\int \frac{du}{u} = \ln|u| + C$, so a reasonable choice (recognize u(x)) is u(x) = 3x+5, and (relation between dx and du) du = 3dx ⇒ dx = (1/3)du. Replacing the former by the latter, finding the new limits (x = 0 gives u = 5 and x = 2 gives u = 11), and finding the antiderivative,
$$\int_{0}^{2} \frac{1}{3x+5}\,dx = \frac{1}{3}\int_{5}^{11} \frac{du}{u} = \frac{1}{3}\ln|u|\,\Big|_{5}^{11} = \frac{1}{3}\{\ln 11 - \ln 5\} = \frac{1}{3}\ln\frac{11}{5}.$$
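A numerical integration is a handy way to confirm a substitution. This sketch applies composite Simpson's rule (our choice of method) to ∫₀² dx/(3x+5) and compares the result with (1/3) ln(11/5):

```python
from math import log

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

numeric = simpson(lambda x: 1 / (3 * x + 5), 0, 2)
exact = log(11 / 5) / 3          # (1/3)(ln 11 - ln 5) from the substitution u = 3x + 5
print(numeric, exact)
```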
Integration by Parts. (Attention: This is the method that we are going to use most frequently.)
This method is based on the simple rule for finding the differential of a product:
d(uv) = v du + u dv.
Rearranging and integrating,
$$\int u\,dv = uv - \int v\,du,$$
it being understood that if these are not definite integrals, we must add an integration constant. We may be stumped by the integral on the left, but in our mind's eye we see a way to split it into two parts, u and dv, such that we can evaluate the integral on the right.
Here is an example and the steps that you need for this method:
integral | $\int x e^{x}\,dx$
define u and dv | u = x and dv = e^{x}dx (note that we know how to find the antiderivative of dv)
find du and v | du = 1dx and $v = \int e^{x}\,dx = e^{x}$
plug u, du, v, and dv into the formula for integration by parts, $\int u\,dv = uv - \int v\,du$ | $\int x e^{x}\,dx = xe^{x} - \int e^{x}\,dx$
and the result | $\int x e^{x}\,dx = xe^{x} - e^{x} + C$
For some problems it is necessary to use integration by parts repeatedly. For such problems it is more useful to produce reduction formulae. For example, using integration by parts with u = x^n and dv = e^x dx, we can establish that
$$\int x^{n} e^{x}\,dx = x^{n}e^{x} - n\int x^{n-1} e^{x}\,dx.$$
This formula can be used repeatedly until all powers of x have been removed.
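Evaluating the reduction formula for x^n e^x between 0 and 1 gives I_n = e − n I_{n−1} for n ≥ 1, with I_0 = ∫₀¹ e^x dx = e − 1, where I_n denotes ∫₀¹ x^n e^x dx (our notation). The sketch applies this recursion and compares it with Simpson's rule. One caveat: each step multiplies earlier rounding error by k, so the forward recursion is fine for small n but unreliable for large n.

```python
from math import e, exp

def reduction(n):
    """I_n = integral of x^n e^x over [0, 1], via I_n = e - n I_(n-1) and I_0 = e - 1."""
    value = e - 1
    for k in range(1, n + 1):
        value = e - k * value
    return value

def simpson(f, a, b, m=1000):
    """Composite Simpson's rule with an even number m of subintervals."""
    h = (b - a) / m
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, m))
    return s * h / 3

for n in range(5):
    print(n, reduction(n), simpson(lambda x, n=n: x ** n * exp(x), 0, 1))
```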
In this section, we simply remind you of some of the techniques for calculating limits that we find most useful in probability theory. These techniques will be useful in understanding the Central Limit Theorem and in finding moments by using moment generating functions.
Early in a calculus course the existence of the following limit is discussed, and its value is denoted by the letter e:
$$e = \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^{n} = \lim_{h\to 0}(1+h)^{1/h}.$$
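A quick numerical look at how slowly (1 + 1/n)^n approaches e; the error shrinks roughly like e/(2n).

```python
from math import e

for n in (1, 10, 100, 10_000, 1_000_000):
    approx = (1 + 1 / n) ** n
    print(f"n = {n:>9,}: (1 + 1/n)^n = {approx:.8f}   e - approx = {e - approx:.2e}")
```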
Often it is rather easy to see the value of certain limits. For example, with −1 < r < 1, the sum of the geometric progression allows us to write
$$\lim_{n\to\infty}\sum_{k=0}^{n-1} ar^{k} = \lim_{n\to\infty}\frac{a(1-r^{n})}{1-r} = \frac{a}{1-r}.$$
Since $\lim_{n\to\infty} r^{n} = 0$, the limit of the ratio can be easily found.
However, it is not that easy to determine the limit of every ratio; for example, consider
$$\lim_{b\to\infty} b e^{-b} = \lim_{b\to\infty}\frac{b}{e^{b}}.$$
Since both the numerator and the denominator of the latter ratio are unbounded, we will not be able to find the limit easily.
Consider the ratio f(x)/g(x). If f(x) and g(x) both have limit ∞ or −∞ as x approaches c, we say that f(x)/g(x) has the indeterminate form ∞/∞ at x = c. If the limits of both functions are 0, then we say that f(x)/g(x) has the indeterminate form 0/0 at x = c. L'Hospital's Rule can be applied to indeterminate forms to find the limit of the ratio.
L'Hospital's Rule: Suppose the functions f and g are differentiable on an open interval (a,b) containing c, except possibly at c itself. If g'(x) ≠ 0 for x ≠ c, and if f(x)/g(x) has the indeterminate form 0/0 or ∞/∞ at x = c, then
$$\lim_{x\to c}\frac{f(x)}{g(x)} = \lim_{x\to c}\frac{f'(x)}{g'(x)},$$
provided that $\lim_{x\to c}[f'(x)/g'(x)]$ exists or $\lim_{x\to c}[f'(x)/g'(x)] = \pm\infty$.
Therefore, $\lim_{b\to\infty} b/e^{b}$ can be found by taking the limit of the ratio of the derivative of the numerator to the derivative of the denominator. We have
$$\lim_{b\to\infty}\frac{b}{e^{b}} = \lim_{b\to\infty}\frac{1}{e^{b}} = 0.$$
Note that you can use L'Hospital's rule more than once.
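Some Results Involving Multivariate Calculus
In this section we only make some suggestions about functions of two variables, say
z = g(x, y).
But these results can be extended to more than two variables. Usually all the results that we have learned for the one-variable case can be used for the two-variable case by treating the "other" variable as a constant. The two first partial derivatives with respect to x and y, denoted by ∂z/∂x and ∂z/∂y, can be found in the usual manner of differentiating by treating the "other" variable as a constant. For illustration, consider the following simple examples: if
z = f(x,y) = xy,
then
$$\frac{\partial z}{\partial x} = \frac{\partial f}{\partial x} = \frac{\partial (xy)}{\partial x} = y$$
and
$$\frac{\partial}{\partial x}\left(e^{x^{2}y}\right) = \left(e^{x^{2}y}\right)(2xy).$$
The second partial derivatives are simply first partial derivatives of the first partial derivatives. If z = xy, then
$$\frac{\partial}{\partial y}\left(\frac{\partial z}{\partial x}\right) = \frac{\partial y}{\partial y} = 1.$$
For notation we use
$$\frac{\partial}{\partial x}\left(\frac{\partial z}{\partial x}\right) = \frac{\partial^{2} z}{\partial x^{2}},$$
$$\frac{\partial}{\partial y}\left(\frac{\partial z}{\partial x}\right) = \frac{\partial^{2} z}{\partial y\,\partial x},$$
$$\frac{\partial}{\partial x}\left(\frac{\partial z}{\partial y}\right) = \frac{\partial^{2} z}{\partial x\,\partial y},$$
$$\frac{\partial}{\partial y}\left(\frac{\partial z}{\partial y}\right) = \frac{\partial^{2} z}{\partial y^{2}}.$$
In general,
$$\frac{\partial^{2} z}{\partial x\,\partial y} = \frac{\partial^{2} z}{\partial y\,\partial x},$$
provided that the partial derivatives involved are continuous functions.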
The value of the double integral
$$\iint_{A} f(x,y)\,dx\,dy$$
can usually be evaluated by an iterated procedure, that is, by evaluating two successive single integrals. For illustration, say A = {(x,y): 0 ≤ x ≤ 1, 0 ≤ y ≤ x}. Then
$$\iint_{A} f(x,y)\,dx\,dy = \int_{0}^{1}\left[\int_{0}^{x} f(x,y)\,dy\right]dx.$$
When placing the limits on the iterated integral, note that for each fixed x between zero and one, y is restricted to the interval zero to x (see the figure below). Also, in the inner integral on y, x is treated as a constant.
In evaluating this double integral we could have restricted y to the interval zero to one; then x would be between y and one. That is, we would have evaluated the iterated integral
$$\int_{0}^{1}\left[\int_{y}^{1} f(x,y)\,dx\right]dy.$$
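Both orders of integration describe the same triangle A, so they must give the same number. This sketch checks that with a midpoint rule (our choice of method) in both directions, for the hypothetical integrand f(x, y) = xy, whose integral over A is ∫₀¹ x · x²/2 dx = 1/8:

```python
def f(x, y):
    return x * y                       # HYPOTHETICAL integrand; its integral over A is 1/8

def dy_then_dx(f, m=400):
    """Outer x in [0, 1]; inner y in [0, x] (midpoint rule in both directions)."""
    hx = 1 / m
    total = 0.0
    for i in range(m):
        x = (i + 0.5) * hx
        hy = x / m
        total += hx * hy * sum(f(x, (j + 0.5) * hy) for j in range(m))
    return total

def dx_then_dy(f, m=400):
    """Outer y in [0, 1]; inner x in [y, 1]: the same region A described the other way."""
    hy = 1 / m
    total = 0.0
    for j in range(m):
        y = (j + 0.5) * hy
        hx = (1 - y) / m
        total += hy * hx * sum(f(y + (i + 0.5) * hx, y) for i in range(m))
    return total

print(dy_then_dx(f), dx_then_dy(f))
```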
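Finally, we will look at change of variables in a double integral
$$\iint_{A} f(x_1,x_2)\,dx_1\,dx_2.$$
If $f(x_1,x_2)$ is a joint probability density function of $X_1$ and $X_2$, then the above double integral represents $P[(X_1,X_2) \in A]$. Consider only one-to-one transformations, say $y_1 = g_1(x_1,x_2)$ and $y_2 = g_2(x_1,x_2)$, with inverse transformations given by $x_1 = h_1(y_1,y_2)$ and $x_2 = h_2(y_1,y_2)$. The determinant of order 2
$$J(y_1,y_2) = \begin{vmatrix} \partial x_1/\partial y_1 & \partial x_1/\partial y_2 \\ \partial x_2/\partial y_1 & \partial x_2/\partial y_2 \end{vmatrix}$$
is called the Jacobian of the transformation, and
$$J(x_1,x_2) = \begin{vmatrix} \partial y_1/\partial x_1 & \partial y_1/\partial x_2 \\ \partial y_2/\partial x_1 & \partial y_2/\partial x_2 \end{vmatrix}$$
is called the Jacobian of the inverse transformation.
Moreover, say the region A maps onto B in the $(y_1,y_2)$ space. Since we are usually dealing with probabilities in this course, we fix the sign of the integral so that it is positive by using the absolute value of the Jacobian. Then it is true that
$$\iint_{A} f(x_1,x_2)\,dx_1\,dx_2 = \iint_{B} f[h_1(y_1,y_2),\,h_2(y_1,y_2)]\,|J(y_1,y_2)|\,dy_1\,dy_2.$$
That is, $P[(X_1,X_2) \in A] = P[(Y_1,Y_2) \in B]$.
Note that $J(y_1,y_2) = 1/J(x_1,x_2)$, and the right side is often easier to obtain than solving for $x_1$ and $x_2$ in terms of $y_1$ and $y_2$ or differentiating implicitly.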
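The reciprocal relation between the two Jacobians can be checked numerically. The sketch uses the transformation of Exercise 3(l), y₁ = x₁ − x₂ and y₂ = x₁ + x₂, with inverse x₁ = (y₁ + y₂)/2 and x₂ = (y₂ − y₁)/2, and approximates each 2 × 2 determinant of partial derivatives by central differences (our choice; the point (0.3, 1.2) is arbitrary).

```python
def jacobian(F, p, h=1e-6):
    """Numerical 2 x 2 Jacobian determinant of F: (u, v) -> (F1, F2) at the point p."""
    u, v = p
    d_du = [(a - b) / (2 * h) for a, b in zip(F(u + h, v), F(u - h, v))]
    d_dv = [(a - b) / (2 * h) for a, b in zip(F(u, v + h), F(u, v - h))]
    # matrix rows: (dF1/du, dF1/dv) and (dF2/du, dF2/dv)
    return d_du[0] * d_dv[1] - d_dv[0] * d_du[1]

def forward(x1, x2):          # y1 = x1 - x2, y2 = x1 + x2
    return (x1 - x2, x1 + x2)

def inverse(y1, y2):          # the same transformation solved for x1, x2
    return ((y1 + y2) / 2, (y2 - y1) / 2)

x = (0.3, 1.2)
y = forward(*x)
J_x = jacobian(forward, x)    # J(x1, x2): derivatives of the y's with respect to the x's
J_y = jacobian(inverse, y)    # J(y1, y2): derivatives of the x's with respect to the y's
print(J_x, J_y, J_x * J_y)
```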
Now, let us look at the steps required for the multivariate change of variables in an integral, starting from $f(x_1,x_2)$:
1. DETERMINE THE REGION A
2. DETERMINE $g_1$ AND $g_2$
3. DETERMINE $h_1$ AND $h_2$
4. DETERMINE THE REGION B
5. FIND THE JACOBIAN, J
6. FIND THE ABSOLUTE VALUE OF THE JACOBIAN, |J|
7. WRITE DOWN THE TRANSFORMED INTEGRAL
Note that our objective is to develop probabilistic models for the characteristic that we are interested in, such as the survival time of a cancer patient, the lifetime of a TV set, or simply the midterm exam scores of Math. 3610 students. Since not all functions behave the same way, the functional form of the model determines how the random variable is characterized. We will discuss this in detail in the next section. Now let us look at some special functions and their properties.
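The function $f(x) = e^{x}$ is called the exponential function (frequently written exp(x)).
Figure 1. Graphs of exponential functions.
Note that $e^{0} = 1$ and $e^{-\infty} = \lim_{x\to-\infty} e^{x} = 0$. Also, $e^{a}e^{b} = e^{a+b}$ and $(e^{a})^{b} = e^{ab}$.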
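The gamma function, denoted by Γ(α) for all α > 0, is given by
$$\Gamma(\alpha) = \int_{0}^{\infty} x^{\alpha-1}e^{-x}\,dx.$$
The gamma function satisfies the following properties:
$\Gamma(\alpha) = (\alpha-1)\Gamma(\alpha-1)$, α > 1
$\Gamma(n) = (n-1)! = (n-1)(n-2)\cdots(1)$, n = 1, 2, ... (remember that 0! = 1, so Γ(1) = 1)
$\Gamma(1/2) = \sqrt{\pi}$.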
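For α > 0 and β > 0, the beta function is given by
$$\mathrm{beta}(\alpha,\beta) = \int_{0}^{1} x^{\alpha-1}(1-x)^{\beta-1}\,dx.$$
The beta function has the following properties:
$\mathrm{beta}(\alpha,\beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$
$\mathrm{beta}(\alpha,\beta) = \mathrm{beta}(\beta,\alpha)$.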
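Python's standard `math.gamma` makes these properties easy to check numerically. The sketch below adds a midpoint-rule approximation of the beta integral (our own helper, `beta_numeric`) and compares beta(2, 3) with Γ(2)Γ(3)/Γ(5) = 1/12; the arguments are arbitrary test values.

```python
from math import factorial, gamma, pi, sqrt

def beta_numeric(a, b, m=100_000):
    """Midpoint-rule approximation of beta(a, b) = integral of x^(a-1) (1-x)^(b-1) over (0, 1)."""
    h = 1 / m
    return h * sum(((k + 0.5) * h) ** (a - 1) * (1 - (k + 0.5) * h) ** (b - 1)
                   for k in range(m))

print(gamma(5), factorial(4))                       # Gamma(n) = (n - 1)!
print(gamma(3.7), 2.7 * gamma(2.7))                 # Gamma(a) = (a - 1) Gamma(a - 1)
print(gamma(0.5), sqrt(pi))                         # Gamma(1/2) = sqrt(pi)
print(beta_numeric(2, 3), gamma(2) * gamma(3) / gamma(5))
print(beta_numeric(2, 3), beta_numeric(3, 2))       # beta(a, b) = beta(b, a)
```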
Exercise 3.
The following types of problems, which involve integration, arise in distribution theory when we develop probabilistic models:
(a)
Find the value of k such that
= 1.
Answer: k>0.
(b)
Find the value of c such that
= 1.
Answer: c=12.
(c)
Find the value of c such that
= 1.
Hint: Use the definition of the beta function.
Answer: 20.
The following integrals arise in finding distribution functions given probability density functions.
(d)
Find the value of the following integral
F(x) = .
(F(x) is called the cumulative distribution function.)
Answer: 1x; 1<x<´.
(e) Find
the value of the following integral
F(x) = .
Answer: 1  e; x>0.
The following integrals arise when we want to find expected values.
(f)
Find the value of the following integral
E(X) = .
(The value of this integral is known as the first
moment or expected value of X)
Answer: .
(g) Find
the value of the integral
E(X) = .
Answer: .
The following types of integrals arise when we want to find moment generating functions.
(h)
Evaluate the following integral
φ(t) = ; λ > 0
(φ(t) is called the moment generating function of the exponential distribution with parameter λ.)
Answer: .
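(i) Evaluate the following integral
$$\varphi(t) = \int_{a}^{b} e^{tx}\,\frac{1}{b-a}\,dx.$$
(φ(t) is the moment generating function of a uniform random variable over the interval (a,b).)
Answer: $\varphi(t) = \frac{e^{tb} - e^{ta}}{t(b-a)}$ for t ≠ 0, and φ(0) = 1.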
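The closed form of the uniform moment generating function can be checked against a direct numerical integration. The interval (1, 3) and the values of t are hypothetical; at t = 0 the formula (e^{tb} − e^{ta})/(t(b − a)) is 0/0, and its limit (by L'Hospital's rule) is 1 = φ(0), which the sketch returns directly.

```python
from math import exp

def mgf_uniform_closed(t, a, b):
    """phi(t) = (e^{tb} - e^{ta}) / (t (b - a)) for t != 0, and phi(0) = 1."""
    if t == 0:
        return 1.0
    return (exp(t * b) - exp(t * a)) / (t * (b - a))

def mgf_uniform_numeric(t, a, b, m=10_000):
    """Midpoint rule for the integral of e^{tx} / (b - a) over (a, b)."""
    h = (b - a) / m
    return sum(exp(t * (a + (k + 0.5) * h)) for k in range(m)) * h / (b - a)

a, b = 1.0, 3.0                        # hypothetical interval
for t in (-1.0, 0.5, 2.0):
    print(t, mgf_uniform_closed(t, a, b), mgf_uniform_numeric(t, a, b))
```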
(j) Find the derivative of φ(t) in (i) with respect to t.
Answer:
(k)
Find ffor (j).
(l) Let $y_1 = x_1 - x_2$ and $y_2 = x_1 + x_2$. Find $J(x_1,x_2) = J$ and $J(y_1,y_2)$.
Answer: $J(y_1,y_2) = 1/2$, $J(x_1,x_2) = 2$.
Agresti, A., Categorical Data Analysis, John Wiley & Sons, 1990
Christensen, H. B., Introduction to Statistics, HBJ, 1992
Dudewicz, E. J., and Mishra, S. N., Modern Mathematical Statistics, John Wiley & Sons, 1988
Engelhardt, B., Introduction to Probability and Mathematical Statistics, PWS-KENT Publishing Company, 1992
Hogg, R. V., and Tanis, E. A., Probability and Statistical Inference, Macmillan Publishing Company, 1993
Mendenhall, W., Wackerly, D. D., and Scheaffer, R. L., Mathematical Statistics with Applications, PWS-KENT, 1990
Scheaffer, R. L., Introduction to Probability, PWS-KENT, 1990
INDEX
Additional Definitions on Sets, 3
antiderivative, 36
Antidifferentiation, 35
Associative Laws, 3
Basic Set Operations, 3
Bayes probabilities, 15
Beta Function, 41
Binomial Theorem, 33
change of
variables in a double integral, 39
Commutative Laws, 3
Complement, 3
conditional
probability, 12
cumulative
distribution function, 42
definite integral,
35, 36
derivative, 34
Derivatives of the
Composite Functions, 34
determinant, 39
Difference, 3
Discrete Mathematics, 33
disjoint, 3
distribution
functions, 42
Distributive Laws, 3
double integral, 39
empty set, 3
Evaluation of the
definite integral by using antiderivatives,
36
Evaluation of the
Indefinite Integrals, 36
event, 2
expected values, 42
exponential
distribution, 42
exponential function,
40
first partial derivatives, 38
gamma function, 41
geometric
distribution, 34
Geometric Series, 33
Idempotency Laws, 3
Indefinite
Integral, 36
indefinite
integration, 35
Independent, 7, 10
independent, 6
indeterminate form
0/0, 38
indeterminate form ∞/∞, 38
integration, 35
Integration by Parts, 37
Integration by
Substitution, 36
Intersection, 3
join, 3
L'Hospital's Rule, 38
limits on the
iterated integral, 39
Maclaurin Series, 34
meet, 3
moment generating
function, 42
moment generating
functions, 42
mutually exclusive, 3, 6, 10
Mutually
Independent, 7
null set, 3
Pairwise
Independent, 7
partial
derivatives, 38
Poisson random
variable, 34
probability
density functions, 42
product, 3
Random Variables, 33
Reading Probabilistic Statements, 5
sample space, 2
second partial
derivatives, 38
set, 2
set notations, 4
set operations, 3
Set Theory, 2
Simple Laws and Some Extensions on Sets, 3
Some Results Involving Limits, 37
Some Results Involving Multivariate Calculus, 38
Some Special Functions, 40
subset, 3
sum, 3
Table of Derivatives
for Some Functions, 34
Tables, 12
The Chain Rule, 34
The Three
Elementary Methods of Integration, 36
Theorem on Sets, 3
Trees, 13
uniform random
variable, 42
Union, 3
universal set, 2
universe, 2
Venn diagram, 13
Venn diagrams, 3