Bayes’ Theorem - An Introduction

This is an introduction to Bayes’ Theorem: its definition, use, and importance. The term introduction is used because there is a lot more that can, should, and will be discussed relative to Bayes’ Theorem. The intent here is to lay a foundational understanding so that more in-depth topics such as Bayesian Inference can be covered in subsequent posts. Bayesian Inference (and all that goes with it, such as Markov Chain Monte Carlo, or MCMC) is quite fascinating and certainly worth further study. It will, however, only be briefly discussed here.


Probability Review

This section is not a comprehensive review of probability, but will provide a brief overview of probability sufficient to understand Bayes’ Theorem.

To understand probability, some definitions are needed first:

  • Random Experiment: A process leading to an uncertain outcome
  • Sample Space: Collection of all possible outcomes of a random experiment
  • Event: Possible outcome(s) of a random experiment
  • Probability: A measure for how likely an event is

There are three definitions of probability: classical, empirical, and subjective. Given an event \(A\), the probability of \(A\), \(P(A)\), is defined as follows:

  • Classical:

\[ P(A) = \frac{\text{Number of outcomes that satisfy the event}}{\text{Total number of outcomes in sample space}} \]

  • Empirical (aka Relative Frequency):

\[ P(A) = \frac{\text{Number of times the event A occurs in repeated trials}}{\text{Total number of trials in a random experiment}} \]

  • Subjective: An individual opinion, or belief, about the probability of occurrence. Hence, \(P(A)\) is determined based on one’s own belief, or judgment.

The following briefly highlights key fundamental probability rules (using the term rule loosely) and definitions:

  • Probability of an Event: The probability of an event must be between zero and one, inclusively. That is, for an event \(A\), \(0 \leq P(A) \leq 1\). A probability of 1 implies the event happens with absolute certainty. A probability of 0 implies the event cannot happen.

  • Sum of Probabilities: The sum of the probabilities of all possible outcomes must equal 1.

  • Complement: For a given event, \(A\), the complement of \(A\) consists of those outcomes in the sample space that are not outcomes in \(A\). There are many different ways to denote the complement of \(A\); the notation \(\text{~}A\) will be used here. Thus, \(P(\text{~}A) = 1 - P(A)\).

  • Intersection: Given two events, \(A\) and \(B\), the intersection of \(A\) and \(B\), denoted as \(A \cap B\), is the combination of all outcomes that are elements in \(A\) and \(B\) (i.e., in both).

  • Joint Probability: Given two events, \(A\) and \(B\), the probability of \(A\) and \(B\) is defined as \(P(A \cap B)\). Note that joint probabilities are sometimes referred to as conjoint probabilities.

  • Disjoint Events: Given two events, \(A\) and \(B\), these events are disjoint, or mutually exclusive, if they have no outcomes in common: \(A \cap B = \emptyset\). The probability of the intersection of disjoint events is 0: \(P(A \cap B) = 0\).

  • Union: Given two events, \(A\) and \(B\), the union of \(A\) and \(B\), denoted as \(A \cup B\), is the combination of all outcomes that are elements in \(A\) or \(B\).

  • Addition Rule: The probability of \(A \cup B\) is: \(P(A \cup B) = P(A) + P(B) − P(A \cap B)\). If \(A\) and \(B\) are disjoint, then \(P(A \cup B) = P(A) + P(B)\).

  • Independence: Two events, \(A\) and \(B\), are independent if the occurrence of one event does not affect the occurrence of the other. In probabilistic terms, if \(P(A)\) is the same regardless of whether \(B\) occurs, and vice versa, then the two events are independent.

  • Multiplication Rule: If two events, \(A\) and \(B\), are independent then the probability of both occurring is the product of the two individual probabilities. That is: \(P(A \cap B) = P(A) \cdot P(B)\). This rule scales to \(n\) independent events.

  • Conditional Probability: For two events, \(A\) and \(B\), the conditional probability of \(A\) given \(B\) is the probability that the event \(A\) occurs given that the event \(B\) has already occurred. The notation for conditional probability is: \(P(A|B)\) (the probability of \(A\) given \(B\)).

    • The implication here is that the probability of \(A\) occurring may be different if \(B\) first occurred. One way to think of this is that, knowing \(B\) occurred, the sample space from which the probability of \(A\) occurring is computed has changed based on this new information; it is now the sample space created by the outcome of \(B\).
    • If, however, the probability of \(A\) remains unchanged given that \(B\) first occurred, then these two events are independent. That is, if \(P(A|B) = P(A)\), and conversely, \(P(B|A) = P(B)\), then the events \(A\) and \(B\) are independent.
    • The probability of \(A\) given \(B\) is defined as: \(P(A|B) = \frac{P(A \cap B)}{P(B)}\)
    • The probability of \(B\) given \(A\) is defined as: \(P(B|A) = \frac{P(A \cap B)}{P(A)}\)
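The rules above can be verified on a small example. The following sketch uses a single roll of a fair die as the random experiment (the die and the events chosen are illustrative, not from this post) and exact fractions to match the classical definition:

```python
from fractions import Fraction

# Random experiment: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Classical probability: favorable outcomes over total outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}  # event: the roll is even
B = {4, 5, 6}  # event: the roll is greater than 3

p_not_A = 1 - prob(A)                       # complement rule
p_A_and_B = prob(A & B)                     # joint probability (intersection)
p_A_or_B = prob(A) + prob(B) - p_A_and_B    # addition rule
p_A_given_B = p_A_and_B / prob(B)           # conditional probability

print(p_not_A, p_A_and_B, p_A_or_B, p_A_given_B)  # 1/2 1/3 2/3 2/3
```

Note that here \(P(A|B) = 2/3 \ne P(A) = 1/2\), so these two particular events are not independent.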


Bayes’ Theorem

The conditional probability \(P(A|B)\) is defined as:

\[ P(A|B) = \frac{P(A \cap B)}{P(B)} \]

whereas the conditional probability \(P(B|A)\) is defined as:

\[ P(B|A) = \frac{P(A \cap B)}{P(A)} \]

and, in general, \(P(A|B) \ne P(B|A)\). So, the question becomes: are these conditional probabilities, \(P(A|B)\) and \(P(B|A)\), related? A careful (or perhaps not so careful) study of the above equations shows that a common term in both is \(P(A \cap B)\). If one were to solve for \(P(A \cap B)\) in the \(P(B|A)\) equation:

\[ \begin{align} P(B|A) &= \frac{P(A \cap B)}{P(A)} \\ \\ P(A \cap B) &= P(B|A)P(A) \end{align} \]

and substitute into the \(P(A|B)\) equation yields:

\[ \begin{align} P(A|B) &= \frac{P(A \cap B)}{P(B)} \\ \\ P(A|B) &= \frac{P(B|A)P(A)}{P(B)} \end{align} \]

The conditional probability \(P(A|B)\) is now expressed in terms of the conditional probability \(P(B|A)\).

Repeating this idea, but now solving for \(P(A \cap B)\) in the \(P(A|B)\) equation and then making the appropriate substitution yields:

\[ \begin{align} P(A|B) &= \frac{P(A \cap B)}{P(B)} \\ \\ P(A \cap B) &= P(A|B)P(B) \\ \\ P(B|A) &= \frac{P(A \cap B)}{P(A)} \\ \\ P(B|A) &= \frac{P(A|B)P(B)}{P(A)} \end{align} \]

The conditional probability \(P(B|A)\) is now expressed in terms of the conditional probability \(P(A|B)\). Of course, \(P(B|A)\) could have been directly determined algebraically from \(P(A|B)\):

\[ \begin{align} P(A|B) &= \frac{P(B|A)P(A)}{P(B)} \\ \\ P(B|A)P(A) &= P(A|B)P(B) \\ \\ P(B|A) &= \frac{P(A|B)P(B)}{P(A)} \end{align} \]

This expression of the relationship between conditional probabilities is Bayes’ Theorem. The general definition, thus, is:

\[ P(A|B) = \frac{P(B|A)P(A)}{P(B)} \]

Bayes’ Theorem describes the fixed relationship between \(P(A)\), \(P(B)\), \(P(A|B)\) and \(P(B|A)\). Sometimes these are known as the flipping formulas, for the theorem allows the direct computation of one conditional probability from the other - that is, if one conditional probability is known, Bayes’ Theorem provides a mechanism to flip the direction of the condition. This ability to flip conditional probabilities is a key property, and use, of Bayes’ Theorem.
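As a quick numerical sanity check of the flip, here is a minimal sketch with made-up values for \(P(A)\), \(P(B)\), and \(P(B|A)\) (chosen only to be mutually consistent):

```python
def bayes(p_b_given_a, p_a, p_b):
    """Flip P(B|A) into P(A|B) via Bayes' Theorem."""
    return p_b_given_a * p_a / p_b

# Hypothetical, mutually consistent probabilities.
p_a, p_b, p_b_given_a = 0.3, 0.4, 0.6

p_a_given_b = bayes(p_b_given_a, p_a, p_b)

# Both conditionals must recover the same joint probability P(A ∩ B).
assert abs(p_a_given_b * p_b - p_b_given_a * p_a) < 1e-12

print(round(p_a_given_b, 2))  # 0.45
```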

There are other ways to express Bayes’ Theorem as will be explored in the Breast Cancer and the Bayesian Inference Introduction examples below.


Conjoint Table With Joint and Marginal Probabilities

A very useful tool is the creation of a conjoint table with joint and marginal probabilities. Consider the (related) events \(A_{1}\) and \(A_{2}\) and the (related) events \(B_{1}\) and \(B_{2}\) organized in a 2 x 2 table as shown here (note: a 2 x 2 table is used for simplicity, but the idea scales, as will be seen in the Breast Cancer example below):

Each cell contains the corresponding empirical data relative to the cell. The rows are summed, as are the columns, with a grand total being the sum over all rows and columns. Each cell can then be expressed as a probability as shown here:

Each cell within the body of the table is the intersection of the two events represented by that cell. Hence, each cell represents the joint probability. The probabilities for each row and column are the marginal probabilities for they are in the margin of the table. A conjoint table with joint and marginal probabilities, then, is as shown here:

Note that the marginal probability \(P(B_{1})\), for example, is the sum of the joint probabilities in that row:

\[ P(B_{1}) = P(A_{1} \cap B_{1}) + P(A_{2} \cap B_{1}) \]

Similar sums exist for the other marginal probabilities.
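The row-and-column sums described above are mechanical enough to sketch in code. The counts below are hypothetical; the point is that each joint probability is a cell count over the grand total, and each marginal probability is a row or column sum:

```python
# Contingency table of raw counts (hypothetical data).
counts = {
    "B1": {"A1": 20, "A2": 30},
    "B2": {"A1": 10, "A2": 40},
}
grand_total = sum(sum(row.values()) for row in counts.values())

# Joint probability of each cell: cell count over the grand total.
joint = {b: {a: n / grand_total for a, n in row.items()}
         for b, row in counts.items()}

# Marginal probabilities: row sums (for each B) and column sums (for each A).
p_B = {b: sum(row.values()) for b, row in joint.items()}
p_A = {a: sum(joint[b][a] for b in joint) for a in counts["B1"]}

print(p_B["B1"])  # 0.5, i.e. P(A1 ∩ B1) + P(A2 ∩ B1)
```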

Some points to note:

  • The events must be mutually exclusive and collectively exhaustive, thus allowing the probabilities to sum to 1
  • An empirical data table like this can also be referred to as a contingency table, which shows counts of one categorical variable contingent on the value of another

As an example of a conjoint table with joint and marginal probabilities and its usefulness, consider the marital status of individuals in the US 15 years of age or older (US Census data, 2013). The following table shows the breakdown of males and females for each of the five marital status categories: never been married, married, widowed, divorced, or separated.

Having this empirical data thus represented, the probabilities of each cell can be computed:

Knowing that

  • Each cell represents the probability of the intersection of the row and column
  • Each row total represents the (marginal) probability of the event associated with that row
  • Each column total represents the (marginal) probability of the event associated with that column

a conjoint table with joint and marginal probabilities can be produced as follows:

A conjoint table with joint and marginal probabilities is quite useful in computing conditional probabilities:

  • Given a person who was selected at random was male, what is the probability that he has never been married? \[ \begin{align} P(\text{Never | Male}) &= \frac{P(\text{Male} \cap \text{Never})}{P(\text{Male})} \\ \\ &= \frac{0.16554}{0.48747} \\ \\ &= 0.33959 \end{align} \]

  • Given a person who was selected at random was male, what is the probability that he is divorced? \[ \begin{align} P(\text{Divorced | Male}) &= \frac{P(\text{Male} \cap \text{Divorced})}{P(\text{Male})} \\ \\ &= \frac{0.04377}{0.48747} \\ \\ &= 0.08980 \end{align} \]

  • Given a person who was selected at random was female, what is the probability that she is widowed? \[ \begin{align} P(\text{Widowed | Female}) &= \frac{P(\text{Female} \cap \text{Widowed})}{P(\text{Female})} \\ \\ &= \frac{0.04457}{0.51253} \\ \\ &= 0.08696 \end{align} \]

Next consider a slightly different question from the first one posed above:

  • Given a person who was selected at random has never been married, what is the probability that this person is a male?

\[ \begin{align} P(\text{Male | Never}) &= \frac{P(\text{Male} \cap \text{Never})}{P(\text{Never})} \\ \\ &= \frac{0.16554}{0.31276} \\ \\ &= 0.52994 \end{align} \]

This is not the same as the probability that the person has never been married given that the person is male. It is the flipped conditional probability! To see Bayes’ Theorem in action, assume that all that was known was the conditional probability \(P(\text{Never | Male})\), but the interest was in knowing the conditional probability \(P(\text{Male | Never})\). Applying Bayes’ Theorem:

\[ \begin{align} P(\text{Male | Never}) &= \frac{P(\text{Never | Male})P(\text{Male})}{P(\text{Never})} \\ \\ &= \frac{(0.33959)(0.48747)}{0.31276} \\ \\ &= 0.52994 \end{align} \]

This value of \(P(\text{Male | Never}) = 0.52994\) matches that which was computed from the conjoint table with joint and marginal probabilities, as it should! For recall, that Bayes’ Theorem established the relationship between \(P(A|B)\) and \(P(B|A)\).


Breast Cancer Example

Not to minimize breast cancer in any way, the following is a classic example and is worth study. Given the following data:

  • 200 out of 10,000 women of a particular age group who participate in a routine screening have breast cancer
  • 160 out of 200 women who participate in a routine screening and have breast cancer will have a positive mammogram
  • 980 out of 9,800 women who participate in a routine screening and have no breast cancer will have a false-positive mammogram

The problem statement is:

  • How many of the women of this age group who participate in a routine screening and receive positive mammograms actually have breast cancer?

The first way to go about answering this question is to create the beginnings of a conjoint table with joint and marginal probabilities:

and begin filling in with the given information.

From this information, the remaining cells of the table are easily computed:

Having the cells filled in, the probabilities can be easily computed, thus producing the conjoint table with joint and marginal probabilities:

From the problem statement, the conditional probability \(P(\text{Cancer | Test Pos})\) is desired and using the table:

\[ \begin{align} P(\text{Cancer | Test Pos}) &= \frac{P(\text{Cancer} \cap \text{Test Pos})}{P(\text{Test Pos})} \\ \\ &= \frac{0.016}{0.114} \\ \\ &= 0.14035 \end{align} \]

So, roughly \(14\%\) of the women of this age group who participate in a routine screening and receive positive mammograms actually have breast cancer.

Let’s look at another view of this using Bayes’ Theorem. Recognize that

\[ \begin{align} P(\text{Cancer | Test Pos}) &= \frac{P(\text{Test Pos | Cancer})P(\text{Cancer})}{P(\text{Test Pos})} \\ \\ &= \frac{\left(\frac{0.016}{0.02}\right)(0.02)}{0.114} \\ \\ &= \frac{0.8(0.02)}{0.114} \\ \\ &= 0.14035 \end{align} \]

This value of \(P(\text{Cancer | Test Pos}) = 0.14035\) matches that which was computed from the conjoint table with joint and marginal probabilities, as it should!
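The whole computation also falls out directly from the raw counts in the problem statement; a sketch:

```python
# Raw counts from the problem statement.
total = 10_000
true_pos = 160    # have cancer and test positive
false_pos = 980   # no cancer but test positive (false positives)

p_cancer_and_pos = true_pos / total      # P(Cancer ∩ Test Pos) = 0.016
p_pos = (true_pos + false_pos) / total   # P(Test Pos) = 0.114

p_cancer_given_pos = p_cancer_and_pos / p_pos
print(round(p_cancer_given_pos, 5))  # 0.14035
```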

In this particular case, the information provided was the probability of a positive test result given the existence of cancer. What was really asked was the probability of cancer given the positive test result. Bayes’ Theorem provided the means to flip the conditional probabilities, showing, again, that Bayes’ Theorem establishes the relationship between \(P(A|B)\) and \(P(B|A)\)! It is often the case that one conditional probability is easier to determine, or estimate, than the other. Bayes’ Theorem allows one to take full advantage of this!

Of note is that there are other ways of expressing Bayes’ Theorem. Though quite useful in the computation of conditional probabilities, these other forms can lead to other powerful uses, such as inference.


Bayesian Inference Introduction

Assume there is an event \(A\); then there is also its complement, \(\text{~}A\). Likewise for an event \(B\). The corresponding conjoint table with joint and marginal probabilities is:

Bayes’ Theorem can be expressed in a slightly different form:

\[ \begin{align} P(A|B) &= \frac{P(B|A)P(A)}{P(B)} \\ \\ &= \frac{P(B|A)P(A)}{P(A \cap B) + P(\text{~}A \cap B)} \\ \\ &= \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\text{~}A)P(\text{~}A)} \end{align} \]

This form of Bayes’ Theorem is key for use in Bayesian Inference. (Note: in the general form of Bayes’ Theorem, \(P(B)\) - the probability of the event being conditioned on - is often unknown or difficult to acquire. The expansion into \(P(B|A)P(A) + P(B|\text{~}A)P(\text{~}A)\), however, is often computable. The example here showed its derivation from the logic of the table, but it is a direct consequence of the law of total probability.)

Applying this to our above breast cancer example, define the events \(A\) and \(B\):

  • Event \(A\):
    • A woman who gets screened (that is, has a mammogram) has breast cancer
  • Event \(\text{~}A\):
    • A woman who gets screened (that is, has a mammogram) does not have breast cancer (recall, \(P(\text{~}A) = 1 - P(A)\))
  • Event \(B\):
    • A positive test results for a woman who has taken the mammogram
  • Event \(\text{~}B\):
    • A negative test results for a woman who has taken the mammogram

To answer the question, “How many of the women of this age group who participate in a routine screening and receive positive mammograms actually have breast cancer?”, \(P(A|B)\) needs to be found. From the information given, the following can be determined:

  • The probability of \(A\):

\[ P(A) = \frac{200}{10000} = 0.02 \]

  • The probability of \(\text{~}A\):

\[ P(\text{~}A) = (1 - 0.02) = 0.98 \]

  • The probability of \(B\) given \(A\):

\[ P(B|A) = \frac{160}{200} = 0.8 \]

  • The probability of \(B\) given \(\text{~}A\):

\[ P(B|\text{~}A) = \frac{980}{9800} = 0.1 \]

Hence,

\[ \begin{align} P(A|B) &= \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\text{~}A)P(\text{~}A)} \\ \\ &= \frac{0.8(0.02)}{0.8(0.02) + 0.1(0.98)} \\ \\ &= \frac{0.016}{0.016 + 0.098} \\ \\ &= 0.14035 \end{align} \]
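This total-probability form is easy to package as a small function; the call below reuses the breast cancer numbers from the text (the function name is an assumption for illustration):

```python
def posterior(p_a, p_b_given_a, p_b_given_not_a):
    """Bayes' Theorem with P(B) expanded via the law of total probability."""
    p_not_a = 1 - p_a
    numerator = p_b_given_a * p_a
    return numerator / (numerator + p_b_given_not_a * p_not_a)

# Breast cancer example: P(A) = 0.02, P(B|A) = 0.8, P(B|~A) = 0.1.
print(round(posterior(0.02, 0.8, 0.1), 5))  # 0.14035
```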

In Bayesian Inference terms there are two competing hypotheses: \(A\) and \(\text{~}A\) (this discussion is limited to only two). To each of these, probabilities are assigned, \(P(A)\) and \(P(\text{~}A)\). These are known as the prior probabilities. Then new data, \(B\), is collected (or observed). From this, the likelihood of observing the data under each of the two hypotheses, \(P(B|A)\) and \(P(B|\text{~}A)\), must be computed (this is the hardest part). Finally, Bayes’ Theorem is used to update the probability of the hypothesis of interest given the data, known as the posterior probability.

Let’s revisit:

\[ P(A|B) = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\text{~}A)P(\text{~}A)} \]

or

\[ \text{Posterior probability of hypothesis A given data B} = \]

\[ \frac{\text{(Likelihood of data B given hyp A)(Prior probability of hyp A)}}{\text{(Likelihood of data B given hyp A)(Prior probability of hyp A)} + \text{(Likelihood of data B given hyp ~A)(Prior probability of hyp ~A)}} \]

So,

\[ P(B|A) = \text{Likelihood of data B given hyp A} = 0.8 \]

\[ P(A) = \text{Prior probability of hyp A} = 0.02 \]

\[ P(B|\text{~}A) = \text{Likelihood of data B given hyp ~A} = 0.1 \]

\[ P(\text{~}A) = \text{Prior probability of hyp ~A} = 0.98 \]

\[ P(A|B) = \text{Posterior probability of hypothesis A given data B} = 0.14035 \]

This interpretation is how a subjective belief can rationally change to account for new data (observations).

A lot more will be said about this in subsequent posts, inclusive of MCMC (Markov Chain Monte Carlo).


Prosecutor’s Fallacy

Unfortunately, there is not always a clear understanding of conditional probabilities (and good statistical reasoning). In some cases, that misunderstanding can arguably be intentional, to achieve some higher agenda. Regardless, this misunderstanding can, and often does, lead to some very unfortunate consequences, especially when encountered in courts of law. For this reason, the term, Prosecutor’s Fallacy, has come to be known as the general term for this type of situation (there are other less specific terms like inversion of the conditional).

In general, the Prosecutor’s Fallacy is the belief that \(P(A|B) = P(B|A)\).

To see the prosecutor’s fallacy in action, consider the following commonly used hypothetical example in which a defendant is on trial for some crime based on this person matching the collected evidence. Define the following events:

Events:

  • \(I\): the person on trial (defendant) is innocent
  • \(\text{~}I\): the person on trial (defendant) is guilty
  • \(E\): the collected evidence

The relevant conditional probabilities are:

  • \(P(E|I)\): the probability that an innocent defendant could match the collected evidence (the probability the evidence could match the defendant given innocence)
  • \(P(I|E)\): the probability that the collected evidence could be attributed to an innocent defendant (the probability the defendant is innocent given the matched evidence)

Assumptions:

  • Only one person could have committed the crime
  • The guilty person must have come from within a geographical area in which the population is \(500,000\)

Givens:

  • An “expert” witness for the prosecution states that there exists a 1 in \(100,000\) chance that an innocent person could match the collected evidence
    • In probabilistic terms: \(P(E|I) = \frac{1}{100000} = 0.00001\)

Prosecutor Action:

  • Based on the testimony of the “expert” witness, the prosecutor inferred the testimony to mean that there is a 1 in \(100,000\) chance the defendant is innocent; hence, the defendant must be guilty, given that a probability of \(0.00001\) is so small

Trial Verdict:

  • The defendant was found guilty by an alleged jury of peers

Problem Statement:

  • The prosecutor committed the prosecutor’s fallacy!
    • The “expert” witness testified that \(P(E|I) = \frac{1}{100000} = 0.00001\)
    • The prosecutor inferred that \(P(I|E) = P(E|I) = 0.00001\), which meant that there was a \(\frac{1}{100000} = 0.00001\) chance that the defendant is innocent

What the jury really needed to hear, and understand, was the probability that the collected evidence could be attributed to an innocent defendant. That is, \(P(I|E)\). Enter Bayes’ Theorem, which states:

\[ \begin{align} P(I|E) &= \frac{P(E|I)P(I)}{P(E)} \\ \\ &= \frac{P(E|I)P(I)}{P(E|I)P(I) + P(E|\text{~}I)P(\text{~}I)} \end{align} \]

From the information presented, the probabilities needed are:

  • \(P(I)\):
    • \(P(I) = \frac{499,999}{500,000} = 0.999998\)
  • \(P(\text{~}I)\):
    • \(P(\text{~}I) = 1 - P(I) = 0.000002\)
  • \(P(E|I)\):
    • \(P(E|I) = \frac{1}{100,000} = 0.00001\)
  • \(P(E|\text{~}I)\):
    • This is the probability that the collected evidence would exist given guilt
    • \(P(E|\text{~}I) = 1\)

Hence, can now compute \(P(I|E)\) using Bayes’ Theorem:

\[ \begin{align} P(I|E) &= \frac{P(E|I)P(I)}{P(E)} \\ \\ &= \frac{P(E|I)P(I)}{P(E|I)P(I) + P(E|\text{~}I)P(\text{~}I)} \\ \\ &= \frac{0.00001(0.999998)}{0.00001(0.999998) + 1(0.000002)} \\ \\ &= \frac{0.00001}{0.00001 + 0.000002} \\ \\ &= \frac{0.00001}{0.000012} \\ \\ P(I|E) &= 0.83333472 \end{align} \]

Thus, there is an (approximate) 83% (or 5 out of 6) chance that a person who matched the evidence is innocent! This implies an (approximate) 17% (or 1 out of 6) chance of guilt, which is a far cry from the 1 out of 100,000 chance alleged by the prosecutor!
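The full computation, from the givens to the corrected probability, can be sketched in a few lines:

```python
population = 500_000

# Givens: exactly one guilty person in the population.
p_innocent = (population - 1) / population   # P(I)  = 0.999998
p_guilty = 1 - p_innocent                    # P(~I) = 0.000002
p_E_given_innocent = 1 / 100_000             # "expert" testimony, P(E|I)
p_E_given_guilty = 1.0                       # guilty person matches, P(E|~I)

# Bayes' Theorem with the total-probability expansion of P(E).
numerator = p_E_given_innocent * p_innocent
p_innocent_given_E = numerator / (numerator + p_E_given_guilty * p_guilty)

print(round(p_innocent_given_E, 4))  # 0.8333, nowhere near 0.00001
```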

The prosecutor’s fallacy (as understood to be a misuse, or misunderstanding, of conditional probabilities) is a very real phenomenon in not only the judicial system but in many other disciplines as well - basically anywhere there is evidence and the use of that evidence as applied to some defendant (the terms evidence and defendant are relative to the context of the problem). Avoidance of this misuse (more often misunderstanding) can be realized by ensuring that the sought-after probability is answering the right question(s). That is, it is important to see how the evidence applies to the defendant, not the other way around.


Bayes, The Man

Any introduction to Bayes’ Theorem would be quite remiss if at least some brief information on Bayes himself was not given.

Thomas Bayes (1701-1761) was an English mathematician and Nonconformist theologian (Nonconformists were Protestant Christians who separated from the Anglican Church). He was (apparently) the first to define a way to use probability inductively and to provide a mathematical basis for probabilistic inference. That is, he defined a way to calculate the probability that an event will occur in future trials based on the frequency of occurrence in prior trials.

It wasn’t until the late 1740s that he did the work that now exists as a theorem bearing his name. Bayes did not publish his work, whether by choice or by death. It is thought that Bayes may not have seen any immediate practical value to his work relative to the times in which he lived. After his death in 1761, his friend, Richard Price (1723-1791), discovered Bayes’ work and had it published as “An Essay towards solving a problem in the doctrine of chances” in the Philosophical Transactions of the Royal Society of London 53 (1763). For anyone interested, the original published article can be read by visiting the Royal Society Publishing organization here: An Essay towards solving a problem in the doctrine of chances.

Bayes’ work was largely ignored. One reason, perhaps, was that the frequentist view of statistics was the predominant view of the statistical community. The Bayesian approach was subjective and thought to be unscientific. Another potential reason was that the Bayesian approach involved computations that were not readily done. With recent advances in computing technology, the Bayesian approach has become more widely understood and accepted. Indeed, the Bayesian approach is used to forecast weather, detect forgeries, and identify spam emails, among many other applications.

To explore the use of the theorem that would not die, I highly recommend reading the excellent book, “The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines & Emerged Triumphant from Two Centuries of Controversy”, by Sharon Bertsch McGrayne (Yale University Press, 2011). It is a great and fascinating read!

For a very thorough biography of Thomas Bayes, I refer the reader to the paper, The Reverend Thomas Bayes, FRS: A Biography to Celebrate the Tercentenary of His Birth, by D. R. Bellhouse.


Conclusion

Contained herein was a very brief introduction to Bayes’ Theorem

\[ P(A|B) = \frac{P(B|A)P(A)}{P(B)}. \]

Hopefully insight was gleaned into not only understanding the theorem but an appreciation of its use and applicability. Repeating from above, to explore the use of the theorem that would not die, I highly recommend reading the excellent book, “The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines & Emerged Triumphant from Two Centuries of Controversy”, by Sharon Bertsch McGrayne (Yale University Press, 2011). It is a great and fascinating read!

There is so much more to delve into. Discussion of prior and posterior probabilities and likelihood functions are just a few topics. In addition, Markov Chain Monte Carlo (MCMC) techniques need presentation. These all will be discussed in subsequent posts (MC Integration and MCMC are currently in development).
