Intro to Bayes Concepts

Apr 12, 2021

Frequentist vs. Bayesian

A frequentist approach to statistics defines probability as the long-run frequency of events. In the comic above, the probability that two rolled dice both come up six is 1/36. A Bayesian starts with a prior belief (the sun did not explode), updates that belief based on the result of an experiment (rolling the dice), and forms an updated belief. In this case the Bayesian has a very strong prior belief that the sun did not explode.

Probability Definitions

A clearly defined process with an outcome is called an experiment. All possible outcomes of an experiment are known as the sample space. An event is a collection of outcomes.

The intersection is the set of elements that are in both set A and set B: A∩B.

The union is the set of elements that are in set A or set B (or both): A∪B.

The complement of A is the set of elements in the sample space that are not in A: Aᶜ.
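As a small sketch of these definitions, Python's built-in sets can stand in for events. The sample space here is a single roll of a six-sided die, and the events A (an even roll) and B (a roll greater than three) are illustrative choices rather than anything from the text above.

```python
# Sample space for one roll of a six-sided die (an illustrative assumption).
sample_space = {1, 2, 3, 4, 5, 6}

# Two example events, each a collection of outcomes.
A = {2, 4, 6}  # the roll is even
B = {4, 5, 6}  # the roll is greater than three

intersection = A & B             # outcomes in both A and B
union = A | B                    # outcomes in A or B (or both)
complement_A = sample_space - A  # outcomes in the sample space but not in A

print(sorted(intersection))  # [4, 6]
print(sorted(union))         # [2, 4, 5, 6]
print(sorted(complement_A))  # [1, 3, 5]
```

Note that the complement only makes sense relative to a sample space, which is why it is computed as a set difference.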

Probability Rules

Because the probabilities of all outcomes in the sample space sum to 1, P(Aᶜ) = 1 - P(A).

The probability that A & B will occur equals the probability that A will occur multiplied by the probability that B will occur given that A has already occurred. P(A∩B) = P(A)P(B|A)

The probability that A or B will occur equals the probability of A plus the probability of B, minus the probability of A & B both occurring (so the overlap doesn’t get counted twice). P(A∪B) = P(A) + P(B) - P(A∩B).
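One way to check these three rules is to enumerate a small sample space and count outcomes. The sketch below assumes two fair six-sided dice; the events A (first die shows six) and B (second die shows six) and the helper `prob` are hypothetical names picked for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: every ordered pair of faces from two fair six-sided dice.
sample_space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# Hypothetical events chosen for illustration.
A = {roll for roll in sample_space if roll[0] == 6}  # first die shows six
B = {roll for roll in sample_space if roll[1] == 6}  # second die shows six

# Complement rule: P(A^c) = 1 - P(A)
p_not_A = prob(sample_space - A)

# Multiplication rule: P(A and B) = P(A) * P(B|A)
p_B_given_A = Fraction(len(A & B), len(A))
p_A_and_B = prob(A) * p_B_given_A

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_A_or_B = prob(A) + prob(B) - prob(A & B)

print(p_not_A)    # 5/6
print(p_A_and_B)  # 1/36
print(p_A_or_B)   # 11/36
```

Using `Fraction` keeps the arithmetic exact, so each rule can be compared directly against a count of outcomes.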

Random Variables

A random variable is a variable whose value is a numerical outcome of a random event.

Random variables can be either discrete or continuous. A continuous random variable can take any value within a range, so it has infinitely many possible values, whereas a discrete random variable takes on countable values. For example, the number of fish caught in a river on Sunday is a discrete variable, but the mass of a fish is continuous because you can always measure it more precisely.

Distributions

A distribution describes all the values a variable can take and how frequently each one occurs.

A distribution is uniform if all outcomes are equally likely:

https://en.wikipedia.org/wiki/Continuous_uniform_distribution

A normal distribution is centered around its mean. You probably recognize it as a bell curve from school:

https://en.wikipedia.org/wiki/Normal_distribution

A Poisson distribution gives the probability of seeing a given number of events in a fixed interval. It is discrete, and it assumes the events occur at a constant mean rate and independently of one another:

https://en.wikipedia.org/wiki/Poisson_distribution

A Binomial distribution gives the probability of each possible number of successes in a fixed number of independent success-or-failure trials, each with the same probability of success:

https://en.wikipedia.org/wiki/Binomial_distribution

The Gamma distribution is a family that includes the exponential distribution as a special case, and it is commonly used in setting up priors:

https://en.wikipedia.org/wiki/Gamma_distribution
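To make two of the discrete distributions above more concrete, here is a minimal sketch of their probability mass functions using only Python's standard `math` module. The function names `binomial_pmf` and `poisson_pmf` and the example numbers (10 fair coin flips, an average of 3 fish caught per day) are assumptions made for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, rate):
    """P(exactly k events in an interval when events occur at a constant mean rate)."""
    return math.exp(-rate) * rate**k / math.factorial(k)

# Hypothetical example: probability of exactly 5 heads in 10 fair coin flips.
p_five_heads = binomial_pmf(5, 10, 0.5)

# Hypothetical example: probability of catching exactly 2 fish when the average is 3 per day.
p_two_fish = poisson_pmf(2, 3.0)

# The binomial probabilities over all possible counts should sum to 1.
binomial_total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))

print(p_five_heads, p_two_fish, binomial_total)
```

In practice a library such as SciPy provides these distributions ready-made; writing them out by hand is only meant to show what the formulas compute.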

Bayes’ Theorem

The equation for Bayes’ Theorem is: P(A|B) = P(B|A)P(A) / P(B).

In plainer English, the probability of A given that B occurred equals the probability of B given that A occurred, multiplied by the overall probability of A, all divided by the overall probability of B.

Example:

Let’s say it’s before 2020 and people are still going to the movies. You’re a movie buff and go once a week. You buy popcorn half the time that you go. You eat popcorn about three days a week. What’s the probability that you’re at the movies given that you’re eating popcorn on a given day?

P(movies|popcorn) = P(popcorn|movies)P(movies) / P(popcorn)

The probability that you’re at the movies given you’re eating popcorn P(movies|popcorn) equals the probability that you’re eating popcorn given you’re at the movies P(popcorn|movies) times the probability you’re at the movies P(movies), divided by the probability you’re having popcorn P(popcorn). Let’s break it down and then plug it in:

P(movies) = 1/7, or about 0.143 since you go once a week.

P(popcorn) = 3/7 or about 0.429 since you eat popcorn three times a week.

P(popcorn | movies) = 0.5 because you buy popcorn half the time you’re at the movies.

So P(movies | popcorn) = 0.5 * 0.143 / 0.429 ≈ 0.167, or about a 17% chance that you’re at the movies given that you’re eating popcorn.
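The same calculation can be sketched in a few lines of Python. Exact fractions are used here to avoid rounding along the way; the function name `bayes` is a hypothetical helper, and the inputs are the three probabilities from the popcorn example.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

p_movies = Fraction(1, 7)                # you go once a week
p_popcorn = Fraction(3, 7)               # you eat popcorn three days a week
p_popcorn_given_movies = Fraction(1, 2)  # popcorn half the time at the movies

p_movies_given_popcorn = bayes(p_popcorn_given_movies, p_movies, p_popcorn)

print(p_movies_given_popcorn)                   # 1/6
print(round(float(p_movies_given_popcorn), 3))  # 0.167
```

Working with 1/7 and 3/7 directly shows that the sevenths cancel, leaving (1/2) / 3 = 1/6.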

Components

The components of working with Bayes Theorem are the priors, likelihood, and posterior.

Priors are your beliefs about the distribution before running any trials. Industry knowledge is definitely useful here. If there is no prior information to work from, you can start from an uninformative (flat) prior. Either way, you’ll need to know which distribution you’re working with (hence the review of distributions above), and methods such as Markov Chain Monte Carlo can then be used to sample from the resulting posterior.

Priors are updated by the likelihood, which comes from the data observed in the experiments actually being run.

After the updates, a posterior belief about the outcomes is formed.
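The prior, likelihood, and posterior can be seen working together in a small grid approximation. This is only a sketch under stated assumptions: the unknown quantity is a coin's probability of heads, the prior is flat, and the data (7 heads in 10 flips) are made up for illustration.

```python
# Grid of candidate values for the unknown probability of heads.
grid = [i / 100 for i in range(101)]

# Prior: a flat belief, so every candidate value is equally plausible (an assumption).
prior = [1 / len(grid)] * len(grid)

# Hypothetical data from the experiment: 7 heads in 10 flips.
heads, flips = 7, 10

# Likelihood of the data under each candidate value
# (binomial, with the constant factor dropped since it cancels on normalizing).
likelihood = [p**heads * (1 - p)**(flips - heads) for p in grid]

# Posterior is proportional to prior * likelihood; divide by the total to normalize.
unnormalized = [pr * lk for pr, lk in zip(prior, likelihood)]
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

# The most plausible value after seeing the data.
best = grid[posterior.index(max(posterior))]
print(best)  # 0.7
```

With a flat prior the posterior simply follows the likelihood, so the peak sits at the observed proportion of heads. Swapping in a prior that favors values near 0.5 would pull the peak toward a fair coin.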

Closing

It can be a total mindset shift to move from a frequentist point of view to a Bayesian one, especially when you’ve been trained throughout your schooling to take a frequentist approach. Both approaches have their merits, depending on the problem being solved.