Intro to Bayes Concepts


Frequentist vs. Bayesian

Probability Definitions

The intersection is the set of elements in both set A and set B, written A∩B.

The union is the set of elements in set A or set B (or both), written A∪B.

The complement of A is the set of elements not in A, written Aᶜ.
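These three definitions map directly onto Python's built-in set operators. Here is a minimal sketch; the die-roll sample space and the events A and B are made up for illustration:

```python
# Hypothetical sample space: the six faces of a die.
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # even faces
B = {4, 5, 6}  # faces greater than 3

intersection = A & B             # elements in both A and B
union = A | B                    # elements in A or B (or both)
complement_A = sample_space - A  # elements of the sample space not in A

print(intersection, union, complement_A)
```

Note that the complement only makes sense relative to a sample space, which is why it is computed as a set difference.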

Probability Rules

The probability that A & B will occur equals the probability that A will occur multiplied by the probability that B will occur given that A has already occurred. P(A∩B) = P(A)P(B|A)

The probability that A or B will occur equals the probability of A plus the probability of B, minus the probability of A & B both occurring (so the overlap doesn’t get counted twice). P(A∪B) = P(A) + P(B) - P(A∩B)
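Both rules can be checked by counting outcomes. The sketch below assumes a fair six-sided die and two made-up events; `prob` and `cond_prob` are hypothetical helper names, not library functions:

```python
from fractions import Fraction

# Hypothetical setup: one roll of a fair die.
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # even faces
B = {4, 5, 6}  # faces greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

def cond_prob(event, given):
    """P(event | given), found by counting only within the 'given' outcomes."""
    return Fraction(len(event & given), len(given))

p_and = prob(A & B)
p_or = prob(A | B)

# Multiplication rule: P(A and B) = P(A) * P(B | A)
assert p_and == prob(A) * cond_prob(B, A)

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
assert p_or == prob(A) + prob(B) - p_and

print(p_and, p_or)
```

Using `Fraction` keeps the arithmetic exact, so the equality checks are not affected by floating-point rounding.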

Random Variables

Random variables, and the distributions that describe them, can be either discrete or continuous. A discrete random variable takes on countable values, whereas a continuous random variable can take any of the infinitely many values within a range. For example, the number of fish caught in a river on Sunday is a discrete variable, but the mass of a fish is continuous because you can always measure it more precisely.


A distribution is uniform if all outcomes are equally likely, such as the six faces of a fair die.

A normal distribution is symmetric around its mean, with values becoming less likely the farther they are from it. You probably recognize it as the bell curve from school.
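Python's standard library can describe a normal distribution without any plotting. In this rough sketch the mean and standard deviation of the fish masses are invented numbers:

```python
from statistics import NormalDist

# Hypothetical fish masses: mean 2.0 kg, standard deviation 0.5 kg.
fish_mass = NormalDist(mu=2.0, sigma=0.5)

# The density is highest at the mean and falls off symmetrically.
density_at_mean = fish_mass.pdf(2.0)
density_one_sd_up = fish_mass.pdf(2.5)
density_one_sd_down = fish_mass.pdf(1.5)

# Share of fish expected within one standard deviation of the mean.
within_one_sd = fish_mass.cdf(2.5) - fish_mass.cdf(1.5)

print(round(within_one_sd, 3))
```

The share within one standard deviation comes out to roughly 68%, whatever mean and standard deviation you pick.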

A Poisson distribution gives the probability that a given number of events occurs in a fixed interval, when events happen at a constant mean rate and independently of one another.
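The Poisson probabilities are simple enough to compute by hand. Here is a small sketch, assuming a made-up average of 3 fish caught per Sunday; `poisson_pmf` is a hypothetical helper written for this example:

```python
from math import exp, factorial

def poisson_pmf(k, rate):
    """Probability of seeing exactly k events when the mean count is `rate`."""
    return exp(-rate) * rate**k / factorial(k)

rate = 3.0  # hypothetical: 3 fish caught on an average Sunday

for k in range(7):
    print(k, round(poisson_pmf(k, rate), 3))
```

The printed probabilities peak near the mean rate and tail off for larger counts.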

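A binomial distribution gives the probability of each possible number of successes in a fixed number of independent trials, where every trial either succeeds or fails with the same probability.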
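As a quick illustration, the binomial probabilities can be computed directly from the counting formula. The 10 coin flips below are a made-up experiment, and `binomial_pmf` is a hypothetical helper name:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical experiment: 10 flips of a fair coin.
n, p = 10, 0.5
p_five_heads = binomial_pmf(5, n, p)

print(p_five_heads)
```

Even for a fair coin, exactly 5 heads in 10 flips happens only about a quarter of the time.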

A Gamma distribution is a family of continuous distributions over positive values that includes the exponential distribution as a special case. It is also commonly used for setting up priors.
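To see how the exponential distribution fits inside the Gamma family, here is a sketch using the shape/rate parameterization (an assumption; other texts use shape/scale). With the shape set to 1, the Gamma density reduces to the exponential density:

```python
from math import exp, gamma

def gamma_pdf(x, shape, rate):
    """Gamma density at x > 0, in the shape/rate parameterization."""
    return rate**shape * x**(shape - 1) * exp(-rate * x) / gamma(shape)

def exponential_pdf(x, rate):
    """Exponential density at x >= 0."""
    return rate * exp(-rate * x)

rate = 2.0  # hypothetical rate, chosen only for illustration
for x in (0.5, 1.0, 2.0):
    print(x, gamma_pdf(x, 1.0, rate), exponential_pdf(x, rate))
```

Each row prints the same value twice, once from each function.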

Bayes' Theorem

P(A|B) = P(B|A)P(A) / P(B)

In plain English, the probability of A given that B occurred equals the probability of B given that A occurred, multiplied by the probability of A on its own, all divided by the probability of B on its own.
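Bayes' theorem follows directly from the multiplication rule in the Probability Rules section. Since A∩B is the same event whichever set you start from, the rule can be written two ways and the results set equal. A short derivation, assuming P(B) > 0:

```latex
\begin{align*}
P(A \cap B) &= P(A)\,P(B \mid A) \\
P(A \cap B) &= P(B)\,P(A \mid B) \\
\Rightarrow\quad P(B)\,P(A \mid B) &= P(A)\,P(B \mid A) \\
\Rightarrow\quad P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B)}
\end{align*}
```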


P(movies|popcorn) = P(popcorn|movies)P(movies) / P(popcorn)


The probability that you’re at the movies given you’re eating popcorn P(movies|popcorn) equals the probability that you’re eating popcorn given you’re at the movies P(popcorn|movies) times the probability you’re at the movies P(movies), divided by the probability you’re having popcorn P(popcorn). Let’s break it down and then plug it in:

P(movies) = 1/7, or about 0.143 since you go once a week.

P(popcorn) = 3/7 or about 0.429 since you eat popcorn three times a week.

P(popcorn | movies) = 0.5 because you buy popcorn half the time you’re at the movies.

So P(movies | popcorn) = 0.5 * 0.143 / 0.429 ≈ 0.167, or about a 17% chance that you’re at the movies given that you’re eating popcorn.
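The same calculation can be done with exact fractions to avoid rounding along the way. This is only a sketch; the variable names are made up, and the three inputs are the ones listed above:

```python
from fractions import Fraction

p_movies = Fraction(1, 7)                # at the movies once a week
p_popcorn = Fraction(3, 7)               # popcorn three times a week
p_popcorn_given_movies = Fraction(1, 2)  # popcorn on half of movie trips

# Bayes' theorem: P(movies | popcorn) = P(popcorn | movies) * P(movies) / P(popcorn)
p_movies_given_popcorn = p_popcorn_given_movies * p_movies / p_popcorn

print(p_movies_given_popcorn, float(p_movies_given_popcorn))  # 1/6, roughly 0.167
```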


Priors are your beliefs about the distribution before running any trials. Industry knowledge is definitely useful here. If there is no prior information to work from, you can start from an uninformative prior, such as a uniform distribution. When the posterior can’t be worked out by hand, Markov Chain Monte Carlo can be used to simulate samples from it. Either way, you’ll need to know which distributions you’re working with (hence the review of distributions above).

Priors are updated by the likelihood, which comes from the data collected in actual experiments.

After the update, a posterior belief about the outcomes is formed.
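The prior, likelihood, and posterior steps can be sketched with a simple grid approximation. Everything here is an illustrative assumption: the unknown quantity is a success probability p, the prior is flat, and the data (7 successes in 10 trials) are made up:

```python
# Candidate values for the unknown success probability p.
grid = [i / 100 for i in range(101)]

# Prior: flat, so every candidate value starts out equally plausible.
prior = [1.0 for _ in grid]

# Hypothetical experiment: 7 successes in 10 trials.
successes, trials = 7, 10

# Likelihood of the data for each candidate p
# (binomial, with the constant factor dropped).
likelihood = [p**successes * (1 - p)**(trials - successes) for p in grid]

# Posterior is proportional to prior times likelihood; normalize to sum to 1.
unnormalized = [pr * lk for pr, lk in zip(prior, likelihood)]
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

posterior_mean = sum(p * w for p, w in zip(grid, posterior))
most_likely = grid[posterior.index(max(posterior))]

print(round(posterior_mean, 3), most_likely)
```

Swapping in a different `prior` list, for example one weighted toward small values of p, shows how prior beliefs pull the posterior away from what the data alone would suggest.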


I am a data scientist, leveraging my experience in residential architecture to apply creativity to data informed solutions and graphically communicate results.
