# Intro to Bayes Concepts

---

# Frequentist vs. Bayesian

A frequentist approach to statistics uses the long-term frequency of events: the probability of rolling two dice that both come up six is 1/36. The Bayesian approach starts with a prior belief (the sun did not explode), updates that belief based on an experiment (rolling the dice), and forms an updated belief. In the xkcd comic this example comes from, the Bayesian has a very strong prior belief that the sun did not explode, so a single 1/36 result barely changes it.

# Probability Definitions

A clearly defined process with an outcome is called an **experiment**. All possible outcomes of an experiment are known as the **sample space**. An **event** is a collection of outcomes.

The **intersection** of A and B is the set of elements in both set A and set B: A∩B.

The **union** of A and B is the set of elements in set A or set B (or both): A∪B.

The **complement** of A is the set of elements not in A: Aᶜ.
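These set operations map directly onto Python's built-in `set` type. A quick sketch, using a six-sided die as the sample space (the specific events here are arbitrary examples, not from the text):

```python
# Hypothetical sample space: the faces of a six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event: the roll is even
B = {4, 5, 6}   # event: the roll is greater than 3

intersection = A & B             # elements in both A and B
union = A | B                    # elements in A or B (or both)
complement_A = sample_space - A  # elements not in A

print(intersection)   # {4, 6}
print(union)          # {2, 4, 5, 6}
print(complement_A)   # {1, 3, 5}
```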

## Probability Rules

Because all probabilities in a sample space sum to 1, **P(Aᶜ) = 1 − P(A)**.

The probability that A and B will both occur equals the probability that A will occur multiplied by the probability that B will occur *given* that A has already occurred: P(A∩B) = P(A)P(B|A).

The probability that A or B will occur equals the probability of A plus the probability of B, minus the probability of A and B both occurring (so the overlap doesn't get counted twice): P(A∪B) = P(A) + P(B) − P(A∩B).
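All three rules can be checked numerically with exact fractions. A minimal sketch, assuming a fair six-sided die and two arbitrary example events:

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # assumed: a fair die
A = {2, 4, 6}   # event: the roll is even
B = {4, 5, 6}   # event: the roll is greater than 3

def p(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

# Complement rule: P(Ac) = 1 - P(A)
assert p(sample_space - A) == 1 - p(A)

# Multiplication rule: P(A and B) = P(A) * P(B|A)
p_b_given_a = Fraction(len(A & B), len(A))
assert p(A & B) == p(A) * p_b_given_a

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
assert p(A | B) == p(A) + p(B) - p(A & B)

print("all three rules check out")
```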

# Random Variables

When a variable takes on a numerical value based on the outcome of a random event, it is called a **random variable**.

Random variables can be either discrete or continuous. A **continuous random variable** can take on an infinite number of values within a range, whereas a **discrete random variable** takes on countable values. For example, the number of fish caught in a river on Sunday is discrete, but the mass of each fish is continuous because you can always measure it more precisely.

# Distributions

A **distribution** is the set of all values of a variable and how frequently they occur.

A distribution is **uniform** if all outcomes are equally likely.

A **normal distribution** is symmetric around its mean; you probably recognize it as the bell curve from school.

A **Poisson distribution** gives the probability that a discrete event occurs a given number of times in a fixed interval, assuming a constant mean rate and independence from the previous occurrence.

A **Binomial distribution** gives the probability of a given number of successes across a fixed number of independent trials, each with the same probability of success.

A **Gamma distribution** is a flexible family of positive-valued distributions that includes the exponential distribution as a special case; it is also commonly used for setting up priors.
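For a hands-on feel, each of these distributions can be sampled with NumPy's random generator (assuming NumPy is available; the parameter values here are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

uniform = rng.uniform(low=0, high=1, size=10_000)   # all values equally likely
normal = rng.normal(loc=0, scale=1, size=10_000)    # bell curve around the mean
poisson = rng.poisson(lam=3, size=10_000)           # event counts at a constant mean rate
binomial = rng.binomial(n=10, p=0.5, size=10_000)   # successes in 10 fair trials
gamma = rng.gamma(shape=2, scale=1, size=10_000)    # flexible, positive-valued

# With 10,000 draws, sample means land near the theoretical means.
print(round(uniform.mean(), 2))   # close to 0.5
print(round(poisson.mean(), 2))   # close to 3.0
```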

# Bayes theorem

The equation for Bayes Theorem is: P(A|B) = P(B|A)P(A) / P(B).

In plainer English: the probability of A given that B occurred equals the probability that B occurs given that A has occurred, multiplied by the probability that A occurs on its own, all divided by the probability that B occurs on its own.

## Example:

Let's say it's before 2020 and people are still going to the movies. You're a movie buff and go once a week. You buy popcorn half the time you go, and overall you eat popcorn about three days a week. What's the probability that you're at the movies, given that you're eating popcorn on a given day?

P(movies|popcorn) = P(popcorn|movies)P(movies) / P(popcorn)

The probability that you’re at the movies given you’re eating popcorn *P(movies|popcorn)* equals the probability that you’re eating popcorn given you’re at the movies *P(popcorn|movies)* times the probability you’re at the movies *P(movies)*, divided by the probability you’re having popcorn *P(popcorn)*. Let’s break it down and then plug it in:

P(movies) = 1/7, or about 0.143 since you go once a week.

P(popcorn) = 3/7 or about 0.429 since you eat popcorn three times a week.

P(popcorn | movies) = 0.5 because you buy popcorn half the time you’re at the movies.

So P(movies | popcorn) = 0.5 × 0.143 / 0.429 ≈ 0.167, or about a 17% chance that you're at the movies given that you're eating popcorn.
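Plugging the numbers in with exact fractions confirms the arithmetic:

```python
from fractions import Fraction

# The movie/popcorn numbers from the example above.
p_movies = Fraction(1, 7)                 # you go to the movies once a week
p_popcorn = Fraction(3, 7)                # you eat popcorn three days a week
p_popcorn_given_movies = Fraction(1, 2)   # popcorn on half of movie trips

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_movies_given_popcorn = p_popcorn_given_movies * p_movies / p_popcorn

print(p_movies_given_popcorn)         # 1/6
print(float(p_movies_given_popcorn))  # about 0.167
```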

# Components

The components of working with Bayes Theorem are the priors, likelihood, and posterior.

**Priors** are your beliefs about the distribution before running any trials; industry knowledge is definitely useful here. If there is no prior information to work from, you can fall back on an uninformative (flat) prior. When the resulting posterior can't be computed analytically, **Markov Chain Monte Carlo** methods can be used to draw samples from it. Either way, you'll need to know which distribution you're working with (hence the review of distributions above).

Priors are updated by the **likelihood**: the data from the actual experiments being run.

After the updates, a **posterior** belief of outcomes is formed.
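A minimal sketch of that prior → likelihood → posterior cycle, using an assumed Beta prior on a coin's heads probability with binomial data. This is a conjugate pair, so the update has a closed form and no sampling is needed; all numbers are illustrative:

```python
# Prior: Beta(2, 2), a mild belief that the coin is roughly fair.
prior_a, prior_b = 2, 2

# Likelihood: suppose we observe 7 heads and 3 tails in 10 flips.
heads, tails = 7, 3

# Conjugate update: add successes to a, failures to b.
post_a = prior_a + heads
post_b = prior_b + tails

prior_mean = prior_a / (prior_a + prior_b)
post_mean = post_a / (post_a + post_b)

print(prior_mean)  # 0.5
print(post_mean)   # 9/14, about 0.643: belief has shifted toward heads
```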

# Closing

It can be a total mindset shift to move from a frequentist point of view to a Bayesian one, especially when you've been trained throughout your schooling to take a frequentist approach. Both approaches have their merits, depending on the problem being solved.