Advanced Supervised Learning, Neural Nets, & Unsupervised Model Overviews

Photo by Alina Grubnyak on Unsplash

Advanced Supervised Learning

  • CART
  • Bootstrapping & Bagging
  • Random Forest
  • Gradient Descent
  • Boosting
  • SVMs

Neural Nets

  • Intro to Neural Nets
  • Regularizing Neural Nets
  • RNNs
  • CNNs
  • GANs

Intro to Unsupervised Learning

  • K-Means
  • DBSCAN
  • PCA
  • Recommender Systems to come in a later post

Advanced Supervised Learning

CART

  • Don’t have to scale our data
  • Nonparametric, so don’t make assumptions about data distributions
  • Easy to interpret: Can map out decisions made by the tree & view feature importance
  • Speed: They fit very quickly
  • Prone to overfitting the training data unless the tree is pruned or depth-limited
  • Locally optimal: because decision trees make the greedy best split at each node, the overall tree may not be globally optimal
  • Don’t work well with unbalanced data
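
The tradeoffs above can be sketched with scikit-learn on synthetic data (an illustration with made-up settings, not code from the post):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data; no scaling needed for trees
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Capping depth is one way to rein in the overfitting noted above
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print(tree.score(X_test, y_test))
print(tree.feature_importances_)  # interpretability: per-feature importance
```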

Bootstrapping & Bagging

  • Can get closer to a true function by averaging out multiple model results
  • Wisdom of the crowd
  • Can lower variance by exposing the model to differing samples of the training data
  • It is harder to interpret because it’s running multiple models at once
  • Computationally more expensive
  • Bagged trees are still highly correlated with each other, so the ensemble can retain high variance
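
A minimal bagging sketch (scikit-learn's default base learner is a decision tree; settings are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Each of the 50 trees is fit on its own bootstrap sample;
# predictions are then averaged ("wisdom of the crowd")
bag = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
bag.fit(X, y)
print(bag.score(X, y))
```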

Random Forest & Extra Trees

  • Reduces overfitting by using a random subset of features at each split
  • A step beyond bagging, because the trees are less correlated with each other
  • More complex and a little harder to interpret than a single decision tree
  • More computationally expensive
  • Individual tree models within Extra Trees are even less correlated to each other
  • Faster than Random Forest
  • Trained on the entire dataset instead of just bootstrapped samples
  • Can increase bias because of addition of random elements
  • Split thresholds are chosen at random rather than optimized
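
The two ensembles differ mainly in how they sample and split; a side-by-side sketch (illustrative parameters):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Random Forest: bootstrap samples + a random feature subset at each split
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
# Extra Trees: full dataset (no bootstrap by default) + random split thresholds
et = ExtraTreesClassifier(n_estimators=100, max_features="sqrt", random_state=0)

rf_score = cross_val_score(rf, X, y, cv=5).mean()
et_score = cross_val_score(et, X, y, cv=5).mean()
print(rf_score, et_score)
```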

Gradient Descent
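
The idea in one loop: repeatedly step each parameter opposite the gradient of the loss, scaled by a learning rate. A plain-NumPy sketch on a simple least-squares fit (values are illustrative):

```python
import numpy as np

# Fit y = w*x + b by minimizing mean squared error
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 3.0 * x + 1.0 + rng.normal(0, 0.1, 200)

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    pred = w * x + b
    grad_w = 2 * np.mean((pred - y) * x)  # dMSE/dw
    grad_b = 2 * np.mean(pred - y)        # dMSE/db
    w -= lr * grad_w  # step downhill, scaled by the learning rate
    b -= lr * grad_b

print(w, b)  # should approach the true 3.0 and 1.0
```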

Boosting

  • Each individual learner is computationally cheap, since boosting builds an ensemble of weak predictors
  • Sensitive to outliers, because every misclassified point is re-weighted (or its residual re-fit) in the next round
  • Not interpretable
  • Lots of flexibility: can optimize on different loss functions
  • The full ensemble can be computationally expensive, since the learners must be fit sequentially
  • Can be prone to overfitting because it’s designed to keep driving down training error
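
A minimal gradient-boosting sketch (scikit-learn, illustrative settings):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Shallow trees are fit sequentially; each one corrects the errors of the last
gb = GradientBoostingClassifier(
    n_estimators=100,
    learning_rate=0.1,  # shrinks each tree's contribution
    max_depth=3,        # weak learners
    random_state=0,
)
gb.fit(X, y)
print(gb.score(X, y))
```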

SVMs

  • Exceptional performance on many problems
  • Effective in high-dimensional data
  • Works with non-linear data via the kernel trick
  • Computationally expensive, especially on large datasets
  • With the kernel trick, SVMs can always find a separating hyperplane in a transformed feature space
  • Results are not interpretable
  • Have to scale the data first
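
Because scaling is required, it is natural to pipeline a scaler ahead of the classifier (illustrative settings):

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Scale first; the RBF kernel handles the non-linearity
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X, y)
print(svm.score(X, y))
```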

Neural Nets

Intro to Neural Nets

  • Many tunable choices: number of hidden layers, number of nodes in each layer, activation functions, and the loss function
  • Good results on complex problems
  • Flexible set-up
  • Can overfit to training data
  • Computationally expensive
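
Those choices map directly onto a small multilayer perceptron; here scikit-learn's MLPClassifier stands in, with an illustrative architecture:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X = StandardScaler().fit_transform(X)

# Two hidden layers (32 and 16 nodes) with ReLU activations;
# the loss (log-loss for classification) is fixed by this estimator
net = MLPClassifier(hidden_layer_sizes=(32, 16), activation="relu",
                    max_iter=1000, random_state=0)
net.fit(X, y)
print(net.score(X, y))
```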

Regularizing Neural Nets

RNNs

CNNs

GANs

Intro to Unsupervised Learning

K-Means

  • Guaranteed to converge
  • Fast
  • Simple and easy to implement
  • Very sensitive to outliers, because each centroid is the mean of the points in its cluster and outliers pull it away
  • Whether it converges to the global minimum or only a local one depends on how the centroids are initialized. It is recommended to start the centroids distant from each other and to run the model several times to verify the clusters are stable.
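
Both caveats have standard mitigations in scikit-learn: k-means++ spreads the initial centroids apart, and n_init re-runs the model and keeps the best result (illustrative settings):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# k-means++ starts centroids far apart; n_init=10 retries and keeps the best run
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
labels = km.fit_predict(X)
print(km.cluster_centers_)  # each centroid is the mean of its cluster
```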

DBSCAN

  • Can detect patterns that may not be found by K-Means
  • Don’t need to specify the number of clusters
  • Great for identifying outliers
  • Uses a single fixed epsilon (neighborhood radius) for the whole dataset
  • If clusters touch or overlap, they may be merged into one cluster
  • Doesn’t work well when the clusters are of varying density
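
A sketch on the classic two-moons shape, a pattern K-Means cannot separate (the eps value here is illustrative):

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-moon clusters
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps is the fixed neighborhood radius; min_samples sets the density threshold
db = DBSCAN(eps=0.2, min_samples=5).fit(X)
print(set(db.labels_))  # cluster ids; -1 would mark outliers
```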

PCA
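
In brief, PCA projects the data onto the directions of maximum variance; a minimal sketch (illustrative settings, and scale first since PCA is variance-based):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = make_classification(n_samples=300, n_features=10, random_state=0)
X = StandardScaler().fit_transform(X)  # PCA is variance-based, so scale first

# Keep the top 3 principal components
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                      # (300, 3)
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```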

Recommender Systems to come in a later post

David Holcomb

I am a data scientist, leveraging my experience in residential architecture to apply creativity to data informed solutions and graphically communicate results.