Where to start with Machine Learning

* * *

Patrick Fox

What is Machine Learning?

It's not just stats

Just like biology isn't chemistry, machine learning is not just statistics.

It's not a solution to everything

You don't need terabytes of data, but you shouldn't implement an ML solution for everything.


It's the science of interpreting data

Machine Learning:

A collection of algorithms which learn how to interpret data from examples

Don't panic

No extensive stats knowledge needed

Unless you want to build your own ML algorithm

Machine Learning techniques

Learning from example

Data usually takes the following form

Example: a car has these features

  • Wheels

  • Color

  • Engine Size

  • Manufacturer

  • Mileage

  • etc...

Which features are most important for distinguishing between cars? In other words, which features contain the most information?


Principal Component Analysis (PCA)

Machine learning technique which takes all of the features and constructs new features from them. This reduced set of data makes it easier to classify objects.

A common misconception is that PCA selects the best features. It actually builds new features by combining the original ones.
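To make the point about new features concrete, here is a minimal sketch of PCA on two features using only the Python standard library. The car data and the `standardise` helper are hypothetical, and power iteration is just one simple way of finding the direction of greatest variance; real libraries use more robust methods.

```python
import math

# Hypothetical cars: (engine size in litres, top speed in km/h).
cars = [(1.0, 150.0), (1.4, 165.0), (2.0, 190.0), (3.0, 230.0), (4.0, 260.0)]

def standardise(column):
    """Centre a column on zero and scale it to unit variance."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column]

engine = standardise([c[0] for c in cars])
speed = standardise([c[1] for c in cars])
n = len(cars)

# 2x2 covariance matrix of the standardised features.
cross = sum(a * b for a, b in zip(engine, speed)) / n
cov = [
    [sum(a * a for a in engine) / n, cross],
    [cross, sum(b * b for b in speed) / n],
]

# Power iteration: repeatedly multiplying a vector by the covariance
# matrix turns it towards the direction of greatest variance.
component = [1.0, 0.0]
for _ in range(100):
    x = cov[0][0] * component[0] + cov[0][1] * component[1]
    y = cov[1][0] * component[0] + cov[1][1] * component[1]
    norm = math.hypot(x, y)
    component = [x / norm, y / norm]

# The new feature is a weighted mix of BOTH original features.
new_feature = [component[0] * a + component[1] * b for a, b in zip(engine, speed)]
explained = (sum(v * v for v in new_feature) / n) / (cov[0][0] + cov[1][1])
print("weights:", component, "share of variance kept:", explained)
```

Both weights come out non-zero, so the single new feature blends engine size and top speed rather than picking one of them.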

Manifold learning

Non-linear dimensionality reduction. Like PCA but finds features that correlate non-linearly.

There are lots of algorithms to try, but they are slower, and the results are sometimes worse than PCA.

Naive Bayes

Classification technique which assumes that all features are independent of each other (given the class). Suppose my dataset has 5 male smokers, 3 female smokers, 2 male non-smokers and 4 female non-smokers. How likely is my new datapoint to be male, given that they smoke?

Basic but fast. Often outperformed by Random Forests.
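The smoker question can be worked through with Bayes' rule, as in the sketch below. The `counts` table and the `prob` helper are illustrative names, not part of any library. With only one feature the independence assumption never comes into play; it matters once several features are multiplied together.

```python
from fractions import Fraction

# Counts from the example: (sex, smoker) -> number of people.
counts = {
    ("male", True): 5,
    ("female", True): 3,
    ("male", False): 2,
    ("female", False): 4,
}
total = sum(counts.values())

def prob(predicate):
    """Fraction of the dataset whose (sex, smoker) key satisfies predicate."""
    return Fraction(sum(n for key, n in counts.items() if predicate(key)), total)

p_male = prob(lambda k: k[0] == "male")    # prior P(male)
p_smoker = prob(lambda k: k[1])            # evidence P(smoker)
# Likelihood P(smoker | male) = P(male and smoker) / P(male)
p_smoker_given_male = prob(lambda k: k[0] == "male" and k[1]) / p_male

# Bayes' rule: P(male | smoker) = P(smoker | male) * P(male) / P(smoker)
p_male_given_smoker = p_smoker_given_male * p_male / p_smoker
print(p_male_given_smoker, float(p_male_given_smoker))  # 5/8 0.625
```

So the new datapoint is more likely male than female, which matches reading the table directly: 5 of the 8 smokers are male.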

Neural Networks

Maps a set of features to a particular output. Given our car features, is our car a "fast" type? Is it a valuable car? Acts as a set of mapping functions that all work together.

Needs training data to find the weights of each mapping function (neuron)

Deep Learning: Neural Network with lots of layers (and complicated wiring)
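As a rough illustration of what finding the weights means, the sketch below trains a single neuron to answer the "fast car" question from engine size alone. The training data, the learning rate and the number of passes are all assumptions made for the example; a real network has many neurons and would normally be built with a dedicated library.

```python
import math

# Hypothetical training data: engine size in litres -> 1 if "fast", else 0.
engine_sizes = [1.0, 1.2, 1.6, 3.0, 4.0, 5.0]
is_fast = [0, 0, 0, 1, 1, 1]

def neuron(x, weight, bias):
    """One neuron: weighted input plus bias, squashed into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))

weight, bias = 0.0, 0.0
learning_rate = 0.1  # assumed step size
for _ in range(10_000):
    grad_w = grad_b = 0.0
    for x, y in zip(engine_sizes, is_fast):
        # For the log loss, the gradient with respect to the neuron's
        # pre-activation input is simply (prediction - label).
        error = neuron(x, weight, bias) - y
        grad_w += error * x
        grad_b += error
    # Step downhill on the average loss (gradient descent).
    weight -= learning_rate * grad_w / len(engine_sizes)
    bias -= learning_rate * grad_b / len(engine_sizes)

# Ask the trained neuron about two cars it has not seen.
print(round(neuron(1.4, weight, bias)), round(neuron(3.5, weight, bias)))
```

Training here is exactly the search for a weight and a bias that make the loss small, which is why the same idea scales up to networks with many layers.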

Many more techniques...

Usually a combination of techniques is used.

How do computers do this?

The Magic

These ML techniques are mostly minimisation problems, and computers (particularly GPUs) are very good at finding minima.

Well if you really must know...

Minimisation problems can be turned into root-finding problems: a minimum occurs where the derivative is zero.
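A minimal sketch of that idea: to minimise a function, apply the Newton-Raphson method to its derivative and look for the point where the derivative is zero. The loss f(x) = x**4 - 8*x is a hypothetical example chosen because its minimum, the cube root of 2, is easy to check by hand.

```python
def f_prime(x):
    """First derivative of the hypothetical loss f(x) = x**4 - 8*x."""
    return 4 * x**3 - 8

def f_double_prime(x):
    """Second derivative of the same loss."""
    return 12 * x**2

def newton_minimise(x, tolerance=1e-12, max_steps=50):
    """Find a root of f_prime with Newton-Raphson, starting from x."""
    for _ in range(max_steps):
        step = f_prime(x) / f_double_prime(x)
        x -= step
        if abs(step) < tolerance:
            break
    return x

x_min = newton_minimise(2.0)
print(x_min)  # close to the cube root of 2
```

The starting point matters: this sketch assumes a start where the second derivative is not zero and which is reasonably close to the minimum.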

Here's the proof that the Newton-Raphson method will converge

Minimisation problems

  • PCA & Manifold learning

    Minimises loss of information

  • Neural Network

    Finds weights of neurons such that loss function is minimised

  • Outlier detection Algorithms

    Minimise covariance determinant

  • Clustering Algorithms

    Minimise the sum of squared distances (or another distance measure)
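The clustering case can be sketched with a tiny one-dimensional k-means loop. The data points and starting centres are made up for illustration, and the `history` list records the sum of squares after every pass so the minimisation is visible.

```python
points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]   # hypothetical 1-D data
centres = [1.0, 2.0]                       # deliberately poor starting guess

def sum_of_squares(points, centres):
    """Total squared distance from each point to its nearest centre."""
    return sum(min((p - c) ** 2 for c in centres) for p in points)

history = [sum_of_squares(points, centres)]
for _ in range(10):
    # Assignment step: each point joins the cluster of its nearest centre.
    clusters = [[] for _ in centres]
    for p in points:
        nearest = min(range(len(centres)), key=lambda i: (p - centres[i]) ** 2)
        clusters[nearest].append(p)
    # Update step: move each centre to the mean of its cluster
    # (an empty cluster keeps its old centre).
    centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    history.append(sum_of_squares(points, centres))

print(centres, history[0], history[-1])
```

The sum of squares never increases from one pass to the next, and the centres settle on the middle of each group of points.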

Mixed precision training

In 2018 Nvidia found a way of roughly halving memory requirements while speeding up arithmetic. Calculations are performed at half precision, while a full-precision copy of the weights is kept and the loss is scaled so that small gradients are not lost to rounding.
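The effect of half precision, and why scaling the loss helps, can be seen with the standard library alone. In the sketch below `to_half`, the gradient value and the scale factor of 1024 are all hypothetical choices for illustration; real frameworks pick and adjust the scale automatically.

```python
import struct

def to_half(value):
    """Round a Python float to IEEE 754 half precision (16 bits) and back."""
    return struct.unpack("e", struct.pack("e", value))[0]

tiny_gradient = 1e-8   # hypothetical gradient, too small for half precision
loss_scale = 1024.0    # hypothetical scaling factor

lost = to_half(tiny_gradient)                # underflows to zero
kept = to_half(tiny_gradient * loss_scale)   # survives in half precision
recovered = kept / loss_scale                # unscaled again in full precision

print(struct.calcsize("e"), struct.calcsize("f"))  # bytes per value: 2 4
print(lost, recovered)
```

A half-precision value takes 2 bytes instead of 4, which is where the memory saving comes from. The unscaled gradient rounds to zero, while the scaled one comes back within about 1% of its true value.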

New Nvidia Tensor Cores

Use the mixed precision training technique

  • 3X Faster Deep Learning

    Than previous architecture

  • ~50% Less memory required

  • 672 GB/s Memory Bandwidth

    2x faster than previous Titans

Where to start?

Machine Learning Made Easy

  • Make smarter solutions

    Enhance the way your tools solve problems

  • No complex maths needed

    No need to be scared

  • Plenty of tools

    Most are free and open-sourced

Thank You

Slides by: Patrick Fox