Although Machine Learning (ML) draws on aspects of Statistics and Statistical Learning Theory (SLT), it differs from both in important ways.
Much of the math, statistics, modelling and computer science that makes ML possible is abstracted away in present-day ML/DL frameworks (sklearn, pytorch, pytorch-lightning, etc.) that let you train a model and predict with it in a few lines of code. It is therefore natural to wonder about the differences and similarities between these fields, particularly if you are approaching ML without academic training in, or an introduction to, the underlying subjects.
Here I share 3 resources that will hopefully clear up some of the doubts.
Statistics Vs. ML: https://www.nature.com/articles/nmeth.4642
Why: Nature article that provides a helpful illustration of the difference with an example of inference in a biological process.
The Actual Difference b/w Statistics and ML: https://towardsdatascience.com/the-actual-difference-between-statistics-and-machine-learning-64b49f07ea3
Why: While the style might be slightly dismissive, it explains several technicalities with good examples (from sensor data use-cases) and points to helpful resources.
[Advanced Users] A Statistical Machine Learning Perspective to Deep Learning: http://www.cs.cmu.edu/~epxing/talks/DLtutorial2018.pdf
Why: Useful insights into how DL makes use of SLT concepts. I found it to be a good overview, though it is very involved and I need to digest it more myself.
Fundamentally, the resources above are all about studying data points in order to either infer a relationship between variables or build predictive models. In very simplistic terms:
Statistics -> about relationships between variables. It is based on probability theory and assumes observations are a sample drawn from a population.
Statistical Learning Theory (SLT) -> builds on ideas from statistics and functional analysis to construct models that can infer relationships and thus predict a dependent variable given data.
ML -> while ML is built atop concepts from SLT, it is focussed on predicting a target variable given an input (in supervised ML), without caring about the exact relationship b/w variables. It acts as a function approximator, and typically does not need to assume an explicit probabilistic model for the data generation process or observed counts.
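To make the contrast above a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions of my own: the data is synthetic (a made-up linear relation plus Gaussian noise), the helper names fit_line, slope_standard_error and mean_squared_error are hypothetical, and the 1.96 multiplier is the usual normal approximation for a 95% interval. The same straight-line fit is used twice on purpose, so that the difference lies in the question being asked rather than in the model.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def slope_standard_error(xs, ys, intercept, slope):
    """Standard error of the slope, assuming independent noise with constant variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(rss / (n - 2) / sxx)


def mean_squared_error(xs, ys, intercept, slope):
    """Average squared prediction error of the fitted line on (xs, ys)."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data (an assumption for illustration): y = 2 + 0.5 * x + noise.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 0.5 * x + random.gauss(0, 1) for x in xs]

# Statistics-style question: how strong is the relationship, and how sure are we?
intercept, slope = fit_line(xs, ys)
se = slope_standard_error(xs, ys, intercept, slope)
ci_low, ci_high = slope - 1.96 * se, slope + 1.96 * se  # approximate 95% interval
print(f"slope = {slope:.3f}, approx. 95% CI = ({ci_low:.3f}, {ci_high:.3f})")

# ML-style question: how well do we predict data the model has not seen?
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]
train_intercept, train_slope = fit_line(train_x, train_y)
test_mse = mean_squared_error(test_x, test_y, train_intercept, train_slope)
print(f"held-out mean squared error = {test_mse:.3f}")
```

The first half leans on a noise assumption to say something about the slope itself. The second half only checks prediction error on held-out points, and would work the same way if fit_line were swapped for any other function approximator.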
This short writeup does NOT do justice to the topic. Always happy to discuss further!