Since the thread addresses machine learning specifically, let me throw this out there:
Most machine learning algorithms contain no statistical models at all. I joke with people that machine learning is just statistics without any statistics.
The computational side of statistics (as opposed to study design) is ultimately an optimization problem: given your sample data, can you find the parameter estimates that best describe the process that generated it, and so generalize to new data? For normally distributed errors, this maximum likelihood problem reduces to least squares, which has a closed-form solution. For non-normal distributions the problem is not always so tractable, and anyone who has dealt with complicated mixed models knows that convergence failures are really a thing.
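To make that concrete, here's a minimal sketch in Python (data and parameter values invented for illustration) showing that numerically maximizing the Gaussian likelihood lands on the same estimates as the least-squares formula:

```python
# With normally distributed errors, maximizing the likelihood gives the same
# parameter estimates as minimizing squared error. Made-up data for illustration.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
y = 2.0 + 3.0 * x + rng.normal(0, 1.5, n)  # true intercept 2, slope 3

# Closed-form least squares: beta = (X'X)^-1 X'y
X = np.column_stack([np.ones(n), x])
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Numerical maximum likelihood under Gaussian errors
def neg_log_lik(params):
    b0, b1, log_sigma = params
    sigma = np.exp(log_sigma)  # parameterize by log(sigma) to keep sigma positive
    resid = y - (b0 + b1 * x)
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + np.sum(resid**2) / (2 * sigma**2)

beta_mle = minimize(neg_log_lik, x0=[0.0, 0.0, 0.0]).x[:2]

print("OLS:", beta_ols)  # the two agree to numerical precision
print("MLE:", beta_mle)
```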
But the statistical assumptions going into the process are just that – assumptions. For more complicated systems that aren’t carefully controlled (meaning, just about anything that happens in life outside a lab setting), the assumptions are going to be approximations. Often they’re good approximations, and it’s always amazed me how often simple linear approximations work well.
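As a quick illustration of that last point, with made-up data: a straight line fit to a mildly nonlinear relationship still explains nearly all the variance.

```python
# A linear approximation to a nonlinear relationship (y = sqrt(x) + noise)
# still captures most of the signal over a modest range.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 500)
y = np.sqrt(x) + rng.normal(0, 0.1, 500)  # true relationship is nonlinear

X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
r2 = 1 - resid.var() / y.var()
print(f"R^2 of linear fit to sqrt(x): {r2:.3f}")  # high, despite the wrong functional form
```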
But they’re still approximations. Machine learning algorithms do away with that and simply optimize predictive accuracy directly, without positing a model of how the data were generated. The choice of algorithm still imposes its own constraints, but for many complicated processes machine learning algorithms predict vastly better than traditional statistical tools such as logistic regression.
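Here's a hedged sketch of that comparison using scikit-learn on a synthetic nonlinear problem; on real data the size of the gap depends entirely on the problem:

```python
# Logistic regression draws a linear decision boundary; a random forest can
# learn the curved boundary of this synthetic two-moons dataset.
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=2000, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

logit = LogisticRegression().fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print("logistic regression:", logit.score(X_te, y_te))  # limited by the linear boundary
print("random forest:      ", forest.score(X_te, y_te)) # typically higher here
```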
However, there’s a price to pay: less interpretability. The parameter estimates that come out of a statistical analysis provide an explanation that users can (sometimes) use to interpret the result. If someone asks me why the output of a linear regression changed, I should be able to trace that back to the input data and tell them why the model believes the forecast should be different. No such meaningful process exists for bagged trees, and for things like support vector machines I would argue such tracebacks aren’t enlightening.
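For a linear model that traceback is exact: the change in the forecast decomposes into coefficient-times-change-in-input terms. A sketch, with invented coefficients and inputs:

```python
# Why did the forecast change? For a linear model, the answer is additive:
# delta_yhat = sum_j beta_j * delta_x_j. Numbers below are made up.
import numpy as np

beta = np.array([0.5, -2.0, 1.2])    # fitted coefficients
x_old = np.array([10.0, 3.0, 7.0])   # last period's inputs
x_new = np.array([10.0, 4.5, 6.0])   # this period's inputs

contrib = beta * (x_new - x_old)     # per-feature contribution to the change
print("change in forecast:", contrib.sum())
for name, c in zip(["feat_a", "feat_b", "feat_c"], contrib):
    print(f"  {name}: {c:+.2f}")
```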
There are times when we can have both accuracy and interpretability, but often it’s a tradeoff. I’ve made it sound like statistics sits on the interpretable side, but it’s really more complicated than that. Adding interactions between different variables can improve forecast accuracy, but interpretation becomes increasingly difficult, even when it remains technically possible.
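A small scikit-learn sketch of that tradeoff, on synthetic data where the true relationship includes an interaction the plain linear model can't see:

```python
# Interaction terms can rescue the fit, but the coefficient on any single
# variable no longer tells the whole story on its own. Data invented here.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 3))
y = X[:, 0] + 2 * X[:, 1] * X[:, 2] + rng.normal(0, 0.5, 1000)  # true x2*x3 interaction

plain = LinearRegression().fit(X, y)
X_int = PolynomialFeatures(degree=2, interaction_only=True,
                           include_bias=False).fit_transform(X)
with_int = LinearRegression().fit(X_int, y)

print("R^2 without interactions:", plain.score(X, y))         # misses the x2*x3 term
print("R^2 with interactions:   ", with_int.score(X_int, y))  # better fit, 6 coefficients to explain
```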
The reality is more complicated than all of this (and I'm sure others will have lots of additions and corrections), but I think it gives a taste of how machine learning differs from traditional statistics. Increasingly, the two aren’t seen as separate areas at all, and I sometimes hear the combined field called “statistical learning” – e.g., The Elements of Statistical Learning.