I was recently reviewing what I have learned and what I still need to learn to have a comfortable grasp of ML (not even touching non-ML AI yet). I have made good progress, but the amount of foundational knowledge I still need remains overwhelming. The only way to make consistent progress was to temporarily suspend experiments and focus on reading. Lots of reading!
The internet is awash with recommendations for a basic set of ML reading material. ML is notable for requiring both breadth and depth. Breadth goes with the territory: an ML enthusiast has to know several algorithms and techniques. Unless one’s goal is simply to use ML in a simplified sandbox without attempting to extract the best performance, technical depth is also needed. There really is no getting around learning the fundamentals and accumulating knowledge incrementally in ML.
So, after spending time inefficiently reading disparate material of dubious levels of clarity, I realized my time is best invested reading less, focusing on a few highly authoritative and extensive ‘primer’ books. Invariably, the following textbooks make the top of such lists, listed in no particular order (Update, 2017: I own physical copies of the books in bold font):
· ^*Learning from Data, by Abu-Mostafa, Magdon-Ismail, and Lin
· Machine Learning: A Probabilistic Perspective, by Kevin Murphy
· Pattern Recognition and Machine Learning, by Christopher Bishop
· *Bayesian Reasoning and Machine Learning, by David Barber
· *Information Theory, Inference and Learning Algorithms, by David MacKay
· *Elements of Statistical Learning, by Hastie, Tibshirani, and Friedman (and its easier version, *Introduction to Statistical Learning: with Applications in R, by James, Witten, Hastie, and Tibshirani)
· ^Machine Learning, by Tom Mitchell
· #Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
Most of these ML books are expensive. Textbooks in general are expensive, more so in the United States. Prices are high enough that there seems to be a gray market for imported non-US-edition books (apparently printed on thinner paper but with similar content).
So, what does a budget-constrained reader do? Well, with great thanks to the authors who convinced their publishers to allow an online copy, there are enough free online versions to get through the introductory level (introductory at the PhD level, by the way). The ‘*’ in the list above indicates a free PDF copy somewhere; the ‘#’ indicates a free online copy, but not a PDF; the ‘^’ indicates a free online course is available. In LFD’s case, the free PDF of additional chapters is only available to those who register for the companion MOOC or buy the book. Interestingly, I am more inclined to buy a book after seeing a free PDF, if only to say thanks to these kind authors!
Murphy’s and Bishop’s books are quite expensive at $82+. Unfortunately, they are also the most common reference textbooks in many ML courses; by most accounts the distinction is well-deserved.
The Learning from Data book is surprisingly inexpensive ($28 at Amazon; I have this book). This was intentional: the authors turned down publishers who demanded pricing above $70 and self-published instead. This, however, limited their media options (no e-book so far) and their international distribution network (most international readers have to import from Amazon US).
I have so far read through a few parts of the Hastie, Smola, Goodfellow, Murphy, Barber, MacKay, and Abu-Mostafa books. They are excellent books. It is easy to be intimidated and turned off by the dense and arcane math! A few books even acknowledge this and try to accommodate readers who tend not to have a deep grounding in statistics and probability theory.
Barber, for instance, concedes:
“One aim of part I of the book is to encourage Computer Science students into this area. A particular difficulty that many modern students face is a limited formal training in calculus and linear algebra, meaning that minutiae of continuous and high-dimensional distributions can turn them away….”

“My primary aim was to write a book for … without significant experience in calculus and mathematics that gave an inroad into machine learning, much of which is currently phrased in terms of probabilities and multi-variate distributions. The aim was to encourage students that apparently unexciting statistical concepts are actually highly relevant for research….”

“The literature on machine learning is vast…. In this respect, it is difficult to isolate particular areas…. The book is written in an informal style at the expense of rigour and detailed proofs. As an introductory textbook, topics are naturally covered to a somewhat shallow level….”
Notice the phrasing in that last quoted paragraph. I can tell you that I had a hard time reading even the ‘un-rigorous’ and ‘shallow’ material! The introductory math was off-putting enough before I could even get to the proper ML topics in Part III of Barber’s book.
Luckily, I learned a nice trick that kept me interested: I could just jump into the ML-specific topics, skip over the math I can’t (yet) understand, and hunt down the needed background when there is time. Sometimes I find that I already have enough math background to read through some sections.
Update, April 2017:
I decided to buy a few of these books. I now have PRML (Bishop), MLAPP (Murphy), and DL (Goodfellow). I have been checking their prices for a year now. PRML used to be in the $80s, then lingered in the high $70s for a long time. One day it was in the high $50s on Amazon, so I bought it. :) It promptly went back to the mid-$60s the next day. [It is down to $46, May 2017. Grrr....]
MLAPP went up to the $90s for a long time, so I gave up. Then, after I bought PRML, it went down to the $70s. Bam! Then of course I saw the new DL book, now at $72. I was starting to not like reading the free PDF, and did not expect the price of so new a book to drop soon, so now I have three books. DL is easy-ish to read. I am finding MLAPP a bit easier than PRML so far. :)
I might buy ESL at some point, but its price seemed to be inching higher. For now, I have three books to keep me occupied.
As I try to understand the ML math, I decided to get familiar with math notation and proof reading/writing. For discrete math and proofs, I settled on Richard Hammack's Book of Proof, which I bought a few months back. I like it: a good introduction to mathematical notation, discrete math, and set theory. For the longest time, I was tempted to buy How to Prove It, but I think this is a good replacement.
I am also very close to buying Philip Klein's Coding the Matrix, to refresh my Linear Algebra while coding.