Resources

I was recently reviewing what I have learned and what I still need to learn to get a comfortable grasp of ML (not even touching non-ML AI yet).  I have made good progress, but the amount of foundational knowledge I still need remains overwhelming.  The only way to make consistent progress was to temporarily suspend experiments and focus on reading.  Lots of reading!

The internet is awash with recommendations for a basic set of ML reading material.  ML is notable for requiring both breadth and depth.  Breadth goes with the territory: an ML enthusiast has to know several algorithms and techniques.  Unless one's goal is simply to use ML in a simplified sandbox without attempting to extract the best performance, technical depth is also needed.  There really is no getting around learning the fundamentals and accumulating knowledge incrementally in ML.

So, after spending time inefficiently reading disparate material of dubious levels of clarity, I realized my time is best invested reading less, focusing on a few highly authoritative and extensive ‘primer’ books.  Invariably, the following textbooks make the top of such lists, presented in no particular order (Update, 2017: I now own physical copies of four of these, as noted below):

- ^* Learning from Data, by Abu-Mostafa, Magdon-Ismail, and Lin
- Machine Learning: A Probabilistic Perspective, by Kevin Murphy
- Pattern Recognition and Machine Learning, by Christopher Bishop
- * Bayesian Reasoning and Machine Learning, by David Barber
- * Information Theory, Inference, and Learning Algorithms, by David MacKay
- * The Elements of Statistical Learning, by Hastie, Tibshirani, and Friedman (and its easier companion, * An Introduction to Statistical Learning: with Applications in R, by James, Witten, Hastie, and Tibshirani)
- ^ Machine Learning, by Tom Mitchell
- # Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

Most of these ML books are expensive.  Textbooks in general are expensive, more so in the United States.  They are expensive enough that there seems to be a gray market for imported non-US-edition books (e.g., apparently printed on thinner paper but with similar content).

So, what does a budget-constrained reader do?  Well, with great thanks to the authors who convinced their publishers to allow an online copy, there are enough free online versions to get through the introductory level.  (Introductory at the PhD level, btw.)

In the list above, ‘*’ indicates a free PDF copy is available somewhere; ‘#’ indicates a free online copy, but not a PDF; and ‘^’ indicates a free online course is available.  In LFD’s case, the free PDF of the additional chapters is only available to those who register for the companion MOOC or buy the book.  Interestingly, I am more inclined to buy a book after seeing a free PDF, if only to say thanks to these kind authors!

Murphy’s and Bishop’s books are quite expensive at $82+.  Unfortunately, they are also the most common reference textbooks in many ML courses; by most accounts the distinction is well-deserved.  The Learning from Data book is surprisingly inexpensive ($28 at Amazon; I have this book).  This was intentional: the authors turned down publishers who demanded pricing above $70 and self-published instead.  This, however, limited their media options (no e-books so far) and their international distribution network (most international readers have to import from Amazon US).

I have so far read through parts of the Hastie, Smola, Goodfellow, Murphy, Barber, MacKay, and Abu-Mostafa books.  They are excellent books.  It is easy to be intimidated and turned off by the dense and arcane math!  A few books even acknowledge this and try to make things easier for readers who tend not to have a deep grounding in statistics and probability theory.

Barber, for instance, concedes:

“One aim of part I of the book is to encourage Computer Science students into this area. A particular difficulty that many modern students face is a limited formal training in calculus and linear algebra, meaning that minutiae of continuous and high-dimensional distributions can turn them away….”

“My primary aim was to write a book for … without significant experience in calculus and mathematics that gave an inroad into machine learning, much of which is currently phrased in terms of probabilities and multi-variate distributions.  The aim was to encourage students that apparently unexciting statistical concepts are actually highly relevant for research….”

“The literature on machine learning is vast….  In this respect, it is difficult to isolate particular areas…. The book is written in an informal style at the expense of rigour and detailed proofs.  As an introductory textbook, topics are naturally covered to a somewhat shallow level….”

Notice the phrasing in the last paragraph.  I can tell you that I had a hard time reading even this ‘un-rigorous’ and ‘shallow’ material!  The introductory math was, without a doubt, off-putting before I could even get to the proper ML topics in Part III of Barber’s book.

Luckily, I learned a nice trick that kept me interested: jump straight into the ML-specific topics, skip over the math I can’t (yet) understand, and hunt down the needed background when there is time.  Sometimes I find that I already have enough math background to read through some sections.


Update, April 2017:
I decided to buy a few of these books.  I have PRML (Bishop), MLAPP (Murphy), and DL (Goodfellow).  I had been checking their prices for a year.  PRML used to be in the $80s, then lingered in the high $70s for a long time.  One day it was in the high $50s on Amazon, so I bought it. :)  It promptly went back to the mid-$60s the next day.  [It is down to $46, May 2017.  Grrr....]

MLAPP went up into the $90s for a long time, so I gave up.  Then, after I bought PRML, it went down to the $70s.  Bam!  Then of course I saw the new DL book, now at $72.  I was starting to dislike reading the free PDF, and did not expect the price of such a new book to drop soon, so now I have three books.  DL is easy-ish to read.  I am finding MLAPP a bit easier than PRML so far. :)

I might buy ESL at some point, but its price seems to be inching higher.  For now, I have three books to keep me occupied.

As I try to understand the math behind ML, I decided to get familiar with mathematical notation and with reading and writing proofs.  For discrete math and proofs, I settled on Richard Hammack's Book of Proof, which I bought a few months back.  I like it.  It is a good introduction to mathematical notation, discrete math, and set theory.  For the longest time, I was tempted to buy How to Prove It, but I think this is a good replacement.

I am also very close to buying Philip Klein's Coding the Matrix, to refresh my Linear Algebra while coding.