Thursday, June 2, 2016

MOOCs and more!


I'll deviate slightly from my standard post today.  Instead of talking about ML concepts, we will talk about how (and where?) to learn ML concepts.  It is easier to write about mundane life experiences than technical posts, where I have to exert extra care for precision's sake :) .  Writing opinion is clearly easier than writing fact.  So today I'll relax a little, do a stream-of-consciousness analysis, and write about MOOCs for ML.

Now is a great time to study ML.  Many ML sites provide a wealth of educational material to aspiring data scientists.  Even without MOOCs, anyone could get by, as I did before, by bouncing from internet link to internet link of varying degrees of sophistication, clarity, authenticity, and credibility.  However, I paid the price of slow, disorganized progress.

ML is a very wide field.  I think this is what makes it hard for newcomers.  Topics are rarely self-contained, and it demands knowledge of too many different fields before things start making sense.  Without a map, it is easy to get lost and miss important fundamentals.  For example, I kept reading mentions of Random Forests as a very easy and powerful algorithm, but was so focused on convolutional neural networks that I only recently got around to understanding RFs.  This is where the guided structure of MOOCs comes into play.  Plus, we are spoiled for choice with the many flavors of data science MOOCs.

In recent weeks, I took (am still taking) three MOOCs on ML and data science: Coursera's Machine Learning (Ng's course from Stanford), edX's The Analytics Edge (Bertsimas' course from MITx), and Caltech's Learning from Data (Abu-Mostafa's course from Caltech online, previously offered on edX).  Of these three, The Analytics Edge is the 'least' ML and 'most' data science (I am not ready to argue where ML and data science overlap, or if they are distinct to begin with :) ).

I took Abu-Mostafa's course from Caltech's online site as an archived course, and therefore self-paced.  The first two are live courses with deadlines.  The MITx course has a strict weekly deadline, whereas Prof. Ng's course has a loose weekly deadline, and most assignments can be done in one go as long as they are completed by a final cutoff date.

"Hang on," you might say.  "That’s just three MOOCs.  That counts as 'few', let alone spoiled of choices.  Where’s the variety?"

Well, to beef up on more ML, I also signed up for archived courses: Hastie's Statistical Learning (the Introduction to SL using R, not the more advanced Elements of SL), Koller's Probabilistic Graphical Models, and Hinton's Neural Networks for Machine Learning.  I have high hopes for all of these.  The Hastie books on ISL and ESL are too good (and free!).  Koller is one of the definitive authorities on graphical models, and Hinton is one of the three pillars of deep learning (alongside Bengio and LeCun).  Their courses should make excellent reference material.

That's not all.  To complete the AI circle, I also added Berkeley's and Udacity's Intro to AI courses, Seoul National University's robotics courses, and a couple of other intro robotics courses (ETH Zurich's Autonomous Mobile Robots and MIT's Underactuated Robotics).  They are taught by some of the world's preeminent researchers in these fields (e.g., Klein and Abbeel for Berkeley AI, Norvig and Thrun for Udacity AI) and amazing instructors (e.g., Frank Chongwoo Park's traditional chalk-on-blackboard drawings and instructional style are quite mesmerizing and funny at times).  These and similar lectures are all free on edX, Coursera, Udacity, or universities' own online channels (e.g., MIT OCW, Stanford Lagunita, and the online footprints of specific courses, e.g., Tom Mitchell's Machine Learning course on CMU's site).

Frankly, I have more courses to finish than time!  Homework aside (the live courses have homework), even just watching videos consumes time.  For now, I am focusing on the first three.

There is no denying that MOOCs are a godsend to the willing.  I learned new things, reinforced things I already knew, and refreshed things I once knew but had lost to years of disuse.  They are outright good learning experiences.

But they come in two distinct flavors: breadth vs. depth.  Breadth often means easy; depth, hard.  Whether you'll prefer one over the other depends on your background and on the assumptions the course makes about its audience.  Let us then frame MOOCs in the context of how they are presented to their audience (aka, the diverse peoples of the world).

To explore these polar opposites, let's use the four courses I am taking.  On one end, we have The Analytics Edge and, to a certain extent, Coursera's Machine Learning.  On the other, Learning from Data and Robot Mechanics and Control.

edX/MITx’s The Analytics Edge (TAE)

The world would be a better place if everyone took TAE.  Anyone who has to make recommendations based on historical patterns or data would be better off in their career for taking it.  This is a comprehensive course on applied data analytics using R, offered under Sloan (the business school part of MIT), specifically by the Operations Research people.  The number of cleaned and curated real-world datasets it uses for lectures and homework is alone worth the $100 certificate (if you want a certificate; otherwise it is free)!

It is very comprehensive.  You will learn a wide variety of data analysis techniques in a few weeks, picking up R familiarity along the way.  You will come out able to apply the techniques to datasets of your own choosing.  But the course does not spend time on the theory behind the techniques, so this is not a researchers' course.  It can be immensely useful for standard data analysis in business, however, and that is by design.  It is geared toward immediate business application.  Think of it as learning how to use the advanced functions of Excel (e.g., how to create a regression model in Excel, except here it is much easier in R).

The course has rather easy requirements (the MITx staff asks only for high school math and some basic statistics).  I think that is accurate, though some participants have reasons to disagree.  I most assuredly did not need any special mathematics.  The course even skips derivations (I provided one, for instance, to help a classmate who wondered how the lecture got from equation A to equation B).  But because the quality of high school math varies around the world (as does undergraduate math, including for US-educated students), some do complain that the stated requirements are misleading.

I understand the source of the critique, i.e., different math backgrounds and comfort levels.  But for the benefit of those wondering whether they are ready for this course, I can so far guarantee that I did not find anything requiring advanced math.  The most 'advanced' math was log(x) and its inverse, which appeared while discussing, well, logistic regression.  Most of the math is done by R.
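To give a flavor of what "the math is done by R" means, here is a tiny sketch of my own (toy data, not a course dataset): the log-odds and its inverse, the sigmoid, sit underneath the model, but a single glm call does all the fitting.

    # Toy logistic regression in R: the fitting is one function call.
    set.seed(1)
    x <- rnorm(100)
    p <- 1 / (1 + exp(-(0.5 + 2 * x)))    # inverse of the log-odds: the sigmoid
    y <- rbinom(100, 1, p)                # simulate binary outcomes
    mod <- glm(y ~ x, family = binomial)  # R estimates the coefficients for us
    coef(mod)                             # should land near the true (0.5, 2)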

A few concepts were momentarily confusing.  For instance, the week covering ROC curves and AUC, the confusion matrix, and specificity and sensitivity was hard to follow (even though I already knew ROC curves).  But the confusion could have been avoided.  The course chose what a layman would call a negative event (the possibility of having cancer) as the positive outcome, in a case study discussing true negatives and false positives.  Maybe it was meant to drive home the point that a 'positive' event is intrinsic to what the model predicts, not to its meaning outside the model.  (Genius!  Cruel, but genius....)
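For anyone tripped up by the same terminology, here is a toy sketch of my own (not the course's case study) showing how the confusion matrix, sensitivity, and specificity hang together, with 1 standing for the model's 'positive' event (the layman's bad news):

    actual    <- c(1, 0, 1, 1, 0, 0, 1, 0)  # 1 = has cancer (the model's "positive")
    predicted <- c(1, 0, 0, 1, 0, 1, 1, 0)  # model output after thresholding

    cm <- table(actual, predicted)          # confusion matrix: rows = actual
    TP <- cm["1", "1"]; FN <- cm["1", "0"]
    TN <- cm["0", "0"]; FP <- cm["0", "1"]

    sensitivity <- TP / (TP + FN)  # true positive rate: cancers correctly flagged
    specificity <- TN / (TN + FP)  # true negative rate: healthy correctly cleared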

Overall, many will agree that the course is easy (meaning lectures are easy to understand and problems are easy to solve), but the workload is, as they typically say in technical terms, "non-trivial." :)

Trivial or otherwise, I would in no uncertain terms recommend this course to anyone who works with data (most of us do; we just don't realize it, or don't realize the potential of extracting valuable information from data).  Data analytics is too powerful and, with this course, too accessible (free!) not to study.  But be warned: the homework alone takes 3 to 5 hours each week.

Let me say that again, because this is the 'bad' hallmark of the course.  You will sacrifice a few weekends and evenings to meet the weekly deadlines.  Three uninterrupted hours per week is the minimum for homework, even when things are going very well.  That's three required problem sets of about 20 questions each.  Each question is typically answerable within 10 seconds (e.g., type str(datasetX) in R and report how many records the dataset has), or 5 minutes (if things are not working out and your calculations do not match any of the multiple choices), or 15 minutes (if your calculations do match one of the choices but it is in fact marked wrong!).  Some weeks have 70-80 questions.  Make a mistake and you will have to backtrack to find the error, because the questions are procedural and build on top of prior questions and answers.
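For the curious, the 10-second variety looks something like this (shown on a built-in R dataset, since datasetX above is just a stand-in):

    str(mtcars)    # prints the structure: 32 obs. of 11 variables
    nrow(mtcars)   # the number of records, directly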

This course demands absolute precision.  Some students are naturally very good at this.  Some are lucky to have careers that train them for this level of precision (e.g., debugging programs).  Many get frustrated.  Countless students declare in the course forums that the question checker has to be wrong, only to realize they made a silly error.  The course says 10-15 hours per week.  Previous participants say that is an understatement.  For those new to R and data analytics (and not used to the syntactic precision demanded by programming languages), expect to commit more!  The lecture videos, played back at chipmunk speed, can be completed in 1.5 hours per week.  That pace assumes prior knowledge of data analytics.  A typical participant will likely need 2-4 times that.

edX/Caltech’s Learning from Data (LFD)

As someone who wanted to learn the theory behind ML, I loved LFD!  It starts by answering the question of whether it is even possible to learn from data, and what limits and challenges this type of learning presents.  It is highly conceptual and mathematical right away, yet somehow digestible [note my background, however; it will be hard to very hard without a strong, math-heavy undergrad].

Few ML courses take this route.  Most drop students straight into a specific algorithm on Day 1 to show how machine learning works, without mentioning that, in theory, learning is not possible without some inductive bias or the presence of a pattern.  This is a rare find.  (The one other course I found that discussed this was Mitchell's Machine Learning course at CMU.)

I liked how difficult the first homework was (only nine questions)!  It forced me to write code for a perceptron straight from the math equations, and the questions made me think about the concepts at a deeper level.  In fact, this was the trigger for starting this blog: I had the perceptron code in hand, so I decided to just experiment away.  I also bought the textbook after I got lost near the end of the fifth video, knowing the following lectures would be even more abstract.  The video lecture series seems better organized than the book and, dare I say it, easier to understand in some parts.
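For a taste of that first homework, here is a minimal perceptron sketch of my own (not the actual assignment code), built straight from the PLA update rule w <- w + y_i * x_i on synthetic, linearly separable data:

    # Perceptron learning algorithm (PLA) on toy 2-D data.
    set.seed(1)
    N <- 20
    X <- cbind(1, matrix(runif(2 * N, -1, 1), ncol = 2))  # bias column + 2-D points
    w_true <- c(-0.1, 0.6, -0.8)                          # hypothetical target weights
    y <- sign(X %*% w_true)                               # labels from the target

    w <- rep(0, 3)                                        # start from zero weights
    repeat {
      mis <- which(sign(X %*% w) != y)                    # indices of misclassified points
      if (length(mis) == 0) break                         # converged: all points correct
      i <- mis[sample.int(length(mis), 1)]                # pick one misclassified point
      w <- w + y[i] * X[i, ]                              # the PLA update
    }

Because the data are separable by construction, the loop is guaranteed to terminate; the homework's deeper questions were about how long that takes and how well the learned w generalizes.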

I highly recommend this to anyone who wants to study ML and understand how it works.  It is not a good introductory course, however, for non-CS students or those without a programming background.  Those students are better off with Coursera's flagship and highly rated Machine Learning course.

Coursera/Stanford's Machine Learning (CML)

CML sits between TAE and LFD on the difficulty continuum.  It is conceptually harder than TAE (more math, even if most of the math is intentionally brushed aside), but far easier than LFD.  For example, being familiar with some ML, I could watch the CML videos at high speed with few pauses and repeats, but this time-saver is difficult with LFD, even though Prof. Abu-Mostafa already talks slowly :) .  CML was the first time I went through a well-defined, progressive topic sequence, having previously learned ML topics piecemeal.  With that context, CML is useful as an introduction to machine learning, in the same way that TAE is useful as an introduction to data analysis for practical applications.  Both courses skip the heavy derivations.  Both would be useful to anyone, with TAE finding more immediate application in business settings.

The relative ease of CML surprised me.  I had watched the original Stanford ML YouTube series by Prof. Ng and read the lecture PDFs and notes, and found that course more difficult than the Coursera version.  This is when I realized the whole point of MOOCs: for a MOOC to be successful, it has to be widely viewed (MOOCs by definition are measured on their 'massive online' reach).  For it to be widely viewed, it has to be a watered-down version of its original course.  It has to anticipate and cater to a hugely disparate set of backgrounds and abilities (instead of grad/PhD students).  Thus, the popular MOOCs tend to be excellent introductory courses, offering a peek at a wide range of concepts without going too deep.

This is consistent with previous participants' reviews.  TAE is consistently well liked by everyone; CML is well liked by everyone (everyone here being those curious about ML and taking their first ML steps); and LFD (even with a Feynman Teaching Prize-winning teacher) is not as universally liked, being considered hard to very hard by many casual MOOCers and consequently unloved, yet well liked by serious ML types who want to understand exactly how the machine learns.

I enjoyed all three.  I now have a more complete view of ML, from the conceptual to the practical, because I took all three.  But LFD stands out as immensely satisfying!  As someone interested in research and in modifying and experimenting with standard ML techniques, I was happy to discover LFD.  It was unapologetic in presenting itself as a non-watered-down, graduate-level Caltech course, exactly as if you were taking it at Caltech.  It definitely presumed a lot of independent reading and programming outside the course to keep pace (you guessed it, like a graduate-level course).

In my view, this tough-love approach has its place in MOOCs.  It is the same vein followed by SNU's Robot Mechanics and Control, where Professor Park explicitly declares it a serious theoretical foundations course.  This is also why I loved the old MIT OCW lectures.  They were actual recordings of MIT classes, posted alongside the lecture notes, fully embracing the often abstract difficulty of the hows and whys.

I did not know where this write-up was heading when I started writing. :)  I was mostly happy to share my misadventures in the world of MOOCs.  Reading back, it seems I also wanted to make the point that the more successful MOOCs will tend to be the introductory, survey-of-the-field type of course, while the deeper subject treatments remain a niche, often offered once and then archived.

If the archived state of these courses is indeed an indication of a lukewarm reception by learners around the world (which, for a MOOC, counts as failure), then it reflects a real limit of MOOCs.  In a way this is to be expected.  But it is unfortunate, because there are many serious learners out there who would be well served by these types of courses too.  Then again, edX and its brethren of non-profit MOOCs have to fulfill their global education mission, so the advanced topics will have to remain an occasional appearance.
