Online courses recommended by Hacker News users.

Distributed Machine Learning with Apache Spark

edX · University of California, Berkeley · 5 HN citations

Learn the underlying principles required to develop scalable machine learning pipelines and gain hands-on experience using Apache Spark.

The vast majority of the courses listed here on HN.Academy are available from their providers for free. Many courses offer a completion certification for a fee. A few courses and specializations require an enrollment fee. HN.Academy receives a referral commission when you visit course pages through links on this site and then purchase courses and completion certificates. If you decide to purchase a certificate or course the commission does not increase the cost of the course and helps support the continued existence of HN.Academy which is much appreciated.

Hacker News Comments about Distributed Machine Learning with Apache Spark

All the comments and stories posted to Hacker News that reference this course.
Sep 01, 2015 minimaxir on Building a Movie Recommendation Service with Apache Spark and Flask

It's one of the assignments. (The collaborative filtering one)

Jul 17, 2015 century19 on Ask HN: What is the best way to learn Machine Learning in Python?
There is an edX course going on that covers Machine Learning with Python, though it does require "...familiarity with basic machine learning concepts".

"All exercises will use PySpark, but previous experience with Spark or distributed computing is NOT required."

Jul 17, 2015 gbersac on How to learn data science
I am doing this course and find it really good:

It is about building linear and logistic regression plus PCA using Spark (Python API).

Jun 11, 2015 eranation on Announcing Apache Spark 1.4
For anyone who wants to pick up Spark basics: Berkeley (Spark was developed at Berkeley's AMPLab), in collaboration with Databricks (a commercial company started by Spark's creators), just started a free MOOC on edX:

(If you wonder what Spark is, in a very unofficial nutshell: it is a computation / big data / analytics / machine learning / graph processing engine on top of Hadoop that usually performs much better and has an arguably much easier API in Python, Scala, Java and now R.)

It has more than 5000 students so far, and the Professor seems to answer every single Piazza question (a popular student / teacher message board).

So far it looks really good. (It started a week ago, so you can still catch up: the 2nd lab is not due until Friday 6/12 EOD, you have a 3-day "grace" period, and there is not too much to catch up on.)

I use Spark for work (Scala API) and still learned one or two new things.

It uses the PySpark API, so there is no need to learn Scala. All homework labs are done in an IPython notebook. Very high quality so far IMHO.

It is followed by a more advanced Spark course (Scalable Machine Learning), also by Berkeley & Databricks.

(Not affiliated with edX, Berkeley or Databricks; just thought it's a good place for a PSA to those interested.)

The academic paper that originated Spark, by Matei Zaharia (creator of Spark), earned him a PhD dissertation award from the ACM in 2014.

Spark also set a new record in large-scale sorting (beating Hadoop by far):

* EDIT: typo in "Berkeley", thanks gboss for noticing :)

May 21, 2015 nl on Software development skills for data scientists
Do they really get better? I'm either going to jump straight to the (R) Statistical Inference[1] course from JHU, or switch to the Berkeley/EdX Spark course[2].

I use a lot more Spark in my day job than R, but I really should learn statistics more formally.



HN.Academy is an independent project and is not managed or owned by Y Combinator, Coursera, edX, or any of the universities and other institutions providing courses.