hn.academy

Online courses recommended by Hacker News users.

Big Data Analysis with Apache Spark

edX · University of California, Berkeley · 8 HN points · 3 HN citations

Learn how to apply data science techniques using parallel programming in Apache Spark to explore big data.

The vast majority of the courses listed here on HN.Academy are available from their providers for free. Many courses offer a completion certificate for a fee. A few courses and specializations require an enrollment fee. HN.Academy receives a referral commission when you visit course pages through links on this site and then purchase courses or completion certificates. If you decide to purchase a certificate or course, the commission does not increase its cost, and it helps support the continued existence of HN.Academy, which is much appreciated.

Hacker News Comments about Big Data Analysis with Apache Spark

All the comments and stories posted to Hacker News that reference this course.
Jul 07, 2015 nl on The Netflix Prize and Production Machine Learning Systems: An Insider Look
If this is interesting then I recommend (ha!) the edX Spark course[1]. One assignment shows how to build a recommender on the MovieLens dataset mentioned in this article.

[1] https://www.edx.org/course/introduction-big-data-apache-spar...

Jun 15, 2015 rpalmaotero on IBM Invests to Help Apache Spark
For everyone who wants to start working with Spark and Big Data, I recommend enrolling in this MOOC published by UC Berkeley on edX: https://www.edx.org/course/introduction-big-data-apache-spar...
Jun 11, 2015 eranation on Announcing Apache Spark 1.4
For anyone who wants to pick up Spark basics - Berkeley (Spark was developed at Berkeley's AMPLab), in collaboration with Databricks (a commercial company started by Spark's creators), just started a free MOOC on edX: https://www.edx.org/course/introduction-big-data-apache-spar...

(If you wonder what Spark is, in a very unofficial nutshell - it is a computation / big data / analytics / machine learning / graph processing engine on top of Hadoop that usually performs much better and has an arguably much easier API in Python, Scala, Java and now R)

It has more than 5000 students so far, and the professor seems to answer every single question on Piazza (a popular student / teacher message board).

So far it looks really good (it started a week ago, so you can still catch up: the 2nd lab isn't due until Friday 6/12 EOD, you have a 3-day "grace" period, and there is not too much to catch up on)

I use Spark for work (Scala API) and still learned one or two new things.

It uses the PySpark API, so there is no need to learn Scala. All homework labs are done in an IPython notebook. Very high quality so far IMHO.

It is followed by a more advanced Spark course (Scalable Machine Learning), also by Berkeley & Databricks.

https://www.edx.org/course/scalable-machine-learning-uc-berk...

(not affiliated with edX, Berkeley or Databricks, just thought it's a good place for a PSA to those interested)

The academic paper that originated Spark, by Matei Zaharia (creator of Spark), got him a PhD dissertation award from the ACM in 2014 ( http://www.acm.org/press-room/news-releases/2015/dissertatio... )

Spark also set a new record in large-scale sorting (beating Hadoop by far): https://databricks.com/blog/2014/11/05/spark-officially-sets...

* EDIT: typo in "Berkeley", thanks gboss for noticing :)

Jun 06, 2015 kyrre submitted Introduction to Big Data with Apache Spark (2 points, 0 comments)
Jun 02, 2015 theyeti submitted Introduction to Big Data with Apache Spark (2 points, 0 comments)
Jan 31, 2015 sonabinu submitted Introduction to Big Data with Apache Spark (2 points, 0 comments)
Dec 05, 2014 myth_drannon submitted Introduction to Big Data with Apache Spark – EdX Course (2 points, 0 comments)
HN.Academy is an independent project and is not managed or owned by Y Combinator, Coursera, edX, or any of the universities and other institutions providing courses.