HN Academy

The best online courses of Hacker News.

Hacker News Comments on
Big Data Analysis with Apache Spark

edX · University of California, Berkeley · 9 HN points · 4 HN comments

HN Academy has aggregated all Hacker News stories and comments that mention edX's "Big Data Analysis with Apache Spark" from University of California, Berkeley.
Course Description

Learn how to apply data science techniques using parallel programming in Apache Spark to explore big data.

Provider Info
This course is offered by University of California, Berkeley on the edX platform.
HN Academy may receive a referral commission when you make purchases on sites after clicking through links on this page. Most courses are available for free with the option to purchase a completion certificate.

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this url.
One of the Spark-focused EdX courses[0] has a very good module on Alternating Least Squares, which will help you understand how to build recommender systems in a scalable way with Spark.

[0] https://www.edx.org/course/big-data-analysis-spark-uc-berkel...
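
For readers who have not met Alternating Least Squares before, here is a rough sketch of the idea in plain Python. It is not taken from the course and does not use Spark: the ratings, the function names `als_rank1` and `squared_error`, and the choice of a single latent factor per user and item are all assumptions made to keep the example short. With one factor, each alternating step has a closed-form ridge-regression solution.

```python
# Minimal rank-1 ALS sketch on hypothetical toy data (not from the course).
def als_rank1(ratings, n_users, n_items, reg=0.1, iters=20):
    """Fit ratings[(u, i)] ~= user_f[u] * item_f[i] by alternating least squares."""
    user_f = [1.0] * n_users
    item_f = [1.0] * n_items
    for _ in range(iters):
        # Hold item factors fixed; solve each user's regularized least squares.
        for u in range(n_users):
            num = sum(r * item_f[i] for (uu, i), r in ratings.items() if uu == u)
            den = sum(item_f[i] ** 2 for (uu, i) in ratings if uu == u) + reg
            user_f[u] = num / den
        # Hold user factors fixed; solve each item's regularized least squares.
        for i in range(n_items):
            num = sum(r * user_f[u] for (u, ii), r in ratings.items() if ii == i)
            den = sum(user_f[u] ** 2 for (u, ii) in ratings if ii == i) + reg
            item_f[i] = num / den
    return user_f, item_f


def squared_error(ratings, user_f, item_f):
    """Sum of squared errors over the observed ratings only."""
    return sum((r - user_f[u] * item_f[i]) ** 2 for (u, i), r in ratings.items())


# Sparse toy ratings keyed by (user, item); missing pairs are unobserved.
ratings = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0,
           (1, 2): 1.0, (2, 1): 2.0, (2, 2): 1.0}
user_f, item_f = als_rank1(ratings, n_users=3, n_items=3)
predicted = user_f[0] * item_f[2]  # estimate for an unobserved (user, item) pair
```

A real recommender would use several factors per user and item, and the per-user and per-item solves are independent of one another, which is what makes the method a natural fit for a parallel engine.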

xky
This looks good but it's starting late 2016. Are there any old courses you could recommend?
dserban
[1] http://bugra.github.io/work/notes/2014-04-19/alternating-lea...

This looks like a good introduction to ALS, albeit Python/Pandas centric.

If this is interesting then I recommend (ha!) the EdX Spark course[1]. One assignment shows how to build a recommender on the MovieLens dataset mentioned in this article.

[1] https://www.edx.org/course/introduction-big-data-apache-spar...

For anyone who wants to start working with Spark and Big Data, I recommend enrolling in this MOOC published by UC Berkeley on edX: https://www.edx.org/course/introduction-big-data-apache-spar...
rshaban
It's been pretty good so far -- you can sign up now and only get late deductions (if you care about the grade) on two assignments.
TDL
Thanks for that link. I've been looking for something like this to get a better working understanding of Spark.
For anyone who wants to pick up Spark basics: Berkeley (Spark was developed at Berkeley's AMPLab), in collaboration with Databricks (the commercial company started by Spark's creators), just started a free MOOC on edX: https://www.edx.org/course/introduction-big-data-apache-spar...

(If you wonder what Spark is, in a very unofficial nutshell: it is a computation / big data / analytics / machine learning / graph processing engine on top of Hadoop that usually performs much better and has arguably a much easier API in Python, Scala, Java and now R)

It has more than 5000 students so far, and the Professor seems to answer every single Piazza question (a popular student / teacher message board).

So far it looks really good. (It started a week ago, so you can still catch up: the 2nd lab is only due Friday 6/12 EOD, and you have a 3-day "grace" period, so there is not too much to catch up on.)

I use Spark for work (Scala API) and still learned one or two new things.

It uses the PySpark API, so there is no need to learn Scala. All homework labs are done in an IPython notebook. Very high quality so far IMHO.

It is followed by a more advanced spark course (Scalable Machine Learning) also by Berkeley & Databricks.

https://www.edx.org/course/scalable-machine-learning-uc-berk...

(not affiliated with edx, Berkeley or databricks, just thought it's a good place for a PSA to those interested)

The original academic paper on Spark by Matei Zaharia (creator of Spark) won him a PhD dissertation award in 2014 from the ACM (http://www.acm.org/press-room/news-releases/2015/dissertatio...)

Spark also set a new record in large scale sorting (Beating Hadoop by far): https://databricks.com/blog/2014/11/05/spark-officially-sets...

* EDIT: typo in "Berkeley", thanks gboss for noticing :)

spacko
> It is followed by a more advanced spark course (Scalable Machine Learning)

Is it really more advanced regarding Spark? The requirements state explicitly that no prior Spark knowledge is required.

eranation
Cool, I stand corrected. Thanks
tomnipotent
"... on top of Hadoop".

Can safely remove this part. Hadoop not required.

digitalzombie
Hadoop isn't required, and Spark only runs better if the data fits in memory.

Spark does micro-batch processing, whereas Hadoop traditionally does batch processing. Hadoop YARN is different now, and even with old Hadoop, if you can fit the data into memory it can supposedly be as fast, according to a meetup I attended.

There's also Apache Flink by data Artisans.

gtt
I've been struggling to set it up correctly on my Debian machine. Are there Debian packages or some concise tutorial? I've found some things on the web, but certain things do not match my setup and I'm lost...
annapurna
Thanks for the detailed info and context. Just signed up for my first edX course.
yzh
Thanks! I've been following the course and so far it's been awesome!
julnepht
Thanks for the plug, I have signed up for the class as well and it's great!
0xFFC
I would love to learn about Spark, but as someone who lives in a third-world country I hate edX; I am in love with Udacity and Coursera instead. Where I live we don't have much monthly traffic, but we can download everything we want between 1am and 6am, so there is no simple way to download a course from edX and use it later. I wish it were on Udacity or Coursera. Is there any torrent for the course material?
sidmitra
I'm doing the Spark course. edX has a download button on the videos, and you can download PDF files for the lectures. The rest, like the embedded quizzes, I just screenshot or save as PDF for posterity.

Are you sure you can't download? Or maybe they've changed it recently.

0xFFC
Yes, I am aware of the download button, but consider that every course has ~50 distinct videos, and given our download window you will agree that downloading is extremely painful. Why don't they just put up the whole material (at least just the videos) the way Udacity does?
jm0
You can download the lectures using the edx-downloader: https://github.com/shk3/edx-downloader
Jun 06, 2015 · 2 points, 0 comments · submitted by kyrre
Jun 02, 2015 · 2 points, 0 comments · submitted by theyeti
May 13, 2015 · 1 point, 0 comments · submitted by tomaskazemekas
Jan 31, 2015 · 2 points, 0 comments · submitted by sonabinu
Dec 05, 2014 · 2 points, 0 comments · submitted by myth_drannon
HN Academy is an independent project and is not operated by Y Combinator, Coursera, edX, or any of the universities and other institutions providing courses.