hn.academy

Online courses recommended by Hacker News users.

Data Manipulation at Scale: Systems and Algorithms

Coursera · University of Washington · 3 HN points · 8 HN citations

Data analysis has replaced data acquisition as the bottleneck to evidence-based decision making --- we are drowning in it. Extracting knowledge from large, ...

Hacker News Comments about Data Manipulation at Scale: Systems and Algorithms

All the comments and stories posted to Hacker News that reference this course.

Dec 01, 2013 glimcat on Ask HN: what would you put in course for data science?
Bill Howe did a solid intro course for the University of Washington. Videos and other materials are available on Coursera.

https://www.coursera.org/course/datasci

The one thing I'd really change is to tighten up the range of tools used. It seems helpful to show students a range of tools, but it usually ends up being a major distraction for students and a lot of extra effort for course staff. Any such course is already going to be a blitz of new concepts and technology.

Go full Python, plus interactive tools as helpful (Weka, Tableau). Let them pick up R or D3.js or whatever later, after they have a better appreciation for the concepts that make those tools useful.

Nov 15, 2013 hoprocker on Launching our Data Science and Big Data Track
So looking through this 'track', I see one course which seems like it might be more central to the discipline, "Intro to Data Science"[0]. Has anybody had a chance to compare this one against Bill Howe's "Introduction to Data Science"[1] on Coursera?

[0] https://www.udacity.com/course/ud359
[1] https://www.coursera.org/course/datasci

Nov 14, 2013 jfxberns on Ask HN: To everybody who uses MapReduce: what problems do you solve?
"A lot of people fail to understand the overheads and limitations of this kind of architecture. Or how hard it is to program, especially considering salaries for this skyrocketed. More often than not a couple of large 1TB SSD PCIe and a lot of RAM can handle your "big" data problem."

It's not that hard to program... it does take a shift in how you attack problems.

If your data set fits on a few SSDs, then you probably don't have a real big data problem.

"Moving Big Data around is hard. Managing is harder."

Moving big data around is hard, and that's why you have Hadoop: you send the compute to where the data is, which requires a new way of thinking about how you do computations.

"Before doing any Map/Reduce (or equivalent), please I beg you to check out Introduction to Data Science at Coursera https://www.coursera.org/course/datasci"

Data science does not solve the big data problem. Here's my favorite definition of a big data problem: "a big data problem is when the size of the data becomes part of the problem." You can't use traditional linear programming models to handle a true big data problem; you have to have some strategy to parallelize the compute. Hadoop is great for that.
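To make the "parallelize the compute" idea concrete, here is a minimal sketch of the map/reduce pattern in plain Python. It is not Hadoop code: the function names (map_partition, merge_counts, word_count) and the sample data are made up for illustration, and a thread pool stands in for a cluster. Each partition plays the role of a block of data that lives on one node; the map step runs on each block independently, and only the small partial results are combined in the reduce step.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_partition(lines):
    """Map step: count words within one partition, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


def word_count(partitions, workers=4):
    # Each partition stands in for a block of data held by one node.
    # Only the per-partition Counters travel to the reduce step, not the raw lines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce(merge_counts, partials, Counter())


# Hypothetical sample data, split into two partitions.
partitions = [
    ["big data is big", "data moves slowly"],
    ["compute moves to data"],
]
totals = word_count(partitions)
print(totals.most_common(3))
```

The shape is the point here: because map_partition never needs to see another partition, the work can be spread over as many workers as there are blocks of data.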

"A large telco has a 600 node cluster of powerful hardware. They barely use it."

Sounds more like organizational issues and poor planning and execution than a criticism of Hadoop!

Nov 11, 2013 hoprocker on CMU's Introduction to Machine Learning Course
For an introduction to the broader realm of data input, normalization, modeling, and visualization -- in which ML plays but a part -- you can "preview" Bill Howe's "Introduction to Data Science" class on Coursera[0]; I'm working through the lectures, and I find he gives compelling explanations of what all these parts are, why they're important, and how it all fits together in a larger context.

[0] https://www.coursera.org/course/datasci

Nov 10, 2013 alecco on Ask HN: To everybody who uses MapReduce: what problems do you solve?
A large telco has a 600 node cluster of powerful hardware. They barely use it. Moving Big Data around is hard. Managing is harder.

A lot of people fail to understand the overheads and limitations of this kind of architecture. Or how hard it is to program, especially considering salaries for this skyrocketed. More often than not a couple of large 1TB SSD PCIe and a lot of RAM can handle your "big" data problem.

Before doing any Map/Reduce (or equivalent), please I beg you to check out Introduction to Data Science at Coursera https://www.coursera.org/course/datasci

Aug 23, 2013 carlosgg on Coursera-dl – A script for downloading course material from coursera.org
Seems to work great over here, and the installation was pretty easy, too. You can even choose not to download certain types of files using the -n option. For example, if you have a large hard drive and a smaller one, you can download the whole course to the large HD:

coursera-dl -u username -p password -d pathToLargeHD course_name

and only download pdf lecture notes to the smaller one

coursera-dl -u username -p password -d pathToSmallHD -n mp4,pptx course_name

I tried that over here, worked great.

Some schools prefer that students not download course materials. I successfully downloaded the Machine Learning and Algorithms courses from Stanford but could not download this one; it says "now downloadable content found":

https://www.coursera.org/course/datasci

May 28, 2013 yahelc on Is Data Science Your Next Career?
This seems to fit the bill: https://www.coursera.org/course/datasci

Dec 06, 2012 thinkling on Introduction to Databases
Both in timing and in content this could be a good lead-in to the University of Washington's Intro to Data Science class that looks like it will have more of a focus on 'big data', NoSQL, Hadoop, data mining, etc.

https://www.coursera.org/course/datasci

http://news.ycombinator.com/item?id=4881581

Dec 06, 2012 thinkling submitted UW Introduction to Data Science (Online, Free) (2 points, 0 comments)
Jul 25, 2012 swGooF submitted Intro to Data Science via Coursera (1 point, 0 comments)
HN.Academy is an independent project and is not managed or owned by Y Combinator, Coursera, edX, or any of the universities and other institutions providing courses.