When most people think about analytics and Hadoop, they tend to think of technologies such as Hive, Pig and Impala as the main tools a data analyst uses. When you talk to data analysts and data scientists, though, they'll usually tell you that their primary tool when working on Hadoop and big data sources is in fact "R", the open-source statistical and modelling language "inspired" by S but now with its own rich ecosystem, and particularly suited to the data preparation, data analysis and data correlation tasks you'll often do on a big data project.
http://www.rittmanmead.com/2014/03/running-r-on-hadoop-using-oracle-r-advanced-analytics-for-hadoop/
Case Study
This note isn't a "real" case study, i.e. it's not going to show you details of the data, indexes, plans etc. from a production system, but it is modelling...