Wednesday, November 13, 2013

Hadoop

At the Austin .NET Users Group on Monday I saw Brock Reeve speak on Hadoop, a framework for ETL-style batch processing in which data gets mapped to key-value pairs, shuffled so that matching keys end up together, and then reduced. You write the map and reduce rules for Hadoop in Java, and Facebook hated that, so they wrote a tool called Hive which sits on top of Hadoop and lets you hand in SQL-like queries which get compiled into the MapReduce jobs you would otherwise write in Java. Data is mostly kept in HDFS (Hadoop Distributed File System), but one may also use cloud storage services like Amazon S3. To move data between HDFS and SQL Server, use Sqoop.
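To make the map, shuffle, and reduce steps concrete, here is a minimal in-memory sketch in Python using word counting. It is not from the talk and does not use Hadoop's API. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for illustration, and a real Hadoop job would run these steps across many machines rather than in one process.

```python
from collections import defaultdict


def map_phase(records):
    """Map: turn each input line into (key, value) pairs, here (word, 1)."""
    for line in records:
        for word in line.lower().split():
            yield (word, 1)


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records):
    """Hypothetical helper that chains the three phases together."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

The shuffle is the part Hadoop does for you: you supply only the map and reduce logic, and the framework routes every pair with the same key to the same reducer.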
