Wednesday, May 14, 2014

Hadoop Quick Info

Apache Hadoop - An open source software framework for cheaply storing and processing vast amounts of structured and unstructured data on clusters of commodity hardware.

Flume - A service for collecting, aggregating, and moving large amounts of log and event data into Hadoop.

HBase - A scalable, distributed, column-oriented data store that runs on top of HDFS.

HDFS - The Hadoop Distributed File System, Hadoop's storage layer. It splits files into blocks and spreads them across the nodes of the cluster.

Hive - A data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. It allows you to query data using a SQL-like language called HiveQL (HQL).

HiveQL (HQL) - A SQL-like query language for Hive. Hive compiles HiveQL queries into MapReduce jobs that run over data stored in HDFS.
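To make the idea of a SQL-like query turning into a MapReduce job more concrete, here is a toy Python sketch of how a query such as `SELECT page, COUNT(*) FROM page_views GROUP BY page` could be expressed as map, shuffle, and reduce steps. The table name, the sample rows, and the function names are all made up for illustration, and this runs in a single process; it is not how Hive's planner actually works.

```python
from collections import defaultdict

# Hypothetical rows of an imaginary table page_views(user, page).
rows = [
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/about"),
    ("carol", "/home"),
]


def map_phase(rows):
    """Map step: emit one (page, 1) pair per input row."""
    for _user, page in rows:
        yield page, 1


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the values for each key, like COUNT(*) per group."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_page(rows):
    """Chain the three steps, mimicking a GROUP BY ... COUNT(*) query."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


print(count_by_page(rows))  # {'/home': 3, '/about': 1}
```

On a real cluster the map and reduce steps would run as separate tasks on different nodes, with the framework handling the shuffle between them.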

JobTracker - The service in Hadoop 1 that distributes MapReduce tasks to specific nodes in the cluster, ideally nodes that already hold the data to be processed.

NameNode - The core of HDFS. The NameNode maintains a record of all files stored on the Hadoop cluster and of where their blocks are located; it does not store the file data itself.
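As a rough mental model of the NameNode's record keeping, the sketch below keeps two in-memory maps: file path to block IDs, and block ID to the DataNodes holding a copy. The class name `ToyNameNode`, its methods, and the block and node names are all invented for illustration; the real NameNode is far more involved.

```python
class ToyNameNode:
    """Toy model of NameNode metadata: it tracks where data lives, not the data."""

    def __init__(self):
        self.file_blocks = {}      # file path -> ordered list of block IDs
        self.block_locations = {}  # block ID -> list of DataNode names

    def add_file(self, path, blocks):
        """Register a file; blocks is a list of (block_id, [datanode, ...])."""
        self.file_blocks[path] = [block_id for block_id, _nodes in blocks]
        for block_id, nodes in blocks:
            self.block_locations[block_id] = list(nodes)

    def locate(self, path):
        """Return each block of the file with the DataNodes that hold it."""
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]

    def list_files(self):
        """Return all known file paths in sorted order."""
        return sorted(self.file_blocks)


namenode = ToyNameNode()
namenode.add_file(
    "/logs/a.log",
    [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2", "dn3"])],
)
print(namenode.locate("/logs/a.log"))
```

A client that wants to read `/logs/a.log` would ask for the block locations first and then fetch the bytes from the listed DataNodes directly.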

Oozie - A workflow scheduler system for managing Apache Hadoop jobs.

Pig - A platform for creating MapReduce programs on Hadoop using a high-level language called Pig Latin.

Sqoop - A tool for transferring data between Hadoop and relational databases.

YARN - The resource manager for Hadoop 2. YARN is short for "Yet Another Resource Negotiator".

ZooKeeper - A centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
