Tuesday 21 July 2015

Hadoop

Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on clusters of computers built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common and should be automatically handled in software by the framework.

The core of Apache Hadoop consists of a storage part, HDFS (Hadoop Distributed File System), and a processing part based on the MapReduce programming model. Hadoop splits files into large blocks and distributes them across the nodes in a cluster.
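As a rough sketch of how a client interacts with HDFS, the snippet below writes a small file through Hadoop's Java FileSystem API. The NameNode address (hdfs://namenode-host:9000) and the file path are placeholder values for an assumed cluster, not a real deployment.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS tells the client where the HDFS NameNode lives;
        // "namenode-host" and port 9000 are placeholders for your cluster.
        conf.set("fs.defaultFS", "hdfs://namenode-host:9000");
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/hello.txt"); // hypothetical path
        try (FSDataOutputStream out = fs.create(file)) {
            out.writeUTF("Hello, HDFS!");
        }
        // HDFS itself splits the file into blocks (128 MB by default in
        // Hadoop 2.x) and replicates each block across DataNodes.
        fs.close();
    }
}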

The base Apache Hadoop framework is composed of the following modules:
·         Hadoop Common – contains libraries and utilities needed by other Hadoop modules;
·         Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster;
·         Hadoop YARN – a resource-management platform responsible for managing computing resources in clusters and using them for scheduling users' applications;[5][6] and
·         Hadoop MapReduce – a programming model for large-scale data processing (see the sketch below).
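To make the MapReduce model concrete, here is the classic WordCount job in Java, close to the example shipped with Hadoop: the mapper emits a (word, 1) pair for every word it sees, and the reducer sums the counts per word. The input and output paths are taken from the command line and are assumed to be HDFS directories.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every word in its input split
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts collected for each word
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The same pattern, a Mapper, a Reducer, and a Job driver, runs unchanged whether the cluster has one node or thousands, with YARN scheduling the individual tasks.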

Prominent Users
Yahoo!
Facebook

Hadoop Hosting in the Cloud

The cloud allows organizations to deploy Hadoop without the need to acquire hardware or specific setup expertise. Examples include:
·       Hadoop on Microsoft Azure

·       Hadoop on Amazon EC2/S3 services


                                          Thanks...
