Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on clusters built from commodity hardware. All the modules of Hadoop are designed with the fundamental assumption that hardware failures are commonplace and should be automatically handled in software by the framework.
The core of Apache Hadoop consists of a storage part, HDFS (Hadoop Distributed File System), and a processing part, MapReduce. Hadoop splits files into large blocks and distributes them among the nodes of the cluster.
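As a rough illustration, here is a minimal sketch of writing and reading a file through the HDFS Java API (org.apache.hadoop.fs.FileSystem). The NameNode address hdfs://namenode:9000 and the path /tmp/hello.txt are placeholder values, not details from this post:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder address; a real cluster's NameNode URI goes here.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        // Write a small file. For large files, HDFS transparently splits
        // the data into blocks and replicates them across DataNodes.
        Path path = new Path("/tmp/hello.txt");
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.writeUTF("Hello, HDFS!");
        }

        // Read the file back.
        try (FSDataInputStream in = fs.open(path)) {
            System.out.println(in.readUTF());
        }
        fs.close();
    }
}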
The base Apache Hadoop framework is composed of the following modules:
· Hadoop Common – contains libraries and utilities needed by other Hadoop modules;
· Hadoop Distributed File System (HDFS) – a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster;
· Hadoop YARN – a resource-management platform responsible for managing computing resources in clusters and using them for scheduling users' applications;[5][6] and
· Hadoop MapReduce – a programming model for large-scale data processing (see the sketch after this list).
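To give a feel for the MapReduce programming model, below is the classic word-count example in Java, closely following the standard Hadoop tutorial. The class names are illustrative, and the job expects input and output paths as command-line arguments:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every token in the input split.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reduce phase: sum the counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The mapper emits a (word, 1) pair for each token in the input, and the reducer (reused here as a combiner) sums the counts per word; on a cluster, YARN schedules the map and reduce tasks across the nodes.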
Prominent Users
· Yahoo!
· Facebook
Hadoop Hosting in the cloud
The cloud allows organizations to deploy Hadoop without having to acquire hardware or build up specific setup expertise. Options include:
· Hadoop on Microsoft Azure
· Hadoop on Amazon EC2/S3 services (see the configuration sketch below)
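As a sketch of what cloud-backed storage can look like, the snippet below lists the contents of an Amazon S3 bucket through Hadoop's s3a filesystem connector (provided by the hadoop-aws module, which must be on the classpath). The bucket name my-bucket and the hard-coded credentials are placeholders only; real deployments typically rely on IAM roles or credential providers instead:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3Example {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder credentials; prefer IAM roles or credential
        // providers over hard-coded keys in real deployments.
        conf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY");
        conf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY");

        // "my-bucket" is a hypothetical bucket name.
        FileSystem fs = FileSystem.get(URI.create("s3a://my-bucket/"), conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}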
Thanks!