Trang

23 thg 4, 2011

Start yourself with Apache Hadoop


What is Hadoop: Hadoop is a framework written in Java for running applications on large clusters of commodity hardware and incorporates features similar to those of the Google File System and of MapReduce.
Why Hadoop: MapReduce is Google's secret weapon: A way of breaking complicated problems apart, and spreading them across many computers. Hadoop is an open source implementation of MapReduce, and its own filesystem HDFS(Hadoop distributed file system). 
Hadoop has defeated Super Computer in tera sort: Hadoop clusters sorted 1 terabyte of data in 209 seconds, which beat the previous record of 297 seconds in the annual general purpose (daytona) terabyte sort benchmark. The sort benchmark, which was created in 1998 by Jim Gray, specifies the input data (10 billion 100 byte records), which must be completely sorted and written to disk. This is the first time that either a Java or an open source program has won.
To install hadoop:

Không có nhận xét nào:

Đăng nhận xét