TL; DR: The tremendous growth predicted for the open-source Hadoop architecture for data analysis is driven by the mind-boggling increase in the amount of structured and unstructured data in organizations, and the need for sophisticated, accessible tools to extract business and market intelligence from the data. New cloud services such as Morpheus let organizations of all sizes realize the potential of Big Data analysis.
The outlook is rosy for Hadoop — the open-source framework designed to facilitate distributed processing of huge data sets. Hadoop is increasingly attractive to organizations because it delivers the benefits of Big Data while avoiding infrastructure expenses.
A recent report from Allied Market Research concludes that the Hadoop market will realize a compound annual growth rate of 58.2 percent from 2013 to 2020, to a total value of $50.2 billion in 2020, compared to $1.5 billion in 2012.
Just how “big” is Big Data? According to IBM, 2.5 quintillion bytes of data are created every day, and 90 percent of all the data in the world was created in the last two years. Realizing the value of this huge information store requires data-analysis tools that are sophisticated enough, cheap enough, and easy enough for companies of all sizes to use.
Many organizations continue to consider their proprietary data too important a resource to store and process off premises. However, cloud services now offer security and availability equivalent to that available for in-house systems. By accessing their databases in the cloud, companies also realize the benefits of affordable and scalable cloud architectures.
The Morpheus database-as-a-service offers the security, high availability, and scalability organizations require for their data-intelligence operations. Performance is maximized through Morpheus’s use of 100-percent bare-metal SSD hosting. The service offers ultra-low latency to Amazon Web Services and other peering points and cloud hosting platforms.
The Nuts and Bolts of Hadoop for Big Data Analysis
The Hadoop architecture distributes both data storage and processing to all nodes on the network. By placing the small program that processes the data in the node with the much larger data sets, there’s no need to stream the data to the processing module. The processor splits its logic between a map and a reduce phase. The Hadoop scheduling and resource management framework executes the map and reduce phases in a cluster environment.
The Hadoop Distributed File System (HDFS) data storage layer uses replicas to overcome node failures and is optimized for sequential reads to support large-scale parallel processing. The market for Hadoop really took off when the framework was extended to support the Amazon Web Services S3 and other cloud-storage file systems.
Adoption of Hadoop in small and midsize organizations has been slow despite the framework’s cost and scalability advantages because of the complexity of setting up and running Hadoop clusters. New services do away with much of the complexity by offering Hadoop clusters that are managed and ready to use: there’s no need to configure or install any services on the cluster nodes.
Netflix data warehouse combines Hadoop and Amazon S3 for infinite scalability
For its petabyte-scale data warehouse, Netflix chose Amazon’s Storage Service (S3) over the Hadoop Distributed File System for the cloud-based service’s dynamic scalability and limitless data and computational power. Netflix collects data from billions of streaming events from televisions, computers, and mobile devices.
With S3 as its data warehouse, Hadoop clusters with hundreds of nodes can be configured for various workloads, all able to access the same data. Netflix uses Amazon’s Elastic MapReduce distribution of Hadoop and has developed its own Hadoop Platform as a Service, which it calls Genie. Genie lets users submit jobs from Hadoop, Pig, Hive, and other tools without having to provision new clusters or install new clients via RESTful APIs.
There is clearly potential in combining Hadoop and cloud services, as Wired’s Marco Visibelli explains in an August 13, 2014, article. Visibelli describes how companies leverage Big Data for forecasting by scaling from small projects via Amazon Web Services and scaling up as their small projects succeed. For example, a European car manufacturer used Hadoop to combine several supplier databases into a single 15TB database, which saved the company $16 million in two years.
Hadoop opens the door to Big Data for organizations of all sizes. Projects that leverage the scalability, security, accessibility, and affordability of cloud services such as Morpheus’s database as a service have a much greater chance of success.