Paul Miller Aug 12, 2014 (Oct 13, 2020)

Beyond MapReduce: How the new Hadoop works

Summary
What is Hadoop?
Hadoop 2.0: beyond MapReduce
Integrating Hadoop
Where Hadoop goes next
About Paul Miller
About GigaOm
Copyright

1. Summary

In only a few years, Hadoop has grown from a personal side project into the poster child of the nascent multibillion-dollar big-data industry. Leading providers of technical solutions based on Apache Hadoop attract large investments, and Hadoop-powered success stories continue to spread beyond the Silicon Valley giants where these technologies were first nurtured.

New features in Hadoop’s latest releases go some way towards freeing an increasingly capable data platform from the constraints of its early dependence on one specific technical approach: MapReduce. The same advances are also driving a new push to embrace the complex and diverse enterprise workloads for which MapReduce was not necessarily the most appropriate data-processing tool. In those settings, adoption was also held back by Hadoop’s early reputation for complexity and its apparent disregard for established enterprise processes around security, audit, and governance.

At the same time, the big-data landscape is becoming more complex. New tools like Apache Spark were quick to integrate with Hadoop but today also function increasingly well without it. Established enterprise IT firms co-opt the Hadoop name where they can, while also pushing refreshes to their own tried-and-tested products.

In this report we explain what Hadoop is and how it has recently been transformed, discuss what it’s good for, and consider how it might evolve as technology, expectations, requirements, and the broader competitive landscape change around it.
