The best approach to managing Hadoop

Table of Contents

  1. Summary
  2. DIY Hadoop on-premises is all about control
  3. How Hadoop in the cloud can lift the burden
  4. The best of both worlds: the importance of being manageable
  5. Key takeaways
  6. About George Anadiotis

1. Summary

Cloud-based Hadoop clusters receive the bulk of the attention even while most of the Hadoop action is on-premises. But an emerging solution promises to combine the best of both approaches by adding a management layer to Hadoop on-premises.

This report will investigate each of the three options on this spectrum, analyze the benefits and disadvantages of each, and help IT executives and database administrators who are considering Hadoop adoption or a change to their current processes choose among them.

Key findings in this report:

  • Organizations leveraging on-premises Hadoop can control their deployments and apply optimal configuration for their workloads. This eliminates recurring infrastructure costs and any worries about data leaving the premises. However, building and maintaining this kind of in-house infrastructure requires an up-front investment as well as expertise, and using it optimally is not always possible.
  • Organizations leveraging Hadoop in the cloud can achieve insight more quickly because they benefit from elasticity and do not have the burden of building and maintaining the infrastructure and developing the expertise required for supporting in-house Hadoop clusters. But costs can accumulate quickly, networking and latency usually prevent the cloud from being optimal for large workloads, and the organization is required to cede control to third parties.
  • Organizations can benefit from a management layer on top of Hadoop that enables them to have the best of both approaches, but this layer must include advanced enterprise security and systems-management features.

Thumbnail image courtesy of flickr user RachScottHalls.
