Defining Hadoop: the Players, Technologies and Challenges of 2011

Table of Contents

  1. Summary
  2. About Derrick Harris
  3. Introduction – Apache Hadoop
    1. What Hadoop Is (and Is Not)
  4. The Hadoop Ecosystem
    1. The Distributions
    2. Other Hadoop-Based Products
    3. ISVs Supporting Hadoop
    4. Hadoop Use Cases
      1. Who’s Using Hadoop
      2. Specific Use Cases
      3. State of Deployment (Research or Production)
      4. Deployment Size
  5. Challenges
  6. Outlook
    1. New Technologies
    2. New Opportunities
  7. Further Reading

1. Summary

The word “Hadoop” almost inevitably comes up during any discussion about big data or next-generation analytics strategies, but there is still a fair amount of confusion about what Hadoop actually is and for what types of workloads it is best used. Most people familiar with Hadoop at least know that it, and Google MapReduce, on which it was based, have been used at massive scale by large web companies for applications such as search engines. In reality, Hadoop is much more. This report takes a closer look at that reality, examining what Hadoop is (and isn’t), who’s doing what to productize it and why we can expect to see the market pick up serious steam in 2011.

Hadoop can be used for a wide variety of data-processing workloads, some of which are broadly applicable across any industry where unstructured data volumes are proliferating. Some use cases are even pushing Hadoop beyond its natural batch-processing sweet spot. Further, it is being used by a growing number of companies across many industries — including some in the Fortune 100 — and a growing number of commercial software vendors are selling products designed to make Hadoop easier to use for mainstream customers. Probably the most famous is Cloudera, which gives away its own enterprise-hardened distribution of the core Apache Hadoop project but also provides support, services and advanced management tools.

For all this advancement, though, Hadoop still has a long way to go before it becomes as widespread as its hype suggests. One big problem is that, despite the proliferation of tools designed to simplify the process, it still is not always easy for inexperienced developers to create Hadoop applications and workflows. The result is that, as more organizations try to hire personnel to start or grow their Hadoop deployments, qualified people are hard to find.

Hadoop will continue its trek to mainstream adoption in 2011. Companies of all types and in all geographic regions likely will be advancing or beginning their big data efforts in the coming months and years, and many will benefit from some hand-holding and technological assistance into this brave new world.
