Table of Contents
- Summary
- Gnip: Licensing the Twitter Firehose
- Sulia: Filtering the Twitter Stream
- Opportunities for Social Media Data
- About David Card
- Further Reading
- About GigaOm
- Copyright
1. Summary
Recently, I attended GigaOM’s Structure: Big Data conference to take the pulse of technologies and services used to analyze and monetize the massive amounts of data produced by social media. As noted by Cloudscale CEO Bill McColl in the conference’s opening panel discussion, the data flowing from Facebook’s social network alone (100 million entities generating tens of millions of events per second) dwarfs that of Wall Street. Twitter, meanwhile, funnels 140 million public tweets a day through the “Firehose” of data available from its streaming API.
Companies want to harness all this data because it reveals consumer behavior, which can guide decisions on product development and marketing and aid in customer service. And while the promise and intent are there, the ecosystem that supports social media big data (the technology, services and business models) is immature. Companies offer point products and cobble together customized solutions, but the system isn’t yet self-sustaining or efficient. For example, many companies still believe that how they obtain social data is a competitive advantage. In time, the raw data will become a commodity, and the value will lie in its analysis.
Social media data acquisition is tricky. Though most social networks offer some data access via APIs, the structure of that data, and the business terms and conditions for using it, vary widely. For instance, to protect the privacy of its users, and to preserve business opportunities like ad targeting for its own use, Facebook is fairly restrictive about how companies can use the information exposed by its APIs. To date, Facebook itself does not license the data contained in its users’ social graphs, even on an aggregated and/or anonymous basis. Braxton Woodham, CTO of Tap11, which offers feed-monitoring services in an attempt to be what his company calls “the Omniture of the real-time web,” said at the conference that Tap11 has focused its efforts on Twitter “because it’s the most open.” As it struggles to refine its own business model, Twitter has adjusted the terms under which companies can use its API data, sometimes shutting down access for productive members of its ecosystem.
Separately from the conference, I interviewed two of the companies building businesses around harnessing Twitter’s big data for developers, media companies and marketers. Gnip is in the data acquisition business and licenses Twitter’s Firehose to third parties; Sulia does some analysis and packaging of that kind of data, filtering Twitter streams into topical content channels. Both companies’ strategies and market targets are early indicators of how the social big data ecosystem is evolving.