Andy Thurai, Author at Gigaom
https://gigaom.com/author/andythurai/

GigaOm Vendor Profile: VMware
https://gigaom.com/report/gigaom-vendor-profile-vmware/ (30 Apr 2021)

The VMware Tanzu™ solution suite, designed to support cloud, hybrid cloud, and containerized applications, adds observability to its portfolio with Tanzu Observability. VMware is expanding its support for cloud and Kubernetes, and this platform, rebranded in March 2020 from the acquired Wavefront product, is designed to help produce, maintain, and scale cloud-native applications.

Tanzu Observability offers full-stack observability using operational telemetry such as metrics, traces, histograms, span logs, and events aggregated across distributed applications, application services, container services, and the enterprise multi-cloud based on public, private, and hybrid cloud infrastructures.

The offering has been available for more than seven years, making it one of the longest-running and most mature products in this space. Tanzu Observability integrates effectively with many AWS services, including AWS CloudWatch and AWS CloudTrail, as well as with Google Cloud (GCP Stackdriver) and Microsoft Azure, putting it on par with other observability solutions. Integration with VMware products such as vSphere, vSAN, TMC, TAS, and TKG shines, giving customers complete observability for their hybrid infrastructure.

Tanzu Observability supports dependency mapping with application and service maps. The application map provides an overview of how applications and services are linked; it allows users to focus on a specific service and view request, error, and duration (RED) metrics for each service and its edges. Users can also view traces for the services and edges and drill down from the application map. The service map enables users to view service dependencies and follow the flow of request calls from service to service. Users can see the RED metrics that reflect a service's health: request count, error count, and trace duration at the 95th percentile, as well as overall traces (root spans) that originate in the service.
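
To make the RED model concrete, here is a minimal, hypothetical sketch of how request, error, and duration metrics can be derived from a batch of completed spans. The span fields and the 95th-percentile calculation are illustrative assumptions, not Tanzu Observability's actual data model or API.

    # Minimal, hypothetical sketch of RED (request, error, duration) metrics
    # computed from completed spans. The span schema is an illustrative
    # assumption, not Tanzu Observability's data model.
    from dataclasses import dataclass
    from statistics import quantiles

    @dataclass
    class Span:
        service: str        # service that handled the call
        duration_ms: float  # wall-clock duration of the span
        is_error: bool      # whether the call failed

    def red_metrics(spans, service):
        """Return request count, error count, and p95 duration for one service."""
        mine = [s for s in spans if s.service == service]
        durations = sorted(s.duration_ms for s in mine)
        if len(durations) >= 2:
            p95 = quantiles(durations, n=20)[-1]   # last cut point = 95th percentile
        else:
            p95 = durations[0] if durations else 0.0
        return {
            "requests": len(mine),
            "errors": sum(1 for s in mine if s.is_error),
            "p95_duration_ms": p95,
        }

    # Example usage with fabricated spans:
    spans = [Span("checkout", 120.0, False), Span("checkout", 340.0, True), Span("cart", 45.0, False)]
    print(red_metrics(spans, "checkout"))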

For open source enthusiasts, Tanzu Observability provides decent integration with Prometheus, Jaeger, Spring, and Zipkin, and it is OpenTelemetry compliant.

How to Read this Report

This GigaOm report is one of a series of documents that help IT organizations assess competing solutions in the context of well-defined features and criteria. For a fuller understanding, consider reviewing the following reports:

Key Criteria report: A detailed market sector analysis that assesses the impact that key product features and criteria have on top-line solution characteristics—such as scalability, performance, and TCO—that drive purchase decisions.

GigaOm Radar report: A forward-looking analysis that plots the relative value and progression of vendor solutions along multiple axes based on strategy and execution. The Radar report includes a breakdown of each vendor’s offering in the sector.

Vendor Profile: An in-depth vendor analysis that builds on the framework developed in the Key Criteria and Radar reports to assess a company’s engagement within a technology sector. This analysis includes forward-looking guidance around both strategy and product.

GigaOm Vendor Profile: Zebrium
https://gigaom.com/report/gigaom-vendor-profile-zebrium/ (30 Apr 2021)

Zebrium is an observability/AIOps platform that uses unsupervised machine learning to auto-detect software problems and automatically find root causes, reducing manual labor and speeding incident response. The system requires no manual setup; instead, it trains itself on patterns in logs and metrics to establish a baseline, and can be ready to perform incident and root cause detection within as little as one day. While most observability tools try to work the whole spectrum—from instrumentation to metrics to logs to incident correlation to root cause analysis—Zebrium concentrates on root cause identification, using automated AI/ML to considerably reduce mean time to resolution (MTTR).
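
To illustrate the general idea of unsupervised log baselining (though not Zebrium's actual algorithm), the toy sketch below learns which log "templates" appear during normal operation and flags rare or never-seen patterns as anomaly candidates; the naive template extraction and threshold are assumptions made purely for illustration.

    # Toy sketch of unsupervised log baselining: learn normal log templates,
    # then flag rare or never-seen patterns. This is NOT Zebrium's algorithm,
    # only an illustration of the general approach.
    import re
    from collections import Counter

    def template(line):
        """Collapse hex IDs and numbers so similar log lines share one template."""
        line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
        return re.sub(r"\d+", "<NUM>", line).strip()

    def baseline(training_logs):
        """Count how often each template appears during normal operation."""
        return Counter(template(l) for l in training_logs)

    def anomalies(new_logs, counts, rare_threshold=1):
        """Return lines whose template was unseen (or rare) in the baseline."""
        return [l for l in new_logs if counts.get(template(l), 0) <= rare_threshold]

    normal = ["disk check ok id 1001", "disk check ok id 1002", "heartbeat 55 ok"]
    incoming = ["disk check ok id 1003", "kernel panic at 0xdeadbeef"]
    print(anomalies(incoming, baseline(normal)))  # -> ['kernel panic at 0xdeadbeef']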

How to Read this Report

This GigaOm report is one of a series of documents that help IT organizations assess competing solutions in the context of well-defined features and criteria. For a fuller understanding, consider reviewing the following reports:

Key Criteria report: A detailed market sector analysis that assesses the impact that key product features and criteria have on top-line solution characteristics—such as scalability, performance, and TCO—that drive purchase decisions.

GigaOm Radar report: A forward-looking analysis that plots the relative value and progression of vendor solutions along multiple axes based on strategy and execution. The Radar report includes a breakdown of each vendor’s offering in the sector.

Vendor Profile: An in-depth vendor analysis that builds on the framework developed in the Key Criteria and Radar reports to assess a company’s engagement within a technology sector. This analysis includes forward-looking guidance around both strategy and product.

The Essence of Observability
https://gigaom.com/video/the-essence-of-observability/ (29 Apr 2021)

This free 1-hour webinar from GigaOm features analysts David Linthicum and Andy Thurai, along with special guests from VMware: Harmen Van der Linde, Office of the CTO, and Clement Pang, Principal Engineer. The discussion focuses on how observability enables organizations to respond to evolving application management needs.

Monitoring of systems, infrastructure, and applications is familiar territory, but traditional approaches have become inadequate for multiple reasons, not least cloud and microservices-based architectures and practices such as continuous deployment. In response, observability offers an emerging set of practices, platforms, and tools that allow events and incidents to be acted upon quickly.

In this 1-hour webinar, we'll share the findings from GigaOm's Observability Key Criteria and Radar reports, together with insider commentary from the reports' authors and industry experts.

Above all, you will discover how you can gain a deeper level of visibility and insight with observability. If you are moving to a forward-facing application architecture, or are already there and grappling with the consequences, this webinar is for you.

Register now to join GigaOm and VMware for this free expert webinar.

Comprehensive Observability
https://gigaom.com/2021/03/05/comprehensive-observability/ (5 Mar 2021)

I had the pleasure of producing my first analyst report in one of the hottest categories for GigaOm—cloud observability. Here are my thoughts on the research process, the technology space, and the vendors, as well as some advice for IT decision-makers.

For those who have been living under a rock and have never heard of observability, here's a helpful excerpt from my report:

Observability is an emerging set of practices, platforms, and tools that goes beyond monitoring to provide insight into the internal state of systems by analyzing external outputs. Monitoring has been a core function of IT for decades, but old approaches have become inadequate for a variety of reasons—cloud deployments, agile development methodology, continuous deployments, and new DevOps practices among them. These have changed the way systems, infrastructure, and applications need to be observed so events and incidents can be acted upon quickly. At the heart of the observability concept is a very basic premise: quickly learn what happens within your IT to avoid extended outages. And in the unfortunate event of an outage, you need to ensure that you can get to the root cause of it fast. Outages are measured by Mean Time To Resolution (MTTR) and it is the goal of the observability concept to drive the MTTR value to as close to zero as possible.

I had the privilege to speak with more than 20 companies and more than 50 customer executives on this topic. Some of the vendors have been around for over a century (looking at you, IBM) and some are barely a year old. For this first report, we decided to include 14 companies. The plan is to update the report shortly with two additional companies, as their briefings were delayed due to year-end, COVID-19, and other logistical reasons. While I want to save the element of surprise, the two vendors are well-known in this space. If you are wondering why a certain company that is reimagining the observability space didn’t make it into my report, fear not, it is coming soon.

A quick note about the GigaOm Radar chart and how it works compared to other charts you might be used to seeing from analyst firms. First, in the GigaOm Radar chart, the best solutions are those set closest to the center, so don't get hung up on the upper-right quadrant. Read carefully how our classifications are done. There is a clear distinction among the quadrants, with mature solutions residing in the upper hemisphere and more innovation-focused ones appearing in the lower hemisphere. Meanwhile, the left and right hemispheres indicate whether a solution is more focused on individual features (Feature Play) or on broader platform engagement (Platform Play). The point is that any of the companies in my report can help you, depending on your situation. So please read all the vendor capsules carefully, not just the Radar graphic and the description that goes with it.

After all is said and done, I am blown away by the innovation coming out of some of these young companies. As I said earlier, the cloud has leveled the playing field, and smaller and larger companies are now competing on equal footing to solve customer problems.

If you are a company in the service/application/infrastructure observability space and would like to brief me, please reach out to schedule a time. I will be more than happy to engage you and possibly consider you for the upcoming refresh of the report if it is not too late.

Comprehensive Observability: Core to Future-Proofing your IT Infrastructure.

If you are a CxO struggling to make sense of this IT operations mess, I would welcome a chance to talk to you about your experience and see if there is anything I can do to help.

If you are an end user of any of these solutions in the observability space, I’d love to hear from you as I continue my research in this space. I want to know what you did right, what you did wrong, and how easy (or difficult) your journey was. If your case is compelling enough, we might want to write it up as a use-case/case study, if you are willing. Or I could host you on my podcast to discuss how you solved your challenge, or what made you select a certain solution and the thought process you went through.

I did share this report confidentially with some customer executives to get their feedback. Here is some of the feedback I received:

“I wish I had your report before I made the decision to buy xyz.”

“You are spot on with the company we are using. We chose them for the strengths you highlighted, only to find the weaknesses later on.”

“I asked my guys to stop the evaluation process of our observability mission and read your report first. I think you have some nuggets that are going to be very helpful with our process. Thank you!”

I would like to hear your views, opinions, and opposing takes. If you like the report, please help me socialize this post. If you disagree, reach out to me to let me know why—I would love to hear your views.

You can access the full report here if you are a GigaOm subscriber or client. Otherwise, you’ll be treated to an abstract (though I’m told you can expect some freely available coverage of the report on the GigaOm website next week). If you are not a client, you may want to get on board—you are missing some great work done by my colleagues. You can contact GigaOm here.

Still have questions? Reach out to me; I'm happy to answer them or put you in touch with someone at GigaOm who can. One way or the other, we can help you with your "fully observable, AIOps-infused cloud-native" journey.

Finally, check out the recent article I wrote for Forbes on a related topic: "AIOps vs Observability vs Monitoring—What Is The Difference? Are You Using The Right One For Your Enterprise?"

This blog post by GigaOm Analyst Andy Thurai was first published on his website, The Field CTO. It is being published here as well at Thurai’s request, so GigaOm readers can learn more about his experience and findings in the cloud observability space. You can find the original post here.

–Ed

GigaOm Radar for Cloud Observability
https://gigaom.com/report/gigaom-radar-for-cloud-observability/ (26 Feb 2021)

Observability is an emerging set of practices, platforms, and tools that goes beyond monitoring to provide insight into the internal state of systems by analyzing external outputs. It's a concept with roots in control theory that is rapidly gaining traction today.

Of course, monitoring has been a core function of IT for decades, but old approaches have become inadequate for a variety of reasons—cloud deployments, agile development methodology, continuous deployments, and new DevOps practices among them. These have changed the way systems, infrastructure, and applications need to be observed so events and incidents can be acted upon quickly.

At the heart of the observability concept is a very basic premise: quickly learn what happens within your IT to avoid extended outages. And in the unfortunate event of an outage, you need to ensure that you can get to the root cause of it fast. Outages are measured by Mean Time To Resolution (MTTR) and it is the goal of the observability concept to drive the MTTR value to as close to zero as possible.

It's no surprise that building resilient service delivery systems with high availability is the ultimate end goal for any business. Achieving this goal requires executing three core concepts:

  • Monitoring: This is about understanding if things are working properly in a service-centric manner.
  • Observability: This is about enabling complete end-to-end visibility into your applications, systems, APIs, microservices, network, infrastructure, and more.
  • AIOps: This is about using comprehensive visibility to derive meaning from the collected data to yield actionable insights and courses of action.

To achieve observability, you need to measure the golden telemetry signals: logs, metrics, and traces. IT professionals have measured logs and metrics for decades, but tracing is a fairly new concept that emerged as modern applications were increasingly built from distributed microservices. A service request is no longer completed by one service but rather by a composition of microservices, so there is an imperative to track, or trace, the request from start to finish. To generate proper telemetry, all the underlying systems must be properly instrumented. This way, enterprises can achieve full visibility into their systems to track service calls, identify outages, and determine whether the impacted systems are on-premises, in the cloud, or somewhere else.
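
As a minimal sketch of what such instrumentation looks like in practice, the snippet below uses the OpenTelemetry Python API (assumed to be installed; with no exporter configured the calls are effectively no-ops) to emit a trace span and a request counter for a single service hop. The service and attribute names are placeholders.

    # Minimal sketch of instrumenting one service hop with OpenTelemetry.
    # Assumes the opentelemetry-api package; exporter/backend wiring is omitted.
    from opentelemetry import trace, metrics

    tracer = trace.get_tracer("checkout-service")
    meter = metrics.get_meter("checkout-service")
    request_counter = meter.create_counter("checkout.requests")

    def handle_checkout(order_id):
        # One span per request; child spans created in downstream services share
        # the same trace ID, which is what lets a backend stitch the full path.
        with tracer.start_as_current_span("checkout") as span:
            span.set_attribute("order.id", order_id)
            request_counter.add(1, {"service": "checkout"})
            # ... calls to payment and inventory microservices would go here ...

    handle_checkout("A-1001")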

Observability is not always about introducing new tools, but about consolidating the telemetry data, properly instrumenting systems to get the appropriate telemetry, creating actionable insights, and avoiding extended outages. Comprehensive observability is core to future-proofing IT infrastructure.

This report evaluates key vendors in the emerging application/system/infrastructure observability space and aims to equip IT decision makers with the information they need to select providers according to their specific needs. We analyze the vendors on a set of key criteria and evaluation metrics, which are described in depth in the “Key Criteria Report for Cloud Observability.”

How to Read this Report

This GigaOm report is one of a series of documents that help IT organizations assess competing solutions in the context of well-defined features and criteria. For a fuller understanding, consider reviewing the following reports:

Key Criteria report: A detailed market sector analysis that assesses the impact that key product features and criteria have on top-line solution characteristics—such as scalability, performance, and TCO—that drive purchase decisions.

GigaOm Radar report: A forward-looking analysis that plots the relative value and progression of vendor solutions along multiple axes based on strategy and execution. The Radar report includes a breakdown of each vendor’s offering in the sector.

Vendor Profile: An in-depth vendor analysis that builds on the framework developed in the Key Criteria and Radar reports to assess a company’s engagement within a technology sector. This analysis includes forward-looking guidance around both strategy and product.

Key Criteria for Evaluating Cloud Observability
https://gigaom.com/report/key-criteria-for-evaluating-observability/ (18 Dec 2020)

The concept of observability has evolved over the years. It refers to the ability to monitor the internal states of systems using the externally exhibited characteristics of those systems, and to predict the future behavior of those systems using data analysis and other technologies.

By monitoring what has already happened, enterprises can reactively fix the issues. Observability, in contrast, helps predict issues before they surface, thus helping to build a proactive enterprise. Ultimately, by automating observability, it’s possible to build hyper-automated, self-healing enterprises that fully understand what’s happening within the systems under management, and to predict and respond to likely outcomes.

Observability is important in the management and monitoring of modern systems, especially because modern applications are built to be quick and agile. The widespread practice of bolting on monitoring and management after applications are deployed is no longer sufficient. Thus, the notion of observability includes the implementation of modern instrumentation designed to help users better understand the properties and behaviors of an application.

It should be noted that monitoring and observability are not the same thing. Monitoring involves the passive consumption of information from systems under management. It uses dashboards to display the state of systems, and is generally purpose-built for static systems or platforms that don’t change much over time.

In contrast, observability takes a proactive approach to revealing system attributes and behavior, asking questions based on a system's internal data. The technology is purpose-built for dynamic systems and platforms with widely changing complexity. An important feature of observability is the ability to analyze "what if" scenarios and trend application behavior to predict possible future failures (see Figure 1).

Figure 1: Observability Is Built on Monitoring, Analysis, and AI-Enabled Learning Systems
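
As a simple illustration of the "predict a possible future failure" idea (not any vendor's method), the sketch below fits a least-squares trend line to recent samples of a metric and estimates when it will cross a failure threshold; the metric, samples, and threshold are invented for the example.

    # Toy trend projection: fit a linear trend to a metric and estimate when
    # it will breach a threshold. Real observability platforms use far richer
    # models; this is only an illustration of the concept.
    def forecast_breach(samples, threshold):
        """samples: list of (minute, value); returns minutes until breach, or None."""
        n = len(samples)
        mean_t = sum(t for t, _ in samples) / n
        mean_v = sum(v for _, v in samples) / n
        slope = (sum((t - mean_t) * (v - mean_v) for t, v in samples)
                 / sum((t - mean_t) ** 2 for t, _ in samples))
        intercept = mean_v - slope * mean_t
        if slope <= 0:
            return None  # metric is flat or improving; no projected breach
        breach_t = (threshold - intercept) / slope
        return max(0.0, breach_t - samples[-1][0])

    # Disk usage climbing about 0.5% per minute, failure assumed at 95% full:
    usage = [(0, 80.0), (10, 85.0), (20, 90.0)]
    print(forecast_breach(usage, 95.0))  # -> 10.0 minutes until projected breach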

Why is Cloud Observability So Important Now?

The insight that observability grants is particularly important because the way new applications are built and architected has changed over the years:

  • Cloud-based applications depend on multiple services, often via RESTful APIs and microservices.
  • Serverless and service-based applications have little visibility into underlying infrastructure.
  • Applications are highly distributed, typically based on containers and microservices, with the ability to scale up and down in seconds.
  • Hybrid and multi-cloud architectures that depend on multiple providers, in-house and external, are exceedingly complex and difficult to track.

Most monitoring and operations tools, including AIOps tools, claim some role in observability, but in some respects it’s like slapping a new label on an old nostrum. However, if you consider observability as a concept, not as a set of tools and technologies, you get a better idea of the value. For instance, observability means using analytical views that take raw data, provide a holistic view, find patterns in the data, make calculated predictions, and react to those predictions using automated and non-automated responses.

Observability should provide a full view of the enterprise, including on-premises, multi-cloud and hybrid cloud environments. The successful use of appropriate tools can help eliminate the cloud complexity issues that hinder some cloud migrations and new development.

The Emerging Observability Sector

One of the more important topics covered in this Key Criteria report and in the GigaOm Radar for Cloud Observability report is how to sift through the technologies that claim to have observability nailed. At present, the providers generally take different approaches, and while you can indeed find common patterns, it’s still difficult to compare apples to apples.

More tools are purpose-built for observability today, including most of the emerging AIOps toolsets, as well as other monitoring technologies like application performance monitoring (APM), network performance monitoring and diagnostics (NPMD), digital experience monitoring (DEM), and infrastructure monitoring, which have the ability to deal with complex operational data.

However, while all of these technologies claim to be built for observability, most are repurposed monitoring and management tools whose original purpose was simply to bring operational data into a single view. Therefore, there are a few ways to consider the observability market, including:

Traditional Monitoring Tools: These have been upgraded to provide modern services, such as storage and analysis, plus machine learning to support the notion of observability. These tools may be decades old but, generally speaking, have the advantage of providing better legacy systems integration.

Purpose-Built Observability Tools: These are typically focused on the acquisition and analysis of system data to report and respond to observations. They are generally data-first tools that may or may not have automated responses to analytical findings. Those without automation may expose an API that can be leveraged programmatically to deal with automation needs.

Hybrid Observability Tools: These are a mix of traditional tooling and more modern purpose-built observability tools, typically a combination of two or more technologies, created either through an acquisition or a strategic partnership. The risk with these is that you’re dealing with two or more tools that have their own independent paths and business interests and you may not be able to count on their continued functionality.

The logical features and functions found in most observability tools are depicted in Figure 2. Such tools need a place to store logs and other data for trending, correlation, and analysis. They need an analytics engine to determine patterns and trends. They typically include machine learning and AI systems to provide deeper knowledge management, as well as the ability to get smarter as they find and solve problems. Finally, they typically perform automation through an event processing system, which could be as simple as an API or as complex as a full-blown development environment that supports event orchestration.

Figure 2: Elements of an Observability System
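
The sketch below compresses those elements into a few lines: a rolling window stands in for storage, a 3-sigma check stands in for the analytics engine, and a webhook call stands in for the event-processing/automation layer (the machine learning layer is omitted for brevity). The endpoint URL, payload shape, and thresholds are placeholders, not any particular product's API.

    # Compressed sketch of the elements in Figure 2: store recent telemetry,
    # analyze it against a baseline, and push detected events to an automation
    # endpoint. URL and payload are placeholders.
    import json
    import urllib.request
    from collections import deque
    from statistics import mean, stdev

    WINDOW = deque(maxlen=60)  # rolling "storage" of the last 60 samples
    AUTOMATION_HOOK = "https://example.invalid/hooks/remediate"  # placeholder

    def ingest(value):
        """Store a sample and flag it if it deviates strongly from the baseline."""
        if len(WINDOW) >= 10 and stdev(WINDOW) > 0:
            zscore = abs(value - mean(WINDOW)) / stdev(WINDOW)
            if zscore > 3:  # crude analytics: 3-sigma outlier
                emit_event({"metric": "latency_ms", "value": value, "zscore": zscore})
        WINDOW.append(value)

    def emit_event(event):
        """Event-processing step: POST the finding so automation can react."""
        req = urllib.request.Request(
            AUTOMATION_HOOK,
            data=json.dumps(event).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req, timeout=5)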

Cloud Observability Findings

This report covers various aspects of observability, including how to plan, how to pick the right technologies, how CloudOps works in multi-cloud and hybrid cloud environments, and what operations and tooling you will need to be successful. Among the conclusions reached in this report are:

Observability is an advanced concept, and cloud observability tools are often not deployed correctly. Moreover, the skills needed for planning, building models, gathering system data, and deploying properly are frequently unavailable, all but ensuring failure on the part of many organizations.

Those leveraging observability tools typically don’t employ them to their maximum potential, especially with regard to using machine learning subsystems effectively. This limits the potential benefits an organization can accrue in terms of staying ahead of systems issues.

When leveraged correctly, observability tools can potentially increase systems reliability 100-fold because issues are automatically discovered, analyzed, and corrected—without human intervention. Moreover, the system is able to learn from successes, so system reliability improves over time. Proper observability tools and optimized IT processes can reduce the mean time to resolution (MTTR) metric by 50% to 90%, an important KPI for modern "always on" systems.

The cloud observability market includes dozens of tools that claim to support the concept, but often these tools have very different primary missions, such as network, application performance, logging, or general operational monitoring. The lack of any real feature/functionality standards is causing many enterprises to wait for the market to normalize. The OpenTelemetry initiative from the Cloud Native Computing Foundation (CNCF) is one standardization effort that is gaining traction; most vendors in our report have adopted it or are in the process of doing so.

The push to move systems quickly to the cloud increases both complexity and risk, making observability more critical. This will likely persist for the next several years.

Organizations change over time, and their systems need to be able to change as well. Thus, extendable and configurable tools are a huge advantage.
