Chaos Theory and Observability
https://gigaom.com/2024/03/26/chaos-theory-and-observability/

Can observability deal with the IT chaos facing so many enterprises today? It’s a question worth digging into.

IT Chaos (Monitoring, Observability, and Intelligence)

IT chaos is a function of monitoring, observability, and intelligence. Yes, I added intelligence, but I'm not talking about artificial intelligence (AI)—yet. Just as monitoring has generated more data than humans can consume, observability can produce more observations than anyone can understand. The overload is particularly acute when multiple observability tools come into play.

Machine learning can help, but the questions we want to answer are changing. Once, we wanted to know if services in a public cloud worked and how to merge that data with the on-premises noise. Now, the questions have changed to what to do about the observations. Automation allows restarting poorly performing items and expanding memory or computing power on demand, but you have to store the data somewhere, and storage is not free. Leading observability solutions now include real-time cost comparisons between cloud vendors. The best observability tools have financial operations (FinOps) abilities to find underused, overused, and abandoned resources in clouds (public or private).
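
To make the FinOps point concrete, here is a minimal sketch of how an observability tool might bucket cloud resources as underused or abandoned. It assumes utilization and billing data have already been collected; the resource names, fields, and thresholds are all illustrative rather than any particular vendor's API.

```python
from dataclasses import dataclass

@dataclass
class CloudResource:
    name: str
    monthly_cost: float         # USD, taken from the provider's billing export
    avg_cpu_percent: float      # 30-day average utilization
    days_since_last_access: int

# Illustrative thresholds -- real FinOps tooling tunes these per workload.
UNDERUSED_CPU = 10.0
ABANDONED_DAYS = 60

def classify(resource: CloudResource) -> str:
    """Bucket a resource the way a FinOps-capable observability tool might."""
    if resource.days_since_last_access > ABANDONED_DAYS:
        return "abandoned"
    if resource.avg_cpu_percent < UNDERUSED_CPU:
        return "underused"
    return "in use"

inventory = [
    CloudResource("etl-worker-7", 412.50, 2.1, 5),
    CloudResource("legacy-report-vm", 318.00, 0.4, 92),
    CloudResource("checkout-api", 1020.00, 63.0, 0),
]

for r in inventory:
    print(f"{r.name}: {classify(r)} (${r.monthly_cost:.2f}/month)")
```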

Observability tooling has enough data to predict future states. Unfortunately, chaos theory itself does not help here, because the element-level data it would need does not exist at the observability level. Regression analysis, least-squares fits, and more complicated algorithms make useful short-term predictions possible. The more data available, the more accurate the predictions, but storing that data is costly. Vendors are addressing the issue with consumption-based licensing, lower-cost storage tiers, and other methods to deal with the wave of data needed for observability.
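
As one example of the kind of fit mentioned above, here is a minimal sketch of a least-squares trend line over a metric series. The hourly memory-usage numbers are made up, and the extrapolation is only a near-term estimate; it is exactly the sort of forecast that chaotic behavior makes unreliable over longer horizons.

```python
import numpy as np

# Hypothetical hourly memory-usage samples (percent) for a single service.
usage = np.array([41.0, 42.5, 44.1, 43.8, 46.0, 47.2, 49.0, 50.5])
hours = np.arange(len(usage))

# Ordinary least-squares fit of a straight line: usage ~= slope * hour + intercept.
slope, intercept = np.polyfit(hours, usage, deg=1)

# Extrapolate six hours ahead -- useful for near-term capacity alerts, but
# increasingly untrustworthy the further out the projection goes.
future_hours = np.arange(len(usage), len(usage) + 6)
forecast = slope * future_hours + intercept

for hour, value in zip(future_hours, forecast):
    print(f"hour {hour}: predicted {value:.1f}% memory usage")
```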

IT chaos will never end, but at least we can try to manage it. The new hope is generative AI (GenAI)—maybe.

Chaos, Observability, and Artificial Intelligence

The chaos function spans the steps from monitoring to observability to intelligence, and each step requires new approaches to answering questions. Monitoring tells us the state of individual items, observability creates relationships and provides a meta view of those elements, and intelligent questions become possible with the help of GenAI.

Ask an observability tool when the next outage will occur, and you may get an answer. Ask it to automate a known failure mode, and it performs a perfect dance. Ask an observability tool whether the enterprise is OK, and you get nothing; the question is beyond its capabilities. Observability tools as they exist today focus on IT, including developers in DevOps pipelines, operations management team members working to keep the lights on, and the newly coined (by my more-than-40-year standard) site reliability engineers (SREs). Observability explains the data from monitoring.

Enter GenAI, the big rock in the pond creating its version of chaos. In chaos theory, a single element can tip an entire system over the edge. The math makes this abundantly clear (I’ll get to that in a moment). So, what happens next?

GenAI is already improving IT, from better chatbots to consuming all the data and providing remarkable insights. Yet GenAI is brand new and disruptive. Few observability vendors are using it to significant effect now, and fewer still can predict what its impact will be in 24 to 26 months.

Observability can slow the devolution into chaos, pointing to a calmer IT environment with GenAI somewhere in the future. Actual intelligence for the enterprise comes when GenAI consumes data from every source in the company, enabling previously unthinkable questions and a future in which the tsunami of GenAI-created change does not disrupt the company.

Chaos Theory: What Is It?

I’ve mentioned chaos theory a few times; let’s look at what it is. Chaos theory is a popular trope that lets writers invent seemingly impossible situations the protagonists must overcome, or base an entire story on the consequences of moving a single item. If any large-scale, easily recognized system can be said to embody chaos, it is information technology. Chaos is the normal state of IT, particularly in large enterprises. I’m going to lay out the math for you.

Hold on. Why am I writing about mathematics in an IT blog?

I’m a physicist, and though I’ve been doing IT for over 40 years, I rely on my education for even the most mundane things. Observability and chaos theory are related—the how and why are essential when we look at the entire enterprise. I could have used entropy, but chaos theory is sexier and closer to the reality of an IT ecosystem. Now, to the esoteric math discussion.

Chaos theory has equations that help mathematicians and physicists analyze the systems under study. In 1975, Robert May created a model to demonstrate the chaotic behavior of dynamic systems. I have modified May’s model for incidents:

I_{n+1} = r • I_n • (1 – I_n)

    • I_n
      • The proportion of the system’s capacity affected by incidents at a given time, reflecting the number of incidents, their severity, or their total impact on the system. The value ranges from zero (no impact) to one (full impact, or system-wide failure).
      • In a perfect world, this is always zero, but this is about IT, where the value is never zero. Oh, but we do try hard. NASA has some of the best methods and processes anywhere, yet the first place investigators looked after the Challenger explosion was the range safety code, the code that can destroy the shuttle, and it was deemed perfect after a multimillion-dollar, line-by-line examination.
    • r
      • This represents the rate of incident generation and resolution, influenced by factors such as system complexity, change frequency, and the effectiveness of incident management processes. High values indicate a system where incidents are rapidly generated or poorly resolved, leading to a more chaotic system. Lower values suggest a stable system where incidents are effectively managed or infrequent.
      • In another perfect world, perhaps in the multiverse, this would be equal to or less than one. In that same universe, pigs fly and nothing ever breaks. I’m sure other strange things happen in this utopia to take the shine off the whole perfection thing.
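
May’s model is easy to play with. Below is a minimal sketch that iterates the incident form of the equation for two illustrative values of r; the starting impact and step count are arbitrary, chosen only to show how a small change in r moves the system from a stable incident level to erratic swings.

```python
def incident_series(r: float, i0: float = 0.05, steps: int = 20) -> list[float]:
    """Iterate I_{n+1} = r * I_n * (1 - I_n), the incident form of May's model."""
    series = [i0]
    for _ in range(steps):
        i_prev = series[-1]
        series.append(r * i_prev * (1 - i_prev))
    return series

# r below ~3 settles toward a steady incident level; r near 4 never settles.
for r in (2.5, 3.9):
    tail = ", ".join(f"{i:.2f}" for i in incident_series(r)[-5:])
    print(f"r = {r}: last five values -> {tail}")
```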

In another version of Earth, I can simulate every IT element to identify systems and processes on the precipice of chaos and magically heal them. IT does not create dinosaurs, except in the form of mainframe computers running COBOL.

OK, that isn’t happening, but I can monitor all those elements and gather state information (on or off), metrics (memory usage, CPU performance), and more. Then I can send all that information to a team to determine the system’s chaos level and respond accordingly.

Oops, BAM! We have another data glut (monitoring often accounts for 25% of network traffic in a large enterprise).

Observability strives to infer a system’s internal state from its external outputs. We have scads of data but no idea what it means. Observability tooling, whether specifically for public and private clouds, networks, storage, or applications, is a view into the chaos.

The Intersection of May’s Equation and Observability

May’s equation and observability intersect. Here’s how:

      • Understanding system behavior: Observability and May’s equation both aim to enhance understanding of complex systems. Observability allows for real-time monitoring and knowledge of a system’s state based on its outputs, while May’s equation shows how system behavior can change dramatically with slight parameter shifts.
      • Predictability and stability: May’s equation highlights the limits of predictability in complex systems due to their sensitivity to initial conditions. Observability, in contrast, is a tool for gaining insight into the system. It increases predictability by allowing for early detection of minor issues before they escalate into significant problems. In effect, observability helps keep the value of r (defined above) low enough that the system does not explode into chaos.
      • Adapting to change: The logistic map in May’s equation shows how systems can transition from stable to chaotic regimes with a single parameter change. Observability provides the means to detect and respond to these transitions, offering a method to help manage and mitigate the risks of entering chaotic states.
      • Feedback loops: Observability can act as a feedback mechanism in complex IT systems, identifying when a system is approaching a chaotic regime. This feedback can inform adjustments to system parameters to maintain desired performance and stability levels; a rough sketch of this idea follows the list.
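
Here is that feedback idea as a minimal sketch, assuming the impact proportion I_n is already being computed from monitoring data each interval. Backing r out of consecutive observations, and warning near the logistic map’s onset of chaos (around r ≈ 3.57), are illustrative simplifications, not a description of any vendor’s feature.

```python
def estimate_r(impacts: list[float]) -> float:
    """Back r out of consecutive observations using I_{n+1} = r * I_n * (1 - I_n)."""
    estimates = []
    for i_now, i_next in zip(impacts, impacts[1:]):
        denominator = i_now * (1 - i_now)
        if denominator > 0:
            estimates.append(i_next / denominator)
    return sum(estimates) / len(estimates)

# The logistic map's period-doubling cascade ends near r = 3.57; warn before that.
CHAOS_WARNING = 3.5

# Hypothetical per-interval impact proportions derived from incident data.
observed = [0.05, 0.17, 0.50, 0.89, 0.35, 0.81]

r_hat = estimate_r(observed)
if r_hat >= CHAOS_WARNING:
    print(f"estimated r = {r_hat:.2f}: the system is approaching a chaotic regime")
else:
    print(f"estimated r = {r_hat:.2f}: incident dynamics look stable")
```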

Technology impacts us almost everywhere—doctor visits, the news, social media, refrigerators, and even our cars (including gas-powered vehicles). The change in a single parameter can bring a company to its knees. Ask AT&T about a simple configuration change that brought their entire network down. Look into how British Airways had to cancel hundreds of flights because a software component failed after a simple change.

IT systems are always on the precipice of chaos. Observability tools are one way to examine every IT enterprise’s chaotic state.

Next Steps

To learn more, take a look at GigaOm’s cloud observability Key Criteria and Radar reports. These reports provide a comprehensive overview of the market, outline the criteria you’ll want to consider in a purchase decision, and evaluate how a number of vendors perform against those decision criteria.

If you’re not yet a GigaOm subscriber, you can access the research using a free trial.

GigaOm Radar for Cloud Observability
https://gigaom.com/report/gigaom-radar-for-cloud-observability-3/

Cloud observability is the process of gaining comprehensive insights into the performance, health, and state of cloud-based applications and infrastructure through monitoring, metrics, tracing, logging, and other telemetry data. It enables organizations to proactively detect, understand, and resolve issues to ensure optimal application performance and user experience.

Observability is one step in a larger operational intelligence workflow wherein organizations move from monitoring to observability to intelligence (Figure 1).

Monitoring can determine the states of various hardware or software resources. Observability enables the consolidation of these states to obtain meaning, estimate the impact on critical services, predict future states based on past observations, and automatically remediate known problems. Intelligence synthesizes both technical information and business data.

Figure 1. From Monitoring to Intelligence

Observability tools reduce the data overload and bring insight to the monitored data. These solutions leverage application performance management (APM), service orchestration and automation, Kubernetes management, and cloud provider tooling (for public clouds) and apply machine learning (ML) capabilities and predictive analytics to filter the monitored data. The resulting information is targeted at IT operations and other technical personnel such as developers and systems managers. IT operations staff no longer have to be experts on the software and hardware that run the enterprise. With predictive analytics, IT resources can concentrate on what is failing and what is likely to have problems. Additionally, the use of OpenTelemetry gives enterprises a source for metrics, events, logs, and traces (MELT) that is vendor-agnostic.
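
As a small illustration of that vendor-agnostic MELT point, the sketch below emits a trace span and a metric through the OpenTelemetry Python API. It assumes the opentelemetry-api package is installed; the service name, span name, and counter name are invented for the example, and a real deployment would also configure an SDK exporter for whichever backend it uses.

```python
from opentelemetry import metrics, trace

# With no SDK or exporter configured, these calls fall back to no-op
# implementations, so the sketch only shows the instrumentation surface.
tracer = trace.get_tracer("checkout-service")   # illustrative service name
meter = metrics.get_meter("checkout-service")

orders_counter = meter.create_counter(
    "orders_processed",
    description="Number of orders handled by the checkout service",
)

def process_order(order_id: str) -> None:
    # Each order produces a trace span plus a metric increment, both emitted
    # in OpenTelemetry's vendor-neutral format (two of the MELT signals).
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)
        orders_counter.add(1)

process_order("A-1001")
```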

Intelligence is the final step in this process—it reflects the operational state of the entire company. Intelligence builds on monitoring and observability and begins to deliver on the promise of artificial intelligence for IT operations (AIOps) by including data from the entire company—marketing, sales, legal, human resources, manufacturing data, and other sources.

The history of IT is filled with products and services deemed smart or intelligent: hardware and software initiatives, application solutions, databases, and others. Often, however, those labels were little more than marketing terms. In 2023, large language models (LLMs) exploded, spearheaded by OpenAI and ChatGPT, creating the beginning of genuinely intelligent software. This new ability is finding its way into IT solutions and blurs the line between observability and intelligence.

The focus of this analysis is observability within cloud environments, including multiple public cloud offerings, private clouds, and any combination of cloud and on-premises operations. LLM-driven capabilities will be considered from a “this is new” perspective, with the understanding that LLM abilities remain inconsistent across vendors.

This is our fourth year evaluating the cloud observability space in the context of our Key Criteria and Radar reports. This report builds on previous analysis and considers how the market has evolved over the last year.

This GigaOm Radar report examines 21 of the top cloud observability solutions and compares offerings against the capabilities (table stakes, key features, and emerging features) and nonfunctional requirements (business criteria) outlined in the companion Key Criteria report. Together, these reports provide an overview of the market, identify leading cloud observability offerings, and help decision-makers evaluate these solutions so they can make a more informed investment decision.

GIGAOM KEY CRITERIA AND RADAR REPORTS

The GigaOm Key Criteria report provides a detailed decision framework for IT and executive leadership assessing enterprise technologies. Each report defines relevant functional and nonfunctional aspects of solutions in a sector. The Key Criteria report informs the GigaOm Radar report, which provides a forward-looking assessment of vendor solutions in the sector.

GigaOm Key Criteria for Evaluating Cloud Observability Solutions
https://gigaom.com/report/gigaom-key-criteria-for-evaluating-cloud-observability-solutions/

Observability aims to monitor and proactively respond to problems in IT infrastructure—or an entire business—by analyzing the massive amounts of data generated by an enterprise. Modern IT organizations instrument and monitor everything, including applications, network access, on-premises and cloud storage, and Kubernetes clusters. The volume of data is far too large for practical human consumption and analysis, so companies increasingly rely on observability solutions to capture, analyze, and derive actionable intelligence from all this data.

Looking specifically at cloud observability, monitoring involves data created by a wide variety of applications and infrastructure that may reside in—and operate across—multiple public cloud, private cloud, and on-premises networks. Enterprises cannot understand the topology of a critical application or service from monitoring data alone. Cloud computing services, microservices, and remote infrastructure are often distributed between public and private clouds. Logging is one of the traditional tools for “measuring” the health of a system, but at modern infrastructure scale and complexity, logs often provide only a glut of contextless data. The instrumentation data returned from systems may arrive in different formats or with different timestamp conventions, with little context and no chance of timely human correlation and deduplication.
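
To make the formats problem concrete, here is a minimal sketch that normalizes two invented log shapes, a JSON record from a cloud service and a plain-text line from an on-premises host, into one structure and drops exact duplicates. Real pipelines handle far more formats and fuzzier matching; the field names and layouts here are assumptions for illustration only.

```python
import json
from datetime import datetime, timezone

def parse_json_log(line: str) -> dict:
    """Cloud-style structured record: {"ts": ..., "level": ..., "msg": ...}."""
    raw = json.loads(line)
    return {
        "timestamp": datetime.fromisoformat(raw["ts"]).astimezone(timezone.utc),
        "level": raw["level"].upper(),
        "message": raw["msg"],
    }

def parse_plaintext_log(line: str) -> dict:
    """On-premises plain-text record: '2024-03-26 17:01:02 ERROR disk full'."""
    date, time_of_day, level, message = line.split(" ", 3)
    timestamp = datetime.fromisoformat(f"{date}T{time_of_day}").replace(tzinfo=timezone.utc)
    return {"timestamp": timestamp, "level": level.upper(), "message": message}

lines = [
    '{"ts": "2024-03-26T17:01:02+00:00", "level": "error", "msg": "disk full"}',
    "2024-03-26 17:01:02 ERROR disk full",
]

seen = set()
for record in (parse_json_log(lines[0]), parse_plaintext_log(lines[1])):
    key = (record["timestamp"], record["level"], record["message"])
    if key not in seen:  # drop the same event reported by two different sources
        seen.add(key)
        print(record)
```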

A cloud observability solution gives modern IT organizations more complete, efficient, and actionable visibility across their diverse cloud-based infrastructure. From the massive amount of data available to IT operations, it enables finding answers to critical questions: what systems exist, where they are, what they’re doing, who is using them, and how they’re operating.

Business Imperative
Public and private clouds have become a critical part of businesses regardless of size. The complexity of cloud-based solutions makes them difficult to understand and control, and simple monitoring solutions leave IT teams exposed to a mounting glut of data to sift through.

Cloud observability allows the company to answer critical oversight questions about these systems that monitoring alone cannot reveal. In Figure 1, you can see that monitoring asks questions concerning the status of a device or individual software component. Observability, regardless of tooling, answers more complicated questions.

Figure 1. From Monitoring to Intelligence

Observability uses monitoring data to understand what is happening, what the data means, and how to fix incidents manually or, better still, how to identify and remediate issues automatically and proactively. Observability consumes the data from monitoring and turns it into useful, actionable information for IT operations or other technical specialists responsible for software development, infrastructure, networking, or security.

Sector Adoption Score
To help executives and decision-makers assess the potential impact and value of a cloud observability solution deployment to the business, this GigaOm Key Criteria report provides a structured assessment of the sector across five factors: benefit, maturity, urgency, impact, and effort. By scoring each factor based on how strongly it compels or deters adoption of a cloud observability solution, we provide an overall Sector Adoption Score (Figure 2) of 4.6 out of 5, with 5 indicating the strongest possible recommendation to adopt. This indicates that a cloud observability solution is a credible candidate for deployment and worthy of thoughtful consideration.

The factors contributing to the Sector Adoption Score for cloud observability are explained in more detail in the Sector Brief section that follows.

Figure 2. Sector Adoption Score for Cloud Observability