Digital Edition

SYS-CON.TV
What If It Is the Network? Dive Deep to Find the Root Cause
How do you identify the real cause behind the network problems?

Modern Application Performance Management (APM) solutions can be tremendously helpful in delivering end-to-end visibility into the application delivery chain: across all tiers and network sections, all the way to the end user. In previous blog posts we showed how to narrow down to various root causes of the problems that the end users might experience. Those issues ranged from infrastructure through application and network, and through the end-user client application or inefficient use of the software. When the problem comes from the end user application, e.g., a Web 2.0 Web site, user experience management (UEM) solutions can offer broad analysis of possible root causes. Similarly, when an APM fault domain algorithm points to the application, the DevOps team can go deep into the actually executed code and database queries to identify the root cause of the problem.

But what do you do when your APM tool points to the network as the fault domain? How do you identify the real cause behind the network problems? Most of the APM tools stop there, forcing the network team to use separate solutions to monitor the actual network packets.

In this article we show how an Application-Aware Network Performance Management (AANPM) suite can be used to not only zero in on the network problems as the fault domain, but also dive deeper to show the actual trace of network packets in the selected context, captured back at the time when the problem happened.

Isolating Fault Domain to the Network
In one of our blog posts we wrote how Fonterra used our APM tools to identify the problem with SAP application used in the milk churn scanning process. The operations team could easily isolate the fault domain to network problems (see Figure 1); they required, however, further analysis to identify the root cause behind that network problem.

Figure 1: The performance report indicates network problems as the fault domain

In some cases information about loss rate or zero window events is enough to successfully find and resolve the problem. In general, finding the root cause may require you to analyze more detailed, packet level views in order to see exactly what is causing this network performance problem. These details can not only help to determine why we experienced packet loss or zero window events, but also whether the problem was gradually ramping up or if there was a sudden flow control blockage, which would indicate congestion.

For example, a number of users start to experience performance degradation of the service and APM points to the network as the fault domain. The detailed, packet-level analysis can show that the whole service delivery process was blocked by failed initial name resolution.

What Really Happened in the Network?
Why is detailed packet-level analysis so important when our AANPM points to the network?

Let's first consider what happens when we determine fault domain with one of the application delivery tiers. The engineers responsible for that application can start analyzing logs or, better, drill down to single transaction execution steps and often isolate the problem to the actual line of code that was causing the whole performance degradation of the whole application.

However, when our AANPM tells us it is the network, there are no logs or code execution steps to drill down to. Unless we can deliver conclusive and actionable evidence in the form of detailed, packet-level analysis, the network team might have a problem determining the root cause and may remain skeptical whether the network is at fault at all.

This is exactly what happened to one of our customers. An APM solution had correctly identified that there was a performance problem with the web server. The reports showed who was affected and where the users affected by that problem were located when the problem was occurring. The system also pointed toward the network as the primary fault domain.

The network team tried to determine the root cause of the problem. They needed packet level data for that. But, capturing all traffic with a network protocol analyzer after the incident happened not only overloaded the IT team with unnecessary data, but eventually turned out to be a hit and miss.

What the team needed were the network packets at the time the problem occurred, and only those few packets that related to the actual communication realizing affected transactions.

Figure 2: You can drill down to analyze captured network packets in the context of given user operations

For Figure 3, and further insight, click here for the full article.

About Sebastian Kruk
Sebastian Kruk is a Technical Product Strategist, Center of Excellence, at Compuware APM Business Unit.

In order to post a comment you need to be registered and logged in.

Register | Sign-in

Reader Feedback: Page 1 of 1



ADS BY GOOGLE
Subscribe to the World's Most Powerful Newsletters

ADS BY GOOGLE

Using new techniques of information modeling, indexing, and processing, new cloud-based systems can ...
A valuable conference experience generates new contacts, sales leads, potential strategic partners a...
Containers and Kubernetes allow for code portability across on-premise VMs, bare metal, or multiple ...
We are seeing a major migration of enterprises applications to the cloud. As cloud and business use ...
SYS-CON Events announced today that Silicon India has been named “Media Sponsor” of SYS-CON's 21st I...
DXWorldEXPO LLC announced today that "IoT Now" was named media sponsor of CloudEXPO | DXWorldEXPO 20...
In this presentation, you will learn first hand what works and what doesn't while architecting and d...
SYS-CON Events announced today that CrowdReviews.com has been named “Media Sponsor” of SYS-CON's 22n...
Everyone wants the rainbow - reduced IT costs, scalability, continuity, flexibility, manageability, ...
Founded in 2000, Chetu Inc. is a global provider of customized software development solutions and IT...
SYS-CON Events announced today that DatacenterDynamics has been named “Media Sponsor” of SYS-CON's 1...
DXWorldEXPO LLC announced today that All in Mobile, a mobile app development company from Poland, wi...
Most DevOps journeys involve several phases of maturity. Research shows that the inflection point wh...
Andi Mann, Chief Technology Advocate at Splunk, is an accomplished digital business executive with e...
DXWorldEXPO LLC announced today that ICOHOLDER named "Media Sponsor" of Miami Blockchain Event by Fi...
Today, we have more data to manage than ever. We also have better algorithms that help us access our...
Bill Schmarzo, author of "Big Data: Understanding How Data Powers Big Business" and "Big Data MBA: D...
DevOpsSummit New York 2018, colocated with CloudEXPO | DXWorldEXPO New York 2018 will be held Novemb...
DXWordEXPO New York 2018, colocated with CloudEXPO New York 2018 will be held November 11-13, 2018, ...
@DevOpsSummit at Cloud Expo, taking place November 12-13 in New York City, NY, is co-located with 22...