Web Service Monitoring 101: Identifying Bad Deployments

Have you ever deployed a change to production and thought "All went well - Systems are operating as expected!" but then you had to deal with users complaining that they keep running into errors?

When deployments fail you don't want your users to be the first to tell you about it: Sit down with the Business and Dev to define how and what to monitor

We recently moved some of our systems between two of our data centers - even moving some components to the public cloud. Everything was well prepared, system monitoring was set up and everyone gave the thumbs up to execute the move. Immediately after the move, our Operations dashboards continued to show green. Soon thereafter I received a complaint from a colleague who reported that he could no longer use one of the migrated services (our free dynaTrace AJAX Edition) because the authentication web service seemed to fail. The questions we asked ourselves were:

  • Impact: Was this a problem related to his account only or did it impact more users?
  • Root Cause: What is the root cause and how was this problem introduced?
  • Alerting: Why don't our Ops monitoring dashboards show any failed web service calls?

It turned out that the problem:

  • Was caused by the deployment of an outdated configuration file
  • Impacted only employees whose accounts were handled by a different authentication back-end service
  • Didn't show up in Ops dashboards because the SOAP framework we use always returns HTTP 200 and transports all success/failure information in the response body, which doesn't show up in any web server log file

In this blog I give you a little more insight into how we triaged the problem and share some best practices we derived from the incident to level-up technical implementations and production monitoring. Only if you monitor all your system components and correlate the results with deployment tasks will you be able to deploy with more confidence and without disrupting your business.

Bad Monitoring: When Your End Users become your Alerting System
So - when I got a note from a colleague that he could no longer use dynaTrace AJAX Edition to analyze the web site performance of a particular web site, I launched my copy to verify this behavior. It failed with my credentials, which proved that it was not a local problem on my colleague's machine:

Business Problem: Our end users can't use our free product due to failing authentication service

Asking our Ops Team that manages and monitors these web services resulted in the following response:

"We do not see any errors on the Web Server nor do we have any reported availability problems on our authentication service. It's all green on our infrastructure dashboards as can be seen on the following screenshot:"

Infrastructure is all green: No HTTP-based errors or SLA problems based on IIS log or on any of the resources on the host

Web Server Log Monitoring Is Not Enough
As mentioned in the introduction, it turned out that our SOAP framework always returns HTTP 200 with the actual error in the response body. This is not an uncommon "Best (or worst) Practice", as you can see, for instance, in discussions on GitHub.
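To make the failure mode concrete, here is a minimal sketch of a check that looks inside the response body instead of trusting the HTTP status. It assumes the service reports errors as a standard SOAP Fault element (SOAP 1.1 or 1.2 envelope namespace); the function name `is_logical_failure` and the sample payloads are hypothetical and not taken from our actual service.

```python
import xml.etree.ElementTree as ET

# Standard envelope namespaces for SOAP 1.1 and SOAP 1.2.
SOAP_NAMESPACES = (
    "http://schemas.xmlsoap.org/soap/envelope/",
    "http://www.w3.org/2003/05/soap-envelope",
)


def is_logical_failure(http_status: int, body: str) -> bool:
    """Return True if the call failed, even when the HTTP status says 200.

    Hypothetical helper: assumes errors are reported as a SOAP Fault
    element directly inside the envelope's Body.
    """
    if http_status >= 400:
        return True  # a classic HTTP-level error
    try:
        envelope = ET.fromstring(body)
    except ET.ParseError:
        return True  # treat an unparseable body as a failure
    for ns in SOAP_NAMESPACES:
        if envelope.find(f"{{{ns}}}Body/{{{ns}}}Fault") is not None:
            return True  # HTTP 200, but the body carries a Fault
    return False


# Illustrative payloads (made up for this sketch).
FAULT_BODY = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body><soap:Fault>"
    "<faultcode>soap:Server</faultcode>"
    "<faultstring>Authentication back end unavailable</faultstring>"
    "</soap:Fault></soap:Body></soap:Envelope>"
)
OK_BODY = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body><LoginResponse>ok</LoginResponse></soap:Body>"
    "</soap:Envelope>"
)
```

A web server log records both responses as HTTP 200; only a check like this one, run where the response body is available, tells them apart.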

The problem with that approach, though, is that "traditional" operations monitoring based on web server log files will not detect any of these "logical/business" problems. As you don't want to wait until your users start complaining, it's time to level-up your monitoring approach. How can this be done? Those developing and those monitoring the system need to sit down together and figure out how to monitor the usage of these services, and they need to talk with the business to decide which level of detail to report and alert on.
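The level of detail matters: in our case only accounts handled by one authentication back end were affected, so an overall failure rate could have stayed low while that back end failed for everyone. The following is a rough sketch of alerting per back end rather than on the total. The record layout `(backend, ok)`, the function names and the 5% threshold are assumptions chosen for illustration, not our production setup.

```python
from collections import defaultdict


def failure_rates(calls):
    """Compute the logical failure rate per back end.

    `calls` is an iterable of (backend_name, ok) tuples, where `ok` is
    False when the response body reported an error (hypothetical layout).
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for backend, ok in calls:
        totals[backend] += 1
        if not ok:
            failures[backend] += 1
    return {backend: failures[backend] / totals[backend] for backend in totals}


def backends_to_alert(calls, threshold=0.05):
    """Return the back ends whose failure rate exceeds the threshold."""
    rates = failure_rates(calls)
    return sorted(backend for backend, rate in rates.items() if rate > threshold)


# 95 successful calls on one back end, 5 failing calls on another:
# overall only 5% of calls fail, but "backend-b" fails every time.
sample_calls = [("backend-a", True)] * 95 + [("backend-b", False)] * 5
```

With this sample data, `backends_to_alert(sample_calls)` flags only `backend-b`, although the failure rate across all calls does not exceed the threshold. Which dimensions to break down by (back end, user group, service operation) is exactly what Dev, Ops and the business need to agree on.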

How can you find out whether your current monitoring approach works? Start by looking more closely at problems that your users report but that you never get automatic alerts for. Then talk with your engineers to see whether they use frameworks like the one described here.

For further insight, and for lessons learned, click here for the full article.

About Andreas Grabner
Andreas Grabner has been helping companies improve their application performance for 15+ years. He is a regular contributor within Web Performance and DevOps communities and a prolific speaker at user groups and conferences around the world. Reach him at @grabnerandi
