Most Read This Week
From the Blogosphere
The Dos and Don’ts of SLA Management | @DevOpsSummit #DevOps #WebPerf #APM #Monitoring
SLAs can be very tricky to manage for a number of different reasons
By: Mehdi Daoudi
Nov. 29, 2017 02:00 PM
The Dos and Don'ts of SLA Management
The past few years have seen a huge increase in the amount of critical IT services that companies outsource to SaaS/IaaS/PaaS providers, be it security, storage, monitoring, or operations. Of course, along with any outsourcing to a service provider comes a Service Level Agreement (SLA) to ensure that the vendor is held financially responsible for any lapses in their service which affect the customer’s end users, and ultimately, their bottom line.
SLAs can be very tricky to manage for a number of different reasons: discrepancies over the time period being addressed, the source of the performance metrics, and the accuracy of the data can lead to legal disputes between vendor and customer. However, there are several things that both sides can do to get accurate and verifiable performance data as it pertains to their SLAs.
The first and most critical step is to define the parameters around which the data will be used; this includes the method of data collection (often an agreed-upon neutral third party), and the time and locations from which the performance will be measured. The first part of this is critical. If the vendor and the customer are using different monitoring tools to measure the Service Level Indicators (SLIs), then there will inevitably be disagreements on the validity of the data and whether the Service Level Objective (SLO) was reached or not.
Selecting that vendor depends a great deal on the number of users being served, and where they are located. For a company such as Flashtalking, an ad serving, measuring, and technology company delivering ad impressions throughout the US, Europe, and other international markets, the need for a monitoring tool which can accurately measure the performance and user experience in many different areas around the world is critical to their SLA management efforts.
Flashtalking agrees upon the external monitoring tool with every one of their clients as part of their SLAs, using Catchpoint as the unbiased third party due to the number of monitoring locations and the accuracy of the data. Their customers obviously want the most accurate view of the customer experience and the impressions garnered, so monitoring from as close to the end user as possible is the best way to achieve that. In that sense, the more locations from which to test the product, the more accurate the data from an end user’s perspective.
Those measurement locations should include backbone and last mile, as well as any cloud provider from which the ads are being served. This diversity of locations ensures that they will still have visibility and reporting capabilities should the cloud provider itself experience an outage; the backbone tests eliminate noise and are therefore the cleanest for validating the SLO, and the last mile tests best replicate the end user experience./p>
Once the SLA and its parameters are agreed upon by both sides, each one of Flashtalking’s products is then set up with a single test that captures the performance of their clients’ ads through every stage of the IT architecture, whether it’s a single site, single server, or encompasses multiple databases/networks/etc.
Of course, establishing criteria and setting up the tests is only part of the SLA management battle. For a cloud provider to stay on top of its SLAs, they must also be able to rely on the alerting features to notify them if they are in danger of being in breach, as well as the accuracy and depth of the reporting to assist with identifying the root cause of the issue. In many cases, an ad serving company such as Flashtalking is relying on other third parties such as DNS resolvers, cloud providers, and content delivery networks to deliver the ads to the end users, which means that a disruption in service is not necessarily their fault. Still, they must be able to share their performance data with their own vendors in order to resolve the issue as quickly as possible for their own customers. In cases such as these, they must be able to easily separate their first- and third-party architecture components to show when a service disruption is not their fault and hold their own vendors accountable instead.
To learn more about SLA management and how both customers and vendors can ensure continuous service delivery, check out our SLA handbook.
Subscribe to the World's Most Powerful Newsletters
Today's Top Reads