
SYS-CON.TV
Sync Your Timeouts: When Load Balancers Cause Database Deadlocks
Have you seen this error message before: "java.sql.SQLException: ORA-00060: deadlock detected while waiting for resource"?

This error is raised when concurrent transactions end up waiting on each other's row or table locks in the database. I recently ran into this exception on an instance of an IBM eCommerce Server. My first thought was that simply too many people were hitting the same functionality, which updates Sales Tax Summary information and was showing up in the call stack of the exception:

Exception stack trace showing that createOrderTaxes ran into the deadlock issue on the database
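Because the deadlock surfaces in Java as an SQLException, application code or a monitoring hook can recognize it explicitly instead of treating it as a generic failure. The sketch below assumes the Oracle JDBC driver reports the ORA error number through `SQLException.getErrorCode()` (60 for ORA-00060) and falls back to matching the message text; the class and method names are hypothetical.

```java
import java.sql.SQLException;

// Hypothetical helper: recognizes Oracle's ORA-00060 deadlock error.
class DeadlockCheck {
    // Vendor error code assumed for "ORA-00060: deadlock detected while waiting for resource".
    static final int ORA_00060 = 60;

    // Walks the cause chain and returns true if any SQLException in it reports ORA-00060,
    // either via its vendor error code or (as a fallback) via its message text.
    static boolean isOracleDeadlock(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof SQLException) {
                SQLException sql = (SQLException) t;
                String msg = sql.getMessage();
                if (sql.getErrorCode() == ORA_00060 || (msg != null && msg.contains("ORA-00060"))) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SQLException deadlock = new SQLException(
                "ORA-00060: deadlock detected while waiting for resource", null, 60);
        // Frameworks often wrap the driver exception; the cause chain is still inspected.
        RuntimeException wrapped = new RuntimeException("createOrderTaxes failed", deadlock);
        SQLException other = new SQLException("ORA-00001: unique constraint violated", null, 1);

        System.out.println(isOracleDeadlock(wrapped)); // true
        System.out.println(isOracleDeadlock(other));   // false
    }
}
```

Recognizing the error only tells you that a deadlock happened; as the rest of this article shows, the reason it happened can lie outside the database entirely.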

The logical conclusion would be to blame this on too many users accessing this functionality, or on outdated table statistics that make update statements run so long that other requests run into the lock. It turned out to be caused by something less obvious, something that wouldn't have shown up in any exception stack trace or log file: a misconfigured timeout setting on the load balancer caused the original incoming web request to be re-executed. While the first app server was still updating the table and holding the lock (its timeout was longer than the load balancer's), the second app server tried to do the same thing, causing the exception.
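The root cause boils down to a timing rule. The sketch below models it under simplifying assumptions: the load balancer re-sends a request to another app server once its own timeout expires, and the first app server keeps working (and holding its locks) until the request finishes or the app server's timeout cuts it off. The class, method names, and timeout values are hypothetical.

```java
import java.time.Duration;

// Hypothetical model of the load balancer / app server timeout interaction.
class TimeoutCheck {
    // True if the load balancer would re-send the request while the first app server
    // is still processing it (and, in this scenario, still holding its database locks).
    static boolean retryOverlapsOriginal(Duration lbTimeout, Duration appServerTimeout,
                                         Duration processingTime) {
        // The first server stops at whichever comes first: completion or its own timeout.
        Duration firstServerBusy =
                processingTime.compareTo(appServerTimeout) < 0 ? processingTime : appServerTimeout;
        // The load balancer only retries if it has no response when lbTimeout expires.
        return firstServerBusy.compareTo(lbTimeout) > 0;
    }

    // "In sync" here means the load balancer never gives up before the app server would;
    // then retryOverlapsOriginal(...) is false for any processing time.
    static boolean timeoutsInSync(Duration lbTimeout, Duration appServerTimeout) {
        return lbTimeout.compareTo(appServerTimeout) >= 0;
    }

    public static void main(String[] args) {
        Duration app = Duration.ofSeconds(300);        // hypothetical app server timeout
        Duration slowRequest = Duration.ofSeconds(90); // hypothetical slow "Add to Cart"

        Duration lbTooShort = Duration.ofSeconds(60);
        System.out.println(timeoutsInSync(lbTooShort, app));                     // false
        System.out.println(retryOverlapsOriginal(lbTooShort, app, slowRequest)); // true

        Duration lbAligned = Duration.ofSeconds(360);
        System.out.println(timeoutsInSync(lbAligned, app));                      // true
        System.out.println(retryOverlapsOriginal(lbAligned, app, slowRequest));  // false
    }
}
```

In practice you would leave some headroom above the app server timeout for network and queuing time rather than setting the two values exactly equal.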

In this article I'll show you the steps necessary to analyze the symptoms (timeouts and client errors) and to identify and fix the root cause of the problem.

Step #1: Identifying Who and What Is Impacted
Identifying failing requests is easy: start by looking at HTTP response codes (web server access log), severe log messages (app server logs), or problematic exception objects (in log files or other monitoring tools). In our case I identified the SQL lock exception, a corresponding severe log message, and the resulting HTTP 500, and traced them back to the individual users and the actions that caused these issues.
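As a starting point for the first of these sources, the sketch below counts HTTP 5xx responses per request path in a web server access log. It assumes the log uses the Common Log Format (IHS is Apache-based and can be configured to write it); the class name, the sample lines, and the `/shop/AddToCart` path are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: which request paths return server errors (HTTP 5xx)?
class ServerErrorsByPath {
    // Common Log Format: host ident user [timestamp] "METHOD path PROTOCOL" status bytes
    // Group 1 = request path, group 2 = status code.
    private static final Pattern CLF = Pattern.compile(
            "^\\S+ \\S+ \\S+ \\[[^\\]]+\\] \"\\S+ (\\S+)[^\"]*\" (\\d{3}) \\S+");

    // Counts 5xx responses per path (query string stripped), sorted by path.
    static Map<String, Integer> count5xx(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = CLF.matcher(line);
            if (!m.find()) continue; // not a Common Log Format line
            int status = Integer.parseInt(m.group(2));
            if (status < 500 || status > 599) continue;
            String path = m.group(1);
            int query = path.indexOf('?');
            if (query >= 0) path = path.substring(0, query);
            counts.merge(path, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines; real paths depend on the shop's URL mapping.
        List<String> sample = List.of(
                "10.0.0.1 - - [10/Oct/2016:13:55:36 +0000] \"POST /shop/AddToCart?id=1 HTTP/1.1\" 500 1024",
                "10.0.0.2 - - [10/Oct/2016:13:55:40 +0000] \"GET /shop/Home HTTP/1.1\" 200 5120",
                "10.0.0.3 - - [10/Oct/2016:13:55:41 +0000] \"POST /shop/AddToCart HTTP/1.1\" 500 1024");
        System.out.println(count5xx(sample)); // {/shop/AddToCart=2}
    }
}
```

The combined log format appends referrer and user agent after the byte count; the pattern still matches such lines because it is not anchored at the end.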

Linking the errors to the User Action reveals that the problem happens when adding items to the shopping cart

This impacts our business.

Now we know that this problem impacts a critical feature in our app: Users can't add items to their cart.

Step #2: Understanding the Transaction Flow
Before drilling deeper, I typically get an overview of the transaction flow from the browser all the way back to the database. This high-level view shows which application components are involved and how they are interconnected. In this case the transaction flow highlights some interesting issues with the "Add Item to Cart" click: it executes more than 33k SQL statements for this single user interaction, and 45% of the time is spent in Oracle alone:
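Numbers like these can be derived from any trace that lists the statements executed for one request. The sketch below assumes such a list is available (the `SqlCall` record, the statement texts, and the timings are hypothetical), computes the share of request time spent in the database, and groups identical statement texts to show which ones run most often.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical analysis of the SQL statements captured for a single request.
class SqlHotspots {
    // One executed statement with its execution time in microseconds.
    record SqlCall(String statement, long micros) {}

    // Fraction of the request's total time spent executing SQL.
    static double dbTimeShare(List<SqlCall> calls, long requestMicros) {
        if (requestMicros <= 0) throw new IllegalArgumentException("requestMicros must be positive");
        long dbMicros = 0;
        for (SqlCall c : calls) dbMicros += c.micros();
        return (double) dbMicros / requestMicros;
    }

    // The `limit` statement texts executed most often, most frequent first.
    static List<Map.Entry<String, Integer>> mostRepeated(List<SqlCall> calls, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (SqlCall c : calls) counts.merge(c.statement(), 1, Integer::sum);
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries.subList(0, Math.min(limit, entries.size()));
    }

    public static void main(String[] args) {
        // Hypothetical trace of one "Add Item to Cart" request lasting 1,000 microseconds.
        List<SqlCall> calls = List.of(
                new SqlCall("SELECT price FROM item WHERE id = ?", 100),
                new SqlCall("SELECT price FROM item WHERE id = ?", 100),
                new SqlCall("SELECT price FROM item WHERE id = ?", 100),
                new SqlCall("UPDATE tax_summary SET amount = ? WHERE order_id = ?", 300));

        System.out.println("SQL executions: " + calls.size()); // 4
        System.out.printf("DB time share: %.0f%%%n", dbTimeShare(calls, 1_000) * 100); // 60%
        for (Map.Entry<String, Integer> e : mostRepeated(calls, 1)) {
            System.out.println(e.getValue() + "x " + e.getKey()); // 3x SELECT price FROM item WHERE id = ?
        }
    }
}
```

Grouping by exact statement text only works for parameterized SQL; statements with inlined literals would need to be normalized before counting.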

Transaction Flow highlights several hotspots such as 33k SQL Executions in Total and Load Balancer (IHS) splitting up a request

This Is an Architectural Problem
Getting the full end-to-end execution path for a single user interaction (Add Item to Cart) and seeing how it "branches" out makes it obvious how the individual problems (too many SQL statements, high execution time, ...) end up impacting end users. Spotting the individual hotspots without this connection would make it much harder to understand the real root cause.

Steps 3 and 4, along with a list of key takeaways, are covered in the full article.

About Andreas Grabner
Andreas Grabner has been helping companies improve their application performance for 15+ years. He is a regular contributor within Web Performance and DevOps communities and a prolific speaker at user groups and conferences around the world. Reach him at @grabnerandi
