Understanding Application Performance on the Network | Part 4
Packet Loss

We know that losing packets is not a good thing; retransmissions cause delays. We also know that TCP ensures reliable data delivery, masking the impact of packet loss. So why are some applications seemingly unaffected by the same packet loss rate that seems to cripple others? From a performance analysis perspective, how do you understand the relevance of packet loss and avoid chasing red herrings?

In Part II, we examined two closely related constraints - bandwidth and congestion. In Part III, we discussed TCP slow-start and introduced the Congestion Window (CWD). In Part IV, we'll focus on packet loss, building on the concepts from those two entries.

TCP Reliability
TCP ensures reliable delivery of data through its sliding window approach to managing byte sequences and acknowledgements; among other things, this sequencing allows a receiver to inform the sender of missing data caused by packet loss in multi-packet flows. Independently, a sender may detect packet loss through the expiration of its retransmission timer. We will look at the behavior and performance penalty associated with each of these cases; generally, the impact of packet loss will depend on both the characteristics of the flow and the position of the dropped packet within the flow.

The Retransmission Timer
Each packet a node sends is associated with a retransmission timer; if the timer expires before the sent data has been acknowledged, the data is considered lost and is retransmitted. There are two important characteristics of the retransmission timer that relate to performance. First, the default value for the initial retransmission timeout (RTO) has traditionally been 3000 milliseconds (RFC 6298 lowers the recommended initial value to 1 second, but 3-second defaults remain in use); this is adjusted to a more reasonable value as TCP observes actual path round-trip times. Second, the timeout value is doubled for each subsequent retransmission of the same packet.
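To make the doubling behavior concrete, here is a minimal Python sketch of the retransmission schedule for a single packet that keeps getting dropped. The function name and the 60-second cap (max_rto_ms) are assumptions made for illustration; real TCP stacks choose their own upper bound and retry limit.

```python
def retransmission_schedule(initial_rto_ms, attempts, max_rto_ms=60_000):
    """Return (send_offset_ms, rto_ms) for each retransmission of one packet.

    send_offset_ms is the time since the original transmission at which the
    retransmission is sent; rto_ms is the timeout that had to expire first.
    max_rto_ms is an assumed cap, not a value taken from the article.
    """
    schedule = []
    elapsed = 0
    rto = initial_rto_ms
    for _ in range(attempts):
        elapsed += rto                   # wait for the current timer to expire
        schedule.append((elapsed, rto))
        rto = min(rto * 2, max_rto_ms)   # double the timeout for the next try
    return schedule


for n, (offset, rto) in enumerate(retransmission_schedule(3000, 4), start=1):
    print(f"retransmission {n}: sent at {offset} ms (after a {rto} ms timeout)")
# retransmission 1: sent at 3000 ms (after a 3000 ms timeout)
# retransmission 2: sent at 9000 ms (after a 6000 ms timeout)
# retransmission 3: sent at 21000 ms (after a 12000 ms timeout)
# retransmission 4: sent at 45000 ms (after a 24000 ms timeout)
```

With a 3-second initial RTO, a packet dropped three times in a row is not retransmitted for the third time until 21 seconds after the original send.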

In small flows (a common characteristic of chatty operations - like web pages), the retransmission timer is the method used to detect packet loss. Consider a request or reply message of just 1000 bytes, sent in a single packet; if this packet is dropped, there will of course be no acknowledgement; the receiver has no idea the packet was sent. If the packet is dropped early in the life of a TCP connection - perhaps one of the SYN packets during the TCP 3-way handshake, or an initial GET request or a 304 Not Modified response - the dropped packet will be retransmitted only after 3 seconds have elapsed.
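A rough back-of-the-envelope sketch shows why a single timeout can dominate a chatty operation. The model below is an assumption made for illustration: it counts one network round trip per application turn and one full timeout per dropped single-packet message, and ignores server processing and transmission time. The turn count and round-trip time are hypothetical.

```python
def operation_time_ms(app_turns, rtt_ms, rto_drops=0, rto_ms=3000):
    """Approximate network time for a chatty operation.

    Assumes one round trip per application turn, plus one full
    retransmission timeout for each dropped single-packet message.
    """
    return app_turns * rtt_ms + rto_drops * rto_ms


baseline = operation_time_ms(app_turns=20, rtt_ms=50)
with_loss = operation_time_ms(app_turns=20, rtt_ms=50, rto_drops=1)
print(baseline, with_loss)  # 1000 4000
```

Under these assumptions, one dropped packet out of roughly 40 turns a 1-second operation into a 4-second operation.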

Triple Duplicate ACK
Within larger flows, a dropped packet may be detected before the retransmission timer expires if the sender receives three duplicate ACKs; this is generally more efficient (faster) than waiting for the retransmission timer to expire. As the receiving node receives packets that are out of sequence (i.e., after the missing packet data should have been seen), it sends duplicate ACKs, each repeating the acknowledgement number that references the expected (missing) packet data. When the sending node receives the third duplicate ACK, it assumes the packet was in fact lost (not just delayed) and retransmits it. The sender also treats this event as a sign of network congestion, reducing its congestion window by 50% to allow the congestion to subside. From that new value, the CWD grows again using the relatively conservative congestion avoidance ramp rather than the exponential growth of slow-start.
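The duplicate ACK mechanism can be sketched in a few lines of Python. This is a simplified model, not a TCP implementation: packets are numbered 1, 2, 3, ... instead of using byte sequence numbers, the receiver ACKs every arriving packet, and both function names are hypothetical.

```python
def receiver_acks(arrivals):
    """Return the cumulative ACK sent for each arriving packet.

    The ACK value is the number of the next packet the receiver expects,
    so a gap in the sequence produces repeated (duplicate) ACKs.
    """
    received = set()
    next_expected = 1
    acks = []
    for packet in arrivals:
        received.add(packet)
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def detect_fast_retransmit(acks, threshold=3):
    """Return the packet to retransmit after `threshold` duplicate ACKs, else None."""
    last_ack = None
    duplicates = 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:
                return ack
        else:
            last_ack = ack
            duplicates = 0
    return None


# Packet 3 is dropped; packets 4, 5 and 6 arrive out of sequence.
acks = receiver_acks([1, 2, 4, 5, 6])
print(acks)                          # [2, 3, 3, 3, 3]
print(detect_fast_retransmit(acks))  # 3

# Only two packets follow the loss: not enough duplicate ACKs,
# so the sender must fall back on the retransmission timer.
print(detect_fast_retransmit(receiver_acks([1, 2, 4, 5])))  # None
```

The second case shows why small flows rarely benefit from this mechanism: too few packets follow the dropped one to generate three duplicate ACKs.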

As an example, consider a server sending a large file to a client; the sending node is ramping up through slow-start. As the CWD reaches 24 packets, earlier packet loss is detected via a triple duplicate ACK; the lost data is retransmitted, and the CWD is reduced to 12. The sender resumes increasing the CWD from this point in congestion avoidance mode.
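The example can be traced per round trip with a small Python sketch. It assumes an initial CWD of 3 packets (chosen so that doubling lands exactly on 24), doubling per round trip during slow-start, and growth of one packet per round trip in congestion avoidance; the function name and these parameters are illustrative assumptions, and the model handles only a single loss event.

```python
def cwd_trace(round_trips, loss_at_cwd=24, initial_cwd=3):
    """Return the CWD (in packets) at the start of each round trip.

    Models exponential slow-start, one triple-duplicate-ACK loss event when
    the CWD reaches loss_at_cwd, then linear congestion avoidance.
    """
    cwd = initial_cwd
    in_slow_start = True
    loss_handled = False
    trace = []
    for _ in range(round_trips):
        trace.append(cwd)
        if not loss_handled and cwd >= loss_at_cwd:
            cwd //= 2              # loss detected: cut the window by 50%
            in_slow_start = False  # continue in congestion avoidance
            loss_handled = True
        elif in_slow_start:
            cwd *= 2               # slow-start: double each round trip
        else:
            cwd += 1               # congestion avoidance: +1 per round trip
    return trace


print(cwd_trace(8))  # [3, 6, 12, 24, 12, 13, 14, 15]
```

Three round trips were enough to grow from 3 to 24 packets in slow-start; after the loss, climbing from 12 back to 24 at one packet per round trip would take 12 round trips.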

While arguments abound about the inefficiency of existing congestion avoidance approaches, especially on high-speed networks, you can expect to see this behavior in today's networks.

Transaction Trace Illustration
Identifying retransmission timeouts using merged trace files is generally quite straightforward; we have proof the packet has been lost (because we see it on the sending side and not on the receiving side), and we know the delay between the dropped and retransmitted packets at the sending node. The Delta column in the Error Table indicates the retransmission delay.

Error Table entry showing a 3-second retransmission delay caused by a retransmission timeout (RTO)
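The logic behind this kind of Error Table entry can be sketched in Python. The record format here is hypothetical: each capture is reduced to a list of (timestamp in seconds, sequence number) tuples, which is not the format of any particular analysis tool. The delay is measured entirely at the sending node, so the two capture clocks do not need to be synchronized.

```python
from collections import defaultdict


def find_retransmissions(sender_packets, receiver_packets):
    """Report segments that the sender transmitted more than once.

    Each input is a list of (timestamp_s, seq) tuples from one capture point.
    'lost' is True when the receiver saw fewer copies than the sender sent,
    which is the proof of loss that a merged trace provides.
    """
    send_times = defaultdict(list)
    for ts, seq in sender_packets:
        send_times[seq].append(ts)

    receive_counts = defaultdict(int)
    for _, seq in receiver_packets:
        receive_counts[seq] += 1

    findings = []
    for seq, times in sorted(send_times.items()):
        if len(times) < 2:
            continue  # sent once: not a retransmission
        times.sort()
        findings.append({
            "seq": seq,
            "delta_s": round(times[1] - times[0], 3),  # delay at the sender
            "lost": receive_counts[seq] < len(times),
        })
    return findings


# Hypothetical capture data: the segment starting at 1001 is dropped in transit.
sender = [(0.000, 1), (0.010, 1001), (3.010, 1001), (3.060, 2001)]
receiver = [(0.025, 1), (3.035, 1001), (3.085, 2001)]
print(find_retransmissions(sender, receiver))
# [{'seq': 1001, 'delta_s': 3.0, 'lost': True}]
```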

For larger flows, you can illustrate the effect of dropped packets on the sender's Congestion Window by using the Time Plot view. For Series 1, graph the sender's Frames in Transit; this is essentially the CWD. For Series 2, graph the Cumulative Error Count in both directions. As errors (retransmitted packets or out-of-sequence packets) occur, the CWD will be reduced by about 50%.

Time Plot view showing the impact of packet loss (blue plot) on the Congestion Window (brown plot)

About Gary Kaiser
Gary Kaiser is a Subject Matter Expert in Network Performance Analytics at Dynatrace, responsible for DC RUM’s technical marketing programs. He is a co-inventor of multiple performance analysis features, and continues to champion the value of network performance analytics. He is the author of Network Application Performance Analysis (WalrusInk, 2014).
