In my latest adventure I had to untangle the interaction between a pair of Cisco ACE 4710s and Akamai’s Content Distribution Network (CDN) including SiteShield, Mointpoint, and SiteSpect. It’s truly amazing how complex and almost convoluted a CDN can make any website. And when it fails you can guess who’s going to get the blame. Over the past few weeks I’ve been looking at a very interesting problem where an Internet-facing VIP was experiencing a very unbalanced distribution across the real servers in the serverfarm. I wrote a few quick and dirty Bash shell scripts to run some repeated load tests utilizing curl and sure enough I was able to confirm that there was something amiss between the CDN and the LB. If I tested against the origin VIP I had near perfect round-robin load-balancing across the real servers in the VIP. If I tested against the CDN I would get very uneven load-balancing results.
When a web browser opens a connection to a web server it will generally send multiple requests across a single TCP connection similar to the figure below. Occasionally some browsers will even utilize HTTP pipelining if both the server and browser support that feature, sending multiple requests without waiting for the corresponding HTTP responses.
The majority of load balancers, including the Cisco ACE 4710 and the A10 AX ADC/Thunder, will look at the first request in the TCP connection and apply the load-balancing metric and forward the traffic to a specific real server in the VIP. In order to speed the processing of future requests the load balancer will forward all traffic in that connection to the same real server in the VIP. This generally isn’t a problem if there’s only a single user associated with a TCP connection.
Akamai will attempt to optimize the number of TCP connections from their edge servers to your origin web servers by sending multiple requests from different users all over the same TCP connection. In the example below there are requests from three different users but it’s been my experience that you could see requests for dozens or even hundreds of users across the same TCP connection.
And here lies the problem: the load balancer will only evaluate the first request in the TCP connection. All subsequent requests will be sent to the same real server, leaving some servers overutilized and others underutilized.
Thankfully there are configuration options in the majority of load balancers to work around this problem and instruct the load balancer to evaluate all requests in the TCP connection independently.
A10 AX ADC/Thunder
Cisco ACE 4710
parameter-map type http HTTP_PARAMETER_MAP
  persistence-rebalance strict
With the configuration change made, every request in the TCP connection is now evaluated and load-balanced independently, resulting in a more even distribution across the real servers in the farm.
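To make the difference concrete, here is a minimal Python sketch that models the two behaviours. It is only a toy model, not how the ACE or A10 work internally: the server names, the connection sizes, and the plain round-robin predictor are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import cycle

SERVERS = ["web1", "web2", "web3", "web4"]  # hypothetical real servers


def balance(connections, per_request):
    """Return hits per server for a list of TCP connections.

    connections: list of ints, each the number of HTTP requests carried
    by one TCP connection.
    per_request=False models deciding once per connection;
    per_request=True models re-evaluating every request.
    """
    rr = cycle(SERVERS)  # simple round-robin predictor
    hits = Counter()
    for request_count in connections:
        if per_request:
            for _ in range(request_count):
                hits[next(rr)] += 1
        else:
            # The whole connection sticks to whichever server came up first.
            hits[next(rr)] += request_count
    return hits


# Four multiplexed CDN connections of very different sizes (400 requests total).
connections = [250, 10, 100, 40]

per_connection = balance(connections, per_request=False)
per_request = balance(connections, per_request=True)
print("per connection:", dict(per_connection))
print("per request:   ", dict(per_request))
```

With these made-up numbers the per-connection mode lands 250 requests on one server and 10 on another, while the per-request mode gives each of the four servers 100.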
In this scenario I’m using HTTP cookies to provide session persistence and ‘stickiness’ for the user sessions. If your application is stateless then you don’t really need to worry that a user lands on the same real server for each and every request.
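The way cookie persistence combines with per-request evaluation can be sketched roughly as follows. This is an illustrative model only, assuming a hypothetical cookie named SERVERID whose value is the server name and a plain round-robin fallback. Real load balancers encode and insert their persistence cookies in their own ways.

```python
from itertools import cycle

SERVERS = ["web1", "web2", "web3"]  # hypothetical real servers
_round_robin = cycle(SERVERS)


def pick_server(cookies):
    """Choose a server for ONE request, evaluated on its own.

    cookies: dict of cookie name -> value sent with the request.
    "SERVERID" is a hypothetical persistence cookie name.
    """
    sticky = cookies.get("SERVERID")
    if sticky in SERVERS:
        return sticky  # returning user: honour the cookie
    return next(_round_robin)  # new user: normal round-robin


# Three users whose requests share one CDN-to-origin TCP connection.
requests = [
    ("alice", {}),                    # new user, no cookie yet
    ("bob", {"SERVERID": "web3"}),    # returning user
    ("carol", {}),                    # new user, no cookie yet
]
assignments = {user: pick_server(cookies) for user, cookies in requests}
print(assignments)
```

In this model the returning user stays on the server named in the cookie, and the two new users are balanced separately even though all three requests arrive over the same connection.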
Image Credit: topfer
Very useful post.
How did you go about using the bash script to test the issue?
Could be a good blog post of its own.
Michael McNamara says
It was really ugly but it served its purpose… I hastily threw together a bash shell script that called curl and tailed the HTML output, which contained the real server name in an HTML comment. The bash script ran through that curl command 100-500 times. I then fed the output from those commands into a second script that just counted the occurrences (grep | wc -l) of the cookie values and server names (output from curl). That gave me a rough idea of how many times I hit each server and whether the number of cookie values matched the number of times that server name was included in the source HTML.
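For anyone who wants to try something similar, here is a rough sketch of the counting step in Python rather than Bash. It is not the original script. The `served-by` comment format is an assumption (the exact marker in the real pages is not shown here), and it only tallies server names, not cookie values.

```python
import re
from collections import Counter
from urllib.request import urlopen

# Hypothetical marker: assumes each page carries the real server name
# in an HTML comment such as <!-- served-by: web1 -->.
SERVER_COMMENT = re.compile(r"<!--\s*served-by:\s*(\S+)\s*-->")


def fetch(url):
    """Fetch one page body, roughly what a single curl call would do."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def tally(pages):
    """Count how often each server name appears across page bodies."""
    hits = Counter()
    for body in pages:
        match = SERVER_COMMENT.search(body)
        hits[match.group(1) if match else "unknown"] += 1
    return hits


# Canned bodies stand in for real responses so the sketch runs offline.
sample_pages = [
    "<html><!-- served-by: web1 --></html>",
    "<html><!-- served-by: web2 --></html>",
    "<html><!-- served-by: web1 --></html>",
    "<html>no marker here</html>",
]
counts = tally(sample_pages)
print(counts.most_common())
```

To run it against a live site, the canned list could be replaced with something like `[fetch(url) for _ in range(200)]`, once pointed at the origin VIP and once at the CDN hostname, and the two tallies compared.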
Texas Brit says
Thank you so much for posting this incredibly useful article. I am literally seeing the same thing and couldn’t figure it out for the life of me.
Forgive me if this is a really stupid question. This is my understanding of your article. You are using cookies for persistence. When a TCP connection is made with 3 users that have NOT been to your site before and therefore don’t have a cookie (to specify a web server), every GET request will be assigned the same web server and thus get the same cookie (web server specified in the cookie). Is that correct?
If that TCP connection has 2 users that have not been to your website before and 1 user that has (and thus has a cookie already), what happens then? The user with the cookie gets the same web server as they had before, and the 2 new users both get assigned to a web server independently (and both get the same one)?
Michael McNamara says
Yes, I’m using cookies for session persistence. HTTP transactions will all be evaluated and load-balanced individually.