I am evaluating Couchbase 4.5 on CentOS and, earlier, on Ubuntu. On
both operating systems I am observing an odd N1QL query latency distribution
using CB 4.5 Enterprise and the latest Python SDK, which I understand wraps the
C SDK (libcouchbase) underneath. For the following graph I ran a cluster of six CentOS 6.4
servers on CB 4.5 Enterprise, with a client on a Windows 7 PC running the CB Python client.
The response sample distribution, with latency measured at the client, looks like this:
The horizontal axis is latency in milliseconds. The vertical
axis is the number of samples observed in each bin spanning about 1/100 of the full range of
latencies (here about 47 to 435 ms). Between the blips in the image, there are mostly
zero observations. The total number of samples is 10,000. Ping time between the
client and server is about 42 ms. Query response sizes range from
about 100 bytes to 100,000 bytes.
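For clarity, this is roughly how the samples were binned for the graph above (an illustrative sketch only; `latencies_ms` is a placeholder for the 10,000 measured values, which I am not reproducing here):

```python
# Bin the measured per-query latencies into ~100 equal-width buckets.
# `latencies_ms` is a placeholder for the real 10,000 samples.
latencies_ms = [47.0, 89.2, 131.5, 173.9, 434.8]  # placeholder values

lo, hi = min(latencies_ms), max(latencies_ms)
nbins = 100
width = (hi - lo) / nbins

counts = [0] * nbins
for x in latencies_ms:
    i = min(int((x - lo) / width), nbins - 1)  # clamp the max sample into the last bin
    counts[i] += 1
```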
Notice the distinct separation of completion times, apparently
correlated with full round-trip times between client and server. This would
be expected if, for example, there were a single large buffer that fills at the client and
is then acknowledged back to the server before any more data is sent
from the server. Nagle TCP issues alone would likely not produce so many
blips, but I am not ruling Nagle out entirely, perhaps in conjunction with
something else. However, I found documentation saying that Nagle is disabled by
default in CB 4.5, and I did not enable it.
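To make the buffer hypothesis concrete: if the sender transmits only a window of W bytes and then waits for an acknowledgement before sending more, total latency is roughly a fixed base cost plus one RTT per window, which would produce blips spaced about one ping apart. A toy model of this (the 16 KiB window and 5 ms base are my assumptions, not anything measured from the SDK):

```python
import math

def predicted_latency_ms(response_bytes, window_bytes, rtt_ms, base_ms):
    """Stop-and-wait model: one full round trip per window-sized chunk,
    plus a fixed base cost (server query time, first-byte latency)."""
    round_trips = math.ceil(response_bytes / window_bytes)
    return base_ms + round_trips * rtt_ms

# With a 42 ms RTT and a hypothetical 16 KiB window, responses of
# increasing size land in distinct blips spaced 42 ms apart:
for size in (100, 20_000, 40_000, 100_000):
    print(size, predicted_latency_ms(size, 16_384, 42, 5))
```

With these assumed numbers the model lands the smallest responses near 47 ms and spaces larger ones in 42 ms steps, which is qualitatively what the graph shows.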
For clients with lower ping times to the server cluster,
the same effect shows up as a long, thick tail with blurring between the blips.
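To be doubly sure Nagle is off on the client side as well, the relevant socket option is TCP_NODELAY. The SDK manages its own sockets, so this snippet is only to show what "Nagle disabled" means at the socket level:

```python
import socket

# TCP_NODELAY = 1 disables Nagle's algorithm on a TCP socket.
# This is a standalone demonstration, not the SDK's actual socket.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
assert s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
s.close()
```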
Do you have a suggestion for eliminating this?
Should I size some buffers differently, or is there something
else I can do to remove these response delays?
If possible, please be specific about exactly what I should try.