Periodic CAS operation performance issue


#1

We’re using a high volume of read operations and getting consistently great performance. On our back-end servers that populate data, we’re using CAS operations and seeing strange peaks followed by degradation in write rate. Our throughput (number of CAS operations / sec) looks like this:

Our throughput peaks and falls in a coordinated way, which suggests that something central is limiting us. We’re looking into datacenter / network issues, but if that were the cause I’d expect to see problems in read-only performance as well. Is there something in the way CAS works that would cause performance to degrade periodically?


#2

Is all your data in memory (active resident ratio = 100%)?


#3

Assuming those numbers are op/s, I wouldn’t expect a few hundred CAS operations per second to put much load on the cluster.

  • Have you checked all the obvious things - what’s the utilisation (CPU, disk, memory, network) of your servers?
  • How large are your documents?
  • How many documents?
  • Are you changing a small set of the total number of documents?

At a high level, there isn’t anything particularly special about CAS that should cause these problems: the request comes into the cluster and, if the CAS value matches what the server has, the operation succeeds; otherwise it fails.
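To make the compare-and-swap check concrete, here is a minimal in-memory sketch. It is not the Couchbase SDK: `ToyStore`, `CasMismatch` and `update_with_retry` are hypothetical names invented for illustration, and the real server generates CAS values differently. It only shows the match-or-fail logic and the usual read-modify-retry loop on the client side.

```python
import itertools


class CasMismatch(Exception):
    """Raised when the supplied CAS value no longer matches the stored one."""


class ToyStore:
    """Hypothetical in-memory stand-in for a server that supports CAS."""

    def __init__(self):
        self._data = {}                     # key -> (value, cas)
        self._counter = itertools.count(1)  # source of fresh CAS values

    def get(self, key):
        """Return the (value, cas) pair currently stored for key."""
        return self._data[key]

    def upsert(self, key, value):
        """Unconditional write; always assigns a new CAS value."""
        cas = next(self._counter)
        self._data[key] = (value, cas)
        return cas

    def replace(self, key, value, cas):
        """Write only if the caller's CAS matches the stored CAS."""
        _, current_cas = self._data[key]
        if cas != current_cas:
            raise CasMismatch(key)
        new_cas = next(self._counter)
        self._data[key] = (value, new_cas)
        return new_cas


def update_with_retry(store, key, mutate, max_attempts=5):
    """Read, modify and CAS-write; re-read and retry on a mismatch.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        value, cas = store.get(key)
        try:
            store.replace(key, mutate(value), cas)
            return attempt
        except CasMismatch:
            continue  # someone else wrote first; fetch the new value
    raise RuntimeError(f"gave up on {key!r} after {max_attempts} attempts")
```

Note that every failed attempt in a loop like this is still a round trip to the cluster, so if many writers contend for the same few documents, the rate of successful CAS writes can fall even though the server is behaving normally.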

Given the distributed, shared-nothing nature of a Couchbase cluster, generally speaking any correlation you see between servers is due to application server behaviour, or something else central in the system.

Remember - your op/s are ultimately determined by both how often the client makes a request and how quickly the cluster can respond. In other words, if you request fewer operations you’ll get fewer responses.
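As a rough illustration of that relationship, the sketch below uses a simple closed-loop model: each worker sends one request, waits for the response, optionally pauses, then sends the next. The model and the names `closed_loop_ops_per_sec`, `workers`, `latency_s` and `think_time_s` are assumptions made for this example, not measurements from your system.

```python
def closed_loop_ops_per_sec(workers, latency_s, think_time_s=0.0):
    """Estimate throughput for synchronous clients.

    Each worker completes one operation every (latency + think time)
    seconds, so total throughput is workers / (latency + think time).
    """
    cycle_s = latency_s + think_time_s
    if cycle_s <= 0:
        raise ValueError("latency plus think time must be positive")
    return workers / cycle_s


if __name__ == "__main__":
    # Same cluster latency, but the application pauses between writes:
    print(closed_loop_ops_per_sec(workers=4, latency_s=0.01))
    print(closed_loop_ops_per_sec(workers=4, latency_s=0.01, think_time_s=0.04))
```

Under this model, a periodic dip in op/s can come from the application side (fewer active workers, longer pauses between requests, a slow upstream data source) just as easily as from higher cluster latency, so it is worth graphing the client's request rate alongside the response times.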