Getting slow performance with PersistTo or ReplicateTo

samarthvarshney · March 11, 2015, 9:16am

Hi,

I was trying to use the persistence and replication options while performing a replace operation when I ran into an issue. Without PersistTo and ReplicateTo I am able to perform a replace operation within 15ms. But when using either PersistTo or ReplicateTo, the same replace operation takes between 100 and 150ms.

I have set up a local cluster with 2 nodes, each running Couchbase 3.0.2 Enterprise Edition. I am using the Java SDK 2.1.0. I have a bucket with the default settings and a 1GB RAM quota. Can you please help me figure out why I am seeing this huge gap in performance?

Thanks,
Samarth

daschl · March 11, 2015, 9:32am

Hi,

Well, keep in mind that you have an additional network hop with ReplicateTo, and you add disk latency when using PersistTo. Without those overloads, the operation is acknowledged as soon as the document is stored in the managed memory of the master node.

Which settings are you using and which numbers are you getting exactly?

samarthvarshney · March 11, 2015, 9:59am

I took the average of 10 replace operations, all of them replacing the document with the same content:

Without persistTo and replicateTo -> 2.33ms
With PersistTo.ONE alone -> 121.8ms
With ReplicateTo.ONE alone -> 67ms
With ReplicateTo.ONE and PersistTo.ONE -> 79ms

The variation is large when using either PersistTo.ONE or ReplicateTo.ONE. Sometimes I get the response within 5ms, and other times it takes more than 150ms.
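With a spread this wide, an average over 10 runs says little. A minimal sketch of a timing harness that reports the median and tail alongside the mean might look like the following. The class and method names are hypothetical, and the no-op `Runnable` is a placeholder for the real call (for example `bucket.replace(doc, PersistTo.ONE)`); it is not part of the original measurements.

```java
import java.util.Arrays;

public class LatencyStats {
    // Nearest-rank percentile over a sorted copy; assumes a non-empty array and 0 < p <= 100.
    static double percentile(double[] samples, double p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Arithmetic mean; NaN for an empty array.
    static double mean(double[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    // Runs `op` `runs` times and returns each call's wall-clock latency in milliseconds.
    static double[] time(Runnable op, int runs) {
        double[] out = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            op.run();
            out[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder operation (assumption): swap in the real replace call here.
        Runnable op = () -> { };
        double[] ms = time(op, 1000);
        System.out.printf("mean=%.3f p50=%.3f p99=%.3f max=%.3f%n",
                mean(ms), percentile(ms, 50), percentile(ms, 99), percentile(ms, 100));
    }
}
```

Comparing p50 against p99 shows whether most calls are fast with a few slow outliers, or whether every call is uniformly slow.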

cihangirb · March 11, 2015, 3:13pm

Hi @samarthvarshney, the additional durability guarantees that come from persistTo and replicateTo fight the laws of physics. They can only run at the speed of your storage subsystem or your network. Another important thing to remember about these numbers is that as you add more operations, things don’t necessarily get worse, because we batch operations for both persistTo and replicateTo. So whether you are doing a standalone update or 1K in a burst, you will see the same latency but better throughput.
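To make the latency versus throughput point concrete, here is a rough, idealized model. It assumes every operation in a batch shares one flush (or one replication round trip) of fixed cost and ignores any per-item overhead; the 100ms figure and all names are illustrative assumptions, not Couchbase internals.

```java
public class BatchingModel {
    // Operations per second when `batchSize` writes share a single flush taking `flushMillis`.
    static double opsPerSecond(int batchSize, double flushMillis) {
        return batchSize * 1000.0 / flushMillis;
    }

    public static void main(String[] args) {
        double flushMillis = 100.0; // assumed cost of one flush, for illustration only
        for (int batch : new int[] {1, 10, 1000}) {
            // Per-operation latency stays at flushMillis; only throughput scales with the batch.
            System.out.printf("batch=%d latency=%.0fms throughput=%.0f ops/s%n",
                    batch, flushMillis, opsPerSecond(batch, flushMillis));
        }
    }
}
```

Under this model a single sequential caller never sees the benefit, since it waits for each flush before issuing the next write. The gain only appears when many writes are in flight at once.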

I’d also recommend taking more samples. The PersistTo numbers look like entry-level public cloud storage performance. The ReplicateTo number is very poor. Could you be working on a noisy network? Measurements #2 (PersistTo.ONE alone) and #4 (ReplicateTo.ONE with PersistTo.ONE) also don’t seem to add up. There must be a huge variance in the storage subsystem to cause 121 vs 79 ms latency for persisting.

I am not sure what the new defaults on the Java SDK are, but it may make sense to take a look at the observe poll interval. Michael (@daschl) may be able to give us recommendations if those are set too high.
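As a rough illustration of why the poll interval matters: if the client checks durability by polling at a fixed interval, the latency it reports is rounded up to the next poll, however quickly the write actually became durable. The sketch below is a simplified model under that assumption (fixed interval, first poll one interval after the write). It does not describe the SDK's actual polling strategy, and the names are hypothetical.

```java
public class ObservePollModel {
    // Time at which a fixed-interval poller first sees the write as durable.
    // Assumes the first poll happens one interval after the write is issued.
    static long observedLatencyMs(long durableAtMs, long intervalMs) {
        long polls = Math.max(1, (durableAtMs + intervalMs - 1) / intervalMs); // ceiling division
        return polls * intervalMs;
    }

    public static void main(String[] args) {
        long durableAtMs = 12; // assumed real time until the write is durable
        for (long interval : new long[] {1, 10, 100}) {
            System.out.printf("interval=%dms observed=%dms%n",
                    interval, observedLatencyMs(durableAtMs, interval));
        }
    }
}
```

In this model a coarse interval dominates the measured latency even when the storage and network are fast, which is why the setting is worth checking.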

thanks
-cihan

ingenthr · March 11, 2015, 3:38pm

Actually @cihangirb, this looks like it could be related to MB-13359.