Reading from replicas automatically when node is down

In the old client, with a persistent bucket that had replicas enabled, it was possible for reads to be sent automatically to a replica node when the master node went down, until the master came back up. I have noticed with the new client (both through experimentation and the docs) that calls to the regular get method do not behave this way. It seems that if I want this behavior, I would have to either:

  • Use getFromReplica with the type set to FIRST and basically never read from the master (but I’m not sure what happens if the first replica goes down instead)
  • Use getFromReplica with ALL and take the first item in the returned list. This seems more fail-safe: as long as both nodes aren’t down at once, you will still get data. But always reading from two nodes seems rather inefficient.
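The second option — fire the read at every copy and take whichever answers first — can be sketched in plain Java with `CompletableFuture.anyOf`. The names below are illustrative, not SDK calls:

```java
import java.util.concurrent.CompletableFuture;

public class FirstOfReplicas {
    // Illustrative helper, not an SDK API: issue the read against every
    // replica and complete with whichever copy arrives first.
    static CompletableFuture<Object> firstOf(CompletableFuture<?>... reads) {
        return CompletableFuture.anyOf(reads);
    }

    public static void main(String[] args) {
        // One replica is down and never answers; the other has the document.
        CompletableFuture<String> downed = new CompletableFuture<>();
        CompletableFuture<String> healthy =
                CompletableFuture.completedFuture("doc-from-replica");
        System.out.println(firstOf(downed, healthy).join()); // doc-from-replica
    }
}
```

As the post notes, this buys availability at the cost of issuing N reads for every get.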

Am I missing something here? Is there a better way to gracefully fail reads over to a replica when the master is not available?

Nevermind. I see how to do this in the Mastering Observables section of the documentation. This should suffice.
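For reference, the approach in that section puts a short timeout on the master read and falls back to the replica read on error. The plain-Java sketch below simulates that fallback without the SDK; the helper and supplier names are hypothetical (the real 2.x client expresses the same idea with RxJava operators on the async bucket API):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class ReplicaFallback {
    // Hypothetical helper, not an SDK call: try the master read with a
    // short timeout; on timeout or failure, fall back to the replica read.
    static String getWithFallback(Supplier<String> master,
                                  Supplier<String> replica,
                                  long timeoutMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> read = pool.submit(master::get);
            return read.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            // Master unavailable: accept weaker consistency for availability.
            return replica.get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Simulate a downed master node: the read never returns in time.
        Supplier<String> deadMaster = () -> {
            try { Thread.sleep(10_000); } catch (InterruptedException ignored) {}
            return "from-master";
        };
        System.out.println(getWithFallback(deadMaster, () -> "from-replica", 50));
        // prints "from-replica"
    }
}
```

Note the cost discussed later in the thread: the client still waits out the master timeout on every read before the fallback fires.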

Hm, actually those two methods are conceptually different in the old client as well - can you tell me more about the old client’s behavior?

@daschl, I thought that in spymemcached you could set the FailureMode on a bucket, with the options being Cancel, Retry, and Redistribute. If you set the FailureMode to Redistribute, I thought the behavior was to redistribute to a replica when a read from the master failed.

Not sure how current the docs are here, but adding a link for reference:
http://dustin.sallings.org/java-memcached-client/apidocs/net/spy/memcached/FailureMode.html

That is only true for memcached buckets, where the Ketama algorithm allows for it. Since we use fixed partitions, the op is put into a retry queue and dispatched to the proper node once it is available again.

So you were getting the same approach in 1.4 and 2.0. Now we split it up for a good reason: once you read from a replica, consistency is weakened and you trade it for availability. Falling back automatically without you knowing about it would most probably get you into trouble once things start to fail.

Does that clear it up?

Yes, it makes sense. I’m not a huge fan of having to wait until the timeout occurs before the onErrorResumeNext kicks in, but it’s better than not getting the data at all, considering that a downed node is not a normal or frequent situation.

I’m in the planning stages of a fail-fast mode for 2.0… There are some edge cases to consider, but we’ll work through it.


I tested this issue. When a node goes down, getFromReplica can occasionally return results within the timeout (I set it to 50ms), but throughput drops from 100k to 10k ops and the situation gets worse as time goes by.
Requests queued to the failed node keep growing, and retrying those failed-node requests delays new requests until the servers hang.

So I asked @daschl how to skip requests to the failed node without doing a failover.

If the client knows which node holds the requested data and knows that node’s status, it can decide whether to send the request or to skip it.
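That decision can be modeled client-side as a simple per-node circuit breaker: after a few consecutive failures, skip the node (fail fast) instead of queueing more requests. A minimal sketch, with all names hypothetical and the threshold arbitrary:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class NodeBreaker {
    // Hypothetical per-node health tracker: after `threshold` consecutive
    // failures, stop sending requests to the node until a success resets it.
    private final int threshold;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();

    public NodeBreaker(int threshold) { this.threshold = threshold; }

    public boolean allowRequest() { return consecutiveFailures.get() < threshold; }

    public void recordSuccess() { consecutiveFailures.set(0); }

    public void recordFailure() { consecutiveFailures.incrementAndGet(); }

    public static void main(String[] args) {
        NodeBreaker node = new NodeBreaker(3);
        node.recordFailure(); node.recordFailure(); node.recordFailure();
        System.out.println(node.allowRequest()); // false: skip this node
    }
}
```

A production breaker would also re-probe the node after a cooldown so it can recover; this sketch only shows the skip decision that avoids the request pile-up described above.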

[Test Environment]
Couchbase Server v3.0.1 : 12 nodes

  • Data : 70 million documents (12 ~ 30 KB each)

Load Test Server : 15 nodes

  • 4 threads per node

Web Server : 6 nodes (L7)

  • Couchbase Client v2.0.2 (Java)
  • master op timeout : 50ms
  • replica op timeout : 50ms

[Test Results]

  1. Normal Situation
  • throughput : 100 ~ 125k ops
  • response time : 4ms :smile:
  • cb cpu : 97% idle
  • webserver cpu : 85% idle
  2. Node failure (node down)
  • throughput : under 10k ops
  • response time : 25 ~ 100ms (results fluctuate heavily)
  • cb cpu : 97% idle
  • webserver cpu : 85% idle, massive timeout exceptions occurred