Reading from replicas automatically when a node is down

I tested this issue. When a node goes down, getFromReplica occasionally returns results within the timeout (I set it to 50ms), but throughput drops from 100k to 10k ops and the situation gets worse as time goes by.
The queue of requests to the failed node keeps growing, and retrying those failed requests delays new requests until the server hangs.
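A rough back-of-envelope estimate helps show why the backlog hurts. It assumes keys are spread evenly over the 12 nodes and that the offered load stays near the normal 100k ops/s; both are assumptions, not measurements from this test. By Little's law, the number of requests waiting on the failed node is their arrival rate times how long each one waits before timing out:

$$
N_{\text{waiting}} \approx \lambda_{\text{down}} \cdot T_{\text{timeout}} = \frac{100{,}000\ \text{ops/s}}{12} \times 0.05\ \text{s} \approx 417
$$

In this simple model, if each request is retried against the same dead node up to $k$ times in total, the arrival rate to that node and therefore the waiting count grow roughly $k$-fold. All of those requests hold client resources that new requests to healthy nodes also need.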

So I asked @daschl how to skip requests to a failed node without failing it over.

If the client knows which node holds the requested data and what that node's status is, it can decide whether to send the request or skip it.
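A minimal, self-contained sketch of that decision in Java. Everything in it is hypothetical and for illustration only: `ReplicaAwareRouter`, `route`, `markDown`/`markUp`, the `Target` enum and the round-robin partition layout are not Couchbase SDK API. The key-to-partition hash follows the CRC32 scheme Couchbase clients are understood to use, and 1024 partitions is assumed. A real client would take both the partition layout and node health from the cluster map instead of hard-coding them.

```java
import java.nio.charset.StandardCharsets;
import java.util.EnumMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.zip.CRC32;

public class ReplicaAwareRouter {

    // Where a read for a given key should be sent (hypothetical enum).
    enum Target { MASTER, REPLICA, SKIP }

    // Assumption: 1024 partitions (vBuckets).
    static final int NUM_PARTITIONS = 1024;

    private final int[] masterOf;  // partition -> node holding the active copy
    private final int[] replicaOf; // partition -> node holding the first replica
    // Nodes currently believed to be down; thread-safe for concurrent readers/writers.
    private final Set<Integer> downNodes = ConcurrentHashMap.newKeySet();

    ReplicaAwareRouter(int[] masterOf, int[] replicaOf) {
        this.masterOf = masterOf.clone();
        this.replicaOf = replicaOf.clone();
    }

    // Assumed key-to-partition hash: CRC32 of the key, upper 15 bits, masked to the partition count.
    static int partitionFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        long hash = (crc.getValue() >> 16) & 0x7fff;
        return (int) (hash & (NUM_PARTITIONS - 1));
    }

    // Called by a (hypothetical) health monitor when a node stops responding.
    void markDown(int node) { downNodes.add(node); }

    // Called when the node is reachable again.
    void markUp(int node) { downNodes.remove(node); }

    // Decide before sending, so no request is queued behind a node known to be down.
    Target route(String key) {
        int p = partitionFor(key);
        if (!downNodes.contains(masterOf[p])) {
            return Target.MASTER;  // normal read from the active copy
        }
        if (!downNodes.contains(replicaOf[p])) {
            return Target.REPLICA; // a replica read, e.g. via getFromReplica
        }
        return Target.SKIP;        // fail fast instead of waiting for a timeout
    }

    public static void main(String[] args) {
        int nodes = 12;
        int[] master = new int[NUM_PARTITIONS];
        int[] replica = new int[NUM_PARTITIONS];
        // Hypothetical round-robin layout; a real client gets this from the cluster map.
        for (int p = 0; p < NUM_PARTITIONS; p++) {
            master[p] = p % nodes;
            replica[p] = (p + 1) % nodes;
        }
        ReplicaAwareRouter router = new ReplicaAwareRouter(master, replica);
        router.markDown(3);

        // Count where 10,000 sample keys would be routed with node 3 down.
        Map<Target, Integer> counts = new EnumMap<>(Target.class);
        for (int i = 0; i < 10_000; i++) {
            counts.merge(router.route("user::" + i), 1, Integer::sum);
        }
        System.out.println(counts);
    }
}
```

The check happens before a request is queued: reads whose master is known to be down go straight to a replica, or fail fast if no replica is available, so they never pile up behind the dead node and delay requests to healthy nodes.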

[Test Environment]
Couchbase Server v3.0.1 : 12 nodes

  • Data : 70 million documents (document size : 12 ~ 30 KB)

Load Test Server : 15 nodes

  • 4 threads per node

Web Server : 6 nodes (L7)

  • Couchbase Client v2.0.2 (Java)
  • master op timeout : 50ms
  • replica op timeout : 50ms

[Test Results]

  1. Normal Situation
  • throughput : 100 ~ 125k ops
  • response time : 4ms :smile:
  • cb cpu : 97% idle
  • webserver cpu : 85% idle
  2. Node failure (node down)
  • throughput : under 10k ops
  • response time : 25 ~ 100ms (results fluctuate heavily)
  • cb cpu : 97% idle
  • webserver cpu : 85% idle, massive numbers of timeout exceptions