My setup is a 2-node cluster with a bucket that has 1 replica.
Everything is working.
When I suddenly shut down one node,
Bucket.Get<T>(sKey) fails.
I’m calling:
Bucket.GetFromReplica<T>(sKey);
This also fails with “Failed to acquire a connection after 5 tries.”
What should I do to read from the replica?
After I restore the node, the app cannot continue unless I restart it (which is really bad). The errors are: “The operation has timed out” and “Failed to acquire a connection after 5 tries.”
What should I change so that the app recovers without a restart?
Assuming you have re-added the node, these errors should be resolved once the client syncs with the cluster. Are you doing a remove node/failover from the management console, or literally stopping the server hosting CB?
A bug in the existing replica reads was found while this ticket was being implemented: [NCBC-840][1]. It is fixed now and the fix will be in 2.1.0.
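For reference, here is a minimal sketch of falling back to a replica read when the active read fails. It only uses `Get<T>` and `GetFromReplica<T>` from the question; the helper name `GetWithReplicaFallback` is hypothetical and not part of the SDK, and the sketch assumes a client version where replica reads behave as intended. Treat it as an illustration rather than the recommended pattern.

```csharp
using Couchbase;
using Couchbase.Core;
using Couchbase.IO;

public static class ReplicaReadSketch
{
    // Hypothetical helper (not an SDK method): try the active copy first,
    // then fall back to a replica if the active read did not succeed.
    public static IOperationResult<T> GetWithReplicaFallback<T>(IBucket bucket, string key)
    {
        var result = bucket.Get<T>(key);
        if (result.Success)
        {
            return result;
        }

        // KeyNotFound means the active node answered and the document
        // does not exist, so a replica read would not help.
        if (result.Status == ResponseStatus.KeyNotFound)
        {
            return result;
        }

        // Any other failure (timeout, node down, no connection):
        // ask a replica for its copy, which may be slightly stale.
        return bucket.GetFromReplica<T>(key);
    }
}
```

The caller still needs to check `Success` on the returned result, since the replica read can fail too (as in the question) while the client has not yet picked up the new cluster state.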
Shut down is shut down, simulating a lost node.
I thought that having 1 replica should immediately mitigate the unfortunate loss of a server.
Restoring is making it run again, simulating a recovered node.
I thought that if the node returns, then it should seamlessly rejoin the cluster, which it does. However, the app is not able to restore communication.
I have been seeing similar behavior. I had to resize my data partitions, and when I fail over a node, it seems that the SDK is not always reading the replica data. I wish I knew how to test this. I am using views to get the keys for the documents I need, so maybe that is where the failures are.