Also, we are seeing a large number of errors on our Java clients, which do not reconnect to nodes after those nodes have been rebalanced. The symptoms are many warnings and errors in the logs, and no traffic for a particular bucket on a particular node, even though the other nodes are taking traffic for that bucket. Our .NET clients are not having similar problems.
{"timestamp":"2016-03-03T19:48:55.606Z","level":"DEBUG","thread":"Memcached IO over {MemcachedConnection to 10.52.40.121/10.52.40.121:11210 10.52.40.122/10.52.40.122:11210 10.52.40.123/10.52.40.123:11210}","logger":"com.couchbase.client.CouchbaseConnection","message":"Handling IO for: sun.nio.ch.SelectionKeyImpl@48756c48 (r=true, w=false, c=false, op={QA sa=10.52.40.123/10.52.40.123:11210, #Rops=1, #Wops=0, #iq=0, topRop=Cmd: 2 Opaque: 1190090 Key: S2::XYZXYZ-rabbitmq-consumer::string::message::lock::S2::55558684-e267-4326-bf47-597c1b2aa8b0 Cas: 0 Exp: 605 Flags: 0 Data Length: 3, topWop=null, toWrite=0, interested=1})","context":"default"}
{"timestamp":"2016-03-03T19:48:55.606Z","level":"DEBUG","thread":"Memcached IO over {MemcachedConnection to 10.52.40.121/10.52.40.121:11210 10.52.40.122/10.52.40.122:11210 10.52.40.123/10.52.40.123:11210}","logger":"com.couchbase.client.CouchbaseConnection","message":"Read 7885 bytes","context":"default"}
{"timestamp":"2016-03-03T19:48:55.606Z","level":"DEBUG","thread":"Memcached IO over {MemcachedConnection to 10.52.40.121/10.52.40.121:11210 10.52.40.122/10.52.40.122:11210 10.52.40.123/10.52.40.123:11210}","logger":"net.spy.memcached.protocol.binary.StoreOperationImpl","message":"Reading 24 header bytes","context":"default"}
{"timestamp":"2016-03-03T19:48:55.606Z","level":"DEBUG","thread":"Memcached IO over {MemcachedConnection to 10.52.40.121/10.52.40.121:11210 10.52.40.122/10.52.40.122:11210 10.52.40.123/10.52.40.123:11210}","logger":"net.spy.memcached.protocol.binary.StoreOperationImpl","message":"Reading 7861 payload bytes","context":"default"}
{"timestamp":"2016-03-03T19:48:55.606Z","level":"DEBUG","thread":"Memcached IO over {MemcachedConnection to 10.52.40.121/10.52.40.121:11210 10.52.40.122/10.52.40.122:11210 10.52.40.123/10.52.40.123:11210}","logger":"net.spy.memcached.protocol.binary.StoreOperationImpl","message":"Transitioned state from READING to RETRY","context":"default"}
{"timestamp":"2016-03-03T19:48:55.609Z","level":"DEBUG","thread":"Memcached IO over {MemcachedConnection to 10.52.40.121/10.52.40.121:11210 10.52.40.122/10.52.40.122:11210 10.52.40.123/10.52.40.123:11210}","logger":"com.couchbase.client.vbucket.config.CouchbaseConfig","message":"Nodes with active VBuckets: [10.52.40.123, 10.52.40.121, 10.52.40.122]","context":"default"}
{"timestamp":"2016-03-03T19:48:55.609Z","level":"DEBUG","thread":"Memcached IO over {MemcachedConnection to 10.52.40.121/10.52.40.121:11210 10.52.40.122/10.52.40.122:11210 10.52.40.123/10.52.40.123:11210}","logger":"com.couchbase.client.CouchbaseConnection","message":"Reschedule read op due to NOT_MY_VBUCKET error: Cmd: 2 Opaque: 1190090 Key: S2::XYZXYZ-rabbitmq-consumer::string::message::lock::S2::55558684-e267-4326-bf47-597c1b2aa8b0 Cas: 0 Exp: 605 Flags: 0 Data Length: 3 ","context":"default"}
{"timestamp":"2016-03-03T19:48:55.612Z","level":"DEBUG","thread":"Memcached IO over {MemcachedConnection to 10.52.40.121/10.52.40.121:11210 10.52.40.122/10.52.40.122:11210 10.52.40.123/10.52.40.123:11210}","logger":"net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl","message":"Setting interested opts to 0","context":"default"}
{"timestamp":"2016-03-03T19:48:55.612Z","level":"WARN","thread":"Memcached IO over {MemcachedConnection to 10.52.40.121/10.52.40.121:11210 10.52.40.122/10.52.40.122:11210 10.52.40.123/10.52.40.123:11210}","logger":"com.couchbase.client.CouchbaseConnection","message":"Cancelling operation Cmd: 2 Opaque: 1190090 Key: S2::XYZXYZ-rabbitmq-consumer::string::message::lock::S2::55558684-e267-4326-bf47-597c1b2aa8b0 Cas: 0 Exp: 605 Flags: 0 Data Length: 3because it has been retried (cloned) more than 100times.","context":"default"}
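The log sequence above reads like a bounded retry loop: the store is sent to 10.52.40.123, the server answers NOT_MY_VBUCKET, the client reschedules (clones) the operation, and once it has been cloned more than 100 times the client cancels it. The sketch below is a hypothetical model of that pattern only, not the Couchbase/spymemcached source. The class and method names, vBucket 42, and the assumption that the client keeps routing to the same node on every retry are all illustrative.

```java
import java.util.Map;

// Hypothetical model of the NOT_MY_VBUCKET retry pattern seen in the log.
// NOT the Couchbase/spymemcached implementation; names are illustrative.
public class NotMyVbucketRetryModel {

    // Limit taken from the log text "retried (cloned) more than 100times".
    static final int MAX_CLONES = 100;

    enum Status { SUCCESS, NOT_MY_VBUCKET }

    // Returns the number of clones made before success, or -1 if the op is cancelled.
    public static int send(int vbucket,
                           Map<Integer, String> clientMap,   // where the client routes the vBucket
                           Map<Integer, String> serverMap) { // which node actually owns it
        int clones = 0;
        while (true) {
            String target = clientMap.get(vbucket);
            Status status = target.equals(serverMap.get(vbucket))
                    ? Status.SUCCESS
                    : Status.NOT_MY_VBUCKET;
            if (status == Status.SUCCESS) {
                return clones;
            }
            // NOT_MY_VBUCKET: reschedule the operation by cloning it.
            clones++;
            if (clones > MAX_CLONES) {
                return -1; // corresponds to "Cancelling operation ... more than 100times."
            }
            // Assumption: clientMap is never corrected, so every retry hits the same node.
        }
    }

    public static void main(String[] args) {
        // Hypothetical ownership: vBucket 42 lives on .121, but the client routes it to .123.
        Map<Integer, String> server = Map.of(42, "10.52.40.121");
        Map<Integer, String> staleClient = Map.of(42, "10.52.40.123");
        Map<Integer, String> freshClient = Map.of(42, "10.52.40.121");
        System.out.println(send(42, staleClient, server)); // -1 (cancelled)
        System.out.println(send(42, freshClient, server)); // 0 (no retries needed)
    }
}
```

If the routing never changes, the retries cannot succeed, so the limit only decides when the operation fails.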
And if we check locally, the client has no connection to 10.52.40.121 on port 11210, even though the cluster map shown in the log above ("Nodes with active VBuckets") does include 10.52.40.121:
netstat -anp | grep *PID* | grep 11210
tcp 0 0 ::ffff:10.15.16.18:56246 ::ffff:10.52.40.122:11210 ESTABLISHED 28077/java
tcp 0 0 ::ffff:10.15.16.18:56338 ::ffff:10.52.40.122:11210 ESTABLISHED 28077/java
tcp 0 0 ::ffff:10.15.16.18:57210 ::ffff:10.52.40.123:11210 ESTABLISHED 28077/java
tcp 0 0 ::ffff:10.15.16.18:40679 ::ffff:10.52.40.123:11210 ESTABLISHED 28077/java
tcp 0 0 ::ffff:10.15.16.18:40676 ::ffff:10.52.40.123:11210 ESTABLISHED 28077/java
tcp 0 0 ::ffff:10.15.16.18:55815 ::ffff:10.52.40.122:11210 ESTABLISHED 28077/java
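The same check can be automated by comparing the node list from the "Nodes with active VBuckets" log line against the remote addresses of ESTABLISHED connections on port 11210. This is a hypothetical helper (class and method names are illustrative). It assumes the Linux netstat -anp column layout shown above, including the ::ffff: IPv4-mapped prefix that appears on the Java process's sockets.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical diagnostic: which cluster-map nodes have no ESTABLISHED data connection?
// Assumes "netstat -anp" lines: proto recv-q send-q local foreign state pid/program
public class MissingConnections {

    static Set<String> connectedHosts(List<String> netstatLines, int port) {
        Set<String> hosts = new LinkedHashSet<>();
        for (String line : netstatLines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 6 || !f[5].equals("ESTABLISHED")) {
                continue; // not a fully established TCP connection
            }
            String foreign = f[4];
            int colon = foreign.lastIndexOf(':');
            if (colon < 0 || !foreign.substring(colon + 1).equals(String.valueOf(port))) {
                continue; // remote side is not on the data port
            }
            String host = foreign.substring(0, colon);
            // Strip the IPv4-mapped IPv6 prefix netstat prints for these sockets.
            if (host.startsWith("::ffff:")) {
                host = host.substring("::ffff:".length());
            }
            hosts.add(host);
        }
        return hosts;
    }

    // Returns cluster-map nodes with no established connection on the given port.
    public static List<String> missing(List<String> clusterNodes, List<String> netstatLines, int port) {
        Set<String> connected = connectedHosts(netstatLines, port);
        List<String> result = new ArrayList<>();
        for (String node : clusterNodes) {
            if (!connected.contains(node)) {
                result.add(node);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> clusterNodes = List.of("10.52.40.123", "10.52.40.121", "10.52.40.122");
        List<String> netstat = List.of(
                "tcp 0 0 ::ffff:10.15.16.18:56246 ::ffff:10.52.40.122:11210 ESTABLISHED 28077/java",
                "tcp 0 0 ::ffff:10.15.16.18:57210 ::ffff:10.52.40.123:11210 ESTABLISHED 28077/java");
        System.out.println(missing(clusterNodes, netstat, 11210)); // [10.52.40.121]
    }
}
```

Run against the full netstat output above, it would report 10.52.40.121 as the only cluster-map node without a connection.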