Strange behavior with async cluster and opening multiple buckets concurrently w/o all server nodes supplied


#1

I’ve noticed something strange when playing around with the new client (2.0.2) and connecting to a cluster via CouchbaseAsyncCluster. From what I’ve seen, if I create the cluster and then open two different buckets without waiting for the first one to connect before attempting the second, I see the following message logged at debug level, repeated over and over:

DEBUG c.c.c.c.node.locate.KeyValueLocator - Node list and configuration's partition hosts sizes : 5 <> 3, rescheduling

I only see this if I don’t give a complete node list when creating the CouchbaseAsyncCluster. I have 3 nodes in the cluster, but on creation I am supplying only one or two of them, and that leads to this behavior. If I give it all 3 nodes, I never see this log message. Also, if I switch to the regular CouchbaseCluster instead of the async one and do not give it a complete node list, I don’t see the issue. So this has to be something with the combination of CouchbaseAsyncCluster, an incomplete node list, and opening multiple buckets (both Couchbase-type buckets) at the same time. My code to reproduce this is (in Scala, but that should have nothing to do with it):

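import com.couchbase.client.java.CouchbaseAsyncCluster
import com.couchbase.client.java.env.DefaultCouchbaseEnvironment

// nodes holds only one or two of the three cluster nodes
val env = DefaultCouchbaseEnvironment.create()
val cluster = CouchbaseAsyncCluster.create(env, nodes)
// both opens are started back to back, without waiting for the first to complete
val b1 = cluster.openBucket("bucket1", "password")
val b2 = cluster.openBucket("bucket2", "password")

Thread.sleep(5000) // sleeping to allow connections to happen
b1.toBlocking.first.get("somekey").toBlocking().firstOrDefault(null)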
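
For comparison, here is a minimal sketch of the sequential variant, where the second bucket is opened only after the first has finished connecting. The host name `node1.example.com` is a placeholder, not a value from my setup. Based on the behavior described above this should not hit the concurrent-open path, but treat that as an expectation rather than a verified fix.

```scala
import com.couchbase.client.java.{AsyncBucket, CouchbaseAsyncCluster}
import com.couchbase.client.java.env.DefaultCouchbaseEnvironment

object SequentialOpen {
  def main(args: Array[String]): Unit = {
    val env = DefaultCouchbaseEnvironment.create()
    // Placeholder seed node: deliberately an incomplete node list
    val cluster = CouchbaseAsyncCluster.create(env, "node1.example.com")

    // Block until the first bucket is open before starting the second open
    val b1: AsyncBucket = cluster.openBucket("bucket1", "password").toBlocking().single()
    val b2: AsyncBucket = cluster.openBucket("bucket2", "password").toBlocking().single()

    println(s"opened ${b1.name()} and ${b2.name()}")
  }
}
```

Blocking on each open gives up some of the benefit of the async API, so this is mainly useful for narrowing down whether the concurrent opens are what triggers the message.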

#2

@cbax007 interesting behaviour. The thing is that when you do it completely async, some ops “race” with opening the bucket, since there is more going on in the background: fetching a config, applying it, and reconfiguring (adding nodes, services and endpoints).

So my main question, plus one note:

  • Are you experiencing actual issues (timeouts,…), or is it just “weird” while everything still works?
  • You don’t need to apply the sleep; it should just requeue the op until it is able to schedule it.
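
As a rough sketch of what the reproduction could look like without the sleep: the `get` is chained onto the bucket open with `flatMap`, so it is issued as soon as the bucket is available. The host name is a placeholder, and the bucket name, password and key are the ones from the original post.

```scala
import com.couchbase.client.java.{AsyncBucket, CouchbaseAsyncCluster}
import com.couchbase.client.java.document.JsonDocument
import com.couchbase.client.java.env.DefaultCouchbaseEnvironment
import rx.Observable
import rx.functions.Func1

object GetWithoutSleep {
  def main(args: Array[String]): Unit = {
    val env = DefaultCouchbaseEnvironment.create()
    // Placeholder seed node, not taken from the original post
    val cluster = CouchbaseAsyncCluster.create(env, "node1.example.com")

    // Chain the get onto the open; no Thread.sleep in between
    val doc: JsonDocument = cluster
      .openBucket("bucket1", "password")
      .flatMap[JsonDocument](new Func1[AsyncBucket, Observable[JsonDocument]] {
        override def call(bucket: AsyncBucket): Observable[JsonDocument] =
          bucket.get("somekey")
      })
      .toBlocking()
      .firstOrDefault(null) // null if the document does not exist

    println(doc)
  }
}
```

The explicit `Func1` keeps the snippet independent of the Scala version; on Scala 2.12 or later a lambda should work in its place.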

Also, it would be great if you could upload a log somewhere so I can take a look.


#3

I think once things get into this state, it does not recover (at least not right away), and I do see things like timeouts, presumably because the KeyValueLocator cannot properly locate the node to send the request to. I will attach a log tomorrow morning, as I am about to head out for the day.


#4

Okay, here is the log from the code I posted that leads to that situation. I stopped the app a few seconds after it went into that weird state, since the log fills up with those KeyValueLocator messages pretty quickly. The log is linked below; the two buckets are named campaignevent and datacenter.