IndexOutOfBoundsException in DefaultCouchbaseBucketConfig.nodeIndexForMaster, flushing not possible afterwards


#1

After creating and opening a bucket programmatically with Java SDK 2.2.4, I get hundreds of these exceptions:

WARN  2016-02-16T11:53:18,777 Slf4JLogger.warn (l. 166): Exception while Handling Request Events RequestEvent{request=null}
java.lang.IndexOutOfBoundsException: Index: 10989, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:653) ~[?:1.8.0_66]
    at java.util.ArrayList.get(ArrayList.java:429) ~[?:1.8.0_66]
    at com.couchbase.client.core.config.DefaultCouchbaseBucketConfig.nodeIndexForMaster(DefaultCouchbaseBucketConfig.java:136) ~[core-io-1.2.4.jar:?]
    at com.couchbase.client.core.node.locate.KeyValueLocator.calculateNodeId(KeyValueLocator.java:199) ~[core-io-1.2.4.jar:?]
    at com.couchbase.client.core.node.locate.KeyValueLocator.locateForCouchbaseBucket(KeyValueLocator.java:159) ~[core-io-1.2.4.jar:?]
    at com.couchbase.client.core.node.locate.KeyValueLocator.locate(KeyValueLocator.java:87) ~[core-io-1.2.4.jar:?]
    at com.couchbase.client.core.RequestHandler.onEvent(RequestHandler.java:200) ~[core-io-1.2.4.jar:?]
    at com.couchbase.client.core.RequestHandler.onEvent(RequestHandler.java:76) ~[core-io-1.2.4.jar:?]
    at com.couchbase.client.deps.com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128) [core-io-1.2.4.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_66]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_66]
    at com.couchbase.client.deps.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) [core-io-1.2.4.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_66]

The exceptions differ only in the “Index” value of the IndexOutOfBoundsException.
The code makes no progress after this.

Flushing any bucket is not possible afterwards. Doing it programmatically results in an “HTTP 503”. We had to rebalance the whole cluster node by node to get flushing to work again, but after running the code again the symptoms reappeared.

The server is 4.0.0-4051. Switching to Java SDK 2.2.3 makes no difference. The issue appeared suddenly, after the code had worked for a year before that and for one month on the CB4 cluster.


#2

Hi @SebastianF, thanks for reporting. So are you saying this also happens with 2.2.3?

Is there any chance you can enable TRACE logging and send the log over somehow?
Do you have an example to reproduce it?

The weird part here is that the index should not be higher than 1023 in the first place, since we don’t have more partitions than that. Do you happen to know during which operation this happens, and/or in which phase of the client (bootstrap?)?
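
For reference, the 1023 bound follows from how a key is mapped to a partition. The following is a minimal sketch, assuming the usual CRC32-based vBucket hashing and a power-of-two partition count (1024 by default); the class and method names are made up for illustration and this is not the client's actual code. It also shows one possible way, purely as a hypothesis, that an index above 1023 could come out: if the partition count were 0 (an empty partition map, which would be consistent with the “Size: 0” in the trace), the bit mask would no longer restrict the hash.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper, not part of the Couchbase SDK.
class PartitionSketch {

    // Maps a document key to a partition index.
    // Assumption: CRC32 of the UTF-8 key, reduced to 15 bits, then masked
    // with (numPartitions - 1), which requires a power-of-two count.
    static int partitionForKey(String key, int numPartitions) {
        CRC32 crc32 = new CRC32();
        crc32.update(key.getBytes(StandardCharsets.UTF_8));
        long hash = (crc32.getValue() >> 16) & 0x7fff; // 0..32767
        return (int) hash & (numPartitions - 1);
    }

    public static void main(String[] args) {
        // With 1024 partitions the mask is 0x3ff, so the result is 0..1023.
        System.out.println(partitionForKey("some-key", 1024));

        // With 0 partitions the mask becomes -1 (all bits set), so the raw
        // 15-bit hash passes through and can be far larger than 1023.
        System.out.println(partitionForKey("some-key", 0));
    }
}
```

If something like this is what happens, a lookup into an empty partition list would fail with exactly this kind of IndexOutOfBoundsException, and the “Index” value would change with the key. The TRACE log should show whether the bucket config really arrives without a partition map.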


#3

@daschl

Please see the attached file for a trace log.
I managed to minimize the example a bit.
The whole thing happens within a test suite. The many exceptions I described above originated from a flush call right at the beginning, which was unnecessary. I removed that call and ran only one test from the suite. Now I get the IndexOutOfBoundsException only once, at the beginning. The test starts, but any database operation leads to a timeout, which fails the test.

Furthermore, I tried connecting to a Couchbase 3 cluster, which resulted in “Requests cancelled in-flight” and, strangely, an AuthentificationException after it had already connected.

crash_log.zip (9.3 KB)

We create the bucket like this:

DefaultCouchbaseEnvironment.Builder settings = DefaultCouchbaseEnvironment.builder();

couchbaseDb = CouchbaseCluster.create(settings.build(),
        "couchbase01.an-app-test.nl.a-host.com",
        "couchbase02.an-app-test.nl.a-host.com",
        "couchbase03.an-app-test.nl.a-host.com");
clusterManager.insertBucket(DefaultBucketSettings.builder().name(bucketName).replicas(2).quota(250).enableFlush(true));