Server down causes java.lang.StackOverflowError after failover


#1

We are running a cluster with 4 server nodes.

Our Java application uses spymemcached-2.10.2 and couchbase-client-1.2.2.

We see the following problem:

  1. We force one node down.
  2. We see some "OperationTimeoutException" errors in our Java application.
  3. After 30s, auto-failover kicks in.
  4. A few seconds after the failover, we see the following exception:

Exception in thread "Memcached IO over {MemcachedConnection to pmc1-a/192.168.1.11:11210 cb2/192.168.1.14:11210 cb1/192.168.1.13:11210 pmc1-b/192.168.1.12:11210}" java.lang.StackOverflowError
at net.spy.memcached.ops.MultiGetOperationCallback.gotData(MultiGetOperationCallback.java:36)
at net.spy.memcached.ops.MultiGetOperationCallback.gotData(MultiGetOperationCallback.java:36)
at net.spy.memcached.ops.MultiGetOperationCallback.gotData(MultiGetOperationCallback.java:36)
at net.spy.memcached.ops.MultiGetOperationCallback.gotData(MultiGetOperationCallback.java:36)
at net.spy.memcached.ops.MultiGetOperationCallback.gotData(MultiGetOperationCallback.java:36)

at net.spy.memcached.ops.MultiGetOperationCallback.gotData(MultiGetOperationCallback.java:36)

  5. After this, all our get operations time out. If we then create a new CouchbaseClient, it works OK again.

We ran a similar test application using the C API, and in that case it works as expected once the failover has completed.

Any ideas why this is happening?


#2

Hmm, this is strange: it looks like every frame in the stack trace comes from the same line, 36.
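One way a trace like this can arise, in general, is a long chain of callbacks that each forward to the next, so every frame sits on the same forwarding line. The sketch below is purely illustrative: `DataCallback` and `ForwardingCallback` are hypothetical names, not spymemcached classes, and this is not a claim about what `MultiGetOperationCallback` actually does.

```java
public class RepeatedFrameDemo {

    // Hypothetical callback interface, named only for this illustration.
    interface DataCallback {
        void gotData(String key);
    }

    // Hypothetical wrapper that forwards every call to the callback it wraps.
    static final class ForwardingCallback implements DataCallback {
        private final DataCallback inner;

        ForwardingCallback(DataCallback inner) {
            this.inner = inner;
        }

        @Override
        public void gotData(String key) {
            inner.gotData(key); // each wrapper adds one frame at this same line
        }
    }

    // Wraps a no-op callback `layers` times and invokes the outermost one.
    // Returns true if the call overflowed the stack and the two topmost
    // frames are the same method at the same source line.
    static boolean overflowsWithIdenticalFrames(int layers) {
        DataCallback cb = new DataCallback() {
            @Override
            public void gotData(String key) {
                // innermost callback does nothing
            }
        };
        for (int i = 0; i < layers; i++) {
            cb = new ForwardingCallback(cb);
        }
        try {
            cb.gotData("some-key");
            return false; // chain was shallow enough to complete
        } catch (StackOverflowError e) {
            StackTraceElement[] st = e.getStackTrace();
            return st.length > 1
                && st[0].getMethodName().equals("gotData")
                && st[0].getMethodName().equals(st[1].getMethodName())
                && st[0].getLineNumber() == st[1].getLineNumber();
        }
    }

    public static void main(String[] args) {
        // Assumes a default-sized thread stack; 2,000,000 layers is far deeper
        // than such a stack can hold.
        System.out.println(overflowsWithIdenticalFrames(2_000_000));
    }
}
```

If something similar is going on here, the interesting question is what keeps adding layers (for example, operations being re-wrapped each time they are retried), but that would need to be checked against the library source.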

1.) It could be some weird kind of JVM bug. Could you tell us exactly which VM you are running?

2.) Try the VM option -Xss to set the Java thread stack size.
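To make both points concrete, here is a small sketch using only the standard library. `JvmDiagnostics`, `describeJvm` and `maxDepth` are names made up for this example. It prints the properties that identify the VM, then measures how deep an unbounded recursion gets on threads with different requested stack sizes. Note that the stack size passed to the `Thread` constructor is only a hint that the VM may round or ignore.

```java
public class JvmDiagnostics {

    // Returns the standard system properties that identify the running JVM.
    static String describeJvm() {
        String nl = System.lineSeparator();
        return "java.version    = " + System.getProperty("java.version") + nl
             + "java.vendor     = " + System.getProperty("java.vendor") + nl
             + "java.vm.name    = " + System.getProperty("java.vm.name") + nl
             + "java.vm.version = " + System.getProperty("java.vm.version") + nl
             + "os.name/os.arch = " + System.getProperty("os.name")
             + " / " + System.getProperty("os.arch");
    }

    // Recurses with no base case; the counter records how deep we got.
    private static void recurse(int[] counter) {
        counter[0]++;
        recurse(counter);
    }

    // Runs the unbounded recursion on a thread created with the requested
    // stack size and returns the depth reached before StackOverflowError.
    static int maxDepth(long stackSizeBytes) {
        final int[] counter = new int[1];
        Thread t = new Thread(null, new Runnable() {
            @Override
            public void run() {
                try {
                    recurse(counter);
                } catch (StackOverflowError expected) {
                    // depth is already recorded in counter[0]
                }
            }
        }, "stack-depth-probe", stackSizeBytes);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(describeJvm());
        System.out.println("depth with 256 KB stack hint: " + maxDepth(256 * 1024));
        System.out.println("depth with 4 MB stack hint:   " + maxDepth(4L * 1024 * 1024));
    }
}
```

In the real application the library creates the IO thread itself, so the stack size would be raised for all threads by starting the VM with something like `-Xss2m`. Keep in mind that a bigger stack only delays the error if the recursion is truly unbounded; it helps mainly when the recursion is deep but finite.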

Cheers,
Loolek