Server down causes java.lang.StackOverflowError after failover

We are running a system with 4 server nodes.

The Java application uses spymemcached-2.10.2 and couchbase-client-1.2.2.

We see the following problem:

  1. We force one node down
  2. We see some OperationTimeoutExceptions in our Java application.
  3. After 30 s, auto-failover kicks in.
  4. A few seconds after the failover, we see the following exception:

Exception in thread "Memcached IO over {MemcachedConnection to pmc1-a/192.168.1.11:11210 cb2/192.168.1.14:11210 cb1/192.168.1.13:11210 pmc1-b/192.168.1.12:11210}" java.lang.StackOverflowError
at net.spy.memcached.ops.MultiGetOperationCallback.gotData(MultiGetOperationCallback.java:36)
at net.spy.memcached.ops.MultiGetOperationCallback.gotData(MultiGetOperationCallback.java:36)
at net.spy.memcached.ops.MultiGetOperationCallback.gotData(MultiGetOperationCallback.java:36)
at net.spy.memcached.ops.MultiGetOperationCallback.gotData(MultiGetOperationCallback.java:36)
at net.spy.memcached.ops.MultiGetOperationCallback.gotData(MultiGetOperationCallback.java:36)

at net.spy.memcached.ops.MultiGetOperationCallback.gotData(MultiGetOperationCallback.java:36)

  5. After this, all our get operations time out. If we then create a new CouchbaseClient, it works OK again.
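The "Exception in thread ..." prefix in the trace is what the JVM's default uncaught-exception handler prints when a thread terminates because of a throwable nobody caught. If that thread is the client's only IO thread, nothing is left to complete pending operations, which would be consistent with every later get timing out until a new client (with a new IO thread) is created. The sketch below is a simplified model under that assumption; SingleIoThreadClient, get, poison and awaitIoThreadExit are hypothetical names, not spymemcached's API.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model of a client whose operations are all completed by one IO thread.
// This is NOT spymemcached's implementation; it only mimics the reported symptom.
class SingleIoThreadClient {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Thread ioThread;

    SingleIoThreadClient() {
        ioThread = new Thread(() -> {
            while (true) {
                Runnable op;
                try {
                    op = queue.take();
                } catch (InterruptedException e) {
                    return;
                }
                // Nothing here catches Error, so a StackOverflowError ends the thread.
                op.run();
            }
        }, "Simulated IO thread");
        ioThread.setDaemon(true);
        ioThread.start();
    }

    // Queue a "get"; only the IO thread ever completes the returned future.
    Future<String> get(String key) {
        CompletableFuture<String> result = new CompletableFuture<>();
        queue.add(() -> result.complete("value-of-" + key));
        return result;
    }

    // Queue an operation that recurses without limit, like the repeated gotData frames.
    void poison() {
        queue.add(SingleIoThreadClient::recurse);
    }

    private static void recurse() {
        recurse();
    }

    void awaitIoThreadExit(long millis) throws InterruptedException {
        ioThread.join(millis);
    }

    boolean ioThreadAlive() {
        return ioThread.isAlive();
    }
}

class IoThreadDemo {
    public static void main(String[] args) throws Exception {
        SingleIoThreadClient client = new SingleIoThreadClient();
        System.out.println(client.get("a").get(1, TimeUnit.SECONDS)); // value-of-a
        client.poison();
        client.awaitIoThreadExit(5_000);
        try {
            client.get("b").get(1, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            System.out.println("get timed out; IO thread alive: " + client.ioThreadAlive());
        }
    }
}
```

In this model, creating a new SingleIoThreadClient starts a fresh IO thread, so gets work again, mirroring what happens when a new CouchbaseClient is created.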

We ran a similar test application using the C API, and in that case it works as expected after the failover completes.

Any ideas why this is happening?

Hmm, this is strange: it looks like every frame in the stack trace is the same call, MultiGetOperationCallback.gotData at line 36.
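Identical repeated frames do not necessarily mean a single object calling itself. A chain of wrapper objects, each forwarding to the next from the same source line, produces the same kind of trace. The sketch below uses hypothetical DataCallback and ForwardingCallback types, not spymemcached's classes; whether the client really nests callbacks like this after a failover is an assumption, not something established here.

```java
// Hypothetical callback type for illustration; not spymemcached's API.
interface DataCallback {
    void gotData(String key, byte[] data);
}

// Wraps another callback and forwards every call to it.
class ForwardingCallback implements DataCallback {
    private final DataCallback delegate;

    ForwardingCallback(DataCallback delegate) {
        this.delegate = delegate;
    }

    @Override
    public void gotData(String key, byte[] data) {
        delegate.gotData(key, data); // every wrapper adds one frame at this same line
    }
}

class WrapperChainDemo {
    // Wrap a terminal callback `layers` times, then deliver one value through the chain.
    static String deliver(int layers) {
        StringBuilder received = new StringBuilder();
        DataCallback cb = (key, data) -> received.append(key);
        for (int i = 0; i < layers; i++) {
            cb = new ForwardingCallback(cb);
        }
        try {
            cb.gotData("k", new byte[0]);
            return "delivered " + received;
        } catch (StackOverflowError e) {
            // The trace consists of ForwardingCallback.gotData frames, all at the same line.
            StackTraceElement[] trace = e.getStackTrace();
            return trace.length > 0
                    ? "overflow at " + trace[0].getClassName() + "." + trace[0].getMethodName()
                    : "overflow (no stack trace recorded)";
        }
    }

    public static void main(String[] args) {
        System.out.println(deliver(10));        // delivered k
        System.out.println(deliver(1_000_000)); // overflows on a default-sized stack
    }
}
```

No single object recurses here, yet the resulting trace looks exactly like "gotData calling gotData".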

1.) It could be some kind of JVM bug. Could you tell us exactly which VM you are running?

2.) Try setting the Java thread stack size with the -Xss VM option (for example, -Xss2m).
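A rough way to see what -Xss changes is to measure how deep a trivial recursion gets before it overflows. This sketch (StackDepthProbe is a hypothetical helper) uses the Thread(ThreadGroup, Runnable, String, long stackSize) constructor to request a stack size for one thread; a stackSize of 0 means "use the default", which is what -Xss sets. The Thread Javadoc notes that on some platforms the stackSize argument may have no effect.

```java
// Measures how deep a simple recursion gets before StackOverflowError,
// on a thread created with a requested stack size. Probes run one at a time.
class StackDepthProbe {
    private static int depth;

    private static void descend() {
        depth++;
        descend();
    }

    // stackSize == 0 means "use the JVM default", which -Xss controls.
    static int maxDepth(long stackSize) throws InterruptedException {
        int[] result = new int[1];
        Thread t = new Thread(null, () -> {
            depth = 0;
            try {
                descend();
            } catch (StackOverflowError e) {
                result[0] = depth; // how many calls fit before the stack ran out
            }
        }, "stack-probe", stackSize);
        t.start();
        t.join(); // join() makes result[0] visible to this thread
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("default stack: " + maxDepth(0));
        System.out.println("64 MB stack:   " + maxDepth(64L * 1024 * 1024));
    }
}
```

A larger stack only helps when the recursion is deep but bounded; unbounded recursion still overflows, just later. Also, -Xss changes the default for every thread the JVM starts without an explicit size, not only the memcached IO thread.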

Cheers,
Loolek

A StackOverflowError is to the stack what an OutOfMemoryError is to the heap: it simply signals that there is no more memory available. The JVM allocates a fixed amount of memory for each thread's stack, and if a method call would need more than is left, the JVM throws a StackOverflowError, just as it would throw an exception if you tried to write at index N of an array of length N. No memory corruption can happen: the stack cannot write into the heap.
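The array analogy can be made concrete: both failures are thrown at a well-defined point, both can be caught, and the thread is still usable afterwards. BoundedFailures is a hypothetical demonstration class; catching StackOverflowError in real code is generally a poor recovery strategy, since whatever was interrupted mid-call may be left in an unknown state.

```java
// StackOverflowError extends VirtualMachineError; like an out-of-bounds
// array write, it is thrown at a well-defined point without corrupting memory.
class BoundedFailures {
    private static long calls;

    private static void recurse() {
        calls++;
        recurse();
    }

    // Hits the per-thread stack limit and reports what was thrown.
    static String overflowStack() {
        calls = 0;
        try {
            recurse();
            return "no error";
        } catch (StackOverflowError e) {
            return e.getClass().getSimpleName() + " after " + calls + " calls";
        }
    }

    // Writes at index N of an array of length N.
    static String overflowArray() {
        int n = 4;
        int[] array = new int[n];
        try {
            array[n] = 42;
            return "no error";
        } catch (ArrayIndexOutOfBoundsException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(overflowStack());
        System.out.println(overflowArray()); // ArrayIndexOutOfBoundsException
        // The same thread can keep running after both errors were caught.
        System.out.println(overflowStack());
    }
}
```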

The most common cause of a stack overflow is bad recursion. Typically, the recursive function doesn't have the correct termination condition, so it ends up calling itself forever. Even when the termination condition is correct, the overflow can still happen if too many recursive calls are needed before it is reached.
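The second case, a correct termination condition that is simply too far away, can be sketched with two hypothetical methods computing 1 + 2 + ... + n. Both terminate, but the recursive one needs one stack frame per step, so it overflows for large n, while the loop uses constant stack space. HotSpot does not eliminate tail calls, so rewriting deep recursion as a loop (or bounding its depth) is the usual fix.

```java
// Sum of 1..n, written two ways.
class DeepRecursion {
    // Correct base case, but n nested calls are needed to reach it.
    static long sumRecursive(long n) {
        if (n == 0) {
            return 0; // termination condition
        }
        return n + sumRecursive(n - 1);
    }

    // Same result with a loop: stack usage does not grow with n.
    static long sumIterative(long n) {
        long total = 0;
        for (long i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumRecursive(1_000));      // 500500
        System.out.println(sumIterative(10_000_000)); // 50000005000000
        try {
            sumRecursive(10_000_000);
        } catch (StackOverflowError e) {
            System.out.println("recursive version overflowed for n = 10,000,000");
        }
    }
}
```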

Here’s an example:

public class Overflow {
    public static final void main(String[] args) {
        main(args);
    }
}

That method calls itself repeatedly with no termination condition. Consequently, the stack fills up: each call pushes a new frame (holding, among other things, the return address) onto the stack, but no frame is ever popped off because the method never returns; it just keeps calling itself.