Binary deserialization


#1

I have a complex Java Serializable object composition that gets serialized to a Couchbase based cache.

When I try to deserialize it back, I get a java.util.concurrent.TimeoutException.

The class being deserialized has a custom readObject(ObjectInputStream in) method that eventually calls back into the same bucket and other buckets to fetch other required objects.

Currently the code works just fine when using a different cache (Coherence). Any idea on how to fix this?

Couchbase version: 4.1.0-5005


#2

Try converting the object into JSON. You should see CPU usage decrease on the app server and speed increase (i.e. no more timeouts).

Check out the Dev Preview of Version 4.5.x

You can take advantage of the new subdocument API to GET or UPDATE only parts of the JSON doc on the cluster.

http://developer.couchbase.com/documentation/server/4.5-dp/sub-doc-api.html

You can also query the JSON with SQL using N1QL.


#3

@bmrosantos can you shed more light on the code you are executing, your response times and more info on the exceptions? Maybe we can optimize the lookup pattern and/or the code, environment.


#4

@daschl, here are some details.

Basically I’m using a modified version of couchbase-spring-cache.

Notice that the CouchbaseCache class uses a SerializableDocument strategy, which was perfect for minimizing the impact on existing code.

The setup for now is a single Couchbase server deployed in a local docker-machine without replication. Initially the timeouts were the defaults. I’ve started experimenting with them, but without any success.

Here are the settings, previously left at their defaults, that I’ve recently tried changing:

    envBuilder.connectTimeout(2000);
    envBuilder.kvTimeout(200);
    envBuilder.queryEnabled(false);
    envBuilder.retryStrategy(BestEffortRetryStrategy.INSTANCE);
    envBuilder.retryDelay(Delay.linear(MILLISECONDS, 120000, 0, 100));
    envBuilder.keepAliveInterval(5000);
    final CouchbaseEnvironment env = envBuilder.build();
    return CouchbaseCluster.create(env, cbHosts);

Everything worked just fine until I realized that a relatively large object structure from a query ends up in the cache (legacy code). To deserialize it, several classes implement the following scheme as they are being deserialized:

    class Foo implements Serializable {
        // Complex structure list/map/etc…

        private OtherId otherId;       // Serialized
        private transient Other other; // Not serialized

        private void readObject(final ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            // Fetch Foo.other
            if (this.other == null && otherId != null) {
                // <-- Another call to Couchbase while the current one has not yet fully completed
                this.other = cache.find(otherId);
            }
        }
    }

I’ll get back to you with the response times.


#5

@daschl, here’s the exception:
2016-04-24 22:04:28,065 ERROR CouchbaseCacheFacade.getAll.(153) | [] | Error retrieving entries [com.foo.core.persistence.entity.compid.FooId@1164c6c1[ida=FOOBAA,locale=en_US]]from cache Foo: java.lang.RuntimeException: java.util.concurrent.TimeoutException
java.lang.RuntimeException: java.util.concurrent.TimeoutException
at com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:75)
at com.couchbase.client.java.CouchbaseBucket.get(CouchbaseBucket.java:149)
at com.couchbase.client.java.CouchbaseBucket.get(CouchbaseBucket.java:144)


#6

Regarding response times, most hits are within 1 to 6 milliseconds.

The cases that throw the TimeoutException take longer than 200 ms (which happens to be the current kvTimeout). That time is the sum of several other calls to the cache, made while the byte stream is being deserialized back into its object composition.


#7

@daschl, the following information has been submitted through the proper support channel. Adding this so that this thread can also have all details on the issue.


The following test replicates the Couchbase issue that I’m experiencing:

https://github.com/bmsantos/couchbase-spring-cache/blob/serialization/src/integration/java/com/couchbase/client/spring/cache/DeepSerializationTest.java

Change the DEEP_LEVEL property to build a simpler or more complex object composition.

You can check out the project and the specific branch with:

git clone git@github.com:bmsantos/couchbase-spring-cache.git -b serialization --single-branch

I’m probably wrong in my analysis of the situation, but here goes nothing. To me, it appears that the Couchbase client resources (in my case 8 threads, one per CPU) are being starved while waiting for other threads to complete. Once all threads are busy and the kvTimeout is exceeded, the deserialization fails to complete with a TimeoutException:

java.lang.RuntimeException: java.util.concurrent.TimeoutException

at com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:75)
at com.couchbase.client.java.CouchbaseBucket.get(CouchbaseBucket.java:149)
at com.couchbase.client.java.CouchbaseBucket.get(CouchbaseBucket.java:144)
at com.couchbase.client.spring.cache.CouchbaseCache.get(CouchbaseCache.java:186)

I’ve also found that increasing the size of the computation thread pool can postpone the occurrence of this issue, but the actual problem is still present.


#8

@bmrosantos this is indeed a problem of starvation, stemming from the fact that the computation threads are where the get+transcoding happens, and your readObject (which is called during SerializableDocument transcoding) recursively calls get.
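The starvation pattern described above can be reproduced without Couchbase at all. The following is only a minimal sketch under stated assumptions: a fixed-size ExecutorService stands in for the SDK's computation pool, and the names StarvationDemo, blockingGet and POOL_SIZE are hypothetical, not part of any SDK. Each simulated get runs its "transcoding" on a pool thread, and that step issues another blocking get, just as a readObject that calls the cache would.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class StarvationDemo {
    // Hypothetical stand-in for a fixed-size computation pool.
    static final int POOL_SIZE = 2;
    static final long TIMEOUT_MS = 300;

    /** Simulated blocking get: the work runs on a pool thread and may recurse. */
    static String blockingGet(ExecutorService pool, int depth) throws Exception {
        Callable<String> transcode = () -> {
            if (depth > 0) {
                // Nested fetch: this pool thread now blocks waiting for another pool thread.
                return "doc" + depth + ">" + blockingGet(pool, depth - 1);
            }
            return "doc0";
        };
        Future<String> result = pool.submit(transcode);
        return result.get(TIMEOUT_MS, TimeUnit.MILLISECONDS);
    }

    /** Returns true if a get nested to the given depth ends in a timeout. */
    static boolean timesOut(int depth) {
        ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
        try {
            blockingGet(pool, depth);
            return false;
        } catch (TimeoutException e) {
            return true;
        } catch (ExecutionException e) {
            // A nested get may have timed out first; look for it in the cause chain.
            for (Throwable t = e; t != null; t = t.getCause()) {
                if (t instanceof TimeoutException) {
                    return true;
                }
            }
            throw new RuntimeException(e);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdownNow(); // release any threads still blocked
        }
    }

    public static void main(String[] args) {
        System.out.println("depth 1 times out: " + timesOut(1));
        System.out.println("depth 3 times out: " + timesOut(3));
    }
}
```

In this sketch a nesting depth of 1 needs two pool threads and completes, while any depth that needs more threads than POOL_SIZE leaves the innermost task queued forever, so every waiting caller eventually hits its timeout. A larger pool only raises the depth at which this happens.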

If you rely on Java for the serialization of the full Object graph, it works:

  • remove the transient qualifier in Foo and Other
  • comment out the readObject methods
  • make sure to do cache.put(root.getId(), root) at the end of DeepSerializationTest.getFooStructure()
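For reference, here is a small standalone sketch of what the steps above rely on: when the whole graph goes through a single ObjectOutputStream, Java serialization writes each object once and restores shared references on read. The Foo and Other classes below are simplified, hypothetical stand-ins for the ones in the test, and no cache is involved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class GraphSerializationDemo {
    static class Other implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Other(String name) { this.name = name; }
    }

    static class Foo implements Serializable {
        private static final long serialVersionUID = 1L;
        final Other first;  // not transient: serialized with the graph
        final Other second; // may point to the same instance as 'first'
        Foo(Other first, Other second) {
            this.first = first;
            this.second = second;
        }
    }

    static byte[] serialize(Serializable root) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root); // writes the whole reachable graph
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    /** Serializes a Foo whose two fields share one Other, then reads it back. */
    static Foo roundTrip() throws IOException, ClassNotFoundException {
        Other shared = new Other("shared");
        return (Foo) deserialize(serialize(new Foo(shared, shared)));
    }

    public static void main(String[] args) throws Exception {
        Foo copy = roundTrip();
        System.out.println("shared reference preserved: " + (copy.first == copy.second));
    }
}
```

The sharing only holds within one serialized stream. An object reachable from two separately cached roots would be duplicated, once per root, which is the trade-off of storing the full graph in a single document.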

But I guess you tried all that because you had a lot of shared objects and absolutely wanted them to be each in their own Couchbase document?

This is a tricky case, and I’m not sure we’ll be able to provide a solution, but maybe there’s a way by going one layer down into core-io and doing the transcoding more manually in readObject(). I’ll look into that.


#9

Even with direct core messaging coupled with going into a growing thread pool (.observeOn(Schedulers.io())), it doesn’t work due to recursion. The computation threads get blocked and the pool starves.

Bad news: I don’t think that is something we can fix in the Cache support implementation.

I’ve found a way around the problem by doing lazy recursive fetching of the nested documents, using the client.async() API rather than relying on custom deserialization, but it completely bypasses the cache.get.

I’ll include it here for information:

The idea is to keep the Other and Foo attributes transient but don’t implement readObject (or rather don’t do a get inside readObject). Then to read the root object, directly use the Couchbase async SDK and create a recursive stream that asynchronously deserializes items and repopulates parent items.

The gist is here
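Independently of the gist, the general shape of that idea can be sketched with plain JDK classes. This is not the gist's code and does not use the Couchbase SDK: CompletableFuture is swapped in for the SDK's asynchronous API, an in-memory map stands in for the bucket, and every name here (AsyncGraphLoadDemo, Node, STORE, fetch, load) is hypothetical. The point is only that each nested fetch is chained as a continuation instead of blocking a pool thread.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

class AsyncGraphLoadDemo {
    /** A stored item: only the id of the nested item is persisted. */
    static class Node {
        final String id;
        final String otherId; // persisted reference, null for a leaf
        Node other;           // transient-style field, repopulated after loading
        Node(String id, String otherId) {
            this.id = id;
            this.otherId = otherId;
        }
    }

    // Hypothetical stand-in for the bucket: id -> stored item.
    static final Map<String, Node> STORE = new ConcurrentHashMap<>();

    /** Non-blocking fetch; stands in for an async get plus deserialization. */
    static CompletableFuture<Node> fetch(String id) {
        return CompletableFuture.supplyAsync(() -> {
            Node stored = STORE.get(id);
            // Return a fresh copy, as deserialization would.
            return new Node(stored.id, stored.otherId);
        });
    }

    /** Loads an item, then chains the load of its nested item without blocking. */
    static CompletableFuture<Node> load(String id) {
        return fetch(id).thenCompose(node -> {
            if (node.otherId == null) {
                return CompletableFuture.completedFuture(node);
            }
            return load(node.otherId).thenApply(child -> {
                node.other = child; // repopulate the parent
                return node;
            });
        });
    }

    /** Stores a chain doc0 -> doc1 -> ... and returns how many items were loaded. */
    static int loadChain(int length) throws Exception {
        for (int i = 0; i < length; i++) {
            String next = (i + 1 < length) ? "doc" + (i + 1) : null;
            STORE.put("doc" + i, new Node("doc" + i, next));
        }
        // Only the calling thread waits here; pool threads never block on each other.
        Node root = load("doc0").get(5, TimeUnit.SECONDS);
        int count = 0;
        for (Node n = root; n != null; n = n.other) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("items loaded: " + loadChain(50));
    }
}
```

Because no pool thread ever waits on another fetch, the nesting depth is not limited by the pool size. The cost is the one already mentioned: the graph is assembled outside the deserialization step, so a plain cache.get no longer returns a fully populated object.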


#10

@simonbasle, thanks for getting back to me so quickly.

Unfortunately there is a significant amount of legacy code. At this stage we are contemplating replacing the current cache (Oracle Coherence) with Couchbase with minimal or no changes to the graph. This allows us to keep stability and, more importantly, to easily switch between one cache and the other in case of deployment issues. In addition, duplication of object instances during serialization is not an option.


#11

Thanks for the gist. I’ll have a look at it.

For now, I have to switch gears to something else and do not expect to get back to this for the next couple of weeks. So, no hurry, take your time. I would appreciate it if you kept me posted on any updates on this front.