Unable to perform any upserts. Always get "rx.exceptions.OnErrorThrowable$OnNextValue: OnError while emitting onNext value: com.couchbase.client.core.message.kv.UpsertResponse.class"

#1

I have a Couchbase Java client on v2.2.2 of the SDK that I have been using for a while to insert a large amount of data into 4 buckets in a 2-node cluster. I'm using Couchbase 4.1.0 DP on Debian 7 machines.

While inserting this data I have regularly been seeing errors like the following:

Service 'memcached' exited with status 137. Restarting. Messages: 2015-11-24T21:54:26.078632Z WARNING 72: Slow STAT operation on connection (127.0.0.1:61516 => 127.0.0.1:11209): 20159 ms
2015-11-24T21:54:26.078618Z WARNING 57: Slow STAT operation on connection (127.0.0.1:41537 => 127.0.0.1:11209): 20159 ms
2015-11-24T21:54:26.078631Z WARNING 56: Slow STAT operation on connection (127.0.0.1:49198 => 127.0.0.1:11209): 20159 ms
2015-11-24T21:54:40.439683Z WARNING 46: Slow GET_CLUSTER_CONFIG operation on connection (10.32.3.212:52335 => 10.32.3.212:11210): 1644 ms
2015-11-24T21:54:40.439691Z WARNING 41: Slow STAT operation on connection (127.0.0.1:23930 => 127.0.0.1:11209): 3565 ms

Control connection to memcached on 'ns_1@10.32.3.212' disconnected: {{badmatch,{error,closed}},
  [{mc_client_binary,cmd_vocal_recv,5,[{file,"src/mc_client_binary.erl"},{line,156}]},
   {mc_client_binary,select_bucket,2,[{file,"src/mc_client_binary.erl"},{line,351}]},
   {ns_memcached,ensure_bucket,2,[{file,"src/ns_memcached.erl"},{line,1299}]},
   {ns_memcached,handle_info,2,[{file,"src/ns_memcached.erl"},{line,748}]},
   {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,604}]},
   {ns_memcached,init,1,[{file,"src/ns_memcached.erl"},{line,177}]},
   {gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},
   {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}

Control connection to memcached on 'ns_1@10.32.3.212' disconnected: {bad_return_value,{error,closed}}

Eventually the system would recover after a while and the buckets were ready again:

Bucket "Bingo_Game" loaded on node 'ns_1@10.32.3.212' in 620 seconds.	ns_memcached000	ns_1@10.32.3.212	22:05:25 - Tue Nov 24, 2015
Bucket "Bingo_PlayerCards" loaded on node 'ns_1@10.32.3.212' in 597 seconds.	ns_memcached000	ns_1@10.32.3.212	22:05:02 - Tue Nov 24, 2015
Bucket "Bingo_Card" loaded on node 'ns_1@10.32.3.212' in 515 seconds.	ns_memcached000	ns_1@10.32.3.212	22:03:40 - Tue Nov 24, 2015
Bucket "Bingo_PlayerShout" loaded on node 'ns_1@10.32.3.212' in 43 seconds.	ns_memcached000	ns_1@10.32.3.212	21:55:48 - Tue Nov 24, 2015

But now, after the last interruption, although the cluster status seems to be OK, the client is unable to perform a single upsert operation. Every attempt fails with the following stack trace:

com.couchbase.client.java.error.TemporaryFailureException: null
	at com.couchbase.client.java.CouchbaseAsyncBucket$16.call(CouchbaseAsyncBucket.java:515)
	at com.couchbase.client.java.CouchbaseAsyncBucket$16.call(CouchbaseAsyncBucket.java:496)
	at rx.internal.operators.OperatorMap$1.onNext(OperatorMap.java:54)
	at rx.observers.Subscribers$5.onNext(Subscribers.java:234)
	at rx.subjects.SubjectSubscriptionManager$SubjectObserver.onNext(SubjectSubscriptionManager.java:222)
	at rx.subjects.AsyncSubject.onCompleted(AsyncSubject.java:101)
	at com.couchbase.client.core.endpoint.AbstractGenericHandler$1.call(AbstractGenericHandler.java:265)
	at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: rx.exceptions.OnErrorThrowable$OnNextValue: OnError while emitting onNext value: com.couchbase.client.core.message.kv.UpsertResponse.class
	at rx.exceptions.OnErrorThrowable.addValueAsLastCause(OnErrorThrowable.java:109)
	at rx.exceptions.Exceptions.throwOrReport(Exceptions.java:188)
	at rx.internal.operators.OperatorMap$1.onNext(OperatorMap.java:56)
	... 12 common frames omitted

The piece of code that does the upserts is the following:

Observable
        .from(documents)
        .flatMap((docToInsert) -> {
            Bucket bucket = null;
            String type = docToInsert.content().getString("type");
            switch (type) {
                case GAME_TYPE:
                    bucket = gameBucket;
                    break;
                case PLAYER_CARDS_TYPE:
                    bucket = playerCardsBucket;
                    break;
                case CARD_TYPE:
                    bucket = cardBucket;
                    break;
                case PLAYER_SHOUT_TYPE:
                    bucket = playerShoutBucket;
                    break;
                default:
                    throw new IllegalArgumentException("Invalid type: " + type);
            }
            return bucket.async().upsert(docToInsert);
        })
        .last()
        .toBlocking()
        .single();
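As a side note on the code above: the switch-based routing could equally be expressed as a lookup table, which keeps the type-to-bucket mapping data-driven. Here is a minimal, self-contained sketch; `Bucket` is a stand-in interface (not the SDK class), and the type keys are hypothetical stand-ins for the real `GAME_TYPE`/`CARD_TYPE` constants. In the real code the map values would be the four opened SDK buckets.

```java
import java.util.HashMap;
import java.util.Map;

public class BucketRouter {
    // Stand-in for com.couchbase.client.java.Bucket, just for illustration.
    interface Bucket { String name(); }

    private final Map<String, Bucket> bucketsByType = new HashMap<>();

    BucketRouter register(String type, Bucket bucket) {
        bucketsByType.put(type, bucket);
        return this;
    }

    // Resolve the target bucket for a document type, or fail loudly,
    // mirroring the default branch of the original switch.
    Bucket route(String type) {
        Bucket bucket = bucketsByType.get(type);
        if (bucket == null) {
            throw new IllegalArgumentException("Invalid type: " + type);
        }
        return bucket;
    }

    public static void main(String[] args) {
        // Hypothetical type keys; bucket names are the ones from the post.
        BucketRouter router = new BucketRouter()
                .register("game", () -> "Bingo_Game")
                .register("card", () -> "Bingo_Card");
        System.out.println(router.route("game").name());
    }
}
```

This does not change the failure you are seeing, but it shortens the flatMap body to a single `route(type)` call.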

Thanks in advance. Any help is much appreciated.

#2

Looks like your cluster is not in a very stable state. The TemporaryFailureException is thrown when the server answers with the ERR_TEMP_FAIL or ERR_BUSY error codes, which indicate transient errors on the server side.

I think you should investigate the errors you listed at the beginning of your post further; even if the web console seems to show a stabilized cluster, that is likely not the case. Perhaps someone in the couchbase-server section will be able to help?
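Since these are transient errors, a common client-side mitigation while you investigate the server is to retry the operation with exponential backoff (the async SDK also ships a `retryWhen`/`RetryBuilder` helper for this). Below is a minimal, dependency-free sketch of the idea: `TemporaryFailure` is a stand-in class for `com.couchbase.client.java.error.TemporaryFailureException`, and the `Callable` is a stand-in for the upsert call.

```java
import java.util.concurrent.Callable;

public class RetryUpsert {
    // Stand-in for com.couchbase.client.java.error.TemporaryFailureException.
    static class TemporaryFailure extends RuntimeException {}

    // Retry op on transient failures, sleeping base, 2*base, 4*base ... between
    // attempts; rethrow once maxAttempts is exhausted.
    static <T> T withRetry(Callable<T> op, int maxAttempts, long baseDelayMs) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (TemporaryFailure e) {
                if (attempt >= maxAttempts) throw e;
                Thread.sleep(baseDelayMs << (attempt - 1));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated upsert that fails twice with a transient error, then succeeds.
        String result = withRetry(() -> {
            if (++calls[0] < 3) throw new TemporaryFailure();
            return "upserted";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Note that retries only paper over the problem: if the cluster keeps ejecting memcached, the root cause still needs fixing server-side.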

#3

I have restarted the two server nodes and the problem seems to have disappeared, although I'm still regularly getting the previously mentioned errors.
Thanks anyway.