Intermittent Couchbase upsert failure


#1

We have a 4-node (m1.large) Couchbase cluster on AWS.
We are using Java SDK 2.1.2 with Couchbase Server 3.0.1 (Community Edition).

We are continuously pushing data to Couchbase. After a few iterations, the application throws timeout exceptions like the one below:

2015-07-28 09:11:03 INFO CouchBaseManager:190 - Couchbase write failed. java.util.concurrent.TimeoutException
java.lang.RuntimeException: java.util.concurrent.TimeoutException
at com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:93)
at com.couchbase.client.java.CouchbaseBucket.upsert(CouchbaseBucket.java:278)
at com.rfxcel.rts.couchbase.CouchBaseManager.upsert(CouchBaseManager.java:176)

Even though the exception is thrown, the record is actually written to Couchbase, and we can retrieve it with a query. What I don't understand is why the timeout exception is thrown at all.

We are using the following variant of upsert:
jsonDoc = bucket.upsert(JsonDocument.create(id, (JsonObject) doc), persistTo, replicateTo, timeOut, TimeUnit.MILLISECONDS)
persistTo: MASTER
replicateTo: ONE
timeOut: 300000 (milliseconds)

When the call succeeds, it returns within a few hundred ms; otherwise it apparently never returns (I tried timeouts of up to 5 minutes).
Again, even though the timeout exception is thrown, the data is actually written to Couchbase.


#2

hi @kbaswaraj

Since you’re using version 2.1.2 of the SDK with PersistTo.MASTER and ReplicateTo.ONE, I think you may be hitting a bug in the observe mechanism. This bug was fixed in version 2.1.4 of the SDK.

Can you upgrade the version you use and confirm that your problem of timeouts is solved?
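
For context, an upsert with PersistTo/ReplicateTo first performs the mutation and then polls (observe) until the durability requirement is met, so a timeout in the second phase can leave the document written but unconfirmed. Below is a minimal sketch of how an application could tell those cases apart. The class name, the Outcome enum, and both method names are hypothetical and not part of the SDK; the write and the existence check are passed in as callables so the sketch depends only on the JDK.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of the Couchbase SDK.
final class DurabilityTimeoutCheck {

    enum Outcome { CONFIRMED, WRITTEN_BUT_UNCONFIRMED, FAILED }

    /** Returns true if t, or any exception in its cause chain, is a TimeoutException. */
    static boolean isTimeout(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof TimeoutException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs the durable write. If it times out, runs the existence check to see
     * whether the mutation itself reached the cluster.
     */
    static Outcome attempt(Callable<?> durableWrite, Callable<Boolean> documentExists) {
        try {
            durableWrite.call();
            return Outcome.CONFIRMED;
        } catch (Exception e) {
            if (!isTimeout(e)) {
                return Outcome.FAILED; // some other error, not a durability timeout
            }
        }
        try {
            // The write timed out: check whether the document is readable anyway.
            return Boolean.TRUE.equals(documentExists.call())
                    ? Outcome.WRITTEN_BUT_UNCONFIRMED
                    : Outcome.FAILED;
        } catch (Exception e) {
            return Outcome.FAILED;
        }
    }
}
```

With the SDK, the first callable could wrap the bucket.upsert(...) call from the original post and the second could be something like () -> bucket.get(id) != null. Note that a readable document only shows the mutation was applied; it does not prove the persistence or replication requirement was met.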

Thanks
Simon


#3

Thank you for the quick turnaround. I will verify and let you know.


#4

Hi @simonbasle ,
I tried 2.1.4, and it looks like the issue still exists with the 2.1.4 SDK.


#5

Please let me know if you need more information.
What are the primary causes of this if there is no bug in the SDK or the server?
We have a 4-node cluster with 2 cores and 8 GB RAM per node.
We have 1 bucket with 8 GB of total cluster RAM (2 GB * 4).
Swap is disabled. We use views very heavily.
We can see from the Couchbase console that indexing is always running.


#6

Sorry for the late replies :confused:

Since you’ve upgraded and you’re apparently not hitting the bug we had fixed, what would be great is to have a detailed log of both the bootstrap of the SDK and the point where the upsert just hangs.

Can you reproduce this issue easily enough?

By detailed logs, I mean at TRACE level, which will log all packets the SDK sends and receives. See the documentation here on how to activate logs and change levels.
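
As a rough sketch, assuming no SLF4J or Log4J binding is on the classpath so that the SDK falls back to java.util.logging, TRACE output could be enabled as shown below. The class and method names are hypothetical; the logger name "com.couchbase.client" and the TRACE-to-FINEST mapping are my understanding of the 2.x SDK, so treat them as assumptions and check them against the documentation.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of the Couchbase SDK.
final class SdkTraceLogging {

    // Keep a strong reference: java.util.logging holds loggers weakly, so a
    // level set on an unreferenced logger can be lost after garbage collection.
    private static final Logger SDK_LOGGER = Logger.getLogger("com.couchbase.client");

    /** Raises the SDK logger and the root console handlers to FINEST. */
    static void enable() {
        SDK_LOGGER.setLevel(Level.FINEST);
        // Handlers filter records too, so the console handler must also allow FINEST.
        for (Handler handler : Logger.getLogger("").getHandlers()) {
            if (handler instanceof ConsoleHandler) {
                handler.setLevel(Level.FINEST);
            }
        }
    }
}
```

Calling SdkTraceLogging.enable() before creating the cluster object should capture the bootstrap as well as the hanging upsert. If the application uses SLF4J or Log4J instead, the level would be set in that framework's configuration.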

If you don’t want to share those logs publicly here, store them somewhere (gist, S3, Dropbox, pastebin, etc.) and send me the link via PM.

Thanks
Simon