Java Client 2.3.1 : N1QL queries throwing TimeoutException [upgraded to Couchbase 4.5]

We are experiencing this issue on our production server.
N1QL queries issued through our Java application are throwing a TimeoutException.
Here is what I have checked so far:

  • All bucket.get(id) functionality is working, i.e. key/value lookup by document ID is working
  • bucket.insert and bucket.update are working fine
  • We have primary indexes in place on each bucket (2 in total)
  • N1QL queries work through the cbq command-line tool
  • There is no such issue on our staging server, which has the same configuration
  • The server is up (as shown in the Couchbase web client on port 8091)
  • The web client shows a warning, which I don't think should be a problem: "Fail Over Warning: At least two servers with the data service are required to provide replication!"

Exception details :

    13:55:26 [http-bio-80-exec-64]  ERROR ib.cms.admin.controller.CmsExceptionHandler - java.util.concurrent.TimeoutException
    java.lang.RuntimeException: java.util.concurrent.TimeoutException
        at com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:71)
        at com.couchbase.client.java.CouchbaseBucket.query(CouchbaseBucket.java:652)
        at com.couchbase.client.java.CouchbaseBucket.query(CouchbaseBucket.java:643)
        at com.couchbase.client.java.CouchbaseBucket.query(CouchbaseBucket.java:572)
        at ib.cms.dao.CouchbaseDao.execute(CouchbaseDao.java:355)
        at ib.cms.dao.CouchbaseDao.getEntities(CouchbaseDao.java:331)
        at ib.cms.admin.dao.impl.CategoryDaoImpl.findRootCategories(CategoryDaoImpl.java:20)
        at ib.cms.admin.service.impl.CategoryServiceImpl.findRootCategories(CategoryServiceImpl.java:183)
        at ib.cms.admin.controller.CategoryController.getRootCategories(CategoryController.java:114)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:221)
        at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:137)
        at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:110)
        at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:806)
        at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:729)
        at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
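
The top frames of the trace show the blocking bucket.query(...) call waiting on an asynchronous result and surfacing java.util.concurrent.TimeoutException wrapped in a RuntimeException. The following is a minimal, stdlib-only sketch of that pattern, not the SDK's actual code: `blockFor` and `causedByTimeout` are hypothetical helpers, and a `CompletableFuture` that never completes stands in for a query whose response never arrives. It may help when writing an exception handler that needs to tell a timeout apart from other failures.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BlockingTimeoutSketch {

    // Hypothetical helper: wait for an async result and rethrow a timeout
    // as an unchecked exception, mirroring the shape seen in the trace.
    static <T> T blockFor(CompletableFuture<T> pending, long timeout, TimeUnit unit) {
        try {
            return pending.get(timeout, unit);
        } catch (TimeoutException e) {
            throw new RuntimeException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        }
    }

    // Walk the cause chain to check whether a failure was really a timeout.
    static boolean causedByTimeout(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof TimeoutException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in for a query whose response never arrives.
        CompletableFuture<String> neverCompletes = new CompletableFuture<>();
        try {
            blockFor(neverCompletes, 100, TimeUnit.MILLISECONDS);
        } catch (RuntimeException e) {
            System.out.println("timed out: " + causedByTimeout(e));
        }
    }
}
```

The point of the sketch is that the exception only says the caller stopped waiting; it does not say whether the request was slow, lost, or never sent.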

Hi @parasdiwan, a few questions:

  • which version of the SDK do you use?
  • what does the query that times out look like?
  • when you execute the same queries in the cbq tool, what are the execution times like?

Java SDK version - 2.3.1
Query - SELECT meta().id, * FROM cms WHERE _type = "Category" AND _active = TRUE AND parent IS NOT VALUED
Time taken in the cbq tool:

"metrics": {
    "elapsedTime": "9.551233ms",
    "executionTime": "9.505874ms",
    "resultCount": 0,
    "resultSize": 0
}

We’ve just cleaned the database, so there is not much data right now.

The problem disappears when the Tomcat server is restarted and returns after 15-20 minutes.

I think @mlblount45 is also facing the same issue - N1QL queries timing out when run in java sdk

Could this be a problem of thread choking or thread pool connections in any way?
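
One way to check the thread question from inside the application is to count live threads by state and look for a pile-up of waiting or blocked workers. This is only a sketch under assumptions: `countByState` is a hypothetical helper, the `http-bio-` prefix is taken from the Tomcat worker name in the log above, and the `cb-` prefix for the SDK's own threads is an assumption to verify against a real thread dump. It has to run inside the same JVM as the web application (for example from a diagnostic endpoint); running `jstack` against the Tomcat process gives the same information from outside.

```java
import java.util.Map;
import java.util.TreeMap;

public class ThreadStateSummary {

    // Count live threads whose name starts with the given prefix, grouped by state.
    static Map<Thread.State, Integer> countByState(String namePrefix) {
        Map<Thread.State, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().startsWith(namePrefix)) {
                counts.merge(t.getState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Tomcat request workers, as named in the log line above.
        System.out.println("http-bio-: " + countByState("http-bio-"));
        // Assumed prefix for the SDK's threads; confirm it in a thread dump.
        System.out.println("cb-: " + countByState("cb-"));
    }
}
```

If most request workers sit in WAITING or TIMED_WAITING inside the query call while the SDK's threads are idle, that points at responses not being delivered rather than at pool exhaustion.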

Hello @parasdiwan,

Yes, I resolved this one: I downgraded to version 2.2.7 of the SDK, and that fixed it for me. From my research, it appears that Couchbase has a race condition in the SDK source code. It was first discovered in version 2.1 (I believe), and I believe it was only affecting Windows machines and AWS servers. It was resolved, but now it looks like it is back in version 2.3.1. @simonbasle, please shed some light wherever you see fit.

hey @mlblount45, do you have more specifics on that pre-existing issue (like a commit, or JIRA ticket reference)? I’m not sure what you are referring to…

Thanks @mlblount45,
I will try that. Did you try it with version 2.3.0? It has the subdocument feature, which we are using right now.

No, I didn’t try with 2.3, but give it a shot and let me know the results.

It’s not working with 2.3.0. Moving to 2.2.7.

I had a similar issue, also resolved by moving back to 2.2.7: Java Client, v2.3.1, random KeepAliveRequest aborted?

@siriousje, @parasdiwan, @mlblount45,
to determine the root cause of this, you should first answer one of @simonbasle’s questions: what does the query that times out look like?
I suppose you each have slightly different queries, but it’s the only thing that could help.
Oh, sorry @parasdiwan, I missed yours:

SELECT meta().id, * FROM cms WHERE _type = "Category" AND _active = TRUE AND parent IS NOT VALUED

@egrep @simonbasle, if I recall correctly, once Couchbase got into this bad state it didn’t matter what query I sent; it would always time out. But here is one of my examples:

select location.id, trigger 
from read as location 
unnest location.trigger 
where location.trigger is not missing 
and location.id is not missing 
and location.docType = 'locationV2' 
and location.areaId = '124gff';  

Hopefully the query alone can be of some value; I will not be able to provide the actual document, as it contains sensitive information.

@parasdiwan did 2.2.7 do the trick for you?

@simonbasle,
just out of interest, what happens to the “potentially hanged / delayed” observable after this point?

@mlblount45 yes, 2.2.7 is working just fine. Thanks for your help.

@parasdiwan :thumbsup:

This problem is not query-specific. The TimeoutException occurred for every N1QL query.
The query mentioned above is just an example.

ok, good to know…

@parasdiwan @mlblount45 does it affect other types of payload that you guys know of? (key/value, views)

Note that 2.2.6 and 2.2.7 have a bug with keepalive, where keepalive requests are no longer sent (due to a Netty configuration change, the idleness of the socket is no longer detected).

@simonbasle

  1. Sorry, I can’t confirm; I didn’t try any of these while experiencing the timeouts.
  2. So would you recommend going to version 2.2.8? Are there any large defects there? (Hopefully this timeout issue isn’t there.)
  3. In response to your comment a few days ago, “hey @mlblount45, do you have more specifics on that pre-existing issue”: https://issues.couchbase.com/browse/JCBC-692

Same problem here with a Spring Boot application using spring-data-couchbase version 2.0.0.RELEASE (which uses java-client 2.2.3). Blocking.blockForSingle is throwing a RuntimeException for a parameterized N1QL query. The same query performs well with cbq. The strange thing is that the application was working for a while and is still working on other machines.
After updating java-client to version 2.2.8, the problem seems to be fixed.
The problem can be reproduced by switching back and forth between java-client versions 2.2.3 and 2.2.8.