Hi,
I’m having a hard time tracking down the cause of a TimeoutException. I have a WAR file which runs several batch jobs internally. Those batch jobs are managed by a Spring scheduler.
Java Client version: 2.1.4
Couchbase Server: 3.0.1 Community (single EC2 instance, T2.m)
This is the kind of exception I’m seeing each time the batch job is triggered:
ERROR c.g.c.s.spring.SchedulerConfig - Error in scheduled task java.lang.RuntimeException: java.util.concurrent.TimeoutException
at rx.observables.BlockingObservable.blockForSingle(BlockingObservable.java:472)
at rx.observables.BlockingObservable.single(BlockingObservable.java:341)
at com.grinstone.design.crud.dao.vu.VirtualUserService.findDescriptionByProjectId(VirtualUserService.java:98)
at com.grinstone.design.cleanup.task.BlobResourcesCleanup.call(BlobResourcesCleanup.java:82)
at com.grinstone.design.cleanup.task.BlobResourcesCleanup.call(BlobResourcesCleanup.java:45)
Caused by: java.util.concurrent.TimeoutException: null
at rx.internal.operators.OperatorTimeoutBase$TimeoutSubscriber.onTimeout(OperatorTimeoutBase.java:169)
at rx.internal.operators.OperatorTimeout$1$1.call(OperatorTimeout.java:42)
at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
at rx.schedulers.ExecutorScheduler$ExecutorSchedulerWorker.run(ExecutorScheduler.java:98)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
I’m also seeing:
2015/07/26 09:26:16 [ip-10-0-0-147] [taskScheduler-2] ERROR c.g.c.s.spring.SchedulerConfig - Error in scheduled task java.lang.IllegalStateException: The Content of this Observable is already released. Subscribe earlier or tune the CouchbaseEnvironment#autoreleaseAfter() setting.
at com.couchbase.client.core.utils.UnicastAutoReleaseSubject$OnSubscribeAction.call(UnicastAutoReleaseSubject.java:230)
at com.couchbase.client.core.utils.UnicastAutoReleaseSubject$OnSubscribeAction.call(UnicastAutoReleaseSubject.java:202)
at rx.Observable$1.call(Observable.java:145)
at rx.Observable$1.call(Observable.java:137)
at rx.Observable$1.call(Observable.java:145)
2015/07/26 09:26:16 [ip-10-0-0-147] [taskScheduler-1] ERROR c.g.c.s.spring.SchedulerConfig - Error in scheduled task java.lang.RuntimeException: java.util.concurrent.TimeoutException
at rx.observables.BlockingObservable.blockForSingle(BlockingObservable.java:472)
at rx.observables.BlockingObservable.firstOrDefault(BlockingObservable.java:198)
at ...
This WAR is running on an EC2 t2.micro instance with 1 CPU / 1 GB RAM. Other batch jobs inside the same WAR are running fine; only some of the batch jobs are failing.
Here is a code example of what a job looks like:
public void run() {
    final long now = System.currentTimeMillis();
    users
        .asyncFindAll()
        .filter(user -> now - user.getLastLoginTimestamp() >= 86400L)
        .map(User::getId)
        .filter(indices::exists)
        .toBlocking()
        .forEach(indices::close);
}
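To make the intent of the filter explicit: the job should only touch users who have not logged in for at least a day. The sketch below isolates that check. It assumes getLastLoginTimestamp() returns epoch milliseconds (the unit of System.currentTimeMillis()), in which case a one-day threshold would be TimeUnit.DAYS.toMillis(1) rather than 86400L. InactivityCheck and isInactive are illustrative names, not part of the real code base, and as far as I can tell this has no bearing on the timeout itself.

```java
import java.util.concurrent.TimeUnit;

class InactivityCheck {
    // One day expressed in milliseconds, the unit of System.currentTimeMillis().
    static final long ONE_DAY_MILLIS = TimeUnit.DAYS.toMillis(1);

    // Assumption: lastLoginMillis is an epoch timestamp in milliseconds.
    static boolean isInactive(long nowMillis, long lastLoginMillis) {
        return nowMillis - lastLoginMillis >= ONE_DAY_MILLIS;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Last login 25 hours ago: past the one-day threshold.
        System.out.println(isInactive(now, now - TimeUnit.HOURS.toMillis(25)));
        // Last login 2 minutes ago: well within the threshold.
        System.out.println(isInactive(now, now - TimeUnit.MINUTES.toMillis(2)));
    }
}
```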
The underlying users.asyncFindAll() operation does a view query (with reduce=false):
public Observable<T> queryView(final ViewQuery query) {
    return bucket.query(query)
        .flatMap(AsyncViewResult::rows)
        .flatMap(row -> row.document(RawJsonDocument.class))
        .map(deserializer)
        .timeout(timeout, unit);
}
Nothing special there, and it still doesn’t work.
What I already tried:
- Checked that the Couchbase views are correctly created, which is the case. Querying the views via the web UI works without any issue, and querying the database through the REST API from the EC2 batch work instance also succeeds,
- Tuned the number of threads in the IO / computation pools; I thought it might be a deadlock, but each pool already has a minimum size of 3 threads,
- Tuned the RxJava computation pool through an RxJavaPlugins hook: it didn’t change anything,
- Ran this WAR on my local machine (4 cores): I could not reproduce the issue with a local Couchbase database,
- Tried to understand why only specific batches fail whereas others in the same WAR using the same Couchbase connection don’t; I could not figure it out,
- Switched to a blocking observable to ensure the scheduled task waits until the observable is exhausted,
- Increased the Observable timeout from 30 sec to 5 min, but it didn’t change anything,
- Increased the autorelease timeout, without success,
- Profiled the Java app for a possible deadlock: neither YourKit nor I found anything,
- Ran the failing batches every 2 min instead of every day to see if the error is the same, and it is,
- Checked JVM health with JVisualVM, sampling CPU and memory, but nothing interesting showed up.
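The deadlock check from the list above can also be done from inside the application, for instance in the catch block that logs the TimeoutException, so that the dump is taken at the moment of failure. Below is a minimal sketch using only the standard java.lang.management API. ThreadDiagnostics and its method names are hypothetical and not part of the WAR. Note that findDeadlockedThreads() only reports lock cycles, so a thread that is merely starved or parked waiting on an Observable would not count as deadlocked and would only be visible in the dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class ThreadDiagnostics {
    // Returns the number of threads the JVM reports as deadlocked (0 if none).
    static int countDeadlockedThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock is detected
        return ids == null ? 0 : ids.length;
    }

    // Builds a textual dump of all live threads: name, state and stack frames.
    static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + countDeadlockedThreads());
        System.out.print(dumpAllThreads());
    }
}
```

Calling dumpAllThreads() from the scheduler's error handler would show what the taskScheduler, IO and computation threads are doing when the timeout fires.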
Note that:
- The connection to the database is working fine: other batches are running fine inside the same WAR, and they connect to the database too without any error.
- The abstraction layer above the Couchbase client is also successfully used in another WAR without any issue.
I’m absolutely clueless now on how to fix this issue. I would really appreciate some help finding out what I can do to debug this and find the root cause. We can even set up a meeting with a profiler with the Couchbase team if it helps.
Thanks in advance for your time.