The v2 of the java client leaks RxJava threads on shutdown

looks like the SDK-managed threads are not closed (but RxJava threads are) :frowning:

in @farrault’s case cb-io-xxx and cb-core-xxx were already stopping gracefully before the 2.1.4 patch and cb-computation-xxx were also stopped after the patch.

can you both share your platform (which container is used, etc…) and how you build you couchbase environment/cluster and the method you use to shut it down, for comparison?

@farrault could you also test the behavior with RxJava 1.0.15-SNAPSHOT seeing if it resolves the issue in you case?

can you both share your platform (which container is used, etc…) and how you build you couchbase environment/cluster and the method you use to shut it down, for comparison?

Platform:
Ubuntu 14.04.3 LTS (GNU/Linux 3.13.0-62-generic x86_64)
java version "1.8.0_51"
Java™ SE Runtime Environment (build 1.8.0_51-b16)
Java HotSpot™ 64-Bit Server VM (build 25.51-b03, mixed mode)
Tomcat7
rxjava-1.0.15-SNAPSHOT.jar
couchbase-core-io-1.1.4.jar
couchbase-java-client-2.1.4.jar

env = DefaultCouchbaseEnvironment //create
					.builder()
					.kvTimeout(CB_KEY_VALUE_TIMEOUT) // 20000
					.connectTimeout(CB_CONNECT_TIMEOUT) //20000
					.disconnectTimeout(CB_DISCONNECT_TIMEOUT) //200000
					.build();
                               				
			couchCluster = CouchbaseCluster.create(env, argNodes);


//shutdown
bucket.close();
Cluster.disconnect();
Schedulers.shutdown();
CouchbaseEnvironment.shutdown();

@mcarvalho are you using a framework like Spring or something similar, where you call the shutdown code in a special method/hook?

@mcarvalho are you using a framework like Spring or something similar, …
@simonbasle Actually not. Our stack is basically based on servlets 3.0 for some components and struts 1 for some others. We don’t run under any container like Spring or similar products.

where you call the shutdown code in a special method/hook?
I have a simples class who implements ServletContextListener and these shutdown process occur inside

public void contextDestroyed(ServletContextEvent arg0)

Let me know if I can help with more informations.
Regards,
Mauricio

I’ve reopened another ticket, https://issues.couchbase.com/browse/JVMCBC-251, because if found room for improvement.

There is a slight subtility with shutdown though: CouchbaseEnvironment.shutdown() returns an Observable<Boolean> and as such it needs to be subscribed. So your code should be:

//shutdown
//bucket.close(); //this will be called when disconnecting the cluster
cluster.disconnect();
//note: as soon as the env was created by you, you must call shutdown() on it
//here we trigger subscription and wait for termination by blocking on the Observable
couchbaseEnvironment.shutdown().toBlocking().single();
Schedulers.shutdown(); //reordered, last

I think that with this modification, things should be far better.

Improvements on that front have been submitted to master.

:arrow_right_hook: prefer using upcoming 2.2.1 release (snapshot can be built from current master) and upgrade to RxJava 1.0.15 as soon as it comes out in order to be able to call rx.Schedulers.shutdown() :slight_smile:

:warning: don’t forget to call toBlocking().single() (or at least subscribe()) after an environment.shutdown()

Note: The improvements have been partially backported to release11 branch for inclusion in the upcoming 2.1.5 release, but improvements for RxJava and Netty threads couldn’t be backported, so it’s mainly clarity of code, integration tests and logs that have been backported.

Really god! We’re in the right way.
I’ve set the last cb dependencies based on master branch and the last memory leak is this threadDeathWatcher:

13:42:04.218 [cb-io-1-4] DEBUG c.c.c.d.i.n.buffer.PoolThreadCache - Freed 21 thread-local buffer(s) from thread: cb-io-1-4
13:42:04.218 [cb-io-1-3] DEBUG c.c.c.d.i.n.buffer.PoolThreadCache - Freed 10 thread-local buffer(s) from thread: cb-io-1-3
13:42:04.218 [cb-io-1-2] DEBUG c.c.c.d.i.n.buffer.PoolThreadCache - Freed 16 thread-local buffer(s) from thread: cb-io-1-2
13:42:04.219 [cb-io-1-1] DEBUG c.c.c.d.i.n.buffer.PoolThreadCache - Freed 13 thread-local buffer(s) from thread: cb-io-1-1
SEVERE: The web application [/mauricio] appears to have started a thread named [threadDeathWatcher-4-1] but has failed to stop it. This is very likely to create a memory leak.

Note: The improvements have been partially backported to release11 branch for inclusion in the upcoming 2.1.5 release, but improvements for RxJava and Netty threads couldn’t be backported, so it’s mainly clarity of code, integration tests and logs that have been backported.

Any specific reason of why Netty threads couldn’t be backported?

Thanks,
Mauricio

Just for knowledge, after build java client from master branch and use the 2.2.1 release, I started to get this error during some operations:

Exception in thread "cb-computations-4" java.lang.IllegalStateException: Fatal Exception thrown on Scheduler.Worker thread.
at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:62)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.NoSuchMethodError: com.couchbase.client.core.message.kv.UpsertResponse.mutationToken()Lcom/couchbase/client/core/message/kv/MutationToken;
at com.couchbase.client.java.CouchbaseAsyncBucket$16.call(CouchbaseAsyncBucket.java:501)
at com.couchbase.client.java.CouchbaseAsyncBucket$16.call(CouchbaseAsyncBucket.java:493)
at rx.internal.operators.OperatorMap$1.onNext(OperatorMap.java:54)
at rx.observers.Subscribers$5.onNext(Subscribers.java:234)
at rx.subjects.SubjectSubscriptionManager$SubjectObserver.onNext(SubjectSubscriptionManager.java:222)
at rx.subjects.AsyncSubject.onCompleted(AsyncSubject.java:101)
at com.couchbase.client.core.endpoint.AbstractGenericHandler$1.call(AbstractGenericHandler.java:199)
at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
… 7 more

@mcarvalho it looks like there is something wrong with your classpath, mutation tokens have been added in 2.2.0 / 1.2.0 and for some reason they are not found.

mmh did you also build core-io project (couchbase-jvm-core) from master? and if so, can you make sure the core-io version and the corresponding dependency match in both projects pom.xml?

I’ve imagined it, that’s the point! I didn’t clone the jvm-core project.
Thank you for the advice.

Hi,
Facing issue with leaking rx threads, we have managed to eliminate the issue by running 2.1.4 with RXJava snapshot(1.0.15). and call rx.Schedulers.shutdown().

Is there any open ticket within couchbase regarding this issue, eg. updating the RxJava version as soon as it’s available?

yes there was a ticket for the threads leaking, see post #31 above…
we’ll upgrade to RxJava 1.0.15 (or above) in the 2.2.x bugfix release that follows the RxJava release, hopefully next one.

Thanks for the fast response.

We notice the ticket, but where confused the ticket where closed and fix version not set to 2.x.

yeah there was also a kind of “parent” change https://issues.couchbase.com/browse/JCBC-773
the 251 change is in the core project, so it has core-io numbering scheme… it also only targets SDK threads…

note that you don’t necessary have to wait for us to bump the minimum dependency of the SDK to RxJava 1.0.15 once it is out (this may take some time), you can use it explicitly as soon as it is officially out.

1 Like

@simonbasle @daschl Thank for helping until here, I really appreciate it.

After these advices, lib updates and shutdown changes, I’m able to not see any Thread lock error anymore although after disconnect from cb cluster and shutdown the env + schedulers, the cb pooling system reconnect in a blink and after that maintain pooling active even after the application shutdown at all, in some way it still there pooling and checking the buckets, creating a memory leak :grimacing: .

                couchCluster.disconnect();
	  	env.shutdown().toBlocking().single();
	  	Schedulers.shutdown();

and after that I get the cb trying to reconfigure the buckets:

18:08:57.835 [http-bio-8080-exec-5] DEBUG c.c.c.c.config.ConfigurationProvider - Closing all open buckets
18:08:57.850 [cb-computations-3] DEBUG c.c.c.c.config.ConfigurationProvider - Closing bucket ordering
18:08:57.853 [cb-computations-3] DEBUG c.c.c.c.config.ConfigurationProvider - Removing bucket ordering configuration from known configs.
18:08:57.853 [cb-computations-3] DEBUG c.c.client.core.RequestHandler - Got notified of a new configuration arriving.
18:08:57.854 [cb-computations-3] DEBUG c.c.client.core.RequestHandler - Starting reconfiguration.
18:08:57.854 [cb-computations-3] DEBUG c.c.client.core.RequestHandler - Starting reconfiguration for bucket databank-revisions
18:08:57.875 [cb-computations-3] DEBUG c.c.client.core.RequestHandler - Got notified of a new configuration arriving.
18:08:57.875 [cb-computations-3] DEBUG c.c.client.core.RequestHandler - Starting reconfiguration.
18:08:57.875 [cb-computations-3] DEBUG c.c.client.core.RequestHandler - Starting reconfiguration for bucket databank
18:08:57.875 [cb-computations-3] DEBUG c.c.client.core.RequestHandler - Node my-domain/208.94.39.167 already registered, skipping.

All buckets are live again at this point:

18:08:59.747 [Thread-65] INFO  c.c.client.core.env.CoreEnvironment - Shutdown Netty: success 
18:08:59.753 [http-bio-8080-exec-5] DEBUG c.lettersandarts.client.InitListener - --- Finished ROOT APP ---

Any tips, thoughts or ideas?

You should be able to easily reproduce it.
Any question LMK.

Regards,
Mauricio

Just for curious, I know this pooling system is something new included in the last version. Is possible to disable it before shutdown or for dev/test environments?

We are running in a cluster with 3 nodes, using 3 buckets each. it looks like below during application execution:

 11:48:33.623 [cb-core-3-1] DEBUG c.c.c.c.config.ConfigurationProvider - Received signal for outdated configuration.
11:48:33.623 [cb-core-3-1] DEBUG c.c.c.c.config.ConfigurationProvider - Received signal for outdated configuration.
11:48:33.638 [cb-computations-3] DEBUG c.c.c.c.config.ConfigurationProvider - New Bucket databank-revisions config proposed.
11:48:33.639 [cb-computations-4] DEBUG c.c.c.c.config.ConfigurationProvider - New Bucket ordering config proposed.
11:48:33.646 [cb-computations-1] DEBUG c.c.c.c.config.ConfigurationProvider - New Bucket ordering config proposed.
11:48:33.649 [cb-computations-3] DEBUG c.c.c.c.c.refresher.CarrierRefresher - Completed refreshing config for bucket "databank-revisions"
11:48:33.650 [cb-computations-3] DEBUG c.c.c.c.config.ConfigurationProvider - New Bucket databank-revisions config proposed.
11:48:33.650 [cb-computations-1] DEBUG c.c.c.c.c.refresher.CarrierRefresher - Completed refreshing config for bucket "ordering"
11:48:33.651 [cb-computations-3] DEBUG c.c.c.c.c.refresher.CarrierRefresher - Completed refreshing config for bucket "databank-revisions"
11:48:33.653 [cb-computations-2] DEBUG c.c.c.c.config.ConfigurationProvider - New Bucket ordering config proposed.
11:48:33.657 [cb-computations-3] DEBUG c.c.c.c.config.ConfigurationProvider - New Bucket databank config proposed.
11:48:33.657 [cb-computations-4] DEBUG c.c.c.c.c.refresher.CarrierRefresher - Completed refreshing config for bucket "ordering"
11:48:33.657 [cb-computations-1] DEBUG c.c.c.c.config.ConfigurationProvider - New Bucket databank config proposed.
11:48:33.660 [cb-computations-4] DEBUG c.c.c.c.config.ConfigurationProvider - New Bucket databank-revisions config proposed.
11:48:33.661 [cb-computations-2] DEBUG c.c.c.c.c.refresher.CarrierRefresher - Completed refreshing config for bucket "ordering"
11:48:33.661 [cb-computations-2] DEBUG c.c.c.c.config.ConfigurationProvider - New Bucket databank config proposed.
11:48:33.662 [cb-computations-1] DEBUG c.c.c.c.c.refresher.CarrierRefresher - Completed refreshing config for bucket "databank"
11:48:33.662 [cb-computations-4] DEBUG c.c.c.c.c.refresher.CarrierRefresher - Completed refreshing config for bucket "databank-revisions"
11:48:33.662 [cb-computations-3] DEBUG c.c.c.c.c.refresher.CarrierRefresher - Completed refreshing config for bucket "databank"
11:48:33.663 [cb-computations-2] DEBUG c.c.c.c.c.refresher.CarrierRefresher - Completed refreshing config for bucket "databank"

ha good catch, indeed the Refresher should stop polling once the environment has been shut down…
:arrow_right: JVMCBC-257

1 Like

Thank you @simonbasle. I’m waiting for the fix release version. :smile:

By the way, how can I assign for me and pull request some bug fixes?
Do you guys have any how-to for open source contribution to couchbase SDK?

I’m getting close to this amazing solution and I’m feeling confident to try some bug fixes.

Regards,
Mauricio

awesome @mcarvalho, I’ve continued the discussion about contributing in Contributing to the Java SDK