The v2 of the java client leaks RxJava threads on shutdown

We’ve seen the same behavior on an EC2 ElasticBeanstalk Tomcat. When deploying a new WAR, thread pools are not stopped properly. Using Java SDK 2.1.4 too.

In my case:

  • Before Java SDK 2.1.4, the “cb-io-xxxx” and “cb-core-xxxx” threads were correctly stopped when the Couchbase client was (explicitly) shut down.
  • With Java SDK 2.1.4, the “cb-computations-xxx” threads are correctly stopped too.

But all of this is unfortunately “useless” until RxJava provides a way to stop its internal threads (RxComputationThreadPool-xxxxx); see https://github.com/ReactiveX/RxJava/issues/1730

I made a very dirty hack to kill them, but in my case I still have a memory leak (it’s probably unrelated to the Couchbase SDK and RxJava, but I haven’t dug into it further yet):

private void forceRxJavaSchedulerShutdown() {

	logger.warn("FORCE RxJava Scheduler shutdown");
	
	Scheduler  els = Schedulers.computation();
	Object pool = getField(els, "pool");
	NewThreadWorker[] workers = (NewThreadWorker[]) getField(pool, "eventLoops");
	for(NewThreadWorker w : workers) {
		w.unsubscribe();
	}
	
}

You can register your own computation scheduler with RxJavaPlugins and then shut it down properly yourself, instead of relying on the built-in one, which cannot be stopped from the outside:

Register an RxJavaSchedulersHook and provide your own Scheduler based on an Executor, for example using:

Schedulers.from(Executor);
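
As a rough sketch of the executor side of that suggestion: the class below owns a thread pool that can be stopped explicitly at undeploy time. It uses only the JDK; the class name OwnedComputationPool and the thread name prefix "app-computation-" are made up for illustration. Wiring it into RxJava would, as far as I know for RxJava 1.x, mean returning Schedulers.from(pool.executor()) from getComputationScheduler() in your RxJavaSchedulersHook and registering the hook before any scheduler is first used; that part is left out here and should be checked against the RxJava version you run.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Owns the thread pool that would back a custom computation scheduler. */
final class OwnedComputationPool {
    private final ExecutorService executor;

    OwnedComputationPool(int threads) {
        final AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                // "app-computation-" is an illustrative name, not an RxJava convention
                Thread t = new Thread(r, "app-computation-" + counter.incrementAndGet());
                t.setDaemon(true);
                return t;
            }
        };
        this.executor = Executors.newFixedThreadPool(threads, factory);
    }

    /** The executor you would pass to Schedulers.from(...). */
    ExecutorService executor() {
        return executor;
    }

    /** Stops the pool; returns true if all threads ended within the timeout. */
    boolean shutdown(long timeoutMillis) {
        executor.shutdown(); // stop accepting work, let queued tasks finish
        try {
            if (executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
            executor.shutdownNow(); // interrupt tasks that are still running
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

In a servlet container you would call shutdown(...) from contextDestroyed, after the Couchbase cluster and environment have been shut down.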

@daschl I hope so. Here is my shutdown code.

public void contextDestroyed(ServletContextEvent arg0) {
	super.contextDestroyed(arg0);
	try {
		if (initializer != null) {
			initializer.interrupt();
		}

		if (aBucket != null) aBucket.close();
		if (bBucket != null) bBucket.close();
		if (cBucket != null) cBucket.close();
		if (couchCluster != null) couchCluster.disconnect();

		env.shutdown();

		aBucket = null;
		bBucket = null;
		cBucket = null;
		couchCluster = null;
		env = null;
	} catch (Exception ex) {
		ex.printStackTrace();
	}
}

Maybe it can be a temporary solution. Can you share the “getField” implementation?

I imagine you’re doing something like:

private Object getField(Object object, String field) {
	try {
		Class<? extends Object> clazz = object.getClass();
		Field[] fields = clazz.getFields();
		
		log.warn("Found " + fields.length + " fields");
		for (Field iField : fields) {
			log.warn("Found Field: " + iField.toString() + " - " + iField.getName());
		}
		
		if(Arrays.asList().contains(field)) return clazz.getField(field);
	} catch (NoSuchFieldException | SecurityException e) {
		e.printStackTrace();
	}
	return null;
}

Unfortunately, I was not able to find any fields on the Schedulers.computation() instance. Here is my test log:

06:17:55.808 [http-nio-8086-exec-8] WARN c.l.core.client.BaseInitListener - FORCE RxJava Scheduler shutdown
06:17:55.808 [http-nio-8086-exec-8] WARN c.l.core.client.BaseInitListener - Found 0 fields

Here is my calling code:

private void forceRxJavaSchedulerShutdown() {
	log.warn("FORCE RxJava Scheduler shutdown");
	Object pool = getField(Schedulers.computation(), "pool");
	for (NewThreadWorker w : (NewThreadWorker[]) getField(pool, "eventLoops")) {
		w.unsubscribe();
	}
}

Thanks for your advice. It would only be a temporary change, but it would help us for a while.
I’d really appreciate it if you could help me figure this out.
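
A note on the “Found 0 fields” result above: Class.getFields() only returns public fields, so private internals such as “pool” never show up. In the helper as posted, Arrays.asList() is also called with no arguments, so the contains check can never match, and clazz.getField(...) would return the Field object rather than its value. Below is a minimal sketch of a helper based on getDeclaredField, assuming the field names from the original hack still exist in your RxJava version (they are internals and may change). The Holder class is a hypothetical stand-in so that the example runs without RxJava.

```java
import java.lang.reflect.Field;

final class ReflectionFields {
    private ReflectionFields() {}

    /**
     * Reads a field by name, searching the object's class and then its superclasses.
     * Unlike Class.getFields(), getDeclaredField() also sees non-public fields.
     */
    static Object getField(Object target, String name) {
        for (Class<?> c = target.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Field f = c.getDeclaredField(name);
                f.setAccessible(true); // bypass private/package-private access
                return f.get(target);  // the field's value, not the Field itself
            } catch (NoSuchFieldException e) {
                // not declared on this class, try the parent class
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + name, e);
            }
        }
        throw new IllegalArgumentException("No field '" + name + "' on " + target.getClass());
    }

    // Hypothetical stand-in for a library class with a private field.
    static final class Holder {
        private final String[] eventLoops = {"worker-1", "worker-2"};
    }

    public static void main(String[] args) {
        String[] loops = (String[]) getField(new Holder(), "eventLoops");
        System.out.println(loops.length + " workers found"); // prints "2 workers found"
    }
}
```

Failing loudly on a missing field, instead of returning null, makes it obvious when an RxJava upgrade has renamed the internals this hack depends on.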

Concerning RxJava, the latest PR for this issue is https://github.com/ReactiveX/RxJava/pull/3149 (so follow that one rather than any older ones).
Hopefully this will be merged in soon :hourglass_flowing_sand: :smile:

This PR, https://github.com/ReactiveX/RxJava/pull/3149, is merged! :smiley:

With that merged, these computation/RxJava threads should start stopping cleanly, shouldn’t they?

I’ll report my experience soon, once I switch from the latest release to rxjava-1.0.15-SNAPSHOT.jar

Thank you!

AFAIK there should be a shutdown() method to be called on Schedulers at the time you shut the environment down.

I think this is best left to the user’s care, since the application code could continue using RxJava even though the Cluster has been shut down.

Please report with your findings, hope this solves the issue once and for all :wink:

I should just call Schedulers.shutdown() before CouchbaseEnvironment.shutdown(), shouldn’t I?
If that is all there is to it, it didn’t work at all.

Basically my steps are:

bucket.close();
Cluster.disconnect();
Schedulers.shutdown();
CouchbaseEnvironment.shutdown()

Any tips?

I’ll be back if I have any news!

I think Schedulers.shutdown() should be called last, but yes, that should do the trick.
However, the Rx threads are stopped gracefully, so they should be given maybe a few hundred milliseconds to complete.

What did you observe exactly?

note: the CouchbaseEnvironment needs only be closed if you instantiated it yourself
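
One way to give the threads those few hundred milliseconds is to poll for them by name before returning from the undeploy hook. The sketch below uses only the JDK; the class name ThreadDrain and the 20 ms polling interval are arbitrary choices, and the prefix to wait for (for example “RxComputationThreadPool-” or “cb-”, as seen earlier in this thread) depends on your setup.

```java
import java.util.ArrayList;
import java.util.List;

final class ThreadDrain {
    private ThreadDrain() {}

    /** Names of live threads whose name starts with the given prefix. */
    static List<String> liveThreads(String prefix) {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && t.getName().startsWith(prefix)) {
                names.add(t.getName());
            }
        }
        return names;
    }

    /** Polls until no matching thread is alive or the timeout elapses. */
    static boolean awaitGone(String prefix, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!liveThreads(prefix).isEmpty()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out, some threads are still alive
            }
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        return true;
    }
}
```

For example, calling ThreadDrain.awaitGone("RxComputationThreadPool-", 500) right after Schedulers.shutdown() waits up to half a second, and liveThreads(...) can be logged if the wait times out.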

OK, here we go.

I swapped the order of cbenv.shutdown and Schedulers.shutdown, and we get the same behavior in either order.

The Rx threads are no longer reported as memory leaks, but we still have these threads from Couchbase:

9:21:22.022 [http-nio-8086-exec-25] DEBUG c.c.client.core.RequestHandler - Starting reconfiguration.
09:21:22.022 [cb-core-3-1] DEBUG c.c.c.c.config.ConfigurationProvider - Received signal for outdated configuration.
09:21:22.022 [cb-core-3-1] DEBUG c.c.c.c.config.ConfigurationProvider - Received signal for outdated configuration.
09:21:22.022 [http-nio-8086-exec-25] DEBUG c.c.client.core.RequestHandler - No node found in config, disconnecting all nodes.
09:21:22.024 [http-nio-8086-exec-25] DEBUG c.c.c.c.config.ConfigurationProvider - Closing all open buckets
Oct 07, 2015 9:21:22 AM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/mauricio] appears to have started a thread named [cb-computations-1] but has failed to stop it. This is very likely to create a memory leak.
Oct 07, 2015 9:21:22 AM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/mauricio] appears to have started a thread named [cb-computations-2] but has failed to stop it. This is very likely to create a memory leak.
Oct 07, 2015 9:21:22 AM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/mauricio] appears to have started a thread named [cb-computations-3] but has failed to stop it. This is very likely to create a memory leak.
Oct 07, 2015 9:21:22 AM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/mauricio] appears to have started a thread named [cb-computations-4] but has failed to stop it. This is very likely to create a memory leak.
Oct 07, 2015 9:21:22 AM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/mauricio] appears to have started a thread named [cb-io-1-1] but has failed to stop it. This is very likely to create a memory leak.
Oct 07, 2015 9:21:22 AM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/mauricio] appears to have started a thread named [cb-io-1-2] but has failed to stop it. This is very likely to create a memory leak.
Oct 07, 2015 9:21:22 AM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/mauricio] appears to have started a thread named [cb-io-1-3] but has failed to stop it. This is very likely to create a memory leak.
Oct 07, 2015 9:21:22 AM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/mauricio] appears to have started a thread named [threadDeathWatcher-4-1] but has failed to stop it. This is very likely to create a memory leak.
Oct 07, 2015 9:21:22 AM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/mauricio] appears to have started a thread named [cb-io-1-4] but has failed to stop it. This is very likely to create a memory leak.

looks like the SDK-managed threads are not closed (but RxJava threads are) :frowning:

in @farrault’s case cb-io-xxx and cb-core-xxx were already stopping gracefully before the 2.1.4 patch and cb-computation-xxx were also stopped after the patch.

can you both share your platform (which container is used, etc…) and how you build your couchbase environment/cluster and the method you use to shut it down, for comparison?

@farrault could you also test the behavior with RxJava 1.0.15-SNAPSHOT, to see if it resolves the issue in your case?

can you both share your platform (which container is used, etc…) and how you build your couchbase environment/cluster and the method you use to shut it down, for comparison?

Platform:
Ubuntu 14.04.3 LTS (GNU/Linux 3.13.0-62-generic x86_64)
java version "1.8.0_51"
Java™ SE Runtime Environment (build 1.8.0_51-b16)
Java HotSpot™ 64-Bit Server VM (build 25.51-b03, mixed mode)
Tomcat7
rxjava-1.0.15-SNAPSHOT.jar
couchbase-core-io-1.1.4.jar
couchbase-java-client-2.1.4.jar

//create
env = DefaultCouchbaseEnvironment
		.builder()
		.kvTimeout(CB_KEY_VALUE_TIMEOUT) // 20000
		.connectTimeout(CB_CONNECT_TIMEOUT) // 20000
		.disconnectTimeout(CB_DISCONNECT_TIMEOUT) // 200000
		.build();

couchCluster = CouchbaseCluster.create(env, argNodes);


//shutdown
bucket.close();
Cluster.disconnect();
Schedulers.shutdown();
CouchbaseEnvironment.shutdown();

@mcarvalho are you using a framework like Spring or something similar, where you call the shutdown code in a special method/hook?

@mcarvalho are you using a framework like Spring or something similar, …
@simonbasle Actually not. Our stack is basically servlets 3.0 for some components and Struts 1 for others. We don’t run under any container like Spring or similar products.

where you call the shutdown code in a special method/hook?
I have a simple class that implements ServletContextListener, and this shutdown process happens inside

public void contextDestroyed(ServletContextEvent arg0)

Let me know if I can help with more information.
Regards,
Mauricio

I’ve reopened another ticket, https://issues.couchbase.com/browse/JVMCBC-251, because I found room for improvement.

There is a slight subtlety with shutdown, though: CouchbaseEnvironment.shutdown() returns an Observable<Boolean>, and as such it needs to be subscribed to. So your code should be:

//shutdown
//bucket.close(); //this will be called when disconnecting the cluster
cluster.disconnect();
//note: as soon as the env was created by you, you must call shutdown() on it
//here we trigger subscription and wait for termination by blocking on the Observable
couchbaseEnvironment.shutdown().toBlocking().single();
Schedulers.shutdown(); //reordered, last

I think that with this modification, things should be far better.

Improvements on that front have been submitted to master.

:arrow_right_hook: prefer using upcoming 2.2.1 release (snapshot can be built from current master) and upgrade to RxJava 1.0.15 as soon as it comes out in order to be able to call rx.Schedulers.shutdown() :slight_smile:

:warning: don’t forget to call toBlocking().single() (or at least subscribe()) after an environment.shutdown()

Note: The improvements have been partially backported to release11 branch for inclusion in the upcoming 2.1.5 release, but improvements for RxJava and Netty threads couldn’t be backported, so it’s mainly clarity of code, integration tests and logs that have been backported.

Really good! We’re on the right track.
I’ve built the latest Couchbase dependencies from the master branch, and the last remaining memory leak is this threadDeathWatcher:

13:42:04.218 [cb-io-1-4] DEBUG c.c.c.d.i.n.buffer.PoolThreadCache - Freed 21 thread-local buffer(s) from thread: cb-io-1-4
13:42:04.218 [cb-io-1-3] DEBUG c.c.c.d.i.n.buffer.PoolThreadCache - Freed 10 thread-local buffer(s) from thread: cb-io-1-3
13:42:04.218 [cb-io-1-2] DEBUG c.c.c.d.i.n.buffer.PoolThreadCache - Freed 16 thread-local buffer(s) from thread: cb-io-1-2
13:42:04.219 [cb-io-1-1] DEBUG c.c.c.d.i.n.buffer.PoolThreadCache - Freed 13 thread-local buffer(s) from thread: cb-io-1-1
SEVERE: The web application [/mauricio] appears to have started a thread named [threadDeathWatcher-4-1] but has failed to stop it. This is very likely to create a memory leak.

Note: The improvements have been partially backported to release11 branch for inclusion in the upcoming 2.1.5 release, but improvements for RxJava and Netty threads couldn’t be backported, so it’s mainly clarity of code, integration tests and logs that have been backported.

Is there any specific reason why the Netty thread improvements couldn’t be backported?

Thanks,
Mauricio

Just for the record: after building the Java client from the master branch and using the 2.2.1 release, I started to get this error during some operations:

Exception in thread "cb-computations-4" java.lang.IllegalStateException: Fatal Exception thrown on Scheduler.Worker thread.
at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:62)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.NoSuchMethodError: com.couchbase.client.core.message.kv.UpsertResponse.mutationToken()Lcom/couchbase/client/core/message/kv/MutationToken;
at com.couchbase.client.java.CouchbaseAsyncBucket$16.call(CouchbaseAsyncBucket.java:501)
at com.couchbase.client.java.CouchbaseAsyncBucket$16.call(CouchbaseAsyncBucket.java:493)
at rx.internal.operators.OperatorMap$1.onNext(OperatorMap.java:54)
at rx.observers.Subscribers$5.onNext(Subscribers.java:234)
at rx.subjects.SubjectSubscriptionManager$SubjectObserver.onNext(SubjectSubscriptionManager.java:222)
at rx.subjects.AsyncSubject.onCompleted(AsyncSubject.java:101)
at com.couchbase.client.core.endpoint.AbstractGenericHandler$1.call(AbstractGenericHandler.java:199)
at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
… 7 more

@mcarvalho it looks like there is something wrong with your classpath: mutation tokens were added in 2.2.0 / 1.2.0, and for some reason they are not found.
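
To check a suspected classpath mix-up like this, it can help to print which jar each class is actually loaded from. One possible cause, given the couchbase-core-io-1.1.4.jar listed earlier in the thread, is that an older core-io jar is still being picked up next to the newer Java client, but that is only a guess. The sketch below uses only the JDK; the class name ClassOrigin is made up, and the two class names passed to it are taken from the stack trace above.

```java
import java.security.CodeSource;

final class ClassOrigin {
    private ClassOrigin() {}

    /** Returns the location a class was loaded from, or a marker if unknown. */
    static String locate(String className) {
        try {
            // initialize=false: only load the class, do not run static initializers
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK bootstrap classes typically report no code source
            if (src == null || src.getLocation() == null) {
                return "<bootstrap or unknown>";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "<not on classpath>";
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack trace in this thread
        System.out.println(locate("com.couchbase.client.core.message.kv.UpsertResponse"));
        System.out.println(locate("com.couchbase.client.java.CouchbaseAsyncBucket"));
    }
}
```

Run it from inside the web application (for instance from the ServletContextListener) so that the web application's class loader is the one being inspected. If the two locations point at jars from different release lines, the NoSuchMethodError is explained.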