The v2 of the java client leaks RxJava threads on shutdown

(At first I could not log in to your JIRA, so I posted my problem here.
=> This morning I did manage to log in to JIRA (?!), so this issue is now also reported here:
https://issues.couchbase.com/browse/JCBC-773
)

We use the Couchbase client v2.1.3 in a webapp. In development we hot-redeploy this webapp, and we get memory issues, probably because the RxJava threads are not stopped when we close the Couchbase client.

The issue is easy to reproduce in a simple main class:
Just create the Couchbase client Cluster, connect to a Bucket, run a small query, then stop the Cluster (Cluster.shutdown).
If you then dump the threads, you will see that some RxJava threads are still alive:

In my case before Cluster shutdown, I have these threads :

  • 13 : cb-io-1-4
  • 25 : threadDeathWatcher-4-1
  • 12 : cb-io-1-3
  • 11 : cb-io-1-2
  • 10 : cb-io-1-1
  • 23 : cb-computations-4
  • 22 : cb-computations-3
  • 21 : cb-computations-2
  • 20 : cb-computations-1
  • 19 : cb-core-3-2
  • 18 : cb-core-3-1
  • 16 : RxComputationThreadPool-3
  • 15 : RxComputationThreadPool-2
  • 14 : RxComputationThreadPool-1

And after the shutdown, these threads are still there:

  • 25 : threadDeathWatcher-4-1
  • 23 : cb-computations-4
  • 22 : cb-computations-3
  • 21 : cb-computations-2
  • 20 : cb-computations-1
  • 16 : RxComputationThreadPool-3
  • 15 : RxComputationThreadPool-2
  • 14 : RxComputationThreadPool-1

(dumping threads with:
ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean(); // both from java.lang.management
ThreadInfo[] threadInfos = threadMXBean.getThreadInfo(threadMXBean.getAllThreadIds());
for (ThreadInfo ti : threadInfos) {
    System.out.printf(" - %s : %s%n", ti.getThreadId(), ti.getThreadName());
}
)
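
To make the before/after comparison above less manual, the dump can be turned into a small helper that snapshots thread ids and names and reports which threads appeared since a baseline. This is only a sketch built on the JDK's ThreadMXBean; the names ThreadSnapshot, capture and startedSince are made up for illustration and are not part of the SDK.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

public class ThreadSnapshot {

    /** Returns the currently live threads as an id -> name map, sorted by id. */
    public static Map<Long, String> capture() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        Map<Long, String> threads = new TreeMap<>();
        for (ThreadInfo info : bean.getThreadInfo(bean.getAllThreadIds())) {
            if (info != null) { // a thread may have died between the two calls
                threads.put(info.getThreadId(), info.getThreadName());
            }
        }
        return threads;
    }

    /** Returns the threads present in 'after' that were not present in 'baseline'. */
    public static Map<Long, String> startedSince(Map<Long, String> baseline,
                                                 Map<Long, String> after) {
        Map<Long, String> extra = new TreeMap<>(after);
        extra.keySet().removeAll(baseline.keySet());
        return extra;
    }
}
```

A possible usage: call capture() once before creating the Cluster, call it again after the shutdown, and pass both maps to startedSince(). Whatever comes back was started in between and is still alive, which in the case described above would be the cb-computations-x, RxComputationThreadPool-x and threadDeathWatcher threads.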

hi @farrault and thanks for the JIRA ticket!

I was able to reproduce the bug, and indeed the cb-computations-xxx threads are not terminated when the environment is shut down, which leads to a growing number of threads (e.g. when creating such environments and shutting them down in a loop).

The RxComputationThreadPool doesn’t seem to grow however, and it is managed by RxJava.

I’ll fix the pool managed by the SDK, but do keep in mind that if you provide scheduler(...) or ioPool(...) with custom instances, it will be your responsibility to clean them up correctly. A hook will be provided to allow such a custom shutdown action to be called by the CoreEnvironment at shutdown() time.

Keep an eye on the Jira ticket to track progress :wink:

Thanks for looking into this.

Please note that in my case, the RxComputationThreadPool-x threads do add up:
RxJava is inside my war, and a new war classloader is created at each deployment, causing a new “instance” of the RxJava infrastructure to be initialized.

What container do you use? I’m gonna assume Tomcat for the sake of my answer. I’d also like to know what your deployment looks like (how many webapps in the container, etc… both in dev and prod).

Isn’t that more of an issue with the hot-redeploy mechanism and class loader black magic though?

If only one war uses the SDK and/or RxJava, I guess you could put both jars outside your war and in the common library folder instead, but otherwise IIRC you open the way to leaking one of the webapps’ classloaders (see what Tomcat’s doc says on the matter here).

Are you absolutely certain the memory leak / errors come from thread starvation (including the RxComputation ones)? Have you considered a periodic clean restart of Tomcat?

The threads we manage should be cleaned up by the SDK when the environment is properly shut down, and the fact that their number continues to grow is a proper leak though (as it occurs in isolation).

I’m currently using JBoss but the problem will be the same with any servlet container.
I only have one war here.

There is no classloader “black magic” here, just classical web container classloading: a new child classloader instance per webapp (a new classloader being created on each redeployment).
(Maybe I misled you: I’m using a classical, full redeployment of a webapp without restarting the server, not JVM black magic such as JPDA hotswap or a fancy hot-deploy framework :wink: )

You are right: putting the RxJava jar in the server’s global classloader would indeed prevent the multiplication of RxJava internal threads on redeployment. (If the SDK correctly unregisters itself from the RxJava infrastructure on shutdown, I would be quite confident there is no memory leak caused by this, but bugs do exist sometimes.)
That being said, it is not my deployment scenario: RxJava is in my war.

Based on the analysis of a memory dump :

  • I’m certain my memory leak is caused by old classloaders not being garbage collected
  • SDK and/or RxJava threads (including RxComputationThreadPool) reference the webapp classloaders directly through their java.lang.Thread.contextClassLoader field,
    which prevents my classloaders from being GCed.
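
The mechanism behind the second point can be shown with the JDK alone: a new thread inherits the context class loader of the thread that creates it, so a pool thread first started from webapp code keeps a reference to the webapp classloader for as long as it lives. The sketch below is only an illustration of that inheritance; ContextClassLoaderDemo and loaderInheritedByNewThread are invented names, and the empty URLClassLoader merely stands in for a container's webapp classloader.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ContextClassLoaderDemo {

    /**
     * Creates a thread while 'creatorLoader' is the caller's context class loader
     * and returns the context class loader the new thread ended up with.
     */
    public static ClassLoader loaderInheritedByNewThread(ClassLoader creatorLoader) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(creatorLoader);
        try {
            // The context class loader is copied from the creating thread at construction time.
            Thread worker = new Thread(() -> { }, "pool-like-worker");
            return worker.getContextClassLoader();
        } finally {
            current.setContextClassLoader(previous); // restore the caller's loader
        }
    }

    public static void main(String[] args) {
        // Stand-in for the per-webapp classloader created by the container (assumption).
        ClassLoader webappLoader = new URLClassLoader(new URL[0]);
        ClassLoader inherited = loaderInheritedByNewThread(webappLoader);
        System.out.println("worker holds webapp loader: " + (inherited == webappLoader));
    }
}
```

As long as such a worker thread stays alive after undeployment, the loader it references stays strongly reachable, which matches what the memory dump showed.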

A clean server restart is currently the workaround, but it is manual and I would prefer a clean and easy redeployment procedure.

All of this to say that :
RxJava internal threads (RxComputationThreadPool-x apparently) also have to be stopped, to release their reference to the classloader that created them.

So in my situation, I need them to go away as well when I shut down the SDK.
You may need to let the SDK user tweak this behaviour: I understand that in some rare cases it may not be the desired behaviour.

I see that you’ve opened a ticket as well, which we have already started working on!

https://issues.couchbase.com/browse/JCBC-773

@simonbasle: I thought again about your suggestion to put RxJava in the global classloader, and I read the specific section of the Tomcat documentation you referenced more carefully.

You’re right, without specific support for this use case in RxJava, there would certainly still be a memory leak because of the contextClassLoader of the RxJava threads.

I would be interested to know if you find some documentation about this in RxJava during the evaluation of this topic for the SDK

@farrault there was a discussion around offering means of shutting down the threads in RxJava but it wasn’t implemented in the end. I’ll revive the discussion as this is relevant to your case.

the discussion has been revived here : https://github.com/ReactiveX/RxJava/issues/1730

Thanks @simonbasle for the information and for digging into the issue.
Regards

Hello,

Any ETA for a fix on this, as I am having the same problem (Java client 2.1.2)?

hi, this issue (at least the SDK side of it, that is all the threads directly pooled by the SDK and not RxJava) has a fix scheduled for 2.1.4. It should be released first week of July.

@simonbasle I’ve updated to 2.1.4 but I’m still facing the same problems. Any tips?

Updated to use:
core-io-1.1.4.jar and java-client-2.1.4.jar


Here are my errors:
Aug 06, 2015 7:55:55 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/mauricio] appears to have started a thread named [RxComputationThreadPool-1] but has failed to stop it. This is very likely to create a memory leak.
Aug 06, 2015 7:55:55 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/mauricio] appears to have started a thread named [RxComputationThreadPool-2] but has failed to stop it. This is very likely to create a memory leak.
Aug 06, 2015 7:55:55 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/mauricio] appears to have started a thread named [RxComputationThreadPool-3] but has failed to stop it. This is very likely to create a memory leak.
Aug 06, 2015 7:55:55 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/mauricio] appears to have started a thread named [cb-computations-1] but has failed to stop it. This is very likely to create a memory leak.
Aug 06, 2015 7:55:55 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/mauricio] appears to have started a thread named [cb-computations-2] but has failed to stop it. This is very likely to create a memory leak.
Aug 06, 2015 7:55:55 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/mauricio] appears to have started a thread named [cb-computations-3] but has failed to stop it. This is very likely to create a memory leak.
Aug 06, 2015 7:55:55 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/mauricio] appears to have started a thread named [cb-computations-4] but has failed to stop it. This is very likely to create a memory leak.
Aug 06, 2015 7:55:55 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/mauricio] appears to have started a thread named [cb-io-1-1] but has failed to stop it. This is very likely to create a memory leak.
Aug 06, 2015 7:55:55 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/mauricio] appears to have started a thread named [cb-io-1-2] but has failed to stop it. This is very likely to create a memory leak.
Aug 06, 2015 7:55:55 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/mauricio] appears to have started a thread named [cb-io-1-3] but has failed to stop it. This is very likely to create a memory leak.
Aug 06, 2015 7:55:55 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/mauricio] appears to have started a thread named [threadDeathWatcher-4-1] but has failed to stop it. This is very likely to create a memory leak.
Aug 06, 2015 7:55:55 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/mauricio] appears to have started a thread named [cb-io-1-4] but has failed to stop it. This is very likely to create a memory leak.
Aug 06, 2015 7:55:55 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/mauricio] appears to have started a thread named [Timer-18] but has failed to stop it. This is very likely to create a memory leak.
Aug 06, 2015 7:55:55 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads

Are you shutting down the cluster and the env properly? Can you paste the code to reproduce, ideally?

We’ve seen the same behavior on an EC2 ElasticBeanstalk Tomcat. When deploying a new WAR, thread pools are not stopped properly. Using Java SDK 2.1.4 too.

In my case :

  • Before Java SDK 2.1.4, “cb-io-xxxx” and “cb-core-xxxx” were correctly stopped when the Couchbase client was (explicitly) shut down.
  • With Java SDK 2.1.4, “cb-computations-xxx” are correctly stopped too.

But all this is unfortunately “useless” until RxJava provides a way to stop its internal threads (RxComputationThreadPool-xxxxx) (https://github.com/ReactiveX/RxJava/issues/1730)

I made a very dirty hack to kill them, but in my case I still have a memory leak (it’s probably unrelated to the Couchbase SDK and RxJava, but I haven’t dug into it further yet):

private void forceRxJavaSchedulerShutdown() {
	logger.warn("FORCE RxJava Scheduler shutdown");

	// Dirty hack: reaches into private fields of RxJava's computation scheduler,
	// so it depends on the internals of the RxJava version in use.
	Scheduler els = Schedulers.computation();
	Object pool = getField(els, "pool");
	NewThreadWorker[] workers = (NewThreadWorker[]) getField(pool, "eventLoops");
	for (NewThreadWorker w : workers) {
		w.unsubscribe();
	}
}

You can register your own computation scheduler with RxJavaPlugins, and then shut it down properly yourself, instead of relying on the built-in one, which cannot be stopped from the outside:

Register an RxJavaSchedulersHook and provide your own Scheduler based on an Executor, for example, using:

Schedulers.from(Executor);
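
The executor half of this suggestion could look roughly like the sketch below: the application owns a named thread pool, hands it to Schedulers.from(Executor) inside its scheduler hook, and stops it itself when the webapp is undeployed. Only the JDK part is shown, so it compiles without RxJava; the class name OwnedComputationPool, the thread name prefix and the timeout handling are assumptions made for illustration, not SDK or RxJava API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class OwnedComputationPool {

    private final ExecutorService executor;

    public OwnedComputationPool(int threads) {
        AtomicInteger counter = new AtomicInteger();
        // Named daemon threads, so they are easy to spot in a thread dump.
        ThreadFactory factory = runnable -> {
            Thread t = new Thread(runnable, "app-computation-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
        this.executor = Executors.newFixedThreadPool(threads, factory);
    }

    /** The executor to pass to Schedulers.from(Executor) in the scheduler hook. */
    public ExecutorService executor() {
        return executor;
    }

    /** Stops the pool; returns true if all threads terminated within the timeout. */
    public boolean shutdown(long timeout, TimeUnit unit) {
        executor.shutdown(); // already submitted tasks still run
        try {
            if (executor.awaitTermination(timeout, unit)) {
                return true;
            }
            executor.shutdownNow(); // interrupt whatever is still running
            return executor.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With something like this, contextDestroyed() would call shutdown(...) after the Couchbase environment has been shut down. As far as I know, the hook has to be registered before RxJava's Schedulers class is first used, so the registration would belong at the very start of the webapp's initialization.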

@daschl I hope so. Here is my shutdown code:

public void contextDestroyed(ServletContextEvent arg0) {
    super.contextDestroyed(arg0);
    try {
        if (initializer != null) {
            initializer.interrupt();
        }

        if (aBucket != null) aBucket.close();
        if (bBucket != null) bBucket.close();
        if (cBucket != null) cBucket.close();
        if (couchCluster != null) couchCluster.disconnect();

        env.shutdown();

        aBucket = null;
        bBucket = null;
        cBucket = null;
        couchCluster = null;
        env = null;

    } catch (Exception ex) {
        ex.printStackTrace();
    }
}

Maybe it can be a temporary solution. Can you share the “getField” implementation?

I imagine you’re doing something like:

private Object getField(Object object, String field) {
	try {
		Class<? extends Object> clazz = object.getClass();
		Field[] fields = clazz.getFields();
		
		log.warn("Found " + fields.length + " fields");
		for (Field iField : fields) {
			log.warn("Found Field: " + iField.toString() + " - " + iField.getName());
		}
		
		if(Arrays.asList().contains(field)) return clazz.getField(field);
	} catch (NoSuchFieldException | SecurityException e) {
		e.printStackTrace();
	}
	return null;
}

Unfortunately I was not able to find any fields on the Schedulers.computation() instance. Here is my test log:

06:17:55.808 [http-nio-8086-exec-8] WARN c.l.core.client.BaseInitListener - FORCE RxJava Scheduler shutdown
06:17:55.808 [http-nio-8086-exec-8] WARN c.l.core.client.BaseInitListener - Found 0 fields

Here is my calling code:

private void forceRxJavaSchedulerShutdown() {
	log.warn("FORCE RxJava Scheduler shutdown");
	Object pool = getField(Schedulers.computation(), "pool");
	for (NewThreadWorker w : (NewThreadWorker[]) getField(pool, "eventLoops")) {
		w.unsubscribe();
	}
}

Thanks for your advice. It should be a temporary change, but it will help us for a while.
I’d really appreciate it if you could help me figure this out.
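
One likely reason for the “Found 0 fields” log above: Class.getFields() only returns public fields, while the hack reads private ones, which need getDeclaredField() plus setAccessible(true). The original getField helper was not posted, so the version below is only a guess at what it might look like. PrivateFieldReader and DemoPool are invented for illustration, and whether “pool” and “eventLoops” exist depends on the RxJava version in use.

```java
import java.lang.reflect.Field;

public class PrivateFieldReader {

    /** Hypothetical stand-in for a class that keeps its workers in a private field. */
    public static class DemoPool {
        private final String[] eventLoops = {"worker-1", "worker-2"};
    }

    /**
     * Reads the value of a field by name, including private fields, searching the
     * class and its superclasses. Returns null if no such field is declared.
     */
    public static Object getField(Object target, String name) {
        for (Class<?> c = target.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Field field = c.getDeclaredField(name); // sees private fields, unlike getField()
                field.setAccessible(true);
                return field.get(target);               // the value, not the Field object
            } catch (NoSuchFieldException e) {
                // not declared on this class, try the superclass
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + name, e);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        DemoPool pool = new DemoPool();
        // getFields() lists public fields only, so the private one is invisible to it.
        System.out.println("public fields: " + DemoPool.class.getFields().length);
        String[] loops = (String[]) getField(pool, "eventLoops");
        System.out.println("private field entries: " + loops.length);
    }
}
```

Note that this returns the field's value, whereas the attempt above returns the Field object itself, which could not be cast to NewThreadWorker[] anyway.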

Concerning RxJava, the latest PR for this issue is https://github.com/ReactiveX/RxJava/pull/3149 (so monitor that one rather than any older ones).
Hopefully this will be merged in soon :hourglass_flowing_sand: :smile:
