Upgrading to latest netty dependency

Hi,

are there any plans to upgrade the netty dependency?
The one used by couchbase java-client is 4.0.56.Final, which is almost a year old.

The reason for asking is a potential memory leak that may have been fixed in a later version (see https://github.com/netty/netty/issues/6343).

A bit of context:
I’m querying a large data set using the async bucket API and streaming it to an HTTP response.

I see in logs the following warning:
ERROR util.ResourceLeakDetector - LEAK: ByteBuf.release() was not called before it’s garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.

which leads to the following error:
com.couchbase.client.deps.io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 3204448263, max: 3221225472) at

Any hints to address this issue would be greatly appreciated.

Thanks,
Alex

Hi Alex,

I’ve filed JCBC-1277 for upgrading to the latest Netty version. In the meantime, one thing to try would be to enable advanced leak detection, which will pinpoint the location of the leak. Try adding this to your Java command line:

-Dcom.couchbase.client.deps.io.netty.leakDetection.level=advanced

(Replace advanced with paranoid to monitor all buffers instead of just 1% of them.)

In the Couchbase Java SDK, the Netty system property names are prefixed with com.couchbase.client.deps. So for example if you want to tweak the io.netty.allocator.pageSize property, the name to use would be com.couchbase.client.deps.io.netty.allocator.pageSize.
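To make the prefixing rule concrete, here's a small standalone sketch of setting the repackaged property names programmatically instead of via `-D` flags. The property names come from the post above; the `shaded` helper is my own illustration, not part of the SDK:

```java
public class NettyPropertyNames {

    // Prefix the Couchbase Java SDK applies to its repackaged Netty properties.
    static final String PREFIX = "com.couchbase.client.deps.";

    // Maps a stock Netty system property name to its repackaged equivalent.
    static String shaded(String nettyProperty) {
        return PREFIX + nettyProperty;
    }

    public static void main(String[] args) {
        // Equivalent to passing -D flags on the command line. Note these must be
        // set before the SDK (and its shaded Netty classes) is first loaded.
        System.setProperty(shaded("io.netty.leakDetection.level"), "advanced");

        System.out.println(shaded("io.netty.allocator.pageSize"));
        // prints: com.couchbase.client.deps.io.netty.allocator.pageSize
    }
}
```

Setting the properties in code only works if it happens before any SDK class is initialized; otherwise the command-line flags are the safer option.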

Thanks,
David

Thanks David for the quick reply.

I have modified the VM args as suggested. Additionally, the following argument was added:
-Dcom.couchbase.client.deps.io.netty.noPreferDirect=true
to use heap instead of direct memory (this helps trigger the problem faster).

Below is a stacktrace which might help to identify the root cause of the leak:

05-12-2018 13:12:20.622 [cb-io-1-4] ERROR c.c.c.d.i.n.u.ResourceLeakDetector.error - LEAK: ByteBuf.release() was not called before it’s garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records:
Created at:
com.couchbase.client.deps.io.netty.buffer.AdvancedLeakAwareByteBuf.writeBytes(AdvancedLeakAwareByteBuf.java:572)
com.couchbase.client.deps.io.netty.buffer.PooledHeapByteBuf.copy(PooledHeapByteBuf.java:210)
com.couchbase.client.deps.io.netty.buffer.SlicedByteBuf.copy(SlicedByteBuf.java:181)
com.couchbase.client.deps.io.netty.buffer.AbstractByteBuf.copy(AbstractByteBuf.java:937)
com.couchbase.client.deps.io.netty.buffer.WrappedByteBuf.copy(WrappedByteBuf.java:699)
com.couchbase.client.deps.io.netty.buffer.AdvancedLeakAwareByteBuf.copy(AdvancedLeakAwareByteBuf.java:651)
com.couchbase.client.core.endpoint.view.ViewHandler.parseViewRows(ViewHandler.java:508)
com.couchbase.client.core.endpoint.view.ViewHandler.parseQueryResponse(ViewHandler.java:379)
com.couchbase.client.core.endpoint.view.ViewHandler.decodeResponse(ViewHandler.java:277)
com.couchbase.client.core.endpoint.view.ViewHandler.decodeResponse(ViewHandler.java:72)
com.couchbase.client.core.endpoint.AbstractGenericHandler.decode(AbstractGenericHandler.java:338)
com.couchbase.client.deps.io.netty.handler.codec.MessageToMessageCodec$2.decode(MessageToMessageCodec.java:81)
com.couchbase.client.deps.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
com.couchbase.client.deps.io.netty.handler.codec.MessageToMessageCodec.channelRead(MessageToMessageCodec.java:111)
com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
com.couchbase.client.deps.io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438)
com.couchbase.client.deps.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
com.couchbase.client.deps.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:299)
com.couchbase.client.deps.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:415)
com.couchbase.client.deps.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
com.couchbase.client.deps.io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253)
com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
com.couchbase.client.deps.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1304)
com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:921)
com.couchbase.client.deps.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:135)
com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:646)
com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:581)
com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460)
com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
com.couchbase.client.deps.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.lang.Thread.run(Thread.java:748)


A quick update:

I believe the issue could be related to the ViewHandler implementation, specifically to backpressure (slow consumer vs. fast producer). As far as I can see, when the ViewQueryResponse is instantiated, it uses the following view row observable:

viewRowObservable.onBackpressureBuffer().observeOn(scheduler),

Given a scenario where the queried view returns a large data set (about 1 million rows), how is it supposed to handle such a load without putting too much pressure on the consumer and eventually hitting an OOM error?
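To illustrate what I mean, here is a standalone sketch (plain java.util.concurrent, not SDK code) of a fast producer feeding a slow consumer through a bounded buffer. `put()` blocks when the buffer is full, so memory stays capped; by contrast, `onBackpressureBuffer()` with no capacity argument buffers without bound, which is where I suspect the memory goes:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedBackpressure {

    // Push totalRows items through a buffer of the given capacity and
    // return how many the consumer received.
    public static int run(int totalRows, int capacity) {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(capacity);

        Thread producer = new Thread(() -> {
            for (int i = 0; i < totalRows; i++) {
                try {
                    buffer.put(i); // blocks when full: producer is slowed down
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        producer.start();

        int consumed = 0;
        try {
            while (consumed < totalRows) {
                buffer.take(); // consumer drains at its own pace
                consumed++;
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed;
    }

    public static void main(String[] args) {
        // 10,000 rows flow through a 128-slot buffer without ever holding
        // more than 128 rows in memory at once.
        System.out.println(run(10_000, 128)); // prints 10000
    }
}
```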

Thanks,
Alex


Hi David,

Which version of the Couchbase Java client has the fix for the direct memory leak issue?

Hi @karki.geetanjali ,

As of Couchbase Java SDK 2.7.9, we’re still on Netty 4.0.56 (most recent version in the 4.0.x line). Couchbase Java SDK 3 (currently in development, to be released alongside Couchbase Server 6.5) will use the newest Netty.

In SDK 3 the query/view result handling is also much improved, which should address the OOM and backpressure issues present in SDK 2.x.

If you’d like to experiment with an alpha version of SDK 3, more info is here: https://docs.couchbase.com/java-sdk/3.0/hello-world/start-using-sdk.html

Thanks,
David

Is there a workaround for the OOM problem? We had an occurrence last week in our production environment. We were running a query that returned a huge payload, and I believe that is what caused the OOM. Do you know what we can do in the meantime to prevent this from happening again?

Hi,

we have found a workaround for the memory leak issue by using pagination, specifically the startkey / startkey_docid parameters.

The idea is to expose a method that returns an observable over a potentially unbounded number of rows. The returned observable paginates internally, keeping the caller agnostic of this implementation detail.

You can find our implementation here.
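The pagination idea can be sketched without any SDK dependency. In this simplified model the "view" is a sorted in-memory list standing in for the server, and the cursor plays the role that startkey / startkey_docid play in a real ViewQuery (the key of the last row of the previous page, excluded from the next one). The method names are illustrative, not from our actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class PaginatedFetch {

    // Fetch at most `limit` rows strictly after `afterKey` (null = start
    // from the beginning). A real implementation would issue a ViewQuery
    // per page instead of scanning a list.
    static List<String> fetchPage(List<String> view, String afterKey, int limit) {
        List<String> page = new ArrayList<>();
        for (String key : view) {
            if (afterKey != null && key.compareTo(afterKey) <= 0) {
                continue; // skip rows up to and including the cursor
            }
            page.add(key);
            if (page.size() == limit) {
                break;
            }
        }
        return page;
    }

    // Drain the whole view page by page. Only one page is ever fetched at
    // a time, so memory use is bounded by pageSize rather than result size.
    static List<String> fetchAll(List<String> view, int pageSize) {
        List<String> out = new ArrayList<>();
        String cursor = null;
        while (true) {
            List<String> page = fetchPage(view, cursor, pageSize);
            if (page.isEmpty()) {
                break; // no rows past the cursor: we're done
            }
            out.addAll(page);                   // a real client would stream each page out
            cursor = page.get(page.size() - 1); // resume point for the next page
        }
        return out;
    }
}
```

One caveat: with view keys alone the cursor may not be unique, which is exactly why the real workaround pairs startkey with startkey_docid.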

Two ideas come to mind. First, try everything possible to make sure the application can keep up with the incoming data so it won’t OOM, say by processing each row as lightly as possible.

Second, you could paginate the data as @alexo is suggesting, so you’re fetching more manageable chunks.

As David says, SDK3 is going to address this directly by providing automatic backpressure handling, so it will slow down requesting rows from the producer if the consumer cannot keep up.

@graham.pople any meaningful processing would not be light enough to avoid the OOM. In our case we just streamed the content to a servlet output stream, which in theory is very fast.

Looking forward to seeing SDK3, but I assume it is not backward compatible with the current client, which makes it harder to adopt for larger applications. By the way, what is the timeline for releasing SDK3?

Thanks! I am working on implementing these suggestions in our application.

It’s aligned with Couchbase 6.5, which is targeting Q1.

It is a major version, so it does change the API, though not too much since form follows function.