Query on Java SDK about JNI global references


#1

Hi,

I am using Java SDK 2.6.0 and Couchbase Server 5.1.1 Community Edition.

I am observing a large count of “JNI global references” in my application’s heap dump.

JNI global references: 359

Could someone please clarify whether there are any open bugs regarding this, or whether the SDK uses JNI global references in the first place?

I'd appreciate any help or suggestions!


#2

Can you tell me which problem you are seeing? Also, can you identify where the JNI references are coming from? In general, though, I don't see why JNI usage by itself would be a problem.


#3

My apologies for not posting enough info.

We are facing issues during load testing. I am going through the heap dumps, GC logs, system logs, etc. at my disposal, and here is the list of problems found so far.

  1. Clients crash with OOM. A large number of JNI references shows up in the histo logs (as described earlier). Please let me know if the SDK uses any JNI references that could cause memory leaks.
  2. An “Operations over threshold” warning is logged with total_us above 7 seconds:
    [
      {
        "top": [
          {
            "operation_name": "get",
            "last_local_id": "5C2AEB30919DC5A7/FFFFFFFF853D2B29",
            "last_local_address": "10.64.105.94:42010",
            "last_remote_address": "10.64.106.184:11210",
            "last_dispatch_us": 734,
            "decode_us": 70,
            "last_operation_id": "0x5034a55",
            "total_us": 7871814
          }
        ],
        "service": "kv",
        "count": 1
      }
    ]
  3. No GC was running at the moment the above warning was logged.
  4. lsof shows a huge number of connections from one host to the Couchbase Server (up to 60k), all in the ESTABLISHED state. No network errors are seen.
  5. Rebalancing on Couchbase Server takes 12 to 15 minutes, even with 9 empty ephemeral buckets. We run only the Data service on all nodes, and this duration (12 to 15 minutes) does not seem reasonable.
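To narrow down item 4 (which process owns the ESTABLISHED sockets, and to which nodes), a small sketch like the one below could tally lsof output per process and remote endpoint. It assumes the default lsof column layout as produced by something like `lsof -nP -iTCP -sTCP:ESTABLISHED`, where the last columns read `TCP local->remote (ESTABLISHED)`; the class name `LsofConnectionCounter` and the sample lines (including PID 12345) are made up for illustration, with addresses borrowed from the threshold log above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LsofConnectionCounter {

    /**
     * Counts ESTABLISHED TCP connections per "COMMAND/PID -> remote address".
     * Assumes default lsof output, where the NAME column looks like
     * "10.64.105.94:42010->10.64.106.184:11210" followed by "(ESTABLISHED)".
     */
    public static Map<String, Integer> countEstablished(List<String> lsofLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lsofLines) {
            if (!line.contains("(ESTABLISHED)")) {
                continue; // skips the header line and non-established sockets
            }
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) {
                continue;
            }
            String process = fields[0] + "/" + fields[1]; // COMMAND/PID
            for (String field : fields) {
                int arrow = field.indexOf("->");
                if (arrow >= 0) {
                    String remote = field.substring(arrow + 2); // text after "->"
                    counts.merge(process + " -> " + remote, 1, Integer::sum);
                    break;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines in lsof's default format (hypothetical PID and FDs).
        List<String> sample = List.of(
            "COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME",
            "java    12345 app    45u  IPv4 123456      0t0  TCP 10.64.105.94:42010->10.64.106.184:11210 (ESTABLISHED)",
            "java    12345 app    46u  IPv4 123457      0t0  TCP 10.64.105.94:42012->10.64.106.184:11210 (ESTABLISHED)",
            "java    12345 app    47u  IPv4 123458      0t0  TCP 10.64.105.94:42014->10.64.106.185:11210 (ESTABLISHED)");
        // Prints one "count  process -> remote" line per group, sorted by key.
        countEstablished(sample).forEach((key, n) -> System.out.println(n + "  " + key));
    }
}
```

If nearly all of the 60k sockets map to the JVM's PID and port 11210, that points at the client opening connections repeatedly rather than at something else on the host.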

We are running the SDK's CouchbaseEnvironment with default values.
Any inputs/suggestions will be greatly appreciated.


#4

@ravikrn.13 The only way we use JNI is through Netty, our IO layer. This might correlate with your observation here:

lsof shows huge number of connections from 1 host to couchbase server (upto 60k), all in ESTABLISHED state. No network errors are seen.

Can you verify that those connections are coming from the JVM? Are you using the environment, cluster, and bucket as singletons? It is very important not to open a new connection for every operation, since doing so can lead to exactly this behavior. Can you share DEBUG-level logs for the entire duration of the run?
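A minimal sketch of the "create once, share everywhere" idea, using the initialization-on-demand holder idiom so the expensive object is built exactly once even under concurrent first access. `FakeConnection` is a hypothetical stand-in that only counts how often it is constructed; in SDK 2.x the real objects would roughly be an environment from `DefaultCouchbaseEnvironment.builder().build()`, a cluster from `CouchbaseCluster.create(env, ...)`, and a bucket from `cluster.openBucket(...)`, all held in the same way.

```java
import java.util.concurrent.atomic.AtomicInteger;

public final class CouchbaseHolder {

    /** Hypothetical stand-in for environment/cluster/bucket; it only counts constructions. */
    public static final class FakeConnection {
        public static final AtomicInteger OPENED = new AtomicInteger();

        FakeConnection() {
            OPENED.incrementAndGet(); // the real SDK would open sockets around here
        }

        public String get(String id) {
            return "value-of-" + id;
        }
    }

    private CouchbaseHolder() {}

    // The JVM initializes Lazy exactly once, on first access to Lazy.INSTANCE,
    // and class initialization is thread-safe, so no explicit locking is needed.
    private static final class Lazy {
        static final FakeConnection INSTANCE = new FakeConnection();
    }

    public static FakeConnection connection() {
        return Lazy.INSTANCE;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            final int n = i;
            // Every worker asks for the connection; all of them get the same instance.
            workers[i] = new Thread(() -> connection().get("doc-" + n));
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("connections opened: " + FakeConnection.OPENED.get()); // prints 1
    }
}
```

The anti-pattern to look for is the opposite: building a new environment or cluster inside the request path, which opens fresh sockets (and Netty resources) per operation.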


#5

Yes, all of these are used as singletons. I am using AsyncCluster, AsyncBucket instances, and also the EventBus for system events such as NodeDisconnectedEvent.

I'm working on getting the DEBUG logs.