The remote side disconnected the endpoint unexpectedly - warn in Java sdk 3.0 + couchbase server 6.5.1

Hello, I just installed Couchbase 6.5.1 build 6299 + Java client 3.0.4.
Unfortunately, every 5-15 minutes I get the following warning in the Java log:
[cb-events] WARN com.couchbase.endpoint - [com.couchbase.endpoint][UnexpectedEndpointDisconnectedEvent] The remote side disconnected the endpoint unexpectedly {"circuitBreaker":"DISABLED","coreId":"0x7f2debc400000001","local":"127.0.0.1:52754","remote":"localhost:8093","type":"QUERY"}

I tried restarting the server and closing and reopening the web console, but the warning is always present.
I haven’t found initialization examples for SDK 3.0, so, as with version 2.7, I create global Cluster and Bucket instances and access them from different threads (I use Kotlin + Ktor as the framework). Perhaps you can suggest a better approach to initialization and configuration parameters for a server app, but for now I initialize Couchbase as follows:

lateinit var bucket: Bucket
lateinit var cluster: Cluster
...
fun initCouchbase() {
    val env = ClusterEnvironment
            .builder()
            .timeoutConfig(TimeoutConfig.kvTimeout(Duration.ofSeconds(15)).queryTimeout(Duration.ofSeconds(15)).connectTimeout(Duration.ofSeconds(15)))
            .ioConfig(IoConfig.maxHttpConnections(11).numKvConnections(11))
            .requestTracer(com.couchbase.client.core.cnc.tracing.ThresholdRequestTracer.builder(null).queryThreshold(Duration.ofSeconds(12)).build())
            .build()

    cluster = Cluster.connect("localhost", clusterOptions("...", "...").environment(env))
    bucket = cluster.bucket("...")
}

The error appears in the logs even when there are no requests to the database from the server. After this error appears, the server continues to work and correctly executes requests to the database, if they are received. I found no mention of this error in discussions on the forum or elsewhere, so I decided to write here. I would like to figure out if this error can affect the stability and performance of the server.

Sincerely.

Can I get any comments from the support team or other users who are experiencing this problem? I am ready to provide all the necessary logs; which files would be most helpful?

Hi Poltar

My guess: refer to https://docs.couchbase.com/java-sdk/current/ref/client-settings.html (Java) or https://docs.couchbase.com/dotnet-sdk/3.0/ref/client-settings.html#io-options (.NET) and search for the string "Circuit Breaker Options". You should be able to add the template to your current ClusterEnvironment.

See Matt’s (@ingenthr) answer below.

Best

Jon

This sounds a lot like the behavior @daschl identified from the change in MB-37032. Effectively, the query service changed its behavior to drop idle connections as mitigation against an attack vector. Since the SDK prior to that is designed to keep the connection open to have the lowest latency when a query is requested, the SDK will constantly reconnect whenever query drops the connection.

Unfortunately, the change wasn’t well identified in advance and it wasn’t found in testing.

The workaround is to lower the idle timeout in the SDK.

For Java 3.x, that’s:

CouchbaseEnvironment env = DefaultCouchbaseEnvironment.builder().queryServiceConfig(QueryServiceConfig.create(0, 12, 10)).build();

Similar tune-ables exist for Java 2.x and other SDKs.

There’s a rough plan to adjust to the new cluster behavior tracked under CBD-3366.
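The race described above can be sketched as a toy model. This is not SDK code: the class and method names are made up for illustration, and the 5-second and 30-second figures are taken from a later reply in this thread, so treat them as assumptions. It only shows why an SDK idle timeout shorter than the server's should stop the warning.

```java
import java.time.Duration;

// Hypothetical model, not part of the Couchbase SDK. It only illustrates
// which side closes an idle HTTP connection first.
final class IdleTimeoutSketch {

    enum ClosedBy { CLIENT, SERVER }

    // The side with the shorter idle timeout closes the connection first.
    // On a tie we assume the server wins, since that is the case that logs the warning.
    static ClosedBy closedBy(Duration clientIdleTimeout, Duration serverIdleTimeout) {
        return clientIdleTimeout.compareTo(serverIdleTimeout) < 0
                ? ClosedBy.CLIENT
                : ClosedBy.SERVER;
    }

    // A server-side close is what the SDK reports as an unexpected disconnect.
    static boolean warnsOnIdle(Duration clientIdleTimeout, Duration serverIdleTimeout) {
        return closedBy(clientIdleTimeout, serverIdleTimeout) == ClosedBy.SERVER;
    }

    public static void main(String[] args) {
        Duration serverIdle = Duration.ofSeconds(5); // assumed server-side idle limit
        // Assumed 30 s client timeout: the server drops the connection first.
        System.out.println(warnsOnIdle(Duration.ofSeconds(30), serverIdle)); // prints true
        // 4 s client timeout: the client closes first, so no warning is expected.
        System.out.println(warnsOnIdle(Duration.ofSeconds(4), serverIdle));  // prints false
    }
}
```

In other words, whichever side has the shorter idle timeout decides whether the close looks "expected" to the SDK.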

Thank you for your answer.
I configured couchbase like this:
val env = ClusterEnvironment.builder()
        .timeoutConfig(TimeoutConfig.kvTimeout(Duration.ofSeconds(16)).queryTimeout(Duration.ofSeconds(16)).connectTimeout(Duration.ofSeconds(16)))
        .ioConfig(IoConfig.maxHttpConnections(12).numKvConnections(12).enableDnsSrv(false)
                .kvCircuitBreakerConfig(CircuitBreakerConfig.builder().enabled(true).volumeThreshold(45).errorThresholdPercentage(25).sleepWindow(Duration.ofSeconds(1)).rollingWindow(Duration.ofMinutes(2)))
                .queryCircuitBreakerConfig(CircuitBreakerConfig.builder().enabled(true).volumeThreshold(45).errorThresholdPercentage(25).sleepWindow(Duration.ofSeconds(1)).rollingWindow(Duration.ofMinutes(2)))
                .managerCircuitBreakerConfig(CircuitBreakerConfig.builder().enabled(true).volumeThreshold(45).errorThresholdPercentage(25).sleepWindow(Duration.ofSeconds(1)).rollingWindow(Duration.ofMinutes(2))))
        .ioEnvironment(IoEnvironment.eventLoopThreadCount(12))
        .requestTracer(com.couchbase.client.core.cnc.tracing.ThresholdRequestTracer.builder(null).queryThreshold(Duration.ofSeconds(12)).build())
        .build()

But now this warning appears in the log:
[cb-events] WARN com.couchbase.endpoint - [com.couchbase.endpoint][UnexpectedEndpointDisconnectedEvent] The remote side disconnected the endpoint unexpectedly {"circuitBreaker":"CLOSED","coreId":"0xbe7ca8b000000001","local":"127.0.0.1:56635","remote":"localhost:8093","type":"QUERY"}
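To compare this warning with the earlier one field by field, the context object at the end of the line can be pulled apart with a small helper. This is a rough sketch using only the JDK; the class name is made up, and the regex assumes the context is a flat object of string values with no escaped quotes, which holds for the lines quoted in this thread but may not hold for every SDK event.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Couchbase SDK.
final class EndpointEventFields {

    // Matches flat "key":"value" pairs; assumes no nested objects or escaped quotes.
    private static final Pattern PAIR = Pattern.compile("\"([^\"]+)\":\"([^\"]*)\"");

    // Returns the key/value pairs of the trailing {...} context, in order of appearance.
    static Map<String, String> parse(String logLine) {
        Map<String, String> fields = new LinkedHashMap<>();
        int start = logLine.indexOf('{');
        if (start < 0) {
            return fields; // no context object on this line
        }
        Matcher m = PAIR.matcher(logLine.substring(start));
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "[cb-events] WARN com.couchbase.endpoint - "
                + "[com.couchbase.endpoint][UnexpectedEndpointDisconnectedEvent] "
                + "The remote side disconnected the endpoint unexpectedly "
                + "{\"circuitBreaker\":\"CLOSED\",\"coreId\":\"0xbe7ca8b000000001\","
                + "\"local\":\"127.0.0.1:56635\",\"remote\":\"localhost:8093\",\"type\":\"QUERY\"}";
        Map<String, String> f = parse(line);
        // prints: QUERY localhost:8093 CLOSED
        System.out.println(f.get("type") + " " + f.get("remote") + " " + f.get("circuitBreaker"));
    }
}
```

Run against both warnings in this thread, the type and remote fields are identical (QUERY on localhost:8093). Only circuitBreaker changes, from DISABLED to CLOSED, which suggests the circuit breaker settings are not what drives the disconnects.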

@ingenthr, thank you, I will try your solution!)

DefaultCouchbaseEnvironment and ClusterEnvironment.queryServiceConfig are unresolved references.
I think DefaultCouchbaseEnvironment is for the 2.x.x SDK.

For 3.0.0 I think you might need
import com.couchbase.client.core.service.QueryServiceConfig;
but I am not sure how to use it.
You can use the following to disable or enable the query circuit breaker. The SDK will still emit the log messages, but I imagine that if the breaker is disabled it isn’t doing anything.
IoConfig.queryCircuitBreakerConfig(CircuitBreakerConfig.builder().enabled(false))

@Poltar, regarding your issue with the messages containing “circuitBreaker”:“DISABLED”, I got the same in my test setup:

Jun 12, 2020 9:29:32 AM com.couchbase.client.core.cnc.LoggingEventConsumer$JdkLogger warn
WARNING: [com.couchbase.endpoint][UnexpectedEndpointDisconnectedEvent] The remote side disconnected the endpoint unexpectedly {"circuitBreaker":"DISABLED","coreId":"0x15ed0a3e00000001","local":"192.168.3.249:50781","remote":"192.168.3.150:8093","type":"QUERY"}

The new Java SDK seems to have lowered a key server setting from 30 seconds down to 5 seconds. Adjusting idleHttpConnectionTimeout to 4 seconds will eliminate the messages you see.

Note: setting the other item, queryThreshold, to 12 seconds seems to clean up another log message that I see in my specific test jig; you may or may not need this.

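import java.time.Duration;
import com.couchbase.client.core.cnc.tracing.ThresholdRequestTracer;
import com.couchbase.client.core.env.IoConfig;
import com.couchbase.client.java.env.ClusterEnvironment;

ClusterEnvironment env = ClusterEnvironment.builder()
        .ioConfig(IoConfig.idleHttpConnectionTimeout(Duration.ofSeconds(4)))
        .requestTracer(ThresholdRequestTracer.builder(null)
                .queryThreshold(Duration.ofSeconds(12)).build())
        .build();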

Sorry, I can’t provide the internal details on why and how this works. I only got the messages cleaned up with some help from @daschl.

Setting the idleHttpConnectionTimeout parameter solved the problem. Some of my queries take more than two seconds (the default value of the queryThreshold parameter, if I’m not mistaken). This is normal and I don’t need a log message about it, so I increased this value.

Thank you! I marked your answer as the solution to the problem)