Parent cluster object has been closed

@socketman2016
Just to be clear, in your case the error happens even after a period of normal operation, not just immediately on the first server boot? Because if it’s the latter, it’s pretty scary.

@00christian00 Yes, the error happens time and again without any pattern, maybe after hours, maybe after days. It’s not only the timeout error but also two other errors.

I’m seriously considering going back to SDK 2 if this doesn’t get fixed in the next version. I have an abstraction layer for the DB operations, and if I have to check for issues caused by the workaround, I’d rather check for issues caused by the SDK rollback.
In some cases I got lazy and used the raw SDK where my abstraction layer didn’t have a specific method implemented. I’ll have to fix that; it’s the only reason I didn’t roll back sooner, even though I’ve been planning to for a long time since finding several issues with the new SDK. We are barely near production ready now, after a year.
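The kind of guard I mean in the abstraction layer would look roughly like this (a sketch with illustrative names, not the real couchbase API; `getCluster`/`reconnect` stand in for however the app manages its connection):

```javascript
// Sketch of an abstraction-layer guard (illustrative names): every DB
// operation goes through withCluster(), which rebuilds the connection
// once when it sees the "parent cluster object has been closed" error
// and retries the single failed operation with the fresh handle.
async function withCluster(getCluster, reconnect, op) {
  try {
    return await op(await getCluster());
  } catch (err) {
    if (/parent cluster object has been closed/i.test(String(err.message))) {
      const fresh = await reconnect(); // rebuild the Cluster object
      return await op(fresh);          // retry once with the new handle
    }
    throw err; // any other error propagates unchanged
  }
}
```

With something like this in place, the raw-SDK call sites are exactly the ones that escape the check, which is why they would need to be migrated first.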

@socketman2016 Sorry for the late reply, we have a lot going on right now, but let me respond to a few of the replies that have been posted since my previous post. In particular, you’re asking why we’re asking for more details now when @brett19 already said (several things)… but Brett was only speculating and offering rough ideas without seeing the details. Since those didn’t help, they don’t apply, so I assume this must be a different problem, and therefore we need more details.

Keep in mind that this same error message may manifest in different scenarios and that some of the replies in this thread appear to be unrelated to what you are trying to report.

Specifically:

  1. The problem that @00christian00 reported appears to be very different from what you’re reporting here. From the earlier post, it sounds like that issue is associated with attaching a debugger, which is most likely causing some internal libuv timers/events to fail or expire, and that could produce unexpected results. If either of you can provide instructions to reproduce that problem consistently, we can certainly take a closer look, but at best we would expect any actions in flight to fail and future actions performed without the debugger attached to succeed. However, if it turns out to be an issue that only happens when a debugger is attached, it will not have critical priority.
  2. The other bug that Brett originally pointed out only happens when first trying to establish a connection. We have not seen it occur after a connection is established, when credentials have not changed, there are no networking problems, etc.

So in summary, we have not seen the problem you are reporting with a cluster that is already connected and running without any changes happening to the network or cluster nodes. We have performed several long running tests and have not seen the problem you’re reporting. We also have customers running the Node.js SDK v3.2.3 in their production environments who have not reported this problem (and they also have servers running 24/7, so we would have had reports by now).

I would love to help get to the bottom of this, but there literally isn’t anything more we can do if you’re unable to give us any more details. We either need a way to reproduce the problem or we need to see DEBUG logs that include the period when this problem is reproduced. If you can provide either of these, we can take a look and try to help.
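For reference, the SDK’s internal logging shows up under the `couchnode` namespaces seen later in this thread (e.g. `couchnode:lcb:trace`), driven by the `DEBUG` environment variable of the `debug` module, so capturing logs looks something like this (the file names and `app.js` entry point are illustrative):

```shell
# Enable all of the SDK's internal logging and capture it to a file
# (the debug module writes to stderr by default).
DEBUG=couchnode* node app.js 2> couchnode-debug.log

# The DEBUG variable accepts comma-separated patterns, so if the full
# trace output grows too large, it can be narrowed to specific
# namespaces, assuming they match the ones your SDK version emits:
DEBUG=couchnode:lcb:error,couchnode:lcb:warn node app.js 2> couchnode-debug.log
```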

@raycardillo Thanks for your response.
Yes, I understand.
Unfortunately I have no idea what causes the problem; it exists, but I don’t know how to reproduce it because there is no pattern to the bug.
I cannot provide DEBUG logs at the moment, maybe later. But I think that to provide DEBUG logs I must run with them for hours or maybe days, and we will end up with a very big log file.

@raycardillo @brett19 I am running an application with DEBUG logs.
I am not facing the issue now, but the log contains the following lines repeatedly (I masked the IP addresses with xxx):

2021-11-23T10:32:35.323Z couchnode:lcb:trace (bootstrap @ ../deps/lcb/src/bootstrap.cc:176) Background-polling for new configuration
2021-11-23T10:32:35.324Z couchnode:lcb:trace (confmon @ ../deps/lcb/src/bucketconfig/confmon.cc:319) Refreshing current cluster map (bucket: app-test)
2021-11-23T10:32:35.324Z couchnode:lcb:trace (server @ ../deps/lcb/src/mcserver/mcserver.cc:919) <xxx.xxx.xxx.xx1:11210> (CTX=0x64f5cd0,memcached,SRV=0x630d900,IX=1) Scheduling next timeout for 2500 ms. This is not an error
2021-11-23T10:32:35.324Z couchnode:lcb:trace (server @ ../deps/lcb/src/mcserver/mcserver.cc:919) <xxx.xxx.xxx.xx2:11210> (CTX=0x649ac60,memcached,SRV=0x630e760,IX=2) Scheduling next timeout for 2500 ms. This is not an error
2021-11-23T10:32:35.324Z couchnode:lcb:trace (server @ ../deps/lcb/src/mcserver/mcserver.cc:919) <xxx.xxx.xxx.xx3:11210> (CTX=0x5f26f00,memcached,SRV=0x66a8ac0,IX=3) Scheduling next timeout for 2500 ms. This is not an error
2021-11-23T10:32:35.324Z couchnode:lcb:trace (server @ ../deps/lcb/src/mcserver/mcserver.cc:919) <xxx.xxx.xxx.xx0:11210> (CTX=0x600d540,memcached,SRV=0x630bd40,IX=0) Scheduling next timeout for 2500 ms. This is not an error
2021-11-23T10:32:35.324Z couchnode:lcb:trace (confmon @ ../deps/lcb/src/bucketconfig/confmon.cc:157) Not applying configuration received via CCCP (bucket="app-test", source=CCCP, address="xxx.xxx.xxx.xx0"). No changes detected. A.rev=1:19492, B.rev=1:19478. Changes: servers=no, map=no, replicas=no
2021-11-23T10:32:35.324Z couchnode:lcb:trace (confmon @ ../deps/lcb/src/bucketconfig/confmon.cc:305) Attempting to retrieve cluster map via CCCP
2021-11-23T10:32:35.325Z couchnode:lcb:trace (cccp @ ../deps/lcb/src/bucketconfig/bc_cccp.cc:164) Re-Issuing CCCP Command on server struct 0x630d900 (xxx.xxx.xxx.xx1:11210)
2021-11-23T10:32:35.326Z couchnode:lcb:trace (confmon @ ../deps/lcb/src/bucketconfig/confmon.cc:157) Not applying configuration received via CCCP (bucket="app-test", source=CCCP, address="xxx.xxx.xxx.xx1"). No changes detected. A.rev=1:19492, B.rev=1:19492. Changes: servers=no, map=no, replicas=no

Hey @socketman2016 ,

These are trace-level logs that simply show the internal behaviour of the SDK. They are all expected on a normally behaving cluster.

Cheers, Brett

I checked the server and it is still working well, without errors. We must wait days for the error to happen!
As I said before, there is no pattern: sometimes it happens after hours, sometimes after days.

@brett19 @raycardillo I ran an application with DEBUG logs days ago, and the error is happening now.

For security reasons I masked the IPs with xxx.

I have a log file for the DEBUG logs. When I tail the file and perform some lookupIn Couchbase operations, there is no output in the DEBUG logs, but I have another error log file, which shows the following output:

ERROR 4107 nodejs.TimeoutError: timeout
    at Object.translateCppError (/home/app/node_modules/couchbase/dist/bindingutilities.js:180:20)
    at Object.<anonymous> (/home/app/node_modules/couchbase/dist/connection.js:147:54)
name: "TimeoutError"
cause: {"code":201}
context: undefined
pid: 4107

Currently I only have the nodejs.TimeoutError, repeated multiple times in the error log file; the DEBUG log file is 233 MB.

This is the last Couchbase log entry. Since then, the server has been up for 14 hours, and Couchbase is still stuck in the timeout error with no debug output.
My teammate decided to restart the server; he thinks the error occurs at restarts. That is partly true, because after multiple restarts the error happens and we get nodejs.TimeoutError. But I’m sure we have multiple bugs: I’m sure the ConnectionClosedError, TimeoutError, and “parent cluster object has been closed” errors also occur while the server is up and running, without any network issue.

2021-12-03T15:50:23.933Z couchnode:lcb:trace (confmon @ ../deps/lcb/src/bucketconfig/confmon.cc:305) Attempting to retrieve cluster map via CCCP
2021-12-03T15:50:23.933Z couchnode:lcb:info (cccp @ ../deps/lcb/src/bucketconfig/bc_cccp.cc:176) Requesting connection to node xxx.xxx.xxx.xxx0:11210 for CCCP configuration
2021-12-03T15:50:23.933Z couchnode:lcb:debug (lcbio_mgr @ ../deps/lcb/src/lcbio/manager.cc:501) <xxx.xxx.xxx.xxx0:11210> (HE=0x51c3f00) Creating new connection because none are available in the pool
2021-12-03T15:50:23.933Z couchnode:lcb:trace (lcbio_mgr @ ../deps/lcb/src/lcbio/manager.cc:413) <xxx.xxx.xxx.xxx0:11210> (HE=0x51c3f00) New pool entry: I=0x580e100
2021-12-03T15:50:23.934Z couchnode:lcb:info (connection @ ../deps/lcb/src/lcbio/connect.cc:497) <xxx.xxx.xxx.xxx0:11210> (SOCK=98cd346edf8dc44c) Starting. Timeout=2000000us
2021-12-03T15:50:23.934Z couchnode:lcb:error (bootstrap @ ../deps/lcb/src/bootstrap.cc:205) Failed to bootstrap client=0x5196da0. Error=LCB_ERR_TIMEOUT (201) (Last=LCB_ERR_TIMEOUT (201)), Message="Failed to bootstrap in time"
2021-12-03T15:50:23.936Z couchnode failed to connect to bucket: TimeoutError: timeout
    at Object.translateCppError (/home/app/node_modules/couchbase/dist/bindingutilities.js:180:20)
    at Object.<anonymous> (/home/app/node_modules/couchbase/dist/connection.js:147:54) {
  cause: LibcouchbaseError: libcouchbase error 201
      at Object.translateCppError (/home/app/node_modules/couchbase/dist/bindingutilities.js:174:21)
      at Object.<anonymous> (/home/app/node_modules/couchbase/dist/connection.js:147:54) {
    code: 201
  },
  context: undefined
}
2021-12-03T15:50:23.936Z couchnode:lcb:debug (lcbio_mgr @ ../deps/lcb/src/lcbio/manager.cc:523) <xxx.xxx.xxx.xxx0:11210> (HE=0x51c3f00) Request=0x54c8b10 has no connection.. yet

This is the Couchbase log from just before that:

2021-12-03T15:50:10.670Z couchnode:lcb:info (instance @ ../deps/lcb/src/instance.cc:521) Version=3.2.1+13-njs, Changeset=204a528618e9a86007b85c109fd46a45c21aa053
2021-12-03T15:50:10.670Z couchnode:lcb:info (instance @ ../deps/lcb/src/instance.cc:522) Effective connection string: couchbase://xxx.xxx.xxx.xxx0,xxx.xxx.xxx.xxx1,xxx.xxx.xxx.xxx2,xxx.xxx.xxx.xxx3/app-test?enable_tracing=on&enable_operation_metrics=on&client_string=app%2F3.2.3%20(node%2F14.18.2%3B%20v8%2F8.4.371.23-node.85%3B%20ssl%2F1.1.1l). Bucket=app-test
2021-12-03T15:50:10.670Z couchnode:lcb:debug (instance @ ../deps/lcb/src/instance.cc:389) Applying initial cntl enable_tracing=on
2021-12-03T15:50:10.670Z couchnode:lcb:debug (instance @ ../deps/lcb/src/instance.cc:389) Applying initial cntl enable_operation_metrics=on
2021-12-03T15:50:10.670Z couchnode:lcb:debug (instance @ ../deps/lcb/src/instance.cc:389) Applying initial cntl client_string=app/3.2.3 (node/14.18.2; v8/8.4.371.23-node.85; ssl/1.1.1l)
2021-12-03T15:50:10.670Z couchnode:lcb:debug (instance @ ../deps/lcb/src/instance.cc:155) Adding host xxx.xxx.xxx.xxx0:8091 to initial HTTP bootstrap list
2021-12-03T15:50:10.670Z couchnode:lcb:debug (instance @ ../deps/lcb/src/instance.cc:155) Adding host xxx.xxx.xxx.xxx0:11210 to initial CCCP bootstrap list
2021-12-03T15:50:10.670Z couchnode:lcb:debug (instance @ ../deps/lcb/src/instance.cc:155) Adding host xxx.xxx.xxx.xxx1:8091 to initial HTTP bootstrap list
2021-12-03T15:50:10.670Z couchnode:lcb:debug (instance @ ../deps/lcb/src/instance.cc:155) Adding host xxx.xxx.xxx.xxx1:11210 to initial CCCP bootstrap list
2021-12-03T15:50:10.670Z couchnode:lcb:debug (instance @ ../deps/lcb/src/instance.cc:155) Adding host xxx.xxx.xxx.xxx2:8091 to initial HTTP bootstrap list
2021-12-03T15:50:10.671Z couchnode:lcb:debug (instance @ ../deps/lcb/src/instance.cc:155) Adding host xxx.xxx.xxx.xxx2:11210 to initial CCCP bootstrap list
2021-12-03T15:50:10.671Z couchnode:lcb:debug (instance @ ../deps/lcb/src/instance.cc:155) Adding host xxx.xxx.xxx.xxx3:8091 to initial HTTP bootstrap list
2021-12-03T15:50:10.671Z couchnode:lcb:debug (instance @ ../deps/lcb/src/instance.cc:155) Adding host xxx.xxx.xxx.xxx3:11210 to initial CCCP bootstrap list
2021-12-03T15:50:10.671Z couchnode:lcb:trace (instance @ ../deps/lcb/src/instance.cc:193) Bootstrap hosts loaded (cccp:4, http:4)
2021-12-03T15:50:10.671Z couchnode:lcb:info (bootstrap @ ../deps/lcb/src/bootstrap.cc:277) Requested network configuration: heuristic
2021-12-03T15:50:10.671Z couchnode:lcb:debug (confmon @ ../deps/lcb/src/bucketconfig/confmon.cc:94) Preparing providers (this may be called multiple times)
2021-12-03T15:50:10.671Z couchnode:lcb:debug (confmon @ ../deps/lcb/src/bucketconfig/confmon.cc:100) Provider CCCP is ENABLED
2021-12-03T15:50:10.671Z couchnode:lcb:debug (confmon @ ../deps/lcb/src/bucketconfig/confmon.cc:100) Provider HTTP is ENABLED
2021-12-03T15:50:10.671Z couchnode:lcb:trace (confmon @ ../deps/lcb/src/bucketconfig/confmon.cc:319) Refreshing current cluster map (bucket: app-test)

@brett19 @raycardillo Can you please take a look?

Hey @socketman2016,

I took a look at the log messages above but unfortunately there is nothing in the logs that indicates any sort of fault occurring except that the SDK isn’t able to connect to the cluster in a normal period of time. Would you be able to take a look and see if perhaps there is an overuse of connections on the system this is running on? If there are connections being leaked by the application, this could lead to the Operating System rejecting our attempts to connect to the cluster, which could lead to this behaviour.
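One rough way to check is to watch the connection and descriptor counts over time; on Linux that can look something like this (the exact tooling varies by system, and `$$` below is just this shell standing in for the node process’s PID):

```shell
# Count established TCP connections to the Couchbase data port (11210);
# a count that keeps climbing over time is a symptom of a connection leak.
ss -tn state established '( dport = :11210 )' | tail -n +2 | wc -l

# Compare the process's open file descriptors against its limit
# (substitute the node process's PID for $$ in practice).
ls /proc/$$/fd | wc -l
grep 'open files' /proc/$$/limits
```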

Cheers, Brett

Would you be able to take a look and see if perhaps there is an overuse of connections on the system this is running on?

How?
And if that’s true, why did we have no such issue when we used SDK v2?

If there are connections being leaked by the application, this could lead to the Operating System rejecting our attempts to connect to the cluster, which could lead to this behaviour.

I don’t agree with you, for a few reasons:

  1. Other connections (PostgreSQL, RabbitMQ, MQTT, S3, …) work well in this same situation.
  2. There is no Couchbase log showing that it is attempting to connect to the cluster. Consider that after the latest Couchbase DEBUG logs, the server has been up for 14 hours and everything else on the system works well; e.g. our PostgreSQL connection pool can close and create new connections. But even when performing Couchbase operations such as lookupIn, there is no Couchbase DEBUG output; it has just been hanging for 14 hours!