Suddenly unable to connect to the cluster

Hello,

I am suddenly unable to connect to my Couchbase cluster. The only workaround I found was to roll my codebase back to couchbase-sdk 2.x.
I recently migrated to 3.0, which was working fine, but now it suddenly fails in every scenario: on my local dev setup with Node.js and Couchbase, as well as on my local docker-compose setup.
I have already tried debugging, including with sdk-doctor. General debugging gives me no meaningful information, and sdk-doctor doesn’t report any errors:

NodeJS Error: Error: cluster object was closed

cluster: Cluster {
    _connStr: 'couchbase://localhost/',
    _trustStorePath: undefined,
    _kvTimeout: undefined,
    _kvDurableTimeout: undefined,
    _viewTimeout: undefined,
    _queryTimeout: undefined,
    _analyticsTimeout: undefined,
    _searchTimeout: undefined,
    _managementTimeout: undefined,
    _auth: { username: 'xxx', password: 'xxx' },
    _closed: false,
    _clusterConn: null,
    _conns: {
        fwdisplay: Connection {
            _inst: CbConnection {},
            _closed: true,
            _pendOps: [],
            _pendBOps: [],
            _connected: false,
            _opened: true
        }
    },
    _transcoder: DefaultTranscoder {},
    _logFunc: undefined
}

sdk-doctor summary:
[WARN] Your connection string specifies only a single host. You should consider adding additional static nodes from your cluster to this list to improve your applications fault-tolerance
[WARN] Could not test Analytics service on 127.0.0.1 as it was not in the config

This is the connection setup:

this.cluster = new couchbase.Cluster(`couchbase://${COUCHBASE_HOSTNAME}/`, {
    username: COUCHBASE_USERNAME,
    password: COUCHBASE_PASSWORD
})
this.bucket = this.cluster.bucket(COUCHBASE_BUCKET)
this.collection = this.bucket.defaultCollection()

Does anyone have any ideas about what to change to fix it?
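
Regarding the first sdk-doctor warning above (a connection string with only a single host): Couchbase connection strings accept a comma-separated host list, so a rough sketch like the following could build one from several bootstrap nodes. This is not known to be related to the "cluster object was closed" error. The helper name `buildConnectionString` and the `COUCHBASE_HOSTNAMES` environment variable are assumptions made for illustration, not part of the original setup.

```javascript
// Hypothetical helper (not part of the SDK): builds a connection string
// from a list of bootstrap hosts, skipping empty entries.
function buildConnectionString(hosts) {
  const cleaned = hosts.map((h) => h.trim()).filter((h) => h.length > 0)
  if (cleaned.length === 0) {
    throw new Error('at least one Couchbase host is required')
  }
  return `couchbase://${cleaned.join(',')}`
}

// Assumption: COUCHBASE_HOSTNAMES holds a comma-separated host list,
// e.g. "node1.example.com,node2.example.com". Falls back to localhost.
const connStr = buildConnectionString(
  (process.env.COUCHBASE_HOSTNAMES || 'localhost').split(',')
)
console.log(connStr)
```

The resulting string would be passed to `new couchbase.Cluster(...)` in place of the single-host template used above.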

Had the same issue and had to roll back.

Is anyone else having this issue? Is it resolved in 3.1?

Did your setup work at first with 3.x and then stop working? In my case, it did.

I have now tried the sample https://github.com/couchbaselabs/try-cb-nodejs/tree/6.5-collections, but it crashes as well, which leads me to believe that the error is not on my end.

Same here: initially it worked, but once it stops working my API is essentially broken. I can’t periodically restart Couchbase or the Sync Gateway just because the Node.js library is broken, hence my rollback to the previous release.

Hello, I am assuming your problem matches the one described in the thread “Cluster closed - reinitialize connection”?

This seems to be a bug in the underlying libcouchbase library and will be fixed in the upcoming release.
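
Until a fixed release is available, one possible stopgap is to re-create the connection when the "cluster object was closed" error shows up and retry the operation once. The sketch below is only an illustration of that idea, not a confirmed fix: `createReconnectingRunner` and the `connect` callback are hypothetical names, and the approach assumes that building a fresh `Cluster` object actually recovers the connection, which may not hold if the underlying library hangs or aborts.

```javascript
// Hypothetical wrapper (not part of the SDK): runs an operation against a
// connection handle and, if the SDK reports "cluster object was closed",
// builds a fresh handle and retries the operation exactly once.
function createReconnectingRunner(connect) {
  let handle = null

  return async function run(operation) {
    if (handle === null) {
      handle = await connect()
    }
    try {
      return await operation(handle)
    } catch (err) {
      const message = String(err && err.message)
      if (!message.includes('cluster object was closed')) {
        throw err // unrelated errors are passed through untouched
      }
      // Assumption: connect() returns a brand-new, usable handle.
      handle = await connect()
      return await operation(handle)
    }
  }
}

// Usage sketch, assuming the connection setup from the first post:
//   const run = createReconnectingRunner(() => {
//     const cluster = new couchbase.Cluster(connStr, { username, password })
//     return cluster.bucket(COUCHBASE_BUCKET).defaultCollection()
//   })
//   const result = await run((collection) => collection.get('some-key'))
```

Retrying only once keeps a persistent failure visible to the caller instead of looping forever.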

There seem to be plenty of bug reports around here for the infamous “cluster object was closed” issue in the Node SDK… I just wanted to chime in on this thread with my own report.

Context

  • Couchbase (6.6.0 and 6.6.1), installed via the Kubernetes operator
  • Node SDK 3.0.4 to 3.1.1 - tried all versions, used with intra-cluster DNS and the recommended connection string (my-cluster-srv DNS service)
  • ~4 pods running the same Node.js app connected to the cluster, plus 2 Sync Gateway pods

Behaviour

Things seem to work fine for a few minutes, until suddenly on one given container (but NOT the others), the logs (activated with DEBUG=couchnode:lcb:error) are flooded with these errors:

2021-01-21T07:46:53.722Z couchnode:lcb:error (cccp @ ../deps/lcb/src/bucketconfig/bc_cccp.cc:187) <NOHOST:NOPORT> (CTX=(nil),) Could not get configuration: LCB_ERR_TIMEOUT (201)
2021-01-21T07:46:53.726Z couchnode:lcb:error (cccp @ ../deps/lcb/src/bucketconfig/bc_cccp.cc:187) <NOHOST:NOPORT> (CTX=(nil),) Could not get configuration: LCB_ERR_TIMEOUT (201)
2021-01-21T07:46:53.730Z couchnode:lcb:error (cccp @ ../deps/lcb/src/bucketconfig/bc_cccp.cc:187) <NOHOST:NOPORT> (CTX=(nil),) Could not get configuration: LCB_ERR_TIMEOUT (201)

Until it all comes to a stop with this error:

FATAL ERROR:
    libcouchbase experienced an unrecoverable error and terminates the program
    to avoid undefined behavior.
    The program should have generated a "corefile" which may used
    to gather more information about the problem.
    If your system doesn't create "corefiles" I can tell you that the
    assertion failed in ../deps/lcb/src/mcserver/negotiate.cc at line 50

This does not crash the container, but it seems to make it hang somehow: there are no more logs (including application logs), and the port that the app listens on becomes unresponsive, causing my livenessProbe to fail and Kubernetes to eventually kill and restart the container.
Other pods seem to do fine at the same time, but they also fail randomly in the same way.
Sync Gateway is fine all along.
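
To get a quick overview of a log flood like the one above, the error lines can be tallied by error code. The following is a minimal sketch that assumes every line follows the same layout as the three `couchnode:lcb:error` lines shown earlier; the function names `parseLcbLine` and `countErrors` are made up for this example, and lines with a different layout are simply skipped.

```javascript
// Assumed layout, taken from the log lines above:
//   <timestamp> couchnode:lcb:<level> (<subsystem> @ <file>:<line>) ... <LCB_ERR_NAME> (<code>)
const LCB_LINE =
  /^(\S+) couchnode:lcb:(\w+) \((\w+) @ (.+?):(\d+)\) .*(LCB_ERR_\w+) \((\d+)\)$/

// Hypothetical helper: returns the parsed fields, or null if the line
// does not match the assumed layout.
function parseLcbLine(line) {
  const m = LCB_LINE.exec(line.trim())
  if (!m) return null
  return {
    time: m[1],
    level: m[2],
    subsystem: m[3],
    file: m[4],
    line: Number(m[5]),
    error: m[6],
    code: Number(m[7])
  }
}

// Hypothetical helper: counts how often each LCB error name occurs.
function countErrors(lines) {
  const counts = {}
  for (const line of lines) {
    const parsed = parseLcbLine(line)
    if (parsed) {
      counts[parsed.error] = (counts[parsed.error] || 0) + 1
    }
  }
  return counts
}

const sample =
  '2021-01-21T07:46:53.722Z couchnode:lcb:error (cccp @ ../deps/lcb/src/bucketconfig/bc_cccp.cc:187) <NOHOST:NOPORT> (CTX=(nil),) Could not get configuration: LCB_ERR_TIMEOUT (201)'
console.log(countErrors([sample, 'some unrelated application log line']))
```

A sudden jump in the count for a single pod would match the behaviour described above, where only one container is affected at a time.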

Diags:

Couchbase UI is fine.

sdk-doctor never seems to complain:

|====================================================================|
|          ___ ___  _  __   ___   ___   ___ _____ ___  ___           |
|         / __|   \| |/ /__|   \ / _ \ / __|_   _/ _ \| _ \          |
|         \__ \ |) | ' <___| |) | (_) | (__  | || (_) |   /          |
|         |___/___/|_|\_\  |___/ \___/ \___| |_| \___/|_|_\          |
|                                                                    |
|====================================================================|

Note: Diagnostics can only provide accurate results when your cluster
 is in a stable state.  Active rebalancing and other cluster configuration
 changes can cause the output of the doctor to be inconsistent or in the
 worst cases, completely incorrect.

08:54:32.016 INFO ▶ Parsing connection string `couchbase://oaf-couchbase-srv.default.svc.cluster.local/fs-bucket-v0`
08:54:32.016 INFO ▶ Connection string was parsed as a potential DNS SRV record
08:54:32.020 INFO ▶ Connection string identifies the following CCCP endpoints:
08:54:32.020 INFO ▶   1. 10-36-0-7.oaf-couchbase-srv.default.svc.cluster.local:11210
08:54:32.020 INFO ▶   2. 10-32-0-19.oaf-couchbase-srv.default.svc.cluster.local:11210
08:54:32.020 INFO ▶   3. 10-35-0-39.oaf-couchbase-srv.default.svc.cluster.local:11210
08:54:32.020 INFO ▶ Connection string identifies the following HTTP endpoints:
08:54:32.020 INFO ▶ Connection string specifies bucket `fs-bucket-v0`
08:54:32.027 WARN ▶ The hostname specified in your connection string resolves both for SRV records, as well as A records.  This is not suggested as later DNS configuration changes could cause the wrong servers to be contacted
08:54:32.027 INFO ▶ Performing DNS lookup for host `10-32-0-19.oaf-couchbase-srv.default.svc.cluster.local`
08:54:32.029 INFO ▶ Bootstrap host `10-32-0-19.oaf-couchbase-srv.default.svc.cluster.local` refers to a server with the address `10.32.0.19`
08:54:32.030 INFO ▶ Performing DNS lookup for host `10-36-0-7.oaf-couchbase-srv.default.svc.cluster.local`
08:54:32.031 INFO ▶ Bootstrap host `10-36-0-7.oaf-couchbase-srv.default.svc.cluster.local` refers to a server with the address `10.36.0.7`
08:54:32.032 INFO ▶ Performing DNS lookup for host `10-35-0-39.oaf-couchbase-srv.default.svc.cluster.local`
08:54:32.034 INFO ▶ Bootstrap host `10-35-0-39.oaf-couchbase-srv.default.svc.cluster.local` refers to a server with the address `10.35.0.39`
08:54:32.034 INFO ▶ Attempting to connect to cluster via CCCP
08:54:32.035 INFO ▶ Attempting to fetch config via cccp from `10-36-0-7.oaf-couchbase-srv.default.svc.cluster.local:11210`
08:54:32.042 INFO ▶ Attempting to fetch config via cccp from `10-32-0-19.oaf-couchbase-srv.default.svc.cluster.local:11210`
08:54:32.050 INFO ▶ Attempting to fetch config via cccp from `10-35-0-39.oaf-couchbase-srv.default.svc.cluster.local:11210`
08:54:32.054 WARN ▶ Bootstrap host `10-36-0-7.oaf-couchbase-srv.default.svc.cluster.local` is not using the canonical node hostname of `oaf-couchbase-0005.oaf-couchbase.default.svc`.  This is not neccessarily an error, but has been known to result in strange and challenging to diagnose errors when DNS entries are reconfigured.
08:54:32.054 WARN ▶ Bootstrap host `10-32-0-19.oaf-couchbase-srv.default.svc.cluster.local` is not using the canonical node hostname of `oaf-couchbase-0003.oaf-couchbase.default.svc`.  This is not neccessarily an error, but has been known to result in strange and challenging to diagnose errors when DNS entries are reconfigured.
08:54:32.054 WARN ▶ Bootstrap host `10-35-0-39.oaf-couchbase-srv.default.svc.cluster.local` is not using the canonical node hostname of `oaf-couchbase-0004.oaf-couchbase.default.svc`.  This is not neccessarily an error, but has been known to result in strange and challenging to diagnose errors when DNS entries are reconfigured.
08:54:32.054 INFO ▶ Selected the following network type: external
08:54:32.054 INFO ▶ Identified the following nodes:
08:54:32.054 INFO ▶   [0] 95.216.208.78
08:54:32.054 INFO ▶                  mgmtSSL: 30971,    eventingAdminPort: 30535,                 mgmt: 31351
08:54:32.054 INFO ▶                     n1ql: 30386,                  fts: 30561,          eventingSSL: 31810
08:54:32.054 INFO ▶                     cbas: 30104,                 capi: 30103,                   kv: 31941
08:54:32.054 INFO ▶                    kvSSL: 31297,              capiSSL: 32655,              n1qlSSL: 30074
08:54:32.054 INFO ▶                   ftsSSL: 31779,              cbasSSL: 32761
08:54:32.054 INFO ▶   [1] 95.217.218.135
08:54:32.054 INFO ▶              eventingSSL: 30673,              n1qlSSL: 31678,                kvSSL: 31871
08:54:32.054 INFO ▶                  capiSSL: 30075,                 n1ql: 30863,                 cbas: 31413
08:54:32.054 INFO ▶                  cbasSSL: 30705,              mgmtSSL: 30953,               ftsSSL: 30924
08:54:32.054 INFO ▶                       kv: 32210,                 capi: 30896,                  fts: 30585
08:54:32.054 INFO ▶        eventingAdminPort: 31922,                 mgmt: 31705
08:54:32.054 INFO ▶   [2] 135.181.30.248
08:54:32.054 INFO ▶                     n1ql: 32549,    eventingAdminPort: 32752,          eventingSSL: 31661
08:54:32.054 INFO ▶                  capiSSL: 32329,                   kv: 31872,                 capi: 30976
08:54:32.054 INFO ▶                  n1qlSSL: 32370,                  fts: 30763,              cbasSSL: 31852
08:54:32.054 INFO ▶                     mgmt: 32453,                kvSSL: 30068,               ftsSSL: 32228
08:54:32.054 INFO ▶                  mgmtSSL: 32355,                 cbas: 30578
08:54:32.054 INFO ▶ Fetching config from `http://95.216.208.78:31351`
08:54:32.090 INFO ▶ Received cluster configuration, nodes list:
[
  {
    "addressFamily": "inet",
    "alternateAddresses": {
      "external": {
        "hostname": "95.216.208.78",
        "ports": {
          "capi": 30103,
          "capiSSL": 32655,
          "kv": 31941,
          "mgmt": 31351,
          "mgmtSSL": 30971
        }
      }
    },
    "clusterCompatibility": 393222,
    "clusterMembership": "active",
    "configuredHostname": "oaf-couchbase-0003.oaf-couchbase.default.svc:8091",
    "couchApiBase": "http://oaf-couchbase-0003.oaf-couchbase.default.svc:8092/",
    "couchApiBaseHTTPS": "https://oaf-couchbase-0003.oaf-couchbase.default.svc:18092/",
    "cpuCount": 8,
    "externalListeners": [
      {
        "afamily": "inet",
        "nodeEncryption": false
      },
      {
        "afamily": "inet6",
        "nodeEncryption": false
      }
    ],
    "hostname": "oaf-couchbase-0003.oaf-couchbase.default.svc:8091",
    "interestingStats": {
      "cmd_get": 0,
      "couch_docs_actual_disk_size": 4752118309,
      "couch_docs_data_size": 3594756877,
      "couch_spatial_data_size": 0,
      "couch_spatial_disk_size": 0,
      "couch_views_actual_disk_size": 12589014,
      "couch_views_data_size": 12589014,
      "curr_items": 1091813,
      "curr_items_tot": 2185369,
      "ep_bg_fetched": 0,
      "get_hits": 0,
      "mem_used": 1808067560,
      "ops": 0,
      "vb_active_num_non_resident": 656502,
      "vb_replica_curr_items": 1093556
    },
    "mcdMemoryAllocated": 25088,
    "mcdMemoryReserved": 25088,
    "memoryFree": 13474230272,
    "memoryTotal": 32884228096,
    "nodeEncryption": false,
    "nodeUUID": "bccd30747f9e69e0269c24020361c680",
    "os": "x86_64-unknown-linux-gnu",
    "otpNode": "ns_1@oaf-couchbase-0003.oaf-couchbase.default.svc",
    "ports": {
      "direct": 11210,
      "distTCP": 21100,
      "distTLS": 21150,
      "httpsCAPI": 18092,
      "httpsMgmt": 18091
    },
    "recoveryType": "none",
    "services": [
      "cbas",
      "eventing",
      "fts",
      "index",
      "kv",
      "n1ql"
    ],
    "status": "healthy",
    "systemStats": {
      "allocstall": 0,
      "cpu_cores_available": 8,
      "cpu_stolen_rate": 0,
      "cpu_utilization_rate": 31.35483870967742,
      "mem_free": 13474230272,
      "mem_limit": 32884228096,
      "mem_total": 32884228096,
      "swap_total": 0,
      "swap_used": 0
    },
    "thisNode": true,
    "uptime": "44946",
    "version": "6.6.1-9213-enterprise"
  },
  {
    "addressFamily": "inet",
    "alternateAddresses": {
      "external": {
        "hostname": "95.217.218.135",
        "ports": {
          "capi": 30896,
          "capiSSL": 30075,
          "kv": 32210,
          "mgmt": 31705,
          "mgmtSSL": 30953
        }
      }
    },
    "clusterCompatibility": 393222,
    "clusterMembership": "active",
    "configuredHostname": "oaf-couchbase-0004.oaf-couchbase.default.svc:8091",
    "couchApiBase": "http://oaf-couchbase-0004.oaf-couchbase.default.svc:8092/",
    "couchApiBaseHTTPS": "https://oaf-couchbase-0004.oaf-couchbase.default.svc:18092/",
    "cpuCount": 8,
    "externalListeners": [
      {
        "afamily": "inet",
        "nodeEncryption": false
      },
      {
        "afamily": "inet6",
        "nodeEncryption": false
      }
    ],
    "hostname": "oaf-couchbase-0004.oaf-couchbase.default.svc:8091",
    "interestingStats": {
      "cmd_get": 0,
      "couch_docs_actual_disk_size": 4599740551,
      "couch_docs_data_size": 3572644462,
      "couch_spatial_data_size": 0,
      "couch_spatial_disk_size": 0,
      "couch_views_actual_disk_size": 11864147,
      "couch_views_data_size": 11864147,
      "curr_items": 1091273,
      "curr_items_tot": 2181978,
      "ep_bg_fetched": 0,
      "get_hits": 0,
      "mem_used": 1846524952,
      "ops": 0,
      "vb_active_num_non_resident": 640880,
      "vb_replica_curr_items": 1090705
    },
    "mcdMemoryAllocated": 25088,
    "mcdMemoryReserved": 25088,
    "memoryFree": 9966014464,
    "memoryTotal": 32884191232,
    "nodeEncryption": false,
    "nodeUUID": "00583abf725fca65006ff32e80185f0c",
    "os": "x86_64-unknown-linux-gnu",
    "otpNode": "ns_1@oaf-couchbase-0004.oaf-couchbase.default.svc",
    "ports": {
      "direct": 11210,
      "distTCP": 21100,
      "distTLS": 21150,
      "httpsCAPI": 18092,
      "httpsMgmt": 18091
    },
    "recoveryType": "none",
    "services": [
      "cbas",
      "eventing",
      "fts",
      "index",
      "kv",
      "n1ql"
    ],
    "status": "healthy",
    "systemStats": {
      "allocstall": 0,
      "cpu_cores_available": 8,
      "cpu_stolen_rate": 0,
      "cpu_utilization_rate": 76.33289986996098,
      "mem_free": 9966014464,
      "mem_limit": 32884191232,
      "mem_total": 32884191232,
      "swap_total": 0,
      "swap_used": 0
    },
    "uptime": "44130",
    "version": "6.6.1-9213-enterprise"
  },
  {
    "addressFamily": "inet",
    "alternateAddresses": {
      "external": {
        "hostname": "135.181.30.248",
        "ports": {
          "capi": 30976,
          "capiSSL": 32329,
          "kv": 31872,
          "mgmt": 32453,
          "mgmtSSL": 32355
        }
      }
    },
    "clusterCompatibility": 393222,
    "clusterMembership": "active",
    "configuredHostname": "oaf-couchbase-0005.oaf-couchbase.default.svc:8091",
    "couchApiBase": "http://oaf-couchbase-0005.oaf-couchbase.default.svc:8092/",
    "couchApiBaseHTTPS": "https://oaf-couchbase-0005.oaf-couchbase.default.svc:18092/",
    "cpuCount": 8,
    "externalListeners": [
      {
        "afamily": "inet",
        "nodeEncryption": false
      },
      {
        "afamily": "inet6",
        "nodeEncryption": false
      }
    ],
    "hostname": "oaf-couchbase-0005.oaf-couchbase.default.svc:8091",
    "interestingStats": {
      "cmd_get": 0,
      "couch_docs_actual_disk_size": 4690227191,
      "couch_docs_data_size": 3548986919,
      "couch_spatial_data_size": 0,
      "couch_spatial_disk_size": 0,
      "couch_views_actual_disk_size": 12224473,
      "couch_views_data_size": 12224473,
      "curr_items": 1090141,
      "curr_items_tot": 2179107,
      "ep_bg_fetched": 0,
      "get_hits": 0,
      "mem_used": 1873142368,
      "ops": 0,
      "vb_active_num_non_resident": 739779,
      "vb_replica_curr_items": 1088966
    },
    "mcdMemoryAllocated": 25088,
    "mcdMemoryReserved": 25088,
    "memoryFree": 20557443072,
    "memoryTotal": 32884228096,
    "nodeEncryption": false,
    "nodeUUID": "ae669a001fa9bf0f31524b8c5aef9195",
    "os": "x86_64-unknown-linux-gnu",
    "otpNode": "ns_1@oaf-couchbase-0005.oaf-couchbase.default.svc",
    "ports": {
      "direct": 11210,
      "distTCP": 21100,
      "distTLS": 21150,
      "httpsCAPI": 18092,
      "httpsMgmt": 18091
    },
    "recoveryType": "none",
    "services": [
      "cbas",
      "eventing",
      "fts",
      "index",
      "kv",
      "n1ql"
    ],
    "status": "healthy",
    "systemStats": {
      "allocstall": 0,
      "cpu_cores_available": 8,
      "cpu_stolen_rate": 0,
      "cpu_utilization_rate": 36.88946015424165,
      "mem_free": 20557443072,
      "mem_limit": 32884228096,
      "mem_total": 32884228096,
      "swap_total": 0,
      "swap_used": 0
    },
    "uptime": "42851",
    "version": "6.6.1-9213-enterprise"
  }
]
08:54:32.093 INFO ▶ Successfully connected to Key Value service at `95.216.208.78:31941`
08:54:32.099 INFO ▶ Successfully connected to Management service at `95.216.208.78:31351`
08:54:32.103 INFO ▶ Successfully connected to Views service at `95.216.208.78:30103`
08:54:32.105 INFO ▶ Successfully connected to Query service at `95.216.208.78:30386`
08:54:32.106 INFO ▶ Successfully connected to Search service at `95.216.208.78:30561`
08:54:32.108 INFO ▶ Successfully connected to Analytics service at `95.216.208.78:30104`
08:54:32.109 INFO ▶ Successfully connected to Key Value service at `95.217.218.135:32210`
08:54:32.118 INFO ▶ Successfully connected to Management service at `95.217.218.135:31705`
08:54:32.119 INFO ▶ Successfully connected to Views service at `95.217.218.135:30896`
08:54:32.121 INFO ▶ Successfully connected to Query service at `95.217.218.135:30863`
08:54:32.121 INFO ▶ Successfully connected to Search service at `95.217.218.135:30585`
08:54:32.124 INFO ▶ Successfully connected to Analytics service at `95.217.218.135:31413`
08:54:32.131 INFO ▶ Successfully connected to Key Value service at `135.181.30.248:31872`
08:54:32.137 INFO ▶ Successfully connected to Management service at `135.181.30.248:32453`
08:54:32.142 INFO ▶ Successfully connected to Views service at `135.181.30.248:30976`
08:54:32.144 INFO ▶ Successfully connected to Query service at `135.181.30.248:32549`
08:54:32.149 INFO ▶ Successfully connected to Search service at `135.181.30.248:30763`
08:54:32.155 INFO ▶ Successfully connected to Analytics service at `135.181.30.248:30578`
08:54:32.163 INFO ▶ Memd Nop Pinged `95.216.208.78:31941` 10 times, 0 errors, 0ms min, 1ms max, 0ms mean
08:54:32.169 INFO ▶ Memd Nop Pinged `95.217.218.135:32210` 10 times, 0 errors, 0ms min, 0ms max, 0ms mean
08:54:32.182 INFO ▶ Memd Nop Pinged `135.181.30.248:31872` 10 times, 0 errors, 0ms min, 1ms max, 0ms mean
08:54:32.182 INFO ▶ Diagnostics completed

Summary:
[WARN] The hostname specified in your connection string resolves both for SRV records, as well as A records.  This is not suggested as later DNS configuration changes could cause the wrong servers to be contacted
[WARN] Bootstrap host `10-36-0-7.oaf-couchbase-srv.default.svc.cluster.local` is not using the canonical node hostname of `oaf-couchbase-0005.oaf-couchbase.default.svc`.  This is not neccessarily an error, but has been known to result in strange and challenging to diagnose errors when DNS entries are reconfigured.
[WARN] Bootstrap host `10-32-0-19.oaf-couchbase-srv.default.svc.cluster.local` is not using the canonical node hostname of `oaf-couchbase-0003.oaf-couchbase.default.svc`.  This is not neccessarily an error, but has been known to result in strange and challenging to diagnose errors when DNS entries are reconfigured.
[WARN] Bootstrap host `10-35-0-39.oaf-couchbase-srv.default.svc.cluster.local` is not using the canonical node hostname of `oaf-couchbase-0004.oaf-couchbase.default.svc`.  This is not neccessarily an error, but has been known to result in strange and challenging to diagnose errors when DNS entries are reconfigured.

Found multiple issues, see listing above.

I haven’t tried downgrading to 2.x - I am not sure it works with TypeScript…