Cbbackup throws 'Internal server error, please retry your request' while backing up

Hi, we have a Couchbase Community 6.5.1 cluster. Recently, while running cbbackup, “Internal server error, please retry your request” messages started appearing in the backup output.

13/10/2021 00:49:42 /[...]/cbbackup couchbase://localhost:8091 /[...]/backup_location -u backup_user -p backup_password -t 6 -m diff
  [####################] 100.0% (1267/estimated 1267 msgs)
bucket: bucket-00-1, msgs transferred...
       :                total |       last |    per sec
 byte  :           1155738291 | 1155738291 | 12492257.8
2021-10-13 00:51:25,648: mt ['Internal server error, please retry your request']
  [####################] 100.0% (1202/estimated 1202 msgs)
bucket: bucket-00-2, msgs transferred...
       :                total |       last |    per sec
 byte  :           2060126397 | 2060126397 | 31824927.9
2021-10-13 00:52:40,540: mt ['Internal server error, please retry your request']
  [####################] 100.0% (1808/estimated 1808 msgs))
bucket: bucket-00-3, msgs transferred...
       :                total |       last |    per sec
 byte  :           3038797690 | 3038797690 | 39012470.3
2021-10-13 00:54:08,575: mt ['Internal server error, please retry your request']
  [####################] 100.0% (8169/estimated 8169 msgs)
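To see at a glance how many of these errors a run produced and when, the cbbackup output can be scanned for the `mt [...]` lines. A minimal Python sketch, assuming the output format shown above; `find_backup_errors` is a hypothetical helper. Pairing each error with the most recently printed `bucket:` line is an assumption based on the ordering of this output, not something cbbackup states.

```python
import re
from datetime import datetime

# Assumed formats, taken from the output above:
#   bucket: bucket-00-1, msgs transferred...
#   2021-10-13 00:51:25,648: mt ['Internal server error, please retry your request']
BUCKET_LINE = re.compile(r"^bucket: (?P<name>[^,]+),")
ERROR_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}): mt \[(?P<msg>.*)\]$"
)


def find_backup_errors(output):
    """Return (timestamp, last_bucket_seen, message) for each 'mt [...]' line."""
    errors = []
    last_bucket = None
    for raw in output.splitlines():
        line = raw.strip()
        bucket = BUCKET_LINE.match(line)
        if bucket:
            last_bucket = bucket.group("name")
            continue
        error = ERROR_LINE.match(line)
        if error:
            ts = datetime.strptime(error.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            # The message is printed as a list of quoted strings; drop the quotes.
            errors.append((ts, last_bucket, error.group("msg").strip("'\"")))
    return errors
```

Saving the cbbackup console output to a file and passing its contents to `find_backup_errors` gives one tuple per error, which makes it easy to compare the timestamps with the server-side logs.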

What could be the consequence of these messages? Is there any way to avoid them?

Thanks in advance and kind regards.

I have found some errors in the logs. Could any of them be the cause of this issue?

/[...]/logs/info.log:[ns_server:error,2021-10-15T00:00:03.109Z,ns_1@couchbase-001.domain:service_status_keeper_worker<0.432.0>:rest_utils:get_json:62]Request to (indexer) getIndexStatus failed: {error,timeout}
/[...]/logs/indexer.log:2021-10-15T01:32:57.074+00:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 9.9.9.18:9100.  Error = read tcp 9.9.9.17:54602->9.9.9.18:9100: use of closed network connection. Kill Pipe.
/[...]/logs/indexer.log:2021-10-15T02:13:01.122+00:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 9.9.9.17:35670.  Error = EOF. Kill Pipe.
/[...]/logs/info.log:[ns_server:error,2021-10-15T02:13:42.971Z,ns_1@couchbase-001.domain:service_status_keeper-index<0.433.0>:service_status_keeper:handle_cast:119]Service service_index returned incorrect status

These error messages repeat throughout the logs. Hostnames and IP addresses were changed to protect the environment.
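Since these messages repeat throughout the logs, it can help to narrow them to the time window of a backup run and compare them with the cbbackup error timestamps. A minimal Python sketch, assuming only the two timestamp formats visible in the excerpts above (`...T00:00:03.109Z` in info.log, `...T01:32:57.074+00:00` in indexer.log); other logs may differ, and `parse_ts` and `errors_between` are hypothetical helpers.

```python
import re
from datetime import datetime, timezone

# ISO-8601 timestamps with milliseconds and either 'Z' or a numeric UTC offset.
ISO_TS = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}(?:Z|[+-]\d{2}:\d{2})")


def parse_ts(line):
    """Return the first ISO timestamp in a log line as an aware datetime, or None."""
    match = ISO_TS.search(line)
    if not match:
        return None
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on.
    return datetime.fromisoformat(match.group(0).replace("Z", "+00:00"))


def errors_between(lines, start, end):
    """Yield (timestamp, line) for error-level lines within [start, end]."""
    for line in lines:
        # Matches both "ns_server:error" (info.log) and "[Error]" (indexer.log).
        if "error" not in line.lower():
            continue
        ts = parse_ts(line)
        if ts is not None and start <= ts <= end:
            yield ts, line
```

`start` and `end` must be timezone-aware (for example `datetime(..., tzinfo=timezone.utc)`), because the parsed log timestamps carry an offset and Python refuses to compare aware and naive datetimes.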

Hi @dopessoa,

That error message indicates that a REST request dispatched by cbbackup received a 500 status code. To properly debug this issue we’ll need to see the logs.
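The 500 status and the “please retry your request” wording suggest the server treated the failure as transient. For a client talking to such an endpoint, the usual pattern is to retry with exponential backoff. The sketch below is a generic illustration only, not how cbbackup handles the error internally; `send` is a hypothetical stand-in for whatever actually issues the request and returns `(status, body)`.

```python
import time


def call_with_retry(send, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-5xx status or attempts run out.

    send: hypothetical zero-argument callable returning (status_code, body).
    The delay doubles after each failed attempt: base_delay, 2*base_delay, ...
    The last (status, body) is returned even if every attempt failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = send()
        if status < 500:
            return status, body
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return status, body
```

Injecting `sleep` keeps the backoff schedule testable without real waiting; in normal use the default `time.sleep` applies.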

Please could you provide a log collection (collected via the WebUI under Logs → Collect Information). Along with this, please could you re-run cbbackup whilst supplying -vvv (this will enable verbose debug logging) and provide that information as well.

Please feel free to use log-redaction (provided in the WebUI).

Thanks,
James

@dopessoa

The ClusterManager calls the getIndexStatus REST API on every Index node every 5 seconds. The fact that this call is timing out might indicate either a network problem or an Index node that has become overloaded or otherwise unresponsive.
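To check from a given node whether an Index node answers promptly, the same kind of periodic status request can be reproduced and timed. A minimal Python sketch; the URL `http://<index-node>:9102/getIndexStatus` (indexer HTTP port 9102) and the use of Basic authentication are assumptions to verify against your deployment, and `probe`/`watch` are hypothetical helpers.

```python
import base64
import time
import urllib.request

# Bypass any HTTP proxy from the environment so the direct network path is tested.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url, timeout=5.0, user=None, password=None):
    """Fetch url once; return (ok, elapsed_seconds, detail)."""
    request = urllib.request.Request(url)
    if user is not None:
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    started = time.monotonic()
    try:
        with _OPENER.open(request, timeout=timeout) as response:
            response.read()
            return True, time.monotonic() - started, str(response.status)
    except OSError as exc:  # URLError, HTTPError and socket timeouts are all OSError
        return False, time.monotonic() - started, repr(exc)


def watch(url, interval=5.0, count=12, **kwargs):
    """Probe url `count` times, `interval` seconds apart."""
    results = []
    for i in range(count):
        results.append(probe(url, **kwargs))
        if i < count - 1:
            time.sleep(interval)
    return results
```

Failures that report a timeout, or successful calls whose elapsed time approaches the timeout, would point to the same symptom as the `{error,timeout}` entries in info.log.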

Unfortunately, the “Kill Pipe” messages are generally not very diagnostic. Many places in the code log them after a normal, expected closure of a connection: often the only way the thread on one end of the pipe learns that the task is finished is that the thread on the other end closes the pipe, and the first thread then logs these errors when it next tries to read from it.
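The two indexer.log lines above correspond to the two ways such a read can end. The indexer is written in Go, where reading after the peer closes returns `EOF` and reading from a connection the local side already closed returns `use of closed network connection`. The sketch below is a Python analogue of that behaviour, not the indexer's code; both function names are hypothetical.

```python
import socket


def read_after_peer_closes():
    """The peer closes; the next read on this end sees EOF (empty bytes)."""
    ours, theirs = socket.socketpair()
    theirs.sendall(b"last message")
    theirs.close()            # the peer decides the task is finished
    first = ours.recv(1024)   # data sent before the close is still delivered
    second = ours.recv(1024)  # then EOF: recv() returns b""
    ours.close()
    return first, second


def read_after_local_close():
    """Reading from a socket this side already closed raises OSError."""
    ours, theirs = socket.socketpair()
    ours.close()
    try:
        ours.recv(1024)
    except OSError as exc:    # e.g. "[Errno 9] Bad file descriptor"
        return exc
    finally:
        theirs.close()
    return None
```

In both cases the reader only discovers the closure by attempting a read, which is why an error gets logged even when the shutdown was intentional.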

@dopessoa Note that “network problem” can include a firewall blocking a port that Couchbase Server is trying to use. Such blocks don’t always return errors indicating that access was denied; the connection attempts may simply time out.
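Because a silently dropped connection looks different from a refused one, a quick TCP check between nodes can tell the cases apart. A minimal Python sketch; the port list is an assumed subset of the ports Couchbase Server 6.5 uses (check the official network and firewall requirements page for the full set), and `check_port`/`check_host` are hypothetical helpers.

```python
import socket

# Assumption: a partial list of Couchbase Server ports (REST, indexer, data).
COUCHBASE_PORTS = [8091, 9100, 9101, 9102, 9103, 9104, 9105, 11210]


def check_port(host, port, timeout=3.0):
    """Classify a TCP port as 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: nothing listening, or a rejecting firewall rule.
        return "refused"
    except socket.timeout:
        # No answer at all: typical of a firewall that silently drops packets.
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


def check_host(host, ports=COUCHBASE_PORTS, timeout=3.0):
    """Return {port: status} for every port in `ports` on `host`."""
    return {port: check_port(host, port, timeout) for port in ports}
```

It is worth running this from each node against every other node (for example from 9.9.9.17 against 9.9.9.18), since a rule may block only certain node pairs.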

Thank you for your valuable inputs @Kevin.Cherkauer and @jamesl33!

I will follow up on the suggestions: running cbbackup with verbose output and checking whether any Couchbase ports are being blocked.

If neither action fixes the situation or gives a clear idea of the source of the issue, I will upload the logs.

Regards,
Douglas