XDCR fails to resume on one node



I stopped XDCR for a few minutes. After I resumed it, one of the 4 nodes (cb3.company) stopped transmitting data to my backup cluster (company_bkp). The backup cluster is just one machine.
I am running Couchbase Server 4.1.0-5005 Community Edition on both ends.

The following errors appear in goxdcr.log every time I do the stop/start procedure:
GenericSupervisor 2017-08-11T14:51:03.679+02:00 [ERROR] Received error report : map[dcp_06189a4a72b9f0553d754f56502b5e0a/default/company_bkp_cb3.company:11210_0:Dcp is stuck for dcp nozzle dcp_06189a4a72b9f0553d754f56502b5e0a/default/company_bkp_cb3.company:11210_0]
ReplicationManager 2017-08-11T14:51:03.679+02:00 [INFO] Supervisor PipelineSupervisor_06189a4a72b9f0553d754f56502b5e0a/default/company_bkp of type *supervisor.GenericSupervisor reported errors map[dcp_06189a4a72b9f0553d754f56502b5e0a/default/company_bkp_cb3.company:11210_0:Dcp is stuck for dcp nozzle dcp_06189a4a72b9f0553d754f56502b5e0a/default/company_bkp_cb3.company:11210_0]
StatisticsManager 2017-08-11T14:51:04.328+02:00 [INFO] Pipeline is no longer running, exit.
GenericSupervisor 2017-08-11T14:51:04.328+02:00 [INFO] Executing 0x6aa0f0 timed out
GenericSupervisor 2017-08-11T14:51:04.329+02:00 [INFO] ****************************
GenericSupervisor 2017-08-11T14:51:04.329+02:00 [INFO] Received timeout error when checking pipeline health. topic=06189a4a72b9f0553d754f56502b5e0a/default/company_bkp
StatisticsManager 2017-08-11T14:51:04.329+02:00 [INFO] expvar=Stats for pipeline 06189a4a72b9f0553d754f56502b5e0a/default/company_bkp-159877804 {"Errors": "[]", "Overview": {"": 0, "bandwidth_usage": 0, "changes_left": 104522, "data_replicated": 0, "dcp_datach_length": 0, "dcp_dispatch_time": 0, "deletion_docs_written": 0, "deletion_failed_cr_source": 0, "deletion_filtered": 0, "deletion_received_from_dcp": 0, "docs_checked": 1358367503, "docs_failed_cr_source": 0, "docs_filtered": 0, "docs_opt_repd": 0, "docs_processed": 1358367503, "docs_received_from_dcp": 0, "docs_rep_queue": 0, "docs_written": 0, "expiry_docs_written": 0, "expiry_failed_cr_source": 0, "expiry_filtered": 0, "expiry_received_from_dcp": 0, "num_checkpoints": 0, "num_failedckpts": 0, "rate_doc_checks": 0, "rate_doc_opt_repd": 0, "rate_received_from_dcp": 0, "rate_replicated": 0, "resp_wait_time": 0, "set_docs_written": 0, "set_failed_cr_source": 0, "set_filtered": 0, "set_received_from_dcp": 0, "size_rep_queue": 0, "time_committing": 0, "wtavg_docs_latency": 0, "wtavg_meta_latency": 0}, "Progress": "The runtime context is started", "Status": "Pending"}

GenericSupervisor 2017-08-11T14:51:13.328+02:00 [INFO] About to call function
GenericSupervisor 2017-08-11T14:51:13.329+02:00 [INFO] Pipeline is no longer running, exit.
GenericSupervisor 2017-08-11T14:51:13.329+02:00 [INFO] Finish executing 0x6aa0f0
CheckpointManager 2017-08-11T14:51:30.248+02:00 [INFO] Pipeline is no longer running, exit.
CheckpointManager 2017-08-11T14:51:30.248+02:00 [INFO] Exits massCheckVBOpaquesJob routine.
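To confirm that the stuck DCP nozzle is always the one on cb3.company, and that the pipeline keeps reporting pending changes while receiving nothing from DCP, a small script along these lines could scan goxdcr.log. It is only a sketch: it assumes the nozzle ID format seen in the excerpt (dcp_<replication id>_<host>:<port>_<index>), host names without underscores or colons, and one stats record per log line. The names stuck_nodes, pipeline_overview and looks_stalled are hypothetical helpers, not part of Couchbase, and the "stalled" check is just a heuristic.

```python
import json
import re
import sys
from collections import Counter

# Hypothetical helpers for scanning goxdcr.log (not part of Couchbase).
# Assumed nozzle ID format, taken from the log excerpt:
#   dcp_<remote-uuid>/<source-bucket>/<target-bucket>_<host>:<port>_<index>
# Host names are assumed to contain no "_" or ":", so the trailing
# "_host:port_index" part can be split off even if bucket names contain "_".
STUCK_RE = re.compile(
    r"Dcp is stuck for dcp nozzle dcp_(?P<repl>\S+?)_(?P<node>[^\s_/:]+:\d+)_(?P<idx>\d+)"
)


def stuck_nodes(lines):
    """Count 'Dcp is stuck' reports per source node (host:port)."""
    counts = Counter()
    for line in lines:
        for match in STUCK_RE.finditer(line):
            counts[match.group("node")] += 1
    return counts


def pipeline_overview(line):
    """Return the "Overview" dict of a StatisticsManager expvar line, else None."""
    if "expvar=Stats for pipeline" not in line:
        return None
    start = line.find("{")
    if start == -1:
        return None
    return json.loads(line[start:])["Overview"]


def looks_stalled(overview):
    """Heuristic: changes are pending but nothing is arriving from DCP."""
    return (overview.get("changes_left", 0) > 0
            and overview.get("rate_received_from_dcp", 0) == 0)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as log:
        lines = log.readlines()
    print("Stuck DCP nozzles per node:", dict(stuck_nodes(lines)))
    for line in lines:
        overview = pipeline_overview(line)
        if overview is not None:
            state = "stalled" if looks_stalled(overview) else "moving"
            print("changes_left =", overview.get("changes_left"), state)
```

Run it as `python scan_goxdcr.py goxdcr.log` (the script name is arbitrary). Note that each stuck event is logged by both GenericSupervisor and ReplicationManager, so the per-node count is roughly twice the number of events.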