XDCR replication causing goxdcr to restart


#1

Hey guys,

Over the past week, our XDCR replication to Elasticsearch has become significantly slower and has caused goxdcr to restart. At times the memory usage on our three Couchbase nodes increases significantly, and one node crashed because it ran out of memory. Here are the related logs:

Service 'goxdcr' exited with status 1. Restarting. Messages:
CapiNozzle 2016-06-28T12:59:02.907Z [INFO] capi_3b057ef39c773498bc80e763d779824c/HeadOfficeTest/production_pa_rev1641_172.31.10.25:9091_0 send batch count=150 for vb 123
CapiNozzle 2016-06-28T12:59:02.976Z [INFO] capi_3b057ef39c773498bc80e763d779824c/HeadOfficeTest/production_pa_rev1641_172.31.10.25:9091_1 send batch count=98 for vb 395
CapiNozzle 2016-06-28T12:59:02.999Z [INFO] capi_3b057ef39c773498bc80e763d779824c/HeadOfficeTest/production_pa_rev1641_172.31.10.25:9091_0 send batch count=223 for vb 13
CapiNozzle 2016-06-28T12:59:03.040Z [INFO] capi_3b057ef39c773498bc80e763d779824c/HeadOfficeTest/production_pa_rev1641_172.31.10.25:9091_1 send batch count=136 for vb 583
[goport] 2016/06/28 12:59:06 /opt/couchbase/bin/goxdcr terminated: signal: killed

My XDCR settings are as follows:
http://imgur.com/ftmGFZD

We also noticed that XDCR was encountering additional errors, such as:
CapiNozzle 2016-06-28T01:38:52.043Z [ERROR] Received error when writing boby part. err=write tcp 172.31.9.185:41914->172.31.11.37:9091: i/o timeout
CapiNozzle 2016-06-28T01:38:52.043Z [ERROR] capi_3b057ef39c773498bc80e763d779824c/HeadOfficeTest/production_pa_rev1641_172.31.11.37:9091_33 batchUpdateDocs for vb 713 failed with err write tcp 172.31.9.185:41914->172.31.11.37:9091: i/o timeout.
CapiNozzle 2016-06-28T01:38:52.044Z [ERROR] capi_3b057ef39c773498bc80e763d779824c/HeadOfficeTest/production_pa_rev1641_172.31.11.37:9091_33 error updating docs on target. err=batch update docs failed for vb 713 after 3 retries
CapiNozzle 2016-06-28T01:38:52.044Z [ERROR] capi_3b057ef39c773498bc80e763d779824c/HeadOfficeTest/production_pa_rev1641_172.31.11.37:9091_33 raise error condition batch update docs failed for vb 713 after 3 retries
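
For anyone trying to triage similar output, here is a rough sketch of how the failed batches could be tallied per target node and per vBucket, to see whether the timeouts cluster on one Elasticsearch node. The regular expression is an assumption based only on the sample error lines above, not on any documented goxdcr log format, and the function name `count_failed_batches` is made up for illustration.

```python
import re
from collections import Counter

# Assumed pattern: "..._<target ip:port>_<nozzle index> batchUpdateDocs for vb <n> failed".
# Derived from the sample lines in this post only; adjust it if your logs differ.
FAILED_BATCH = re.compile(
    r"_(?P<target>\d+\.\d+\.\d+\.\d+:\d+)_\d+ "
    r"batchUpdateDocs for vb (?P<vb>\d+) failed"
)

def count_failed_batches(log_text):
    """Return (failures per target node, failures per vBucket) as Counters."""
    per_target = Counter()
    per_vb = Counter()
    # finditer scans the whole text, so it works whether the entries are on
    # separate lines or run together in one long line.
    for match in FAILED_BATCH.finditer(log_text):
        per_target[match.group("target")] += 1
        per_vb[int(match.group("vb"))] += 1
    return per_target, per_vb
```

Passing the contents of the goxdcr log to `count_failed_batches` and printing `per_target.most_common()` would show which target address accounts for most of the failed batches.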

I tried looking up what these errors mean but could not find any answers related to this problem. Does anyone have any insight into what the issue might be?
Thanks,
Andrew


#2

Hi Andrew, what versions of Couchbase Server, Elasticsearch, and Elasticsearch Transport Plugin are you using? Best, -Will


#3

Hey @WillGardella, my Couchbase Server is 4.5, Elasticsearch is 1.7.5, and my transport plugin is 2.1.1.
Regards,
Andrew