Where would an error with a sync gateway resync operation be logged?


#1

I am trying to run a resync on my production server, but it keeps failing after about an hour: the curl POST request to _resync just returns an empty response. I am using Couchbase Server Enterprise 3.0.2 and Sync Gateway Enterprise 1.0.3. I have run resyncs successfully before, and I was able to run the resync on our development database (which is on Sync Gateway 1.1.0) with the same Sync Gateway config document (the only difference is the server name in the databases config section).

The Sync Gateway log does not show any error message, but I originally had only the HTTP and Access log flags enabled so that the log file would not get too large to open in a text editor. Which flags should I enable to see an error message, or would a resync error be logged somewhere else? Sync Gateway itself runs fine throughout the entire process and is still running normally after the failure.

I am re-running it now and I can see that it is not using an excessive amount of CPU or RAM while running the resync, but that memory usage is steadily increasing. It started at about 14% of RAM and now it is up to 27% of RAM after running for about 15 minutes. The database being resynced has about 2 million documents in it.

Thanks,
Alex


#2

Update: About 30 minutes in, Sync Gateway was using 57.3% of RAM; then it crashed the VM and my node went down. The Couchbase console reported that the node was using 97% of RAM before it lost its connection to the node.
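
For anyone wanting to track this kind of memory growth while a resync runs, here is a minimal sketch in Python. It assumes a Linux host, where resident memory is exposed as the VmRSS field in /proc/<pid>/status. The function names and the PID in the usage comment are hypothetical and not part of Sync Gateway.

```python
import re
import time
from typing import Optional


def parse_rss_kib(status_text: str) -> Optional[int]:
    """Return the VmRSS value (in KiB) found in the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid: int) -> Optional[int]:
    """Read the resident set size of a process, or None if it has exited."""
    try:
        with open(f"/proc/{pid}/status") as status_file:
            return parse_rss_kib(status_file.read())
    except FileNotFoundError:
        return None


def watch(pid: int, interval_s: float = 60.0) -> None:
    """Print a timestamped RSS reading every interval until the process exits."""
    while True:
        rss = read_rss_kib(pid)
        if rss is None:
            print("process gone or VmRSS unavailable", flush=True)
            return
        print(f"{time.strftime('%H:%M:%S')} rss={rss / 1024:.1f} MiB", flush=True)
        time.sleep(interval_s)


# Usage (hypothetical PID of the sync_gateway process):
# watch(12345)
```

Redirecting the output to a file gives a record of how fast memory was climbing, which survives even if the VM goes down shortly afterwards.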


#3

@alexegli

The CRUD and CRUD+ log channels are used during a _resync for some code paths. Also check that you are not explicitly setting the log level above 1 via the Admin REST API /_logging call, as that will mask general logging during _resync.
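
As a rough illustration of turning those channels on at runtime, the Python sketch below builds a PUT request to /_logging. It rests on assumptions that should be checked against the docs for your version: that the admin API listens on localhost:4985, and that the endpoint accepts a JSON object mapping log keys to true. The helper name is hypothetical.

```python
import json
import urllib.request
from typing import Iterable


def build_logging_request(admin_url: str, keys: Iterable[str]) -> urllib.request.Request:
    """Build a PUT /_logging request that enables the given log keys.

    Assumption: the body is a JSON object of the form {"<key>": true, ...}.
    """
    body = json.dumps({key: True for key in keys}).encode("utf-8")
    return urllib.request.Request(
        url=f"{admin_url.rstrip('/')}/_logging",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Usage (assumed admin address; sends the request to a running Sync Gateway):
# request = build_logging_request("http://localhost:4985", ["CRUD", "CRUD+"])
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

The same keys can also be listed in the "log" array of the config file, which keeps them enabled across restarts.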


#4

@alexegli

This looks like a potential bug. Can you create a new ticket on the Sync Gateway GitHub repo?

Andy


#5

This is for Sync Gateway v1.0.3, though, and I did not have this issue on my server running Sync Gateway v1.1.0 with a similar amount of data. There the resync ran for 6 hours before finishing, though when it did finish it didn’t output the usual {"changes":127211} response. It didn’t seem to be leaking memory, so I’m going to upgrade our production server to Sync Gateway v1.1.0 and see if that fixes it. If it works on Sync Gateway v1.1.0, should I still open a bug about v1.0.3?
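
Since an empty reply and a reply without the usual change count both came up in this thread, here is a small Python sketch of how a script wrapping the _resync call might tell those cases apart from a normal result. It assumes only the response shape quoted above, a JSON object with an integer "changes" field. The function name is hypothetical.

```python
import json
from typing import Optional


def parse_resync_response(body: bytes) -> Optional[int]:
    """Return the "changes" count from a _resync response body.

    Returns None when the body is empty, is not valid JSON, or has no
    integer "changes" field, so the caller can treat that as a failure.
    """
    text = body.decode("utf-8", errors="replace").strip()
    if not text:
        return None
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return None
    changes = payload.get("changes") if isinstance(payload, dict) else None
    return changes if isinstance(changes, int) else None


# Usage: pass in whatever bytes the POST to _resync returned.
# count = parse_resync_response(raw_body)
# print("resync failed or gave no count" if count is None else f"{count} docs changed")
```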


#6

@alexegli

If SG 1.1.0 works in your environment then there is no need to open a ticket.

Andy


#7

SG 1.1.0 seems to be working for us now. It says it has about 1.1 million documents to change channel access for, and so far it’s been running for 3 hours with steady memory and CPU usage. It’s not even halfway done but so far it’s ok. So I think the memory leak was probably just in sync gateway 1.0.3.