sync_gateway continuous change feed stops?

#1

I am running a test with 400 clients pushing and pulling from sync_gateway. The test generates lots of conflicts in the database.

At some point, the changes feed just stops delivering any changes. Reconnecting clients in this jammed state does not help: SG replies with an error right away, so it does not even accept the connection. Restarting sync_gateway fixes the problem - all 400 clients start getting changes again, pull works fine, and the whole test finishes correctly.

Is this test perhaps already hitting the limits of a single SG instance? Or could there be some other obvious reason why this is not a feasible test? (At least I have read about a bug where SG is slow when there are many conflicts.)
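For reference, the clients hit the continuous changes feed roughly like this (a minimal sketch of the request they build; the host, port, and database name below are placeholders, not my actual config):

```python
from urllib.parse import urlencode

def changes_url(base, db, since, heartbeat_ms=30000):
    # Build a continuous _changes request against Sync Gateway's REST API.
    # "base" and "db" are placeholders for illustration.
    params = {
        "feed": "continuous",     # keep the connection open and stream changes
        "since": since,           # last checkpoint the client has seen
        "heartbeat": heartbeat_ms # keep-alive interval in milliseconds
    }
    return f"{base}/{db}/_changes?{urlencode(params)}"

print(changes_url("http://localhost:4984", "mydb", "2564::164212"))
```

Each client keeps one such connection open per pull replication, so 400 clients means at least 400 long-lived connections to the single SG instance.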

#2

What version of Sync Gateway (and Couchbase Server) are you running?

#3

Couchbase Sync Gateway/1.2.0(83;41aa099)
Server: Version: 4.0.0-4051 Community Edition (build-4051)

I am running these on Windows 10 64bit.

#4

I am wondering whether there could be a case where a client leaves some handles open to the SG and eventually jams everything.

#5

Can you file an issue in the Sync Gateway GitHub repository, and include steps to reproduce and your Sync Gateway log output?

#6

It will take some time to prepare a reproducible test.

When the system hits the jammed state, here is what the _changes content looks like:

{"results":[
{"seq":"6338::75057","id":"0093b13b-5061-4d9b-850f-628145b2cab2","changes":[{"rev":"2-7a77564d576161756d746c5a446a63306650525667557a4b3430733d"}]}
,{"seq":"6338::76364","id":"01143174-7b6a-4dbb-9e26-402534bc9c5c","changes":[{"rev":"2-7a776c4f745261454e337268496d3141444553786e5571564c6a673d"}]}
,{"seq":"6338::76668","id":"02fa3703-24be-45d8-8361-12e6160ec6da","changes":[{"rev":"2-7a7a4f396c5862333271506353496f5951635642424c37716a73773d"}]}
,{"seq":"6338::76962","id":"033e5a84-f192-45bd-8337-1c12dd4959f1","changes":[{"rev":"2-7a576c524a6d444e31662b30634f56676d69544d6966486d4f37593d"}]}
,{"seq":"6338::77257","id":"039f9c0d-3dbd-4e5c-91ca-0d0c557cf299","changes":[{"rev":"2-7a4731785876656c635652444b386a483451766f747958694664593d"}]}

,{"seq":"6338::164211","id":"fe939a86-48cd-4044-b4a0-a63497560fd6","changes":[{"rev":"2-7a7764386349554b646d3978645857784b755052555632413353733d"}]}
,{"seq":"6338::164212","id":"fea71a62-6358-43d5-80a5-d8f4dcef1586","changes":[{"rev":"2-7a744d39793938314f5a784f6e743653463656786c526169316f453d"}]}
],
"last_seq":"6338::164212"}

So every seq is in this "::" format. The last checkpoint on the clients is 2564::164212, and that is what gets sent as the "since" parameter.
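A quick sketch of how I am reading these values (assuming the part before "::" is a separate counter from the document sequence after it; I am not certain of the exact semantics SG attaches to the low part):

```python
def parse_compound_seq(seq):
    # Split a compound sequence like "2564::164212" into its two parts.
    # A plain sequence is just a bare integer with no "::" separator.
    if "::" in seq:
        low, high = seq.split("::", 1)
        return int(low), int(high)
    return None, int(seq)

print(parse_compound_seq("2564::164212"))  # (2564, 164212)
print(parse_compound_seq("6338::75057"))   # (6338, 75057)
```

Note that the server-side entries all carry 6338 as the low part while the client checkpoint carries 2564, so the two sides disagree on that first component even though the high part (164212) matches.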

#7

We adjusted the clients a bit so that they communicate with the server less frequently, and at the moment I cannot reproduce this anymore. There is a chance that this was also a client-side lock-up. So I will mark this solved and come back with an issue report if I hit the same problem again.