[2.0/2.1] Can't stop Replication

.net

#1

Hi,

CBL 2.1 and 2.0.0
SGW 2.1
CBS (CE) 5.1.1 build 5723

update: After even 2 minutes the replicator still does not stop

I can’t seem to stop replication anymore. I made a simple solution to rule out the logic of my app. Regardless of continuous or non-continuous, push pull or push or pull, it does not seem to stop… .

Did something change since the last release? Or someone with the same experience?

thanks

2.0.0 seemed to work fine before SGW 2.1.

      Replicator rep = new Replicator(_replicatorConfig);
        ReplicatorActivityLevel current = ReplicatorActivityLevel.Offline;

        rep.Start();
        Stopwatch watch = new Stopwatch();
        watch.Start();
        
        var token = rep.AddChangeListener((sender, args) =>
        {
            current = args.Status.Activity;
            Android.Util.Log.Debug("[Mobile]", $"{args.Status.Activity.ToString()}, {args.Status.Progress.Completed}/{args.Status.Progress.Total}");
        });
            rep.Stop();
        while (current != ReplicatorActivityLevel.Stopped)
        {
            Android.Util.Log.Debug("[Mobile]", $"Replicator not stopped yet after {watch.Elapsed.Minutes}:{watch.Elapsed.Seconds}");
            
            Task.Delay(500).Wait();
        }
        Android.Util.Log.Debug("[Mobile]", $"Replicator has Stopped");
        rep.Config.Database.Delete();

#2

Can you post SG logs, too? What does the change listener show?


#3

CBL logs (with Sync logging on) will be more useful than SG logs.


#4

Ok,

Seems like CBL recieves the stop command but ignores it in busy state. Sync Gateway does not receive the stop command until I stop running the solution (@ 25 seconds)

CBL output ([Mobile] and [Replication] are my tags).

09-18 06:59:01.780 D/[Replication](31132): RestartCommand: will try to restart item replicator
09-18 06:59:01.782 D/Mono    (31132): DllImport searching in: 'LiteCore' ('libLiteCore.so').
09-18 06:59:01.782 D/Mono    (31132): Searching for 'c4repl_stop'.
09-18 06:59:01.782 D/Mono    (31132): Probing 'c4repl_stop'.
09-18 06:59:01.783 D/Mono    (31132): Found as 'c4repl_stop'.
09-18 06:59:01.784 D/[Mobile](31132): Replicator not stopped yet after
Thread started:  #24
09-18 06:59:01.804 I/CouchbaseLite(31132): [25] 2018-9-18 06:59:01.803+00:00 Sync 2018-09-18T06:59:01.8034300+00:00 {Repl#2} Told to stop!
09-18 06:59:01.805 V/CouchbaseLite(31132): [24] 2018-9-18 06:59:01.805+00:00 Sync 2018-09-18T06:59:01.8047480+00:00 (WebSocketWrapper) [24] WebSocket sent HTTP request...
09-18 06:59:01.806 I/CouchbaseLite(31132): [25] 2018-9-18 06:59:01.806+00:00 Sync 2018-09-18T06:59:01.8058770+00:00 {Repl#2} activityLevel=busy: connectionState=3
09-18 06:59:01.807 I/CouchbaseLite(31132): [25] 2018-9-18 06:59:01.807+00:00 Sync 2018-09-18T06:59:01.8069390+00:00 {Repl#2} now busy
The thread 0x15 has exited with code 0 (0x0).
The thread 0x16 has exited with code 0 (0x0).
Thread started:  #25
09-18 06:59:01.829 I/CouchbaseLite(31132): [25] 2018-9-18 06:59:01.829+00:00 Sync 2018-09-18T06:59:01.8290240+00:00 (Replicator) [25] Replicator[<> ws://xx.xx.xx.xx:4984/item] is Busy, progress 0/0
Thread finished:  #23
09-18 06:59:01.891 I/CouchbaseLite(31132): [7] 2018-9-18 06:59:01.891+00:00 Sync 2018-09-18T06:59:01.8912660+00:00 (HTTPLogic) [Thread Pool Worker (7)] Auth challenge received for xxx
Thread started:  #26
The thread 0x17 has exited with code 0 (0x0).
09-18 06:59:02.303 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:02.804 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:03.305 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:03.806 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:04.306 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:04.434 W/CouchbaseLite(31132): [27] 2018-9-18 06:59:04.431+00:00 Sync 2018-09-18T06:59:04.4312960+00:00 (WebSocketWrapper) [27] Timed out waiting for _writeQueue to finish, forcing Dispose...
Thread finished:  #26
Thread started:  #27
09-18 06:59:04.486 I/CouchbaseLite(31132): [28] 2018-9-18 06:59:04.485+00:00 Sync 2018-09-18T06:59:04.4849440+00:00 (WebSocketWrapper) [28] WebSocket connecting to xx.xx.xx.xx:4984
Thread finished:  #27
Thread started:  #28
09-18 06:59:04.522 V/CouchbaseLite(31132): [29] 2018-9-18 06:59:04.522+00:00 Sync 2018-09-18T06:59:04.5219570+00:00 (WebSocketWrapper) [29] WebSocket sent HTTP request...
Thread finished:  #28
Thread started:  #29
09-18 06:59:04.562 I/CouchbaseLite(31132): [13] 2018-9-18 06:59:04.561+00:00 Sync 2018-09-18T06:59:04.5615020+00:00 (WebSocketWrapper) [Thread Pool Worker (13)] WebSocket CONNECTED!
The thread 0x1a has exited with code 0 (0x0).
The thread 0x1b has exited with code 0 (0x0).
The thread 0x1c has exited with code 0 (0x0).
09-18 06:59:04.575 D/Mono    (31132): DllImport searching in: 'LiteCore' ('libLiteCore.so').
09-18 06:59:04.576 D/Mono    (31132): Searching for 'c4socket_gotHTTPResponse'.
09-18 06:59:04.576 D/Mono    (31132): Probing 'c4socket_gotHTTPResponse'.
09-18 06:59:04.576 D/Mono    (31132): Found as 'c4socket_gotHTTPResponse'.
Thread started:  #30
09-18 06:59:04.592 D/Mono    (31132): DllImport searching in: 'LiteCore' ('libLiteCore.so').
09-18 06:59:04.592 D/Mono    (31132): Searching for 'FLSliceResult_Free'.
09-18 06:59:04.592 D/Mono    (31132): Probing 'FLSliceResult_Free'.
09-18 06:59:04.592 D/Mono    (31132): Found as 'FLSliceResult_Free'.
09-18 06:59:04.593 I/CouchbaseLite(31132): [31] 2018-9-18 06:59:04.593+00:00 Sync 2018-09-18T06:59:04.5925900+00:00 {Repl#2} activityLevel=busy: connectionState=3
09-18 06:59:04.596 D/Mono    (31132): DllImport searching in: 'LiteCore' ('libLiteCore.so').
09-18 06:59:04.596 D/Mono    (31132): Searching for 'c4socket_opened'.
09-18 06:59:04.596 D/Mono    (31132): Probing 'c4socket_opened'.
09-18 06:59:04.596 D/Mono    (31132): Found as 'c4socket_opened'.
Thread started:  #31
Thread finished:  #29
09-18 06:59:04.621 I/CouchbaseLite(31132): [32] 2018-9-18 06:59:04.621+00:00 Sync 2018-09-18T06:59:04.6212080+00:00 {Repl#2} Connected!
09-18 06:59:04.622 I/CouchbaseLite(31132): [32] 2018-9-18 06:59:04.622+00:00 Sync 2018-09-18T06:59:04.6221590+00:00 {Repl#2} activityLevel=busy: connectionState=3
09-18 06:59:04.808 D/[Mobile](31132): Replicator not stopped yet after
The thread 0x1d has exited with code 0 (0x0).
09-18 06:59:05.309 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:05.810 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:06.310 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:06.811 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:07.312 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:07.813 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:08.314 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:08.815 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:09.315 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:09.816 D/[Mobile](31132): Replicator not stopped yet after
Thread finished: <Thread Pool> #8
The thread 0x8 has exited with code 0 (0x0).
09-18 06:59:10.317 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:10.818 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:11.319 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:11.820 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:12.321 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:12.822 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:13.323 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:13.824 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:14.326 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:14.827 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:15.328 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:15.829 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:16.330 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:16.831 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:17.332 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:17.834 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:18.335 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:18.836 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:19.336 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:19.837 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:20.338 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:20.839 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:21.340 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:21.841 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:22.343 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:22.844 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:23.345 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:23.846 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:24.347 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:24.847 D/[Mobile](31132): Replicator not stopped yet after
09-18 06:59:25.348 D/[Mobile](31132): Replicator not stopped yet after

SGW

2018-09-18T08:59:00.908+02:00 [INF] HTTP:  #5583: GET /item/ (as xxx)
2018-09-18T08:59:00.909+02:00 [INF] HTTP+: #5583:     --> 200   (1.0 ms)
2018-09-18T08:59:02.025+02:00 [INF] HTTP:  #5584: GET /item/_blipsync (as GUEST)
2018-09-18T08:59:02.025+02:00 [INF] HTTP: #5584:     --> 401 Login required  (0.0 ms)
2018-09-18T08:59:04.734+02:00 [INF] HTTP:  #5585: GET /item/_blipsync (as xxx)
2018-09-18T08:59:04.734+02:00 [INF] HTTP+: #5585:     --> 101 [2e03f14d] Upgraded to     BLIP+WebSocket protocol. User:xxx.  (0.0 ms)
  2018-09-18T08:59:04.735+02:00 [INF] WS: [2e03f14d] Start BLIP/Websocket handler
2018-09-18T08:59:04.735+02:00 [DBG] WS+: [2e03f14d] Sender starting
2018-09-18T08:59:25.763+02:00 [DBG] WS+: [2e03f14d] receiveLoop stopped
2018-09-18T08:59:25.763+02:00 [DBG] WS+: [2e03f14d] parseLoop stopped
2018-09-18T08:59:25.763+02:00 [DBG] WS+: [2e03f14d] Sender stopped
2018-09-18T08:59:25.764+02:00 [DBG] HTTP+: [2e03f14d] #5585:    --> BLIP+WebSocket connection closed

Is there a mock mechanism to throw an error or stall the replication for testing purposes. I don’t want this to happen in production and want to test it thoroughly


#5

You’ve hit the nail on the head here. This is something that is difficult to do as it requires both knowledge of the protocol and a proxy to interfere. This is something I’ve been also wanting to create but it will require a lot of hours to do. I wrote a C API for decoding messages, which I want to expand to encoding later, but it is only half done in its current form.

It looks like you are stopping the replicator almost immediately after starting it. Is that correct?


#6

Yes, just for test purposes. I just work with a bool for test purposes. If it is true, it will try a restart. So no delay or exception from the replication because I can’t mock it :slight_smile:

I just want a ‘what-if’ before it goes into production. We sometimes notice that the replication hangs (in debug and release mode). An app restart solves it. But that’s no solution when it goes to production. So I have built I kind of restart mechanism. If it does not stop after 20 seconds I dispose and reinitialize the replicator. After 3 times I give a notice to the user. Works for now… .


#7

Right, the “hanging replicator” is probably the most common class of issue that we work against. Any reproduction cases will be worth their weight in gold (not literally…sorry!). I solved quite a few for 2.1 but I imagine there are still more edge cases out there. Let me know if there is any specific information you find out.