Automated testing (NUnit), Live Query, Replication and blocking on the Main Thread

paulr · January 4, 2022, 4:37am

I have created a large number of automated tests for my app. There’s one set of tests that I can’t seem to get going right, and I’m looking for some advice… Am I being too ambitious here?

The underlying requirement of all the tests is to have multiple endpoints creating changes that are delivered to other endpoints and to confirm that all Endpoints maintain consistency.

In the tests, I create several EndPoint objects, each containing a Database object (with its own unique db file), a Live Query, and a replication, all created in its own thread. Each endpoint then starts making and saving change documents, which get synced. Each endpoint is also receiving the change documents from the other EndPoints, which are then integrated into their data. The test is to make sure everything stays sane.

I have a variety of ways to create change sets from the EndPoints, each of which represents different tests. Some examples:

slowly create changes from each EndPoint in turn
rapidly create changes from all EndPoints in parallel,
create conflicting changes,
one or more endpoints go offline for a while, then come back and catch up.

… and so on. Many many such tests.

Mostly things work.

The weird thing is that at the end of some of the tests, I need to wait for syncing to settle before testing that all the endpoints are in a sane state. So in my UnitTest code - I have something like this at the end:

    Thread.Sleep(20000);    // wait for some time to ensure the final sync has started
    while (the state of any replicator is busy)
        Thread.Sleep(500);
    Assert.That( all endpoints have sane state )

Yeah… it’s ugly. But performance is not important in this test, accuracy is. I need to wait for all replication to have finished before checking for a sane state.

The problem is that once I enter this wait, no further syncing will happen. No matter how long I wait at the “wait for some time…” line, I don’t receive the final change events.

It seems to me that the operation of the Replication or LiveQuery is, in some way, blocking on the main thread, even though the Endpoints are all created in separate threads. Looking in the forums, there has been an interesting conversation about LiveQueries, Main Threads and Java (here). In that conversation, borrrden made the comment:

I work in the .NET version and we don’t have any concept of a main thread in a cross platform sense so by default it uses a thread pool background thread for the same operation here

so I wouldn’t expect LiveQueries to suffer from these problems. However I am running in NUnit, so perhaps something different is going on.

What am I missing? Is it too ambitious to run multiple Databases, Live Queries and Replicators in the same Unit Test app?

Many thanks for any advice you can provide.
Paul.

blake.meike · January 4, 2022, 11:22pm

Well, depending on how your code is structured, that isn’t just ugly, it might well not work. I’m not clear what you mean when you say that “a Database object […], a Live Query, and a replication, all created in its own thread”. The thread on which one of these objects is created doesn’t really have a lot to do with which threads are used to run its methods. It is entirely possible that the thread you are sleeping is one of the ones necessary for syncing!

I suggest that, instead of waiting for a while and crossing your fingers, you try adding listeners to your replicators. The listeners will report the replicator as idle when it is, at least temporarily, done working…
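
A minimal sketch of that listener-driven wait, for illustration only. `IdleGate`, `Report` and `WaitUntilAllIdle` are hypothetical names, not part of Couchbase Lite; the sketch assumes each replicator's change listener calls `Report` with the endpoint's name and whether the replicator is currently busy. It replaces polling with a signal, but it still blocks the calling thread, so it will not help if that thread is itself needed to deliver the callbacks.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;

// Hypothetical helper (not part of Couchbase Lite): remembers the latest
// busy/idle state reported for each replicator and lets a test block until
// every known replicator reports idle.
public sealed class IdleGate
{
    private readonly object _lock = new object();
    private readonly Dictionary<string, bool> _busy = new Dictionary<string, bool>();

    // Assumed wiring: call this from each replicator's change listener.
    public void Report(string endpointName, bool isBusy)
    {
        lock (_lock)
        {
            _busy[endpointName] = isBusy;
            Monitor.PulseAll(_lock);   // wake any thread waiting in WaitUntilAllIdle
        }
    }

    // Returns true if at least one replicator has reported and all of them
    // are idle before the timeout expires; false otherwise.
    public bool WaitUntilAllIdle(TimeSpan timeout)
    {
        var elapsed = Stopwatch.StartNew();
        lock (_lock)
        {
            while (_busy.Count == 0 || _busy.Values.Any(b => b))
            {
                var remaining = timeout - elapsed.Elapsed;
                if (remaining <= TimeSpan.Zero) return false;
                Monitor.Wait(_lock, remaining);   // releases the lock while waiting
            }
            return true;
        }
    }
}
```

As the next paragraph points out, "all idle" is only a snapshot: a replicator can be idle simply because it has not yet seen an update, so this check alone does not prove the system is finished.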

I’m curious about your criteria for “sanity”. If you have lots of interdependent replicators, your tests are not going to be particularly deterministic. It will take some doing, even with listeners, to be certain whether a particular replicator is temporarily idle because it hasn’t seen some update yet, or whether it is done, up to date, and quiescent.

paulr · January 5, 2022, 10:11pm

Hi Blake,

Thanks for your thoughts.

The reason I create those objects in a background thread is a comment I found from jens here:

Couchbase Lite 2’s query listener implementation runs the queries on the database’s default thread. If that’s the UI thread, it can degrade responsiveness. Workaround is to use a db instance that’s associated with a background thread.

I do have listeners on syncing, and I keep a state variable showing the current state of replication (in progress, idle, not started, offline, auth error, etc.).

The reason I need to wait is that replication at endpoint A might complete, and then some time later (usually short) replication at endpoints B and C will start up and begin receiving A’s changes. Those changes might cause some “correction” changes to be propagated back. So I need to wait a period of time to make sure everything has caught up - even when replications say they are idle - there can be something that still needs to happen. Note that this is only for automated testing - in the real world with apps on different devices, this is not a problem.
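
One way to express "wait until everything has caught up" without a fixed 20-second guess is a quiet-period check: keep waiting until no sync activity has been observed for some interval. The sketch below is only an illustration under assumptions: `QuietPeriod`, `Touch` and `WaitForQuiet` are hypothetical names, and it assumes every replicator status callback and live query callback calls `Touch()`. It still sleeps on the calling thread, so it does not address callbacks that stop firing while that thread is blocked.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: tracks the time of the most recent sync activity and
// waits until the system has been quiet for a given interval.
public sealed class QuietPeriod
{
    private long _lastActivity = Stopwatch.GetTimestamp();

    // Assumed wiring: call from any listener callback that indicates activity
    // (replicator status change, live query change, locally saved document).
    public void Touch() =>
        Interlocked.Exchange(ref _lastActivity, Stopwatch.GetTimestamp());

    // Returns true once no Touch() has happened for `quiet`;
    // returns false if `timeout` elapses first.
    public bool WaitForQuiet(TimeSpan quiet, TimeSpan timeout)
    {
        var total = Stopwatch.StartNew();
        while (total.Elapsed < timeout)
        {
            long last = Interlocked.Read(ref _lastActivity);
            double quietSeconds =
                (Stopwatch.GetTimestamp() - last) / (double)Stopwatch.Frequency;
            if (quietSeconds >= quiet.TotalSeconds) return true;
            Thread.Sleep(50);   // short poll; precision is not important here
        }
        return false;
    }
}
```

A test would then call something like `quietPeriod.WaitForQuiet(TimeSpan.FromSeconds(5), TimeSpan.FromMinutes(2))` before asserting on endpoint state; both durations are placeholders to tune.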

As to the question about sanity and determinism - that’s a really interesting topic (to me anyway). I have multiple endpoints sharing small “change” objects that represent a set of changes made to a large object held at the endpoints. This gives me efficiency (don’t swap the large object) and history. The challenge is sharing and merging those changes asynchronously at the remote ends, especially when the changes are conflicting. This has been a really interesting part of my research and is the reason I am creating this weird set of test cases. The bottom line is that I need to show that a set of changes created at remote endpoints and shared (synced) with each other, possibly out of order (or later if one endpoint is offline), has to result in the large object looking the same at all endpoints.

I still think that, given Jens’ comment above, there is something going on that I don’t understand, because live query or replication stops if the main thread is in a Thread.Sleep (or Task.Delay - I have tried making the test cases async, but that doesn’t help).

Cheers!
Paul

blake.meike · January 7, 2022, 6:31pm

Yeah… Jens is, absolutely, correct. You should not be creating Databases or Replicators on the UI thread. I believe the same goes for LiveQuerys, but I’m not positive.

Creating the object on a thread, however, does NOT “associate” it with that thread. You will need to be sure that any processing that you do with the object also takes place off the UI thread.

… and, just a thought, here. I totally appreciate your curiosity. I certainly share your interest in determinism… as does the entire Couchbase Lite Team! ;-p It is our bread and butter. Not only do we test and verify our product in exactly this scenario, so do our customers who have been using our products for many years.

One of the reasons you might pay for CBL is because we do this. Your tests will probably get you a better feel for the system and how it works. As tests, though, the best you are gonna do is to test the things you can think of…

As a way of managing determinism, you might try running one-shot replications. If you do that, you will have complete control over when the replications run. If you keep running them until they don’t do anything, when they are run, then the system is in a stable (final) state.
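
The "keep running them until they don’t do anything" idea can be sketched as a simple loop. This is an illustration under assumptions, not Couchbase Lite API: `OneShotSync`, `RunUntilStable` and the `runOneShot` delegates are hypothetical. Each delegate is assumed to run one non-continuous replication for one endpoint to completion and return how many documents it transferred.

```csharp
using System;
using System.Collections.Generic;

public static class OneShotSync
{
    // Runs every endpoint's one-shot replication in turn, round after round,
    // until a whole round transfers nothing. Returns the number of rounds
    // that were run, including the final empty one.
    public static int RunUntilStable(
        IReadOnlyList<Func<int>> runOneShot, int maxRounds = 100)
    {
        for (int round = 1; round <= maxRounds; round++)
        {
            int transferred = 0;
            foreach (var run in runOneShot)
                transferred += run();   // hypothetical: blocks until that replication stops

            if (transferred == 0)
                return round;           // nothing moved anywhere: stable state
        }
        throw new InvalidOperationException(
            "Replication did not settle within " + maxRounds + " rounds.");
    }
}
```

Because the test thread drives each replication explicitly, there is no background activity to wait for once the loop returns. The `maxRounds` guard is there so that changes which keep triggering further "correction" changes fail the test instead of hanging it.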

Really hope that helps

paulr · January 9, 2022, 2:39am

I am definitely ensuring that all activity against the replicator and db are performed on background threads. All of my changes are of the form:

    var tasks = new[] {
        Task.Run(() => endPoint1.makeChanges()),
        Task.Run(() => endPoint2.makeChanges()),
        ...
    };
    // Blocks until every task has finished; returns true when all completed.
    bool completed = Task.WaitAll(tasks, System.Threading.Timeout.Infinite);

I can see that the LiveQuery change event handlers are also being called on background threads.

Here’s the thing that confuses me - I block the main thread at the end of the test (via Thread.Sleep or Task.Delay) to wait for all change events to complete, and that causes change events to stop firing. In theory that shouldn’t happen and I don’t know why.

Anyway - thanks for your comments. I’m going to look at splitting these endpoints into separate subprocesses.

BTW - the determinism I am testing here is my ability to merge in a stream of (possibly incompatible) changes at different endpoints in a deterministic way. I am not testing CBL syncing - I am assuming that works perfectly!

Cheers.
Paul

blake.meike · January 10, 2022, 6:25pm

Best of luck! Feel free to ping us if you encounter other problems.

There are places in the code that depend on the main thread (for Android, the UI thread; for Java, the application main thread) to, for instance, move a task from a native thread (spawned by LiteCore) to the Executor on which it will be handled. If you block the main thread, there are some places where the task transfer may not take place.