Some documents were not synced during the initial sync. After about 10 attempts it worked, without any changes on my side

Components involved: CBL (Android), SG and Couchbase Server

Android version is 6.0.1
SG version is 1.2
Couchbase Server version is 4.0.0 CE

Cluster setup: 4 nodes, each with 8 GB RAM and 2 cores. RAM usage is about 60% on each node; CPU usage is about 20% per core, often lower. Load is roughly 100 ops/second (90% gets), often lower.

The app has been live for 3 months, and this is the first time I have encountered the error described below. There were no recent code changes.

Device 1 signs in with an account, and documents are created and synced. Device 2 signs in with the same account, and the documents are pulled down. Then an error occurred in my app: it turned out that a handful of documents had not been pulled down and saved locally on Device 2. There were about 20 documents to sync, probably less than 1 MB in total. I also tried to find the missing documents in the Couchbase web interface; I found at least one, but not the others. After hours of debugging and repeating the initial sync on Device 2 after a clean (Android) install, it suddenly worked and all documents were pulled. I then made changes to the documents on Device 2, and they were synced to Device 1 and vice versa. Still, I could not find some of the documents in the Couchbase web interface.

I have not found any way to reproduce this error; it seems very inconsistent.

I assure you that I tested very carefully and double-checked the document IDs. This observation worries me a lot; it makes it very hard to trust Couchbase. Please let me know if there are known issues that could have caused this error.

Both devices continuously push and pull changes.


A few questions to clarify the exact scenario:

  1. When you say you can’t find documents in the Couchbase web interface (I assume this is the Couchbase Server Admin Console), are these documents that have been successfully replicated from Device 2 to Device 1? Are you able to retrieve these documents via the Sync Gateway Admin REST API (a simple GET on the doc ID)?
  2. For your cluster setup - what’s the breakdown of Sync Gateway and Couchbase Server within the 4 nodes?
  3. It would be helpful to know whether the documents are visible when you issue a _changes request as the user against the Sync Gateway REST API - that will narrow down whether you’re seeing a Sync Gateway or CBL issue.
  4. Can you clarify which version of CBL you’re using? (There isn’t a 6.0.1 of CBL Android.)

  1. Yes, that’s the Couchbase Server Admin Console, and the documents have been synced to both devices. At least one document cannot be found in the Couchbase Server Admin Console, and it is also not found with a GET request. I ran this request on one node of the Couchbase cluster:
    curl -X GET http://localhost:4985/<db_name>/<docId>
    The response is {"error":"not_found","reason":"missing"}
    Document replication is set to 1. Still, I assume I only need to run the command on one server, not on at least 3 of the 4 nodes, to find the document. Other documents can be found with this command. Changes made to the missing document on Device 1 are synced to Device 2, and vice versa.

  2. All nodes have the same setup: SG and Couchbase Server are installed on all nodes.

  3. I’d need some guidance here, please. I was able to run this command on a node:
    curl -X GET http://localhost:4985/<db_name>/_changes?limit=2
    and received 2 changes. I don’t know how to issue this request from my app so that I’m signed in as a user. Would I need to use a LiveQuery? Could you point me to some documentation, please?
    I also tried these 2 commands on one node:
    curl -X GET http://localhost:4985/<db_name>/_changes?doc_ids={<docId}
    curl -X GET http://localhost:4985/<db_name>/_changes?doc_ids={<'docId'}
    following this guide, which is written for CBL:
    http://developer.couchbase.com/documentation/mobile/current/develop/references/couchbase-lite/rest-api/database/get-changes/index.html
    Both seemed to return every change ever made, so I’m doing something wrong here.

  4. CBL version is 1.2.1; Android version is 6.0.1

  1. I was able to get the changes feed using the user’s session ID. The ID of the missing document does not appear in the changes feed. Many others do, although I did not check all of them. The doc_ids parameter did not seem to work.

This is how I received the changes feed:

  1. Download Postman for the Chrome browser
  2. Download Postman Interceptor for the Chrome browser
  3. Turn on Postman Interceptor: open Postman and click the Postman Interceptor icon at the top
  4. Send a GET request to http://<couchbase_cluster_ip>/<db_name>/_changes
    Headers: key=cookie; value=SyncGatewaySession=<session_id>
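The Postman steps above can also be scripted, for example to repeat the check quickly. This is a minimal sketch using the JDK 11+ `java.net.http` client, meant for a workstation rather than for the Android app. The base URL and session ID are placeholders, and 4984 is only Sync Gateway's default public port, so adjust it for your load balancer:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Fetches the _changes feed as a specific Sync Gateway user by sending that
// user's session cookie, mirroring the Postman steps above.
class UserChangesFeed {

    // Builds GET <baseUrl>/<db>/_changes with the SyncGatewaySession cookie.
    static HttpRequest buildRequest(String baseUrl, String db, String sessionId) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + db + "/_changes"))
                .header("Cookie", "SyncGatewaySession=" + sessionId)
                .GET()
                .build();
    }

    // Sends the request and returns the raw JSON body of the changes feed.
    static String fetch(HttpRequest request) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

For example, `UserChangesFeed.fetch(UserChangesFeed.buildRequest("http://<couchbase_cluster_ip>:4984", "<db_name>", "<session_id>"))` returns the same JSON that Postman shows.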

Thanks for the followup investigation.

Couchbase Lite uses the _changes API to replicate changes. So if changes to the document made on Device 1 are being synchronized to Device 2 (and you’re not doing peer-to-peer replication), then something differs between the _changes requests issued by Couchbase Lite and the one you’re making via curl.

It’s possible that CBL is hitting a different SG node than the one you’re testing. As you say, it shouldn’t matter which SG node you hit: they should all return the same results. To help narrow down the problem, you could try the same curl request against each of the SG nodes.

For a filtered changes request by doc id, you should be able to use:
curl -X GET 'http://localhost:4985/<db_name>/_changes?filter=_doc_ids&doc_ids=\["<doc_id>"\]'
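To run that filtered request against every node without hand-escaping the JSON array, the URLs can be generated. This is a minimal Java sketch; the node host names are hypothetical. Because Sync Gateway binds its admin interface to 127.0.0.1:4985 by default, it assumes the admin port is reachable from where the code runs (otherwise, run the resulting request on each node):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Builds the filtered _changes URL (filter=_doc_ids) once per Sync Gateway
// node, so the same doc IDs can be checked against every node.
class DocIdChangesUrls {

    // Encodes the doc IDs as a JSON array and URL-encodes the whole array,
    // which replaces the shell-level escaping of [ and ] used with curl.
    // Assumes the IDs contain no quotes or backslashes (true for UUIDs).
    static String docIdsParam(List<String> docIds) {
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < docIds.size(); i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append('"').append(docIds.get(i)).append('"');
        }
        json.append(']');
        return URLEncoder.encode(json.toString(), StandardCharsets.UTF_8);
    }

    // 4985 is Sync Gateway's default admin port, as in the curl commands above.
    static List<String> urlsForNodes(List<String> nodeHosts, String db, List<String> docIds) {
        String query = "filter=_doc_ids&doc_ids=" + docIdsParam(docIds);
        List<String> urls = new ArrayList<>();
        for (String host : nodeHosts) {
            urls.add("http://" + host + ":4985/" + db + "/_changes?" + query);
        }
        return urls;
    }
}
```

From a shell, `curl -g` (`--globoff`) achieves the same thing as backslash-escaping the brackets.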

There is no peer-to-peer replication. Everything is handled by SG.

I get this result:
{"results":[ ], "last_seq":0}
after running your command on all 4 nodes, with the parameters replaced by the db name and doc ID. I get a valid result for another doc ID, and that “good” doc ID also works in the Couchbase web interface/console.

When I change the document that cannot be found (and has no entries in the changes feed) on Device 1 (or 2), the changes are synced to Device 2 (or 1). Running the above command on all 4 nodes afterwards again returns no results. Finally, I checked that the doc ID of the missing document is correct. It is; there is no copy-paste error.

To further diagnose what’s going on, I think you’d need to review:

  1. The client logs from Device 1 and Device 2
  2. The Sync Gateway logs from the node where Device 1 is pushing the change
  3. The Sync Gateway logs from the node where Device 2 is pulling the change

If you’ve got a load balancer, so that the SG node for #2 and #3 isn’t deterministic, you can look for a _bulk_docs request made by your user (for #2) and a _changes request made by your user (for #3).
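Finding those two requests across four nodes’ worth of logs can be scripted. This is a minimal sketch that assumes each Sync Gateway HTTP log line contains the request path and an `(as <user>)` marker naming the authenticated user. Check the actual log format of your SG version before relying on it:

```java
import java.util.ArrayList;
import java.util.List;

// Picks out the Sync Gateway HTTP log lines relevant here:
// _bulk_docs requests (pushes) and _changes requests (pulls) made by one user.
class UserRequestFilter {

    // Assumes each HTTP log line contains the request path and an
    // "(as <user>)" marker for the authenticated user.
    static List<String> matching(List<String> logLines, String user) {
        String userMarker = "(as " + user + ")";
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            boolean relevantPath = line.contains("/_bulk_docs") || line.contains("/_changes");
            if (relevantPath && line.contains(userMarker)) {
                hits.add(line);
            }
        }
        return hits;
    }
}
```

For example, call `UserRequestFilter.matching(Files.readAllLines(Path.of("sg.log")), "<user-id>")` on each node’s log; the file name is a placeholder.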

If any of these logs can be sanitized and shared, I’d be happy to take a look.

Correct!

I added this line to the Android app: Manager.enableLogging("Sync", Log.VERBOSE); Now there are sync-related logs for both devices. I can also add more logs following this guide. One device pushes 1 change and the other device pulls it. It is also interesting that one device prints 1000 log lines, while the other prints only about a quarter of that. Would these logs be helpful? Could I send them to you via private message?

Those logs would definitely be helpful, but also important are the corresponding Sync Gateway logs. You can send the logs for all four SGs if you can’t work out which ones are being targeted. Sending via PM is great - thanks.

@adamf

A user of my app is experiencing the same problem I had two weeks ago. In short, one document does not sync during the initial sync. I ran two commands on my server to check whether anything is wrong:

curl -X GET 'http://localhost:4985/<db-name>/_changes?filter=_doc_ids&doc_ids=\["8dd910e3-1b96-45dc-aad5-a5d32691b2c6"\]'
returns
{"results":[ {"seq":3740573,"id":"8dd910e3-1b96-45dc-aad5-a5d32691b2c6","changes":[{"rev":"34-5508fb12c018fb72ceec298fc68891df"}]} ], "last_seq":3740573}

curl -X GET 'http://localhost:4985/<db-name>/8dd910e3-1b96-45dc-aad5-a5d32691b2c6'
returns the document.

I can also view the document in the Couchbase Console in a browser. The document is of the same type as the documents that were not synced for me two weeks ago. The user has other documents of the same type, which do get pulled down during the first sync. There are only about 20 documents, less than 2 MB in total. SG and the Couchbase cluster are working fine and are not overloaded.

It would be great if all documents were synced during the initial sync, and I’m willing to give you all the information needed to solve the issue. But at the moment I’m more interested in a workaround so that my users are happy.

I have two ideas for working around this issue in my app, both assuming that I know the document ID:

  1. Create a document with the ID of the missing document. I hope that afterwards the contents and revisions of the missing document are written to the manually created document.

  2. Make a REST request to get the missing document and create it manually.

Which is the better workaround? Would either of them introduce new issues?

In this scenario, are you certain that the user has security access to the document in question (i.e., the document is in a channel that the user has been granted access to)? Missing channel access is the most common cause of documents not being replicated when the document actually exists in the DB.

I don’t know of any scenarios where a document doesn’t get replicated to a user who should get it. I would definitely like to get more specifics on how to reproduce the issue, if you’ve got them.

Updating the document (to generate a new revision) would force that document to get replicated to users. This should work even without changes made to the body of the document.
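One way to bump the revision from a workstation is through the Sync Gateway admin REST API: fetch the current document and PUT the same body back. This is a minimal sketch and has not been tested against SG 1.2. It assumes the admin API accepts the fetched body, including its `_id` and `_rev` fields, as-is, and that the doc ID needs no URL encoding (true for the UUID-style IDs in this thread). Because the PUT carries the current `_rev`, it should add a child revision rather than a conflicting branch:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Re-saves a document unchanged through the Sync Gateway admin REST API so
// that it gets a new revision, which should make it replicate again.
class RevisionBump {

    static URI docUri(String adminBase, String db, String docId) {
        return URI.create(adminBase + "/" + db + "/" + docId);
    }

    static HttpRequest buildGet(String adminBase, String db, String docId) {
        return HttpRequest.newBuilder(docUri(adminBase, db, docId)).GET().build();
    }

    // The body fetched by GET already contains the current _rev, so PUTting it
    // back unchanged updates that revision instead of creating a conflict.
    static HttpRequest buildPut(String adminBase, String db, String docId, String currentBody) {
        return HttpRequest.newBuilder(docUri(adminBase, db, docId))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(currentBody))
                .build();
    }

    // Returns the HTTP status of the PUT, or of the GET if the doc wasn't found.
    static int bump(String adminBase, String db, String docId)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> current = client.send(
                buildGet(adminBase, db, docId), HttpResponse.BodyHandlers.ofString());
        if (current.statusCode() != 200) {
            return current.statusCode(); // e.g. 404: nothing to bump
        }
        HttpResponse<Void> saved = client.send(
                buildPut(adminBase, db, docId, current.body()),
                HttpResponse.BodyHandlers.discarding());
        return saved.statusCode();
    }
}
```

Run on an SG node, `RevisionBump.bump("http://localhost:4985", "<db-name>", "<doc-id>")` should return a 2xx status if a new revision was stored.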

The document which was not synced has the same channel as a document of the same type which was synced. I’ll send you the gist with both documents via PM. If there is more information I can gather please let me know.

I’ll try this workaround.

No, I wouldn’t recommend creating another document - you’re just going to end up with two conflicting documents at that point. Bumping the revision on the Sync Gateway side is the best attempt at a workaround. Will follow up on the PM to discuss how to repro.

I created a new document and that didn’t fix the issue. The document did not sync and remained empty. I sent another PM. I’m confident that this issue is solvable. Thank you for your continued support!

@adamf Same problem again, but this time I had changed the app so that I can provide logs more quickly. I made a private gist with nearly all Sync Gateway logs (1 out of 4 is missing, sorry), the log of the Android device with Couchbase logging set to verbose, and the single missing document, which can be viewed in the Couchbase web console. If I can improve the logs in any way, please let me know; I’m more than happy to do so.

@adamf have you had a chance to look into the logs? The scenario I described in this thread just happened to my own account again, so this issue is not an edge case at all; it happens on a regular basis. My workaround is to offer the user the option to delete the missing lists so that they can at least keep using the app. This is not acceptable. When the initial sync starts, I expect all documents to be pulled down.

## Found the issue
CBL 1.2.1 works, CBL 1.3.0 doesn’t. I built a simple Android app that only starts continuous push and pull replication. I used my new cluster, which runs CE 4.1 and SG 1.3, with data imported from the old cluster. I’m the only user of the cluster, and I authenticate in the app so that I only receive my own account’s data.

Logs sent via DM

##### Details

  • Manager.enableLogging("CBLite", Log.VERBOSE);

  • Logcat is set to verbose

  • there are 8 files in the gist:

  • first test worked. CBL 1.2.1 and SG 1.3 were running

  • second test failed. A lot of documents are missing! CBL 1.3 and SG 1.3 were running

  • the lines which start with “Sync progress” are logged using this code:

    private synchronized void updateSyncProgress(int completedCount, int totalCount,
            Replication.ReplicationStatus status) {
        // Logs "<completed> | <total> | <status>" under the tag "Sync progress"
        Log.d("Sync progress", completedCount + " | " + totalCount + " | " + status);
    }

  • documents are logged using the all-docs query, which I triggered with a button in the app once it seemed that there were no more documents to be synced

  • the replication is continuous and is never stopped in the app in any way!
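To make each test run comparable, the IDs found by the all-docs query on the device can be diffed against the IDs the server reports for the user (for example, from the user’s `_changes` feed fetched with the session cookie). This is a minimal Java sketch of just the diff; how the two ID lists are collected is left open:

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Lists the doc IDs the server has for the user that are absent locally,
// so the missing documents can be named precisely for each test run.
class MissingDocs {

    static Set<String> missingLocally(Collection<String> serverIds, Collection<String> localIds) {
        // LinkedHashSet keeps the server's order, which makes logs easier to compare.
        Set<String> missing = new LinkedHashSet<>(serverIds);
        missing.removeAll(localIds);
        return missing;
    }
}
```

Logging the result after each run (CBL 1.2.1 vs. 1.3.0) gives a direct list of which documents never arrived.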

I strongly believe that there is an issue with CBL 1.3.0. If this is not your area of expertise, could you please forward it to a team member? Can I provide any other logs?

### More findings regarding my last post

The following command
curl -X GET http://localhost:4985/<db-name>/_user/<user-id>
results in this JSON:

{ "name": "<user-id>", "admin_channels": [ "<user-channel-1>", "<user-channel-2>" ], "all_channels": [ "!", "<user-channel-1>", "<user-channel-2>" ] }

If I add the channels in the Android app code using CBLite 1.3:

    List<String> channels = new ArrayList<>();
    channels.add("<user-channel-1>");
    channels.add("<user-channel-2>");
    pullReplication.setChannels(channels);
    pullReplication.setContinuous(true);

there is no difference compared to adding no channels (see my last post for details). If I only add the first channel, I get more documents, but some are still missing, and I don’t get any documents from the second channel. If I only add the second channel, I seem to get all documents from the second channel, but none from the first.
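To check each channel outside the app, the channel-filtered feed can be requested directly as the user. As far as I know, a CBL 1.x pull replication with `setChannels` asks Sync Gateway for `_changes?filter=sync_gateway/bychannel&channels=...`; treat that mapping as an assumption. This minimal Java sketch builds that URL. The base URL is a placeholder; send the request with the `SyncGatewaySession` cookie as in the Postman steps earlier in this thread:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Builds a channel-filtered _changes URL so each channel's feed can be
// inspected as the user, one channel (or combination) at a time.
class ChannelChangesUrl {

    static String build(String baseUrl, String db, List<String> channels) {
        // Channels are passed as one comma-separated, URL-encoded parameter.
        String joined = String.join(",", channels);
        return baseUrl + "/" + db + "/_changes?filter=sync_gateway/bychannel&channels="
                + URLEncoder.encode(joined, StandardCharsets.UTF_8);
    }
}
```

Comparing the feed for `<user-channel-1>` alone, `<user-channel-2>` alone, and both together would show whether the server or the client drops the documents.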

Using 1.2.1, I get all documents of all channels without adding channels to the replication.

Sorry for the 3rd reply in a row.

The following relates to the original post. I was about to publish an update with CBL 1.2.1 in the hope that it would fix the problem, but I received many user emails and was able to verify that the problem persists with CBL 1.2.1: some documents are never synced down. Again, I have a new cluster with an imported backup on which I’m the only user, so I can collect clean logs. Please reach out so that I can provide these logs and get the issue fixed!

@benjamin_glatzeder Great - can you file a new Sync Gateway issue with details on how to reproduce here:

I expect it will be easier to track the issue there.