Some documents stopped syncing after 2/3 Couchbase Server crashed

2 out of 3 Couchbase Server servers crashed nearly at the same time around a month ago. Since then some documents stopped syncing. Most have an empty body and were tombstoned. And they look like this:

{
  "meta": {
    "id": "PSKh7fwzmdhzy4ROVJIMqClGMJN2::0kz4d65oyb37::shoppingListItem::rn4gntrl4ukv::d7779720a2cf",
    "rev": "50779-15bba49a94ba00000000000000000000",
    "expiration": 0,
    "flags": 0,
    "type": "json"
  },
  "xattrs": {
    "_sync": {
      "rev": "3-bcff03dc7def8cc702a0f23f3e67d305",
      "flags": 1,
      "sequence": 129308174,
      "recent_sequences": [
        129308174
      ],
      "history": {
        "revs": [
          "2-b9bb0e2018ee8ab3291ace94a74f9042",
          "3-bcff03dc7def8cc702a0f23f3e67d305",
          "1-683e3e5b1f4e33cf7172e9e5960c2a65"
        ],
        "parents": [
          2,
          0,
          -1
        ],
        "deleted": [
          1
        ],
        "channels": [
          null,
          null,
          null
        ]
      },
      "cas": "0x0000ba949aa4bb15",
      "value_crc32c": "0x297bd0aa",
      "tombstoned_at": 1566026279,
      "time_saved": "2019-08-17T09:17:59.267566545+02:00"
    }
  }
}

As this is used in a shopping list app users might delete items and add them in the future again. A document name is derived by the shopping list item name and thus deleted documents may be created again.

The other findings is that documents cannot to be found on the server but are on the user’s device. They sent custom logs. The custom logs show that the document has been updated locally. So the documents has a higher revision number, and some fields, etc.

I’m also logging everything I could have thought of in the SG function but the documents don’t appear there. So I don’t think they get rejected by the SG function.

For some users it helped when I ran this command:

curl -X POST "http://localhost:4985/my_db/_compact" -H  "accept: application/json"

Tombstone Purge Interval is set to 30 days. The document above was deleted less than 24 hours ago. So a solution may be to set the interval very low and run the _compact command frequently via a cron tab. But this doesn’t sound right to me.

What are other options to explore? Stop service, make a backup, flush bucket, import backup? Might this help?

Thanks!

SG version 2.1.2
CB Server 6.0
CBL Android 1.4.4

Can you clarify the question - Are you looking to get rid of the tombstoned documents that were created or are you wondering why they were tombstoned in the first place?

Hi @priya.rajagopal,

apologies for the unclear description. I understand why and when documents are tombstoned.

About getting rid off tombstoned documents:

  • a user helpled me narrow down the issue
  • any updates via CBL on a specific tombstoned document were not synced. As mentioned a document name is derived by their shopping list item name (e.g. bread, milk, etc.). So the user deleted one shopping list item and thus the document, too. After they ran out of the item they added the item again and the document should be updated, but wasn’t. This has worked for the past half year without any issues. It all started when 2/3 Couchbase Server servers crashed.
  • I manually deleted the tombstoned document in the Couchbase Server dashboard and then the document was synced again. So I figured if I delete all tombstones older than 30 days by running the _compact command the issue will go away. The crash happened at the end of June. Unfortunately running the command has solved the problem only for some users.

Thanks!

I dug a little further. When I run the following N1QL query

SELECT meta().id, * 
FROM my_bucket 
WHERE meta().xattrs._sync.tombstoned_at > 0;

it returns a very long list with results like so:

[
  {
    "id": "76zVg3N7yzdQmSjhuC4Sm7ZwLY02::shoppingListItem::dx3ew74t15f3::16114a0b3232",
    "my_bucket": null
  },
  {
    "id": "qJUma95wzWWXWEq3d6FpkQACqT73::shoppingListItem::dta7fvdyf8q0::866c3a17a4b4",
    "my_bucket": {}
  }
]

Expected is "my_bucket": null. I’m unable to open the tombstoned documents in the Couchbase Server dashboard as expected.

Tombstoned documents with "my_bucket": {} can be opened in Couchbase Server dashboard. I can see all the meta data and the body is {}. These documents do not sync after they were added again by the user as I observed. Again when they ran out of this particular item, e.g. Milk.

Also to clarify my most important question is: How can I fix this?
And less important why might this have happened? How can I prevent this in the future?

Thanks!

I’m using CBL 1.4.x for iOS and I’ve found it handles deleted documents quite differently from what I would expect. For example a deleted document’s currentRev is null and its properties are also null even though I’ve set it the save all properties on deletion. I’ve considered rather than deleting a document simply adding a property such as ‘hidden’ and reserving delete for cases where I never expect to use the document again.

@hyling: Your comment is off-topic for this thread; please start a new thread. Thanks.

@jens I’m sorry if it isn’t clear but my comment is in response to benjamin_glatzeder’s question. It sounds this is happening in a released app and he’s more interested in a quick fix rather than the root cause at this time. Maybe I’m reading into it a bit but it sounds like he’s probably fielding a lot of support issues because nobody likes running out of milk. :smile: So I was proposing a quick fix. My understanding is he’s re-using documents, delete then un-delete and that’s aggravating the problem. I’m proposing he hides them rather than delete them if he plans on reusing the documents. This probably won’t stop the problem completely but it might buy him some time to look for the root cause. If you don’t think this is a good option I can delete the comment.

@jens Normally I don’t respond unless I have a good option but he’s using CBL 1.4.4 which is officially unsupported. At this point is it correct to assume it won’t be fixed if it’s a bug? In that case I thought any option is better than no option. I’m interested in hearing your thoughts on this though.

@hyling Sorry! I didn’t see the connection to the thread. Carry on :slight_smile:

Thank you @hyling for the possible work-around. You are right the following is the issue:

My understanding is he’s re-using documents, delete then un-delete and that’s aggravating the problem

There might be a few things I’d need to change in the logic and this might work. But this won’t work for the documents which currently do not update anymore. If it’s all the documents which are tombstoned, have an empty body and are still accessible via Couchbase Server dashboard then it’s > 1.2M documents. Documents which are inaccessible via the dashboard are ~70k. If it’s true that all documents with an empty body do not sync anymore then I’d need to figure out all users’s emails, send them an email and explain what they could do. It’s a huge task even when automated. Also I won’t know if the issue will appear in two weeks again.

I also found out that there are documents with an empty body which were tombstoned before the other documents and vice versa. So my hunch was that first the documents - when deleted - are tombstoned and meta data is kept. After some time (Tombstone Purge Interval?) the meta data is purged on the server automatically. But I think I’m wrong. And deleting/purging all tombstoned documents with an empty body on the server is not a solution.
I’m wondering if deleted documents via CBL can either have an empty body or be inaccessible via the dashboard. Or does one “type” of deleted document look suspicious?

but he’s using CBL 1.4.4 which is officially unsupported. At this point is it correct to assume it won’t be fixed if it’s a bug?

I don’t think it’s a bug in CBL 1.4.4 as it has been working fine for many months. The issues only appeared after 2/3 CB Server nodes crashed at nearly the same time. That’s when I should’ve stopped the service, flushed the bucket, import the latest backup…

I started development for CBL 2.x and can report back soon. Unfortunately so far I was unable to reproduce the issue on my devices so I won’t know if the CBL update will fix it. And I was unable to find any log from the SG fn. I do have logs for any document that is going to get rejected. So far there’s no log which shows the described issue. So this might in turn be a CBL issue as an update of one of the affected documents is never replicated(?) to the SG. And with CBL 2.x all issues will be gone?! Let’s see!

Finally I was hoping for a magic bullet, e.g. running the SG _compact task, re-importing a fresh backup, or run SG re-sync, or similar.

I found the cause in the sg_info.log:

2019-08-25T21:01:11.319+02:00 [INF] Sync fn rejected: new=map[priceInHundredths:0 _id:unpRV3yhobVQL1mUBVSQ5xYuQu63::shoppingListItem::tj8mpspvmpun::d2b395dd7933 type:shoppingListItem _revisions:map[start:12 ids:[4ae923180d02ed8ed7a1e7e96543b3ad c845fa34c27492a2bffc022df41e2c90]] qtyInThousandths:0 note: _rev:12-4ae923180d02ed8ed7a1e7e96543b3ad unit:map[id:GENERAL name:] sortOrder:157 name:Gerolsteiner Medium id:d2b395dd7933 category:map[id:GENERAL name: sortOrder:0] isCheckedOff:0 channels:unpRV3yhobVQL1mUBVSQ5xYuQu63 shoppingListId:tj8mpspvmpun isDeal:0 isImportant:0 hasPicture:0]  old={} --> 403 missing channel access
2019-08-25T21:01:11.319+02:00 [INF] 	BulkDocs: Doc "unpRV3yhobVQL1mUBVSQ5xYuQu63::shoppingListItem::tj8mpspvmpun::d2b395dd7933" --> 403 missing channel access (403 missing channel access)

Right at the end of the first line it shows that the old doc has {} as body. Hence the channel is missing and the document gets rejected.

The part of the SG fn is:

else if (doc.type === 'shoppingListItem') {
        if (oldDoc && !oldDoc._deleted) {
            if (isPropertyMissing(doc)) {
                throw ({
                    forbidden: doc
                });
            }
            if (hasWrongValue(doc)) {
                throw ({
                    forbidden: doc
                });
            }
            requireAccess(oldDoc.channels); // <-- Change this?
            channel(doc.channels);
        } else if (oldDoc && oldDoc._deleted) {
            if (isPropertyMissing(doc)) {
                throw ({
                    forbidden: doc
                });
            }
            if (hasWrongValue(doc)) {
                throw ({
                    forbidden: doc
                });
            }
            channel(doc.channels);
        } else if (!oldDoc) {
            if (isPropertyMissing(doc)) {
                throw ({
                    forbidden: doc
                });
            }
            if (hasWrongValue(doc)) {
                throw ({
                    forbidden: doc
                });
            }
            channel(doc.channels);
        }
    }

I think in this line requireAccess(oldDoc.channels); // <-- Change this? the document is rejected as the oldDoc is empty and hence doesn’t have a channel. Would the below code work and is it recommended to do this?

function isEmpty(obj) {
    for(var key in obj) {
        if(obj.hasOwnProperty(key))
            return false;
    }
    return true;
}

// ...
if (!isEmpty(oldDoc)) {
    requireAccess(oldDoc.channels); // <-- Change this?
}
channel(doc.channels);
// ...

Thanks!

One possible reason why the document is empty: It looks like your document has a “channels” property is that correct? In CBL did you set your propertiesToSaveForDeletion to persist the “channels” property after deletion? What I found on iOS is by default (if propertiesToSaveForDeletion is not set) when a doc is deleted the only properties that are send to the sync gateway are these : _deleted, _id, _rev, _revisions.

Sync gateway docs on handling deletions:

Validation checks often need to treat deletions specially, because a deletion is just a revision with a "_deleted": true property and usually nothing else. Many types of validations won’t work on a deletion because of the missing properties — for example, a check for a required property, or a check that a property value doesn’t change. You’ll need to skip such checks if doc._deleted is true.
Sync Function API Reference | Couchbase Docs

requireAccess(oldDoc.channels); // ← Change this?

I think this would work but what I mentioned above might be the cause. I don’t know if removing this would be a security issue?

Thank you for following up and for the documentation link! I made some progress, made similar edits as you suggested and the Sync fn rejected doc logs became less frequent. It looks like I ran into the same issue I had a long time ago: SG rejects newly created document with document ID of deleted document: Channel access missing

I checked - on CBL 1.4 - the document and database classes. They don’t offer such a method. But interesting nonetheless.

Hopefully I can follow up tomorrow and share my solution!

And yes, all documents have a channels property.

I made some progress, made similar edits as you suggested and the Sync fn rejected doc logs became less frequent.

I’m glad that’s helping!

I checked - on CBL 1.4 - the document and database classes. They don’t offer such a method. But interesting nonetheless.

Looks like you’re right the propertiesToSaveForDeletion is a method of the Model classes which is only available on iOS. They removed it in 2.x as well. It’s a shame since they’re super convenient. https://docs.couchbase.com/couchbase-lite/1.4/objc.html#model

I wonder if deleting a document on Android saves document properties by default unlike iOS. Would you add the following to your sync function, create and delete a document of the same type as the ones you having an issue with, then share the console output?

console.log("doc:" + JSON.stringify(doc))
if(oldDoc) {
    console.log("old doc:" + JSON.stringify(doc))
}

Summing up a bit. The Sync fn rejected: old={} --> 403 missing channel access most likely was thrown here

if (doc.type === 'typeA') {
    //...
} else if (doc.type === 'typeB') {
  // ...
} else {
if (oldDoc) {
    requireAccess(oldDoc.channels); // <-- here
}
channel(doc.channels)
}

And the solution was to change the code like so:

if (oldDoc && oldDoc.channels) {
    requireAccess(oldDoc.channels);
}
channel(doc.channels)

I can adapt the code like so to log the documents which have an oldDoc but no oldDoc.channels

if (oldDoc && oldDoc.channels) {
    requireAccess(oldDoc.channels);
}
if (oldDoc && !oldDoc.channels) {
    console.error(doc._id);
    console.error("doc:" + JSON.stringify(doc))        
}
channel(doc.channels)

I will need to run this in production as I was never able to reproduce the issue on my end. I might do this over the weekend. By the way would you know if the oldDocis sent from the client (Android device) or is it fetched from Couchbase Server?

Thanks!

oldDoc is the previous revision of the document on Couchbase Server. It will be null for new document. doc is the revision being saved

@hyling

Here’s the bit from the SG fn:

if (oldDoc && oldDoc.channels) {
    requireAccess(oldDoc.channels);
} else if (oldDoc && !oldDoc.channels) {
    console.error(new Date() + ": in else if (oldDoc && !oldDoc.channels) ... doc._id = " + doc._id);
    console.error(new Date() + ": doc = " + JSON.stringify(doc));
    console.error(new Date() + ": oldDoc = " + JSON.stringify(oldDoc));
}
channel(doc.channels)

and a log

2019-09-01T23:25:33.102+02:00 [ERR] Javascript: Sync Sun, 01 Sep 2019 23:25:33 CEST: in else if (oldDoc && !oldDoc.channels) ... doc._id = ACACjP1JedbFzjCQjUJ6VekUUFr1::shoppingListItem::efvxtfyho6pt::362d4f9b2877 -- channels.NewSyncRunner.func1() at sync_runner.go:122
2019-09-01T23:25:33.103+02:00 [ERR] Javascript: Sync Sun, 01 Sep 2019 23:25:33 CEST: doc = {"_deleted":true,"_id":"ACACjP1JedbFzjCQjUJ6VekUUFr1::shoppingListItem::efvxtfyho6pt::362d4f9b2877","_rev":"4-0bd23128ec3181b0d7c916beda9c783c","_revisions":{"ids":["0bd23128ec3181b0d7c916beda9c783c","e0366226b0c3e48500bb6ce575730ccf","451482d507ececa25adeb0ef1932f04e"],"start":4}} -- channels.NewSyncRunner.func1() at sync_runner.go:122
2019-09-01T23:25:33.103+02:00 [ERR] Javascript: Sync Sun, 01 Sep 2019 23:25:33 CEST: oldDoc = {"_id":"ACACjP1JedbFzjCQjUJ6VekUUFr1::shoppingListItem::efvxtfyho6pt::362d4f9b2877"} -- channels.NewSyncRunner.func1() at sync_runner.go:122

I interpretate it as the doc was deleted and the oldDoc had only an _id field. Why though I don’t know. In testing I wasn’t able to reproduce it regardless how many documents I deleted and created again. Looking up the doc in the Couchbase Server dashboard throws an “Object Not Found” error for the doc in the logs. I’d expect that.

As I said in one of my previous posts the issue was solved by adding && oldDoc.channels to the if-clause.