Couchbase Lite Delete document exception

Just verified that. I actually got the exact same revs entry on Node B for document ID -AoWVxTpIuka3yvx36udp7g with deleted: true. However, the document still exists in the database, because I still get it from querying the view on Node B.

BTW, we are only running continuous push replication now.
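For reference, the replication setup is essentially just this (a simplified sketch against the CBL .NET API; the target URL is a placeholder for wherever Node B’s listener runs):

var db = Manager.SharedInstance.GetDatabase("dbname");
var push = db.CreatePushReplication(new Uri("http://node-b:59840/dbname")); // placeholder address for Node B
push.Continuous = true;
push.Start();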

It just got replicated back to Node A. And now we are getting “Previous revision not found” on Node A for that document.

OK, so when I query the database directly using the RESTful /dbname/id endpoint, I got this:

“_id”:"-AoWVxTpIuka3yvx36udp7g","_rev":“2-cfb9c332abbdcd9d9cfd8d302a92a342”}

As you can see, somehow the document thinks it is on revision 2, not 3 (the delete)?

So in the inconsistent state, you can see a changes feed entry with revision 3, but the document REST endpoint shows revision 2?

Yep. In the same database:

/dbname/id
"id":"-AoWVxTpIuka3yvx36udp7g",“rev”:“2-cfb9c332abbdcd9d9cfd8d302a92a342”}

/dbname/_changes
{"seq":80,"id":"-AoWVxTpIuka3yvx36udp7g","changes":[{"rev":"3-55fceee41ce7a6baa8069ba5e190c6e1"}],"deleted":true},

Ding. The forum messed up my formatting… :sweat_smile: I think I am closing in on the real problem.

What is the result of the following two queries?

dbname/id?rev=3-55fceee41ce7a6baa8069ba5e190c6e1&meta=true&revs=true
dbname/id?rev=2-cfb9c332abbdcd9d9cfd8d302a92a342&meta=true&revs=true

Both returned {"status":404,"error":"not_found","reason":"missing"}… Do I need to substitute the actual document ID for id?

Yes, you need to put in the actual ID of the document.

Node B

for dbname/id?rev=3-55fceee41ce7a6baa8069ba5e190c6e1&meta=true&revs=true

{"_id":"-AoWVxTpIuka3yvx36udp7g","_rev":“3-55fceee41ce7a6baa8069ba5e190c6e1”,"_deleted":true,"_revisions":{“start”:3,“ids”:[“55fceee41ce7a6baa8069ba5e190c6e1”]}}

for dbname/id?rev=2-cfb9c332abbdcd9d9cfd8d302a92a342&meta=true&revs=true
I got

{"Body":"OurMessageBody","_id":"-AoWVxTpIuka3yvx36udp7g","_rev":"2-cfb9c332abbdcd9d9cfd8d302a92a342","_revisions":{"start":2,"ids":["cfb9c332abbdcd9d9cfd8d302a92a342","903b99929c1def2ac8937276a3d4c7f7"]}}

For Node A, it only found rev 3. There is no rev 2 (404).

In that case Node A is causing the problem. That corresponds to what you showed me before (the revision history is being deleted). So what exactly is Node A doing? If it were just receiving a message and deleting it, it would only have 2 revisions. Something in the logic is causing a problem that results in the first two revisions being obliterated from the database.

Would Compact() remove previous revisions?

All the app does is: we receive a message, the app stores it into CB Lite and marks it as “staged” by setting a property on the document. Then the system processes it. If the system processes it successfully, we set the property on the document to “valid”. Then an external system reads all “valid” messages by calling an endpoint on our app; it reads the message and then tells our system to delete it after the read.
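In CBL .NET terms the lifecycle is roughly this (a simplified sketch; the property and variable names are placeholders):

var doc = db.GetDocument(messageId); // creates the document if it does not exist
doc.PutProperties(new Dictionary<string, object> { { "Body", body }, { "state", "staged" } }); // rev 1
doc.Update(rev =>
{
    var props = rev.UserProperties;
    props["state"] = "valid"; // rev 2, after successful processing
    rev.SetUserProperties(props);
    return true;
});
doc.Delete(); // rev 3 (tombstone), after the external system confirms the read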

Also, if the rev associated with the delete was replicated from Node A to Node B, wouldn’t CB Lite on Node B update the doc’s revision to 3-xxxx (the delete)?

Compact() only removes previous revisions if the previous revision count exceeds the MaxRevTreeDepth property (default 20). Otherwise it just sets the previous revision bodies to null and deletes unused attachments.
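In other words, with the defaults it is roughly:

db.MaxRevTreeDepth = 20; // the default; ancestors beyond this depth are removed on compact
db.Compact();            // otherwise this only nulls out old revision bodies and drops unused attachments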

This is what I would expect, but your Node B is storing two different documents with the same ID (which is a big problem, and I am trying to weed out the cause). One has the first two revisions as its history, and the other has only the third (the delete). On Node A, it looks like the same situation was happening, except that the document with the first two revisions was deleted and only the document with the delete (and no history before it) is left. In correct operation there should be only one document with 3 revisions, not one with 2 and one with 1.
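If you want to look for that state directly in SQLite (assuming the standard CBL 1.x schema, where each row in the revs table has a parent column), a document in this state will have more than one parentless revision:

SELECT doc_id, COUNT(*) FROM revs WHERE parent IS NULL GROUP BY doc_id HAVING COUNT(*) > 1;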

Is this scenario reproducible every time? Or did it only happen with one document? If possible, I’d like a set of steps like “create a document, edit it this way, then delete” so that I can try to see what is going on.

I am not sure I got this part… So you are saying that at one stage there were two documents in the docs table that had the same document ID? Or were there two sets of revision records created for the same document?

I had a look at the DB on Node A with sqlite tools. Node B is not compacted, so its revisions table is huge.

Node A
Docs:
21|-AoWVxTpIuka3yvx36udp7g (I am assuming 21 is the internal ID used in the revisions table…)

Revisions:
22|6|3-603488f7609859b95978b5569dbd4949|21|1|1|{}|1| // not sure if related…
80|21|3-55fceee41ce7a6baa8069ba5e190c6e1||1|1|{}|1|
88|21|1-903b99929c1def2ac8937276a3d4c7f7||0|0||1|
89|21|2-cfb9c332abbdcd9d9cfd8d302a92a342|88|0|0||1|

Interesting how the delete actually has a lower sequence number than the two changes… but then again, those two changes could have been replicated back from Node B?

Would you like me to run the same check on Node B? I can compact the db and then query it…

This is the scenario I think happened. The SQLite data supports it.

As I noted before

80|21|3-55fceee41ce7a6baa8069ba5e190c6e1||1|1|{}|1|

This is not a valid entry: it is a generation 3 revision with no parent. (Notice how in 89|21|2-cfb9c332abbdcd9d9cfd8d302a92a342|88|0|0||1| there is an “88” after the rev ID; that means this revision’s parent is sequence 88. The only revisions allowed to have a NULL parent are generation 1 revisions.) If I can figure out how it got there, then I can prevent this situation in the future. That’s why I am wondering whether this is something you can easily reproduce, or just something that happened once or twice. The latter is far harder to track down the cause of.
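If you want to scan for other occurrences, a query along these lines should flag them (again assuming the 1.x schema; only rev IDs starting with “1-” may legitimately have a NULL parent):

SELECT sequence, doc_id, revid FROM revs WHERE parent IS NULL AND revid NOT LIKE '1-%';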

It is happening in our application constantly, at least once a day for the last two weeks. I don’t think we are doing anything tricky in our application… I’d say the steps to replicate are: create one project that uses CB Lite with push replication, run two instances of the app and set up replication between them, then do CRUD on Node A. That’s it…
I will write another simple app to simulate it and see if I can replicate the issue. If I can, I will send you the app.

Another thing I noticed: I hooked into the pushReplication.Changed event and logged the status changes. I noticed that, during the day, the push replication stops on both nodes occasionally… It could be just a furphy… but is it supposed to happen? (The “error” entry in the log attached was my app logging it, not a real error, because I thought the push replication would go idle but never stop.)
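The handler itself is essentially just this (simplified; Log and LogError stand in for our logger):

pushReplication.Changed += (sender, e) =>
{
    var status = e.Source.Status; // e.Source is the replication that changed
    Log(string.Format("[Replication] Push status changed to {0}", status));
    if (status == ReplicationStatus.Stopped)
        LogError("[Replication] Push is Stopped.");
};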

{ "time": "2016-03-03 23:24:43.2386", "level": "DEBUG", "logger": "MyApp", "message": "[Replication] Push status changed to Active" }
{ "time": "2016-03-03 23:24:44.2559", "level": "DEBUG", "logger": "MyApp", "message": "[Replication] Push status changed to Active" }
{ "time": "2016-03-03 23:24:44.2559", "level": "ERROR", "logger": "MyApp", "message": "[Replication] Push is Stopped." }

It’s supposed to stay running, but there is always the possibility of a bug in the logic. I’ve been doing a lot of work on the retry logic recently, so I wouldn’t be surprised if you had a commit that is not as good as the current release version. In certain rare cases it will stop (for example, if the URL is invalid, or the server returns an error between 400 and 499 for known endpoints, etc.).

If you can reproduce it with a simple app, that will be very helpful in diagnosing this. If I can reproduce it, I will file a GitHub issue to track it.