Delete operations behave strangely when using XDCR - .Net client


#1

We are seeing our delete operations behave strangely when we are using XDCR. For example:

  1. Document with key “user::1234” added (using StoreMode.Add).
  2. Document with key “user::abcd” referencing “user::1234” added (using StoreMode.Add).
  3. Both documents deleted by their keys.
  4. Document with key “user::1234” added again (using StoreMode.Add).
  5. Document with key “user::abcd” referencing “user::1234” added again (using StoreMode.Add).
  6. Retrieve document with key “user::1234” - fine.
  7. Retrieve document with key “user::abcd” - does not exist.
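
The seven steps above can be written as a small harness that records the outcome of every operation, which makes it easier to see exactly which step misbehaves. This is only a sketch: `InMemoryStore` is a hypothetical single-node stand-in (not the Couchbase client), and `run_repro` assumes the store exposes `add`, `remove` and `get` methods that report success. Against a real cluster, a thin wrapper around the client's result objects would take the place of `InMemoryStore`.

```python
class InMemoryStore:
    """Hypothetical stand-in for one cluster; NOT the Couchbase client."""

    def __init__(self):
        self._docs = {}

    def add(self, key, value):
        # Add semantics: fail if a document with this key already exists.
        if key in self._docs:
            return False
        self._docs[key] = value
        return True

    def remove(self, key):
        # Report success only if a document was actually deleted.
        return self._docs.pop(key, None) is not None

    def get(self, key):
        # Returns None when the key does not exist.
        return self._docs.get(key)


def run_repro(store):
    """Run steps 1-7 and return a list of (step, description, ok) tuples."""
    user = {"type": "user", "id": "1234"}            # illustrative payload
    ref = {"type": "ref", "target": "user::1234"}    # illustrative payload
    results = []

    def record(step, description, ok):
        results.append((step, description, bool(ok)))

    record(1, "add user::1234", store.add("user::1234", user))
    record(2, "add user::abcd", store.add("user::abcd", ref))
    # Building the list first ensures both deletes run even if the first fails.
    deleted = [store.remove("user::1234"), store.remove("user::abcd")]
    record(3, "delete both keys", all(deleted))
    record(4, "re-add user::1234", store.add("user::1234", user))
    record(5, "re-add user::abcd", store.add("user::abcd", ref))
    record(6, "get user::1234", store.get("user::1234") is not None)
    record(7, "get user::abcd", store.get("user::abcd") is not None)
    return results


if __name__ == "__main__":
    for step, description, ok in run_repro(InMemoryStore()):
        print(f"step {step}: {description}: {'ok' if ok else 'FAILED'}")
```

With the in-memory stand-in every step succeeds, which is the behaviour we expected from the cluster as well; in our environment it is step 7 that comes back empty.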

All operations take place against one data centre only.

From what I can see, if we leave about 30 minutes between steps 3 and 4, it works fine. Alternatively, when I use StoreMode.Set it also works immediately, but this is not really an option for us.

I searched the forums and found an issue that sounded similar, “Re-adding documents fails”, but there was no real resolution (other than StoreMode.Set).

This issue only seems to occur when we are using XDCR. We have bi-directional replication between two clusters (including the one the operation is run against) and another one-way replication (used for reporting).

We are using the .NET client 1.3.7 and Couchbase version 2.5.1 Enterprise Edition (build-1083).


#2

When you say it works with StoreMode.Set, do you mean that it works if you omit the delete in step 3?

Also, what is the unexpected behavior if you use StoreMode.Add after the delete? Is it that the add fails indicating the item exists?

Delete operations should create new “tombstone” entries, which means that XDCR should not recreate the item, so an Add should be fine. It would be good, though, to get clarification on how replication is set up between the three clusters you seem to list in the second-to-last paragraph.
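
To make the tombstone argument concrete, here is a toy model of two clusters. It is a sketch under stated assumptions, not a description of the server's internals: `Cluster` and `replicate` are hypothetical names, conflicts are resolved purely by "highest revision count wins", and a re-add is assumed to continue counting from the tombstone's revision.

```python
class Cluster:
    """Toy model of one cluster; every item carries a revision counter."""

    def __init__(self, name):
        self.name = name
        self.items = {}  # key -> {"rev": int, "deleted": bool, "value": object}

    def add(self, key, value):
        current = self.items.get(key)
        if current is not None and not current["deleted"]:
            return False  # a live document exists, so Add must fail
        # Assumption: the new revision continues from the tombstone, if any.
        rev = current["rev"] + 1 if current else 1
        self.items[key] = {"rev": rev, "deleted": False, "value": value}
        return True

    def delete(self, key):
        current = self.items.get(key)
        if current is None or current["deleted"]:
            return False
        # Keep a tombstone with a higher revision instead of dropping the key.
        self.items[key] = {"rev": current["rev"] + 1, "deleted": True, "value": None}
        return True

    def get(self, key):
        current = self.items.get(key)
        if current is None or current["deleted"]:
            return None
        return current["value"]


def replicate(source, target):
    """One-way pass: a mutation is applied only if its revision is higher."""
    for key, item in source.items.items():
        existing = target.items.get(key)
        if existing is None or item["rev"] > existing["rev"]:
            target.items[key] = dict(item)


def demo():
    a1, a2 = Cluster("A1"), Cluster("A2")
    a1.add("user::abcd", {"v": 1})
    replicate(a1, a2)                 # A2 now holds revision 1
    a1.delete("user::abcd")           # A1 holds a tombstone at revision 2
    replicate(a2, a1)                 # stale revision 1 loses to the tombstone
    still_deleted = a1.get("user::abcd") is None
    replicate(a1, a2)                 # the tombstone reaches A2
    readded = a1.add("user::abcd", {"v": 2})
    replicate(a1, a2)
    return still_deleted, readded, a1.get("user::abcd"), a2.get("user::abcd")


if __name__ == "__main__":
    print(demo())
```

In this model the stale copy on the remote side never overwrites the tombstone, and the re-added document ends up on both clusters. If the real system behaves differently, one of the assumptions listed above is the first place to look.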


#3

Hi - thanks for the response.

So what I mean by using StoreMode.Set is that if the operations in steps 4 and 5 are persisted using StoreMode.Set instead of StoreMode.Add, they both succeed - documents are returned fine in steps 6 and 7. The unexpected behaviour is that, using StoreMode.Add, the overall operations seem to succeed but the second document doesn’t actually exist.

A possibly interesting point is that when steps 4 and 5 are executed, the document which was added in step 1 with the key “user::1234” actually has a slightly different key the second time around (say “user::5678”), whereas the second document keeps the exact same key “user::abcd”. This second document is the one which cannot be retrieved.

Regarding the replication:

Three data centres: A1, A2 and Reporting1

A1 and A2 have bi-directional replication with settings:

Version: 2
Max Replications per Bucket: 32
Checkpoint Interval: 1800 (seconds)
Batch Count: 500 (documents)
Batch Size: 2048 (kB)
Failure Retry Interval: 30 (seconds)
Optimistic Threshold: 256 (bytes)

A1 and Reporting1 have one-way replication (A1 -> Reporting1) with the same settings as above.

We are executing the steps above against A1 in this case.


#4

Hi, I wanted to confirm that there are no other operations going on in the environment, on other clusters, that may be adding and removing the same keys concurrently. Is that right?

Just in case the delete was having issues, can you also make sure your delete did not fail? I assume you have proper exception handling to catch any exceptions we throw on delete.
-cihan


#5

Hi - currently we only have operations run against the A1 cluster. I’m almost certain nothing is running another conflicting operation against these clusters. They are also in a locked-down environment. The delete operation does not fail - it indicates success.