Corrupted blob data after sudden replication disconnect

We store one or more photo image blobs with our Documents in Couchbase Lite .NET with Xamarin Forms on iOS and Android. Each blob contains the binary data of a (usually) JPEG. Our users have reported that some of the photos became corrupted: part of the photo data is lost. Unsure exactly what our users did to cause this, I tried various things to reproduce the issue. I created 10 Documents with 3 photos each, taken with the mobile camera. I started a one-time replication to push these blobs to the server (we do not currently use continuous replication). During the replication I went into the app switcher and swiped to close the mobile app. One of the photos on the server ended up corrupted, just like our users reported.

Obviously the app shouldn't be closed like this during a replication (or the app might crash during replication for whatever reason), but nonetheless the blob should not be stored in a corrupted state on the server. It should either be saved completely, or not at all.

Couchbase Server Community Edition 6.0.0 build 1693
Sync Gateway 2.1.2 (2;35fe28e)
Couchbase Lite 2.1.2 and 2.5.
Xamarin Forms 3.x and 4.0.
Problem reproduced on iOS, not tried on Android.

Attached is the logfile from the Gateway when the sudden disconnect happened.
corrupted_blobs_log.txt.zip (2.3 KB)

This may be a CBL replicator bug. I’m guessing what happens is that the replicator is told to stop while it’s halfway through sending a blob, and when it closes the connection it looks like the blob is finished, so SG just saves what it received so far. Instead it needs to indicate that the upload failed, so SG will throw away the received data.


I've filed [an issue on GitHub](https://github.com/couchbase/couchbase-lite-core/issues/790). If my theory is correct, this can be fixed pretty easily in Sync Gateway.

For a workaround, on the server side you should check whether a blob’s length property in the document matches the actual length of the blob in bytes. I’m pretty sure that the corruption here is from truncation of the data, so the length will end up less than expected. (To make double sure you can compare the digest property against a SHA-1 digest of the blob, but I don’t think that’s necessary here.)
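
For illustration, here is a minimal sketch of that length check as it might look from a CBL .NET client that has the document locally. The class and method names are invented for this example, and I'm assuming the truncated bytes are readable via Blob.Content:

```csharp
using Couchbase.Lite;

public static class BlobIntegrity
{
    // Sketch only: a truncated blob should have fewer stored bytes than the
    // length declared in its metadata.
    public static bool LooksIntact(Blob blob)
    {
        var content = blob.Content;                     // actual stored bytes
        return content != null && content.Length == (long)blob.Length;
    }
}
```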

I’m pretty sure the truncated blobs won’t spread from the server to any other clients, because the CBL replicator does verify the SHA-1 digest before saving. I believe the replicator will just throw away the incoming document if a blob is invalid, so the other clients just won’t see that revision at all.

The corrupted blob does spread to other clients. And if the Document is updated it will sync back to the originating phone as well and overwrite the original undamaged blob with the corrupted version.

The corrupted blobs should simply be truncated versions of the originals, with no bytes changed. Could you please confirm that?

I’m really surprised that the bad blobs would spread to other clients. I know for sure that Couchbase Lite verifies the SHA-1 digest of a blob before storing it, so if the blob received from the server doesn’t match the digest stored in the document, it will be rejected.

Initially I was sure, then we started to doubt it, so I deleted my post here while we ran another check. I can confirm that it does spread to other clients as I described. If we update a property on the document with the corrupted photo blob, it becomes the newest revision and replicates to all other clients, and also back to the original photo-corrupting client.

We have multiple clients replicating the same dataset (or a subset of it) via the Gateway. So the original client A pushes a photo blob, and it arrives and is stored on the server corrupted (with, I understand from my colleague, the SHA-1 of the full blob). This document with the corrupted photo blob will sync to the other clients B and C. If this document gets updated in any way on the server or by the other clients B and C, then it will pull back onto the original client A and overwrite the good photo blob with the corrupted photo blob.

We think the reason you expect it not to replicate to other clients, while we see that it does, is that I am setting the blob properties on the documents wrong.

So in other words, now that you have a blob in this state, any client that pulls from this particular Sync Gateway will receive a copy of the bad blob? Is there a URL you have that any of us can run a quick pull from to verify what happens (it only needs to be a DB that contains the single bad document)?

I think what I do wrong is that I set a property with new Blob() on the Document via SetValue(), instead of using SetBlob(). So we store blob data, and it "works" for our app, but in a plain property, and it isn't handled correctly in _attachments. This might be why Jens is confused about the behavior we observe. This was developed from an early DB, so somewhere along the line blobs changed, or I misunderstood and implemented it wrong.

_attachments is not relevant to you as a user of the library. It is only there to maintain compatibility with the way that 1.x did things. 2.0 made "attachments" (aka blobs) first-class citizens inside the data model. If you are using that property name then I encourage you to stop; otherwise I don't think there is an "incorrect" way to set blobs on a document (but just for sanity's sake it would be nice to see how you do it).
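
For reference, here is a minimal sketch of the two equivalent ways to put a blob on a document in CBL .NET 2.x; the "photo" key and the surrounding method are placeholders for this example, not anything from your data model:

```csharp
using Couchbase.Lite;

public static class BlobExamples
{
    // Attach a JPEG to a new document; both calls below store the same thing.
    public static void SavePhoto(Database database, byte[] photoBytes)
    {
        var blob = new Blob("image/jpeg", photoBytes);
        using (var doc = new MutableDocument())
        {
            doc.SetBlob("photo", blob);      // typed convenience setter
            // doc.SetValue("photo", blob);  // equivalent: a Blob is a valid value
            database.Save(doc);
        }
    }
}
```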

When we load our document I move everything over into my own Dictionary-based model. I copy all properties (values, dicts, lists) over. In my data model the classes are all designed around the dictionary originating from CB; the getters and setters access properties inside the dictionary. When I save the document I load or create the MutableDocument, copy over all my values from my dictionary, and write it to the DB.

When I add a photo I store the metadata (name) in my property structure, and I add the blob (name, type, data) to an array of "blobs to add" on my object. When the time comes to save the document and the dictionary values get copied into the MutableDocument, all "blobs to add" go into a files property at the base of my MutableDocument. When I later want to look up a blob I use the blob name from the metadata and look in the files property for the right blob.
This is the code I run to copy the values from my dictionary in memory to the MutableDocument:

public void CopyValues(MutableDocument saveDoc)
{
    // now paste the values into the CBL Document, to save to database
    foreach (var pair in this)
    {
        if (pair.Key != "files")
            saveDoc.SetValue(pair.Key, TranslateDictionaryValueToCouchbase(saveDoc, pair.Value));
    }

    // removing keys that were deleted from the dictionary
    foreach (string keyRemoved in _removedKeys)
    {
        saveDoc.Remove(keyRemoved);
    }

    if (_newBlobs.Any() || _removedBlobs.Any())
    {
        var filesDict = saveDoc.GetDictionary("files");
        if (filesDict == null)
        {
            saveDoc.SetValue("files", new MutableDictionaryObject());
            filesDict = saveDoc.GetDictionary("files");
        }

        // add new blobs to the document that are listed for saving
        foreach ((string blobId, DataBlob blob) in _newBlobs)
        {
            filesDict.SetDictionary(blobId, new MutableDictionaryObject(new Dictionary<string, object> {
                { "file", new Blob(blob.ContentType, blob.Content) }
            }));
        }

        // delete blobs on the document that are listed for deletion
        foreach (string blobId in _removedBlobs)
        {
            filesDict.Remove(blobId);
        }
    }
}
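
For completeness, the lookup side described above (finding a blob again by the name kept in the metadata) would look roughly like this. This is only a sketch based on my description; the class and method names are invented for illustration:

```csharp
using Couchbase.Lite;

public static class BlobLookup
{
    // Sketch only: fetch a previously saved photo blob back out of the
    // document's "files" dictionary, using the blob name kept in the metadata.
    public static Blob GetPhotoBlob(Document doc, string blobName)
    {
        var files = doc.GetDictionary("files");
        return files?.GetDictionary(blobName)?.GetBlob("file");
    }
}
```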

The main reason I store the blob in a files property at the base of the document, instead of where it is actually used in my properties, is that, if I remember correctly, this is how it was done at some point during development, before the switch was made to new Blob() and simply adding the blob wherever it is used.

I don't see any reason why this should cause a problem, since at the end of the day, barring the network condition that causes the issue, everything works fine, right? There are a lot of internal checks to ensure that the data you pass into the mutable document is valid according to our rules. If it saves correctly locally then you are in luck. The question at hand is "why is Couchbase Lite allowing in a blob that is invalid?". When replicating, the conversation will indicate any blobs that need to be transferred, and they will be validated upon receipt on the CBL side (or so we plan; this may be a case where we don't validate correctly). If possible, a Sync Gateway database that we can pull from that is in the bad state would help us diagnose what is going on. @jens, or do you know of a way to quickly make this happen?

This works fine, yes. It saves correctly locally, it replicates correctly, and it works fine on other clients (web app via web services on the Server, mobile clients, and desktop with CBL). The only problem is the sudden termination of the connection causing corrupted blobs to be stored.

We are looking at getting you something to pull with a corrupted photo.

OK, I see more clearly what is going on, though others before me are working on the actual root cause. Sync Gateway is actually thinking that the cut-off transfer is complete and saving the "corrupt" (i.e. truncated) blob into its blob storage, using the SHA-1 it calculated over what it received instead of the SHA-1 that was sent (I verified that the SHA of the corrupted photo matches the SHA stored in the _attachments property). The property on the document itself is just saved as-is. Since the SHA was altered on the server side, CBL does not see any problem with this blob and accepts it.
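
For context, the attachment digest that gets verified on the pull side has the form "sha1-" plus the Base64 of the SHA-1 of the blob bytes, so a digest the server recalculates over the truncated bytes will match the truncated data and pass verification on every client that pulls it. A minimal sketch of that computation (the helper name is mine):

```csharp
using System;
using System.Security.Cryptography;

public static class AttachmentDigests
{
    // Couchbase Mobile attachment digest: "sha1-" + Base64(SHA-1(bytes)).
    public static string Compute(byte[] blobBytes)
    {
        using (var sha1 = SHA1.Create())
        {
            return "sha1-" + Convert.ToBase64String(sha1.ComputeHash(blobBytes));
        }
    }
}
```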

The issue @jens posted above is pretty active now so you can follow along to see our progress.