Document "schema" versioning and sync gateway

Hello,

I have read this article about how to do document versioning with couchbase server. It is a bit old (4 years ago) but still useful. However there was not any reference to SG (maybe because it wasn’t exist at the time) and I’m trying to find a way to change document specifications over time.
I’m actually working in startup projects (with R-DBMS for now) where the schema and access permissions can change two times at month, so document versioning will be absolutely required.
In my current projects (rest api with sql database) doing this types of changes means:

  • change object models
  • update queries to match the new models and access changes
  • create migrations document for altering tables
  • do database migration and deploy code to production

What options are valid with SG?
After reading the article I think that document schema and access permissions need to be handled in different ways.
In the most cases I think the better is having a document with the last supported database schema change (or something similar for every document type).
This document need to be shared for every user, so every client could know if is old and cannot be run until the user update it. However this solves only the client side problem (knowing if the application can run properly), the big problem remain insoluted: how to keep track the document schema change?

Document version property without replication:

If the documents were updated in place the problem is about old clients that cannot read the document properly in case of:

  • Property renaming
  • Property deletion
  • Property requirement change (eg: user reference denormalization, so from string now is a subdocument)

So essentially this means that using this method only the last version of the client can be used, and release of new client versions need to be handled in sync with sync function release, but in the middle the user will have a bad experience.

Document version embedding:

In this case the document contains all versions of that type until the last supported.
However, this means:

  • Document size will grow a lot, based on the size of all document version summed.
  • A changes feed to keep the versions updated is needed, because the old clients will update only their current last version (eg v2), but the new clients will use the new document version (v3).
  • A lot of unnecessary data is synced to clients.

Document version property with replication:

With this case the document is replicated based on his version property (or maybe his type, embedding the document version inside the type property).
This resolve the document size problem, but the other two problems still remains (changes feed application and unnecessary data sync).

Bucket versioning:

With this method every specification change is kept as a creation of a new bucket, which have the new documents specifications.
With this case the bucket will contains a new version of all documents, in its own correct format.
Cons:

  • Document migration is required from the last bucket into the new, this require to migrate sync gateways properties either (like users) so a couchbase server SDK is needed.
  • A new changes feed application is needed for every version, that change the other version of documents in others buckets
  • Sync user credentials could be a problem, because the user data is not referenced in the changes feed and so cannot be tracked to be updated into every bucket.
    • This problem could be emarginated using a custom authentication system that update credentials in all buckets

Pros:

  • The sync function will be specific for every new version, so the older sync functions will be untouched in the release process, and no bugs for mixed rules can occur.
  • Clients will have always to sync their current bucket, so they always have only one version of documents.

In essence: What is the more successfully document versioning method used into production evironments?
I’m trying to find a solution that:

  • minify code written for each specific release
  • minify possibly versioning related bugs
  • permit at least two versions to coexist

I don’t think that old blog post refers to the kind of thing you’re trying to do. It was more about keeping archives of old versions of documents, possibly to be able to roll back changes or as an audit trail.

You’re talking about changing schemas. Having multiple versions of a doc around with different schema versions would be an absolute mess, since different clients might make changes to different versions and the doc wouldn’t be self-consistent anymore. Moreover, every version of a doc that you keep around multiplies the amount of data to be replicated.

For the most part, you should be taking advantage of the flexibility of JSON. You don’t have to rigidly keep a single schema across all documents. If you need to add new properties, just start using them, and make sure your code interprets a missing property as a suitable default value. If you need to change the type of a property, start writing the new type and update your reading code to check the type dynamically and parse each one appropriately.

I’ll try with an example to explain why I think I need a way to version documents, which

Suppose that we have a Todo Application, like the one explained in couchbase mobile documentation.
We setup CS, SG and configured all correctly for a production environment.
We want to close all possible ways to create/edit/delete unauthorized data, so we used the synctos tool to create the sync gateway function, locking up the document properties.
We also have deployed an Android app on Play Store and an iOS app on the App Store (version 1.0.0).

Now we start to have some users, the app work well, but now users want to have multiple administrators for each list of tasks. So we will replace the “owner” property with the “administrators” property, which will be an array of usernames (Please correct me if you think I’m doing it wrong here).

The new schema will be:

List
{
"_id": "user1.list-1",
"type": "list",
"administrators": [ "user1", "user2"],
"name": "Groceries" 
}
List User
{
"_id": "user1.list-1.user3"
"type": "task-list.user"
"username": "user3",
"task-list": {
    "id": "user1.list-1",
    "administrators": ["user1", "user2"]
    }
}
Task
{
"_id": "task-1",
"type": "task"
"task": "Potatoes",
"complete": false,
"task-list": {
    "id": "user1.list-1",
    "administrators": ["user1", "user2"]
    },
"createdAt": "2017-09-05T20:00:00Z"
}

Now we have defined the new document definitions, but we have two migrations to do:

  • Update the sync gateway function and restart SG
  • Update mobile applications to new models definitions and deploy to the stores.

First of all, we need to choose how to deal with the new sync gateway function.
If we migrate directly to this schema, the mobile version 1.0.0 will not work anymore, because owners will not be able to create/update new documents (since they will try to create a List with the owner field, which is not allowed anymore).
So the real new schema will be between the first and the second (if the user is in the administrators array or is the owner can invite other users etc.).

Ok thats cool, now we updated the sync function in production and the version 1.0.0 is working fine.
Now we have finished the apps, we call the new version 1.1.0, but since the iOS version need to be validate and this take some time (days) only Android users will start immediatly to download and use the new version.

Now we are in the middle of the second migration, the problem is when an Android user (with 1.1.0) create a new List, the list will be created with the administrators property, but for an iOS user that list does not have an owner. So iOS users invited cannot see the owner and the application can’t know if the user is the owner (and can invite others) or not.
But the iOS application is well developed and don’t let the user invite other users if the owner property is missing.

Cool, until the Android user tell via whatsapp that the iOS user can now invite his friends because is an administrator. The iOS user see that the owner field is blank and don’t see the button for invite his friends, think that the application have a bug and leave a bad review on the App Store.

This, in my current projects with RDBMS and Rest Api is handled with versioning, so the version 1.0.0 is using an url for create a task and version 1.1.0 is using another, but in the database I made the change to support multiple administrators and everyone is happy (Android user have all list of authorized users, iOS user see that the owner is the other user and think that he cannot do the operation).

Since this is a very simple change, we are editing a feature, I would like to have some recommendation, article or something to handle these requirements change, when they are not so simple (like feature removal, permissions change etc.).

It looks like your post got cut off…?

You can’t just switch over to a new schema. For one thing, you can’t force your mobile users to all upgrade their apps at the same time. You’re working with a distributed system — this isn’t like a web app where the data all lives in one central server that you control. This is more like evolving a file format or a network protocol, where you accept that the old and the new are going to coexist or a while.

If you change the schema, you have to do it with backward compatibility. I honestly don’t have the time to study your use case or suggest things, sorry…

Ok so there is not a best practice to handle definition changes and everyone do that based on the application changes.

Thank you anyway