Couchbase Data Migrations - db-node-migrate

query
n1ql

#1

Hi all,

I recently published version 0.0.4 of db-migrate-couchbase. It is a driver module that works with node’s db-migrate package to let you write and run data migrations.

https://www.npmjs.com/package/db-migrate-couchbase

This module is still at an early stage, but we’ve been able to use it to do useful things, and it’s backed by pretty good unit tests that run against a live Couchbase instance on CircleCI. It has several niceties built in for dealing with models created by Ottoman.


#2

@moxious this looks pretty cool, and I’m very excited that you decided to share this! Are you using Couchbase and Ottoman in production? I would love to hear more about it.

Also, I just wanted to ask for clarification: is this migrating the Ottoman models rather than the documents themselves? Or is it actually migrating documents too?


#3

We are beginning to use this ourselves. I can’t say I’m using it in production yet, but I will be, which is why we needed to write it. Frankly, I was a bit surprised I didn’t already find 3 other things that did this, because data migration is not an exotic problem at all. Plenty of others use couchbase + ottoman in production, and I’ve been scratching my head wondering what they’ve been doing to address this…

Right now, what’s being migrated is the documents themselves. Recently on ottoman, I’ve put in a number of PRs that attempt to position ottoman to be more flexible in the future. For example, my team has contributed a lot of unit tests, and a plugin architecture (https://github.com/couchbaselabs/node-ottoman/pull/88) for ottoman.

Migrating the models themselves is tricky. Many frameworks enable “lazy migration” of data, that is, migrating instances when they’re touched instead of all at once. We’d like to get there over time, but at the moment the architecture in ottoman makes this quite difficult: when a document gets loaded from the DB, it is immediately validated against the present schema, so ottoman needs quite a bit more machinery before it could do lazy migration. In the end, if you’re dealing with a really huge amount of data, much of it latent, lazy migration is really the way to go; eager migration (execute a big query, update everything all at once) is just too much to bite off, and may cause unacceptable downtime anyway.
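To make the lazy vs. eager distinction concrete, here is a minimal sketch. It is not how ottoman or db-migrate-couchbase work today; it uses an in-memory Map as a stand-in for a bucket, and the `_v` version field, the `upgrades` table and the function names are all made up for illustration.

```javascript
// In-memory stand-in for a bucket; a real setup would go through the Couchbase SDK.
const bucket = new Map([
  ['user::1', { _v: 1, name: 'Ada Lovelace' }],
  ['user::2', { _v: 1, name: 'Alan Turing' }],
]);

// Hypothetical upgrade steps, keyed by the version they upgrade FROM.
const CURRENT_VERSION = 2;
const upgrades = {
  1: (doc) => {
    const [first, ...rest] = doc.name.split(' ');
    return { _v: 2, firstName: first, lastName: rest.join(' ') };
  },
};

// Apply upgrade steps one version at a time until the document is current.
function upgrade(doc) {
  let out = doc;
  while (out._v < CURRENT_VERSION) out = upgrades[out._v](out);
  return out;
}

// Eager: walk every document in one pass and rewrite the stale ones.
function migrateAll(store) {
  let migrated = 0;
  for (const [key, doc] of store) {
    if (doc._v < CURRENT_VERSION) {
      store.set(key, upgrade(doc));
      migrated += 1;
    }
  }
  return migrated;
}

// Lazy: upgrade a document only when it is read, then write it back.
function loadLazy(store, key) {
  const doc = store.get(key);
  if (doc === undefined) return undefined;
  if (doc._v >= CURRENT_VERSION) return doc;
  const upgraded = upgrade(doc);
  store.set(key, upgraded);
  return upgraded;
}
```

With the lazy path, `loadLazy(bucket, 'user::1')` rewrites only that one document and leaves `user::2` in its old shape until something reads it. The catch is that the upgrade has to run before any schema validation does, which is the piece ottoman doesn’t have a hook for right now.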

On the model-migration side, another idea we’ve had but not yet implemented is versioning the models. Perhaps documents should carry a tag saying which version of which model they correspond to, and perhaps ottoman models themselves (being easily expressible as JSON documents) should be stored in the DB. In almost every DB you’d run across, metadata about the data lives in the DB. With ottoman right now, not so much… So, for example, you cannot distinguish between a field that’s missing because it wasn’t set on the document and a field that isn’t defined on the model. Saving/versioning ottoman models in the DB would address this.
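A rough sketch of what that could look like, again with a Map standing in for the bucket. None of this exists in ottoman; the `model::Name::version` key layout and the `_model` / `_modelVersion` tags are assumptions invented for the example.

```javascript
// Model definitions stored next to the data, one document per model version.
// Key layout and tag names are hypothetical.
const store = new Map([
  ['model::User::1', { fields: ['name'] }],
  ['model::User::2', { fields: ['name', 'email'] }],
  ['user::1', { _model: 'User', _modelVersion: 1, name: 'Ada' }],
  ['user::2', { _model: 'User', _modelVersion: 2, name: 'Alan' }],
]);

// Tell apart "field not defined on this document's model version"
// from "field defined on the model but never set on the document".
function fieldStatus(db, docKey, field) {
  const doc = db.get(docKey);
  const model = db.get(`model::${doc._model}::${doc._modelVersion}`);
  if (!model.fields.includes(field)) return 'not-in-model';
  return field in doc ? 'set' : 'unset';
}
```

Here `fieldStatus(store, 'user::1', 'email')` gives `'not-in-model'`, because version 1 of the model never had an email field, while `fieldStatus(store, 'user::2', 'email')` gives `'unset'`: the model defines the field, the document just doesn’t have a value for it.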

Baby steps. The plugin system for ottoman was about beginning to make some of those things possible in ottoman. This db-migrate plugin is about giving us some method of evolving documents rather than re-loading all of the data, or hand-writing transform scripts every time, which is just too painful to be worth serious consideration.

The driver itself is broader than just ottoman, though. In the SQL world, a lot of migrations really boil down to executing vanilla SQL statements, and the same can be true in the couchbase world if you’re able to use n1ql. So the driver relies on ottoman to define concepts like “table”, but even if you don’t use ottoman at all, db-migrate-couchbase can still be useful for quite a bit.
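As an illustration of the “it’s just a statement” style, here is a sketch shaped like a db-migrate migration (an up step and a down step). The `runN1ql` method is a hypothetical stand-in, not the documented db-migrate-couchbase API, and the bucket name and the `type` / `status` fields are assumptions; check the package README for the calls the driver really exposes.

```javascript
// In a real migration file these two functions would be assigned to
// exports.up and exports.down; they live on a plain object here so the
// sketch runs on its own.
const migration = {
  // Forward step: backfill a default status on user documents that lack one.
  up(db) {
    return db.runN1ql(
      "UPDATE `default` SET status = 'active' " +
      "WHERE type = 'user' AND status IS MISSING"
    );
  },
  // Reverse step: drop the field again. This is only an approximate inverse,
  // since it also removes status values that existed before the migration.
  down(db) {
    return db.runN1ql("UPDATE `default` UNSET status WHERE type = 'user'");
  },
};

// Hypothetical stand-in for the driver: records statements instead of
// sending them to a cluster.
function makeRecordingDb() {
  const statements = [];
  return {
    statements,
    runN1ql(statement) {
      statements.push(statement);
      return Promise.resolve();
    },
  };
}
```

Running `migration.up(makeRecordingDb())` just records the UPDATE statement, which is handy for unit-testing a migration without a cluster. Nothing in it depends on ottoman.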

There’s a lot of stuff that developers using things like MongoDB and Mongoose would take for granted, but that we can’t yet do with Ottoman + Couchbase. We’re taking little bites at adding some of those things. Ultimately we (like many others) had to choose between Couchbase & Mongo. We chose Couchbase in part because of n1ql, but the cost is that in the node ecosystem, there’s still a ways to go before the functionality reaches parity with some of the other document DBs out there. Had we chosen Java, the picture would be different though.