Managing the Incrementing Key Pattern

Epwna · May 7, 2015, 5:07pm

When using the incrementing key pattern, you increment the count every time you add a new document and use the count in the document’s key.

What if you remove a user? You can’t decrement the counter, and if you did, you’d have to update all existing keys in the bucket to reflect the new structure.

So in this case, would you keep all the old users and determine whether they are “offline” or “deleted” when you iterate over them? And how would you then use the count to determine how many “active” users you have in your bucket? You’d have to use a view at that point.

I’d love some insight on this subject. I really find the data modeling patterns very interesting.

mnunberg · May 7, 2015, 6:40pm

Similar to auto-generated database IDs, the ID counter is not a good mechanism to reflect how many users are actually in your system. Its main use is to generate an ID for the user while avoiding more expensive mechanisms (such as a long UUID string). This helps keep your database smaller: the key size grows with the number of users instead of being a fixed size that inflates every key and its metadata. Stringified UUIDs are 32 or 36 bytes (characters), depending on whether you keep the dashes, yet even with “millions” of users your ID string is only 7 bytes long. You could make this even smaller if you used an alternate encoding (rather than base10). For example, 1000000 in base10 is 7 characters, but the same number in base16 (f4240) is only 5.
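To make the size difference concrete, here is a minimal Python sketch that compares key lengths. The helper name `encode_id` is hypothetical, and nothing here talks to Couchbase; it only measures strings.

```python
import uuid

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def encode_id(n, base=10):
    """Encode a non-negative counter value as a string in the given base (2..36)."""
    if n < 0 or not 2 <= base <= len(DIGITS):
        raise ValueError("n must be >= 0 and base must be between 2 and 36")
    if n == 0:
        return "0"
    out = []
    while n:
        n, remainder = divmod(n, base)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


uuid_with_dashes = str(uuid.uuid4())   # 36 characters
uuid_without_dashes = uuid.uuid4().hex  # 32 characters
decimal_id = encode_id(1_000_000, 10)   # "1000000", 7 characters
hex_id = encode_id(1_000_000, 16)       # "f4240", 5 characters
base36_id = encode_id(1_000_000, 36)    # shorter still

for label, value in [
    ("uuid (dashes)", uuid_with_dashes),
    ("uuid (no dashes)", uuid_without_dashes),
    ("base10 counter", decimal_id),
    ("base16 counter", hex_id),
    ("base36 counter", base36_id),
]:
    print(f"{label:18} {len(value):2d} chars  {value}")
```

Base36 is included only to show that a denser alphabet shortens the key further; any encoding works as long as every client agrees on it.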

Even with “holes” in your counter, you would still reap the benefits of this optimization.

If you wish to maintain a counter of how many active users exist, simply make it a new counter (not related to the ID counter), and increment it whenever a new user is added or decrement it whenever a user is removed. This counter would be for statistics/bookkeeping rather than generating IDs, though.
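A minimal sketch of the two-counter idea, assuming an in-memory stand-in for the bucket. The class name `UserStore` and the `user::<n>` key format are hypothetical, and the plain integer increments here are not atomic; on a real cluster you would rely on the server's atomic counter operation instead.

```python
class UserStore:
    """In-memory stand-in for a bucket. Not thread-safe, not a Couchbase client."""

    def __init__(self):
        self.users = {}       # key -> user document
        self.id_counter = 0   # only ever grows; used to build keys
        self.active = 0       # bookkeeping: how many users exist right now

    def add_user(self, profile):
        # The ID counter is incremented on every insert and never reused.
        self.id_counter += 1
        key = f"user::{self.id_counter}"
        self.users[key] = profile
        self.active += 1
        return key

    def remove_user(self, key):
        # Only the statistics counter goes down; the ID counter is untouched,
        # so the removed key simply becomes a "hole".
        if key in self.users:
            del self.users[key]
            self.active -= 1


store = UserStore()
keys = [store.add_user({"name": name}) for name in ("ann", "bob", "cat")]
store.remove_user(keys[1])
new_key = store.add_user({"name": "dan"})

print(new_key, store.id_counter, store.active)
```

After the removal, the next user still gets the next unused ID, and only `active` reflects how many users remain.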

The key increment pattern is effectively an optimization to keep your data sizes smaller (so Couchbase needs to store less data, and less data needs to be transferred over the network, especially once you get into things like “referencing other users” and so on).

Epwna · May 7, 2015, 8:02pm

That helps a bunch! Thank you very much.

martinesmann · May 7, 2015, 8:26pm

If you’d like a general introduction to key patterns and data modelling, I can recommend this video:

It shows some great benefits of using “predictable” keys and counters combined with multi-get (approx. 26 minutes into the video).
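The following is a sketch of the general idea rather than the video's own example: because the keys are predictable, a client can rebuild every candidate key from the counter value and fetch them in one batch, without a view. The bucket is a plain dict here, and the names `candidate_keys` and `multi_get` are hypothetical stand-ins for whatever bulk-get call your SDK provides.

```python
def candidate_keys(count, prefix="user::"):
    """Rebuild every key the counter could have produced so far."""
    return [f"{prefix}{i}" for i in range(1, count + 1)]


def multi_get(bucket, keys):
    """Stand-in for a bulk get: return only the keys that still exist."""
    return {key: bucket[key] for key in keys if key in bucket}


# Assumed layout: one counter document plus one document per user.
# user::2 was deleted, so it is a hole in the sequence.
bucket = {
    "user::count": 4,
    "user::1": {"name": "ann"},
    "user::3": {"name": "cat"},
    "user::4": {"name": "dan"},
}

found = multi_get(bucket, candidate_keys(bucket["user::count"]))
print(sorted(found))
```

Deleted users simply come back as misses, so holes in the sequence cost one wasted lookup each and nothing more.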

Epwna · May 7, 2015, 8:59pm

Yeah, that video was where I got the idea from. Is it possible to do atomic operations with the Couchbase Lite API? I’m using the .NET and the Android APIs for my clients.

Scotch · May 11, 2015, 7:19pm

Epwna, you should also consider that an incrementing key has to be stored in a common location and updated with every insert (i.e. a singleton). Back when relational was all the rage, it wasn’t a big deal because databases lived on one machine. Now we’re distributing across clusters of servers so introducing a singleton resource has become somewhat of an anti-pattern. The few bytes you spend on statistically unique keys (GUIDs) over auto-increment is pretty much irrelevant too.

In .NET, I use the following to generate URI compatible compacted GUIDs:

// GUIDs contain a maximum of 22 useful characters when Base64 encoded.
string identifier = Convert.ToBase64String(Guid.NewGuid().ToByteArray());

// Trim off any trailing ‘=’ signs that remain after encoding.
identifier = identifier.Substring(0, 22);

// Replace any URI-incompatible Base64 characters
identifier = identifier.Replace("/", "_");
identifier = identifier.Replace("+", "-");

return identifier;

These are pretty tight and don’t use unprintable characters. They do mix upper- and lower-case, so they’re not suited for manual entry or reading over a phone though. In those scenarios, I use a pseudonym that sometimes returns multiple hits or requires additional selection criteria, like customer ID.
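For readers outside .NET, roughly the same compaction can be sketched in Python with the standard library. The function names `compact_id` and `expand_id` are hypothetical. Note that .NET's `Guid.ToByteArray()` orders the first three fields little-endian, so this sketch will not produce the same string as the C# code for the same GUID; it is just as compact, though.

```python
import base64
import uuid


def compact_id():
    """Return a random UUID as a 22-character, URL-safe Base64 string."""
    raw = uuid.uuid4().bytes  # 16 bytes
    # urlsafe_b64encode already uses '-' and '_' in place of '+' and '/'.
    encoded = base64.urlsafe_b64encode(raw).decode("ascii")  # 24 chars, ends in "=="
    return encoded.rstrip("=")


def expand_id(identifier):
    """Reverse compact_id: restore the padding and rebuild the UUID."""
    padded = identifier + "=" * (-len(identifier) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


identifier = compact_id()
print(identifier, len(identifier), expand_id(identifier))
```

Being able to expand the short form back into a full UUID is handy when other systems expect the canonical 36-character representation.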