Use of GUIDs in keys


#1

Hello. Our solution, a web application, uses Couchbase in production with GUID keys. GUIDs suit us because in some situations we need to return the ID of a stored item before the whole item has been written to Couchbase, so storing an item takes only 1 Couchbase request. The main disadvantage is that the keys become large. According to the Couchbase docs, all keys are stored in memory, so shorter keys consume less memory. I am now developing an app that uses sequential keys: I call the Couchbase inc operation to get a sequence ID and then store the item with an ID like item:. That takes 2 Couchbase requests: 1 for inc and 1 for the write. However, I did not find any difference in performance. Another advantage of sequential keys is that you can use range requests on keys: startkey=[item:1]&endkey=[item:10].
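
A minimal sketch of the two key strategies described above, assuming an in-memory class as a stand-in for the bucket. `FakeBucket`, `incr`, and `set` are hypothetical names for illustration, not the Couchbase SDK API. It only counts how many requests each approach issues and shows the resulting keys.

```python
import uuid


class FakeBucket:
    """In-memory stand-in for a bucket (hypothetical, not the Couchbase SDK)."""

    def __init__(self):
        self.data = {}
        self.requests = 0

    def incr(self, key):
        # Counter increment: one round trip against a real cluster.
        self.requests += 1
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]

    def set(self, key, value):
        # Document write: one round trip against a real cluster.
        self.requests += 1
        self.data[key] = value


def store_with_guid(bucket, item):
    # The key is known before any request is sent.
    key = "item:" + uuid.uuid4().hex
    bucket.set(key, item)
    return key


def store_with_counter(bucket, item):
    # The key is only known after the increment returns.
    seq = bucket.incr("item:counter")
    key = "item:%d" % seq
    bucket.set(key, item)
    return key


guid_bucket = FakeBucket()
guid_key = store_with_guid(guid_bucket, {"name": "a"})

counter_bucket = FakeBucket()
counter_key = store_with_counter(counter_bucket, {"name": "a"})

print(guid_bucket.requests, len(guid_key))
print(counter_bucket.requests, counter_key)
```

The GUID path issues one request and yields a 37-character key here (the `item:` prefix plus 32 hex characters); the counter path issues two requests and yields a short key.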


#2

Hi,

My company is using a partner to deliver a Couchbase-based system we’ve designed. After speaking to a Couchbase expert, they’ve formed the view that the way we’ve designed it, with GUIDs as keys (and those keys being used as lookup values within other documents), is not only sub-optimal but downright negative.

The reasoning is connected to cache-size and memory consumption…

Does this ring true as a correct interpretation?

Essentially, is it a good idea to use GUIDs as keys? And if not, how should we go about generating unique, system-wide key values in a multi-cluster system (that is, one using XDCR)?
Any thoughts appreciated!

Regards,
Duncan


#3

The extra size of documents due to using GUIDs instead of integers could affect how much you can stuff into your bucket.

Say you have a million documents. An integer ID would then be up to 7 bytes (7 digits), a GUID 32. That’s a difference of 25 bytes.

If your documents are fairly small, let’s say 50 bytes, and you have a reference to another GUID in the document, then the total size for a document would roughly be

Integers: 50 + 7 + 7 + 150 (Couchbase meta data) = 214 bytes
GUIDs: 50 + 32 + 32 + 150 = 264 bytes

increase = (264-214)/214 = 23%

So you’ll lose some RAM from your bucket by using GUIDs. Whether that is significant depends on your application.

And if you have bigger documents that percentage will go down (i.e. less effect).
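
The arithmetic above can be sketched in Python to see how the percentage shrinks as documents grow. This is only a rough model under the same assumptions as the example (7-byte integer keys, 32-byte GUIDs, one reference to another key per document, and the 150-byte metadata figure quoted above); the function names are made up for illustration.

```python
METADATA_BYTES = 150  # per-document metadata figure used in the example above


def doc_footprint(body_bytes, key_bytes):
    # body + the document's own key + one reference to another key + metadata
    return body_bytes + key_bytes + key_bytes + METADATA_BYTES


def guid_overhead(body_bytes, int_key_bytes=7, guid_key_bytes=32):
    # Relative increase in footprint when GUID keys replace integer keys.
    small = doc_footprint(body_bytes, int_key_bytes)
    large = doc_footprint(body_bytes, guid_key_bytes)
    return (large - small) / small


for body in (50, 500, 5000):
    print(body, round(guid_overhead(body) * 100, 1))
# 50 23.4
# 500 7.5
# 5000 1.0
```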

As for using atomic increment to generate IDs - that might not work in an XDCR situation. You could get two documents created at the same time with the same ID in different data centers (if you are using bi-directional replication). You could work around that by making the application data center aware, e.g. in one data center you only create even IDs and in the other only odd IDs, or something similar.
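
A minimal sketch of the even/odd idea, generalized to any number of data centers. It assumes each data center keeps its own local sequence counter and knows its own index; `global_id`, `dc_index`, and `dc_count` are hypothetical names, and nothing here talks to Couchbase.

```python
def global_id(local_seq, dc_index, dc_count=2):
    """Map a per-data-center sequence number to an ID no other data center produces."""
    if not 0 <= dc_index < dc_count:
        raise ValueError("dc_index must be in range(dc_count)")
    # IDs from data center i are always congruent to i modulo dc_count.
    return local_seq * dc_count + dc_index


even_ids = [global_id(n, 0) for n in range(1, 5)]  # [2, 4, 6, 8]
odd_ids = [global_id(n, 1) for n in range(1, 5)]   # [3, 5, 7, 9]
print(even_ids, odd_ids)
```

With two data centers, index 0 only ever produces even IDs and index 1 only odd IDs, so the two sets cannot collide.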

If you ARE using bi-directional XDCR and you can’t make your application data center aware, you should probably use GUIDs and make sure you have enough RAM to fit them.
As far as I know there is no performance difference.


#4

I just realized I might be talking out of my a**.
I have no idea how Couchbase handles atomic increments and XDCR - I haven’t tested.
Perhaps a Couchbase developer can explain that?


#5

When using XDCR, the concept of Conflict Resolution comes into play any time you modify the same key in different clusters. Some details about the conflict resolution process are here:
http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-admin-tasks-xdcr-functionality.html

The effect of this is that you cannot safely use operations like atomic increment across multiple clusters.
If your application is cluster-aware, you can partition the key space and use incremental counters within each datacenter: new-york-1, new-york-2, san-fran-1, san-fran-2, etc. If the application cannot be made cluster-aware, something like a GUID is appropriate to ensure that each datacenter is using unique keys.
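
A rough sketch of that choice, assuming a cluster-aware application knows its datacenter name and that each cluster's counter lives under a key the other cluster never writes. `make_key_generator` is a hypothetical helper, and `itertools.count` stands in for the per-cluster atomic counter.

```python
import itertools
import uuid


def make_key_generator(datacenter=None):
    """Return a function that produces unique keys.

    datacenter: name such as "new-york"; None means the application is
    not cluster-aware and falls back to GUIDs.
    """
    if datacenter is None:
        return lambda: uuid.uuid4().hex
    counter = itertools.count(1)  # stand-in for a per-cluster atomic counter
    return lambda: "%s-%d" % (datacenter, next(counter))


next_ny = make_key_generator("new-york")
next_sf = make_key_generator("san-fran")
keys = [next_ny(), next_ny(), next_sf(), next_sf()]
print(keys)  # ['new-york-1', 'new-york-2', 'san-fran-1', 'san-fran-2']

next_guid = make_key_generator()
print(len(next_guid()))  # 32 hex characters
```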