Document ID Conventions?

Gentlefolk,

As I wrap my head around this world I read a lot. Much of it is from CouchDB-focused resources, such as the Definitive Guide and others.

I’ve been using a colon, ‘:’, as a logical separator in my document ID keys. Much of the CouchDB-derived literature uses slashes, ‘/’. I assume they use slashes because of the belief that these keys might end up non-URL-encoded in a CouchDB query. That seems unlikely, but I’m new here.

Is there a consensus best practice on document ID key conventions?

Andrew

There are no set rules, but in general:

1. Length: (max 1000), but under 200 chars is good. A random 32-char key will make it really unique. I like using sha1(microtime()). http://www.couchbase.com/communities/q-and-a/key-size-limits-couchbasemembase-again
2. No spaces " " at the front or back of a string. I trim() key data before a SET into CB.
3. All lower or all upper case. When you do views you get better results.
4. Try not to use a backslash \. This is more reflective of application troubles.
5. Don't auto-increment. I found I overwrote keys by accident more often than I benefited from the convenience of an incrementing count.
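
Points 1 and 2 can be sketched roughly as follows. This is only an illustration in Python (the post above refers to PHP's sha1(microtime()) and trim()); the names clean_key and random_key are made up for the example. Mixing in os.urandom is an addition not in the post, on the assumption that hashing a timestamp alone could collide when two calls land on the same clock tick. Note that a SHA-1 hex digest is 40 characters long, not 32.

```python
import hashlib
import os
import time


def clean_key(raw: str) -> str:
    """Strip leading and trailing whitespace before the key is stored."""
    return raw.strip()


def random_key() -> str:
    """Return a 40-character hex SHA-1 digest to use as an opaque key.

    The timestamp mirrors sha1(microtime()); the random bytes are an
    extra guard (an assumption of this sketch) against same-tick collisions.
    """
    seed = str(time.time_ns()).encode("ascii") + os.urandom(16)
    return hashlib.sha1(seed).hexdigest()
```

Something like clean_key("  user:42 ") would give "user:42", and each call to random_key() should give a different 40-character string.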

I don't know if you can use non-ASCII chars like Kanji (Chinese) in your keys.

househippo,

Thank you for your insight.

While I don’t have a length issue, I may have a Unicode issue. It looks like I will need to length-test my keys in their byte array form to ensure they are not longer than 250 bytes. I doubt Couchbase will have problems with the high bits being set in a UTF-8 string. It appears, though, that the string length is hard coded into the membased cache. (250 is a curious length. With a length byte or NULL terminator it precludes having a 64-bit pointer and the key in the same 256-byte array. As I said, curious.)
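
A byte-length test along these lines might look like the sketch below. It is written in Python purely for illustration; MAX_KEY_BYTES, key_byte_length and check_key are hypothetical names, and the 250-byte figure is simply the limit quoted in this thread.

```python
MAX_KEY_BYTES = 250  # limit as quoted in this thread


def key_byte_length(key: str) -> int:
    """Length of the key once encoded as UTF-8, which is what the limit counts."""
    return len(key.encode("utf-8"))


def check_key(key: str) -> str:
    """Return the key unchanged, or raise if its UTF-8 form is too long."""
    size = key_byte_length(key)
    if size > MAX_KEY_BYTES:
        raise ValueError(f"key is {size} bytes, limit is {MAX_KEY_BYTES}")
    return key
```

The point of measuring bytes rather than characters is that a key of 84 three-byte characters is already 252 bytes, even though it is only 84 characters long.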

Andrew

Hello,

In addition to the various comments, the documentation mentioned the key limits here:
http://docs.couchbase.com/couchbase-devguide-2.2/#storing-information

About the limit of 250 bytes, I do not know why it has this value, but I usually like to say the following: if you have keys bigger than 250 bytes, you are probably doing something wrong with the key. You can always use a hash to make it smaller.
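
One way the "use a hash" suggestion could look, as a rough Python sketch: shorten_key is a hypothetical helper, and SHA-256 is just one possible choice of hash, not something the documentation prescribes. Keys within the limit are left readable; longer ones are replaced by a fixed 64-character hex digest.

```python
import hashlib


def shorten_key(key: str, max_bytes: int = 250) -> str:
    """Return the key as-is if it fits, otherwise a SHA-256 hex digest of it."""
    encoded = key.encode("utf-8")
    if len(encoded) <= max_bytes:
        return key
    # The digest is always 64 ASCII characters, regardless of input length.
    return hashlib.sha256(encoded).hexdigest()
```

The trade-off is that a hashed key can no longer be read or parsed, so the original value would need to be kept inside the document if it matters.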

Regards
Tug
@tgrall

What char values can be used in the key? Only printable 7-bit ASCII 33-126? Or can printable extended ASCII 8-bit values such as ü (252) also be used?

The documentation Tug linked does not mention what character values are allowed. Househippo also says to use all-lowercase or all-uppercase, but I assume this was just a readability suggestion?

I am hoping to text-encode an unsigned 64-bit ID number as part of a key, and since that part will not be human-meaningful except to match, I was wondering if I could use these other values to reduce the length of the keys a bit? Also some of the extended ASCII characters could be useful for making other kinds of short keys. But will it cause any problems?

Answering myself here, having played with it:

You can use ü, and it will work, but “extended ASCII” is a Microsoft thing, so if you are doing what I’m doing (programming in C++/gcc on Linux with UTF-8 source code files), the only one-byte printing characters are the 7-bit ASCII values. Other characters such as ü are multi-byte UTF-8 characters, which work in keys but take multiple bytes per character.
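
To make the byte counts concrete, here is a small Python sketch (Python rather than C++ only to keep it short). encode_u64 and decode_u64 are hypothetical helpers showing one way to text-encode an unsigned 64-bit ID using only one-byte ASCII characters: URL-safe base64 gives 11 characters, against up to 20 decimal digits.

```python
import base64
import struct


def utf8_len(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def encode_u64(value: int) -> str:
    """Pack an unsigned 64-bit integer into 11 URL-safe base64 characters."""
    packed = struct.pack(">Q", value)  # 8 bytes, big-endian
    return base64.urlsafe_b64encode(packed).rstrip(b"=").decode("ascii")


def decode_u64(text: str) -> int:
    """Inverse of encode_u64: restore the padding, then unpack."""
    padded = text + "=" * (-len(text) % 4)
    return struct.unpack(">Q", base64.urlsafe_b64decode(padded))[0]
```

Here utf8_len("u") is 1 while utf8_len("ü") is 2, so swapping in characters above 127 would not shorten a key measured in bytes. The URL-safe alphabet uses only letters, digits, '-' and '_', so it does not clash with ':' or '/' as delimiters.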

Sorry for reviving this topic, which I found in a search, but I believe the reason for using only upper case or only lower case is to normalize your document IDs so they are case insensitive. For example, if a user signs up as myUserName and then tries to access their account again with myusername, the document IDs won’t match, but if you always convert them to lowercase or uppercase they become case insensitive.
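
A tiny Python sketch of that idea, assuming a hypothetical "user:" prefix and a made-up helper name user_doc_id; the only point is that both spellings map to the same document ID.

```python
def user_doc_id(username: str) -> str:
    """Build a document ID that ignores the case of the username."""
    # casefold() is a more aggressive lower() intended for caseless matching.
    return "user:" + username.strip().casefold()
```

With this, user_doc_id("myUserName") and user_doc_id("myusername") both give "user:myusername".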

Just my two cents.

Isn’t it a better idea to use shorter keys, both for document IDs and for fields that may be used for creating any views/design docs? Since keys are used in indexes, the shorter the better, so that less RAM is required to serve a larger number of docs. As I understand it, keys 20 chars long should require 1/10th of the RAM compared to keys 200 chars long (barring any other minor overheads).
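
The arithmetic can be sketched in Python as below. key_ram_bytes is a hypothetical helper, and per_doc_overhead is a placeholder for whatever fixed metadata the server keeps per document; the value 56 used in the comments is an assumed figure for illustration, not something stated in this thread.

```python
def key_ram_bytes(num_docs: int, key_bytes: int, per_doc_overhead: int = 0) -> int:
    """Rough RAM needed to hold every key plus a fixed per-document overhead."""
    return num_docs * (key_bytes + per_doc_overhead)


# Keys alone, one million documents: 200-byte keys need 10x the RAM of 20-byte keys.
short_only = key_ram_bytes(1_000_000, 20)   # 20,000,000 bytes
long_only = key_ram_bytes(1_000_000, 200)   # 200,000,000 bytes

# With an assumed 56 bytes of metadata per document the gap narrows.
short_total = key_ram_bytes(1_000_000, 20, per_doc_overhead=56)   # 76,000,000 bytes
long_total = key_ram_bytes(1_000_000, 200, per_doc_overhead=56)   # 256,000,000 bytes
```

So the 1/10th estimate holds for the key bytes themselves; once any fixed per-document overhead is counted, the saving is still large but smaller than 10x.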

Another interesting read I found in the Couchbase forum:
Why I Use Double Colons as a Key Pattern Delimiter

Any idea why the docs say white spaces are not allowed at all?
I just tried a key like “my key” with no problems so far, and used the Node SDK’s bucket.get to retrieve it successfully. Any possible downsides?