Bucket Design to Optimize Queries


#1

I was wondering is having multiple document types in a single bucket is still the desired bucket design when using N1QL. It seem that creating a different bucket per docType would be faster that keeping all the documents in one bucket. The documentation states KeySpaces are analogous to buckets.

If I had 10,000 documents which was set up like this:

BUCKET_PETS
5,000 docTypes dogs
2,500 docTypes cats
2,500 docTypes birds
Create an index on docType

select * from BUCKET_PETS where docType = ‘dogs’;

Is there a benefit to separate the different doctype in their own buckets?
So in this example one for dogs,cats, and birds.

Thanks much.
Keith


#2

Hi Keith,

The tradeoff for using multiple buckets depends on a couple of things–

  • cardinality: do some types of pets much more / less frequently than others? For example, if some types of pets had 100,000 documents, and others had 1,000 documents, that difference would be significant when querying only the 1,000 pets.

-access patterns: do all queries cut across various types of pets, or are most queries type-aware? If you use multiple buckets, then queries that cut across types will need to use UNION and UNION ALL. If all your queries are type-aware, you probably want multiple buckets.

Finally, if you keep your documents in one bucket, you can index on type, and use type in your WHERE clause. This will help speed up your queries (see EXPLAIN).

Thanks.


#3

Also, @layerxfounder keep in mind that Couchbase also treats buckets not only as “namespaces”, but rather to allocate resources and authentication. Creating many buckets certainly adds some overhead, so I’d always start out from a single bucket and really only move on to more if it makes sense (both from a resource and application semantic perspective).


#4

Thanks Gerald,

I know there are a lot of other factors with design. I do in fact have an index on docType. Some queries are fast and some not so much. I also understand this is a developer preview and there are optimizations in the works.

Thank you.
Keith


#5

Which queries are slow? Can you use EXPLAIN to make sure they are using an index, and if they are still slow, please post the queries here (or send them to me privately). We certainly don’t want any queries to be slower than they should be, even in this developer preview.

Thanks,
Gerald


#6

It worked thanks!

One more quick question. I dont need the Default bucket. Is it fine to delete that? or it may cause some issue later on.

Sorry - this is on the wrong post.
Admin: pls delete/ignore.