Question about compound keys


#1

I have read quite a few articles and watched several webinars about Couchbase,
but I am still not clear about how I should design the relationships between documents…

Say we have User, Product, and ProductsViewed (from the webinar “How-To NoSQL Couchbase 104: Data Modeling”).
The webinar suggests creating three types of documents:

user::1001
{

}

user::1001::productsviewed
{
"productsList": [ 8, 33, 99, 100 ]
}

product::8
{
"id": 8,
"name": "T-shirt"
}

By using compound keys, I can predict the key of a user’s ProductsViewed document.
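A minimal sketch of the compound-key idea, assuming the `::` separator used above. The helper names `userKey`, `productsViewedKey`, and `productKey` are hypothetical, not part of any Couchbase SDK. Given only a user ID, every related key can be computed without a lookup, and the IDs stored in `productsList` map directly to product keys.

```javascript
// Hypothetical helpers: build compound keys with the "::" separator
// used in the examples above. These are not Couchbase SDK functions.
function userKey(userId) {
  return `user::${userId}`;
}

function productsViewedKey(userId) {
  // Derived from the user key, so it can be predicted from the user ID alone.
  return `${userKey(userId)}::productsviewed`;
}

function productKey(productId) {
  return `product::${productId}`;
}

// Resolve a user's viewed product IDs into the product keys to fetch.
const viewed = { productsList: [8, 33, 99, 100] };
const keysToFetch = viewed.productsList.map(productKey);

console.log(productsViewedKey(1001)); // user::1001::productsviewed
console.log(keysToFetch.join(", ")); // product::8, product::33, product::99, product::100
```

Because only IDs are stored in the list, product details live in one place and are fetched by key when needed.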

Here are my questions:

What happens if the ProductsViewed document grows to, or beyond, the 20 MB document size limit?

  1. Will every load and write become a heavy operation, or is Couchbase just fast enough that I don’t have to care about it?

  2. And once the document reaches the 20 MB limit, will I have to create another ProductsViewed document, such as user::1001::productsviewed::02, to hold a second list of viewed products?
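A minimal sketch of the splitting idea from question 2, assuming a hypothetical fixed cap on IDs per document (`MAX_IDS_PER_DOC`, set artificially low here so the split is visible; a real cap would be sized to stay well below the size limit). Chunk numbers are zero-padded to two digits to match the `::02` suffix above; giving the first chunk a `::01` suffix too is a choice of this sketch.

```javascript
// Hypothetical cap on how many product IDs one chunk document may hold.
const MAX_IDS_PER_DOC = 3;

// Split a user's full list of viewed product IDs into chunk documents keyed
// user::<id>::productsviewed::01, ::02, ... (two-digit, zero-padded suffix).
function chunkProductsViewed(userId, productIds, maxPerDoc = MAX_IDS_PER_DOC) {
  const docs = [];
  for (let i = 0; i < productIds.length; i += maxPerDoc) {
    const chunkNo = String(docs.length + 1).padStart(2, "0");
    docs.push({
      key: `user::${userId}::productsviewed::${chunkNo}`,
      value: { productsList: productIds.slice(i, i + maxPerDoc) },
    });
  }
  return docs;
}

const chunks = chunkProductsViewed(1001, [8, 33, 99, 100]);
for (const c of chunks) {
  console.log(c.key, JSON.stringify(c.value));
}
// user::1001::productsviewed::01 {"productsList":[8,33,99]}
// user::1001::productsviewed::02 {"productsList":[100]}
```

A reader still needs to know how many chunks exist (for example, by keeping a chunk count in the user document, which is an assumption not covered in this thread); that bookkeeping is left out here.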

The webinar also suggests that keys should be predictable (see the demo code at the end of the webinar).
3) Say, for example, a user’s posts are keyed user::1001::post::1 … user::1001::post::5.
What if a post in the middle gets deleted?
Then the keys are no longer predictable, and I would end up needing a document like ProductsViewed to list the posts.
So what is the point of making keys predictable? Should I only do it if I know posts will never be deleted?


#2

Hey @kchan4,

I might be able to answer some of your questions :smile:

(1) The 20 MB limit was put in place to keep performance efficient. As long as you stay under that limit, you shouldn’t have to worry about Couchbase being able to handle the document.
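To make “staying under the limit” concrete, here is a sketch that measures a document’s JSON encoding in UTF-8 bytes and compares it to 20 MB. Assumptions: the limit is treated as 20 × 1024 × 1024 bytes, and the serialized JSON length is used only as an approximation of the stored size. Note that a full get or replace transfers the whole document, so size affects latency well before the hard limit.

```javascript
// Assumption: the document size limit is treated as 20 MiB.
const MAX_DOC_BYTES = 20 * 1024 * 1024;

// Approximate stored size: the UTF-8 byte length of the JSON encoding.
function jsonByteSize(doc) {
  return new TextEncoder().encode(JSON.stringify(doc)).length;
}

// True if the document's JSON encoding fits within the (assumed) limit.
function fitsInOneDocument(doc, limit = MAX_DOC_BYTES) {
  return jsonByteSize(doc) <= limit;
}

const productsViewed = { productsList: [8, 33, 99, 100] };
console.log(jsonByteSize(productsViewed)); // 30
console.log(fitsInOneDocument(productsViewed)); // true
```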

(2) To keep documents smaller you can split them up in the fashion that you mentioned. You might do something like this:

user::1001::productsviewed::01
user::1001::productsviewed::01::filechunk::01
user::1001::productsviewed::01::filechunk::02

Here productsviewed::01 would be a particular product and could contain information about it, and filechunk::01/02 could hold, for example, image data for that product.

(3) I may need a bit more information on this one, but from what I understand, you might have the following:

user::1001::post::1
user::1001::post::2
user::1001::post::3
user::1001::post::4
user::1001::post::5

So you’re asking what happens when you remove user::1001::post::3, right? In the context of, say, a blog, this wouldn’t matter. Chances are you’d be using a View or a N1QL query to list all posts, in which case the deleted document would simply be skipped.

Let me know if something doesn’t make sense or if I can help you further.

Best,


#3

Thanks for your reply, @nraboy!

  1. I guess I don’t have to worry then!

  2. I don’t want to duplicate product data, so I prefer to store only the product IDs. Having to update every user’s productsviewed document whenever a product changes sounds difficult…

  3. I don’t think we are going to use N1QL queries, since N1QL is still in beta, so probably only Views.
    Would I then get the posts of a particular user with a view like the following?

Say the user is
user::1001
{
"type": "user",
"name": "Foo"
}

and a post is
user::1001::post::1
{
"type": "post",
"title": "Foo",
"content": "Bar"
}

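I can create a view that returns only posts:
function (doc, meta)
{
  // Only index post documents
  if (doc.type === 'post')
  {
    // Strip "::post::<n>" from the document ID, leaving e.g. "user::1001"
    var userid = meta.id.substr(0, meta.id.indexOf('::post::'));
    emit(userid, doc.title);
  }
}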
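To see what the map function above produces, a small harness can run it outside Couchbase. This is only a simulation: `emit` here is a stand-in that collects rows, the documents are hypothetical samples in which `user::1001::post::3` has been deleted (so it is simply absent), and each row carries the ID of the document that emitted it, as view rows do.

```javascript
// Stand-in for the view engine: collects emitted rows and tags each row
// with the ID of the document currently being mapped.
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key, value });
}

// The map function from the post.
function map(doc, meta) {
  if (doc.type === "post") {
    var userid = meta.id.substr(0, meta.id.indexOf("::post::"));
    emit(userid, doc.title);
  }
}

// Hypothetical sample data; user::1001::post::3 has been deleted.
const docs = {
  "user::1001": { type: "user", name: "Foo" },
  "user::1001::post::1": { type: "post", title: "First", content: "..." },
  "user::1001::post::2": { type: "post", title: "Second", content: "..." },
  "user::1001::post::4": { type: "post", title: "Fourth", content: "..." },
  "user::2002::post::1": { type: "post", title: "Other", content: "..." },
};

// Run the map function over every document.
for (const [id, doc] of Object.entries(docs)) {
  currentId = id;
  map(doc, { id });
}

// Rows for one user: the deleted post leaves no gap, it is just not there.
const forUser = rows.filter((r) => r.key === "user::1001");
console.log(forUser.map((r) => r.id).join(", "));
// user::1001::post::1, user::1001::post::2, user::1001::post::4
```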

Would that give me the list of posts?
But in order to get the details of each post, instead of
emit(userid, doc.title);
should I use
emit(userid, meta.id);
so that I have the ID of each post and can delete or update it as well?
Is this the correct approach?

=====
EDIT

Actually, for 3, I noticed the view already returns the ID of the post document itself, so should I emit the whole doc as the value?
For example, to display all posts and their content in a forum, I would need each post’s title, content, and date/time.
The webinar (Index and View) said it is better not to emit the whole doc (I don’t really understand why…). Should I then iterate over the post IDs and get each post document separately?
I just think emitting the post’s doc would be easier…


#4

Hey @kchan4,

You might see the following for (3):

Ignore the fact that I linked you to a Node.js solution, as the core point (range queries) works across the other SDKs too.

The best solution for you would be to use a range query to get all documents whose keys start with a certain prefix. You probably shouldn’t emit the whole document: emitted values are stored in the view index, so emitting whole documents duplicates them there, which can become taxing depending on document size and how many documents you have. Instead, take the ID from each result of the range query and then grab the document in a separate request.
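A sketch of that pattern, simulated in memory so it runs standalone. Assumptions: the view index is modeled as rows keyed by document ID (as if the map function did `emit(meta.id, null)`) and sorted by key; `rangeQuery` mimics a startkey/endkey query using plain JavaScript string comparison, with `"\uffff"` as an upper bound, whereas real view collation differs; `getDocument` stands in for an SDK get. None of these names are Couchbase APIs. The index holds only keys and IDs; full documents are fetched afterwards, one request per ID.

```javascript
// In-memory stand-in for the bucket: document ID -> document.
const store = new Map([
  ["user::1001::post::1", { type: "post", title: "Hello", content: "..." }],
  ["user::1001::post::2", { type: "post", title: "Again", content: "..." }],
  ["user::2002::post::1", { type: "post", title: "Other", content: "..." }],
]);

// Simulated view index: key = document ID, no value emitted, sorted by key.
const index = [...store.keys()]
  .map((id) => ({ key: id, id, value: null }))
  .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

// Hypothetical range query: rows with startkey <= key <= endkey
// (plain JS string comparison, not Couchbase view collation).
function rangeQuery(rows, startkey, endkey) {
  return rows.filter((r) => r.key >= startkey && r.key <= endkey);
}

// Hypothetical stand-in for an SDK get by document ID.
function getDocument(id) {
  return store.get(id);
}

// Query all keys with this prefix, then fetch each post by its ID.
const prefix = "user::1001::post::";
const hits = rangeQuery(index, prefix, prefix + "\uffff");
const posts = hits.map((row) => ({ id: row.id, ...getDocument(row.id) }));
console.log(posts.map((p) => p.title).join(", ")); // Hello, Again
```

Keeping the index small costs one extra round trip per document; an SDK’s bulk or parallel get, where available, can reduce that overhead.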

How does that sound?

Best,


#5

@nraboy If that’s more efficient, I’ll do it that way then! Thanks!


#6

From what you described, I think it would fit you best, just without the delete logic of course :smile: