Use MultiGet or views (if using natural keys)?


#1

Hi!

I am trying to understand some key design principles for a document db.

  1. Assume I have a natural key for users, so upon registration I create a uid:

uid=cb.incr("u::counter")+1

and create the user key:
u::uid

To create a relationship with another user/document, I add a separate document with the key

u1::friend::u2

which holds the connected friends in its JSON body.

The user JSON document has a "no_friend" field which I increment for every accepted friend. As such, once I have fetched the user document I already know the keys I need to request in order to display the friends of user X. So I simply do a:

multiGet (u:1::friend::u:X)
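
A minimal sketch of this lookup, under one possible reading of the key scheme: it assumes the link documents are numbered 1..no_friend per user, so the counter alone is enough to rebuild every key. The `store` dict, `friend_link_keys` and `multi_get` are hypothetical stand-ins for the bucket and the client's bulk get, not a real SDK API.

```python
# In-memory stand-in for the bucket; a real client would replace this dict.
store = {
    "u::1": {"name": "joe", "no_friend": 2},
    "u::1::friend::1": {"friend": "u::2"},
    "u::1::friend::2": {"friend": "u::3"},
}

def friend_link_keys(uid, no_friend):
    """Build the link-document keys 1..no_friend for one user (assumed scheme)."""
    return [f"u::{uid}::friend::{n}" for n in range(1, no_friend + 1)]

def multi_get(store, keys):
    """Hypothetical bulk fetch: return the documents found for the given keys."""
    return {k: store[k] for k in keys if k in store}

# One fetch for the user document, then one bulk fetch for all link documents.
user = store["u::1"]
links = multi_get(store, friend_link_keys(1, user["no_friend"]))
friends = [doc["friend"] for doc in links.values()]
```

The point is that every key is computable from the user document, so no index is involved at any step.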

Is this a good, scalable data model for relationships?

Is this operation more or less expensive (time-wise) than creating a view that emits the friend-type documents belonging to u:1?

I guess the question is: when should I use views if the data can be accessed by key? Never?

  2. Natural keys seem very logical. When should I not use natural keys as in the example above? I know some people use UUID v4 or other random keys.

Thanks!


#2

Accessing a document by its primary key is always going to be faster than fetching it via a mechanism involving secondary indices, so you are right that a multiget will perform better in this case. This follows from the structure: put simply, a primary key lookup only has to hit one spot in memory to read the data and return it to you, while a secondary index is by nature at least one level of indirection. That said, looking at your data model, what is the reasoning behind the separate document linking the friends? It seems to me you could just store an array of friends in each user:

user:joe = {
  "name": "joe",
  "birthdate": "1984-07-07",
  "friends": [
    "user:carl",
    "user:homer"
  ]
}

just as an idea.

For your second question, views are useful when it is not possible to predict the keys you want to request. One example is documents holding logging information that you want to access by time range, such as all documents from 2014-01-07 to 2014-07-07; in this case a range query on date keys can be very handy. Another is documents you want to access by secondary properties, where building access via referential keys is too much overhead because the access is infrequent. A third example I can come up with right now is real-time search, something like a music library searchable by artist or album: using views it is easy to create searches like “Starts with Ab” or “Starts with Toxic”. As the user adds more letters, you can fire off refined searches to the view by doing range queries on the key.
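
As a rough illustration of the "starts with" idea, here is a sketch of a prefix search expressed as a range query over sorted keys. It uses a plain sorted list and `bisect` as a stand-in for a view index; the artist names, document ids and the `starts_with` helper are made up for the example, and the comparison is case-sensitive.

```python
import bisect

# Hypothetical sorted index of (key, doc_id) pairs, standing in for a view
# that emits the artist name as its key.
index = sorted([
    ("Abba", "artist::1"),
    ("Above & Beyond", "artist::2"),
    ("AC/DC", "artist::3"),
    ("Toxic Avenger", "artist::4"),
])
keys = [k for k, _ in index]

def starts_with(prefix):
    """Return doc ids whose key falls in the range [prefix, prefix + U+FFFF)."""
    lo = bisect.bisect_left(keys, prefix)
    # Appending a very high code point gives an end key that sorts after
    # every ordinary key beginning with the prefix.
    hi = bisect.bisect_left(keys, prefix + "\uffff")
    return [doc_id for _, doc_id in index[lo:hi]]
```

Each extra letter the user types just narrows the start and end keys, so `starts_with("A")` followed by `starts_with("Ab")` are two range scans over the same index. A date range works the same way, with the two dates as start and end keys.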


#3

Thanks for your detailed answer pfehre, it makes sense indeed. Yeah I know, I am still debating with myself whether I should go with embedded or separate documents. Life seems easier with embedded, but embedding seems limiting if the writes become heavy and the Twitter effect (thousands of friends) becomes real. It just feels safer to have linking documents and an aggregated count in the actual user document.

Thanks!


#4

Just to explain how I would build such a model, maybe you are interested:

I would go for embedding the relationship document, since as far as I can see it does not contain much information besides the two user ids being linked, and maybe a status and some date fields. The writes should be OK, since a user most likely won’t gain thousands of friends in under a second, and if he does I think the slightly higher response time will be acceptable. To combat the ever-growing document, I would consider setting a cutoff like 1000 (or whatever makes sense) and storing all following friends in a separate document, so the list is effectively paginated. This also lets you fetch the friends page by page without much additional work. It can be done, for example, by storing a reference to the next document as the last item in the array.
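
A minimal sketch of that cutoff idea, with a few assumptions of its own: a dict stands in for the bucket, the overflow key format `<user>::friends::<page>` is invented for the example, and the link to the next page is kept in a separate `next` field rather than as the last array item, which keeps the friends list homogeneous.

```python
PAGE_SIZE = 3  # small for illustration; something like 1000 in practice

store = {}  # in-memory stand-in for the bucket

def add_friend(user_key, friend_key):
    """Append a friend to the last page, starting a new page when it is full."""
    doc = store.setdefault(user_key, {"friends": [], "next": None})
    page = 1
    # Follow the chain to the last page.
    while doc["next"] is not None:
        doc = store[doc["next"]]
        page += 1
    if len(doc["friends"]) >= PAGE_SIZE:
        # Hypothetical key scheme for overflow documents.
        next_key = f"{user_key}::friends::{page + 1}"
        doc["next"] = next_key
        doc = store.setdefault(next_key, {"friends": [], "next": None})
    doc["friends"].append(friend_key)

def iter_friend_pages(user_key):
    """Yield one list of friends per page by following the next references."""
    key = user_key
    while key is not None:
        doc = store[key]
        yield list(doc["friends"])
        key = doc["next"]

for n in range(7):
    add_friend("user:joe", f"user:friend{n}")
pages = list(iter_friend_pages("user:joe"))
```

With seven friends and a page size of three, the list ends up spread over three documents, and a reader can stop after the first page if that is all the UI needs. A real implementation would also have to handle concurrent writers, which this sketch ignores.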

Just thought it might be interesting for you to see my approach. I don’t know your specific situation, so take it with a grain of salt.