Couchbase 4.0 Data Modelling

I have an application with entities like User, Message and MessageFeatures. Each User can have many messages and each message has a MessageFeatures entity. Currently the relational model is expressed as:

User{
  UUID id
  String email
  ...
}
Message{
  UUID id,
  UUID userId
  String text
  ....
}
MessageFeatures{
  UUID id
  UUID messageId
  UUID userId
  PrimitiveObject feature1
  ....
  PrimitiveObject featureN
}

The most important queries are:

  • Get all messages for user
  • Get all message features for a user
  • Get message by uuid
  • Get/Update message feature by uuid
  • Get message feature by message uuid

Less important(can be slow) queries are like :

  • Get message features where user_id = someuuid and featureX = value
  • Get all/count user uuids for which featureX = value
  • update message features set featureX = newValue where featureX = oldValue

While evaluating couchbase i am unable to arrive at a proper data model. I do not think putting all messages and message features for a user in a single document is a good idea because the size will keep on increasing and based on current data it will easily be in range of 4-5 MB for 2 year data. Also to maintain consistency i can update only one message feature at a time as atomicity is per document.

If i do not place them in a single document they will be scattered around the cluster and queries like get all messages/messagefeatures of a user will result in scatter and gather.

I have checked out global secondary indexes and N1QL but even if I index user_uuid field of messages it will only help in fetching the message_uuids of that user, loading all the messages will result in scatter and gather…

Is there a way to force that all messages, message features of a user_uuid get mapped to a same physical node without embedding them in the same document something like hashtags in redis.

How about creating different documents for each message. Let message features be a part of the message document. Let the message UUID be the key for the respective message document. Create User UUID as a property in the message document.

Data model
//User document

User::UUID
{
type:“user”,
uuid:"",
email:""

}

// Message document with features as a property for the document
Message::UUID{
type:“message”,
userid: “”,
text:"",
MessageFeatures: {
UUID:"",
messageID:"“
userid:”“
feature1:”",

featureN":""
}

}

  1. CREATE PRIMARY INDEX
  2. CREATE Secondary Index on userid of the message document.

So all your queries as mentioned below will be using indexes
Get all messages for user – Secondary Index
Get all message features for a user – Secondary Index
Get message by uuid – Primary Index
Get/Update message feature by uuid – Primary Index
Get message feature by message uuid – Primary Index

Hope this solves the problem for you.

Thanks @trustanish but Get/Update message feature by uuid will not use primary index as i will have features uuid and not message uuid.

Also will get all messages by user_id be efficient as it will result in a secondary index evaluation and than a multiget across many data nodes which will not hold data in ram.

Also will it be prudent to maintain own index like user::uuid::messageids which is a document holding all message uuids of that user… this will remove the need for secondary indexing and one can directly do a multiget.