DB design for high-volume chat messages


Although I am fairly used to SQL, I am new to Couchbase (and NoSQL in general), so please be kind.

I would like to understand how to model the storage of billions of chat messages, originating from a typical IM app, in Couchbase. In my SQL database I have modeled it as a table with columns for MessageId, Sender, Receiver, message content, and a couple of delivery-status and acknowledgement fields. My typical use case is rapid inserts of new messages, plus updates that mostly change the delivery and acknowledgement fields of recent messages as they are delivered to clients. Once a message has been delivered to and acknowledged by a client, it is rarely (if ever) read back from the database.

What would be the correct way to model this in Couchbase? Assume 10,000 new messages inserted per second and 40,000 updates per second on those messages. Assume one-to-one chat as the primary use case, although each person would have many buddies, pretty much like WhatsApp.

Thanks, appreciate all feedback.
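To make the question concrete, here is a minimal sketch of the most direct translation of the SQL row into Couchbase terms: one small JSON document per message, keyed by its MessageId. It assumes a plain Python dict as a stand-in for a bucket (it does not use the Couchbase SDK), and the `msg::` key prefix and field names are hypothetical.

```python
import json
from datetime import datetime, timezone

# In-memory stand-in for a Couchbase bucket (key -> JSON string).
# This is NOT the Couchbase SDK; it only illustrates keys and document shape.
bucket = {}


def message_key(message_id: str) -> str:
    # Hypothetical key scheme: one document per message, keyed by MessageId.
    return f"msg::{message_id}"


def insert_message(message_id: str, sender: str, receiver: str, content: str) -> str:
    """Store one chat message as a small JSON document (the SQL row as JSON)."""
    doc = {
        "type": "message",
        "messageId": message_id,
        "sender": sender,
        "receiver": receiver,
        "content": content,
        "sentAt": datetime.now(timezone.utc).isoformat(),
        # Hypothetical names for the delivery/acknowledgement columns.
        "delivered": False,
        "acknowledged": False,
    }
    key = message_key(message_id)
    bucket[key] = json.dumps(doc)
    return key


def mark(message_id: str, status_field: str) -> None:
    """Read-modify-write one status flag; only this message's document changes."""
    key = message_key(message_id)
    doc = json.loads(bucket[key])
    doc[status_field] = True
    bucket[key] = json.dumps(doc)


key = insert_message("m1", "alice", "bob", "hi")
mark("m1", "delivered")
print(json.loads(bucket[key])["delivered"])  # True
```

With this shape each status update rewrites a single small message document, so its cost does not grow with the length of a conversation.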


I am new to this, but what I think would work best is to have one document per chat, so that the users in a chat only need to load that single document. (You would need to program it so that, depending on the user, it pulls out the appropriate messages; in a WhatsApp-style UI, for example, showing your own messages on the left.)

You load the document, keep updating it each time a message is sent, and set up a live query so you can see when the other person has sent a message.
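A minimal sketch of the one-document-per-chat idea, assuming a Python dict as a stand-in for the bucket (not the Couchbase SDK). The `conv::` key scheme and field names are hypothetical, and the live-query part is left out.

```python
from __future__ import annotations

import json

# In-memory stand-in for a bucket (key -> JSON string); not the Couchbase SDK.
bucket = {}


def conversation_key(user_a: str, user_b: str) -> str:
    # Hypothetical key: sort the IDs so both participants derive the same key.
    first, second = sorted((user_a, user_b))
    return f"conv::{first}::{second}"


def send_message(sender: str, receiver: str, content: str) -> None:
    """Load the whole conversation document, append a message, write it back."""
    key = conversation_key(sender, receiver)
    raw = bucket.get(key)
    doc = json.loads(raw) if raw is not None else {"type": "conversation", "messages": []}
    doc["messages"].append({"sender": sender, "content": content})
    bucket[key] = json.dumps(doc)


def render_for(viewer: str, buddy: str) -> list[str]:
    """Lay out the conversation from one participant's point of view."""
    raw = bucket.get(conversation_key(viewer, buddy))
    if raw is None:
        return []
    lines = []
    for msg in json.loads(raw)["messages"]:
        # Following the example in the answer: the viewer's own messages on the left.
        side = "left" if msg["sender"] == viewer else "right"
        lines.append(f"{side}: {msg['content']}")
    return lines


send_message("alice", "bob", "hi")
send_message("bob", "alice", "hey")
print(render_for("alice", "bob"))  # ['left: hi', 'right: hey']
```

Note that `send_message` reads and rewrites the entire conversation document on every send, so its cost grows with the conversation.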

There are probably some flaws in my idea; if you see any, do point them out :stuck_out_tongue:

Hey, thanks for replying. I understand you are suggesting a separate document representing the “conversation” between two people: one document that holds all the chat messages exchanged between two chat buddies.

I have two follow up questions for this approach:

  1. As I understand it, it is not possible to partially update a document in Couchbase, so I would have to fetch an entire conversation just to update a single field of a single message, which seems like an awful waste of resources. Is there a way to insert a new message into a document without fetching the whole thing from the database, or to update a specific field with something analogous to an UPDATE statement in a traditional SQL database?
  2. Also, it seems Couchbase limits each document to 20 MB. It is reasonable to expect a single conversation to hit this limit; how would I overcome this?
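A back-of-the-envelope check of both concerns, assuming a conversation document that stores every message inline and an average serialized message size of 250 bytes (a made-up figure, not a measurement). It also assumes "20 MB" means 20 × 1024 × 1024 bytes and ignores the document's own JSON overhead.

```python
# Assumption: "20 MB" is read as 20 MiB, i.e. 20 * 1024 * 1024 bytes.
DOC_LIMIT_BYTES = 20 * 1024 * 1024


def messages_until_limit(avg_message_bytes: int, limit_bytes: int = DOC_LIMIT_BYTES) -> int:
    """Whole messages that fit in one conversation document before the limit."""
    return limit_bytes // avg_message_bytes


def bytes_rewritten_per_update(messages_in_doc: int, avg_message_bytes: int) -> int:
    """With whole-document writes, changing one flag rewrites every message."""
    return messages_in_doc * avg_message_bytes


# 250 bytes per serialized message is a made-up figure, not a measurement.
print(messages_until_limit(250))                # 83886
print(bytes_rewritten_per_update(10_000, 250))  # 2500000
```

Under those assumptions a conversation tops out at roughly 84,000 messages, and flipping one delivery flag in a 10,000-message conversation means writing back about 2.5 MB.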

Given a target of billions of chats, consider the event sourcing pattern (the link provides a detailed overview of event sourcing); a chat system fits very naturally into it. The basic idea is that each document in Couchbase represents one event in the system: a new conversation started, a message sent, a message modified, a message deleted, a conversation deleted, a person joining a chat, and so on. The database becomes an event log that is replayed to reconstruct the conversation when it is viewed in the chat UI.

Periodically, the business logic generates a snapshot event that folds a given segment of the events into a single document. This optimizes reads: instead of replaying the entire history, a reader can load the latest snapshot and re-assemble only the events that came after it. It also sidesteps the document size limit, since each event document is relatively small. And because the documents double as an audit log, you can run audit operations on the same dataset, further simplifying the system as a whole.
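A minimal in-memory sketch of the event log plus snapshot idea, assuming Python data classes in place of JSON documents in a bucket. The event kinds, the `evt::` key scheme, and the `Event`/`Snapshot` names are hypothetical, and storing or querying the documents in Couchbase is not shown. For simplicity this snapshot folds every event up to a sequence number rather than a bounded segment.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Event:
    """One small event document (hypothetical schema)."""
    conversation_id: str
    seq: int    # per-conversation sequence number, starting at 1
    kind: str   # e.g. "message_sent", "message_modified", "message_deleted"
    data: dict


@dataclass
class Snapshot:
    """Folded state of a conversation up to and including `up_to_seq`."""
    conversation_id: str
    up_to_seq: int
    messages: dict = field(default_factory=dict)  # message_id -> content


def event_key(e: Event) -> str:
    # Hypothetical key scheme; zero-padding makes keys sort in sequence order.
    return f"evt::{e.conversation_id}::{e.seq:012d}"


def apply(messages: dict, e: Event) -> None:
    """Fold one event into the current message map."""
    if e.kind in ("message_sent", "message_modified"):
        messages[e.data["message_id"]] = e.data["content"]
    elif e.kind == "message_deleted":
        messages.pop(e.data["message_id"], None)
    # Other kinds (joins, delivery acks, ...) are ignored in this sketch.


def rebuild(events: list[Event], snapshot: Snapshot | None = None) -> dict:
    """Start from a snapshot (if any) and replay only the events after it."""
    messages = dict(snapshot.messages) if snapshot else {}
    start = snapshot.up_to_seq if snapshot else 0
    for e in sorted(events, key=lambda ev: ev.seq):
        if e.seq > start:
            apply(messages, e)
    return messages


def take_snapshot(conversation_id: str, events: list[Event],
                  previous: Snapshot | None = None) -> Snapshot:
    """Fold everything seen so far into one snapshot document."""
    last = max([e.seq for e in events] + [previous.up_to_seq if previous else 0])
    return Snapshot(conversation_id, last, rebuild(events, previous))


events = [
    Event("c1", 1, "message_sent", {"message_id": "m1", "content": "hi"}),
    Event("c1", 2, "message_sent", {"message_id": "m2", "content": "how are you"}),
    Event("c1", 3, "message_modified", {"message_id": "m1", "content": "hello"}),
]
snap = take_snapshot("c1", events)
events.append(Event("c1", 4, "message_deleted", {"message_id": "m2"}))
print(rebuild(events, snap))  # {'m1': 'hello'}
```

Replaying from the snapshot touches only event 4, while `rebuild(events)` without a snapshot replays all four and reaches the same state; each event stays a separate small document.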