Best approach to building a Chat Application



We are currently using Couchbase as a data store for our chat application. Each user of our system has the ability to start a single conversation with another user or a group conversation with many of them. Currently we’re using this “schema”:

{ "id":1, "type": "SINGLE", "users": [1,2], "last_update": 1123455 }

{ "id":2, "type": "GROUP", "users": [1,2,3], "last_update": 1123455 }

[{"id":1, "type": "SINGLE", "last_update": 1123455}, {"id": 2, "type": "GROUP", "last_update": 1123455}]

[{"id":1, "type": "SINGLE", "last_update": 1123455}, {"id": 2, "type": "GROUP", "last_update": 1123455}]

[{"id":2, "type": "GROUP", "last_update": 1123455}]

[{"page":1, "first_id": 0, "last_id": 2, "last_date": 1123455}]

[{"id": 0, "date": 1234}, {"id": 1, "date": 12346}, {"id": 2, "date": 1123455}]

{"id": 0, "text": "Hello", "sender": 1, "date": 1234}

We need to do all the indexing by hand because Couchbase views are not consistent enough (if a user receives a new-message push notification and opens the application, the message has to be there; I believe this is called “read your own writes”). We need to query messages between given dates, so we need those indexes.
I think it is obvious that supporting concurrency with this approach is a pain (we need to update a lot of documents every time a message is sent).
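To make the write amplification concrete, here is a minimal sketch of what sending one message costs under this scheme, with a plain in-memory object standing in for the bucket. The document keys and helper names are assumptions for illustration, not our real key scheme:

```javascript
// In-memory stand-in for the bucket; keys are illustrative only.
const store = {
  "conv:1": { id: 1, type: "SINGLE", users: [1, 2], last_update: 0 },
  "conv:1:messages": [],                               // by-hand message index
  "user:1:convs": [{ id: 1, type: "SINGLE", last_update: 0 }],
  "user:2:convs": [{ id: 1, type: "SINGLE", last_update: 0 }],
};

// One message touches: the message doc, the per-conversation message
// index, the conversation doc, and one index doc per participant.
// Against a real bucket each of these is a separate write that also
// has to be protected against concurrent senders.
function sendMessage(convId, sender, text, date) {
  const msgId = store[`conv:${convId}:messages`].length;
  store[`msg:${convId}:${msgId}`] = { id: msgId, text, sender, date };
  store[`conv:${convId}:messages`].push({ id: msgId, date });
  const conv = store[`conv:${convId}`];
  conv.last_update = date;
  for (const u of conv.users) {
    const entry = store[`user:${u}:convs`].find(c => c.id === convId);
    entry.last_update = date;
  }
}

sendMessage(1, 1, "Hello", 1234);
```

So a single message in a group of N users fans out into N + 3 document writes, which is the concurrency pain described above.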

My question is: are we using views wrong, or is there a way to use views to get rid of all those by-hand indexes? I know that we can set the staleness of our queries, but this is a HUGE view, and the last time I tested it, it was a performance killer.

In an ideal scenario we would love to have:

{ "id":1, "users": [1,2], "last_update": 1123455 }

{"id": 0, "text": "Hello", "sender": 1, "date": 1234}

and link those two with a view, so we can query all the messages of conversation 1 within a range of dates.
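For that ideal scenario, the view would emit a compound key of [conversation_id, date] so that a single range query returns one conversation's messages between two dates. This is only a sketch: it assumes message documents carry a type field and a conversation_id pointer back to their conversation, which the documents above don't currently have. The emit() collector below stands in for the view engine:

```javascript
// Collected rows, standing in for the view engine's emit().
const rows = [];
function emit(key, value) { rows.push({ key, value }); }

// Hypothetical map function: index messages by [conversation_id, date].
// Assumes each message doc has type: "message" and a conversation_id field.
function map(doc, meta) {
  if (doc.type === "message") {
    emit([doc.conversation_id, doc.date], null);
  }
}

// Feed it one conversation doc and one message doc:
map({ id: 1, users: [1, 2], last_update: 1123455 }, { id: "conv:1" });
map({ type: "message", conversation_id: 1, id: 0, text: "Hello",
      sender: 1, date: 1234 }, { id: "msg:1:0" });
```

Querying that view with startkey=[1, fromDate] and endkey=[1, toDate] would then replace the by-hand date indexes entirely; the open question is the consistency of the view.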

We’re using Couchbase Server 3.0.

One of the advertised use cases of Couchbase is chat messaging, and I know Viber uses it as well, so the fact that we need to do something this complicated to solve our problem makes me think we’re using it wrong. Another idea is to use Redis for the indexing, but a single-datastore solution would be preferred.

We expect to have a lot of users and conversations/messages (we are one of the most played games on both iOS and Android in the US, South America and Europe).

PS: In this article, Cihan Biyikoglu says:
“Stale=false type queries are super useful. Imagine building a messaging app. If the message you sent does not appear in your “sent messages” folder (which is typically a view query), you may send it again! Or you save a new playlist in your mobile app, you go back to your playlists and if you don’t see that new playlist, you are confused! So stale-false is essential!”

In that case, if we use views, what setup should we use to achieve full consistency on an index that has millions of updates a day?

Thanks in advance!


I’d like to know the correct use of stale as well; it is a pain that the data is not there.


Hi folks, sorry I missed this post earlier.
The performance of views depends on a few things: the fanout (the number of nodes involved in view query processing), disk write performance for the view, the number of views per design doc, the view definitions (map and reduce code complexity), and finally the staleness setting (stale=false vs stale=ok). stale=false starts the query only after the indexer has indexed all operations up to the moment of the stale=false query’s timestamp; this is currently the only way to guarantee RYOW (read your own writes). View performance is constantly improving and we have some enhancements in 4.0 as well, so I’d highly recommend looking at 4.0, or even better 4.1, for view performance.

With 4.0 we also introduced N1QL and GSI (Global Secondary Indexes). In 4.0 you can use the same staleness settings to get the RYOW guarantee, but one important improvement in N1QL and GSI for query latency is that we are not required to fan out the query if we have the right index (GSI) for it. Covering indexes and prepared execution can cut down on query execution time even more.

We are also adding a new option for faster RYOW queries: a new setting in between stale=false and stale=ok that gives you bounded consistency. We have not named this new setting yet, but let’s call it stale-at-timestamp. It would let you grab the timestamp of your update and pass that to your query, so we don’t have to update the index to the timestamp of the request, only up to the point of the “update” you care about. We will be introducing this option in upcoming releases of N1QL. If you are not satisfied with the performance of views and stale=false, I highly recommend you look at N1QL and explore the stale-at-timestamp setting when it becomes available.
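The idea behind that bounded setting can be sketched in a few lines. This is a conceptual simulation only, not a real SDK API: each write returns a sequence number, and a query passes that number so the indexer only has to catch up to the caller's own write, not to everything written since:

```javascript
// Conceptual model of bounded-staleness RYOW. All names are illustrative.
class Index {
  constructor() {
    this.indexedSeq = 0;  // how far the indexer has caught up
    this.pendingSeq = 0;  // sequence number of the latest write
    this.entries = [];
  }
  // Every write returns a sequence token the caller can hold on to.
  write(entry) {
    this.pendingSeq += 1;
    this.entries.push({ seq: this.pendingSeq, entry });
    return this.pendingSeq;
  }
  indexUpTo(seq) {
    this.indexedSeq = Math.max(this.indexedSeq, seq);
  }
  // "stale-at-timestamp": wait only until the index reaches the caller's
  // own write, instead of indexing all pending operations (stale=false).
  query(atSeq) {
    if (this.indexedSeq < atSeq) this.indexUpTo(atSeq);
    return this.entries.filter(e => e.seq <= this.indexedSeq)
                       .map(e => e.entry);
  }
}

const idx = new Index();
const mySeq = idx.write("my new message");
idx.write("someone else's later message"); // we never wait for this one
const results = idx.query(mySeq);
```

The query sees "my new message" (the RYOW guarantee) without paying to index the later write from another user, which is what makes it cheaper than stale=false under heavy write load.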

If you are interested in learning more about it, feel free to reach out to me at