Need opinion: is it a good idea to switch from MongoDB for a structured messenger app?


#1

Hello everyone,

I am quite new here and have gone through almost every white paper and benchmark available on Couchbase and MongoDB. I need opinions to help decide on the right backend for the platform we are building. I hope this is the right place to ask!

We are building a structured messaging app, i.e. each message is structured (like a poll), and the responses are stored for all users in a (big) group. We also need to store the status of each message (not sent, broadcast, delivered, seen). Each document will be small, but a single message produces many documents (e.g. 1 poll in a group of 5000 people = 1 main message + 5000 status records + 5000 responses).
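The fan-out in that example can be sketched as a small calculation. This is only an illustration: `docs_per_message` is a hypothetical helper, and it assumes exactly one status record per group member and at most one response per member.

```python
def docs_per_message(group_size: int, response_rate: float = 1.0) -> int:
    """Estimate how many documents one structured message produces.

    Assumes 1 main message document, 1 status record per group member,
    and 1 response document per member who responds.
    """
    if group_size < 0 or not 0.0 <= response_rate <= 1.0:
        raise ValueError("group_size must be >= 0 and response_rate in [0, 1]")
    status_records = group_size
    responses = round(group_size * response_rate)
    return 1 + status_records + responses


# The example from the post: a poll in a group of 5000 where everyone responds.
print(docs_per_message(5000))  # 1 + 5000 + 5000 = 10001
```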

Do you think this is the right use case for Couchbase? I am heavily inspired by the Viber case study. We simply can’t store this many messages in MongoDB in the long run, so we are looking for a fast, modern, and compressed database solution.

We are already running a prototype on Couchbase, but a simple query (select *) returning ~8000 records takes ~300 ms. Is that normal, or have we misconfigured the production machine (4 cores, 6 GB RAM, single node)?


#2

Anyone?

Thanks for your time!

Ashvin


#3

Would anyone care to reply?

Thanks,

Ashvin


#4

Hi, Ashvin,

I’m glad that you are considering Couchbase for your use case. Yes, Viber is a perfect example: they use Couchbase as the backend data platform for their messaging app. As you may know, they switched from Mongo + Redis to Couchbase.

From the data model perspective, yes, JSON seems to be a good fit for your use cases. You can either use distinct documents for different data types or you can use nested documents.
This is a good resource for data modeling in Couchbase: http://www.slideshare.net/Couchbase/couchbase-103-data-modeling
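To make the two modeling options concrete, here is a minimal sketch of both shapes as plain Python dictionaries. The field names follow the samples posted later in this thread; the key scheme, the `message_status_group` type, and the helper names are hypothetical and only for illustration.

```python
import json


def status_doc_key(message_id: str, user_id: str) -> str:
    """Hypothetical key scheme: one key per (message, user) pair."""
    return f"message_status::{message_id}::{user_id}"


def distinct_status_doc(message_id: str, user_id: str) -> dict:
    """Option A: one small document per user and message."""
    return {
        "type": "message_status",
        "messageId": message_id,
        "userId": user_id,
        "delivered": 0,
        "seen": 0,
    }


def nested_status_doc(message_id: str, user_ids: list) -> dict:
    """Option B: one document per message with all statuses nested inside."""
    return {
        "type": "message_status_group",  # hypothetical type name
        "messageId": message_id,
        "statuses": {uid: {"delivered": 0, "seen": 0} for uid in user_ids},
    }


if __name__ == "__main__":
    print(status_doc_key("m1", "u1"))
    print(json.dumps(distinct_status_doc("m1", "u1"), indent=2))
    print(json.dumps(nested_status_doc("m1", ["u1", "u2"]), indent=2))
```

With distinct documents, each status update touches one small document that can be fetched directly by key. With the nested form there are far fewer documents, but every status change rewrites the larger per-message document.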

In terms of performance, I’ll need more information before I can answer your question:
Which version of Couchbase are you using?
Are you using view queries or N1QL queries? N1QL and GSI are new features we introduced in CB 4.0.
What’s the exact query you are trying to run?
Are you seeing any resource constraints while the query runs (like CPU or RAM maxing out)?

Thanks,
Qi


#5

Qicb,

Thanks for the reply and adding pulse to the case! Here are sample documents.

Flavor 1 (88.0%), in common: type = "message_status"

delivered (number), e.g.: 0
messageId (string, indexed), e.g.: 29103c226cbff83b79e374df8ddbce…
pending (number), e.g.: 1455883578840
seen (number), e.g.: 0
sender (string), e.g.: b5565e2350620eba50244d408deaca…
type (string, indexed), e.g.: message_status
userId (string, indexed), e.g.: b5565e2350620eba50244d408deaca…

Flavor 2 ( 12.0%)

created_at (number), e.g.: 1456390779
data (object), child type:
    response (number), e.g.: 1
    selectedPosition (number), e.g.: 0
    value (number), e.g.: 77789
id (string, indexed), e.g.: f16b10ab434c013c7d8fcab6f493a9…
messageId (string, indexed), e.g.: 6e8a15d63fb3a0096c82827069a8f7…
response (number), e.g.: 1
selectedPosition (number), e.g.: 2
selectedPositions (array), e.g.: [0,1,2,3]
time (number), e.g.: 1456390779040
type (string, indexed), e.g.: message_response
updated_at (number), e.g.: 1456390779
userId (string, indexed), e.g.: b9c3da53bb897b8933a2c11208afb4…
value (number), e.g.: 19

Here is a sample query that takes 1.28 s just to retrieve 5000 records, and 2.2 s for 10,000 records:

select * from connectly where type = 'message_status'

We are running N1QL queries via the Node and PHP drivers. The server is 4.0 CE with 4 cores and 6 GB RAM, set up as a standalone node.

Indexed queries also take longer than the same queries on the same JSON in MongoDB with WiredTiger or RocksDB. And you were right: sometimes (randomly) the CPU even spikes. We also noticed that when this happens, all queries get queued up.

Are we making some silly mistake? Would you recommend using REST queries instead of N1QL?

Thanks once again for your time,

Ashvin


#6

Sorry, I missed the actual JSON, in case it adds value:

{
  "myTable": {
    "delivered": 0,
    "messageId": "3c6bfb79228909bd67de0356fa2c956c",
    "pending": 1457095728359,
    "seen": 0,
    "type": "message_status",
    "userId": "b5565e2350620eba50244d408deaca37"
  }
}


#7

Hi, Ashvin,

Thank you for all the detailed information.

A couple of thoughts:

  1. To be honest, N1QL and GSI are in their first version in CB 4.0. We are working on many improvements for 4.1 and 4.5 in the following areas: garbage collection, query throughput and latency, reduced resource consumption, etc.

  2. Going back to your use case, I have several ideas to offer:

  3. Have you tried prepared statements in N1QL? This doc covers prepared statements using the REST API:
    http://developer.couchbase.com/documentation/server/4.1/n1ql/n1ql-language-reference/prepare.html
    This doc covers how to use prepared statements in Node:
    http://developer.couchbase.com/documentation/server/4.0/sdks/node-2.0/n1ql-queries.html
    Using prepared statements will speed up your queries.

  4. While you are running queries, do you also expect the system to receive lots of mutations?
    Because a single node here is responsible for KV, Index, and N1QL, a high incoming mutation rate may let indexing overload the system and queue up queries, as you mentioned.

  5. If there’s resource contention on the server side, you may need to increase the server’s CPU/RAM. If enough resources are available, you can change the number of cores allocated to the indexer based on your needs: go to the Settings page and adjust the number of indexer threads there. But if the indexer and query service are competing for CPU, system performance will degrade.
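For the prepared-statement suggestion above, here is a minimal sketch of what the REST request bodies could look like. It only builds the form-encoded bodies and does not contact a server. The assumptions are that the query service is reachable at `http://localhost:8093/query/service` and that the bucket is `connectly`, as in the query earlier in the thread. The helper names are hypothetical, and how a prepared statement is executed afterwards depends on the server version, so please check the linked docs for that step.

```python
import json
from urllib.parse import parse_qs, urlencode

# Assumption: default query service endpoint on a local single-node setup.
QUERY_URL = "http://localhost:8093/query/service"

# Positional parameter $1 instead of a literal, so one plan fits every type.
SELECT_BY_TYPE = "SELECT * FROM connectly WHERE type = $1"


def prepare_request_body(statement: str) -> str:
    """Form-encoded body asking the query service to PREPARE a statement."""
    return urlencode({"statement": "PREPARE " + statement})


def parameterized_request_body(statement: str, args: list) -> str:
    """Form-encoded body for a statement with positional arguments.

    The positional values are passed as a JSON array in the `args` parameter.
    """
    return urlencode({"statement": statement, "args": json.dumps(args)})


if __name__ == "__main__":
    body = prepare_request_body(SELECT_BY_TYPE)
    print("POST", QUERY_URL)
    print(body)
    # Decode the body again to show exactly what the server would receive.
    print(parse_qs(body))
    print(parse_qs(parameterized_request_body(SELECT_BY_TYPE, ["message_status"])))
```

Either body would be sent as an `application/x-www-form-urlencoded` POST to the query endpoint. The Node and PHP drivers expose prepared statements through their own query options, so this raw form is mainly useful for testing with a plain HTTP client.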

I’m not sure I have completely answered your questions, but I hope this is helpful.
Because your dataset is small, I think this is a good use case for the memory-optimized index we introduced in CB 4.5.
http://developer.couchbase.com/documentation/server/4.5-dp/in-memory-indexes.html

During internal testing, we observed a performance improvement with the memory-optimized index.

CB 4.5 is in Developer Preview (DP) now, and the Beta is coming out shortly.

Let me know if you have any more questions.

Thanks,
Qi