Very slow performance on query without index on simple documents

Thanks for the comments @alsa,
I have had a chance to work on developing both of the databases in question, so I'm happy to answer in detail.

I can understand why you are surprised by the overhead of a simple query at a throughput of 1. However, I'd point out a few issues:

  • Couchbase is a distributed system, whereas SQL Server is a single-node, monolithic process. Comparing a scale-up system to a distributed system running on one node can be misleading when measuring performance.
  • Performance measured at a throughput of 1 query/sec vs 1000 queries/sec can yield very different performance curves. Under non-concurrent execution of a single query, SQL Server has an advantage. In reality, however, systems deal with more than one query executing at any one time, and as throughput increases, Couchbase Server gains the advantage. At higher throughput you will hit a throughput ceiling with SQL Server that you cannot shatter. Couchbase is tuned to scale out: almost all production systems run Couchbase on multi-node deployments for availability and scalability, and under higher throughput on a scaled-out cluster, Couchbase performs better.
  • Indexes are more important to Couchbase: Couchbase is designed to distribute data elastically as nodes come and go or fail, and it redistributes data for better availability. Unlike SQL Server, we cannot hard-code a “local access path” to the data at all times, so Couchbase depends more on indexing to locate data. The good news is that we have built the system to sustain many more indexes than SQL Server, for better scalability and performance, which brings me to my next point…
  • Indexing has a lighter cost in Couchbase than in SQL Server: in relational systems, and even in many NoSQL systems (MongoDB, for example), index maintenance is done as part of the data mutations (INSERT/UPDATE/DELETE). That makes indexing a big compromise between fast inserts and fast queries. Couchbase is architected differently, so that trade-off between fast inserts and fast queries isn't there. We have customers running with >100 indexes while keeping sub-millisecond mutations, so you can “over-index” in Couchbase (see the sketch right after this list for your case).
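
For your specific query, the quickest win is a secondary index on the field you filter on. A minimal sketch, assuming the bucket and field names from the sample query further down in this post (default and a2); substitute your own names:

    /* Hypothetical index name; "default" and "a2" come from the sample query below */
    CREATE INDEX idx_a2 ON default(a2);

    /* With the index in place, the same query becomes an index lookup instead of a full bucket scan */
    SELECT * FROM default WHERE a2 = 4703626;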

Aside from perf/scale, you identified the difference between relational and NoSQL technology well. Flexibility is a big reason to come to native JSON databases. I’ll add a few more reasons to your list:
- Develop with schema agility and without ALTER TABLE or downtime: let the app drive the schema and change the app continuously.
- Eliminate ORMs that translate your data from objects to relational and back on every call.
- More importantly, get a flexible data model. That means you can model your data to optimize your access paths, with more options in the data model. You can normalize all the way, much as you would in relational, and use JOINs with Couchbase; or use the power of JSON to embed natural relationships, like orders + order_line_items, in a single document; or anything in between (see the sketch after this list).
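
As a small illustration (the document shape and bucket names below, orders and order_line_items, are made-up examples, not anything from your data), the same relationship can be embedded in a single order document:

    {
      "type": "order",
      "order_id": 1001,
      "customer": "alsa",
      "line_items": [
        { "sku": "A-100", "qty": 2 },
        { "sku": "B-200", "qty": 1 }
      ]
    }

or kept normalized in separate documents and joined at query time with N1QL:

    /* Lookup join: o.line_item_ids is assumed to hold the keys of the line-item documents */
    SELECT o.order_id, li.sku, li.qty
    FROM orders o JOIN order_line_items li ON KEYS o.line_item_ids;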

I hope this helps answer your question.

Lastly, a quick tip: this will not bring your 19-minute scan down to seconds, but it will improve the scan's performance somewhat. Given that you aren't executing many queries concurrently, you can use max_parallelism to utilize the full machine and parallelize the scan. We don’t do this auto-magically yet. Unfortunately, trying this requires a different tool: the cbq tool is not designed for high-performance query execution, so it does not allow this. You need to do it through curl or another REST-capable tool like the Chrome browser; I like to use Postman. Here is the sample REST API call:

http://10.0.0.45:8093/query/service?statement=select * from default where a2=4703626&max_parallelism=4
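
If you'd rather stay on the command line, the same call can be made with curl (just a sketch against my test node above; point it at your own query node and statement):

    # POST the statement and max_parallelism to the query service on port 8093
    curl http://10.0.0.45:8093/query/service \
      --data-urlencode 'statement=select * from default where a2=4703626' \
      --data-urlencode 'max_parallelism=4'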

I've attached a screenshot of my execution with max_parallelism on my small Couchbase 4.1 cluster. The first execution shows about 5K items/sec scan throughput with max_parallelism=1; the next execution, with the taller 10K/sec read throughput, was run with max_parallelism=2.