Speeding up N1ql queries and indexing as it relates to N1ql


#1

Is it possible to create indexes to speed up N1QL queries as it is in SQL databases? Could I create a view or views that would be utlilized by N1QL? Could an N1QL query be inspected by the Couchbase Server and appropriate indexes for it be created/cached by the server?

If no to any of the above, are there plans to add such features in the future?

Also, leaving my suggestions aside, how do I speed up N1QL queries with for instance ORDER BY statements and WHERE statements in a large bucket (10,000+, eventually will be 10s or 100s of millions or more items)? So how to speed up N1ql queries for practical use?


#2

Hi There!, yes yes and yes… However you should keep in mind performance isn’t part of the goals of the preview. Neither is the full set of indexing facilities. They will come with the release next year.

Today, n1ql will take advantage of views you create. With with upcoming release, we will also introduce a new indexing mechanism for secondary indexing that will speed things further for various types of queries beyond views.

Let me know if you are interested in getting a better understanding of the new indexing capabilities. I’ll be happy to walk you through what we are building. I am at cihan@couchbase.com.


#3

Hi neatcode,

In addition to what Cihan said, please try out the CREATE INDEX and EXPLAIN statements.


#4

@cihangirb That’s great! Also thanks for the personal contact info I may take you up on that.
@geraldss Thanks for the tip, I’ll look into those.

Is there a place for pointers on how to create views that will speed up N1QL queries? I like N1QL because it is familiar to me, views not so much… but I get more confused when I need to figure out what views to create to speed up a particular query.

It would be nice to have a documentation page or blog post that lists various possible N1QL queries, and demonstrates the views for speeding them up. It would go in-depth as to how the Couchbase N1QL Engine picks each view it uses in its query, as well. Like I’m wondering how in the hell Couchbase knows a view is appropriate for a given N1QL query, I’m having trouble conceptualizing how that would work. It sounds like A.I. reading the source code for views or something.

For now I could use some help on a specific query. I’m currently working with this query and it takes around 55 seconds (!) to return and growing (bucket size is 25,535 items presently):

var query = N1qlQuery.fromString(
	"SELECT META().id As id, UIDValidity, UID, * FROM messages AS message"
	+ " WHERE message.type = \"message_header\""
	+ " AND message.account = \"neatcode@yahoo.com\"" //@TODO
	+ " ORDER BY UIDValidity DESC, UID DESC LIMIT 1000");

What are some full examples of views that would improve the speed of this query to something reasonable? Also, not only what are some examples of views that this query would utilize, but what is the ideal set of views that would be optimal for this query?


#5

Hi neatcode,

Here are the 2 statements you would issue in the cbq command line shell:

CREATE INDEX msg_acct_idx ON messages(account);

EXPLAIN SELECT META().id … (rest of your query here)

The documentation is here: http://docs.couchbase.com/prebuilt/n1ql/n1ql-dp3/


#6

Yes I had already done this based on your earlier advice. There was no noticeable speed up, although EXPLAIN showed that the index was being used. Presently in my data, every item has the same account so that might be why. I also tried adding indexes for every operand, still no noticeable speed up; the query averages 55 seconds. If I had to guess, I’d say alot of the time is spent sorting. How do I speed that up? Interestingly, the query is about 20-30 seconds faster from cbq than it is from node for some reason.

Anyways, isn’t there a way to use views to speed up the N1QL query, as @cihangirb suggests?


#7

Oh, I see that CREATE INDEX creates views. I guess I’d be curious about how to create my own view manually to speed up this query.

However, perhaps this just isn’t at the stage I need for my project. I need <2 second query times or so. I find translating the SQL I would write on a SQL database into views is difficult and/or impossible (ie compound ORDER BY statements potentially with WHERE clauses that contain wildcards). And views can be stale (although a parameter can be provided to guarantee a view gets run before the query is performed, how long could this take?), whereas an index in MySQL or MSSQL is up-to-date on query. So N1QL seems promising; at this stage I’ll probably have to use something else like Mongo until N1QL is mature.


#8

We are working on a different indexing strategy to complement views. They are secondary indexes that are maintained and built differently. Even if you end up using something else for time pressure, it would be good to review and get your feedback. let me know if you are interested in arranging something like that.

By the way, with the new indexing strategy, we still provide flexibility on the stale-ness of the index. You can get full consistency with your index like relational databases but the downside you will have with always-consistent indexing is that every index you create will slow down the updates to data. just bound by the laws of nature… if you can however get down to read-your-own-write type consistencies you could lower latency for queries and allow much more efficient index maintenance with much less impact on the incoming updates.

thanks
-cihan


#9

Hi neatcode,

N1QL does not use indexes for sorting; it uses indexes for filtering. So if you had a WHERE clause that selected a small subset of your data, you would see significant speedup from using the index.

Sorting should be faster in N1QL DP4 than DP3 (DP4 upcoming). We are happy to discuss your specific data and queries.


#10

@cihangirb If these secondary indexes would possibly improve performance on the SQL query I gave above from 55seconds to <2 seconds I’d be interested in tinkering with them. About stale-ness, I really like that the new indexing strategy provides for always-consistent indexing. I don’t mind too much that it slows down updates and inserts.

Now you said:

if however you can get down to read-your-own-write type consistencies you could lower latencey for queries…

Is read-your-own-write type consistency already available? How does this work/where is the documentation? Or is that something planned? Would it work like you get a token when you do a write to the database, and for any reads that might depend on that write, you send that same token with the read request / query and that ensures I get my own write? (As described here: https://docs.oracle.com/cd/E17076_04/html/gsg_db_rep/C/rywc.html)

@geraldss Shouldn’t indexes be used for sorting? Are there plans to integrate this as a standard feature? I’m not an expert in this area so maybe I’m wrong.

It’s good sorting will be faster in DP4, I look forward to testing it when it comes out.

@cihangirb @geraldss Another concern I have with using Couchbase which is preventing me from settling on it… on my primary development machine which is a Windows 8 box, Couchbase eventually spins up the CPU to around 100% and stays there, and I have to uninstall and reinstall Couchbase to fix it, then rebuild the test database. I have a ticket out on the issue: http://www.couchbase.com/issues/browse/MB-12775 .


#11

@neatcode, we will notify you here when DP4 is available for you to test with your data. To your specific questions – the consistency modes will be provided as simple API settings. And yes, indexes can be used for sorting in some cases. DP4 uses a general-purpose high-speed parallel sort instead.


#12

@neatcode, it is hard to say what the exact performance will be at this point without knowing HW and env specifics. It is also hard because we are not finished full implementation of secondary indexing as of today but I’d expect things to drastically improve from a full scatter gather.

Full consistency is available today with views - you can pass stale=false and wait until all items to get indexed up to the point of the time of query execution. RYOW would relax this and it isn’t available today but we are looking to introduce it with N1ql queries and yes the mechanics you describe are close. Final SDK implementation may be smoother however.

Let me look into the 100% cpu util issue on Windows. I know we do a ton of background data collection and poll nodes and we cloud be better about detecting a quiet system and dialing these down but under high load, you should have no issue getting great throughput from Couchbase. If you are seeing something to the contrary, pls let me know.

thanks
-cihan