Paginating with n1ql

I am writing a DAL (Data Access Layer) which uses CB 4.1.
My goal is to use n1ql for any query operation, and obviously - paginating.

so I’ll list the issues by the story that unfolds them:

first I ran the following query (type field is secondary index):
select id, fname, lname from bucket where type=“user” limit 1;
I got a result, didn’t change anything, ran the query again and again and again and each time got a totally different result.
So I went on to check how Views handle this and it turns out that Views are returning the data sorted by natural order.
If n1ql is indeed using Views behind the scenes, and my made-up index created a view, then I should get the same result each time I ran the query… So question 1: why do I see this behavior?

I went on to try and tell n1ql to sort the data by the id.
I tried running the same query but with an ORDER BY meta(id). So yes, this time I get the same result each time BUT(!!!) the query takes 9 seconds(!!!) to run.
So I thought maybe n1ql doesn’t know how to “order by a meta field”, I’ll just use the id field I manually place in the value of the document so I made an index:

create index doc_id on bucket(id) using GSI;

tested it with a simple select query and it worked (3ms to get a document by an id, nice).

But then I ran my attempt to order only this time ORDER BY id… got the same 9 seconds response time.

Looked in the web, I saw that in fact n1ql is known to have an issue with ORDER BY that should have been fixed by now… so question 2: given all this, how to get a consistent limit+skip behavior from n1ql?

we move on to the most important part. I do not want to use limit+skip for actual pagination. I wish to use the startkey + startkey_docid mechanism that couchbase has in its REST API… so question 3: how do I use that mechanism with n1ql only?

all these 3 question will answer the most important thing: The smartest most efficient way to paginate in N1QL.

Thank you.

Hi @tzali, great questions.

First the solution, and then a little bit on the background. For the next upcoming release (post 4.1), we have fixed the performance of ORDER BY + LIMIT. See http://review.couchbase.org/#/c/57495/

Your best pagination query would be:

SELECT …
FROM …
WHERE order_key >= prev_value /* order_key must be indexed /
ORDER BY order_key
LIMIT page_size /
page_size should be less than 8k or some configurable value */

If you are touching a small number of fields (say 8 or less, e.g. type, id, fname, lname, order_key), you should consider a covering index (a composite index that starts with order_key and contains the other fields).

Next, some background:

N1QL uses a lot of concurrency internally. In the case of your LIMIT 1 query, or any query without an explicit ORDER BY, the concurrent processing means that the results will be returned in a random, unstable order. There is a setting in the N1QL REST API to override this concurrent behavior (max_parallelism) and force the results to appear in index scan order.

There are also some optimizations we can do, e.g. if you have a covering index and a LIMIT 1, we should not parallelize the processing.

But again, the fix for ORDER BY + LIMIT should resolve the issue in your case.

Thank you.

Hi @geraldss,

Thanks for the very quick reply.
Following questions would be, where and when can I get a stable version of the fix?
Is it already at the official downloads for mac os x and linux?

secondly, the query you wrote still requires an ORDER BY each time you call with pagination.
Wouldn’t that be extremely un-efficient for low write buckets?

Thirdly, and just to make sure, the ORDER BY happens after the WHERE in the query process of n1ql, right? Because my WHERE clause will need to be

WHERE order_key >= prev_value AND type=“documentTypeX” AND …

I am afraid that this means I’ll have to do compound indexes on each field I usually query on to have 2 indexes for each such field. One, the index I intended, and two, an index for pagination.

Thanks for your attention

Hi,

I would guess a couple of months to a preview or release with this fix. You can also get a nightly build before that.

The ORDER BY with this fix is quite efficient, because it only sorts the returned page. But if you want to avoid the ORDER BY altogether, use max_parallelism=1 and leave out the ORDER BY. Then you will get results in index order.

Hello Gerald,

Any update on a release date on this fix?
Currently we are avoiding using CouchBase 4 and n1ql in our production since it doesn’t provide proper pagination.

Thanks

@tzali, Developer Preview is planned in a few weeks.

Hi @geraldss,

Thank you for solution. The solution works fine when I don’t specify order of sort i.e ASC or DESC.

But as soon as I mention order as DESC , the query takes relatively time is ‘seconds’ as compared to that without of order, which was taking execution time in millisecond.

Below is query I am trying to execute.

select meta().id from cms where _createdAt>0 and _type=‘Product’ and _active=true order by _createdAt desc limit 100.

document structure as below

“cms”: {
"_active": true,
"_createdAt": 1467703399704,
"_createdBy": “admin”,
"_modifiedAt": 1467703399704,
"_modifiedBy": “admin”,
"_type": “Product”,
“attributes”: {
“143”: “gray”,
“144”: “32 inch”,
“model-no”: “SAMI1Y7Y00”,
“title”: "SAMI1Y7Y00 gray 32 inch 2 years ",
“warranty”: “2 years”
},
“brand”: “a4cda785-d7e1-4388-83b8-389e194e0caa”,
“categories”: [
“28”
],
“status”: “OFFLINE”
},
“id”: “Product::ELE.TV.91271872”

In bucket I have around 1 lakh such document, and I am trying to fetch only 100 doument’s id.

1 lakh is 1 million, right?

We will improve our support for DESC ordering in a future release. For now, you can do the following:

CREATE INDEX idx ON cms( - MILLIS( _createdAt ), _createdAt, _type, _active );

SELECT meta().id
FROM cms
WHERE _createdAt>0 AND _type=‘Product’ AND _active=true
ORDER BY - MILLIS( _createdAt )
LIMIT 100.

@geraldss, 1 lakh is 100,000. :slight_smile:

@keshav_m, I should know that, it’s the same in Swahili :frowning:

Hi @geraldss

I created index in as below.

CREATE INDEX idx ON cms( - MILLIS( _createdAt ), _createdAt, _type, _active );

Then I, executed this query

SELECT meta().id FROM cms WHERE _createdAt>0 AND _type=‘Product’ AND _active=true ORDER BY - MILLIS( _createdAt ) LIMIT 100;

explain of this query is

explain SELECT meta().id FROM cms WHERE _createdAt>0 AND _type=‘Product’ AND _active=true ORDER BY - MILLIS( _createdAt ) LIMIT 100 ;
{
“requestID”: “c9450ce8-d900-4ced-9efc-c5c9bd440ed9”,
“signature”: “json”,
“results”: [
{
“plan”: {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Sequence”,
"~children": [
{
"#operator": “PrimaryScan”,
“index”: “#primary”,
“keyspace”: “cms”,
“namespace”: “default”,
“using”: “gsi”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Fetch”,
“keyspace”: “cms”,
“namespace”: “default”
},
{
"#operator": “Filter”,
“condition”: “(((0 \u003c (cms._createdAt)) and ((cms._type) = “Product”)) and ((cms._active) = true))”
},
{
"#operator": “InitialProject”,
“result_terms”: [
{
“expr”: “(meta(cms).id)”
}
]
}
]
}
}
]
},
{
"#operator": “Order”,
“limit”: “100”,
“sort_terms”: [
{
“expr”: “(-str_to_millis((cms._createdAt)))”
}
]
},
{
"#operator": “Limit”,
“expr”: “100”
},
{
"#operator": “FinalProject”
}
]
},
“text”: “SELECT meta().id FROM cms WHERE _createdAt\u003e0 AND _type=‘Product’ AND _active=true ORDER BY - MILLIS( _createdAt ) LIMIT 100”
}
],
“status”: “success”,
“metrics”: {
“elapsedTime”: “28.268462ms”,
“executionTime”: “28.223098ms”,
“resultCount”: 1,
“resultSize”: 2727
}
}

By seeing explain , I understand the query is not using this above created index.

i have primary index as , CREATE PRIMARY INDEX #primary ON cms

Could you please explain.

My apologies, there was a mistake in my previous post. Here is a fix. Note that 0 is not selective for timestamps. In a real query, you would want a more specific timestamp.

CREATE INDEX idx ON cms( - MILLIS( _createdAt ), _type, _active, _createdAt );

EXPLAIN SELECT meta().id
FROM cms
WHERE -MILLIS( _createdAt ) < 0 AND _type='Product' AND _active=true
ORDER BY - MILLIS( _createdAt )
LIMIT 100.

Hi @geraldss ,

Actually _createdAt > 0 is not our requirement. By seeing your previous comment, I used _createdAt > 0 in my query.
All i want is to fetch first 100 document of _type=product in descending order. Please suggest a way.

Moreover I used after creating the suggested Index idx
SELECT meta().id
FROM cms
WHERE -MILLIS( _createdAt ) > 1343892398000 AND _type=‘Product’ AND _active=true
ORDER BY - MILLIS( _createdAt ) LIMIT 100.

And i get No result. I creted index as
CREATE INDEX prod_createdAt_idx ON cms( - MILLIS( _createdAt ), _type, _active, _createdAt ) where ((_active = true) and (_type = “Product”));

below is explain

explain SELECT meta().id FROM cms WHERE -MILLIS(_createdAt) > 1469799932410 AND _type=‘Product’ AND _active=true ORDER BY - MILLIS( _createdAt ) LIMIT 100 OFFSET 0;
{
“requestID”: “84e7648d-217a-4bb0-9af0-d77d776ead0b”,
“signature”: “json”,
“results”: [
{
“plan”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “Sequence”,
“~children”: [
{
#operator”: “IndexScan”,
“covers”: [
“cover ((-str_to_millis((cms._createdAt))))”,
“cover ((cms._type))”,
“cover ((cms._active))”,
“cover ((cms._createdAt))”,
“cover ((meta(cms).id))”
],
“filter_covers”: {
“cover ((cms._active))”: true,
“cover ((cms._type))”: “Product”
},
“index”: “prod_createdAt_idx”,
“index_id”: “21bdccdbfb9ed0db”,
“keyspace”: “cms”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“Inclusion”: 0,
“Low”: [
“1469799932410”,
“"Product"”,
“true”
]
}
}
],
“using”: “gsi”
},
{
#operator”: “Parallel”,
“maxParallelism”: 1,
“~child”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “Filter”,
“condition”: “(((1469799932410 \u003c cover ((-str_to_millis((cms._createdAt))))) and (cover ((cms._type)) = "Product")) and (cover ((cms._active)) = true))”
},
{
#operator”: “InitialProject”,
“result_terms”: [
{
“expr”: “cover ((meta(cms).id))”
}
]
}
]
}
}
]
},
{
#operator”: “Offset”,
“expr”: “0”
},
{
#operator”: “Limit”,
“expr”: “100”
},
{
#operator”: “FinalProject”
}
]
},
“text”: “SELECT meta().id FROM cms WHERE -MILLIS(_createdAt) \u003e 1469799932410 AND _type=‘Product’ AND _active=true ORDER BY - MILLIS( _createdAt ) LIMIT 100 OFFSET 0”
}
],
“status”: “success”,
“metrics”: {
“elapsedTime”: “44.812867ms”,
“executionTime”: “44.778825ms”,
“resultCount”: 1,
“resultSize”: 3701
}

If you use -MILLIS, you must negate the comparison. This is a bit complex because you want descending order.

Here is the query.

explain SELECT meta().id FROM cms WHERE -MILLIS(_createdAt) < -1469799932410 AND _type='Product' AND _active=true ORDER BY - MILLIS( _createdAt ) LIMIT 100 OFFSET 0;

Hi @geraldss,

Please share the reference document for index creation in couchbase.

As of now I’m still getting empty result with above query. Below are the 2 documents i am sharing. I need to fetch first 100 document of this type.

“cms”: {
"_active": true,
"_createdAt": 1469799932415,
"_createdBy": “admin”,
"_modifiedAt": 1469799932415,
"_modifiedBy": “admin”,
"_type": “Product”
},
“id”: “Product::08d38780-8b8b-4d89-96fd-5fb53ac980f6”
},
{
“cms”: {
"_active": true,
"_createdAt": 1469799853089,
"_createdBy": “admin”,
"_modifiedAt": 1469799853089,
"_modifiedBy": “admin”,
"_type": “Product”
},
“id”: “Product::25b34b39-83c3-464b-a499-1c2a4bc4d17e”
}

Hi,

Did you try going to google.com and typing “index creation in Couchbase”.

yes. but didn’t find relvent explanation for query with pagination.

I see the problem. _createdAt is already a number, so you don’t need MILLIS. Here is the new index and query.

CREATE INDEX idx ON cms( -_createdAt, _type, _active );

EXPLAIN SELECT meta().id
FROM cms
WHERE -_createdAt < 0 AND _type='Product' AND _active=true
ORDER BY -_createdAt
LIMIT 100;
1 Like

Thank you @geraldss.

The new index and query worked for me. I have total 3 queries on these type of documents.

Q1. SELECT meta().id FROM cms WHERE -_createdAt < 0 AND _type=‘Product’ AND _active=true ORDER BY -_createdAt LIMIT 100 OFFSET 0;

Q2. SELECT meta().id FROM cms WHERE -_createdAt < 0 AND brand=‘1389’ AND _type=‘Product’ AND _active=true ORDER BY -_createdAt LIMIT 100 OFFSET 0;

Q3. SELECT meta().id FROM cms WHERE _type = “Product” AND _active = TRUE AND ANY category IN categories SATISFIES category = “3748” END AND brand = “1389” AND -_createdAt < 0 ORDER BY -_createdAt LIMIT 100 OFFSET 0;

where brand and category is input parameter and it can be anything. Also here categories is of type Array. So I created these 3 indexes.

I1 . CREATE INDEX prod_created_idx ON cms((-_createdAt),_type,_active, meta().id) WHERE ((_type = “Product”) and (_active = true));

I2. CREATE INDEX prod_brand_created_idx ON cms(brand,(-_createdAt),_type,_active) WHERE ((_type = “Product”) and (_active = true));

I3. CREATE INDEX prod_cat_brand_createdidx ON cms((distinct (array category for category in categories end)),categories,brand,(-_createdAt),_type,_active) WHERE ((_type = “Product”) and (_active = true));

I get the expected results in milli seconds which is expected time. I also tried deleting first 2 indexes (
I1 and I2 ) and keeping only 3rd one (I3 ), But It does not worked for first 2 queries. So I created all the 3 indexes.

However I am missing a sort count (only in Q1) int metrics which i was getting in earlier queries.

I also need count of these type of document.

CQ1 SELECT count(meta().id) FROM cms WHERE _type = “Product” AND _active = TRUE;

and to do so I created another index CI1 for count

CI1 CREATE INDEX prod_count_idx ON cms((meta().id)) WHERE ((_type = “Product”) and (_active = true));

My document Structure is as follow

“cms”: {
"_active": true,
"_createdAt": 1469801004787,
"_createdBy": “admin”,
"_modifiedAt": 1469801004787,
"_modifiedBy": “admin”,
"_type": “Product”,
“attributes”: {
“2576”: “6”,
“type-of-product”: “Key Type Drill Chuck”,
“uom”: “piece”
},
“brand”: “948”,
“categories”: [
“3748”
],
},
“id”: “Product::MA.CH.MA.326662”
}

Please help me to understand ,

  1. Why i am missing sort count in 1st query Q1.
  2. Is there better way to query and get the metrics for total count as well.
  3. Is there better way to create index only for count.

Please explain.

Thanks
Amar

1 Like

Nice progress :slight_smile:

Indexes in Couchbase can be deployed on independent nodes (EE). So you should create the indexes you need, rather than try to minimize the number of indexes.

Now, to your specific questions.

Please help me to understand ,

  1. Why i am missing sort count in 1st query Q1.

sortCount is not available if you use index order. If you need sortCount, issue a separate COUNT query.

  1. Is there better way to query and get the metrics for total count as well.

same as above. Your index and query look fine.

  1. Is there better way to create index only for count.

Looks good.