Can't Use Index on Order By and Group By

Hi,
I am using 7.1.3 Community version.

I’ve created an index by using this statement

CREATE INDEX `occurred_at_index` ON `audit`(`occurred_at`)

It’s using my index while using this query

select * from audit where occurred_at = 1234;

Query Explain :

{
“~children”: [
{
“keyspace”: “audit”,
“spans”: [
{
“exact”: true,
“range”: [
{
“inclusion”: 3,
“index_key”: “occurred_at”,
“high”: “1234”,
“low”: “1234”
}
]
}
],
#operator”: “IndexScan3”,
“using”: “gsi”,
“index_projection”: {
“primary_key”: true
},
“namespace”: “default”,
“index”: “occurred_at_index”,
“index_id”: “54564e8817b57318”
},
{
“namespace”: “default”,
“keyspace”: “audit”,
#operator”: “Fetch”
},
{
“~child”: {
“~children”: [
{
#operator”: “Filter”,
“condition”: “((audit.occurred_at) = 1234)”
},
{
#operator”: “InitialProject”,
“result_terms”: [
{
“star”: true,
“expr”: “self”
}
]
}
],
#operator”: “Sequence”
},
#operator”: “Parallel”
}
],
#operator”: “Sequence”
}

It’s good, I have no problem with this.

But on Order By and Group By clause its not using my index.

For Order By:

select * from audit order by occurred_at;

{
“~children”: [
{
“~children”: [
{
“namespace”: “default”,
“keyspace”: “audit”,
“index”: “#primary”,
#operator”: “PrimaryScan3”,
“using”: “gsi”,
“index_projection”: {
“primary_key”: true
}
},
{
“namespace”: “default”,
“keyspace”: “audit”,
#operator”: “Fetch”
},
{
“~child”: {
“~children”: [
{
#operator”: “InitialProject”,
“result_terms”: [
{
“star”: true,
“expr”: “self”
}
]
}
],
#operator”: “Sequence”
},
#operator”: “Parallel”
}
],
#operator”: “Sequence”
},
{
#operator”: “Order”,
“sort_terms”: [
{
“expr”: “(audit.occurred_at)”
}
]
}
],
#operator”: “Sequence”
}

AND for Group By:

select occurred_at from audit group by occurred_at;

{
“~children”: [
{
“namespace”: “default”,
“keyspace”: “audit”,
“index”: “#primary”,
#operator”: “PrimaryScan3”,
“using”: “gsi”,
“index_projection”: {
“primary_key”: true
}
},
{
“namespace”: “default”,
“keyspace”: “audit”,
#operator”: “Fetch”
},
{
“~child”: {
“~children”: [
{
“group_keys”: [
“(audit.occurred_at)”
],
#operator”: “InitialGroup”,
“aggregates”:
}
],
#operator”: “Sequence”
},
#operator”: “Parallel”
},
{
“group_keys”: [
“(audit.occurred_at)”
],
#operator”: “IntermediateGroup”,
“aggregates”:
},
{
“group_keys”: [
“(audit.occurred_at)”
],
#operator”: “FinalGroup”,
“aggregates”:
},
{
“~child”: {
“~children”: [
{
#operator”: “InitialProject”,
“result_terms”: [
{
“expr”: “(audit.occurred_at)”
}
]
}
],
#operator”: “Sequence”
},
#operator”: “Parallel”
}
],
#operator”: “Sequence”
}

After creating this →

CREATE INDEX occurred_at ON `audit`(`occurred_at` INCLUDE MISSING)

index my both queries above is working with Order By and Group By clause smoothly.

But when I use Order By clause with DESC it’s not using my index and I need to create another index like this →

CREATE INDEX occurred_atDESC ON `audit`(`occurred_at` DESC INCLUDE MISSING)

With these 2 indexes my queries working well. But creating index consume my resources. So the problem is having 2 indexes for both ASC and DESC because its costs.

In Postgresql we have just one index and this index working well with where, order by or group by.

My questions are;

  1. Is there anyway to having just one index for where, order by, group by and etc. clauses?
  2. Why indexes in Couchbase working like this?

Index selection is performed on filters; simply adding ORDER BY or GROUP BY is insufficient to match the index. If you add a suitable filter, such as field IS VALUED your index should be selected.

e.g.

select * from audit where occurred_at is valued order by occurred_at;

HTH.

(Ref: https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/selectintro.html#index-selection )

1 Like

@odyildiz ,

At present couchbase doesn’t have REVERSE IndexScan (MB-33311). If needs to Use Index Order it need separate index. Once It has ReserveScan based on requirement it can scan order it benifit most.

1 Like

Thanks for your answer @dh . Its worked on group by and order by.

But what should I do for DESC order

select * from audit where occurred_at is valued order by occurred_at DESC;

It uses my index occurred_at_index but working very very slow. It works in the same amount of time as the non-index version

Thanks @vsr1, So if I want to get in DESC order I need to have seperate index like this →

CREATE INDEX `occurred_at_DESC_index` ON `audit`(`occurred_at` DESC)

right?

Then I can use this query →

select * from audit where occurred_at is valued order by occurred_at DESC;

For DESC order right?

It should work (only draw back is you have to maintain two indexes) and index should have chosen automatically.
If not give USE INDEX hint

select occurred_at 
from audit 
group by occurred_at 
order by occurred_at DESC;

GROUP BY can disturb index order unless index aggregation is done.

The following is better as you need unique values only (no grouping and stream results)

SELECT DISTINCT occurred_at 
FROM audit 
ORDER BY occurred_at DESC;
1 Like

I tried what you told and it works great when I dont have any other WHERE clause, thank you @dh @vsr1 . But when I try to add filter to my query something strange is happened.

I have several indexes. These are 2 of them->

CREATE INDEX `occurred_at_DESC_index` ON `audit`(`occurred_at` DESC)
CREATE INDEX `account_id_index` ON `audit`(`account_id`)

When I run my this query it takes ≈20 seconds->

SELECT * 
FROM audit 
WHERE occurred_at is valued AND account_id = 1234
ORDER BY occurred_at DESC
LIMIT 10;

EXPLAIN →

{
“~children”: [
{
“~children”: [
{
“keyspace”: “audit”,
“spans”: [
{
“exact”: true,
“range”: [
{
“inclusion”: 3,
“index_key”: “account_id”,
“high”: “1234”,
“low”: “1234”
},
{
“inclusion”: 0,
“index_key”: “entity_id
},
{
“inclusion”: 0,
“index_key”: “event_reason
},
{
“inclusion”: 0,
“index_key”: “member_id
},
{
“inclusion”: 0,
“index_key”: “detail.listingId
},
{
“inclusion”: 0,
“index_key”: “event_channel
},
{
“inclusion”: 0,
“index_key”: “occurred_at”,
“low”: “null”
}
]
}
],
#operator”: “IndexScan3”,
“using”: “gsi”,
“index_projection”: {
“primary_key”: true,
“entry_keys”: [
6
]
},
“index_keys”: [
“_index_key ((audit.occurred_at))”,
“_index_key ((meta(audit).id))”
],
“namespace”: “default”,
“index”: “all_fields_idx”,
“index_id”: “986f144bf23e6543”
},
{
“flags”: 1,
“limit”: “10”,
#operator”: “Order”,
“sort_terms”: [
{
“expr”: “_index_key ((audit.occurred_at))”,
“desc”: “"desc"”
}
]
},
{
“namespace”: “default”,
“keyspace”: “audit”,
#operator”: “Fetch”
},
{
“~child”: {
“~children”: [
{
#operator”: “Filter”,
“condition”: “(((audit.occurred_at) is valued) and ((audit.account_id) = 1234))”
},
{
#operator”: “InitialProject”,
“result_terms”: [
{
“star”: true,
“expr”: “self”
}
]
}
],
#operator”: “Sequence”
},
#operator”: “Parallel”,
“maxParallelism”: 1
}
],
#operator”: “Sequence”
},
{
#operator”: “Limit”,
“expr”: “10”
}
],
#operator”: “Sequence”
}

But this query just happened in milliseconds →

SELECT * 
FROM audit 
WHERE occurred_at is valued AND account_id = 1234
ORDER BY occurred_at
LIMIT 10;

EXPLAIN →

{
“~children”: [
{
“~children”: [
{
“scans”: [
{
“keyspace”: “audit”,
“spans”: [
{
“exact”: true,
“range”: [
{
“inclusion”: 3,
“index_key”: “account_id”,
“high”: “1234”,
“low”: “1234”
}
]
}
],
#operator”: “IndexScan3”,
“using”: “gsi”,
“index_projection”: {
“primary_key”: true
},
“namespace”: “default”,
“index”: “account_id_index”,
“index_id”: “2844dbaf5b917188”
},
{
“keyspace”: “audit”,
“spans”: [
{
“exact”: true,
“range”: [
{
“inclusion”: 0,
“index_key”: “occurred_at”,
“low”: “null”
}
]
}
],
#operator”: “IndexScan3”,
“using”: “gsi”,
“index_projection”: {
“primary_key”: true
},
“namespace”: “default”,
“index”: “occurred_at_DESC_index”,
“index_id”: “53f5d96b8a0c346e”
}
],
#operator”: “IntersectScan”
},
{
“namespace”: “default”,
“keyspace”: “audit”,
#operator”: “Fetch”
},
{
“~child”: {
“~children”: [
{
#operator”: “Filter”,
“condition”: “(((audit.occurred_at) is valued) and ((audit.account_id) = 1234))”
},
{
#operator”: “InitialProject”,
“result_terms”: [
{
“star”: true,
“expr”: “self”
}
]
}
],
#operator”: “Sequence”
},
#operator”: “Parallel”
}
],
#operator”: “Sequence”
},
{
“limit”: “10”,
#operator”: “Order”,
“sort_terms”: [
{
“expr”: “(audit.occurred_at)”
}
]
},
{
#operator”: “Limit”,
“expr”: “10”
}
],
#operator”: “Sequence”
}

My occurred_at index ordered DESC, but my second query works faster then first. Do you have any idea why it’s happening and how can I fix this?

CREATE INDEX ix1 ON audit(account_id , occurred_at DESC)

I may use till 7 fields in where clause. Do I need to have different index for each combination?

Or is it ok having all fields in one index like →

CREATE INDEX ix1 ON audit(field1, field2, field3, field4, account_id , occurred_at DESC);

Can this index cover all combination if I just query on (field1, field2, field3, field4, account_id , occurred_at) these fields ?

Index maintain keys in b-tree you may not able to use index order for all combinations