I having problems building a very simple query using the USE INDEX hint.
I’ve created this very simple index in order to optimize the queries:
CREATE INDEX multi_type_index on CONTENT(main.sysAttrs.punblicationInstant DESC,main.type ASC)
If I execute this query, the system responds very slowly:
select * from CONTENT USE INDEX (multi_type_index) WHERE main.sysAttrs.publicationInstant < NOW_MILLIS() AND main.type IN ['STORY', 'LIVE', 'INTERVIEW'] ORDER BY main.sysAttrs.publicationInstant DESC limit 10
The database has around 1M of documents and the explain shows that the system is not able to make an "order push down " so this seems the reason why this query is slow.
In Couchbase documentation explains that the structure in GSI index is B+Tree where the order of the elements within the index is based on the order of the declared fields in the index creation. In my case I’m respecting the order.
[(9999, ‘INTERVIEW’),( 9999, ‘STORY’),( 9999, ‘POLL’)…,(9998, ‘INTERVIEW’),(‘LIVE’,9998),(9998, ‘STORY’)…]
Why the system is not able here to do the “order push down”?
How can I build the index or the query in order to improve performance?
If I transform the “IN” operator by “OR” disjunctions the problem is the same.
IN clauses makes multiple range scans even though it uses index order we required to sort this requires produce all the qualified items.
One option: UNION ALL/UNION query for each IN as equality clause with limit AS subquery in parent query do same ORDER and LIMIT.
For me this is not an option.
I’m coding an abstraction layer in JAVA in order to transform from QueryDSL to N1QL and this kind of “tricks” seems a poor workaround.
In my opinion, this problem could be resolved if partial indexes wouldn’t require the same WHERE predicate for the index as for the query (or a subset of predicates). If a user specifies the USE INDEX hint, why the system should check if the query predicate match with the index one? This should be responsible of the programmer because he is using “USE INDEX” hint. If he uses it then he has consciousness about the data allocated in the index. This way could be more flexible and it makes the queries less verbose and more lighter. I want to clarify that this should be the behaviour if and only if he is using “USE INDEX” hint.
In this way, I wouldn’t need to specify the IN clause at query time but at index creation so the index only would have the data in the IN clause and the system doesn’t need to scan multiple ranges and should be able to push down the order, offset and limit processes to the indexer.
Even if USE INDEX is used it will check if the index qualifies the query to make sure it will always gives right results. Index selection result in wrong results query will not use that index.
Index WHERE clause can be super set of query where clause.
IN clause result to multiple range scans, At present the order is not preserved when multiple range scans so sort is required.
In my opinion, the fact that the index qualifies the query should be responsible of the programmer in “USE INDEX” hint because he knows the data in the index. If the result is wrong he has to change the query as he has to change a SQL query if the query doesn’t return the data that he has looking for. The system shouldn’t take care about it if user specifies the index. Could be useful if the user doesn’t specify the index but just in this case.