Couchbase 4.5 Order by diffrence in performnace for ASC and DESC even query covered

Hi @geraldss

Very much interested in resolving issue i tried below option also but no gain in time
ID1
{
groups[“1”,“2”,‘3’]
entity:‘ENT1’,
users:[‘User1’,‘User2’,…‘User1000’]
numbericvalue1:585,
numericvalue2…numericvalue60
}

ID2
{
groups[“1”]]
entity:‘ENT1’,
users:[‘User1’,‘User2’,…‘User1000’]
}

like this

created array index on groups

since both groups and users are array i created only in one (groups)

select META().id from default where any grp in groups satisfies grp=1 End limit 30 (Good reponse)

select META().id from default where any grp in groups satisfies grp=1 End order by {any field} limit 30 (Order by Adds slowness)

select META().id from default where any grp in groups satisfies grp=1 End and ARRAY_CONTAINS(users,‘USER1’) order by {any field} limit 30 (Very very slow)

Hi @krishnasahu2016,

At present if the query has JOIN clause it will not use index order.
create index such that left side of JOIN can push as many constant predicates to indexer and if possible to make it covered.

CREATE INDEX idx_totturnover ON test(groupid, type, totturnover, entity, user );

EXPLAIN SELECT META(a).id
FROM test a JOIN test b ON KEYS (a.entity)
WHERE a.groupid=1
AND a.type='FIRST_FILTER'
AND a.totturnover IS NOT MISSING
AND b.type='SECOND_TYPE'
AND ARRAY_CONTAINS(b.users,a.user)
ORDER BY a.totturnover ;

select META().id from default where any grp in groups satisfies grp=1 End limit 30;

Above query doesn’t use order by so when first 30 documents are satisfied the query stops.

select META().id from default where any grp in groups satisfies grp=1 End order by {any field} limit 30

Above query has order by and limit 30. We need to get all the qualified documents before we decide 30 documents. If the number of qualified documents are more it takes more time.

select META().id from default where any grp in groups satisfies grp=1 End and ARRAY_CONTAINS(users,'USER1') order by {any field} limit 30

In addition to order by and limit you have extra predicate, which almost iterating the array for each qualified documents. It takes more time.

You can try the following.

CREATE INDEX idx100 ON default(DISTINCT ARRAY u FOR u IN users END);
select META().id from default where any u in users satisfies u = "USER1" End and ARRAY_CONTAINS(grp,1) order by {any field} limit 30

Also post details on.
Number of documents in bucket
Number of distinct groups and number of documents in each group
number of distinct users and number of documents each user

1 Like

HI @vsr1

There are 800 to 900 users per doc so creating index on users further slows down search and ordering

Order by currently is very very costly operation even on 100K docs

below is details

Number of documents in bucket :: 200K
Number of distinct groups and number of documents in each group :: 40 odd groups and 100K docs satifying highest match
number of distinct users and number of documents each user :: tot users 1500 , 100K documents per users within 200K

Hi @krishnasahu2016,

   The query predicates qualify 50% of the documents on top of that you are predicates searches in big arrays. 
    
    Try using max_parallelism, increase query service REST api parameters pipeline-batch, pipeline-cap parameters.
1 Like

Hi @krishnasahu2016

There are only so many ways to improve sort performance. Best sort performance is to avoid it totally.

Here is a way to improve your performance.

  1. Use covering index to fetch the documents you need, in the order you need.

  2. You can THEN do the filtering and LIMIT on top of this.

  3. This will avoid the ORDER BY, and FETCH of the whole 200K documents.

    create index idx1 ON default(field1, field2, field3);
    select *
    from (select * from default order by field1) d
    where any u in users satisfies u = “USER1” End and ARRAY_CONTAINS(grp,1)
    limit 30;

1 Like

Added where clause for keshav_m post to pick the index
select *
from (select * from default where field1 is not missing order by field1) d
where any u in users satisfies u = “USER1” End and ARRAY_CONTAINS(grp,1)
limit 30;

hi @keshav_m @vsr1

Thank you for suggestion

what about order by desc or offset 50000 (Will this slowdown query even i use covered index sorting)

HI @keshav_m @vsr1

below query looks to have bug limit is getting applied on inner select statement rather on external select

select *
from (select * from default where field1 is not missing order by field1) d
where any u in users satisfies u = “USER1” End and ARRAY_CONTAINS(grp,1)
limit 30;

Hi @krishnasahu2016,

Will check it. Thanks for the information.

Could you try this.
sselect * from default where field1 is not missing AND “USER1” IN users AND 1 IN grp ORDER BY field1 limit 30;
OR add limit (huge value) to inner query

hi @keshav_m @vsr1

Tried suggestion of adding more limit in inner select
looks another issue or bug

select *
from (select * from default where field1 is not missing order by field1 limit 200000) d
where any u in users satisfies u = “USER1” End and ARRAY_CONTAINS(grp,1)
limit 30 offset 50000;

covering index does not return sortCount in n1qlmetrics for pagination
now if i have to use count to get data matching filter then goes back to same point

adding offset adds slowness

one more point all data is in Ram then what is the requirement of disk fetch for any join or sorting and i confirm all my data seating in Ram

Hi @krishnasahu2016,

not giving sortCount in metrics is not a bug or issue. If the top level query doesn’t use order by (If query uses index order the sortCount will be same as limit because we don’t look how many entries qualified) it will not give the sortCount (it is json).

For pagination, If sortCount is missing, if resultCount < limit there are no more records else more records might be there.

limit 30 offset 50000 - query service needs to process 50,030 records even though it gives 30 to client.

Data is in RAM on DataNode. Query Service needs to get the information from Index node, Data node to process the query.

hi @geraldss @keshav_m @vsr1

Strange issue covering index causing data loss

e.g.

created index on grps array index now if i filter by grp 1 then count is 41

query

select 1 from test where any grp in grps where grp=1 and array_contains(users,‘user1’) limit 100 (Count is 41)

now i changed my index

added users along with grps array index as covering index now

select 1 from test where any grp in grps where grp=1 and array_contains(users,‘user1’) limit 100 (Count is 35) records less

is this bug or any extra changes required

Hi @krishnasahu2016,

Provide the exact index definitions and EXPLAIN output of the both queries.
Also could you please check the queries. It is not valid ANY syntax.
select meta().id, test and find out what are those 6 documents missing and see what are the values of grps and users and why it eliminated.

Hi @vsr1

without covering its

create index indx on test(all array grp for grp in grps end) where type=‘test1’

select 1 from test where any grp in grps satisfiles grp=1 end and type=‘test1’ and array_contains(users,‘user1’) limit 100 (return 41 records)

now i have changed index definition to covering index

create index indx on test(all array grp for grp in grps end,grps,users) where type=‘test1’

select 1 from test where any grp in grps satisfiles grp=1 end and type=‘test1’ and array_contains(users,‘user1’) limit 100 (return 35 records)

checked documents except users list nothing diffrence

HI @vsr1

Only diffrence is covering added in explain statement

One more info in index itself number of indexed documents itself is diffrent

e.g.

create index indx on test(all array grp for grp in grps end) where type=‘test1’ tot indexed dcouments 290K

and

create index indx on test(all array grp for grp in grps end,grps,users) where type=‘test1’ 254K documents

clearly shows less data

Note:- if i add any field in array indexing causing data loss

Hi @krishnasahu2016,

Did u try DISTINCT instead of all.
Could you post the missing documents grps,users

tried DISTINCT still data loss

may not be able to post data have restriction for this, but if you try with sample documents in ur enviroment same issue will result sorry not able share my data

Hi @krishnasahu2016,

The sample documents works. No need exact data, you can change it and include for grps,users fields.

{“grps”, missing field}
{“grps”:null,…}
{"grps:[],…}
{“grps”:[search condition included or not or contains null or missing],…}