Help for optimizing


#1

Can you tell me what is the best index for the following query?

SELECT
   distinct order.userId, 
   ARRAY {p.name,p.language} for p in product END
FROM bucket order
JOIN bucket product
ON product.userId = order.userId AND product.type='Product'
WHERE order.type='Order' AND order.clientId=$clientId

#2
CREATE INDEX  ix1 ON bucket( clientId, userId ) WHERE   type='Order';
CREATE INDEX  ix2 ON bucket( userId , name, language) WHERE   type='Product';

SELECT order.userId, 
   ARRAY_AGG( {product.name,product.language})  AS products
FROM bucket AS order
JOIN bucket  AS product
ON product.userId = order.userId AND product.type='Product'
WHERE order.type='Order' AND order.clientId=$clientId
GROUP BY order.userId;

#3

@vsr1 thank you very much
I have 2 question

  1. How can I limit products in your query? I want to get 100 latest products ORDER BY product.createDate DESC and LIMIT 100
  2. Must I change ix2, if I want to ORDER BY product.createDate DESC

#4
CREATE INDEX  ix1 ON bucket( clientId, userId ) WHERE   type='Order';
CREATE INDEX  ix2 ON bucket( userId , name, language, createDate) WHERE   type='Product';

SELECT d.userId , (SELECT RAW p FROM d.products AS p ORDER BY p.createDate DESC LIMIT 100) AS products
FROM (SELECT order.userId, 
                 ARRAY_AGG( {product.name,product.language, product.createDate})  AS products
                 FROM bucket AS order
                 JOIN bucket  AS product
                ON product.userId = order.userId AND product.type='Product'
                 WHERE order.type='Order' AND order.clientId=$clientId
                  GROUP BY order.userId) AS d;

#5

@vsr1 can you tell me you you change ix2 ? and you put createDate at end?
It is just for covering? or it can be used in sorting?


#6

only for covering. you can’t use for sorting. In case of JOIN order by can be exposed on left most bucket only. all others order will be changed and we need explicit sort


#7

What about the following queries?

/*1*/
SELECT a , b FROM bucket WHERE c='C' AND d=$d ORDER BY z DESC


/*2*/
SELECT a , b FROM bucket WHERE c='C' AND d=$d GROUP BY y ORDER BY z DESC

Can you show me

  • Best index just for query 1 (If you use z let me know is it used for sorting or not)
  • Best index just for query 2
  • Best index (if possible) for both

#8
  1. predicates are equality so (c,d,z desc, a,b)
  2. it is group by and you can’t project a, b, z unless all those are aggregates. you can’t use index order .
    (c,d,y)

https://blog.couchbase.com/create-right-index-get-right-performance/
https://blog.couchbase.com/understanding-index-grouping-aggregation-couchbase-n1ql-query/


#9

@vsr1 WHERE c='C' AND d=$d here C is constant and $d is variable , Is it no better using partial index?
(d,z desc, a,b) and WHERE c='C'


#10

Index where clause must have static constants. NO query parameters. If those needs query parameters keep in index keys to perform better.


#11

Can you show me a query that have static constants? what is different of WHERE c='C' and WHERE type='User'


#12

Both are static you can use both. Any thing c = $1 or c=$C or not static constants.

if your queries doesn’t go include different type then type is good candidate in index WHERE clause.
In addition to above c values limited and use all values then you can create that many index and include index WHERE clause. If c values are many number of indexs become too much move to index keys.


#13

I create index and run the query
When order.clientId=$clientId contains some documents (i.e. less than 10K) query runs very fast , But when it includes millions document it take long time, event If I add LIMIT 100

SELECT d.userId , (SELECT RAW p FROM d.products AS p ORDER BY p.createDate DESC LIMIT 100) AS products
FROM (SELECT order.userId, 
                 ARRAY_AGG( {product.name,product.language, product.createDate})  AS products
                 FROM bucket AS order
                 JOIN bucket  AS product
                ON product.userId = order.userId AND product.type='Product'
                 WHERE order.type='Order' AND order.clientId=$clientId
                  GROUP BY order.userId 
                  LIMIT 100) AS d;

As it join and then limits , How can I first limit then join? to get faster result
I want to paginate it , product.userId >lastPageUserId AND order.clientId=$clientId LIMIT $perPage


#14

You have grouping, so u need to produce all results
check this out

https://blog.couchbase.com/offset-keyset-pagination-n1ql-query-couchbase/


#15

@vsr1 I have another very similar query, for push notification , I want to join two document types, User and FcmTokens
Each User can have 1:n FcmTokens, for a broadcast notification, I need to fetch all users with their FcmTokens, I want to fetch tokens as fast as possible, currently it take about 20 minutes for fetching with covered index
Do you know how can I speed up it?(less than a minute)


#16

Post your query and index definitions


#17

It is exactly same as your suggested query ,just renamed

CREATE INDEX  ix1 ON default( clientId, userId ) WHERE type='User';
CREATE INDEX  ix2 ON default( userId , token, language, updateDate) WHERE type='UserFcmToken';

SELECT d.userId , (SELECT RAW t FROM d.tokens AS t ORDER BY t.updateDate DESC LIMIT 100) AS tokens
FROM (SELECT user.userId, 
                 ARRAY_AGG( {token.token,token.language, token.updateDate})  AS tokens
                 FROM default AS user
                 JOIN default  AS token
                ON token.userId = user.userId AND token.type='UserFcmToken'
                 WHERE user.type='User' AND user.clientId=$clientId
                  GROUP BY user.userId) AS d;

I talk about 5 million results, the query take 20 minutes to calculate 5 million results

I want to send notification to all users as fast as possible , fetching tokens from database must not take 20 minutes


#18

GROUP BY, ORDER BY must produce all qualified documents first before finalizing and streaming.

If you need streaming you can do following which is almost streaming

    CREATE INDEX  ix1 ON default( clientId, userId ) WHERE type='User';
    SELECT DISTINCT userId FROM  default WHERE clientId=$clientId AND type='User';

For Each User repeat the following query

  CREATE INDEX  ix2 ON default( userId , updateDate DESC, token, language) WHERE type='UserFcmToken';

  SELECT    token, language, updateDate, userId
  FROM default 
  WHERE type='UserFcmToken' AND userId = $userId 
  ORDER BY updateDate DESC
  LIMIT 100;

CE limits number of query service cores to 4.


#19

CE limits number of query service cores to 4.

Can you refer me to document?
What it means exactly? Per node? Or in whole cluster?


#20

It is Per Query node. In CE you have all services (Data, index, query) on same node. If you have 8 cores
query maximum uses 4 cores (GOMAXPROCS https://golang.org/pkg/runtime/), other services may be using all 8 cores.

https://blog.couchbase.com/couchbase-server-editions-explained-open-source-community-edition-and-enterprise-edition/

High Performance Concurrent Query Execution