Optimizing query performance for IN statements

alperkanat · March 24, 2017, 2:22pm

Hi There,

I’ve a query in which I’m trying to get count of some documents based on some id. Here’s the query:

SELECT xid, COUNT(DISTINCT yid) AS count
FROM fooboo
WHERE _class = "com.foo.bar"
    AND appId = "app1"
    AND MILLIS_TO_STR(MILLIS(createdAt), "2006-01-02") > "2017-02-22"
    AND xid IN ["fba0a67e-aa20-4333-8404-4fb382f20c9c","beae20e8-4aac-4b1e-9520-e15e619c9b97"]
GROUP BY xid

Earlier, the code was calling this query for each individual xid without the AND xid IN ... part. But when the number of xid's grow, the query performance degraded. Therefore I changed the query like above. I also added the following index:

CREATE INDEX `pid_appId_xid_yid_createdAt_idx`
ON `fooboo`(`appId`,`xid`,`yid`,`createdAt`)
WHERE (`_class` = "com.foo.bar")

When the xid array includes 100 elements, the query performs around 2.67 seconds. Is there a way to improve this further?

atom_yang · March 24, 2017, 3:07pm

can you post explain?

alperkanat · March 24, 2017, 3:17pm

gist.github.com

https://gist.github.com/tunix/bbf1ae152e26492c978b5997a08a105d

data.json

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexScan",
          "covers": [
            "cover ((`fooboo`.`appId`))",
            "cover ((`fooboo`.`xid`))",

This file has been truncated. show original

atom_yang · March 24, 2017, 3:46pm

How about create index by

CREATE INDEX `pid_appId_i_xid_yid_createdAt_idx`ON `default`(`appId`, DISTINCT ARRAY i FOR i IN `xid` END,`yid`,`createdAt`) WHERE (`_class` = "com.foo.bar")

and rewrite N1QL by

SELECT xid, COUNT(DISTINCT yid) AS count
FROM fooboo
WHERE _class = "com.foo.bar"
    AND appId = "app1"
    AND MILLIS_TO_STR(MILLIS(createdAt), "2006-01-02") > "2017-02-22"
    AND ANY i IN xid SATISFIES i IN ["fba0a67e-aa20-4333-8404-4fb382f20c9c","beae20e8-4aac-4b1e-9520-e15e619c9b97"] END
GROUP BY xid

let’s see explain and time.

vsr1 · March 24, 2017, 3:57pm

If createdAt is in ISO-8601 started it is string comparable. And change query comparison accordingly. If not created index on SUBSTR(createdAt,0,10) and use in query.

CREATE INDEX `pid_appId_xid_yid_createdAt_idx`
ON `fooboo`(`appId`,`xid`,`createdAt`,`yid`)
WHERE (`_class` = "com.foo.bar");

SELECT xid, COUNT(DISTINCT yid) AS count
FROM fooboo
WHERE _class = "com.foo.bar"
    AND appId = "app1"
    AND createdAt > "2017-02-22"
    AND xid IN ["fba0a67e-aa20-4333-8404-4fb382f20c9c","beae20e8-4aac-4b1e-9520-e15e619c9b97"]
GROUP BY xid