"OR" 100 times slower than "AND"

debon.julien · December 19, 2016, 7:22pm

Hi,

I’m using Couchbase 4.5.1-2844 Enterprise Edition (build-2844).

I have an index myTestIndex on fields field1 and field2.
CREATE INDEX myTestIndex ON \my-data` (field1, field2)`
When I run a simple query, depending on using AND or OR operator, the plan changes totally, and the performance with it (AND is fast, OR is slow).
~50 000 documents.

AND:
SELECT * from \my-data` WHERE field1=“value1” AND field2="value2"~14ms executionEXPLAIN SELECT * from `my-data` WHERE field1=“value1” AND field2=“value2”`

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexScan",
          "index": "myTestIndex",
          "index_id": "9bbe2a824e011185",
          "keyspace": "my-data",
          "namespace": "default",
          "spans": [
            {
              "Range": {
                "High": [
                  "\"value1\"",
                  "\"value2\""
                ],
                "Inclusion": 3,
                "Low": [
                  "\"value1\"",
                  "\"value2\""
                ]
              }
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "Fetch",
          "keyspace": "my-data",
          "namespace": "default"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Filter",
                "condition": "..."
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "self",
                    "star": true
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    }
  }
]

OR:
SELECT * from \my-data` WHERE field1=“value1” OR field2="value2"~2000ms executionEXPLAIN SELECT * from `my-data` WHERE field1=“value1” OR field2=“value2”`

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "PrimaryScan",
          "index": "#primary",
          "keyspace": "my-data",
          "namespace": "default",
          "using": "gsi"
        },
        {
          "#operator": "Fetch",
          "keyspace": "my-data",
          "namespace": "default"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Filter",
                "condition": "..."
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "self",
                    "star": true
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    }
  }
]

As you can see, with the OR statement, it doesn’t use myTestIndex, and (I guess) the fetch results in massive performance degradation.

geraldss · December 19, 2016, 10:58pm

You need two changes.

(1) Add another index on field2.

(2) Use UNION / UNION ALL instead of OR.

debon.julien · December 19, 2016, 11:17pm

Do you mean, an index on field2 only? Because I already have tried with either an index on both fields, or an index on each field (so, 2 indexes). Every time, performance was poor.
I could never find an example or way of making UNION work with a COUNT.
This works:
SELECT * from \my-data` WHERE field1=“value1” UNION SELECT * from `my-data` WHERE field2=“value2”`

But I would like something like this:
SELECT COUNT(*) FROM (SELECT * from \my-data` WHERE field1=“value1” UNION SELECT * from `my-data` WHERE field2=“value2”)`

Thank you for the help

geraldss · December 20, 2016, 3:44am

Can you apply all my previous suggestions and report the results? Your original question was not about COUNT. We can deal with that after your original question.

debon.julien · December 20, 2016, 6:38am

Thank you @geraldss, with UNION the request is fast again (50-60ms), and the execution plan does use my index.

However how can one use a COUNT with the UNION/INTERSECT syntax please?
And isn’t it weird that the execution plans are so different between OR and AND?

debon.julien · December 20, 2016, 1:05pm

Actually I found a way to make a count on an UNION:

SELECT COUNT(*) FROM (SELECT * from `my-data` data WHERE field1="value1" 
UNION
SELECT * from `my-data` data WHERE field2="value2") unionResult

The key thing is you need to give a name to the outer FROM element (here, unionResult) otherwise N1QL blows without much help.

The root problem remains, though: how come OR gets an execution plan without covering indexes? While AND does.

geraldss · December 20, 2016, 3:33pm

Yes, this is the recommended approach. To use a covering index, N1QL currently requires a single index, not two separate as required for OR.

keshav_m · January 2, 2017, 5:41pm

If you’re simply doing COUNT(*), you can do the COUNT() in each subquery and do a SUM on the outer query. That’ll be significantly faster as well. If you’re using 4.5 or above, you should see IndexCountScan in your explain.

geraldss · January 2, 2017, 5:50pm

Improvement coming in 4.6.1. https://issues.couchbase.com/browse/MB-22143

debon.julien · January 6, 2017, 3:08pm

@keshav_m I don’t think simply adding the 2 counts works.
card(A U B) = card(A) + card(B) - card(A intersect B)
But I guess if I do the 3 counts (since A intersect B is fast) and then do the addition and subtraction, this will work… It just seems like a lot of manual pain when I would expect Couchbase to handle it for me.

@geraldss Thanks, it seems like it will definitely fix my performance issue if I understand it correctly.