"OR" 100 times slower than "AND"

Hi,

I’m using Couchbase 4.5.1-2844 Enterprise Edition (build-2844).

I have an index myTestIndex on fields field1 and field2.
CREATE INDEX myTestIndex ON \my-data` (field1, field2)`
When I run a simple query, depending on using AND or OR operator, the plan changes totally, and the performance with it (AND is fast, OR is slow).
~50 000 documents.

AND:
SELECT * from \my-data` WHERE field1=“value1” AND field2="value2"~14ms executionEXPLAIN SELECT * from `my-data` WHERE field1=“value1” AND field2=“value2”`

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexScan",
          "index": "myTestIndex",
          "index_id": "9bbe2a824e011185",
          "keyspace": "my-data",
          "namespace": "default",
          "spans": [
            {
              "Range": {
                "High": [
                  "\"value1\"",
                  "\"value2\""
                ],
                "Inclusion": 3,
                "Low": [
                  "\"value1\"",
                  "\"value2\""
                ]
              }
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "Fetch",
          "keyspace": "my-data",
          "namespace": "default"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Filter",
                "condition": "..."
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "self",
                    "star": true
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    }
  }
]

OR:
SELECT * from \my-data` WHERE field1=“value1” OR field2="value2"~2000ms executionEXPLAIN SELECT * from `my-data` WHERE field1=“value1” OR field2=“value2”`

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "PrimaryScan",
          "index": "#primary",
          "keyspace": "my-data",
          "namespace": "default",
          "using": "gsi"
        },
        {
          "#operator": "Fetch",
          "keyspace": "my-data",
          "namespace": "default"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Filter",
                "condition": "..."
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "self",
                    "star": true
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    }
  }
]

As you can see, with the OR statement, it doesn’t use myTestIndex, and (I guess) the fetch results in massive performance degradation.

You need two changes.

(1) Add another index on field2.

(2) Use UNION / UNION ALL instead of OR.

  1. Do you mean, an index on field2 only? Because I already have tried with either an index on both fields, or an index on each field (so, 2 indexes). Every time, performance was poor.

  2. I could never find an example or way of making UNION work with a COUNT.
    This works:
    SELECT * from \my-data` WHERE field1=“value1” UNION SELECT * from `my-data` WHERE field2=“value2”`

But I would like something like this:
SELECT COUNT(*) FROM (SELECT * from \my-data` WHERE field1=“value1” UNION SELECT * from `my-data` WHERE field2=“value2”)`

Thank you for the help :blush:

Can you apply all my previous suggestions and report the results? Your original question was not about COUNT. We can deal with that after your original question.

Thank you @geraldss, with UNION the request is fast again (50-60ms), and the execution plan does use my index.

However how can one use a COUNT with the UNION/INTERSECT syntax please?
And isn’t it weird that the execution plans are so different between OR and AND?

Actually I found a way to make a count on an UNION:

SELECT COUNT(*) FROM (SELECT * from `my-data` data WHERE field1="value1" 
UNION
SELECT * from `my-data` data WHERE field2="value2") unionResult

The key thing is you need to give a name to the outer FROM element (here, unionResult) otherwise N1QL blows without much help.

The root problem remains, though: how come OR gets an execution plan without covering indexes? While AND does.

1 Like

Yes, this is the recommended approach. To use a covering index, N1QL currently requires a single index, not two separate as required for OR.

If you’re simply doing COUNT(*), you can do the COUNT() in each subquery and do a SUM on the outer query. That’ll be significantly faster as well. If you’re using 4.5 or above, you should see IndexCountScan in your explain.

1 Like

Improvement coming in 4.6.1. https://issues.couchbase.com/browse/MB-22143

@keshav_m I don’t think simply adding the 2 counts works.
card(A U B) = card(A) + card(B) - card(A intersect B)
But I guess if I do the 3 counts (since A intersect B is fast) and then do the addition and subtraction, this will work… It just seems like a lot of manual pain when I would expect Couchbase to handle it for me.

@geraldss Thanks, it seems like it will definitely fix my performance issue if I understand it correctly.

1 Like