N1QL "IN" query seems to have finite limit

vaiek · March 2, 2016, 4:15pm

I have a query like this, where metadata.another_field is the key for a global secondary index I have defined:

SELECT metadata.some_field FROM MY_BUCKET WHERE metadata.another_field IN  ["KEY1", "KEY2"]

I confirmed from the explain statement that it is using intended index, and the query works fine. When I explain this query I note this pattern:

           "spans": [
              {
                "Range": {
                  "High": [
                    "\"KEY1\""
                  ],
                  "Inclusion": 3,
                  "Low": [
                    "\"KEY1\""
                  ]
                },
                "Seek": null
              },
              {
                "Range": {
                  "High": [
                    "\"KEY2\""
                  ],
                  "Inclusion": 3,
                  "Low": [
                    "\"KEY2\""
                  ]
                },
                "Seek": null
              }
            ],
            "using": "gsi"
          }
        ]

However, if I supply more than about 17 keys, the spans component changes to:

        "spans": [
          {
            "Range": {
              "High": null,
              "Inclusion": 1,
              "Low": [
                "null"
              ]
            },
            "Seek": null
          }
        ],

And then a filter occurs on what appears to be the entire dataset.

As a result, fetching 10 items takes about 30ms, whereas fetching 100 items takes 30seconds (1000 times as long for 10 times the items).

Is there a better way to fetch items by specific keys from a global secondary index? Or is there a reason the query strategy changes (and become slow) if two many items are listed in the “IN” array?

geraldss · March 2, 2016, 10:48pm

Hello,

Yes. This is a heuristic in the planner. The reason is that IN is converted to OR, and OR can be combined with other logic to produce very complex planner combinations, so OR is heuristically bounded.

You can work around this using UNION ALL: put about 10 or 12 keys in each subquery, and then combine the subqueries using UNION ALL.