Indexing a distinct

Hi,

Below is a query that is currently taking 9 seconds to run and I was hoping to get some help in reducing the execution time.

query = 'select distinct type from default'

running an explain shows this:

[
{
 "#operator": "Sequence",
"~children": [
  {
    "#operator": "PrimaryScan",
    "index": "#primary",
    "keyspace": "default",
    "namespace": "default",
    "using": "gsi"
  },
  {
    "#operator": "Parallel",
    "~child": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "Fetch",
          "keyspace": "default",
          "namespace": "default"
        },
        {
          "#operator": "InitialProject",
          "distinct": true,
          "result_terms": [
            {
              "expr": "(`default`.`type`)"
            }
          ]
        },
        {
          "#operator": "Distinct"
        },
        {
          "#operator": "FinalProject"
        }
      ]
    }
  },
  {
    "#operator": "Distinct"
  }
]
  }
]

It does not seem to be using the index on ‘type’.

This my index definition:
CREATE INDEX docType ON default(type) USING GSI

Any suggestions on how I can make the query run much faster?

Thanks!

DISTINCT processing is done by the index. However you do need to specify the following in the where clause as GSI does not index MISSING or NULL values;

select distinct type from default where type is not missing and type is not null;

@cihangirb thanks for the help.

sadly, adding those two where clauses increased the execution time from 9 seconds to 11 seconds :frowning: . would a view work better than a GSI in this case?

i’m still not 100% sure the difference between a view and a GSI - at a high level i would think a GSI is less overhead on the db - is that true?

Hi,

We are adding covering indexes to address this issue. As @cihangirb indicated, here is the correct query, which will automatically benefit when we add covering indexes:

select distinct type from default where type is not null;

A slight tweak to @cihangirb’s note: GSI does not index MISSING, but it does index NULL.

Thanks @geraldss

Do you have any rough ETA for this? I can’t throw this query into production because of the 9-10 second wait time and the load on the db.

Currently,

select distinct type from default where type is not null;

takes 10.4 seconds to return data and this:

select distinct type from default;

takes 9.1 seconds. @cihangirb query is at 11.4.

Anyway around this issue until covering indexes are live?

For production, it will be a couple of months. We will release a DP (developer preview) version before that.

1 Like

If the entire bucket has docs with type and not a lot of documents are missing type, the index adds an additional hop. scanning the bucket seems more efficient.

Hi,

What is the total number of documents being returned with the query. If the number of documents is large then the fetch between query engine and the couchbase server could be the bottleneck.

If that is the case then we would try increasing some of the system parameters to increase the fetch throughput

Cheers,
Manik

@manik,

the result set, currently is only about 20-30 rows. not much at all.

we have renamed nickel to nyquill (nyql) until we get proper indexes from couchbase :slight_smile: - joking aside, n1ql is very useful and we would love to see couchbase push to massively improve performance. we’ll keep using it in the meantime, but with the nyquill name since it does put a few devs to sleep.

we did a test where we indexed every single query we use in the mentioned script - used explain to double check if the indexes were being used, and then guess what - script execution time went from 45secs to 66secs :smile:. we then deleted all the indexes.

cheers

Hi @whollycow07,

Well that nyquill for you is like red bull for us. What is the total number of documents in the bucket ? Even if the final result-set is small, the query could still be fetching a large number of documents.

You also could try using view indexes and see if that improves things. I doubt that it would but it’d be worth a try. To create an index using view you need to do replace USING GSI with USING VIEW.

Cheers,
Manik