Getting distinct on the basis of a field with other fields

Gautam_Singhania · June 18, 2020, 9:25am

I have a document structure which looks something like this:

{
...
     "groupedFieldKey": "groupedFieldVal",
     "otherFieldKey": "otherFieldVal",
     "filterFieldKey": "filterFieldVal"
...
}

I am trying to fetch all documents which are unique with respect to groupedFieldKey. I also want to fetch otherField from ANY of these documents. This otherFieldKey has minor changes from one document to another, but I am comfortable with getting ANY of these values.

SELECT DISTINCT groupedFieldKey, otherField
FROM bucket
WHERE filterFieldKey = "filterFieldVal";

This query fetches all the documents because of the minor variations.

SELECT groupedFieldKey, maxOtherFieldKey 
FROM bucket
WHERE filterFieldKey = "filterFieldVal"
GROUP BY groupFieldKey
LETTING maxOtherFieldKey= MAX(otherFieldKey);

This query works as expected, but is taking a long time due to the GROUP BY step. As this query is used to show products in UI, this is not a desired behaviour. I have tried applying indexes, but it has not given fast results.

Actual details of the records:

Number of records = 100,000
Size per record = Approx 10KB
Time taken to load the first 10 records: 3s

Is there a better way to do this? A way of getting DISTINCT only on particular fields will be good.

vsr1 · June 18, 2020, 1:51pm

GROUP must materialize all the documents. DISTINCT on few fields not possible. You can try covering index

CREATE INDEX ix1 ON bucket(filterFieldKey, groupFieldKey, otherFieldKey);

Gautam_Singhania · June 19, 2020, 5:03am

I have created a covering index which is not helping. The bottleneck is the GROUP stage.
Look at the screenshot of text plan.