Returning distinict document matching the aggregate MAX result with GROUP BY

I have a set of score records, say:

{ 
  "userId":"user_123", 
  "score":123.4 
},
{ 
  "userId":"user_123", 
  "score":234.5 
}

And I am selecting them using:

select distinct userId, meta(mybucket).id, max(score) as maxscore from mybucket 
where score is not null and score is not missing
group by userId
order by score desc

The results returned will return a record with correct max(score) for the distinct userId, but, the meta(mybucket).id and any other fields unique per record will not be from the record with max score - they will be from any record in the set of records for that distinct userId.

How can I return only one record where score matches the max(score) for distinct userId?

Here is a live example using tutorial:

There are two records with lname ‘Jackson’:

select fname, lname, age from tutorial where lname like 'Jackson'

{
  "results": [
    {
      "age": 18,
      "fname": "Fred",
      "lname": "Jackson"
    },
    {
      "age": 20,
      "fname": "Harry",
      "lname": "Jackson"
    }
  ]
}

When I select with grouping by lname and max(age) i get expected result. Except distinct will return any one of the records within the set. So Fred is actually 18, but is returned here by distinict.

select distnict lname, fname, max(age) as maxage from tutorial 
group by lname 
order by age desc

{
  "results": [
    {
      "fname": "Ian",
      "maxage": 56
    },
    {
      "fname": "Dave",
      "maxage": 46
    },
    {
      "fname": "Earl",
      "maxage": 46
    },
    {
      "fname": "Jane",
      "maxage": 40
    },
    {
      "fname": "Fred",
      "maxage": 20
    }
  ]
}

How can I make sure that the returned result is pulling fname from the record with max(age) instead of one of the records from the group by set?

Hi @enko, you can do the following:

SELECT userId, MAX( [ score, META(mybucket).id ] ) AS pair
FROM mybucket
WHERE score IS NOT NULL
GROUP BY userId
ORDER BY pair DESC;
1 Like

Gerald thank you!

Ended up with:

select userId, 
max([score,meta(mybucket).id])[0] as score,
max([score,meta(mybucket).id])[1] as scoreId,
nick
from mybucket
where score is not null and app = 'my_app' 
group by userId
order by score desc
limit 33 offset 67
1 Like

Hi @geraldss,

Given that I need to return multiple rows, what is the best syntax for this?

@nacion.emelson,

Above query returns multiple documents; one per group. Post your question separate thread.

If you need each group multiple documents you must use ARRAY_AGG() and further trim using sub query expression.

SELECT d.userid, 
       (SELECT d1.id, d1.b.* FROM d.docs AS d1
         WHERE ....
         ORDER BY .......
         OFFSET ...
         LIMIT .......) AS docs
FROM (SELECT userId, ARRAY_AGG({"id":META(b).id, b}) AS docs
      FROM mybucket AS b
      WHERE b.score IS NOT NULL
      GROUP BY b.userId) AS d;

@vsr1,

Sorry but I mean, multiple fields. I have something like this:

SELECT MAX(product.dateCreated) date,
       MAX([product.dateCreated, sku.color])[1] color,
       MAX([product.dateCreated, sku.size])[1] size,
       MAX([product.dateCreated, sku.price])[1] price,
       MAX([product.dateCreated, sku.media])[1] media,
       MAX([product.dateCreated, product.category])[1] category,
       MAX([product.dateCreated, product.description])[1] description
FROM  default product
JOIN default sku
ON KEYS product.skuId
WHERE product.docType = "PRODUCT"
AND product.vendorId IN (SELECT RAW vendorId FROM default WHERE docType = "FOLLOWER" AND customerId = "CUSTOMER::01")
GROUP BY product.vendorId 
ORDER BY MAX(product.dateCreated) DESC

I 'm wondering if there’s a better approach with my query…
I need to align the fields that matches the latest dateCreated row (same with OP’s concern)

I think this is directly related to the thread. Advise if I need to post this as new thread.

Thank you!

@nacion.emelson,

WITH vendorIds AS (SELECT RAW vendorId FROM default WHERE docType = "FOLLOWER" AND customerId = "CUSTOMER::01")
SELECT mx.*
FROM  default product
JOIN default sku ON KEYS product.skuId
WHERE product.docType = "PRODUCT"
AND product.vendorId IN vendorIds
GROUP BY product.vendorId
LETTING mx = MAX([product.dateCreated, {sku.color, sku.size, sku.price, sku.media, product.category, product.description, "date":product.dateCreated}])[1]
ORDER BY mx.date DESC

Pre 6.5 In-line the subquery

1 Like