Select non-group by fields using sql query

I am new to couchbase and I have been going through couchbase documents and other online resources for a while but I could’t get my query working. Below is the data structure and my query:

Table1:
{ “jobId” : “101”, “jobName” : “abcd”, “jobGroup” : “groupa”, “created” : " “2018-05-06T19:13:43.318Z”, “region” : “dev” },
{ “jobId” : “102”, “jobName” : “abcd2”, “jobGroup” : “groupa”, “created” : " “2018-05-06T22:13:43.318Z”, “region” : “dev” },
{ “jobId” : “103”, “jobName” : “abcd3”, “jobGroup” : “groupb”, “created” : " “2018-05-05T19:11:43.318Z”, “region” : “test” }

I need to get the jobId which has the latest job information (max on created timestamp) for a given jobGroup and region (group by jobGroup and region).

My sql query doesn’t help me using self-join on jobId.
Query:

/* Idea is to pull out the job which was executed latest for all possible groups and region and print the details of that particular job*/
select * from (select max(DATE_FORMAT_STR(j.created,‘1111-11-11T00:00:00+00:00’)) as latest, j.jobGroup, j.region from table1 j group by jobGroup, region) as viewtable join table t on keys meta(t).id where viewtable.latest in t.created and t.jobGroup = viewtable.jobGroup and viewtable.region = t.region
Error Result: No result displayed
Desired result :
{
“jobId” : “102”,
“jobName”:“abcd2”,
“jobGroup”:“groupa”,
“latest” :“2018-05-06T22:13:43.318Z”,
“region”:“dev”
},
{
“jobId” : “103”,
“jobName” : “abcd3”,
“jobGroup” : “groupb”,
“created” : " “2018-05-05T19:11:43.318Z”,
“region” : “test”
}

SELECT MAX([created,jobId])[1] AS jobId 
FROM default WHERE ..  GROUP BY jobGroup, region;
1 Like

To expand on @vsr1’s answer, here is the reply I posted on StackOverflow:

This can be answered using ‘group by’ and no join. I tried entering your sample data and the following query gives the correct result:

select max([created,d])[1] max_for_group_region 
 from default d
 group by jobGroup, region;

How does it work? It uses ‘group by’ to group documents by jobGroup and region, then creates a two-element array holding, for every document in the group:

  • the ‘created’ timestamp field
  • the document where the timestamp came from

It then applies the max function on the set of 2-element arrays. The max of a set of arrays looks for the maximum value in the first array position, and if there’s a tie look at the second position, and so on. In this case we are getting the two-element array with the max timestamp.

Now we have an array [ timestamp, document ], so we apply [1] to extract just the document.

@vsr1 @eben :
Thank you very much for the query and explanation, it worked :slight_smile:

If you want select whole document Check out JOIN query in above link for better performance avoid unnecessary fetch (i.e make covered query for document that qualified and fetch those documents).