Generate Distinct Count for Each Interval inside an array

Hi,

I’ve the following query:

SELECT RAW MILLIS_TO_STR(range, 'YYYY-MM-DDThh:mm:ss.sTZD')
FROM DATE_RANGE_MILLIS(
    STR_TO_MILLIS("2016-05-16T00:00:00Z"), 
    STR_TO_MILLIS("2016-05-17T00:01:00Z"), 
    'minute', 
    15
) as range;

This basically creates an array of timestamps starting from 00:00:00 until next day’s start. I’m trying to distinct count uid’s for every 15 minutes. Something like this:

[
    {"count": 15, "createdAt": "2017-12-21T00:00:00Z"},
    {"count": 23, "createdAt": "2017-12-21T00:15:00Z"},
    ...
]

I think this is possible using WHERE FOR t IN ARRAY ... but I couldn’t construct the exact query. Any ideas?

SELECT {"count": COUNT(1), "createdAt": MILLIS_TO_STR(range, 'YYYY-MM-DDThh:mm:ss.sTZD') }
FROM DATE_RANGE_MILLIS(
    STR_TO_MILLIS("2016-05-16T00:00:00Z"), 
    STR_TO_MILLIS("2016-05-17T00:01:00Z"), 
    'minute', 
    15
) as range
GROUP BY range;

Sorry, my mistake. This is the actual data:

[
    ...
   {
        "_class": "com.bar.FooBar",
        "createdAt": "2017-12-19T22:40:45.497Z",
        "userId": "abc123"
   },
   {
        "_class": "com.bar.FooBar",
        "createdAt": "2017-12-19T22:42:45.497Z",
        "userId": "abc123"
   },
   {
        "_class": "com.bar.FooBar",
        "createdAt": "2017-12-19T22:43:45.135Z",
        "userId": "bad22"
   },
   {
        "_class": "com.bar.FooBar",
        "createdAt": "2017-12-19T23:01:45.497Z",
        "userId": "asdasda"
   },
    ...
]

I’d like to aggregate these values by grouping every 15 minutes. So I’d like to achieve something like this:

[
   ...
   {
        "createdAt": "2017-12-19T22:45:00.000Z",
        "count": 2
   },
   {
        "createdAt": "2017-12-19T23:00:00.000Z",
        "count": 0
   },
   {
        "createdAt": "2017-12-19T23:15:00.000Z",
        "count": 1
   },
   ...
]

That’s why I was trying to generate this 15 minute intervals using below query:

SELECT RAW MILLIS_TO_STR(range, 'YYYY-MM-DDThh:mm:ss.sTZD')
FROM DATE_RANGE_MILLIS(
    STR_TO_MILLIS("2016-05-16T00:00:00Z"), 
    STR_TO_MILLIS("2016-05-17T00:01:00Z"), 
    'minute', 
    15
) as range;
SELECT time AS createdAt, COUNT(1) AS count FROM default d
LET time = CASE WHEN DATE_PART_STR(d.createdAt,'minute') < 15 THEN SUBSTR("2017-12-19T22:40:45.497Z",0,14)||"00:00Z"
                WHEN DATE_PART_STR(d.createdAt,'minute') < 30 THEN SUBSTR("2017-12-19T22:40:45.497Z",0,14)||"15:00Z"
                WHEN DATE_PART_STR(d.createdAt,'minute') < 45 THEN SUBSTR("2017-12-19T22:40:45.497Z",0,14)||"30:00Z"
                ELSE  SUBSTR(d.createdAt,0,14)||"45:00Z" END
GROUP BY time;

If you need all time slots

SELECT time AS createdAt,
SUM(CASE WHEN DATE_DIFF_MILLIS(MILLIS(d.createdAt),MILLIS(time),'millisecond') BETWEEN 0 AND 899999 THEN 1 ELSE 0 END) AS count
FROM default AS d UNNEST DATE_RANGE_STR("2016-05-16T00:00:00Z", "2016-05-17T00:00:00Z", 'minute', 15) AS time
GROUP BY time;
1 Like

Hi @vsr1,

Thank you for your support! I still need a minor tweak. Is there a way to distinctively count userId’s within this query? Like you said, I need all the time slots.

Hi @vsr1,

I found a very similar question on the forum which you’ve replied. As a result, I came up with the following query which seems to work:

SELECT time AS createdAt,
COUNT(DISTINCT (CASE WHEN DATE_DIFF_MILLIS(MILLIS(d.createdAt), MILLIS(time),'millisecond') BETWEEN 0 AND 899999 THEN a.userId ELSE NULL END)) AS count
FROM default AS d UNNEST DATE_RANGE_STR("2017-12-21T00:00:00Z", "2017-12-22T00:00:01Z", 'minute', 15) AS time
WHERE b.createdAt between "2017-12-21" and "2017-12-22"
GROUP BY time
ORDER BY time;

Thank you very much!

Hi @vsr1,

The final query is like below:

SELECT time AS createdAt,
    COUNT(DISTINCT (CASE WHEN DATE_DIFF_MILLIS(MILLIS(d.createdAt), MILLIS(time), 'millisecond') BETWEEN 0 AND 899999 THEN d.userId ELSE NULL END)) AS uniqueUsers,
    SUM(CASE WHEN DATE_DIFF_MILLIS(MILLIS(d.createdAt), MILLIS(time), 'millisecond') BETWEEN 0 AND 899999 THEN 1 ELSE 0 END) AS sessionCount
FROM default AS d
UNNEST DATE_RANGE_STR("2017-11-10T00:00:00Z", "2017-11-11T00:00:01Z", 'minute', 15) AS time
WHERE d._class = "com.foo.FooBar"
    and d.application = "foo"
    and d.userType = "user"
    and d.createdAt between "2017-11-10" and "2017-11-11"
GROUP BY time
ORDER BY time;

However while sessionCount is correct, activeUsers is always zero. Any ideas?

there is no activeUsers in projection. I don’t see any issue. May be break query and try.
Each group try ARRAY_AGG(…) and see what values going into aggregates and debug.

make sure userId is correct case sensitive

1 Like