Query to fetch items at certain interval between them (if it exist)

Hi couchbasia!

Was wonder about the plausibility of a query type, suppose the I have a (Array) document:
[{ timestamp: ““2020-11-10T18:00:03.437611Z””, values: {…}}, { timestamp: ““2020-11-10T18:02:03.437611Z””, values: {…}}, { timestamp: ““2020-11-10T18:04:04.437611Z””, values: {…}}, { timestamp: ““2020-11-10T18:10:03.437611Z””, values: {…}}…]

I want to query the document and fetch items which have a 5 minutes interval between them, so from the above document i am suppose to be left with only two (the 1st and last entries) after executing the query.
How might you go about it? any help would very appreciated

  • Not sure know if its the correct place to post

PS this is a solution as a view

function (doc, meta) {
var iv = doc.length > 0 ? new Date(doc[0].timestamp) : new Date();

for(var n in doc)
{
var e = doc[n];
var tm = new Date(e.timestamp);

if((iv.getTime() + 5 * 60 * 1000) < tm.getTime())
{
  iv = tm;
  emit(null, e);
}

}
}

thanks!

i did not used the view since i do not know how to pass parameters to it (using .net sdk)

Extract integer minutes from timestamp and use integer division / modulo
WHERE TONUMBER(DATE_PART_STR(timestamp, ‘minute’)) % 5 = 0

ARRAY  (FIRST   u1
        FOR u2 IN arr1 
        WHEN  ABS(MILLIS(u1.timestamp)-MILLIS(u2.timestamp)) > 5 *60*1000  
        END)
FOR u1 IN arr1 
END

It includes all documents that has 5 minute apart even there are multiple in same minute

what if i have a few in that same minute? its exactly what I am hopeful of learning, is there a way i could preserve some state that the where clause would rely on ?

this returns only the two first entries regardless of any other that might exist…
if the initial (Array) document would have been bigger, this query would have reflected it but with two u1\u2 entries with the same first 2 values detected but not incrementally

Try this it will work. There are two loops. Outer loop walks through all elements. If there is match with inner loop it included element and terminate inner loop for that element no need check other elements.

ARRAY  (FIRST   u1
        FOR u2 IN arr1 
        WHEN  ABS(MILLIS(u1.timestamp)-MILLIS(u2.timestamp)) > 5 *60*1000  
        END)
FOR u1 IN arr1 
END

Thank you for the fast reply(!)
was not exactly sure how you meant i should put it, did smth like this:

SELECT RAW x FROM [0] dd
JOIN (
SELECT RAW entry
FROM Data entries UNNEST entries entry
WHERE STR_TO_MILLIS(entry.timestamp) >= STR_TO_MILLIS(“2020-11-10T18:00:03.437611Z”)
AND STR_TO_MILLIS(entry.timestamp) < STR_TO_MILLIS(“2020-11-10T19:17:40.7401607Z”)
) arr1 ON TRUE
LET x = ARRAY (FIRST u1
FOR u2 IN arr1
WHEN ABS(MILLIS(u1.timestamp)-MILLIS(u2.timestamp)) > 5601000
END
) FOR u1 IN arr1 END

got an array of 364 nulls, wondering what am i missing
again, thanks for the help fellas, I appreciate it immensely

Your original post expected is wrong. because 00, 03, 04, 10 minutes. All 4 qualifies because first 3 are 5 min apart from last

Change last one 18:06:03 it will get right results

SELECT ARRAY  (FIRST   u1 FOR u2 IN arr1 WHEN  ABS(MILLIS(u1.timestamp)-MILLIS(u2.timestamp)) > 5 *60*1000  END) FOR u1 IN arr1 END AS res
LET arr1 = [{ "timestamp": "2020-11-10T18:00:03.437611Z"}, { "timestamp": "2020-11-10T18:02:03.437611Z"}, { "timestamp": "2020-11-10T18:04:04.437611Z"}, { "timestamp": "2020-11-10T18:06:03.437611Z"}];

 "results": [
    {
        "res": [
            {
                "timestamp": "2020-11-10T18:00:03.437611Z"
            },
            {
                "timestamp": "2020-11-10T18:06:03.437611Z"
            }
        ]
    }
    ]
SELECT
    ARRAY (FIRST  u1 FOR u2 IN d.entries
           WHEN u2.timestamp >= "2020-11-10T18:00:03.437611Z" AND u2.timestamp < "2020-11-10T19:17:40.7401607Z"
                AND ABS(MILLIS(u1.timestamp)-MILLIS(u2.timestamp)) > 5601000
           END)
    FOR u1 IN d.entries
    WHEN u1.timestamp >= "2020-11-10T18:00:03.437611Z" AND u1.timestamp < "2020-11-10T19:17:40.7401607Z"
    END AS res
FROM Data AS d
WHERE ANY v IN d.entries SATISFIES v.timestamp >= "2020-11-10T18:00:03.437611Z" AND v.timestamp < "2020-11-10T19:17:40.7401607Z" END;

CREATE INDEX ix1 ON Data( DISTINCT e.timestamp FOR e IN entries END);

Right, but I still think it is a typical statistics/analytics problem very close to building a histogram. Split your data into 5 minute bins and pick an item with max timestamp from each bin. It is not clear from your description how to treat gaps longer then 5 minutes though.

Hi @skaryshev,

If you need that you can try this. Grouping 0, 5th, 10th, 15th,… minute and projecting latest timestamp in the group

SELECT MAX([e.timestamp,e])[1].* 
FROM default AS e
WHERE ..............
GROUP BY IDIV(MILLIS(e.timestamp),300000)

Thank you, @vsr1, Your solution is way above my level of expertise in N1QL. I do not need it right now but looking for implementing some standard analytics tasks in Community Edition lacking Analytics module. :slight_smile:

Hi fellas, ended up going another way by mix and matching, thanks for everyone’s time!

FROM DATE_RANGE_STR(“2020-11-10T18:00:03.437611Z”, “2020-11-10T19:17:40.7401607Z”, ‘minute’, 5) time
LET
entries = (FROM TheData ens UNNEST ens e
WHERE META(ens).id LIKE “Entry_92c8546a-7564-45c1-8f60-e7a37987f580_3f30f81d-4b54-4693-bb3e-76969eee1e90_%”
AND MILLIS(e.timestamp) >= MILLIS(“2020-11-10T18:00:03.437611Z”)
AND MILLIS(e.timestamp) < MILLIS(“2020-11-10T19:17:40.7401607Z”)
SELECT RAW e)
, sel_ = FIRST v FOR i:v IN entries WHEN MILLIS(v.timestamp) > MILLIS(time) END
WHERE sel_ IS NOT NULL
SELECT RAW ARRAY_AGG(sel_)
LIMIT 1000

hopes this help somebody

  • i tried to join the subquery (TheData) but to no avail, hopefully this will just keep getting the data from a cache… will follow up
    again thanks for the time!