Filtering out the arrays depending on each other in the same result set

Hi, one of my queries have a result like this. I need to further process the result set by subquerying this result’s query.
Second item in the events list depends on the first events item.
If an item inside second events array does not have a correspondence with same device id and same day in the first events object, we need to filter it out. So the final result set must be sth like the second result set below. How I can do it? Maybe you can advice me to construct a different result set before this operation… My simple goal is to check if a device executed the second step in the same day of the execution of the first step of this device.

[
  {
    "events": [
      {
        "createdAt": "2017-02-08T08:53:19.879Z",
        "deviceId": "00000000-555a-4ce5-ffff-ffffaa4fac17",
        "stepNo": 0
      },
      {
        "createdAt": "2017-02-09T08:09:16.597Z",
        "deviceId": "00000000-555a-4ce5-ffff-ffffaa4fac17",
        "stepNo": 0
      }
    ],
    "stepNo": 0
  },
  {
    "events": [
      {
        "createdAt": "2017-02-08T08:36:49.404Z",
        "deviceId": "00000000-555a-4ce5-ffff-ffffaa4fac17",
        "stepNo": 1
      },
      {
        "createdAt": "2017-03-21T10:48:47.769Z",
        "deviceId": "00000000-555a-4ce5-ffff-ffffaa4fac17",
        "stepNo": 1
      },
      {
        "createdAt": "2017-03-21T12:01:31.638Z",
        "deviceId": "00000000-555a-4ce5-ffff-ffffe6e46228",
        "stepNo": 1
      },
      {
        "createdAt": "2017-03-21T12:34:15.345Z",
        "deviceId": "00000000-555a-4ce5-ffff-ffffe6e46228",
        "stepNo": 1
      }
    ],
    "stepNo": 1
  }
]

Final output:

[
  {
    "events": [
      {
        "createdAt": "2017-02-08T08:53:19.879Z",
        "deviceId": "00000000-555a-4ce5-ffff-ffffaa4fac17",
        "stepNo": 0
      },
      {
        "createdAt": "2017-02-09T08:09:16.597Z",
        "deviceId": "00000000-555a-4ce5-ffff-ffffaa4fac17",
        "stepNo": 0
      }
    ],
    "stepNo": 0
  },
  {
    "events": [
      {
        "createdAt": "2017-02-08T08:36:49.404Z",
        "deviceId": "00000000-555a-4ce5-ffff-ffffaa4fac17",
        "stepNo": 1
      }
    ],
    "stepNo": 1
  }
]

Is this extension of Partition the result set and create multiple arrays

In fact yes. This is the next step of the operation. Basically, I have millions of events and I am trying to filter out the events that is out of order and occurred outside of a time window.

Try this . Even you can add predicate in where clause.
Ex. d.stepNo IS NOT NULL AND d.deviceId = “…” AND d.createdAt BETWEEN …

SELECT d.stepNo, ARRAY_AGG({d.setpNo, d.createdAt, d.deviceId}) AS events FROM  default d
                               WHERE d.stepNo IS NOT NULL
                               GROUP by d.stepNo, d.deviceId, DATE_FORMAT_STR(d.createdAt, '1111-11-11');

I tried this, it did not output what i expected. What OBJECT "StepNo_"||TOSTRING(v[0].stepNo):v FOR v IN exactly does?

And let me further explain the problem: This steps are totally dynamic and we may have 10s of steps each depends on the first step’s execution time (createdAt value) and existence of the previous step. Each step must occur in the same day of the first step and no step can be skipped. If skipped, we need to filter this item out.

Is it possible post the query that generated input because doing in that query might be easier.

Sure, the query that outputs the first result set is this:

select "step" || TOSTRING(x.stepNo) as stepNo, ARRAY_AGG(x) as arr from (
select   b.deviceId, b.eventName,
ARRAY_POSITION(
["X", "Y", "Z"], b.eventName) as stepNo
from appconnect b
where b.appId= "fgb" and b._class = EventEntity" and b.eventName in (["X", "Y", "Z"] ) ) as x 
group by x.stepNo)

From events array you can construct new events array as follows.

   ARRAY v FOR v IN events WHEN events[0].deviceId  == v.deviceId AND
                                        SUBSTR(events[0].createdAt, 0, 10) == SUBSTR(v.createdAt, 0, 10) END

OR try this query 

Updated query from post 10. This may not give desired results

    SELECT "step" || TOSTRING(stepNo) AS stepNo,
       ARRAY v FOR v IN events WHEN events[0].deviceId  == v.deviceId AND
             SUBSTR(events[0].createdAt, 0, 14) == SUBSTR(v.createdAt, 0, 14) END AS myevents
FROM appconnect AS b
LET stepNo = ARRAY_POSITION(["X", "Y", "Z"], b.eventName)
WHERE b.appId= "fgb" AND b._class = "EventEntity" and b.eventName in ["X", "Y", "Z"]
GROUP BY stepNo
LETTING events = ARRAY_AGG({b.deviceId, b.createdAt, stepNo})

What version of Couchbase you are using? Do you want result for given deviceId and given createdAt date or all of them.

For all of the devices. Here is my working query. I could not figure out why the result is not is correct but possibly since I could not order the result set by stepNo. But I could not figure out where to put it without doing a subquery. Doing a subquery just for ordering does not sound ok to me.

SELECT "step" || TOSTRING(stepNo) AS stepNo,
       ARRAY v FOR v IN events WHEN events[0].deviceId  == v.deviceId AND
             SUBSTR(events[0].createdAt, 0, 14) == SUBSTR(v.createdAt, 0, 14) END AS myevents
FROM bucketname AS b
LET stepNo = ARRAY_POSITION(["X", "Y", "Z"], b.eventName)
WHERE b.appId= "fgb" AND b._class = "EventEntity" and b.eventName in ["X", "Y", "Z"]
GROUP BY stepNo 
LETTING events = ARRAY_AGG({b.deviceId, b.createdAt, stepNo})

What is couchbase version you are using?

It is 4.6.2-3905 Enterprise Edition (build-3905)

Is output needs to be sorted with in given date?

No, date order is not required. I suppose we need ordering of stepNo since we assume the items with stepNo 0 lays in 0th index here:
SUBSTR(events[0].createdAt, 0, 14)

Ant the hardest part is, if a device has no item at step 2 in given day, step 3 item of this device should be filtered out.

INSERT INTO default
    VALUES("a001",{ "createdAt": "2017-02-08T08:53:19.879Z", "deviceId": "00000000-555a-4ce5-ffff-ffffaa4fac17", "eventName": "X" }),
    VALUES("a002",{ "createdAt": "2017-02-09T08:09:16.597Z", "deviceId": "00000000-555a-4ce5-ffff-ffffaa4fac17", "eventName": "X" }),
    VALUES("a003",{ "createdAt": "2017-02-08T08:36:49.404Z", "deviceId": "00000000-555a-4ce5-ffff-ffffaa4fac17", "eventName": "Y" }),
    VALUES("a004",{ "createdAt": "2017-03-21T10:48:47.769Z", "deviceId": "00000000-555a-4ce5-ffff-ffffaa4fac17", "eventName": "Y" }),
    VALUES("a005",{ "createdAt": "2017-03-21T12:01:31.638Z", "deviceId": "00000000-555a-4ce5-ffff-ffffe6e46228", "eventName": "Y" }),
    VALUES("a006",{ "createdAt": "2017-03-21T12:34:15.345Z", "deviceId": "00000000-555a-4ce5-ffff-ffffe6e46228", "eventName": "Y" });


SELECT d.deviceId, ARRAY {v[0].stepNo, "events": v} FOR v IN ns WHEN ARRAY_LENGTH(v) > 0 AND (v[0].stepNo == 0 OR  v[0].stepNo == ns[v[0].stepNo-1][0].stepNo +1 ) END AS steps
FROM (
      SELECT d.deviceId, (SELECT RAW step FROM d.steps AS step ORDER BY step.stepNo) AS steps
      FROM (
           SELECT d.deviceId, ARRAY_AGG({d.events, d.stepNo}) AS steps
           FROM (
                 SELECT d.deviceId, d.stepNo, (SELECT RAW event FROM d.events AS event ORDER BY event.createdAt) AS events
                 FROM (
                       SELECT b.deviceId, stepNo, ARRAY_AGG({b.deviceId, b.createdAt, stepNo}) AS events
                       FROM default AS b
                       LET stepNo = ARRAY_POSITION(["X", "Y", "Z"], b.eventName)
                       WHERE stepNo >= 0
                       GROUP BY b.deviceId, stepNo ) AS d
                ) AS d
          GROUP BY d.deviceId) AS d) AS d
LET ns = ARRAY (ARRAY e FOR e IN s.events WHEN e.stepNo == 0  OR SUBSTR(e.createdAt, 0, 10) == SUBSTR(d.steps[0].events[0].createdAt, 0, 10) END) FOR s IN d.steps END;

Above query works 4.6.2 or above because it uses subquery expressions. steps are sorted by stepNo if previous stepNo is not present it removed and events inside step are sorted by createdAt.

Each deviceId it gives separate row. when no qualified steps it gives empty array of that device.

Also check why SUBSTR(createdAt, 0, 14) needed (because it matches date+hour) Is it needs SUBSTR(e.createdAt, 0, 10) for just date

Thank you for your great answer. The only missing part is my first question.
The items in step 1 should match any item but not only the first item in step 0. That means, for below values, a003 should also exist in the query because 2017-04-09T08: exist in step 0.

Can we check all items instead of d.steps[0].events[0].createdAt.

I have changed the query according to my requirements as below.

The only missing part is that: I want to truncate createdAt dates to minutes. But when I use

ARRAY_AGG({b.deviceId, stepNo, DATE_TRUNC_MILLIS(millis(b.createdAt), 'minute')})

instead of
ARRAY_AGG({b.deviceId, stepNo, b.createdAt})

I get an error

Object member missing name or value: date_trunc_millis(str_to_millis((b.createdAt)), "minute") - at }

Can’t we use functions in ARRAY_AGG?

SELECT d.deviceId, ARRAY {v[0].stepNo, "events": v} FOR v IN ns WHEN ARRAY_LENGTH(v) > 0 AND (v[0].stepNo == 0 OR  v[0].stepNo == ns[v[0].stepNo-1][0].stepNo +1 ) END AS steps
    FROM (
          SELECT d.deviceId, (SELECT RAW step FROM d.steps AS step ORDER BY step.stepNo) AS steps
          FROM (
               SELECT d.deviceId, ARRAY_AGG({d.events, d.stepNo}) AS steps
               FROM (
                     SELECT d.deviceId, d.stepNo, (SELECT RAW event FROM d.events AS event) AS events
                     FROM (
                           SELECT b.deviceId, stepNo, ARRAY_AGG({b.deviceId, stepNo, b.createdAt}) AS events
                           FROM appconnect AS b
                           LET stepNo = ARRAY_POSITION(["X", "Y"], b.eventName)
                           WHERE stepNo >= 0 and _class = "EventEntity" and eventName in ["X", "Y"]
                           GROUP BY b.deviceId, stepNo  ) AS d
                    ) AS d
              GROUP BY d.deviceId) AS d) AS d
    LET ns = ARRAY (ARRAY e FOR e IN s.events 
    WHEN e.stepNo == 0
     OR 
     (ANY f IN d.steps[0].events satisfies DATE_DIFF_MILLIS(MILLIS(e.createdAt), MILLIS(f.createdAt), "second") < 500 AND DATE_DIFF_MILLIS(MILLIS(e.createdAt), MILLIS(f.createdAt), "second") > 0 END  ) END) FOR s IN d.steps END;

Please check your data.

INSERT INTO default VALUES("a001",{ "createdAt": "2017-02-08T08:53:19.879Z", "deviceId": "00000000-555a-4ce5-ffff-ffffaa4fac17", "eventName": "X" }), 
VALUES("a002",{ "createdAt": "2017-04-09T08:09:16.597Z", "deviceId": "00000000-555a-4ce5-ffff-ffffaa4fac17", "eventName": "X" }),
 VALUES("a003",{ "createdAt": "2017-04-08T08:36:49.404Z", "deviceId": "00000000-555a-4ce5-ffff-ffffaa4fac17", "eventName": "Y" }), 
VALUES("a004",{ "createdAt": "2017-03-21T10:48:47.769Z", "deviceId": "00000000-555a-4ce5-ffff-ffffaa4fac17", "eventName": "Y" }),
 VALUES("a005",{ "createdAt": "2017-03-21T12:01:31.638Z", "deviceId": "00000000-555a-4ce5-ffff-ffffe6e46228", "eventName": "Y" }), 
VALUES("a006",{ "createdAt": "2017-03-21T12:34:15.345Z", "deviceId": "00000000-555a-4ce5-ffff-ffffe6e46228", "eventName": "Y" });

“a003” as “createdAt”: “2017-04-08T08:36:49.404Z” and not present in step0 step 0 has “2017-02-08”, “2017-04-09”

If need to all dates use this.

SELECT d.deviceId, ARRAY {v[0].stepNo, "events": v} FOR v IN ns WHEN ARRAY_LENGTH(v) > 0 AND (v[0].stepNo == 0 OR  v[0].stepNo == ns[v[0].stepNo-1][0].stepNo +1 ) END AS steps
FROM (
      SELECT d.deviceId, (SELECT RAW step FROM d.steps AS step ORDER BY step.stepNo) AS steps
      FROM (
           SELECT d.deviceId, ARRAY_AGG({d.events, d.stepNo}) AS steps
           FROM (
                 SELECT d.deviceId, d.stepNo, (SELECT RAW event FROM d.events AS event ORDER BY event.createdAt) AS events
                 FROM (
                       SELECT b.deviceId, stepNo, ARRAY_AGG({b.deviceId, b.createdAt, stepNo}) AS events
                       FROM default AS b
                       LET stepNo = ARRAY_POSITION(["X", "Y", "Z"], b.eventName)
                       WHERE stepNo >= 0
                       GROUP BY b.deviceId, stepNo ) AS d
                ) AS d
          GROUP BY d.deviceId) AS d) AS d
LET ns = ARRAY (ARRAY e FOR e IN s.events WHEN e.stepNo == 0  OR SUBSTR(e.createdAt, 0, 10) IN (ARRAY SUBSTR(dt.createdAt, 0, 10) FOR dt IN d.steps[0].events END) END ) FOR s IN d.steps END;

In object construction explicit name required for expressions.

Use ARRAY_AGG({b.deviceId, stepNo, “createdAtMillis”:DATE_TRUNC_MILLIS(millis(b.createdAt), ‘minute’)})

Thanks so much. I really really thank for your great help.