How to merge and aggregate nested arrays?


#1

I have a stream of data that will contain pieces of information updated over time:

{
“session”: “d5f08f4c-37d7-4fba-ab78-045844aa98de”,
“things”: [
{
“thingId”: “1”,
“time”: 1461343192396,
“value”: “something”
},
{
“thingId”: “2”,
“time”: 1461343192396,
“value”: “23”
}
]
},
{
“session”: “d5f08f4c-37d7-4fba-ab78-045844aa98de”,
“things”: [
{
“thingId”: “3”,
“time”: 1461343774121,
“value”: “foo”
},
{
“thingId”: “2”,
“time”: 1461343775672,
“value”: “bar”
}
]
}

I need to produce a single object per session that contains all of the most recent thingIds, something like:

{
“session”: “d5f08f4c-37d7-4fba-ab78-045844aa98de”,
“things”: [
{
“thingId”: “1”,
“time”: 1461343192396,
“value”: “something”
},
{
“thingId”: “2”,
“time”: 1461343192396,
“value”: “bar”
},
{
“thingId”: “3”,
“time”: 1461343774121,
“value”: “foo”
}
]
}

I’ve attempted to do this by unnesting and ordering the things, but I haven’t figured out how to re-nest them into an array with only the most recent of each thingId, and it seems that may not be the best approach.

What would be the best way to accomplish this?


#2

The input data is loaded into bucket b.

select * from b;

[ { "b": { "session": "d5f08f4c-37d7-4fba-ab78-045844aa98de", "things": [ { "thingId": "1", "time": 1461343192396, "value": "something" }, { "thingId": "2", "time": 1461343192396, "value": "23" } ] } }, { "b": { "session": "d5f08f4c-37d7-4fba-ab78-045844aa98de", "things": [ { "thingId": "3", "time": 1461343774121, "value": "foo" }, { "thingId": "2", "time": 1461343775672, "value": "bar" } ] } } ]

You can unnest and then create any kind of array, filter, etc…

select array_agg(mythings) from b unnest things as mythings;

[ { "$1": [ { "thingId": "1", "time": 1461343192396, "value": "something" }, { "thingId": "2", "time": 1461343192396, "value": "23" }, { "thingId": "2", "time": 1461343775672, "value": "bar" }, { "thingId": "3", "time": 1461343774121, "value": "foo" } ] } ]