ORDER BY in ARRAY_AGG()

Hello,
In postgreSQL, I can ORDER BY in ARRAY_AGG() like so:

SELECT r.participants, ARRAY_AGG(distinct ev ORDER BY ev.created_at) as events FROM rooms r use keys '1234' UNNEST r.events ev WHERE ev.type NOT IN ['fourth_event'] GROUP BY r.participants

In N1QL, this results in a syntax error. When I run the following query on my document without ORDER BY, the events come out in a different order than they appear in the document:

SELECT r.participants, ARRAY_AGG(distinct ev) as events FROM rooms r use keys '1234' UNNEST r.events ev WHERE ev.type NOT IN ['fourth_event'] GROUP BY r.participants

Document:

{ "id": "1234", "participants": { "ec95ad83-1ce9-4ed8-b35b-16a08ccd1d5b": { "name": "Test Visitor", "avatar_url": "", "id": "ec95ad83-1ce9-4ed8-b35b-16a08ccd1d5b" } }, "events": [ { "type": "event_one", "member_id": "cef3409b-022f-4522-8a5c-03eb0f159a57", "created_at": "2016-07-05T23:30:32.000Z" }, { "type": "event_two", "member_id": "cef3409b-022f-4522-8a5c-03eb0f159a57", "created_at": "2016-07-05T23:06:41.667Z" }, { "type": "event_three", "member_id": "cef3409b-022f-4522-8a5c-03eb0f159a57", "created_at": "2016-07-05T23:06:41.667Z" }, { "type": "event_four", "member_id": "cef3409b-022f-4522-8a5c-03eb0f159a57", "created_at": "2016-07-05T23:06:43.000Z" }, { "type": "event_five", "member_id": "6bdf7d15-ed3b-4ff7-a12b-c6765ee427bb", "created_at": "2016-07-05T23:06:43.000Z" }, { "type": "event_six", "member_id": "cbf19485-73c7-4e9a-88e5-c4ae31fd94ef", "created_at": "2016-07-05T23:06:43.000Z" } ] }

How can I replicate the ORDER BY behavior in N1QL to get array members to come out in the order they exist in the document when I’m using UNNEST?

In this case, ARRAY_SORT will do the job since you’re sorting by created_date, which is the lowest collocation value in that object.

SELECT
r.participants,
ARRAY_SORT(ARRAY_AGG(distinct ev) )as events
FROM
rooms r
use keys '1234'
UNNEST
r.events ev
WHERE
ev.type NOT IN ['fourth_event']
GROUP BY r.participants

keshav_m,
thanks for the response. Is there some documentation on collocation values?

I tried searching the following with no luck:

collocation value couchbase
n1ql collocation value couchbase
n1ql collocation value
site:couchbase.com collocation value

with no luck.

Thanks again

Hi @dave.koston, what was the result for the query with ARRAY_SORT()?

Hi keshav_m, ARRAY_SORT() works well for me with the sample data above. Thank you.

I’m assuming that’s because created_at is alphabetically the first key in the JSON object. I’m hoping to learn a bit more about collocation values as that sample query is just one of the use cases I have.

Hi @dave.koston, N1QL collation is raw byte collation of UTF8 encoded strings.
String comparison is done using a raw byte collation of UTF8 encoded strings (sometimes referred to as binary, C, or memcmp). This collation is case sensitive. Case insensitive comparisons can be performed using UPPER() or LOWER() functions.

Awesome, thanks for the explanation and your help

Hi! I have the exact same issue as Dave, except I can’t use created date because the user can sort the items. Is there any way to keep the sort order based on the array it’s aggregating? My data:

"cards": [
      "fae067ec-1c77-4d3d-b944-bf1f23d5f196",
      "14e87f77-5a6d-421a-9245-25afd9fa4870",
      "fdf85566-c6a0-47fb-9d10-e7f44d756ab2",
      "27b47d65-4ce4-42d7-a49b-5e9693e682d9",
      "778b8803-5719-43a8-ad29-038e9df32756"
],

And my query response when using ARRAY_AGG after LEFT JOIN-ing the objects:
“cards”: [
{
"_id": “14e87f77-5a6d-421a-9245-25afd9fa4870”,
“title”: “test2”
},
{
"_id": “27b47d65-4ce4-42d7-a49b-5e9693e682d9”,
“title”: “test4”
},
{
"_id": “778b8803-5719-43a8-ad29-038e9df32756”,
“title”: “test5”
},
{
"_id": “fae067ec-1c77-4d3d-b944-bf1f23d5f196”,
“title”: “test1”
},
{
"_id": “fdf85566-c6a0-47fb-9d10-e7f44d756ab2”,
“title”: “test3”
}
]

Any help would be appreciated, been banging my head on this for a while. I can provide the full query too, but it’s pretty similar to Dave’s (I just have a LEFT JOIN on the IDs).
LEFT JOIN default ON KEYS ARRAY “card:” || cardID FOR cardID IN k.cards END

@zeestorm: Yes. Please give your Couchbase version, full query and its explain.

Thanks for the quick response, the details are below:
Also, worst case scenario, I can pass along kcolumns.cards (the original array) and then just have golang order them if it’s too difficult. Thanks again!

CB 4.6.2

Query:

SELECT s._id, s.name, s.share, s.members, s.created_at, s.updated_at, ARRAY_AGG(s.cols) AS columns
FROM (
  SELECT k._id, k.name, k.share, k.members, k.created_at, k.updated_at,
  { kcolumns._id, kcolumns.name, "cards": IFNULL(ARRAY_AGG({c._id, c.title, c.assignees, c.color, "comments": ARRAY_LENGTH(c.comments)}), []) } AS cols
  FROM `timelinr` k USE KEYS "board:13c76911-a346-4e2d-945f-cc58e91f3dae"
  LEFT UNNEST k.columns kcolumns
  LEFT JOIN timelinr c ON KEYS ARRAY "card:" || cardID FOR cardID IN kcolumns.cards END
  GROUP BY k._id, k.name, k.share, k.members, k.created_at, k.updated_at, kcolumns
) AS s
GROUP BY s._id, s.name, s.share, s.members, s.created_at, s.updated_at

Explain (I have no written indexes for this query yet):

[
{
"plan": {
  "#operator": "Sequence",
  "~children": [
    {
      "#operator": "KeyScan",
      "keys": "\"board:13c76911-a346-4e2d-945f-cc58e91f3dae\""
    },
    {
      "#operator": "Fetch",
      "as": "k",
      "keyspace": "timelinr",
      "namespace": "default"
    },
    {
      "#operator": "Parallel",
      "maxParallelism": 1,
      "~child": {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "Unnest",
            "as": "kcolumns",
            "expr": "(`k`.`columns`)",
            "outer": true
          }
        ]
      }
    },
    {
      "#operator": "Join",
      "as": "c",
      "keyspace": "timelinr",
      "namespace": "default",
      "on_keys": "array (\"card:\" || `cardID`) for `cardID` in (`kcolumns`.`cards`) end",
      "outer": true
    },
    {
      "#operator": "Parallel",
      "maxParallelism": 1,
      "~child": {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "InitialGroup",
            "aggregates": [
              "array_agg({\"_id\": (`c`.`_id`), \"assignees\": (`c`.`assignees`), \"color\": (`c`.`color`), \"comments\": array_length((`c`.`comments`)), \"title\": (`c`.`title`)})"
            ],
            "group_keys": [
              "(`k`.`_id`)",
              "(`k`.`name`)",
              "(`k`.`share`)",
              "(`k`.`members`)",
              "(`k`.`created_at`)",
              "(`k`.`updated_at`)",
              "`kcolumns`"
            ]
          }
        ]
      }
    },
    {
      "#operator": "IntermediateGroup",
      "aggregates": [
        "array_agg({\"_id\": (`c`.`_id`), \"assignees\": (`c`.`assignees`), \"color\": (`c`.`color`), \"comments\": array_length((`c`.`comments`)), \"title\": (`c`.`title`)})"
      ],
      "group_keys": [
        "(`k`.`_id`)",
        "(`k`.`name`)",
        "(`k`.`share`)",
        "(`k`.`members`)",
        "(`k`.`created_at`)",
        "(`k`.`updated_at`)",
        "`kcolumns`"
      ]
    },
    {
      "#operator": "FinalGroup",
      "aggregates": [
        "array_agg({\"_id\": (`c`.`_id`), \"assignees\": (`c`.`assignees`), \"color\": (`c`.`color`), \"comments\": array_length((`c`.`comments`)), \"title\": (`c`.`title`)})"
      ],
      "group_keys": [
        "(`k`.`_id`)",
        "(`k`.`name`)",
        "(`k`.`share`)",
        "(`k`.`members`)",
        "(`k`.`created_at`)",
        "(`k`.`updated_at`)",
        "`kcolumns`"
      ]
    },
    {
      "#operator": "Parallel",
      "maxParallelism": 1,
      "~child": {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "InitialProject",
            "result_terms": [
              {
                "expr": "(`k`.`_id`)"
              },
              {
                "expr": "(`k`.`name`)"
              },
              {
                "expr": "(`k`.`share`)"
              },
              {
                "expr": "(`k`.`members`)"
              },
              {
                "expr": "(`k`.`created_at`)"
              },
              {
                "expr": "(`k`.`updated_at`)"
              },
              {
                "as": "cols",
                "expr": "{\"_id\": (`kcolumns`.`_id`), \"cards\": ifnull(array_agg({\"_id\": (`c`.`_id`), \"assignees\": (`c`.`assignees`), \"color\": (`c`.`color`), \"comments\": array_length((`c`.`comments`)), \"title\": (`c`.`title`)}), []), \"name\": (`kcolumns`.`name`)}"
              }
            ]
          },
          {
            "#operator": "FinalProject"
          }
        ]
      }
    }
  ]
},
"text": "SELECT k._id, k.name, k.share, k.members, k.created_at, k.updated_at,\n  { kcolumns._id, kcolumns.name, \"cards\": IFNULL(ARRAY_AGG({c._id, c.title, c.assignees, c.color, \"comments\": ARRAY_LENGTH(c.comments)}), []) } AS cols\n  FROM `timelinr` k USE KEYS \"board:13c76911-a346-4e2d-945f-cc58e91f3dae\"\n  LEFT UNNEST k.columns kcolumns\n  LEFT JOIN timelinr c ON KEYS ARRAY \"card:\" || cardID FOR cardID IN kcolumns.cards END\n  GROUP BY k._id, k.name, k.share, k.members, k.created_at, k.updated_at, kcolumns"
}
]

Also the data:

{
  "_id": "13c76911-a346-4e2d-945f-cc58e91f3dae",
  "columns": [
    {
      "_id": "9180235b-d3c7-4893-b635-f80aa08c5035",
      "cards": [
        "fae067ec-1c77-4d3d-b944-bf1f23d5f196",
        "14e87f77-5a6d-421a-9245-25afd9fa4870",
        "fdf85566-c6a0-47fb-9d10-e7f44d756ab2",
        "27b47d65-4ce4-42d7-a49b-5e9693e682d9",
        "778b8803-5719-43a8-ad29-038e9df32756"
      ],
      "name": "Development"
    }
  ],
  "created_at": "2017-08-12T01:28:03.643Z",
  "created_by": "04c35de6-b2b8-4a75-933a-c36dfcb08b1d",
  "deleted_at": null,
  "members": [],
  "name": "Testing",
  "owner": "team:0cfa5e64-5efb-43ae-a74a-c889717770a7",
  "share": false,
  "type": "board",
  "updated_at": "2017-08-22T19:57:08.632Z",
  "updated_by": "04c35de6-b2b8-4a75-933a-c36dfcb08b1d"
}

And one of the cards looks like:

{
  "_id": "fae067ec-1c77-4d3d-b944-bf1f23d5f196",
  "type": "card",
  "title": "test",
  "color": "#2F3259",
  "description": null,
  "boardID": "13c76911-a346-4e2d-945f-cc58e91f3dae",
  "created_at": "2017-08-22T19:57:08.632Z",
  "created_by": "04c35de6-b2b8-4a75-933a-c36dfcb08b1d",
  "updated_at": "2017-08-22T19:57:08.632Z",
  "updated_by": "04c35de6-b2b8-4a75-933a-c36dfcb08b1d",
}

Only way Query engine guarantee certain order is through ORDER BY because it can do parallel processing.
But we can do your desired order by reconstructing array.

   SELECT s._id, s.name, s.share, s.members, s.created_at, s.updated_at, ARRAY_AGG(s.cols) AS columns
    FROM (
      SELECT k._id, k.name, k.share, k.members, k.created_at, k.updated_at,
      { kcolumns._id, kcolumns.name,
        "cards": IFNULL(ARRAY (FIRST cv FOR cv IN cards WHEN cv._id = v END ) FOR v IN kcolumns.cards END, []) } AS cols
      FROM `timelinr ` k USE KEYS "board:13c76911-a346-4e2d-945f-cc58e91f3dae"
      LEFT UNNEST k.columns kcolumns
      LEFT JOIN `timelinr` c ON KEYS ARRAY "card:" || cardID FOR cardID IN kcolumns.cards END
      GROUP BY k._id, k.name, k.share, k.members, k.created_at, k.updated_at, kcolumns
      LETTING cards = ARRAY_AGG({c._id, c.title, c.assignees, c.color, "comments": ARRAY_LENGTH(c.comments)})
    ) AS s
    GROUP BY s._id, s.name, s.share, s.members, s.created_at, s.updated_at;
1 Like

Amazing! Worked perfectly, thank you so much! The “letting” piece is really cool, did not know you could do that.

You can do LET between FROM and WHERE
LETTING after GROUP BY.
Easy to read and some cases avoid multiple computations.
No chaining (i.e if you defined multiple LET variable can’t use previous LET variable in following LET). Same for LETTING.

1 Like

Hi! I know it’s been several months, and I’ve expanded on this numerous times with great success, however I am having a difficult time with trying to display the data in a different way.
It’s the same data sets, same version (4.5.1)… but my goal is to just create a list of “boards” sorted by the most recent updated card (I don’t need any card details).
I came up with a query that works, but it duplicates all the records (since it’s by card), and using DISTINCT seems to lose the order.

SELECT b._id, b.name FROM timelinr c
JOIN timelinr b ON KEYS "board:" || c.boardID
WHERE c.type = "card"
AND b.owner = "team:f51e4695-479d-4ea6-beab-425ad9484ee9"
AND b.deleted_at IS NULL
AND c.deleted_at IS NULL
ORDER BY c.updated_at DESC

Sample results:

[{
  "_id": "test2",
  "name": "Test2"
},
{
  "_id": "test1",
  "name": "Test1"
},
{
  "_id": "test1",
  "name": "Test1"
},
...]

In this sample data set, a card under Test2 board was updated last, so it shows up first, and then 2 cards under Test1 board was updated. I’m trying to group them together to just show Test2, Test1, (then whatever unique board follows next).
My query above is working, but I have to unique the records after my results are returned. Is there a way to preserve order by grouping/distinct, but without including the “updated_at” date? GROUP BY requires it, since I’m ordering with it.

Any help would be very appreciated. Thank you!

In 4.5.1 try this, with max_parallelism=1

SELECT DISTINCT b._id, b.name FROM timelinr c
JOIN timelinr b ON KEYS “board:” || c.boardID
WHERE c.type = "card"
AND b.owner = "team:f51e4695-479d-4ea6-beab-425ad9484ee9"
AND b.deleted_at IS NULL
AND c.deleted_at IS NULL
ORDER BY c.updated_at DESC

Scouring the golang SDK (gocb) and CB documentation for max_parallelism, this doesn’t appear to be an option that can be set via Go? It looks like other languages (such as Java) have options to do so. Is this missing from the Go implementation, or maybe just the documentation?

Thanks for the help. I used a variation of what you wrote, since I couldn’t figure out how to turn on max_parallelism with the golang library, which has been working since December:

SELECT DISTINCT b._id, b.name FROM default b
LEFT JOIN timelinr c ON KEY "board:" || c.boardID FOR b
WHERE b.type = "board" 
AND b.deleted_at IS NULL 
AND b.owner = "team:f51e4695-479d-4ea6-beab-425ad9484ee9"
AND c.type = "card"
AND c.deleted_at IS NULL
ORDER BY -millis(c.updated_at)

Indexes:

CREATE INDEX `idx_card_board_join_2` ON `default`(`owner`) WHERE ((`type` = "board") and (`deleted_at` is null))
CREATE INDEX `idx_card_board_join_1` ON `default`(("board:" || `boardID`),(-str_to_millis(`updated_at`))) WHERE (((`type` = "card") and (`deleted_at` is null))

Everything was working until we upgraded to 5.0.1 (from 4.5.1). Now this same query with same indexes is causing errors:

{
  "code": 5001,
  "msg": "Panic: runtime error: invalid memory address or nil pointer dereference"
}

I’ve tried changing the index, adding fields, removing fields. I have a < 5% success rate (so there have been times where it actually works). I tried searching the error with little results. Is there something that triggers this error?

Check query.log and there might be stack trace post that.
Also post the EXPLAIN output

Not seeing anything under the Logs section. Looked to see if there was a logs specific to queries, but couldn’t find anything.

Explain:

{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "IndexScan2",
            "index": "idx_card_board_join_2",
            "index_id": "b1e5c2c13e15b6ee",
            "index_projection": {
              "primary_key": true
            },
            "keyspace": "default",
            "namespace": "default",
            "spans": [
              {
                "exact": true,
                "range": [
                  {
                    "high": "\"team:f51e4695-479d-4ea6-beab-425ad9484ee9\"",
                    "inclusion": 3,
                    "low": "\"team:f51e4695-479d-4ea6-beab-425ad9484ee9\""
                  }
                ]
              }
            ],
            "using": "gsi"
          },
          {
            "#operator": "Fetch",
            "as": "b",
            "keyspace": "default",
            "namespace": "default"
          },
          {
            "#operator": "Parallel",
            "~child": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "IndexJoin",
                  "as": "c",
                  "for": "b",
                  "keyspace": "default",
                  "namespace": "default",
                  "on_key": "cover ((\"board:\" || (`c`.`boardID`)))",
                  "outer": true,
                  "scan": {
                    "covers": [
                      "cover ((\"board:\" || (`c`.`boardID`)))",
                      "cover ((-str_to_millis((`c`.`updated_at`))))",
                      "cover ((meta(`c`).`id`))"
                    ],
                    "filter_covers": {
                      "cover (((`c`.`boardID`) is not missing))": true,
                      "cover (((`c`.`deleted_at`) is null))": true,
                      "cover ((`c`.`deleted_at`))": null,
                      "cover ((`c`.`type`))": "card"
                    },
                    "index": "idx_card_board_join_1",
                    "index_id": "414a1c1f5f8438df",
                    "using": "gsi"
                  }
                },
                {
                  "#operator": "Filter",
                  "condition": "((((((`b`.`type`) = \"board\") and ((`b`.`deleted_at`) is null)) and ((`b`.`owner`) = \"team:f51e4695-479d-4ea6-beab-425ad9484ee9\")) and (cover ((`c`.`type`)) = \"card\")) and cover (((`c`.`deleted_at`) is null)))"
                },
                {
                  "#operator": "InitialProject",
                  "distinct": true,
                  "result_terms": [
                    {
                      "as": "boardID",
                      "expr": "(`b`.`_id`)"
                    },
                    {
                      "expr": "(`b`.`name`)"
                    }
                  ]
                },
                {
                  "#operator": "Distinct"
                }
              ]
            }
          },
          {
            "#operator": "Distinct"
          }
        ]
      },
      {
        "#operator": "Order",
        "sort_terms": [
          {
            "expr": "cover ((-str_to_millis((`c`.`updated_at`))))"
          }
        ]
      },
      {
        "#operator": "FinalProject"
      }
    ]
  },
  "text": "SELECT DISTINCT b._id as boardID, b.name FROM default b\nLEFT JOIN default c ON KEY \"board:\" || c.boardID FOR b\nWHERE b.type = \"board\" \nAND b.deleted_at IS NULL \nAND b.owner = \"team:f51e4695-479d-4ea6-beab-425ad9484ee9\"\nAND c.type = \"card\"\nAND c.deleted_at IS NULL\norder by -millis(c.updated_at)"
}