Query Assistance - Converting Query N1QL to Analytics N1QL

I have the following N1QL query:

SELECT ARRAY_DISTINCT(ARRAY_FLATTEN(agr,3))
FROM (
    SELECT *
    FROM bucketname c
    WHERE c.type = "sometype"
    LIMIT 5 
    ) AS rr
LETTING agr = ARRAY_AGG(ARRAY OBJECT_NAMES(v) FOR v IN rr.c.l1list END);

It takes data that looks like this:

{
   "l1field1":"data1",
   "l1field2":"data2",
   "l1field3":"data3",
   "l1list":[
      {
         "dog":"tree",
         "something":"nothing"
      },
      {
         "cat":"meat",
         "something":"oblivion"
      },
      {
         "bird":"chicken",
         "something":"doom"
      }
   ]
},
{
   "l1field1":"data1",
   "l1field2":"data2",
   "l1field3":"data3",
   "l1list":[
      {
         "want":"wish",
         "something":"nothing"
      },
      {
         "run":"mile",
         "something":"oblivion"
      },
      {
         "jump":"fly",
         "something":"doom"
      }
   ]
}

And outputs this:

["dog", "something", "cat", "bird", "want", "run", "jump"]
  1. How do I convert the N1QL query to the analytics variant? I read N1QL for Analytics vs. N1QL for Query | Couchbase Docs and made many attempts but could not get this work.

  2. (Additional but optional question) How do I something like this efficiently over hundreds of millions of records? Is that even possible?

Try following on both N1QL for Query AND N1QL for Analytics

SELECT DISTINCT RAW u
FROM (SELECT (SELECT DISTINCT RAW lu FROM c.`l1list` AS l
              UNNEST OBJECT_NAMES(l) AS lu
             ) AS a
      FROM bucketname c
      WHERE c.type = "sometype"
      LIMIT 5) AS d
UNNEST d.a AS u;

@vsr1
Trying it out. It’s taking a long time though.

Ran it and let it go for 5 minutes. Cancelled after. No luck on my end.

At least this works on N1QL For Query ( i replaced bucketname with array of 2 documents)

SELECT DISTINCT RAW u
FROM (SELECT (SELECT DISTINCT RAW lu FROM c.`l1list` AS l
              UNNEST OBJECT_NAMES(l) AS lu
             ) AS a
      FROM [{ "type":"sometype", "l1field1":"data1", "l1field2":"data2", "l1field3":"data3", "l1list":[ { "dog":"tree", "something":"nothing" }, { "cat1":"meat", "something1":"oblivion" }, { "bird1":"chicken", "something":"doom" } ] }, { "type":"sometype", "l1field1":"data1", "l1field2":"data2", "l1field3":"data3", "l1list":[ { "want":"wish", "something":"nothing" }, { "run":"mile", "something":"oblivion" }, { "jump":"fly", "something":"doom" } ] } ] AS c
      WHERE c.type = "sometype"
      LIMIT 5) AS d
UNNEST d.a AS u;

@kevinqbui ,

Can you create an index on the type field in Analytics and run the query again?

create index idx1 on dataset_name (`type`: string);

It should be possible to run this query (without a LIMIT clause) on a large number of documents, if you provide enough Analytics nodes. The query will be evaluated in parallel on all Anlytics nodes in the cluster. For better performance, you could create an Analytics dataset that only contains documents of type = “sometype”, then change this query to remove the WHERE clause and read from that dataset instead:

CREATE DATASET ds1 ON bucketname WHERE `type`="sometype";

SELECT DISTINCT RAW u
FROM (SELECT (SELECT DISTINCT RAW lu FROM c.`l1list` AS l
              UNNEST OBJECT_NAMES(l) AS lu
             ) AS a
      FROM ds1 c) AS d
UNNEST d.a AS u;