Insert data into a CB bucket in nested format from another bucket using N1QL

Hi,
I have data in a CB bucket that is flattened (one record per document), and I want to insert the records from this bucket into a new bucket using nested subdocuments. Can this be achieved with N1QL?

The flattened data looks like this in JSON:

[
  {
    "ASDT": "2020-01-10",
    "LOS": 1,
    "GS": 1,
    "SRC": "Expedia.com",
    "KPROP": "1234",
    "SUB_NM": "ELEGANT HOTEL",
    "LST_SHOP": "2020-01-10 11:52:54",
    "SHOP_STAT": "S",
    "TC_BID": 524218,
    "ARR_DT": "2020-01-11",
    "RTYPE": "DELUXE ROOM , OCEAN VIEW",
    "INC_FLG": 2,
    "QUAL_FLG": 3,
    "REST_FLG": 6,
    "LN_STAT": "S",
    "RATE": 123
  },
  {
    "ASDT": "2020-01-10",
    "LOS": 1,
    "GS": 1,
    "SRC": "Expedia.com",
    "KPROP": "1234",
    "SUB_NM": "ELEGANT HOTEL",
    "LST_SHOP": "2020-01-10 11:52:54",
    "SHOP_STAT": "S",
    "TC_BID": 524218,
    "ARR_DT": "2020-01-11",
    "RTYPE": "DELUXE ROOM , GARDEN VIEW",
    "INC_FLG": 2,
    "QUAL_FLG": 3,
    "REST_FLG": 6,
    "LN_STAT": "S",
    "RATE": 234
  },
  {
    "ASDT": "2020-01-10",
    "LOS": 1,
    "GS": 1,
    "SRC": "Expedia.com",
    "KPROP": "1234",
    "SUB_NM": "ELEGANT HOTEL",
    "LST_SHOP": "2020-01-10 11:52:54",
    "SHOP_STAT": "S",
    "TC_BID": 524218,
    "ARR_DT": "2020-01-11",
    "RTYPE": "LUXURY SUITES",
    "INC_FLG": 2,
    "QUAL_FLG": 3,
    "REST_FLG": 6,
    "LN_STAT": "S",
    "RATE": 345
  },
  {
    "ASDT": "2020-01-10",
    "LOS": 1,
    "GS": 1,
    "SRC": "Expedia.com",
    "KPROP": "1234",
    "SUB_NM": "ELEGANT HOTEL",
    "LST_SHOP": "2020-01-10 11:52:54",
    "SHOP_STAT": "S",
    "TC_BID": 524218,
    "ARR_DT": "2020-01-12",
    "RTYPE": "DELUXE ROOM , BOSPHORUS VIEW",
    "INC_FLG": 2,
    "QUAL_FLG": 3,
    "REST_FLG": 6,
    "LN_STAT": "S",
    "RATE": 333
  },
  {
    "ASDT": "2020-01-10",
    "LOS": 1,
    "GS": 1,
    "SRC": "Expedia.com",
    "KPROP": "1234",
    "SUB_NM": "ELEGANT HOTEL",
    "LST_SHOP": "2020-01-10 11:52:54",
    "SHOP_STAT": "S",
    "TC_BID": 524218,
    "ARR_DT": "2020-01-13",
    "RTYPE": "DELUXE ROOM , BOSPHORUS VIEW",
    "INC_FLG": 2,
    "QUAL_FLG": 3,
    "REST_FLG": 6,
    "LN_STAT": "S",
    "RATE": 222
  },
  {
    "ASDT": "2020-01-10",
    "LOS": 1,
    "GS": 1,
    "SRC": "Expedia.com",
    "KPROP": "1234",
    "SUB_NM": "ELEGANT HOTEL",
    "LST_SHOP": "2020-01-10 11:52:54",
    "SHOP_STAT": "S",
    "TC_BID": 524218,
    "ARR_DT": "2020-01-14",
    "RTYPE": "DELUXE ROOM , BOSPHORUS VIEW",
    "INC_FLG": 2,
    "QUAL_FLG": 3,
    "REST_FLG": 6,
    "LN_STAT": "S",
    "RATE": 111
  }
]

I want to run a query that populates the target bucket with JSON like this:

{
  "ASDT": "2020-01-10",
  "LOS": 1,
  "GS": 1,
  "SRC": "Expedia.com",
  "KPROP": "1234",
  "SUB_NM": "ELEGANT HOTEL",
  "LST_SHOP": "2020-01-10 11:52:54",
  "SHOP_STAT": "S",
  "TC_BID": 524218,
  "RATES": [
    {
      "ARR_DT": "2020-01-11",
      "RATEBYTYPE": [
        {
          "RTYPE": "DELUXE ROOM , OCEAN VIEW",
          "INC_FLG": 2,
          "QUAL_FLG": 3,
          "REST_FLG": 6,
          "LN_STAT": "S",
          "RATE": 123
        },
        {
          "RTYPE": "DELUXE ROOM , GARDEN VIEW",
          "INC_FLG": 2,
          "QUAL_FLG": 3,
          "REST_FLG": 6,
          "LN_STAT": "S",
          "RATE": 234
        },
        {
          "RTYPE": "LUXURY SUITES",
          "INC_FLG": 2,
          "QUAL_FLG": 3,
          "REST_FLG": 6,
          "LN_STAT": "S",
          "RATE": 345
        }
      ]
    },
    {
      "ARR_DT": "2020-01-12",
      "RATEBYTYPE": [
        {
          "RTYPE": "DELUXE ROOM , BOSPHORUS VIEW",
          "INC_FLG": 2,
          "QUAL_FLG": 3,
          "REST_FLG": 6,
          "LN_STAT": "S",
          "RATE": 333
        }
      ]
    },
    {
      "ARR_DT": "2020-01-13",
      "RATEBYTYPE": [
        {
          "RTYPE": "DELUXE ROOM , BOSPHORUS VIEW",
          "INC_FLG": 2,
          "QUAL_FLG": 3,
          "REST_FLG": 6,
          "LN_STAT": "S",
          "RATE": 222
        }
      ]
    },
    {
      "ARR_DT": "2020-01-14",
      "RATEBYTYPE": [
        {
          "RTYPE": "DELUXE ROOM , BOSPHORUS VIEW",
          "INC_FLG": 2,
          "QUAL_FLG": 3,
          "REST_FLG": 6,
          "LN_STAT": "S",
          "RATE": 111
        }
      ]
    }
  ]
}

The following query produces the required format.

SELECT {ASDT, LOS, GS, SRC, KPROP, SUB_NM, LST_SHOP, SHOP_STAT, TC_BID,
       "RATES":ARRAY_AGG({ARR_DT, RATEBYTYPE})} doc
FROM ( SELECT ASDT, LOS, GS, SRC, KPROP, SUB_NM, LST_SHOP, SHOP_STAT, TC_BID, ARR_DT,
              ARRAY_AGG({RTYPE, INC_FLG, QUAL_FLG, REST_FLG, LN_STAT, RATE}) AS RATEBYTYPE
        FROM default  AS d
        WHERE ..........
        GROUP BY ASDT, LOS, GS, SRC, KPROP, SUB_NM, LST_SHOP, SHOP_STAT, TC_BID, ARR_DT) AS d1
GROUP BY ASDT, LOS, GS, SRC, KPROP, SUB_NM, LST_SHOP, SHOP_STAT, TC_BID;

Use INSERT INTO ... SELECT to insert into a different bucket.
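For example, wrapping the query above (a sketch; `target` is a hypothetical bucket name, and the subquery and WHERE clause are the same as in the query above):

INSERT INTO target (KEY UUID(), VALUE doc)
SELECT {ASDT, LOS, GS, SRC, KPROP, SUB_NM, LST_SHOP, SHOP_STAT, TC_BID,
       "RATES":ARRAY_AGG({ARR_DT, RATEBYTYPE})} doc
FROM ( ... same subquery as above ... ) AS d1
GROUP BY ASDT, LOS, GS, SRC, KPROP, SUB_NM, LST_SHOP, SHOP_STAT, TC_BID;

UUID() generates a unique document key per result; you can substitute any key expression that is unique per group.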


Thank you @vsr1, appreciate it. I didn't get any errors, so I assume it's fine for CB 6.0.
I have attached 2 text files. I am seeing a ~40% size reduction just from nesting 12 source documents into 2 target documents (I have renamed some of the keys from the original post). I'd like your advice on a few points:

  1. Is this the right way to reduce CB data volume and document count? I am not sure the ~40% reduction will be consistent, so I will try a bigger dataset.

  2. Performance-wise, should I anticipate a big gain from this nested-subdoc design? I will hit at most 9 documents at a time and extract the information from the subdocs.

  3. I am also looking to extend the subdocs from just 6 ARR_DT values to 365 days. Do you foresee any query-performance challenges?

  4. I plan to keep the _FLG fields in the subdoc so I can join against multiple dimensions; each dimension will be < 500 documents. Is there a better approach than a JOIN? Since I don't want to inflate document sizes, I kept the dimensions separate, as there will be a couple of keys in the _FLG buckets.

  5. Are there any challenges if I want to update the nested subdocs and reload new data in a practical scenario where the top-level keys stay the same but the subdocs change? I have heard about locking and updates with CAS; I am not sure how feasible it is to update 365 subdocs under a matching key in one document quickly and reliably, with index updates and index read consistency.

  6. I want to query for the minimum rate per arrival date, and also filter on arrival dates. Since I have two levels of array objects, can you point me to any documentation I should look at?
    I am trying the query below, but it does not return what I intend:

    select RATES[0].ARR_DT , MIN(RATES[0].RATES)
    from TARGET WHERE ASDT="2020-01-10" AND SRC = "EXP"
    GROUP BY RATES[0].ARR_DT

Any help and direction is greatly appreciated, as always. Attached file below.

source and target.zip (1.1 KB)

Whether you want flat documents or arrays depends on your use cases; each has its own advantages and disadvantages.
As I have limited knowledge of your use cases, I am not able to advise on that.

You can check out:


https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/indexing-arrays.html
https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/arrayfun.html

SELECT r.ARR_DT, MIN(ARRAY_MIN(r.RATEBYTYPE[*].RATE)) AS minrate
FROM TARGET AS t
UNNEST t.RATES AS r
WHERE t.ASDT="2020-01-10" AND t.SRC = "EXP"
GROUP BY r.ARR_DT;

@vsr1 -
Thank you and the docs reference are very good.

The query is giving me NULL for RATE, but I think I know why:
If you run the query below against my sample data at the top, it does not produce output, maybe because I have the string "Closed" in RATE. I am not sure how to handle MIN on top of that, since the data will not always be all numbers or all strings: if all values are "Closed" we should return the string, if all are numbers we should return the minimum number, and with a mix of numbers and strings we should return the minimum of the numbers. I am not sure if this casting can be handled.

SELECT  r.ARR_DT , MIN(ARRAY_MIN(r.RATEBYTE[*].RATE) ) AS minrate
FROM  TARGET  AS t 
UNNEST t.RATES AS r
WHERE t.ASDT="2020-01-11" AND t.SRC = "Expedia.com"
GROUP BY r.ARR_DT;

Data below:

[
  {
    "t": {
      "ASDT": "2020-01-11",
      "BID": 524218,
      "GS": 1,
      "LOS": 1,
      "LST_SHOP": "2020-01-11 11:52:54",
      "RATES": [
        {
          "ARR_DT": "2020-01-11",
          "RATEBYTYPE": [
            {
              "INC_FLG": 2,
              "LN_STAT": "S",
              "QUAL_FLG": 3,
              "RATE": 123,
              "REST_FLG": 6,
              "RTYPE": "DELUXE ROOM , OCEAN VIEW"
            },
            {
              "INC_FLG": 2,
              "LN_STAT": "S",
              "QUAL_FLG": 3,
              "RATE": "Closed",
              "REST_FLG": 6,
              "RTYPE": "DELUXE ROOM , GARDEN VIEW"
            }
          ]
        },
        {
          "ARR_DT": "2020-01-13",
          "RATEBYTYPE": [
            {
              "INC_FLG": 2,
              "LN_STAT": "S",
              "QUAL_FLG": 3,
              "RATE": 345,
              "REST_FLG": 6,
              "RTYPE": "LUXURY SUITES"
            }
          ]
        },
        {
          "ARR_DT": "2020-01-14",
          "RATEBYTYPE": [
            {
              "INC_FLG": 2,
              "LN_STAT": "S",
              "QUAL_FLG": 3,
              "RATE": 333,
              "REST_FLG": 6,
              "RTYPE": "DELUXE ROOM , BOSPHORUS VIEW"
            }
          ]
        },
        {
          "ARR_DT": "2020-01-15",
          "RATEBYTYPE": [
            {
              "INC_FLG": 2,
              "LN_STAT": "S",
              "QUAL_FLG": 3,
              "RATE": 222,
              "REST_FLG": 6,
              "RTYPE": "DELUXE ROOM , BOSPHORUS VIEW"
            }
          ]
        },
        {
          "ARR_DT": "2020-01-16",
          "RATEBYTYPE": [
            {
              "INC_FLG": 2,
              "LN_STAT": "S",
              "QUAL_FLG": 3,
              "RATE": 111,
              "REST_FLG": 6,
              "RTYPE": "DELUXE ROOM , BOSPHORUS VIEW"
            }
          ]
        }
      ],
      "SHOP_STAT": "S",
      "SRC": "Expedia.com",
      "SUB_NM": "HILTON GARDEN INN ATLANTA DOWNTOWN"
    }
  }
]

That is exactly what it does. There is a typo: RATEBYTE vs RATEBYTYPE.

SELECT  r.ARR_DT , MIN(ARRAY_MIN(r.RATEBYTYPE[*].RATE) ) AS minrate
FROM  TARGET  AS t
UNNEST t.RATES AS r
WHERE t.ASDT="2020-01-11" AND t.SRC = "Expedia.com"
GROUP BY r.ARR_DT;
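If you ever want MIN to consider only numeric rates (skipping values like "Closed"), a sketch using ISNUMBER() and a collection expression, assuming the same document shape:

SELECT r.ARR_DT,
       MIN(ARRAY_MIN(ARRAY v.RATE FOR v IN r.RATEBYTYPE WHEN ISNUMBER(v.RATE) END)) AS minrate
FROM TARGET AS t
UNNEST t.RATES AS r
WHERE t.ASDT = "2020-01-11" AND t.SRC = "Expedia.com"
GROUP BY r.ARR_DT;

The ARRAY ... FOR ... WHEN ... END construct builds a filtered array, so non-numeric rates are excluded before ARRAY_MIN and MIN are applied.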

Sorry, my bad; the names were poorly chosen. I apologize.
But I see it neatly handles the string-vs-number mix when taking the minimum. That is amazing.

With my new inserts I ended up with the error below.
In the actual TARGET dataset I want the data under one key only, which is "1". If I use (KEY doc.ASDT, VALUE doc) I end up with a similar CAS error; however, I see the document populated partially.
Not sure if this is the defect you were mentioning before?

DML Error, possible causes include CAS mismatch or concurrent modification Failed to perform insert - cause: Duplicate Key 1

INSERT INTO FULL_TGT (KEY "1" , VALUE doc)
SELECT {ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT,
       "RATES":ARRAY_AGG({ARR_DT, RATEBYTYPE})} doc
FROM ( SELECT ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT, ARR_DT,
              ARRAY_AGG({RTYPE, INC_FLG, QUAL_FLG, REST_FLG, LN_STAT, RATE}) AS RATEBYTYPE
        FROM FULL_SRC  AS d
        GROUP BY ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT, ARR_DT) AS d1
GROUP BY ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT

If the SELECT produces n documents, it becomes n inserts. You can't have duplicate document keys.

If you want a single document:

INSERT INTO FULL_TGT (KEY "1" , VALUE {doc})
SELECT ARRAY_AGG({ ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT,RATES}) AS doc
FROM ( SELECT  ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT,
       ARRAY_AGG({ARR_DT, RATEBYTYPE}) AS RATES
FROM ( SELECT ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT, ARR_DT,
              ARRAY_AGG({RTYPE, INC_FLG, QUAL_FLG, REST_FLG, LN_STAT, RATE}) AS RATEBYTYPE
        FROM FULL_SRC  AS d
        GROUP BY ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT, ARR_DT) AS d1
GROUP BY ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT) AS d2

NOTE: Too many nested arrays may be difficult to navigate.

In the relational world, ASDT is not a key; it has the same value across all rows. I loaded this data into the SRC bucket with cbimport using #MONO_INCR#, so the SRC bucket has keys like k::1, k::2.
The document ids differ, but ASDT does not.
I was expecting to use ASDT as the key, thinking the rows would be merged into one TARGET document, which gave the CAS error, so I then tried KEY "1" for a single document. I am not sure how to define keys here.
Yes, I have a lot of nested arrays, though not too many levels. Do you think that is inadvisable performance-wise?
This goes back to my original question about best practices for nesting and UNNEST, plus how updates can be handled inside nested attributes.

INSERT INTO FULL_TGT (KEY doc.ASDT  , VALUE doc)
SELECT {ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT,
       "RATES":ARRAY_AGG({ARR_DT, RATEBYTYPE})} doc
FROM ( SELECT ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT, ARR_DT,
              ARRAY_AGG({RTYPE, INC_FLG, QUAL_FLG, REST_FLG, LN_STAT, RATE}) AS RATEBYTYPE
        FROM FULL_SRC  AS d
        GROUP BY ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT, ARR_DT) AS d1
GROUP BY ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT;

There might already be a document with that date; you can delete it or change the key.
If you want to overwrite, use UPSERT instead of INSERT.

If you need, you can do:
KEY "prefix"||doc.ASDT
NOTE: the same ASDT with different values of LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT can generate the same key, which can cause issues.
You can use KEY UUID() or come up with a unique key.
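For instance, a composite key built from the grouping fields might look like this (a sketch; include whichever fields make the group unique):

KEY "prefix::" || doc.ASDT || "::" || doc.SRC || "::" || TO_STRING(doc.BID)

TO_STRING() is needed because BID is a number and || concatenates strings.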

I used a prefix; no help, I still get the error below. I'm guessing I need sequencing (as in the relational world). How can I achieve this in CB?

12009 DML Error, possible causes include CAS mismatch or concurrent modification. Failed to perform insert - cause: Duplicate Key prefix2020-01-06

However, UPSERT works, but it doesn't help: I have 155K source documents that need to be processed and inserted into arrays by date, and I see only 1 document in TARGET, about 2,220 lines, so it is definitely cutting data out.

NOTE: the same ASDT with different values of LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT can generate the same key, which can cause issues.
You can use KEY UUID() or come up with a unique key.

There is no sequence in N1QL. You can use UUID().

INSERT INTO FULL_TGT (KEY dockey  , VALUE doc)
SELECT {ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT,
       "RATES":ARRAY_AGG({ARR_DT, RATEBYTYPE})} doc, MIN(dockey) AS dockey
FROM ( SELECT ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT, ARR_DT, MIN(META().id) AS dockey,
              ARRAY_AGG({RTYPE, INC_FLG, QUAL_FLG, REST_FLG, LN_STAT, RATE}) AS RATEBYTYPE
        FROM FULL_SRC  AS d
        GROUP BY ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT, ARR_DT) AS d1
GROUP BY ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT;

I got an error: syntax error - at ARRAY_AGG

Sorry, sorted it out: a missing comma.
I tried UUID(), though I'm not sure what it is doing. I see it creating random IDs, but my entire document is split into 28 parts, some keeping different ARR_DT values. I guess this is not what I want.

The , was missing.

Clear up the target bucket; those documents might be left over from a previous run.
Run the SELECT and check: each result document will be inserted as a separate document.

Yeah, fixed. I did flush the bucket before each insert, and this time it also split into 28 docs. I am not sure why it is doing so; is it because of exceeding the 1 MB doc limit?

No, it will not split, and I'm not sure what you are describing unless you post a few documents.
Each group of ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT produces a different document.
If you are expecting one document per ASDT, you need one more GROUP BY on ASDT alone.
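A sketch of that extra level, grouping only by ASDT on the outside (the inner query is the earlier two-level aggregation that produces RATES; "SHOPS" is a hypothetical name for the per-property array):

INSERT INTO FULL_TGT (KEY doc.ASDT, VALUE doc)
SELECT {ASDT, "SHOPS": ARRAY_AGG({LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT, RATES})} AS doc
FROM ( ... earlier grouped query producing RATES ... ) AS d2
GROUP BY ASDT;

With ASDT as the only grouping key, one document per as-of date is produced, so ASDT can safely be the document key.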

Sorry, this is getting confusing. I uploaded TARGET, and as you can see, each document partially shows the information for ASDT "2020-01-06". I have k::1003, which also shows ASDT "2020-01-06", plus 28 other documents.
I am looking for ASDT to serve as the key so that I get 1 document. The SOURCE document below has ASDT "2020-01-06", and it is the same across all 155K documents. The idea is to have 1 document that says: as of this date, show me all arrival dates (ARR_DT) nested inside as subdocs. I don't mind the key being "1". As we move to the next day, the as-of date advances to "2020-01-07", and I may have a similar 150K documents to process under a key called "2".

Source Bucket data:

{
  "ARR_DT": "2020-05-03",
  "ASDT": "2020-01-06",
  "BID": 913775,
  "GS": 2,
  "INC_FLG": 4,
  "LN_STAT": "S",
  "LOS": 1,
  "LST_SHOP": "2020-01-06 9:10:00",
  "QUAL_FLG": 2,
  "RATE": 302.2,
  "REST_FLG": 17,
  "RTYPE": "SUITE HEAR MOBIL ACCESSIBLE ROLL IN SHWR",
  "SHOP_STAT": "S",
  "SRC": "Brand.com",
  "SUB_NM": "INTERCONTINENTAL MINNEAPOLIS AIRPORT"
}

GROUP BY ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT

Each of those groups produces a different document. LST_SHOP is a timestamp, so different hours/minutes make different groups.
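If the time component of LST_SHOP is what splits the groups, one option (a sketch of the inner subquery only) is to drop it from the GROUP BY and carry the latest value via an aggregate instead:

SELECT ASDT, LOS, GS, SRC, BID, SUB_NM, SHOP_STAT, ARR_DT,
       MAX(LST_SHOP) AS LST_SHOP,
       ARRAY_AGG({RTYPE, INC_FLG, QUAL_FLG, REST_FLG, LN_STAT, RATE}) AS RATEBYTYPE
FROM FULL_SRC AS d
GROUP BY ASDT, LOS, GS, SRC, BID, SUB_NM, SHOP_STAT, ARR_DT;

MAX(LST_SHOP) is valid in the projection because it is an aggregate, so LST_SHOP no longer needs to appear in the GROUP BY.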


I figured out the problem: I was designing the document one way and ingesting the source data another way, but yes, it's solved.
I will now try UUID() to see whether the 5-key combination below works as a single unique key to identify each document.
Maybe the 28 documents were coming from 28 combinations of these keys.
I need to validate the data.
Nevertheless, this helps. Appreciate it.

  • ASDT (string)

  • BID (number)

  • GS (number)

  • LOS (number)

  • LST_SHOP (string)