Insert data into a CB bucket in nested format from another bucket using N1QL

Hi,
I have data in a CB bucket that is flattened (one record per document), and I want to insert the records from this bucket into a new bucket using nested subdocuments. Can this be achieved with N1QL?

The flattened data looks like this in JSON:

[
  {
    "ASDT": "2020-01-10",
    "LOS": 1,
    "GS": 1,
    "SRC": "Expedia.com",
    "KPROP": "1234",
    "SUB_NM": "ELEGANT HOTEL",
    "LST_SHOP": "2020-01-10 11:52:54",
    "SHOP_STAT": "S",
    "TC_BID": 524218,
    "ARR_DT": "2020-01-11",
    "RTYPE": "DELUXE ROOM , OCEAN VIEW",
    "INC_FLG": 2,
    "QUAL_FLG": 3,
    "REST_FLG": 6,
    "LN_STAT": "S",
    "RATE": 123
  },
  {
    "ASDT": "2020-01-10",
    "LOS": 1,
    "GS": 1,
    "SRC": "Expedia.com",
    "KPROP": "1234",
    "SUB_NM": "ELEGANT HOTEL",
    "LST_SHOP": "2020-01-10 11:52:54",
    "SHOP_STAT": "S",
    "TC_BID": 524218,
    "ARR_DT": "2020-01-11",
    "RTYPE": "DELUXE ROOM , GARDEN VIEW",
    "INC_FLG": 2,
    "QUAL_FLG": 3,
    "REST_FLG": 6,
    "LN_STAT": "S",
    "RATE": 234
  },
  {
    "ASDT": "2020-01-10",
    "LOS": 1,
    "GS": 1,
    "SRC": "Expedia.com",
    "KPROP": "1234",
    "SUB_NM": "ELEGANT HOTEL",
    "LST_SHOP": "2020-01-10 11:52:54",
    "SHOP_STAT": "S",
    "TC_BID": 524218,
    "ARR_DT": "2020-01-11",
    "RTYPE": "LUXURY SUITES",
    "INC_FLG": 2,
    "QUAL_FLG": 3,
    "REST_FLG": 6,
    "LN_STAT": "S",
    "RATE": 345
  },
  {
    "ASDT": "2020-01-10",
    "LOS": 1,
    "GS": 1,
    "SRC": "Expedia.com",
    "KPROP": "1234",
    "SUB_NM": "ELEGANT HOTEL",
    "LST_SHOP": "2020-01-10 11:52:54",
    "SHOP_STAT": "S",
    "TC_BID": 524218,
    "ARR_DT": "2020-01-12",
    "RTYPE": "DELUXE ROOM , BOSPHORUS VIEW",
    "INC_FLG": 2,
    "QUAL_FLG": 3,
    "REST_FLG": 6,
    "LN_STAT": "S",
    "RATE": 333
  },
  {
    "ASDT": "2020-01-10",
    "LOS": 1,
    "GS": 1,
    "SRC": "Expedia.com",
    "KPROP": "1234",
    "SUB_NM": "ELEGANT HOTEL",
    "LST_SHOP": "2020-01-10 11:52:54",
    "SHOP_STAT": "S",
    "TC_BID": 524218,
    "ARR_DT": "2020-01-13",
    "RTYPE": "DELUXE ROOM , BOSPHORUS VIEW",
    "INC_FLG": 2,
    "QUAL_FLG": 3,
    "REST_FLG": 6,
    "LN_STAT": "S",
    "RATE": 222
  },
  {
    "ASDT": "2020-01-10",
    "LOS": 1,
    "GS": 1,
    "SRC": "Expedia.com",
    "KPROP": "1234",
    "SUB_NM": "ELEGANT HOTEL",
    "LST_SHOP": "2020-01-10 11:52:54",
    "SHOP_STAT": "S",
    "TC_BID": 524218,
    "ARR_DT": "2020-01-14",
    "RTYPE": "DELUXE ROOM , BOSPHORUS VIEW",
    "INC_FLG": 2,
    "QUAL_FLG": 3,
    "REST_FLG": 6,
    "LN_STAT": "S",
    "RATE": 111
  }
]

I want to run a query that populates the target bucket with JSON like this:

{
  "ASDT": "2020-01-10",
  "LOS": 1,
  "GS": 1,
  "SRC": "Expedia.com",
  "KPROP": "1234",
  "SUB_NM": "ELEGANT HOTEL",
  "LST_SHOP": "2020-01-10 11:52:54",
  "SHOP_STAT": "S",
  "TC_BID": 524218,
  "RATES": [
    {
      "ARR_DT": "2020-01-11",
      "RATEBYTYPE": [
        {
          "RTYPE": "DELUXE ROOM , OCEAN VIEW",
          "INC_FLG": 2,
          "QUAL_FLG": 3,
          "REST_FLG": 6,
          "LN_STAT": "S",
          "RATE": 123
        },
        {
          "RTYPE": "DELUXE ROOM , GARDEN VIEW",
          "INC_FLG": 2,
          "QUAL_FLG": 3,
          "REST_FLG": 6,
          "LN_STAT": "S",
          "RATE": 234
        },
        {
          "RTYPE": "LUXURY SUITES",
          "INC_FLG": 2,
          "QUAL_FLG": 3,
          "REST_FLG": 6,
          "LN_STAT": "S",
          "RATE": 345
        }
      ]
    },
    {
      "ARR_DT": "2020-01-12",
      "RATEBYTYPE": [
        {
          "RTYPE": "DELUXE ROOM , BOSPHORUS VIEW",
          "INC_FLG": 2,
          "QUAL_FLG": 3,
          "REST_FLG": 6,
          "LN_STAT": "S",
          "RATE": 333
        }
      ]
    },
    {
      "ARR_DT": "2020-01-13",
      "RATEBYTYPE": [
        {
          "RTYPE": "DELUXE ROOM , BOSPHORUS VIEW",
          "INC_FLG": 2,
          "QUAL_FLG": 3,
          "REST_FLG": 6,
          "LN_STAT": "S",
          "RATE": 222
        }
      ]
    },
    {
      "ARR_DT": "2020-01-14",
      "RATEBYTYPE": [
        {
          "RTYPE": "DELUXE ROOM , BOSPHORUS VIEW",
          "INC_FLG": 2,
          "QUAL_FLG": 3,
          "REST_FLG": 6,
          "LN_STAT": "S",
          "RATE": 111
        }
      ]
    }
  ]
}

The following query produces the required format.

SELECT {ASDT, LOS, GS, SRC, KPROP, SUB_NM, LST_SHOP, SHOP_STAT, TC_BID,
       "RATES":ARRAY_AGG({ARR_DT, RATEBYTYPE})} doc
FROM ( SELECT ASDT, LOS, GS, SRC, KPROP, SUB_NM, LST_SHOP, SHOP_STAT, TC_BID, ARR_DT,
              ARRAY_AGG({RTYPE, INC_FLG, QUAL_FLG, REST_FLG, LN_STAT, RATE}) AS RATEBYTYPE
        FROM default  AS d
        WHERE ..........
        GROUP BY ASDT, LOS, GS, SRC, KPROP, SUB_NM, LST_SHOP, SHOP_STAT, TC_BID, ARR_DT) AS d1
GROUP BY ASDT, LOS, GS, SRC, KPROP, SUB_NM, LST_SHOP, SHOP_STAT, TC_BID;

Use INSERT INTO ... SELECT to insert into a different bucket.
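For example, wrapping the query above (a sketch; `target` is a hypothetical bucket name, and the subquery and WHERE clause are the same as in the query above):

INSERT INTO target (KEY UUID(), VALUE doc)
SELECT {ASDT, LOS, GS, SRC, KPROP, SUB_NM, LST_SHOP, SHOP_STAT, TC_BID,
       "RATES":ARRAY_AGG({ARR_DT, RATEBYTYPE})} doc
FROM ( ... same subquery as above ... ) AS d1
GROUP BY ASDT, LOS, GS, SRC, KPROP, SUB_NM, LST_SHOP, SHOP_STAT, TC_BID;

UUID() generates a unique document key per result; you can substitute any key expression that is unique per group.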


Thank you @vsr1, appreciate it. I didn't get any errors, so I assume it's fine for CB 6.0.
I have attached 2 text files. I am seeing a ~40% size reduction just from nesting 12 source documents into 2 target documents (I have renamed some of the keys from the original post). I'd like your advice on a few points:

  1. Is this the right way to reduce CB data volume and document count? I am not sure the ~40% reduction will be consistent, so I will try a bigger dataset.

  2. Performance-wise, should I anticipate a big gain from this nested-subdoc design? I will hit at most 9 documents at a time and extract the information from the subdocs.

  3. I am also looking to extend the subdocs from just 6 ARR_DT values to 365 days. Do you foresee any query-performance challenges?

  4. I plan to keep the _FLG fields in the subdoc so I can join against multiple dimensions; each dimension will be < 500 documents. Is there a better approach than a JOIN? Since I don't want to inflate document sizes, I kept the dimensions separate, as there will be a couple of keys in the _FLG buckets.

  5. Are there any challenges if I want to update the nested subdocs and reload new data in a practical scenario where the top-level keys stay the same but the subdocs change? I have heard about locking and updates with CAS; I am not sure how feasible it is to update 365 subdocs under a matching key in one document quickly and reliably, with index updates and index read consistency.

  6. I want to query for the minimum rate per arrival date, and also filter on arrival dates. Since I have two levels of array objects, can you point me to any documentation I should look at?
    I am trying the query below, but it does not return what I intend:

    select RATES[0].ARR_DT , MIN(RATES[0].RATES)
    from TARGET WHERE ASDT="2020-01-10" AND SRC = "EXP"
    GROUP BY RATES[0].ARR_DT

Any help and direction is greatly appreciated, as always. Attached file below.

source and target.zip (1.1 KB)

Whether you want flat documents or arrays depends on your use cases; each has its own advantages and disadvantages.
As I have limited knowledge of your use cases, I am not able to advise on that.

You can check out:


https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/indexing-arrays.html
https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/arrayfun.html

SELECT r.ARR_DT, MIN(ARRAY_MIN(r.RATEBYTYPE[*].RATE)) AS minrate
FROM TARGET AS t
UNNEST t.RATES AS r
WHERE t.ASDT="2020-01-10" AND t.SRC = "EXP"
GROUP BY r.ARR_DT;

@vsr1 -
Thank you and the docs reference are very good.

The query is giving me NULL for RATE, but I think I know why:
If you run the query below against my sample data at the top, it does not produce output, maybe because I have the string "Closed" in RATE. I am not sure how to handle MIN on top of that, since the data will not always be all numbers or all strings: if all values are "Closed" we should return the string, if all are numbers we should return the minimum number, and with a mix of numbers and strings we should return the minimum of the numbers. I am not sure if this casting can be handled.

SELECT  r.ARR_DT , MIN(ARRAY_MIN(r.RATEBYTE[*].RATE) ) AS minrate
FROM  TARGET  AS t 
UNNEST t.RATES AS r
WHERE t.ASDT="2020-01-11" AND t.SRC = "Expedia.com"
GROUP BY r.ARR_DT;

Data below:

[
  {
    "t": {
      "ASDT": "2020-01-11",
      "BID": 524218,
      "GS": 1,
      "LOS": 1,
      "LST_SHOP": "2020-01-11 11:52:54",
      "RATES": [
        {
          "ARR_DT": "2020-01-11",
          "RATEBYTYPE": [
            {
              "INC_FLG": 2,
              "LN_STAT": "S",
              "QUAL_FLG": 3,
              "RATE": 123,
              "REST_FLG": 6,
              "RTYPE": "DELUXE ROOM , OCEAN VIEW"
            },
            {
              "INC_FLG": 2,
              "LN_STAT": "S",
              "QUAL_FLG": 3,
              "RATE": "Closed",
              "REST_FLG": 6,
              "RTYPE": "DELUXE ROOM , GARDEN VIEW"
            }
          ]
        },
        {
          "ARR_DT": "2020-01-13",
          "RATEBYTYPE": [
            {
              "INC_FLG": 2,
              "LN_STAT": "S",
              "QUAL_FLG": 3,
              "RATE": 345,
              "REST_FLG": 6,
              "RTYPE": "LUXURY SUITES"
            }
          ]
        },
        {
          "ARR_DT": "2020-01-14",
          "RATEBYTYPE": [
            {
              "INC_FLG": 2,
              "LN_STAT": "S",
              "QUAL_FLG": 3,
              "RATE": 333,
              "REST_FLG": 6,
              "RTYPE": "DELUXE ROOM , BOSPHORUS VIEW"
            }
          ]
        },
        {
          "ARR_DT": "2020-01-15",
          "RATEBYTYPE": [
            {
              "INC_FLG": 2,
              "LN_STAT": "S",
              "QUAL_FLG": 3,
              "RATE": 222,
              "REST_FLG": 6,
              "RTYPE": "DELUXE ROOM , BOSPHORUS VIEW"
            }
          ]
        },
        {
          "ARR_DT": "2020-01-16",
          "RATEBYTYPE": [
            {
              "INC_FLG": 2,
              "LN_STAT": "S",
              "QUAL_FLG": 3,
              "RATE": 111,
              "REST_FLG": 6,
              "RTYPE": "DELUXE ROOM , BOSPHORUS VIEW"
            }
          ]
        }
      ],
      "SHOP_STAT": "S",
      "SRC": "Expedia.com",
      "SUB_NM": "HILTON GARDEN INN ATLANTA DOWNTOWN"
    }
  }
]

That is exactly what it does. There is a typo: RATEBYTE vs RATEBYTYPE.

SELECT  r.ARR_DT , MIN(ARRAY_MIN(r.RATEBYTYPE[*].RATE) ) AS minrate
FROM  TARGET  AS t
UNNEST t.RATES AS r
WHERE t.ASDT="2020-01-11" AND t.SRC = "Expedia.com"
GROUP BY r.ARR_DT;
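If you ever want MIN to consider only numeric rates (skipping values like "Closed"), a sketch using ISNUMBER() and a collection expression, assuming the same document shape:

SELECT r.ARR_DT,
       MIN(ARRAY_MIN(ARRAY v.RATE FOR v IN r.RATEBYTYPE WHEN ISNUMBER(v.RATE) END)) AS minrate
FROM TARGET AS t
UNNEST t.RATES AS r
WHERE t.ASDT = "2020-01-11" AND t.SRC = "Expedia.com"
GROUP BY r.ARR_DT;

The ARRAY ... FOR ... WHEN ... END construct builds a filtered array, so non-numeric rates are excluded before ARRAY_MIN and MIN are applied.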

Sorry, my bad; the names were poorly chosen. I apologize.
But I see it neatly handles the string-vs-number mix when taking the minimum. That is amazing.

With my new inserts I ended up with the error below.
In the actual TARGET dataset I want the data under one key only, which is "1". If I use (KEY doc.ASDT, VALUE doc) I end up with a similar CAS error; however, I see the document populated partially.
Not sure if this is the defect you were mentioning before?

DML Error, possible causes include CAS mismatch or concurrent modification Failed to perform insert - cause: Duplicate Key 1

INSERT INTO FULL_TGT (KEY "1" , VALUE doc)
SELECT {ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT,
       "RATES":ARRAY_AGG({ARR_DT, RATEBYTYPE})} doc
FROM ( SELECT ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT, ARR_DT,
              ARRAY_AGG({RTYPE, INC_FLG, QUAL_FLG, REST_FLG, LN_STAT, RATE}) AS RATEBYTYPE
        FROM FULL_SRC  AS d
        GROUP BY ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT, ARR_DT) AS d1
GROUP BY ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT

If the SELECT produces n documents, it becomes n inserts. You can't have duplicate document keys.

If you want a single document:

INSERT INTO FULL_TGT (KEY "1" , VALUE {doc})
SELECT ARRAY_AGG({ ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT,RATES}) AS doc
FROM ( SELECT  ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT,
       ARRAY_AGG({ARR_DT, RATEBYTYPE}) AS RATES
FROM ( SELECT ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT, ARR_DT,
              ARRAY_AGG({RTYPE, INC_FLG, QUAL_FLG, REST_FLG, LN_STAT, RATE}) AS RATEBYTYPE
        FROM FULL_SRC  AS d
        GROUP BY ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT, ARR_DT) AS d1
GROUP BY ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT) AS d2

NOTE: Too many nested arrays may be difficult to navigate.

In the relational world, ASDT is not a key; it has the same value across all rows. I loaded this data into the SRC bucket with cbimport using #MONO_INCR#, so the SRC bucket has keys like k::1, k::2.
The document ids differ, but ASDT does not.
I was expecting to use ASDT as the key, thinking the rows would be merged into one TARGET document, which gave the CAS error, so I then tried KEY "1" for a single document. I am not sure how to define keys here.
Yes, I have a lot of nested arrays, though not too many levels. Do you think that is inadvisable performance-wise?
This goes back to my original question about best practices for nesting and UNNEST, plus how updates can be handled inside nested attributes.

INSERT INTO FULL_TGT (KEY doc.ASDT  , VALUE doc)
SELECT {ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT,
       "RATES":ARRAY_AGG({ARR_DT, RATEBYTYPE})} doc
FROM ( SELECT ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT, ARR_DT,
              ARRAY_AGG({RTYPE, INC_FLG, QUAL_FLG, REST_FLG, LN_STAT, RATE}) AS RATEBYTYPE
        FROM FULL_SRC  AS d
        GROUP BY ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT, ARR_DT) AS d1
GROUP BY ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT;

There might already be a document with that date; you can delete it or change the key.
If you want to overwrite, use UPSERT instead of INSERT.

If you need, you can do:
KEY "prefix"||doc.ASDT
NOTE: the same ASDT with different values of LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT can generate the same key, which can cause issues.
You can use KEY UUID() or come up with a unique key.
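For instance, a composite key built from the grouping fields might look like this (a sketch; include whichever fields make the group unique):

KEY "prefix::" || doc.ASDT || "::" || doc.SRC || "::" || TO_STRING(doc.BID)

TO_STRING() is needed because BID is a number and || concatenates strings.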

I used a prefix; no help, I still get the error below. I'm guessing I need sequencing (as in the relational world). How can I achieve this in CB?

12009 DML Error, possible causes include CAS mismatch or concurrent modification. Failed to perform insert - cause: Duplicate Key prefix2020-01-06

However, UPSERT works, but it doesn't help: I have 155K source documents that need to be processed and inserted into arrays by date, and I see only 1 document in TARGET, about 2,220 lines, so it is definitely cutting data out.

NOTE: the same ASDT with different values of LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT can generate the same key, which can cause issues.
You can use KEY UUID() or come up with a unique key.

There is no sequence in N1QL. You can use UUID().

INSERT INTO FULL_TGT (KEY dockey  , VALUE doc)
SELECT {ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT,
       "RATES":ARRAY_AGG({ARR_DT, RATEBYTYPE})} doc, MIN(dockey) AS dockey
FROM ( SELECT ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT, ARR_DT, MIN(META().id) AS dockey,
              ARRAY_AGG({RTYPE, INC_FLG, QUAL_FLG, REST_FLG, LN_STAT, RATE}) AS RATEBYTYPE
        FROM FULL_SRC  AS d
        GROUP BY ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT, ARR_DT) AS d1
GROUP BY ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT;

I got an error: syntax error - at ARRAY_AGG

Sorry, sorted it out: a missing comma.
I tried UUID(), though I'm not sure what it is doing. I see it creating random IDs, but my entire document is split into 28 parts, some keeping different ARR_DT values. I guess this is not what I want.

The , was missing.

Clear up the target bucket; those documents might be left over from a previous run.
Run the SELECT and check: each result document will be inserted as a separate document.

Yeah, fixed. I did flush the bucket before each insert, and this time it also split into 28 docs. I am not sure why it is doing so; is it because of exceeding the 1 MB doc limit?

No, it will not split, and I'm not sure what you are describing unless you post a few documents.
Each group of ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT produces a different document.
If you are expecting one document per ASDT, you need one more GROUP BY on ASDT alone.
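A sketch of that extra level, grouping only by ASDT on the outside (the inner query is the earlier two-level aggregation that produces RATES; "SHOPS" is a hypothetical name for the per-property array):

INSERT INTO FULL_TGT (KEY doc.ASDT, VALUE doc)
SELECT {ASDT, "SHOPS": ARRAY_AGG({LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT, RATES})} AS doc
FROM ( ... earlier grouped query producing RATES ... ) AS d2
GROUP BY ASDT;

With ASDT as the only grouping key, one document per as-of date is produced, so ASDT can safely be the document key.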

Sorry, this is getting confusing. I uploaded TARGET, and as you can see, each document partially shows the information for ASDT "2020-01-06". I have k::1003, which also shows ASDT "2020-01-06", plus 28 other documents.
I am looking for ASDT to serve as the key so that I get 1 document. The SOURCE document below has ASDT "2020-01-06", and it is the same across all 155K documents. The idea is to have 1 document that says: as of this date, show me all arrival dates (ARR_DT) nested inside as subdocs. I don't mind the key being "1". As we move to the next day, the as-of date advances to "2020-01-07", and I may have a similar 150K documents to process under a key called "2".

Source Bucket data:

{
  "ARR_DT": "2020-05-03",
  "ASDT": "2020-01-06",
  "BID": 913775,
  "GS": 2,
  "INC_FLG": 4,
  "LN_STAT": "S",
  "LOS": 1,
  "LST_SHOP": "2020-01-06 9:10:00",
  "QUAL_FLG": 2,
  "RATE": 302.2,
  "REST_FLG": 17,
  "RTYPE": "SUITE HEAR MOBIL ACCESSIBLE ROLL IN SHWR",
  "SHOP_STAT": "S",
  "SRC": "Brand.com",
  "SUB_NM": "INTERCONTINENTAL MINNEAPOLIS AIRPORT"
}

GROUP BY ASDT, LOS, GS, SRC, BID, SUB_NM, LST_SHOP, SHOP_STAT

Each of those groups produces a different document. LST_SHOP is a timestamp, so different hours/minutes make different groups.
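If the time component of LST_SHOP is what splits the groups, one option (a sketch of the inner subquery only) is to drop it from the GROUP BY and carry the latest value via an aggregate instead:

SELECT ASDT, LOS, GS, SRC, BID, SUB_NM, SHOP_STAT, ARR_DT,
       MAX(LST_SHOP) AS LST_SHOP,
       ARRAY_AGG({RTYPE, INC_FLG, QUAL_FLG, REST_FLG, LN_STAT, RATE}) AS RATEBYTYPE
FROM FULL_SRC AS d
GROUP BY ASDT, LOS, GS, SRC, BID, SUB_NM, SHOP_STAT, ARR_DT;

MAX(LST_SHOP) is valid in the projection because it is an aggregate, so LST_SHOP no longer needs to appear in the GROUP BY.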


I figured out the problem: I was designing the document one way and ingesting the source data another way, but yes, it's solved.
I will now try UUID() to see whether the 5-key combination below works as a single unique key to identify each document.
Maybe the 28 documents were coming from 28 combinations of these keys.
I need to validate the data.
Nevertheless, this helps. Appreciate it.

  • ASDT (string)

  • BID (number)

  • GS (number)

  • LOS (number)

  • LST_SHOP (string)