How do I create an index to cope with UNNEST'ed queries?

n1ql

#1

I have this query (created with help from people in here!):

SELECT du.modified,du.modifiedby,d.revisioninfo.modifiedcount FROM data AS d UNNEST TO_ARRAY(IFMISSINGORNULL(d.revisioninfo.updates,{'modified':d.revisioninfo.created,'modifiedby':d.revisioninfo.createdby})) AS du WHERE d.type='Catch'
ORDER BY du.modified DESC

It works functionally as expected :+1:

…but it is too slow (1 sec.) which obviously is due to lack of an appropriate index. The data structure looks like this:

{
  "bait": "5",
  "count": 12,
  "date": "2015-08-24T12:00:00+0200",
  "finclipcount": 3,
  "fishingtripkey": "141A1CBE90BD7AFDC1257EAB0048501D",
  "ispublic": false,
  "key": "0D4F548C4DDC5CBDC1257F7000282F6B",
  "length": 43,
  "localssociation": "AF67A22726B129A9C1257E600051BE7C",
  "markednow": false,
  "place": "045B152BDB745864C1257E600051BE56",
  "refno": "1234321",
  "revisioninfo": {
    "created": "2016-03-08T08:18:56+0100",
    "createdby": "John Dalsgaard/BA171123846CEBF1C1257CB2002DA330/Fangst",
    "modifiedcount": 2,
    "updates": [
      {
        "modified": "2016-03-08T08:20:02+0100",
        "modifiedby": "John Dalsgaard/BA171123846CEBF1C1257CB2002DA330/Fangst"
      },
      {
        "modified": "2016-03-08T08:21:08+0100",
        "modifiedby": "John Dalsgaard/BA171123846CEBF1C1257CB2002DA330/Fangst"
      }
    ]
  },
  "sexcountfemale": 7,
  "sexcountmale": 5,
  "species": "25",
  "takenhome": 1,
  "type": "Catch",
  "undersized": 9,
  "userkey": "BA171123846CEBF1C1257CB2002DA330",
  "weight": 9
}

and the updates node may or may not be there…

I have tried to create an index like this:

CREATE INDEX `def_type_modified` ON `data`(`type`,`revisioninfo.modifiedcount`,`revisioninfo.created`,`revisioninfo.createdby`,`revisioninfo.updates.modified`,`revisioninfo.updates.modifiedby`)

But “Explain” indicates that it doesn’t use it…:

{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "IndexScan2",
            "index": "def_type",
            "index_id": "941b5fdba8f0e7f8",
            "index_projection": {
              "primary_key": true
            },
            "keyspace": "data",
            "namespace": "default",
            "spans": [
              {
                "exact": true,
                "range": [
                  {
                    "high": "\"FishingTrip\"",
                    "inclusion": 3,
                    "low": "\"FishingTrip\""
                  }
                ]
              }
            ],
            "using": "gsi"
          },
          {
            "#operator": "Fetch",
            "as": "d",
            "keyspace": "data",
            "namespace": "default"
          },
          {
            "#operator": "Parallel",
            "~child": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "Unnest",
                  "as": "du",
                  "expr": "to_array(ifmissingornull(((`d`.`revisioninfo`).`updates`), {\"modified\": ((`d`.`revisioninfo`).`created`), \"modifiedby\": ((`d`.`revisioninfo`).`createdby`)}))"
                }
              ]
            }
          },
          {
            "#operator": "Parallel",
            "~child": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "Filter",
                  "condition": "((`d`.`type`) = \"FishingTrip\")"
                },
                {
                  "#operator": "InitialProject",
                  "result_terms": [
                    {
                      "expr": "(`du`.`modified`)"
                    },
                    {
                      "expr": "(`du`.`modifiedby`)"
                    },
                    {
                      "expr": "((`d`.`revisioninfo`).`modifiedcount`)"
                    }
                  ]
                }
              ]
            }
          }
        ]
      },
      {
        "#operator": "Order",
        "sort_terms": [
          {
            "desc": true,
            "expr": "(`du`.`modified`)"
          }
        ]
      },
      {
        "#operator": "FinalProject"
      }
    ]
  },
  "text": "SELECT du.modified,du.modifiedby,d.revisioninfo.modifiedcount FROM data AS d UNNEST TO_ARRAY(IFMISSINGORNULL(d.revisioninfo.updates,{'modified':d.revisioninfo.created,'modifiedby':d.revisioninfo.createdby})) AS du WHERE d.type='FishingTrip'\nORDER BY du.modified DESC"
}

Any ideas appreciated :wink:


Slow query with unnest - though using index
#2

Try this. The back-ticks are not right place. it considers as fields vs subdocument.

CREATE INDEX `def_type_modified` ON 
`data`(`type`,`revisioninfo`.`modifiedcount`,`revisioninfo`.`created`,
`revisioninfo`.`createdby`,`revisioninfo`.`updates`);

#3

Thanks for the suggestion. Will try that out. Is it not necessary to include the revisioninfo.updates.modified and revisioninfo.updates.modifiedby? Can I just include the node revisioninfo.updates?

… or was that just to show me the principle - which is also fine :wink:


#4

Ok, so I tried to create the index - and I do need to include the revisioninfo.updates node. However, the result lacks some information. Therefore, I tried with this index:

CREATE INDEX `def_type_modified` ON `data`(`type`,`revisioninfo`.`modifiedcount`,`revisioninfo`.`created`,`revisioninfo`.`createdby`,`revisioninfo`.`updates`,`revisioninfo`.`updates`.`modified`,`revisioninfo`.`updates`.`modifiedby`)

But the result still only shows the modifiedcount:

[
  {
    "modifiedcount": 0
  },
  {
    "modifiedcount": 0
  },
  {
    "modifiedcount": 0
  },
...

So it seems not to be able to find the calculated modified and modifedby fields. If I drop the index then I get the expected output:

[
  {
    "modified": "2018-08-03T12:41:57+0200",
    "modifiedby": "John Dalsgaard/BA171123846CEBF1C1257CB2002DA330/Fangst",
    "modifiedcount": 0
  },
  {
    "modified": "2018-04-25T08:44:38+0200",
    "modifiedby": "John Dalsgaard/BA171123846CEBF1C1257CB2002DA330/Fangst",
    "modifiedcount": 0
  },
  {
    "modified": "2018-04-24T15:03:29+0200",
    "modifiedby": "John Dalsgaard/BA171123846CEBF1C1257CB2002DA330/Fangst",
    "modifiedcount": 0
  },
....

#5

revisioninfo.updates Include all sub fields. If it is array rrevisioninfo.updates.modified causes MISSING.
Remove last two fields in the index.

CREATE INDEX `def_type_modified` ON 
`data`(`type`,`revisioninfo`.`modifiedcount`,`revisioninfo`.`created`,
`revisioninfo`.`createdby`,`revisioninfo`.`updates`);

Also try this

SELECT
       IFMISSINGORNULL(du.modified, d.revisioninfo.created) AS modified,
       IFMISSINGORNULL(du.modifiedby, d.revisioninfo.createdby) AS modified,
       d.revisioninfo.modifiedcount
FROM data AS d
LEFT UNNEST TO_ARRAY(d.revisioninfo.updates) AS du
WHERE d.type='FishingTrip'
ORDER BY modified DESC;

#6

Thanks for your suggestions. Reverting to the index as you suggest (which was the first one I tried) I still get the wrong result only with the modifiedcount field as I did before.

However, trying your query returns the expected result :wink:

The response times are around 800ms without the index - and around 150ms with the index. So a lot better - though still not impressive response time :wink:

… and as a side effect I understand the query you suggest better than what I had before :+1:

Thanks for your help!


#7

I thought I now had it running but then I investigated a little more and found that it is not quite there yet.

If a record does not have any revisioninfo.updates then it returns incorrect data. This query:

SELECT 
	IFMISSINGORNULL(du.modified, d.revisioninfo.created) AS modified,
	IFMISSINGORNULL(du.modifiedby, d.revisioninfo.createdby) AS modifiedby,
	d.revisioninfo.modifiedcount
FROM data as d
LEFT UNNEST TO_ARRAY(d.revisioninfo.updates) AS du
WHERE d.type ="FishingTrip"
LIMIT 2

Returns this data:

[
  {
    "modified": null,
    "modifiedby": null,
    "modifiedcount": 0
  },
...

If I include a field that is not in the index (e.g. date):

SELECT 
	IFMISSINGORNULL(du.modified, d.revisioninfo.created) AS modified,
	IFMISSINGORNULL(du.modifiedby, d.revisioninfo.createdby) AS modifiedby,
	d.revisioninfo.modifiedcount
	,d.date
FROM data as d
LEFT UNNEST TO_ARRAY(d.revisioninfo.updates) AS du
WHERE d.type ="FishingTrip"
LIMIT 2

… then it looks Ok:

[
  {
    "date": "2008-09-06T00:00:00+0200",
    "modified": "2018-02-27T23:18:27+0100",
    "modifiedby": "Import/Fangst",
    "modifiedcount": 0
  },
  {
    "date": "2014-07-29T00:00:00+0200",
    "modified": "2018-02-27T23:39:15+0100",
    "modifiedby": "Import/Fangst",
    "modifiedcount": 0
  }
]

If I specify one of the fields that are in the index.

SELECT 
	IFMISSINGORNULL(du.modified, d.revisioninfo.created) AS modified,
	IFMISSINGORNULL(du.modifiedby, d.revisioninfo.createdby) AS modifiedby,
	d.revisioninfo.modifiedcount
	,d.type
FROM data as d
LEFT UNNEST TO_ARRAY(d.revisioninfo.updates) AS du
WHERE d.type ="FishingTrip"
LIMIT 2

then it is wrong again, e.g. type:

[
  {
    "modified": null,
    "modifiedby": null,
    "modifiedcount": 0,
    "type": "FishingTrip"
  },
  {
    "modified": null,
    "modifiedby": null,
    "modifiedcount": 0,
    "type": "FishingTrip"
  }
]

I guess something is “wrong” with the index definition or…? How can I obtain the right (or expected :slight_smile: ) result?


#8

Change it to TO_ARRAY(IFMISSINGORNULL(d.revisioninfo.updates,[]))

If does not work , may be your original query is right. Post the EXPLAIN ? What is version of CB


#9

I am using: Community Edition 5.0.1 build 5003

Here is the “Explain” of the first query from my previous post:

{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "IndexScan2",
            "covers": [
              "cover ((`d`.`type`))",
              "cover (((`d`.`revisioninfo`).`modifiedcount`))",
              "cover (((`d`.`revisioninfo`).`created`))",
              "cover (((`d`.`revisioninfo`).`createdby`))",
              "cover (((`d`.`revisioninfo`).`updates`))",
              "cover ((meta(`d`).`id`))"
            ],
            "index": "def_type_modified",
            "index_id": "f2465877553a71be",
            "index_projection": {
              "entry_keys": [
                0,
                1,
                4
              ]
            },
            "keyspace": "data",
            "limit": "2",
            "namespace": "default",
            "spans": [
              {
                "exact": true,
                "range": [
                  {
                    "high": "\"FishingTrip\"",
                    "inclusion": 3,
                    "low": "\"FishingTrip\""
                  }
                ]
              }
            ],
            "using": "gsi"
          },
          {
            "#operator": "Parallel",
            "~child": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "Unnest",
                  "as": "du",
                  "expr": "to_array(cover (((`d`.`revisioninfo`).`updates`)))",
                  "outer": true
                }
              ]
            }
          },
          {
            "#operator": "Parallel",
            "~child": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "Filter",
                  "condition": "(cover ((`d`.`type`)) = \"FishingTrip\")"
                },
                {
                  "#operator": "InitialProject",
                  "result_terms": [
                    {
                      "as": "modified",
                      "expr": "ifmissingornull((`du`.`modified`), cover (((`d`.`revisioninfo`).`created`)))"
                    },
                    {
                      "as": "modifiedby",
                      "expr": "ifmissingornull((`du`.`modifiedby`), cover (((`d`.`revisioninfo`).`createdby`)))"
                    },
                    {
                      "expr": "cover (((`d`.`revisioninfo`).`modifiedcount`))"
                    }
                  ]
                },
                {
                  "#operator": "FinalProject"
                }
              ]
            }
          }
        ]
      },
      {
        "#operator": "Limit",
        "expr": "2"
      }
    ]
  },
  "text": "SELECT \n\tIFMISSINGORNULL(du.modified, d.revisioninfo.created) AS modified,\n\tIFMISSINGORNULL(du.modifiedby, d.revisioninfo.createdby) AS modifiedby,\n\td.revisioninfo.modifiedcount\nFROM data as d\nLEFT UNNEST TO_ARRAY(d.revisioninfo.updates) AS du\nWHERE d.type =\"FishingTrip\"\nLIMIT 2"
}

#10

If you look entry_keys, 2,3 are not projected and query uses that for covered case.

You can try 5.1.1 or try following by defining let

SELECT
       IFMISSINGORNULL(du.modified, d.revisioninfo.created) AS modified,
       IFMISSINGORNULL(du.modifiedby, d.revisioninfo.createdby) AS modified,
       d.revisioninfo.modifiedcount
FROM data AS d
LEFT UNNEST TO_ARRAY(d.revisioninfo.updates) AS du
LET xxx= [d.revisioninfo.created, d.revisioninfo.createdby]
WHERE d.type='FishingTrip'
ORDER BY modified DESC;

#11

I updated to 5.1.1 and that seems to do the trick :+1: (should have upgraded any way!)

The Explain now looks like this:

{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "IndexScan2",
            "covers": [
              "cover ((`d`.`type`))",
              "cover (((`d`.`revisioninfo`).`modifiedcount`))",
              "cover (((`d`.`revisioninfo`).`created`))",
              "cover (((`d`.`revisioninfo`).`createdby`))",
              "cover (((`d`.`revisioninfo`).`updates`))",
              "cover ((meta(`d`).`id`))"
            ],
            "index": "def_type_modified",
            "index_id": "f2465877553a71be",
            "index_projection": {
              "entry_keys": [
                0,
                1,
                2,
                3,
                4
              ]
            },
            "keyspace": "data",
            "limit": "2",
            "namespace": "default",
            "spans": [
              {
                "exact": true,
                "range": [
                  {
                    "high": "\"FishingTrip\"",
                    "inclusion": 3,
                    "low": "\"FishingTrip\""
                  }
                ]
              }
            ],
            "using": "gsi"
          },
          {
            "#operator": "Parallel",
            "~child": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "Unnest",
                  "as": "du",
                  "expr": "to_array(ifmissingornull(cover (((`d`.`revisioninfo`).`updates`)), []))",
                  "outer": true
                }
              ]
            }
          },
          {
            "#operator": "Parallel",
            "~child": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "Filter",
                  "condition": "(cover ((`d`.`type`)) = \"FishingTrip\")"
                },
                {
                  "#operator": "InitialProject",
                  "result_terms": [
                    {
                      "as": "modified",
                      "expr": "ifmissingornull((`du`.`modified`), cover (((`d`.`revisioninfo`).`created`)))"
                    },
                    {
                      "as": "modifiedby",
                      "expr": "ifmissingornull((`du`.`modifiedby`), cover (((`d`.`revisioninfo`).`createdby`)))"
                    },
                    {
                      "expr": "cover (((`d`.`revisioninfo`).`modifiedcount`))"
                    }
                  ]
                },
                {
                  "#operator": "FinalProject"
                }
              ]
            }
          }
        ]
      },
      {
        "#operator": "Limit",
        "expr": "2"
      }
    ]
  },
  "text": "SELECT \n\tIFMISSINGORNULL(du.modified, d.revisioninfo.created) AS modified,\n\tIFMISSINGORNULL(du.modifiedby, d.revisioninfo.createdby) AS modifiedby,\n\td.revisioninfo.modifiedcount\nFROM data as d\nLEFT UNNEST TO_ARRAY(IFMISSINGORNULL(d.revisioninfo.updates,[])) AS du\nWHERE d.type =\"FishingTrip\"\nLIMIT 2"
}

Thanks!

PS: It just took me a little searching to find out that you have to use rpm -U coucbase..... when doing an upgrade (instead of the rpm -i .... command specified in the documentation…)