Querying and Indexing an array

monti.mauro · February 6, 2017, 7:13pm

Hi guys,

I am having a hard time creating an index for the following query.

SELECT META(`attributes`).id AS _ID, META(`attributes`).cas AS _CAS, `attributes`.*
FROM `attributes` 
WHERE 
ANY item IN `attributes`.`paths` SATISFIES item = "/path/to/a/file" END 
AND `_class` = "com.model.Entity"

My document looks like:

{
  "schema": "1234567890@schema",
  "creator": "system",
  "resource": "http://resources/resource1",
  "paths": [
    "/path/to/a/file", "/another/path
  ],
  "lastModification": 1414137431000,
  "attributes": {
    "attr1": "value1",
    "attr2": "value2"
  },
  "_class": "com.model.Entity",
  "classification": "com.entities.classification1",
  "creationDate": 1486154007968,
  "inherits": [
    "com.model.Type1",
    "com.model.Type2"
  ]
}

I tried creating the following index:

CREATE INDEX `i_attribute_paths`
ON `attributes`(ARRAY t FOR t IN paths END)
WHERE _class = 'com.model.Entity' USING GSI;

The explain for this query:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "PrimaryScan",
          "index": "#primary",
          "keyspace": "attributes",
          "namespace": "default",
          "using": "gsi"
        },
        {
          "#operator": "Fetch",
          "keyspace": "attributes",
          "namespace": "default"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Filter",
                "condition": "(any `item` in (`attributes`.`paths`) satisfies (`item` = \"/path/to/a/file\") end and ((`attributes`.`_class`) = \"com.model.Entity\"))"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "as": "_ID",
                    "expr": "(meta(`attributes`).`id`)"
                  },
                  {
                    "as": "_CAS",
                    "expr": "(meta(`attributes`).`cas`)"
                  },
                  {
                    "expr": "`attributes`",
                    "star": true
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "SELECT \nMETA(`attributes`).id AS _ID, \nMETA(`attributes`).cas AS _CAS, \n`attributes`.* \nFROM `attributes` \nWHERE ANY item IN `attributes`.`paths` SATISFIES item = \"/path/to/a/file\" END AND `_class` = \"com.model.Entity\""
  }
]

Is there any other way to re-write the query or the index to avoid a PrimaryScan?
Any suggestions are greatly appreciated!

Thanks a lot!.
M.-

Prerna.Manaktala · February 6, 2017, 7:30pm

Hi Monti

Great work on building the right array index for your query.
The explain of your query was still using primary scan and not the index you built because you used different variables in your query and index.
You used item in your query:
ANY item IN attributes.paths SATISFIES item
while you use the character t in your array index:
CREATE INDEX i_attribute_paths
ON attributes(ARRAY t FOR t IN paths END)
WHERE _class = ‘com.model.Entity’ USING GSI;

The right array index for your query will be:
CREATE INDEX i_attribute_paths
ON attributes(DISTINCT ARRAY item FOR item IN paths END)
WHERE _class = ‘com.model.Entity’ USING GSI;

Here is the explain of your query with above index:

[
{
“plan”: {
“#operator”: “Sequence”,
“~children”: [
{
“#operator”: “IntersectScan”,
“scans”: [
{
“#operator”: “DistinctScan”,
“scan”: {
“#operator”: “IndexScan”,
“index”: “i_attribute_paths”,
“index_id”: “31b3b1e6d677c151”,
“keyspace”: “attributes”,
“namespace”: “default”,
“spans”: [
{
“Exact”: true,
“Range”: {
“High”: [
“"/path/to/a/file"”
],
“Inclusion”: 3,
“Low”: [
“"/path/to/a/file"”
]
}
}
],
“using”: “gsi”
}
},
{
“#operator”: “DistinctScan”,
“scan”: {
“#operator”: “IndexScan”,
“index”: “i_attribute_paths”,
“index_id”: “fd6827d21af7255c”,
“keyspace”: “attributes”,
“namespace”: “default”,
“spans”: [
{
“Exact”: true,
“Range”: {
“High”: [
“"/path/to/a/file"”
],
“Inclusion”: 3,
“Low”: [
“"/path/to/a/file"”
]
}
}
],
“using”: “gsi”
}
}
]
},
{
“#operator”: “Fetch”,
“keyspace”: “attributes”,
“namespace”: “default”
},
{
“#operator”: “Parallel”,
“~child”: {
“#operator”: “Sequence”,
“~children”: [
{
“#operator”: “Filter”,
“condition”: “(any item in (attributes.paths) satisfies (item = "/path/to/a/file") end and ((attributes._class) = "com.model.Entity"))”
},
{
“#operator”: “InitialProject”,
“result_terms”: [
{
“as”: “_ID”,
“expr”: “(meta(attributes).id)”
},
{
“as”: “_CAS”,
“expr”: “(meta(attributes).cas)”
},
{
“expr”: “attributes”,
“star”: true
}
]
},
{
“#operator”: “FinalProject”
}
]
}
}
]
},
“text”: “SELECT META(attributes).id AS _ID, META(attributes).cas AS _CAS, attributes.*\nFROM attributes \nWHERE \nANY item IN attributes.paths SATISFIES item = "/path/to/a/file" END \nAND _class = "com.model.Entity";”
}
]

I hope it helps.
Also you can read more about n1ql array indexing here:

Thanks
Prerna

monti.mauro · February 6, 2017, 7:42pm

@Prerna.Manaktala!.. Thanks a lot!..
I didn’t know that the ‘vars’ used in the index had to match what i have defined in the query.
Is always good to learn something new!

M.-