N1QL vs MapReduce Views with Sync-Gateway - Performance issues

ok. It it best to get all the IDs then do a bulk_fetch?

Should not be necessary, but we have seen that sometimes. Can you try the covering index?

Updating the Query to Select Id From…

This it the correct way to create a covering index?
Definition: CREATE INDEX over5_2 ON test_sw1(id) WHERE ((not ((meta().id) like “_sync:%”)) and ((_deleted = false) or (_deleted is missing)))

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "PrimaryScan",
          "index": "#primary",
          "keyspace": "test__sw1",
          "namespace": "default",
          "using": "gsi"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Fetch",
                "keyspace": "test__sw1",
                "namespace": "default"
              },
              {
                "#operator": "Filter",
                "condition": "(((not ((meta(`test__sw1`).`id`) like \"_sync:%\")) and ((`test__sw1`.`type`) = \"user\")) and (((`test__sw1`.`_deleted`) = false) or ((`test__sw1`.`_deleted`) is missing)))"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "(`test__sw1`.`id`)"
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "SELECT id FROM `test__sw1` USE INDEX(over5_2) where  meta().id not like '_sync:%' and type = 'user' and (_deleted = false or _deleted IS MISSING)"
  }
]

It is SELECT META().id, not SELECT id.

A covering index means the index includes all the data in the query. So you can select other fields, if you add them to end of the index.

Can you post the EXPLAIN output and execution time for this query.

EXPLAIN SELECT META().id
FROM test_sw1 USE INDEX (over5)
WHERE meta().id NOT LIKE 'sync:%' AND type = 'user' AND (_deleted = FALSE OR _deleted IS MISSING)
;
[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "PrimaryScan",
          "index": "#primary",
          "keyspace": "test_sw1",
          "namespace": "default",
          "using": "gsi"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Fetch",
                "keyspace": "test_sw1",
                "namespace": "default"
              },
              {
                "#operator": "Filter",
                "condition": "(((not ((meta(`test_sw1`).`id`) like \"sync:%\")) and ((`test_sw1`.`type`) = \"user\")) and (((`test_sw1`.`_deleted`) = false) or ((`test_sw1`.`_deleted`) is missing)))"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "(meta(`test_sw1`).`id`)"
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "SELECT META().id\nFROM test_sw1 USE INDEX (over5)\nWHERE meta().id NOT LIKE 'sync:%' AND type = 'user' AND (_deleted = FALSE OR _deleted IS MISSING)\n;"
  }
]

You changed

NOT LIKE ‘_sync’

to

NOT LIKE ‘sync’.

You need to be consistent between the index and the query. You need to see IndexScan and cover() in the EXPLAIN output. And then you need to post the execution time from actually running the query…

That was copied from the prev Post

Here is the output and I see the cover now

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexScan",
          "covers": [
            "cover ((`test_sw1`.`type`))",
            "cover ((meta(`test_sw1`).`id`))",
            "cover ((`test_sw1`.`_deleted`))",
            "cover ((meta(`test_sw1`).`id`))"
          ],
          "filter_covers": {
            "cover ((not ((meta(`test_sw1`).`id`) like \"_sync:%\")))": true
          },
          "index": "over5",
          "index_id": "62110fa20a745e1d",
          "keyspace": "test_sw1",
          "namespace": "default",
          "spans": [
            {
              "Range": {
                "High": [
                  "successor(\"user\")"
                ],
                "Inclusion": 1,
                "Low": [
                  "\"user\""
                ]
              }
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Filter",
                "condition": "((cover ((not ((meta(`test_sw1`).`id`) like \"_sync:%\"))) and (cover ((`test_sw1`.`type`)) = \"user\")) and ((cover ((`test_sw1`.`_deleted`)) = false) or (cover ((`test_sw1`.`_deleted`)) is missing)))"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "cover ((meta(`test_sw1`).`id`))"
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "SELECT META().id\nFROM test_sw1 USE INDEX (over5)\nWHERE meta().id NOT LIKE '_sync:%' AND type = 'user' AND (_deleted = FALSE OR _deleted IS MISSING)\n;"
  }
]

Ok. Now you need to remove the word “EXPLAIN” and run the query, and then post the metrics here. Please avoid copy-paste errors and use that latest query.

Thanks.

View - 1.91s
N1QL - 1.04s

I will be testing more complex views and N1QL.

1 Like

Another question on Joins on how to better improve this query? What kind of Indexing should be needed?

Some notes:

  1. usr.locations is a hash and we need to see if the key exists, this can be different
  2. pa.residentId == usr.id
  3. I get the same results as my view.

Query
EXPLAIN Select usr.id as userId, pa.id as paId
FROM sync_gateway_sw1 pa INNER JOIN sync_gateway_sw1 usr ON KEYS pa.residentId
WHERE meta().id NOT LIKE ‘_sync:%’ AND (meta()._deleted = FALSE OR meta()._deleted IS MISSING) AND usr.type = ‘user’ AND
usr.resident = true
and usr.locations.a3a2921e-96d9-45c9-aeb2-cc83410ad9c7 IS NOT NULL
and pa.type = ‘place-association’ and pa.locationId = ‘a3a2921e-96d9-45c9-aeb2-cc83410ad9c7’

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexScan",
          "index": "type_ix",
          "index_id": "71e5ae4863f63694",
          "keyspace": "sync_gateway_sw1",
          "namespace": "default",
          "spans": [
            {
              "Range": {
                "High": [
                  "\"place-association\""
                ],
                "Inclusion": 3,
                "Low": [
                  "\"place-association\""
                ]
              }
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Fetch",
                "as": "pa",
                "keyspace": "sync_gateway_sw1",
                "namespace": "default"
              },
              {
                "#operator": "Join",
                "as": "usr",
                "keyspace": "sync_gateway_sw1",
                "namespace": "default",
                "on_keys": "(`pa`.`residentId`)"
              },
              {
                "#operator": "Filter",
                "condition": "(((((((not ((meta().`id`) like \"_sync:%\")) and (((meta().`_deleted`) = false) or ((meta().`_deleted`) is missing))) and ((`usr`.`type`) = \"user\")) and ((`usr`.`resident`) = true)) and (((`usr`.`locations`).`a3a2921e-96d9-45c9-aeb2-cc83410ad9c7`) is not null)) and ((`pa`.`type`) = \"place-association\")) and ((`pa`.`locationId`) = \"a3a2921e-96d9-45c9-aeb2-cc83410ad9c7\"))"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "as": "userId",
                    "expr": "(`usr`.`id`)"
                  },
                  {
                    "as": "paId",
                    "expr": "(`pa`.`id`)"
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "Select usr.id as userId, pa.id as paId\nFROM sync_gateway_sw1 pa INNER JOIN sync_gateway_sw1 usr ON KEYS pa.residentId\nWHERE meta().id NOT LIKE '_sync:%' AND (meta()._deleted = FALSE OR meta()._deleted IS MISSING)  AND usr.type = 'user' AND\nusr.resident = true\nand usr.locations.`a3a2921e-96d9-45c9-aeb2-cc83410ad9c7` IS NOT NULL\nand pa.type = 'place-association' and pa.locationId = 'a3a2921e-96d9-45c9-aeb2-cc83410ad9c7'"
  }
]

I would suggest two changes:

(1) Create an index on pa ( locationId, type, id, residentId ). You should see cover() in your EXPLAIN.

(2) Change all occurrences of META() to META(pa).

I am having trouble creating the index for pa

I have tried

CREATE INDEX pa2_idx ON pa:sync_gateway_sw1(locations, type, id, residentId)
[
  {
    "code": 12003,
    "msg": "Keyspace not found keyspace sync_gateway_sw1 - cause: No bucket named sync_gateway_sw1",
    "query_from_user": "CREATE INDEX pa2_idx ON pa:sync_gateway_sw1(locations, type, id, residentId)"
  }
]

AND

CREATE INDEX pa2_idx ON sync_gateway_sw1:pa(locations, type, id, residentId)

Hi,

Your index needs to match your query WHERE clause. Can you go back to the index and query that were working as covering indexes, and then make the changes incrementally, checking after each change?

It is a programming exercise, so you have to be precise.

Trying to improve the performance of this query

Select id
FROM sync_gateway_sw1 use index(resident8_idx)
WHERE meta().id NOT LIKE '_sync:%' 
AND (_deleted = FALSE OR _deleted IS MISSING)  
AND type = 'user' and resident=true 
AND ANY l IN OBJECT_NAMES(locations) SATISFIES l = '25486355-dfdd-49df-888a-abe123791bd2'  END
limit 100

Using Index

create index resident8_idx ON sync_gateway_sw1(type,meta().id, resident, DISTINCT ARRAY i FOR i IN OBJECT_NAMES(locations)  END)
WHERE meta().id NOT LIKE '_sync:%' AND (_deleted = FALSE OR _deleted IS MISSING) and resident=true
AND type = 'user'

EXPLAIN

[
{
“plan”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “Sequence”,
“~children”: [
{
#operator”: “DistinctScan”,
“scan”: {
#operator”: “IndexScan”,
“index”: “resident8_idx”,
“index_id”: “d614cbbce45b4ed0”,
“keyspace”: “sync_gateway_sw1”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
“successor("user")”
],
“Inclusion”: 1,
“Low”: [
“"user"”
]
}
}
],
“using”: “gsi”
}
},
{
#operator”: “Parallel”,
“~child”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “Fetch”,
“keyspace”: “sync_gateway_sw1”,
“namespace”: “default”
},
{
#operator”: “Filter”,
“condition”: “(((((not ((meta(sync_gateway_sw1).id) like "_sync:%")) and (((sync_gateway_sw1._deleted) = false) or ((sync_gateway_sw1._deleted) is missing))) and ((sync_gateway_sw1.type) = "user")) and ((sync_gateway_sw1.resident) = true)) and any l in object_names((sync_gateway_sw1.locations)) satisfies (l = "25486355-dfdd-49df-888a-abe123791bd2") end)”
},
{
#operator”: “InitialProject”,
“result_terms”: [
{
“expr”: “(sync_gateway_sw1.id)”
}
]
},
{
#operator”: “FinalProject”
}
]
}
}
]
},
{
#operator”: “Limit”,
“expr”: “100”
}
]
},
“text”: “Select id\nFROM sync_gateway_sw1 use index(resident8_idx)\nWHERE meta().id NOT LIKE ‘_sync:%’ \nAND (_deleted = FALSE OR _deleted IS MISSING) \nAND type = ‘user’ and resident=true \nAND ANY l IN OBJECT_NAMES(locations) SATISFIES l = ‘25486355-dfdd-49df-888a-abe123791bd2’ END\nlimit 100”
}
]

The locations is a hash and need to see specific key exists.
It takes 1.29s with 55 results

If I remove the following from the query

AND ANY l IN OBJECT_NAMES(locations) SATISFIES l = ‘25486355-dfdd-49df-888a-abe123791bd2’ END
It takes 83 ms with 109 results (because of no filter)

I do improve this?

Yes, you can improve the performance significantly. Keep the ANY predicate in the query. Look up array indexing in N1QL. There are documents and examples you can find. Play around with array indexing, and when you have it working, you can post here and we will make any additional suggestions.

When creating the index I already used “DISTINCT ARRAY i FOR i IN OBJECT_NAMES(locations)”

Your index uses variable i, and your query uses variable l. You need to use the same variable in both.

Changing I to i change the execution time to 1.3s

Testing more, even with covered indexes (See explain) N1QL is slower. I have tested with other views and n1ql and produced similar results. Any idea on how to improve it? I even tried to return just the meta().id in the N1QL and

“view_duration = 0.332000941 rows=221”
“n1ql_duration = 0.691879602 rows=221”

Current View

function (doc, meta) {
    if (meta.id && (meta.id.indexOf('_sync') === -1) && !doc._deleted) {
        if (doc.type === 'change-log-entry') {
            if( doc.changedAt && doc.changeType === 'task-create' && doc.changeInfo && doc.changeInfo.details) {
                var normalizeDate = new Date(doc.changedAt);
                var isEmergency = !!doc.changeInfo.details.isEmergency;
                emit([doc.location, normalizeDate.toISOString()],  {
                    'taskId': doc.primaryTarget,
                    'isEmergency':isEmergency});
            }
        }
    }
}

N1QL

Index

CREATE INDEX `change_log_task_create_v1` ON `sync_gateway_sw1`(`type`,(meta().`id`),`_deleted`,softDelete, `location`,`changeType`,`changedAt`, changeInfo.details.isEmergency)
WHERE ((((((not ((meta().`id`) like "_sync:%")) and ((`_deleted` = false) or (`_deleted` is missing))) AND (softDelete is MISSING or softDelete = false) and (`type` = "change-log-entry")) and
(`changeType` = "task-create")) and (`location` is not null)) and (`changedAt` is not null))
and changeInfo is NOT NULL and changeInfo.details is NOT NULL

Query

select meta().id,location, changeInfo.details.isEmergency
FROM sync_gateway_sw1 use index(change_log_task_create_v1)
WHERE meta().id NOT LIKE ‘_sync:%’
AND (_deleted = FALSE OR _deleted IS MISSING)
AND (softDelete is MISSING or softDelete = false)
AND type = ‘change-log-entry’ and changeType = ‘task-create’
and location is not null and changedAt is not null
and location = ‘79497d97-3eb6-400c-8ed0-d48d28d327cb’
and changedAt between ‘2015-08-22’ and ‘2016-08-22’
and changeInfo is NOT NULL and changeInfo.details is NOT NULL

EXPLAIN

[
{
“plan”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “IndexScan”,
“covers”: [
“cover ((sync_gateway_sw1.type))”,
“cover ((meta(sync_gateway_sw1).id))”,
“cover ((sync_gateway_sw1._deleted))”,
“cover ((sync_gateway_sw1.softDelete))”,
“cover ((sync_gateway_sw1.location))”,
“cover ((sync_gateway_sw1.changeType))”,
“cover ((sync_gateway_sw1.changedAt))”,
“cover ((((sync_gateway_sw1.changeInfo).details).isEmergency))”,
“cover ((meta(sync_gateway_sw1).id))”
],
“filter_covers”: {
“cover ((((sync_gateway_sw1.changeInfo).details) is not null))”: true,
“cover (((sync_gateway_sw1.changeInfo) is not null))”: true,
“cover (((sync_gateway_sw1.changedAt) is not null))”: true,
“cover (((sync_gateway_sw1.location) is not null))”: true,
“cover ((sync_gateway_sw1.changeType))”: “task-create”,
“cover ((sync_gateway_sw1.type))”: “change-log-entry”,
“cover ((not ((meta(sync_gateway_sw1).id) like "_sync:%")))”: true
},
“index”: “change_log_task_create_v1”,
“index_id”: “1d6cfba2866ddb20”,
“keyspace”: “sync_gateway_sw1”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
“successor("change-log-entry")”
],
“Inclusion”: 1,
“Low”: [
“"change-log-entry"”
]
}
}
],
“using”: “gsi”
},
{
#operator”: “Parallel”,
“~child”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “Filter”,
“condition”: “((((((((((cover ((not ((meta(sync_gateway_sw1).id) like "_sync:%"))) and ((cover ((sync_gateway_sw1._deleted)) = false) or (cover ((sync_gateway_sw1._deleted)) is missing))) and ((cover ((sync_gateway_sw1.softDelete)) is missing) or (cover ((sync_gateway_sw1.softDelete)) = false))) and (cover ((sync_gateway_sw1.type)) = "change-log-entry")) and (cover ((sync_gateway_sw1.changeType)) = "task-create")) and cover (((sync_gateway_sw1.location) is not null))) and cover (((sync_gateway_sw1.changedAt) is not null))) and (cover ((sync_gateway_sw1.location)) = "79497d97-3eb6-400c-8ed0-d48d28d327cb")) and (cover ((sync_gateway_sw1.changedAt)) between "2015-08-22" and "2016-08-22")) and cover (((sync_gateway_sw1.changeInfo) is not null))) and cover ((((sync_gateway_sw1.changeInfo).details) is not null)))”
},
{
#operator”: “InitialProject”,
“result_terms”: [
{
“expr”: “cover ((meta(sync_gateway_sw1).id))”
},
{
“expr”: “cover ((sync_gateway_sw1.location))”
},
{
“expr”: “cover ((((sync_gateway_sw1.changeInfo).details).isEmergency))”
}
]
},
{
#operator”: “FinalProject”
}
]
}
}
]
},
“text”: “select meta().id,location, changeInfo.details.isEmergency\nFROM sync_gateway_sw1 use index(change_log_task_create_v1)\nWHERE meta().id NOT LIKE ‘_sync:%’ \nAND (_deleted = FALSE OR _deleted IS MISSING) \nAND (softDelete is MISSING or softDelete = false) \nAND type = ‘change-log-entry’ and changeType = ‘task-create’\nand location is not null and changedAt is not null\nand location = ‘79497d97-3eb6-400c-8ed0-d48d28d327cb’\nand changedAt between ‘2015-08-22’ and ‘2016-08-22’\nand changeInfo is NOT NULL and changeInfo.details is NOT NULL”
}
]

METRICS:

“view_duration = 0.332000941 rows=221”
“n1ql_duration = 0.691879602 rows=221”

Remove type, meta().id, _deleted, softDelete, etc from the index KEY definition…
you already have in the WHERE clause of create index.

You need to to include things like _chagedat, location, changeInfo.details.isEmergency in the index key.
Right now, the index scan isn’t very selective.