No Index available (for nested object value update)

dramamine · September 11, 2019, 1:14pm

Hi guys,
I’m trying to solve this myself but figured I should ask here in case someone knows what I’m doing wrong quicker than I do.

I have a document that looks like this:

{
  "id": "guid",
  "attributes": {
    "startPath": "string",
    "endPath": "string",
    "startChar": "0",
    "endChar": "0"
  },
  "clientModified": "string",
  "serverModified": "string",
}

I need to update these documents’ nested object “attributes”, and change the names inside the dictionary. Instead of startPath and endPath, we want startSpan and endSpan. My updatequery looks like this:

UPDATE MyBucket
SET attributes.startSpan = attributes.startPath, attributes.endSpan = attributes.endPath
UNSET attributes.startPath,attributes.endPath
;

I cannot execute this without creating an appropriate index, and I don’t want to create a PRIMARY INDEX because I think those are the heaviest indexes and it would take a long time (we have hundreds of millions of documents).
Creating a PRIMARY INDEX on MyBucket lets the query work, but I want to create as minimal of an index as possible. I am not sure what I need inside the index.

I have been trying different indexes on those values like
CREATE INDEXidx_fix_spans1ONMyBucket(attributes.startPath, attributes.endPath) OR
CREATE INDEXidx_fix_spans2ONMyBucket(attributes) OR
CREATE INDEXidx_fix_spans3ONMyBucket(attributes, attributes.startPath, attributes.endPath) OR
CREATE INDEXidx_fix_spans4ONMyBucket(attributes, attributes.startPath, attributes.endPath, attributes.startSpan, attributes.endSpan)

So far, no luck. Anyone have any insight into what my index should look like?
FYI: Our intention is to use this index for this update query, then drop it after the update runs.

EDIT: Does updating a field in a document mean couchbase is re-saving the whole doc, and thus couchbase absolutely needs a PRIMARY INDEX for this functionality?

vsr1 · September 11, 2019, 4:34pm

CREATE INDEX  ix1 ON MyBucket(attributes.startPath);
UPDATE MyBucket
SET attributes.startSpan = attributes.startPath, attributes.endSpan = attributes.endPath
UNSET attributes.startPath,attributes.endPath
WHERE attributes.startPath IS NOT MISSING AND attributes.endPath IS NOT MISSING;

This even protects, rerun the same query will not mutate documents already updated again and override value.

N1QL UPDATE Fetches whole document and re-write whole document, It is important to add predicate avoid unnecessary mutations. As mutations are expensive and takes cluster resources.

UPDATE default d SET d.address.zipcode = 91234 WHERE ......
===>  UPDATE default d SET d.address.zipcode = 91234 WHERE ...... AND  d.address.zipcode != 91234;

dramamine · September 11, 2019, 5:04pm

Thanks very much for your reply vsr1!

I have some hesitation about running this update on production, since we are dealing with very many documents (hundreds of millions).
Is it possible that this update could time out when executed? If so, would we just keep executing again?

vsr1 · September 11, 2019, 5:09pm

There are couple of options to avoid timeout

Add LIMIT 10000 and rerun query in loop. May have to request_plus
Use Keyset pagination technique with document key https://blog.couchbase.com/offset-keyset-pagination-n1ql-query-couchbase/

CREATE INDEX ix1 ON MyBucket(attributes.startPath, attributes.endPath, META().id);

UPDATE MyBucket
SET attributes.startSpan = attributes.startPath, attributes.endSpan = attributes.endPath
UNSET attributes.startPath,attributes.endPath
WHERE attributes.startPath IS NOT MISSING AND attributes.endPath IS NOT MISSING AND META().id > $id
LIMIT 10000
RETURNING META().id;

RUN update in the loop start with id = “” and next iteration use last id returned by UPDATE. As this UPDATE you should try on testdata before doing on production.