Optimize UPDATE ... UNSET N1QL

maroon5 · May 21, 2020, 6:14pm

Hello,

I’m trying to optimize N similar queries that are updating > 100 million records (field rename). Currently these queries are very slow, as it’s required to process all records for each of the update query.

Is there a way to rename a set of fields based on different conditions in one scan?

The queries are looking similar to the following examples:

UPDATE `bucket` SET specialization.Doctor = specialization.DoctorName 
UNSET specialization.DoctorName 
WHERE (_type="SpecTypeA" OR _type="SpecTypeB") 
AND ISARRAY(specialization.DoctorName) == TRUE

UPDATE `bucket` SET specialization.Nurse = specialization.NurseName 
UNSET specialization.NurseName 
WHERE (_type="SpecTypeA" OR _type="SpecTypeB") 
AND ISARRAY(specialization.NurseName) == TRUE

....

UPDATE `bucket` SET specialization.Patient = specialization.PatientName 
UNSET specialization.PatientName 
WHERE (_type="SpecTypeA" OR _type="SpecTypeB") 
AND ISARRAY(specialization.PatientName) == TRUE

vsr1 · May 21, 2020, 8:47pm

Make sure index WHERE exactly present in the query (due to complexity)

CREATE INDEX ix1 ON default(META().id) WHERE _type IN ["SpecTypeA", "SpecTypeB"] AND
        TRUE IN [ISARRAY(specialization.DoctorName), ISARRAY(specialization.NurseName), ISARRAY(specialization.PatientName)];

UPDATE `default` AS d
SET d.specialization = OBJECT_CONCAT( (OBJECT v.name:v.val
                                        FOR v IN OBJECT_PAIRS(d.specialization)
                                        WHEN NOT (ISARRAY(v.val) AND v.name IN ["DoctorName", "NurseName", "PatientName"])
                                        END),
                                        (OBJECT SUBSTR(v.name, 0, POS(v.name, "Name")):v.val
                                        FOR v IN OBJECT_PAIRS(d.specialization)
                                        WHEN ISARRAY(v.val) AND v.name IN ["DoctorName", "NurseName", "PatientName"]
                                        END) )
WHERE d._type IN ["SpecTypeA", "SpecTypeB"]
      AND TRUE IN [ISARRAY(d.specialization.DoctorName), ISARRAY(d.specialization.NurseName), ISARRAY(d.specialization.PatientName)]
      AND META(d).id LIKE "a%" ;

OR

CREATE INDEX ix2 ON default(META().id) WHERE _type IN ["SpecTypeA", "SpecTypeB"] AND
        TRUE IN ARRAY ISARRAY(specialization.[v]) FOR v IN ["DoctorName", "NurseName", "PatientName"] END;

UPDATE `default` AS d
SET d.specialization = (OBJECT (CASE WHEN ISARRAY(v.val) AND v.name IN ["DoctorName", "NurseName", "PatientName"]
                                    THEN SUBSTR(v.name, 0, POS(v.name, "Name"))
                                    ELSE v.name END):v.val
                        FOR v IN OBJECT_PAIRS(d.specialization)
                        END)
WHERE d._type IN ["SpecTypeA", "SpecTypeB"]
      AND TRUE IN ARRAY ISARRAY(d.specialization.[v]) FOR v IN ["DoctorName", "NurseName", "PatientName"] END
      AND META(d).id LIKE "a%" ;

You can run multiple updates different ranges of META(d).id LIKE “a%” . Expand the query for other names.
Even you can use LIMIT on update. Even on error repeat same UPDATE because once index caught up it will not have successful renamed document s it will not mutate again (If needed use scan_consistency)
You can also use technique below too (you can get last mutated document key by doing RETURNING META().id on UPDATE).

As you do many documents
Another alternative is with above index do following covered query and get document keys and use SDK fetch documents from directly from KV and mutate asynchronously

SELECT RAW META(d).id 
FROM default AS d
WHERE d._type IN ["SpecTypeA", "SpecTypeB"]
          AND TRUE IN [ISARRAY(d.specialization.DoctorName), ISARRAY(d.specialization.NurseName), ISARRAY(d.specialization.PatientName)]
          AND META(d).id > "" ;