N1QL CBQ update of a million records is not working

As @vsr1 indicated, Eventing can provide a high-performance solution.

In addition, I would encourage you to look at Function: fixEmailDomains | Couchbase Docs; it should be close to the exact Eventing code that you are looking for.

Hi,
As we are using Couchbase Server 5.1, I don't think fixEmailDomains is available.

Right you are, Eventing was introduced in version 5.5.

In the 6.5 release, the handler code can directly mutate (or write back to) the source bucket; prior to 6.5 you needed to use an intermediate bucket (or write the KEYs to an intermediate bucket and let Eventing use that as its source to update the true target).

I don't see the Eventing Service with Couchbase Server 5.1.

You can also update in chunks, repeating with REQUEST_PLUS scan consistency:

    UPDATE `user-bucket` SET emailId = REPLACE(emailId, "xxmail.com", "yymail.com")
    WHERE REVERSE(emailId) LIKE REVERSE("%xxmail.com") LIMIT 100000;
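
If you drive this from the Java SDK 2.x, a minimal sketch of the chunk loop could look like this (same statement as above; the method name is just illustrative):

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.query.N1qlParams;
import com.couchbase.client.java.query.N1qlQuery;
import com.couchbase.client.java.query.N1qlQueryResult;
import com.couchbase.client.java.query.consistency.ScanConsistency;

// Repeat the chunked UPDATE until a pass mutates nothing.
// REQUEST_PLUS makes each pass wait for the index to catch up,
// so documents updated by the previous pass are not picked again.
private static void updateInChunks(Bucket bucket) {
    String stmt = "UPDATE `user-bucket` "
            + "SET emailId = REPLACE(emailId, \"xxmail.com\", \"yymail.com\") "
            + "WHERE REVERSE(emailId) LIKE REVERSE(\"%xxmail.com\") LIMIT 100000";
    N1qlParams params = N1qlParams.build().consistency(ScanConsistency.REQUEST_PLUS);
    while (true) {
        N1qlQueryResult result = bucket.query(N1qlQuery.simple(stmt, params));
        if (!result.finalSuccess() || result.info().mutationCount() == 0) {
            break; // finished, or stop here and inspect result.errors()
        }
    }
}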

Hi,
I tried as you said, and the update time is huge. To update 10k records, it's taking more than 2 minutes. The indexes below were created on the bucket before executing the update script:

CREATE INDEX `USERID-IDX` ON `user-bucket`(emailId) WHERE REVERSE(emailId) LIKE REVERSE("%xxmail.com");
CREATE INDEX `USERIDIQ-IDX` ON `user-bucket`(emailId) WHERE REVERSE(emailId) LIKE REVERSE("%yymail.com");

Post the explain plan.

> {
>   "plan": {
>     "#operator": "Sequence",
>     "~children": [
>       {
>         "#operator": "IndexScan2",
>         "index": "USERID-IDX",
>         "index_id": "fa3a305f87cade40",
>         "index_projection": {
>           "primary_key": true
>         },
>         "keyspace": "user-bucket",
>         "namespace": "default",
>         "spans": [
>           {
>             "range": [
>               {
>                 "inclusion": 0,
>                 "low": "null"
>               }
>             ]
>           }
>         ],
>         "using": "gsi"
>       },
>       {
>         "#operator": "Sequence",
>         "~children": [
>           {
>             "#operator": "Parallel",
>             "~child": {
>               "#operator": "Sequence",
>               "~children": [
>                 {
>                   "#operator": "Fetch",
>                   "keyspace": "user-bucket",
>                   "namespace": "default"
>                 },
>                 {
>                   "#operator": "Filter",
>                   "condition": "(reverse((`user-bucket`.`emailId`)) like reverse(\"%xxmail.com\"))"
>                 }
>               ]
>             }
>           },
>           {
>             "#operator": "Limit",
>             "expr": "10000"
>           },
>           {
>             "#operator": "Parallel",
>             "~child": {
>               "#operator": "Sequence",
>               "~children": [
>                 {
>                   "#operator": "Clone"
>                 },
>                 {
>                   "#operator": "Set",
>                   "set_terms": [
>                     {
>                       "path": "(`user-bucket`.`emailId`)",
>                       "value": "replace((`user-bucket`.`emailId`), \"xxmail.com\", \"yymail.com\")"
>                     }
>                   ]
>                 },
>                 {
>                   "#operator": "SendUpdate",
>                   "alias": "user-bucket",
>                   "keyspace": "user-bucket",
>                   "limit": "10000",
>                   "namespace": "default"
>                 }
>               ]
>             }
>           }
>         ]
>       },
>       {
>         "#operator": "Discard"
>       }
>     ]
>   },
>   "text": "UPDATE `user-bucket` SET emailId = REPLACE(emailId, \"xxmail.com\", \"yymail.com\") WHERE REVERSE(emailId) LIKE  REVERSE(\"%xxmail.com\") LIMIT 10000"
> }

Couchbase team,
Any findings on the explain plan?

Plan looks correct and it is using the right index.

Then why is it taking more than 2 minutes for 10k records? Any fixes, please?

N1QL UPDATE is a full-document update.
With REQUEST_PLUS, the query must wait for the index to catch up to the latest mutation.
Then it gets the keys, fetches each document, and updates them one by one. That path explains your numbers: 10k documents in ~2 minutes is roughly 80 full-document updates per second.

Can you please advise me on an approach to improve the performance of this?

You can try this.

Get the document keys using the following query.
Then use KV reactive/async calls with a sub-document get on emailId and a sub-document mutation. You can do that in parallel.

SELECT RAW META(d).id
FROM `user-bucket` AS d
WHERE REVERSE(d.emailId) LIKE REVERSE("%xxmail.com");
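
For example, a minimal sketch with the Java SDK 2.x (I select META(d).id under an alias instead of RAW here so each row parses as a JSON object; the method name is illustrative):

import java.util.ArrayList;
import java.util.List;

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.query.N1qlParams;
import com.couchbase.client.java.query.N1qlQuery;
import com.couchbase.client.java.query.N1qlQueryRow;
import com.couchbase.client.java.query.consistency.ScanConsistency;

// Collect the matching document keys. REQUEST_PLUS makes the query
// wait until the index has caught up with all mutations so far.
private static List<String> fetchDocIDs(Bucket bucket) {
    String stmt = "SELECT META(d).id AS id FROM `user-bucket` AS d "
            + "WHERE REVERSE(d.emailId) LIKE REVERSE(\"%xxmail.com\")";
    N1qlParams params = N1qlParams.build().consistency(ScanConsistency.REQUEST_PLUS);
    List<String> docIDs = new ArrayList<>();
    for (N1qlQueryRow row : bucket.query(N1qlQuery.simple(stmt, params))) {
        docIDs.add(row.value().getString("id"));
    }
    return docIDs;
}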

Subdoc is helping me complete my task, but I am getting DocumentTooDeepException for multiple documents:
com.couchbase.client.java.error.subdoc.DocumentTooDeepException: JSON is too deep in document D1D0AA2C42628D474D8EDF18012E5D1E

How do I handle such a case? Do I need to retry the operation?

@ismail.iqbal.ap,

SDKs are out of my expertise.

I assume this is a one-time job, so skip those documents and continue. At the end, when the index has caught up, run the same query again. You will be left with the few skipped documents; retry them, or handle them manually or outside the job.

cc @graham.pople, @david.nault

Here is my piece of code:

import java.util.List;
import java.util.concurrent.TimeUnit;

import com.couchbase.client.core.BackpressureException;
import com.couchbase.client.core.time.Delay;
import com.couchbase.client.java.AsyncBucket;
import com.couchbase.client.java.error.subdoc.DocumentTooDeepException;
import com.couchbase.client.java.subdoc.DocumentFragment;
import com.couchbase.client.java.subdoc.Lookup;
import com.couchbase.client.java.util.retry.RetryBuilder;
import rx.Observable;

private static void createOrUpdateObjects(List<String> docIDs, AsyncBucket asyncBucket) {

    Observable
            .from(docIDs)
            .flatMap(docID -> {
                // Sub-document lookup of the current emailId.
                DocumentFragment<Lookup> userResult = asyncBucket.lookupIn(docID)
                        .get("emailId")
                        .execute()
                        .retryWhen(RetryBuilder.anyOf(DocumentTooDeepException.class, BackpressureException.class)
                                .delay(Delay.exponential(TimeUnit.SECONDS, 3))
                                .max(RETRY_COUNT)
                                .build())
                        .doOnError(e -> System.out.println(docID + ": " + e.getMessage()))
                        .onErrorResumeNext(Observable.empty())
                        .toBlocking().singleOrDefault(null);

                if (userResult == null) {
                    // Lookup failed even after retries; skip this document.
                    return Observable.empty();
                }

                String userId = userResult.content(0).toString();
                // Plain string replace (replaceAll would treat the pattern as a regex).
                userId = userId.replace("@xxmail.com", "@yymail.com");

                // Sub-document mutation writing the new value back.
                return asyncBucket.mutateIn(docID)
                        .replace("emailId", userId)
                        .execute()
                        .retryWhen(RetryBuilder.anyOf(DocumentTooDeepException.class, BackpressureException.class)
                                .delay(Delay.exponential(TimeUnit.SECONDS, 3))
                                .max(RETRY_COUNT)
                                .build())
                        .doOnError(e -> System.out.println(docID + ": " + e.getMessage()))
                        .onErrorResumeNext(Observable.empty());
            })
            .toList()
            .toBlocking()
            .single();
}

Hi @ismail.iqbal.ap

DocumentTooDeepException is a server error indicating that either your Sub-Document path is > 32 elements long ("A.B.C.D.E…"), or that your document itself has 32 levels of JSON objects ({"l1":{"l2":{"l3":{…). Going from your code, it must be the latter. Retrying the error shouldn't have any effect here; you will need to modify your document structure accordingly.
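
If restructuring the documents is not an option, one workaround is to fall back to a plain full-document read-modify-write for the too-deep documents, since a KV get/replace has no such depth limit. A rough sketch with the 2.x blocking API (the helper names are just illustrative, and arrays are not traversed):

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.document.json.JsonObject;

// Fallback for documents too deep for sub-doc: fetch the whole
// document, rewrite emailId wherever it occurs, and replace it.
static void fixEmailDeep(Bucket bucket, String docID) {
    JsonDocument doc = bucket.get(docID);
    if (doc == null) return;
    rewriteEmails(doc.content());
    bucket.replace(doc); // the CAS from the get protects against races
}

// Walk nested objects and rewrite any emailId field in place.
static void rewriteEmails(JsonObject obj) {
    for (String name : obj.getNames()) {
        Object v = obj.get(name);
        if ("emailId".equals(name) && v instanceof String) {
            // plain string replace, not a regex
            obj.put(name, ((String) v).replace("@xxmail.com", "@yymail.com"));
        } else if (v instanceof JsonObject) {
            rewriteEmails((JsonObject) v);
        }
    }
}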

Yes, I can understand that the JSON is kind of big in my case. But the value (emailId) I am trying to update is at the outer level. Why is it trying to traverse all levels?

Hi @ismail.iqbal.ap

I know you are using an older version of Couchbase, 5.1, but perhaps it is time to consider an upgrade, as the Eventing Service can truly solve the issue at hand. Below I make a test document 40 levels deep (via a Perl script) and give a simple Eventing function to update an email address at level 40 (a deep nesting).

Running my Eventing function (with 12 workers) on a single 12-physical-core box with a 2.4 GHz max boost (non-MDS, all services on this one box), I get 29,985 items per second while changing all 1M items in the test set I just created. Obviously a real cluster should perform better. Please compare this to your prior non-Eventing 10K test in Java (above), which crawls along at under 100 items per second.

You can run this Eventing function standalone in version 6.5+, or if you upgrade you could do what @graham.pople suggests and then just use Eventing as a "point tool" to do the final cleanup and avoid any manual work.

**Perl to make a single test document**

#!/usr/bin/perl
# make a deep 40 level document and put in one EMAIL pattern and one other item.
my $max=40;
sub indent {
    $cnt = $_[0];
    for (my $i=1; $i<=$cnt; $i++) {
        printf STDOUT " ";
    }
}
print STDOUT "{\n";
for ($i=1; $i<=$max; $i++) {
    &indent($i);
    printf STDOUT "\"a" . $i . "\": {\n";
    if ($i == $max) {
        &indent($i+1);
        printf STDOUT "\"emailId\": \"ismail.iqbal.ap\@xxmail.com\",\n";
        &indent($i+1);
        printf STDOUT "\"locale\": \"en_US\"\n";
    }
}
for ($i=$max; $i>=1; $i--) {
    &indent($i);
    printf STDOUT "}\n";
}
printf STDOUT "}\n";

Test Document

{
 "a1": {
  "a2": {
   "a3": {
    "a4": {
     "a5": {
      "a6": {
       "a7": {
        "a8": {
         "a9": {
          "a10": {
           "a11": {
            "a12": {
             "a13": {
              "a14": {
               "a15": {
                "a16": {
                 "a17": {
                  "a18": {
                   "a19": {
                    "a20": {
                     "a21": {
                      "a22": {
                       "a23": {
                        "a24": {
                         "a25": {
                          "a26": {
                           "a27": {
                            "a28": {
                             "a29": {
                              "a30": {
                               "a31": {
                                "a32": {
                                 "a33": {
                                  "a34": {
                                   "a35": {
                                    "a36": {
                                     "a37": {
                                      "a38": {
                                       "a39": {
                                        "a40": {
                                         "emailId": "ismail.iqbal.ap@xxmail.com",
                                         "locale": "en_US"
                                        }
                                       }
                                      }
                                     }
                                    }
                                   }
                                  }
                                 }
                                }
                               }
                              }
                             }
                            }
                           }
                          }
                         }
                        }
                       }
                      }
                     }
                    }
                   }
                  }
                 }
                }
               }
              }
             }
            }
           }
          }
         }
        }
       }
      }
     }
    }
   }
  }
 }
}

Eventing Code

function OnUpdate(doc, meta) {
    var debug = true;
    if (debug) log('1 level 39',doc
        .a1.a2.a3.a4.a5.a6.a7.a8.a9.a10
        .a11.a12.a13.a14.a15.a16.a17.a18.a19.a20
        .a21.a22.a23.a24.a25.a26.a27.a28.a29.a30
        .a31.a32.a33.a34.a35.a36.a37.a38.a39);  
    if (debug) log('2 ALTER emailId');
    // dereference to make it easier to work with
    var lvl40_hdl = doc
        .a1.a2.a3.a4.a5.a6.a7.a8.a9.a10
        .a11.a12.a13.a14.a15.a16.a17.a18.a19.a20
        .a21.a22.a23.a24.a25.a26.a27.a28.a29.a30
        .a31.a32.a33.a34.a35.a36.a37.a38.a39.a40;
    // read the property     
    var oldVal = lvl40_hdl.emailId;
    // change @xxx.com to @yyy.com
    var newVal = oldVal.replace("@xxmail.com", "@yymail.com");
    if (oldVal === newVal) {
        // nothing to do
        if (debug) log("meta.id",meta.id,"emailId unchanged",oldVal);
        return;
    }
    // update the local doc
    lvl40_hdl.emailId = newVal;
    if (debug) log('3 level 39',doc
        .a1.a2.a3.a4.a5.a6.a7.a8.a9.a10
        .a11.a12.a13.a14.a15.a16.a17.a18.a19.a20
        .a21.a22.a23.a24.a25.a26.a27.a28.a29.a30
        .a31.a32.a33.a34.a35.a36.a37.a38.a39);
    // write back to the source bucket (6.5+) via a binding to
    // the alias src_bkt; this will update the document.
    src_bkt[meta.id] = doc;
}
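
Note: src_bkt is a bucket alias binding configured in the Eventing function's settings; from 6.5 onward it may point back at the function's own source bucket (in read-write mode), which is what allows the direct write-back above.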