How would I batch-upsert a bunch of data from the .NET client?

40-beta

#1

Suppose I have a data conversion process that reads from a SQL database and pumps documents into a remote Couchbase 4.0 beta cluster, where the ping time is a bit steep (28 msec, versus say <1 msec if I were local).

Throughput is great when my data generation system is local to the Couchbase cluster. When I’m pumping data from one network (say my office network) to another (say an Azure store), I’m thinking I should probably just pump the data to a local cluster node, then export it and batch import it into the Azure store.

But I’m wondering if there is something better than either of these two ideas that I haven’t considered. The flaw in my backup/restore idea is that I don’t think I can use it incrementally. The flaw in my big for-loop of bucket.Upsert() calls is that it’s limited by ping time: each call waits for a full round trip, so a 28 msec turnaround limits me to roughly 40 upserts a second.

I guess I could create a pool of workers, and maybe get 4 or 8 times more upserts per second even with 28 msec ping time, but maybe there’s ANOTHER alternative I haven’t looked at?
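
As a rough sanity check on those numbers, here is a minimal sketch of the arithmetic, assuming each worker issues one upsert at a time and waits a full round trip for it. The names ThroughputEstimate and OpsPerSecond are made up for illustration and are not part of any SDK.

```csharp
using System;

public static class ThroughputEstimate
{
    // Upper bound on operations per second when every worker blocks for one
    // full round trip per operation. Ignores server-side and serialization time.
    public static double OpsPerSecond(double roundTripMs, int workers)
    {
        if (roundTripMs <= 0) throw new ArgumentOutOfRangeException(nameof(roundTripMs));
        if (workers <= 0) throw new ArgumentOutOfRangeException(nameof(workers));
        return workers * 1000.0 / roundTripMs;
    }
}
```

With a 28 msec round trip this comes out to about 36 upserts a second for a single worker, about 143 for 4 workers, and about 286 for 8 workers, so the gain from a worker pool should be close to linear until something other than latency becomes the bottleneck.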


#2

@wpostma -

Have you tried increasing the max pool size and using the Task-based upsert method? Something like this:

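    // Assumes: using System.Collections.Generic; using System.Threading.Tasks; using Couchbase;
    // `bucket` is an open IBucket, and this runs inside an async method.
    var items = new List<string>();
    for (int i = 0; i < 1000000; i++)
    {
        items.Add("key" + i);
    }

    // Start all upserts without waiting on each round trip, then await them together.
    var tasks = new List<Task<IOperationResult<string>>>();
    items.ForEach(x => tasks.Add(bucket.UpsertAsync(x, x)));
    var results = await Task.WhenAll(tasks);

    foreach (var result in results)
    {
        if (result.Success)
        {
            //process
        }
    }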

You could even partition the items into smaller lists and hand them off to a pool of workers, each using a single client instance.
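
The partitioning idea could look roughly like the sketch below. It is only an illustration: BatchUpsert, Partition, and UpsertInBatchesAsync are made-up names, and the upsert itself is passed in as a delegate so the code does not depend on the SDK. It sends one batch at a time, which keeps the number of in-flight tasks bounded instead of creating a million tasks up front.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BatchUpsert
{
    // Splits the source into consecutive chunks of at most batchSize items.
    public static IEnumerable<List<T>> Partition<T>(IEnumerable<T> source, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        if (batch.Count > 0) yield return batch; // final, possibly smaller, chunk
    }

    // Runs the upserts of one batch concurrently, waits for the whole batch,
    // then moves on to the next. Returns the number of successful upserts.
    public static async Task<int> UpsertInBatchesAsync(
        IEnumerable<string> keys,
        int batchSize,
        Func<string, Task<bool>> upsertAsync) // hypothetical wrapper around the real call
    {
        var succeeded = 0;
        foreach (var batch in Partition(keys, batchSize))
        {
            var results = await Task.WhenAll(batch.Select(upsertAsync));
            succeeded += results.Count(ok => ok);
        }
        return succeeded;
    }
}
```

With the bucket from the snippet above, the delegate might be something like `async key => (await bucket.UpsertAsync(key, key)).Success`. The batch size is a tuning knob; a value in the hundreds or low thousands is a reasonable place to start measuring, but that is a guess rather than a recommendation from the SDK.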

-Jeff


#3

That sounds much better, thanks