Query views with fetchObjects parameter performance issue while indexing


#1

Hey,

We started working with views and JSON docs a few months ago, with Couchbase Server 2.2 and .NET SDK 1.3.6.
Before that we only worked with binary data and without views.

As part of that, we created a benchmark that checks the performance of getting data from views, because we plan to use them a lot in our applications.

The steps we did for the benchmarks:

  • On our dev environment we created a new bucket with a 2.5GB memory quota and 3 worker threads, without replicas.
  • We created a design document for each doc type (because besides an “All” view, we have more views for each doc type).
  • We created a process that does the following:
    1. Opens 200 threads that store values to the bucket, which should be indexed by 5 different views (each thread does one store for each doc type).
    2. In parallel to the stores above, runs 500 threads, each of which tries to get all the data from the “All” view we created.

The results of those tests were bad.
When running it, the view queries took 3-5 times longer than the same queries run without the parallel store operations (that is, while the view indexer was not running).

An example of an “All” view:

function (doc, meta) {
  if (doc.type == "LevelExperience") {
    emit(null, null);
  }
}
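
To check which documents a map function like this would index, it can be run outside Couchbase against a few sample documents. The following is a minimal sketch: `runMap`, the stubbed `emit`, and the sample documents are hypothetical stand-ins for illustration, not part of Couchbase or the SDK.

```javascript
// Minimal local harness for a view map function.
// Assumption: runMap, the emit stub, and the sample docs are hypothetical helpers,
// not Couchbase APIs. Couchbase supplies the real emit() on the server.

// Source of the "All" view map function from the post.
const mapSource = `function (doc, meta) {
  if (doc.type == "LevelExperience") {
    emit(null, null);
  }
}`;

// Run a map function source over sample docs and collect the emitted rows.
function runMap(source, docs) {
  const rows = [];
  let currentId = null;
  // emit is injected as a parameter so the map source can call it by name.
  const emit = (key, value) => rows.push({ id: currentId, key, value });
  const map = new Function("emit", `return (${source});`)(emit);
  for (const { doc, meta } of docs) {
    currentId = meta.id;
    map(doc, meta);
  }
  return rows;
}

const sampleDocs = [
  { doc: { type: "LevelExperience", level: 1 }, meta: { id: "le::1" } },
  { doc: { type: "Player", name: "a" }, meta: { id: "p::1" } },
  { doc: { type: "LevelExperience", level: 2 }, meta: { id: "le::2" } },
];

const rows = runMap(mapSource, sampleDocs);
console.log(rows);
```

In this sketch each row carries only a document id with a null key and a null value, so anything that needs the document bodies has to fetch them separately by id.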

An example of the view query:

var view = _client.GetView<T>(designDocName, viewName, fetchObjects);
return view.ToList();

#2

A couple of questions:

  1. Was the view published and what query parameters were you using? Especially, were you using stale=false?
  2. How many client instances did you have and did you experiment with different numbers here?

We know there are some things that are not very efficient about connection management in the .NET 1.x client. The 2.x client has significant changes that get us to a much better place and a platform we can improve on. @jmorris may be able to add more here.

It could be another issue too, but I might try experimenting with the number of client instances first.


#3

@Chenos -

If you're comparing the performance of views and binary operations (Set, Get, etc.), views will never be faster. Simply put, binary operations are a much simpler and more efficient way of storing and retrieving data than views; views offer more flexibility in terms of secondary indexes, Map Reduce and the ability to return a smaller “sub document” as a row from a larger document, thus reducing the number of bytes sent over the network.
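
As a sketch of the “sub document” point: a map function can emit just the fields a query needs as the row value, so the client does not have to fetch the full document. The field names (`level`, `xp`, `history`) and the local `emit` stub are assumptions for illustration; only the `LevelExperience` type comes from this thread.

```javascript
// Sketch: emitting a small "sub document" as the view row value.
// Assumption: the fields level, xp and history are hypothetical; the emit stub
// below only imitates what the Couchbase view engine provides on the server.

const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Map function that would go into the design document.
function subDocMap(doc, meta) {
  if (doc.type == "LevelExperience") {
    // Key by level, and carry only the two fields the caller needs.
    emit(doc.level, { level: doc.level, xp: doc.xp });
  }
}

// A larger document with a field the query does not care about.
const fullDoc = {
  type: "LevelExperience",
  level: 3,
  xp: 1200,
  history: new Array(100).fill("event"),
};

subDocMap(fullDoc, { id: "le::3" });

const fullSize = JSON.stringify(fullDoc).length;
const rowSize = JSON.stringify(emitted[0].value).length;
console.log(`full doc: ${fullSize} chars, row value: ${rowSize} chars`);
```

The trade-off is that emitted values are stored in the index, so they should stay small.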

That being said, other factors may be lowering view performance, as Matt mentions above, and client tuning via ServicePointManager may help.


#4

  1. Yes. The benchmarks ran against production views. I also used Stale=Ok, and even then the queries took a long time.
  2. I had one client instance that opened threads with Parallel.For.

We are working with 1.3.6 because we prefer to use release versions that have already been out for some time.
Has version 2 been released, and has it been out long enough to be sure it does not have major bugs?
What changes made the 2.x SDK more efficient than 1.x (related to views)? Can you list some?


#5

This is the first time we are working with JSON docs and views in Couchbase.
We chose it because:

  1. You can actually see the doc values in the console and edit them.
  2. With views we can do the smaller “Sub document” queries as you said and it is much more efficient than implementing it with binary docs.

So, since we chose it, we prefer to work only with json docs and avoid mixing it with binary docs as much as we can.
I read the Couchbase documentation about views and saw some ways to improve indexing performance:

  • Increase MaxParallelIndexers
  • Change updateMinChanges and updateInterval of views indexes
  • Use separate disks for data and view indexes
  • Use SSDs for views

My main question is how to arrive at the values for the parameters above that will work best for me.
Is there a formula you can give, or other suggestions?
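
For context on where these knobs live, here is a rough sketch of how the settings could be expressed. It is based on the Couchbase Server 2.x documentation as recalled, so treat the endpoint paths (`/settings/viewUpdateDaemon`, `/settings/maxParallelIndexers`), the parameter names, and every numeric value as assumptions to verify against the docs for your server version. The numbers are placeholders, not recommendations, and the code only builds the request bodies without sending anything.

```javascript
// Sketch only: builds (does not send) the payloads for the view tuning settings.
// Assumptions to verify against the docs for your server version:
//  - endpoint paths and parameter names, as recalled from Couchbase Server 2.x docs
//  - all numeric values are placeholders, not recommendations

// Per-design-document override of the automatic index update threshold.
const designDoc = {
  views: {
    All: {
      map: 'function (doc, meta) { if (doc.type == "LevelExperience") { emit(null, null); } }',
    },
  },
  options: {
    updateMinChanges: 1000, // placeholder value
  },
};

// Cluster-wide settings, sent as form-encoded POST bodies to the admin port.
const clusterSettings = [
  {
    path: "/settings/viewUpdateDaemon",
    form: { updateInterval: 10000, updateMinChanges: 7000 }, // interval in ms
  },
  {
    path: "/settings/maxParallelIndexers",
    form: { globalValue: 4 },
  },
];

// Encode a flat object as application/x-www-form-urlencoded.
function toFormBody(form) {
  return Object.entries(form)
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
}

for (const { path, form } of clusterSettings) {
  console.log(`POST ${path}  ${toFormBody(form)}`);
}
console.log(JSON.stringify(designDoc.options));
```

Keeping the settings in one place like this makes it easier to change one value at a time and rerun the benchmark between changes.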

Thanks,

Chen.


#6

@jmorris
Another thing.
We use ModelViewsFramework in our project, but the DesignDocAttribute contradicts the idea of having a small number of design docs. When using it, the design doc name inside the attribute has to be different for each class (I tried using a common name in separate classes and it did not work).

Do you know of another framework that automatically creates view scripts, or is it something every company has to develop itself?


#7

@Chenos -

So, since we chose it, we prefer to work only with json docs and avoid mixing it with binary docs as much as we can.

To clarify, you can use the binary protocol (Key/Value operations) with JSON; the data itself doesn’t need to be stored in binary. There is a series of extension methods to help you do this; look at Couchbase.Extensions.CouchbaseClientExtensions.cs.

We use ModelViewsFramework in our project, but the DesignDocAttribute contradicts the idea of having a small number of design docs.

Yes, the ModelViewsFramework does indeed contradict the idea of having a small number of design docs. It offers ease of use, but at the cost of potentially creating too many design docs and impacting indexing performance. I don’t know of another framework that gives you exactly what you want, with the exception of N1QL (which allows you to do ad-hoc queries without design docs or views), which is quite a ways off from being production ready.

Note that Couchbase 3.0.0 and later offer much improved indexing and view performance. Since you are using Couchbase 2.2, the .NET 2.0 client is not compatible at the moment; it expects CB 2.5 or later due to changes in the cluster map configuration.

I had one client instance that opened threads with Parallel.For.

You're likely going to have to do some tuning here. First of all, you are going to want to adjust MaxDegreeOfParallelism and use a partitioner. Additionally, the client uses the .NET WebClient internally, which can be tuned by changing the ServicePointManager’s properties. You're going to want to increase the DefaultConnectionLimit; you can read more about that here.

-Jeff


#8

Thanks Jeff.

And what about my question on maxParallelIndexers and updateInterval tuning?

Is there a way to find the proper values for them?


#9

@Chenos maybe @tgreenstein can lend a hand here?