Views + Tests?

piotrb · January 3, 2013, 12:10am

Hey Guys,

We’re trying out Couchbase 2.0 and writing a few tests against views. It seems very difficult to keep the views consistent between runs: views end up either missing the data that’s being tested or still containing data from the previous test.

Is there a best practice for testing against Couchbase views so that you can depend on predictable output?

Thanks,
Piotr

tgrall · January 4, 2013, 12:00am

Hello Piotr

Could you tell us how you call the view? (sample code of your view and client)

Views/indexes are built from the data stored on disk, so you need to check that:

your items are stored on disk (this can be done from the client library; which SDK/language are you using?)
you use Stale.FALSE, so that the index is updated before the server responds to the client.

Regards
Tug

piotrb · January 4, 2013, 12:16am

Using the Ruby SDK, along with the couchbase-model gem.

It’s a few different issues, so there is no specific code to show here.

We do use stale=false where possible. We also flush the whole bucket where possible.

Is there a way to iterate over all the keys of a bucket without using a view? That would be one way to ensure a clean slate. We currently use a view for that, which could be unreliable.

Is there a way to wait until all pending writes have completed, so we can query with stale=false and know it’s going to find all the records in question?

Thanks

piotrb · January 7, 2013, 12:01am

Is it safe to assume that data gets persisted to disk in the order it was written?

I.e., could I simply do one more insert on the db side, observe it, and once it’s on disk (the assumption being that everything else should be on disk too), re-run the index and then delete all the keys?
(This is specifically for our test environment, where a Couchbase server runs on each test host, so there is no issue with hosts being out of sync with each other.)

The same idea could be applied to the db “warm up”: insert data, add an extra control value to the db, and observe that last insert. Once it’s on disk, the indexes should be consistent with stale: false.

scalabl3 · January 7, 2013, 12:02am

Good question. I am not sure whether the disk write queue is chronological. It has many optimizations, including collapsing multiple changes to a document into a single write. For instance, say I change document x 1000 times: once it gets to the top of the disk write queue, only the latest change at that moment gets written, not all 1000. Of course, depending on disk I/O speed, it could reach the top of the queue multiple times during those 1000 operations.
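To make the de-duplication idea concrete, here is a minimal sketch of a write queue that keeps only the latest pending value per key. It is a toy model written for illustration, not Couchbase's actual implementation; the class name `DedupWriteQueue` and its methods are hypothetical, and queue ordering is deliberately not modelled here.

```javascript
// Toy model of a de-duplicating disk write queue.
// Assumption: one pending slot per key; this is NOT Couchbase's real code.
class DedupWriteQueue {
  constructor() {
    this.pending = new Map(); // key -> latest value waiting to be written
    this.disk = new Map();    // key -> value that has been "persisted"
    this.diskWrites = 0;      // number of simulated disk writes
  }

  // Queue a change; a key that is already pending just gets its value replaced.
  set(key, value) {
    this.pending.set(key, value);
  }

  // Write the entry at the head of the queue; returns false when the queue is empty.
  flushOne() {
    const next = this.pending.entries().next();
    if (next.done) return false;
    const [key, value] = next.value;
    this.pending.delete(key);
    this.disk.set(key, value);
    this.diskWrites += 1;
    return true;
  }

  // Write everything that is still pending.
  drain() {
    while (this.flushOne()) {}
  }
}

const queue = new DedupWriteQueue();
for (let i = 1; i <= 1000; i++) queue.set("x", i); // 1000 changes to document x
queue.drain();
console.log(queue.diskWrites, queue.disk.get("x")); // 1 1000
```

In this model the 1000 changes arrive before any flush, so exactly one write happens. If `flushOne()` ran in between, the document would be written more than once, which matches the point about disk I/O speed.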

The way indexing works: first you create/change a document -> RAM Cache -> Disk Write Queue (and sent to Replicas) -> Persisted to Disk -> Indexers (Design Documents), which update Indexes (Views).
When you read a key (get operations), it comes from the RAM Cache unless there is a cache miss (meaning it’s not in RAM because RAM was exhausted and the item was evicted).

Since you are primarily testing, you could do an Observe on every key, then execute a stale=false. That ensures every data item has made it to disk before calling the Indexer. Of course, that’s not as common of a requirement, and using Observe on every key can increase latency in high throughput applications as disk writes will always be behind RAM speed.
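The observe-then-query pattern can be sketched with a small in-memory model. Everything below is a simplified stand-in invented for illustration: `ToyBucket`, `persistNext`, `isPersisted`, `queryAllIds` and `waitUntilPersisted` are hypothetical names and not SDK calls. The only behavior assumed is what the thread describes: the index is built from what is on disk, and stale=false rebuilds it before answering.

```javascript
// Toy model: RAM -> write queue -> disk -> index. Not the Couchbase SDK.
class ToyBucket {
  constructor() {
    this.ram = new Map();  // every write lands here first
    this.queue = [];       // keys waiting to be persisted
    this.disk = new Map(); // persisted documents
    this.index = [];       // ids currently in the view index
  }

  set(key, doc) {
    this.ram.set(key, doc);
    this.queue.push(key);
  }

  // Persist the oldest queued key; returns false if nothing was queued.
  persistNext() {
    const key = this.queue.shift();
    if (key === undefined) return false;
    this.disk.set(key, this.ram.get(key));
    return true;
  }

  // Stand-in for an "observe" check on a single key.
  isPersisted(key) {
    return this.disk.has(key) && !this.queue.includes(key);
  }

  // stale === false rebuilds the index from disk before returning rows.
  queryAllIds({ stale }) {
    if (stale === false) this.index = [...this.disk.keys()].sort();
    return [...this.index];
  }
}

// Keep persisting until every listed key is on disk.
function waitUntilPersisted(bucket, keys) {
  while (!keys.every((k) => bucket.isPersisted(k))) {
    if (!bucket.persistNext()) throw new Error("a key was never written");
  }
}

const bucket = new ToyBucket();
const keys = ["user::1", "user::2", "user::3"];
keys.forEach((k) => bucket.set(k, { type: "user" }));

bucket.persistNext(); // only the first document is on disk so far
const early = bucket.queryAllIds({ stale: false });

waitUntilPersisted(bucket, keys);
const complete = bucket.queryAllIds({ stale: false });

console.log(early.length, complete.length); // 1 3
```

The early query shows why stale=false alone is not enough in a test: the index can only contain what has already reached disk.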

scalabl3 · January 7, 2013, 12:03am

I am assuming that you are either deleting or adding data between test runs, which is why you are checking for consistency. It’s a good idea to query the view immediately after loading/changing data with either stale=update_after or stale=false. With stale=false the index is updated before the results are returned; with stale=update_after the update is triggered once the results have been returned. You can create a callback after the view query to run the tests. I am not sure what the tests are testing, so I am guessing here.

Flushing between tests is one good way to be sure old data/indexed data is gone.

Stale=false is a good way to trigger the indexers to update, but using it for every query isn’t best practice, as it constantly triggers the indexers. It also increases latency, since the index has to be updated before results are returned.

You can use stale=update_after to the same effect: querying the view immediately after loading data triggers the indexers to update. Then query the view again. There will be a time gap between the first and the second query while the index is updated. How long that gap is depends on many factors, including CPU, how horizontally the cluster is scaled compared to data size (i.e. number of nodes), network and disk speed. Essentially that is the same time gap experienced with stale=false, except it’s more asynchronous.

The other option is to use Observe on the last keys being set/added/replaced. Observe has a callback to let you know the key has been persisted to disk and/or replicated. Once observe comes back, querying a view with stale=false will trigger the indexers.

Let me add: you were asking if there was a way to iterate through keys. It depends on the keys. If they have any random elements (e.g. GUID/UUID), then you need to use either a View or Elastic Search.
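The difference between the three stale settings can be illustrated with a toy view. This is a sketch under simplifying assumptions: `ToyView` is a hypothetical class, and the update_after rebuild happens synchronously right after the rows are captured, whereas on a real server it runs in the background and takes time.

```javascript
// Toy model of the stale options "ok", "update_after" and "false".
class ToyView {
  constructor(persistedDocs) {
    this.persistedDocs = persistedDocs; // Map id -> doc, stands for data on disk
    this.rows = [];                     // ids currently in the index
  }

  rebuild() {
    this.rows = [...this.persistedDocs.keys()].sort();
  }

  query(stale) {
    if (stale === "false") this.rebuild();        // update first, then answer
    const result = [...this.rows];
    if (stale === "update_after") this.rebuild(); // answer first, then update
    return result;                                // "ok" never triggers an update
  }
}

const persisted = new Map([["a", {}], ["b", {}]]);
const view = new ToyView(persisted);

const first = view.query("update_after"); // index was still empty: []
const second = view.query("ok");          // sees the triggered update: ["a", "b"]

persisted.set("c", {});
const third = view.query("ok");           // stale index: ["a", "b"]
const fourth = view.query("false");       // rebuilt before answering: ["a", "b", "c"]

console.log(first, second, third, fourth);
```

In a test suite, the gap between `first` and `second` is the part that is hard to predict, which is why stale=false is usually the simpler choice there.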

The default map function of:

function (doc, meta) {
  emit(meta.id, null);
}

is a primary index on keys. That’s what you should use for listing all keys. If you literally just want ALL the keys and don’t need to do any querying whatsoever, no range of keys, etc. then, since the key is always associated with the row itself as well, you can do:

function (doc, meta) {
  emit(null, null);
}

and each row still has the key associated with it, but it literally stores no data except a reference to the key.
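To see what the two map functions produce, here is a small stand-alone harness that runs them over a few documents. The `runMap` helper, the module-level `emit` binding, and the sample document ids are all hypothetical; the real server supplies `emit` itself and stores rows in its own index format. The sketch only shows that each row carries the document id regardless of what is emitted.

```javascript
// Toy harness for map functions; mimics the server-provided emit().
let emit; // rebound for each document by runMap

const cmp = (x, y) => (x < y ? -1 : x > y ? 1 : 0);

function runMap(mapFn, docs) {
  const rows = [];
  for (const [id, doc] of Object.entries(docs)) {
    // Each emitted row records the id of the document that produced it.
    emit = (key, value) => rows.push({ id, key, value });
    mapFn(doc, { id });
  }
  // Order rows by emitted key, then by document id.
  return rows.sort((a, b) => cmp(a.key ?? "", b.key ?? "") || cmp(a.id, b.id));
}

// The two map functions from the post above.
const byId = function (doc, meta) {
  emit(meta.id, null);
};
const keysOnly = function (doc, meta) {
  emit(null, null);
};

const docs = {
  "user::b": { name: "Bea" },
  "user::a": { name: "Al" },
};

const byIdRows = runMap(byId, docs);
const keysOnlyRows = runMap(keysOnly, docs);

console.log(byIdRows.map((r) => r.key));    // [ 'user::a', 'user::b' ]
console.log(keysOnlyRows.map((r) => r.id)); // [ 'user::a', 'user::b' ]
console.log(keysOnlyRows.map((r) => r.key)); // [ null, null ]
```

The first variant can be filtered by key range because the id is also the emitted key; the second can only be read in full.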

scalabl3 · January 7, 2013, 12:04am

Yes, as long as you aren’t modifying the same document you just created, which will change its order in the queue, the observe should be an indicator that everything has gone to disk…

piotrb · January 7, 2013, 12:09am

Yes, but it sounds like you’re still saying that the last operation will basically run last, so attaching an observe to it would make sense if you’re trying to wait for the server to have drained its queue fully, right?

scalabl3 · January 7, 2013, 12:11am

So I just double checked: if you are only creating documents and not replacing/modifying them, it’s likely that they will be written in order as well. However, if you are creating and modifying the same documents repeatedly, it will change the order of the queue, which is what my understanding was.
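The control-key idea and its caveat can be sketched as follows. This is a toy model that takes the behavior described in this thread as an assumption (a modified key loses its place and rejoins at the back of the queue); `ReorderingQueue`, `persistUntil` and the key names are hypothetical, and the real queue may behave differently.

```javascript
// Toy write queue in which re-writing a key moves it to the back.
// Assumption taken from the thread, not from Couchbase source code.
class ReorderingQueue {
  constructor() {
    this.pending = new Map(); // insertion-ordered: head of the Map is written first
    this.disk = new Map();
  }

  set(key, value) {
    this.pending.delete(key);     // a modified key loses its place...
    this.pending.set(key, value); // ...and rejoins at the back
  }

  // Persist entries in queue order, stopping once `key` has been written.
  persistUntil(key) {
    for (const [k, v] of this.pending) {
      this.pending.delete(k);
      this.disk.set(k, v);
      if (k === key) break;
    }
  }
}

// Case 1: create-only workload, then a control key written last.
const createOnly = new ReorderingQueue();
createOnly.set("doc::1", "v1");
createOnly.set("doc::2", "v1");
createOnly.set("sentinel", 1);
createOnly.persistUntil("sentinel");
console.log(createOnly.disk.size); // 3

// Case 2: doc::1 is modified after the control key was queued.
const modified = new ReorderingQueue();
modified.set("doc::1", "v1");
modified.set("sentinel", 1);
modified.set("doc::1", "v2"); // moves doc::1 behind the sentinel
modified.persistUntil("sentinel");
console.log(modified.disk.has("doc::1")); // false
```

Under this assumption, observing a control key is only a reliable "everything is on disk" signal when it is the very last write and nothing before it is touched again.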