Does memory allocation for views double the view content?

00christian00 · April 21, 2015, 8:28am

I was looking at the couchnode source code on GitHub to understand how views are handled, and I noticed that the whole view is streamed to the client as a JSON string and then parsed. So if I query a view without specifying limits, do I need enough free RAM for at least twice the whole view (keys + values)?

So is the best way to limit memory allocation to split large queries into smaller batches of data?
For example, if I need to query 10,000,000 docs, must I query them 10,000 at a time to be sure I won’t run out of RAM?

martinesmann · April 21, 2015, 1:07pm

Hi 00christian00
If you query the entire view and want all of it available on the client, then yes, you need enough RAM to fit the materialised items in memory. That said, the content is streamed to the client, so it is not sent as one big JSON chunk but in smaller pieces.

And yes, you are correct: splitting up queries is the best way. You can read this blog post for more details on how to get the best performance:
http://blog.couchbase.com/pagination-couchbase

Thanks
Martin
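
A minimal sketch of the batching approach discussed above, in which only one batch is held in memory at a time. `forEachBatch`, `fetchPage`, and the in-memory `fakeView` are hypothetical names used for illustration and are not part of the couchnode API; in a real application `fetchPage` would wrap a view query that uses the batch size as its limit. Offset-based paging is assumed here only to keep the sketch short; the blog post linked above covers pagination in more detail.

```javascript
'use strict';

// Hypothetical helper: walks a large result set in fixed-size batches.
// fetchPage(skip, limit, callback) is an assumed stand-in for a real view
// query; it must call callback(err, rows) with at most `limit` rows.
function forEachBatch(fetchPage, batchSize, onBatch, done) {
  function next(skip) {
    fetchPage(skip, batchSize, (err, rows) => {
      if (err) { return done(err); }
      if (rows.length > 0) { onBatch(rows); }
      // A short (or empty) page means the result set is exhausted.
      if (rows.length < batchSize) { return done(null); }
      next(skip + rows.length);
    });
  }
  next(0);
}

// In-memory stand-in for a view with 25 rows (illustration only).
const fakeView = [];
for (let i = 0; i < 25; i++) {
  fakeView.push({ key: i, value: 'doc' + i });
}
function fakeFetchPage(skip, limit, callback) {
  callback(null, fakeView.slice(skip, skip + limit));
}

// Process the "view" 10 rows at a time and record each batch size.
const batchSizes = [];
forEachBatch(fakeFetchPage, 10, (rows) => {
  batchSizes.push(rows.length);
}, (err) => {
  if (err) { throw err; }
  console.log('batch sizes:', batchSizes);
});
```

Each batch can be processed and discarded before the next one is requested, so peak memory depends on the batch size rather than on the size of the whole view.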

mnunberg · April 21, 2015, 10:17pm

This would also depend on the client and the specific options available. Some clients (Python and Ruby do this; node.js may do this in the future, and I believe Java also already does this) provide an “iterator” interface to allow you to incrementally process data as it’s received off the network.

00christian00 · April 21, 2015, 11:25pm

Thank you both.
I think the Node.js client can already do this too, but there is not yet a way to attach to the event that is fired. The source has this in ViewQueryResponse.prototype._bindReq:

if (events.EventEmitter.listenerCount(self, 'row')) {
    var resultMeta = null;
    var p = new JsonParser();
    p.onValue = function (value) {
      if (this.stack.length === 0) {
        resultMeta = {
          total_rows: value.total_rows
        };
        self.emit('rows', value.rows, resultMeta);
      } else if (this.stack.length === 2 && this.stack[1].key === 'rows') {
        self.emit('row', value);
      }
    };
    resp.on('data', function (data) {
      p.write(data);

But ViewQueryResponse's _bindReq is called from the query method, and there is no way to attach a listener to the “row” event before that call.

mnunberg · April 22, 2015, 1:21am

couchnode currently does not use a streaming parser. It will use the built-in streaming parser in libcouchbase for the next release.

See https://github.com/brett19/couchnode/commit/f912ea88b4872cfba1f7ccd5a1b121b1fcf835c3

00christian00 · April 22, 2015, 6:53am

Looks good!
I had a look at the code, and it seems that it still requires all the docs to be in memory.
Can you add a check to see whether any listener is subscribed to the “rows” event and, if not, avoid saving the docs in the array?

mnunberg · April 22, 2015, 3:07pm

@brett19 – is it available in this new version?

brett19 · April 22, 2015, 3:24pm

Hey Guys,

The old version of the Node.js SDK (pre-2.0.7) was actually capable of streaming the data rather than caching the whole result set first. This was achieved by not providing a callback to the .query method, and instead subscribing to the ‘row’ event that the returned EventEmitter provides (the SDK did not cache all rows unless someone was actually subscribed to an event that needed them).

Note, however, that as of 2.0.8, released today, we use libcouchbase’s row streaming, but there is a bug that causes it to cache all of the rows, unlike the logic from 2.0.7. I should be able to fix this for 2.0.9, which will be released on the first Tuesday of next month. Would you care to open a JSCBC issue for this?

Cheers, Brett
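
The behaviour described here (emit each row as it arrives, and buffer rows only when a listener needs the full set) can be sketched with a plain Node.js EventEmitter. `RowStream`, `pushRow`, `finish`, and the 'end' event are hypothetical stand-ins chosen for this illustration and are not the SDK's internals; only the 'row' and 'rows' event names are taken from the source excerpt quoted earlier in the thread.

```javascript
'use strict';
const EventEmitter = require('events');

// Hypothetical stand-in for a view query response (not the SDK's class).
// Rows are pushed in one at a time, as a streaming parser would deliver them.
class RowStream extends EventEmitter {
  constructor() {
    super();
    this.cached = [];
  }

  pushRow(row) {
    // Keep a copy only if someone asked for the aggregated 'rows' event.
    if (this.listenerCount('rows') > 0) {
      this.cached.push(row);
    }
    this.emit('row', row);
  }

  finish(meta) {
    if (this.listenerCount('rows') > 0) {
      this.emit('rows', this.cached, meta);
    }
    this.emit('end', meta); // assumed completion event for this sketch
  }
}

// Streaming consumer: handles rows one by one, so nothing is buffered.
const streaming = new RowStream();
let seen = 0;
streaming.on('row', () => { seen += 1; });
for (let i = 0; i < 1000; i++) {
  streaming.pushRow({ key: i, value: null });
}
streaming.finish({ total_rows: 1000 });
console.log('rows seen:', seen, 'rows cached:', streaming.cached.length);

// Buffered consumer: subscribing to 'rows' opts in to holding every row.
const buffered = new RowStream();
let received = null;
buffered.on('rows', (rows) => { received = rows; });
for (let i = 0; i < 3; i++) {
  buffered.pushRow({ key: i, value: null });
}
buffered.finish({ total_rows: 3 });
console.log('rows delivered at once:', received.length);
```

In this sketch the listener check runs on every row, so a 'rows' listener attached partway through would receive only the rows pushed after it subscribed; a real implementation would have to decide how to handle that case.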

00christian00 · April 23, 2015, 8:15am

Thanks Brett,
I did it:
https://issues.couchbase.com/browse/JSCBC-228