Storing time-series data


#1

We want to store time-series data in Couchbase. At present we are using RRD for this.

I understand that by using dateToArray and group_level we can generate graphs per day, week, month, etc.

This would work well for a single item, e.g. network bandwidth, but what if multiple items will be added to the system? Will we have to create a separate view for each item, e.g. cpu, disk, processes, etc.?

I am trying to think of a schema or view for this scenario. Can someone give some pointers?


#2

Yes, you will need to create a separate view for each metric unless grouping them is meaningful.


#3

Documents:

{"event": "cpu", "value": 1, "date": …}
{"event": "memory", "value": 2, "date": …}

Map:

function (doc, meta) {
  emit(dateToArray(doc.date), [doc.event, doc.value]);
}

Reduce:

function (key, values, rereduce) {
  // result maps each event name to {count, avg}
  var result = {};
  if (!rereduce) {
    // values are the [event, value] pairs emitted by the map function
    values.forEach(function (arr) {
      var event = arr[0], value = arr[1];
      if (!result.hasOwnProperty(event)) {
        result[event] = {count: 1, avg: value};
      } else {
        var count = result[event].count + 1,
            avg = result[event].avg;

        // incremental mean update
        avg += (value - avg) / count;

        result[event].count = count;
        result[event].avg = avg;
      }
    });
  } else {
    // values are partial results returned by earlier reduce calls
    values.forEach(function (res) {
      for (var event in res) {
        var part = res[event];
        if (!result.hasOwnProperty(event)) {
          result[event] = {count: part.count, avg: part.avg};
        } else {
          // merge two averages, weighting each by its count
          var count = result[event].count + part.count;
          var avg = result[event].avg * result[event].count / count + part.avg * part.count / count;
          result[event].count = count;
          result[event].avg = avg;
        }
      }
    });
  }
  return result;
}

Output:

{"rows": [{
  "key": [2012, 3, 8],
  "value": {"cpu": %avg_cpu_for_2012-03-08%, "memory": %avg_memory_for_2012-03-08%}
}]}
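
The two averaging steps used in the reduce (the incremental update and the count-weighted merge in the rereduce branch) can be checked outside the view engine. This is only a minimal sketch: addValue, mergePartials, and averageOf are hypothetical helper names, and the sample values are made up.

```javascript
// Hypothetical helpers mirroring the two averaging steps of the reduce.

// Incremental mean: fold one more value into {count, avg}.
function addValue(acc, value) {
  var count = acc.count + 1;
  return {count: count, avg: acc.avg + (value - acc.avg) / count};
}

// Count-weighted merge of two partial results (the rereduce case).
function mergePartials(a, b) {
  var count = a.count + b.count;
  return {count: count, avg: a.avg * a.count / count + b.avg * b.count / count};
}

// Average a non-empty array by folding the values in one at a time.
function averageOf(values) {
  return values.slice(1).reduce(addValue, {count: 1, avg: values[0]});
}

var samples = [2, 4, 6, 8, 10, 12];   // made-up sample values

// One reduce pass over everything.
var whole = averageOf(samples);

// Two partial reduces followed by a merge, as a rereduce would do.
var merged = mergePartials(
  averageOf(samples.slice(0, 2)),
  averageOf(samples.slice(2))
);

console.log(whole, merged);
```

Both paths should report the same count and average, which is the property a reduce needs in order to be safe under rereduce.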

The code might be wrong, but it shows the idea.

And by controlling the group level, you can set the resolution of the average.
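
To illustrate how the group level sets the resolution, here is a rough local simulation. It assumes dateToArray returns [year, month, day, hours, minutes, seconds]; dateToArrayLocal and groupByLevel are hypothetical stand-ins written for this sketch, not part of the view engine, and the readings are made up.

```javascript
// Local stand-in for the view engine's dateToArray(); assumed to return
// [year, month, day, hours, minutes, seconds] (UTC is used here).
function dateToArrayLocal(date) {
  var d = new Date(date);
  return [d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(),
          d.getUTCHours(), d.getUTCMinutes(), d.getUTCSeconds()];
}

// Mimic group_level: rows whose keys share the first `level` elements
// fall into the same group. Each group simply collects its values.
function groupByLevel(rows, level) {
  var groups = {};
  rows.forEach(function (row) {
    var prefix = JSON.stringify(row.key.slice(0, level));
    (groups[prefix] = groups[prefix] || []).push(row.value);
  });
  return groups;
}

// Made-up readings for illustration.
var rows = [
  {key: dateToArrayLocal("2012-03-08T10:15:00Z"), value: 1},
  {key: dateToArrayLocal("2012-03-08T22:40:00Z"), value: 3},
  {key: dateToArrayLocal("2012-03-09T01:05:00Z"), value: 5}
];

var byMonth = groupByLevel(rows, 2);  // one group: [2012,3]
var byDay = groupByLevel(rows, 3);    // two groups: [2012,3,8] and [2012,3,9]
var byHour = groupByLevel(rows, 4);   // three groups, one per hour

console.log(byMonth, byDay, byHour);
```

In a real query the same effect comes from passing group_level=2, 3, or 4 to the view, with the reduce applied to each group.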


#4

We have had some bad issues when doing too much work in the reduce, as described here: we reached a memory limit that killed the map/reduce.

From http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-views-writing-reduce.html#couchbase-views-writing-reduce-custom ->

“the size limit on the composite structure within the reduce() function is 64KB.”

General advice from the manual:

“The reduce() function is designed to reduce and summarize the data emitted during the map() phase of the process. It should only be used to summarize the data, and not to transform the output information or concatenate the information into a single structure.”
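
One possible way to follow that advice, offered as an untested sketch rather than a recommendation from this thread: move the event name into the key and use the built-in _stats reduce, so each reduced row is a small fixed-size object instead of one structure holding every event. The emit and dateToArray definitions and the statsReduce function below are local stand-ins (assumptions) so the map function can run outside the server; statsReduce only imitates the shape of the _stats output (sum, count, min, max, sumsqr). The documents are made up.

```javascript
// --- local stand-ins so the sketch runs outside Couchbase (assumptions) ---
var emitted = [];
function emit(key, value) { emitted.push({key: key, value: value}); }
function dateToArray(date) {
  var d = new Date(date);
  return [d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(),
          d.getUTCHours(), d.getUTCMinutes(), d.getUTCSeconds()];
}
// Imitates the shape of the built-in _stats reduce for one group of numbers.
function statsReduce(values) {
  var s = {sum: 0, count: 0, min: Infinity, max: -Infinity, sumsqr: 0};
  values.forEach(function (v) {
    s.sum += v;
    s.count += 1;
    s.min = Math.min(s.min, v);
    s.max = Math.max(s.max, v);
    s.sumsqr += v * v;
  });
  return s;
}

// --- the view's map function: event name first, then the date parts ---
var map = function (doc, meta) {
  emit([doc.event].concat(dateToArray(doc.date)), doc.value);
};
// In the view definition, the reduce would be just the string "_stats".

// Made-up documents for illustration.
var docs = [
  {event: "cpu", value: 1, date: "2012-03-08T10:00:00Z"},
  {event: "cpu", value: 3, date: "2012-03-08T11:00:00Z"},
  {event: "memory", value: 2, date: "2012-03-08T10:00:00Z"}
];
docs.forEach(function (doc) { map(doc, {}); });

// Simulate group_level=4, which keeps [event, year, month, day].
var perDay = {};
emitted.forEach(function (row) {
  var k = JSON.stringify(row.key.slice(0, 4));
  (perDay[k] = perDay[k] || []).push(row.value);
});
var stats = {};
Object.keys(perDay).forEach(function (k) { stats[k] = statsReduce(perDay[k]); });

console.log(stats);
```

The client would then compute the average as sum / count for each row, and a key range on the event name would select a single metric from the one view.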


By the way, there are some great solutions available for storing time series.

Graphite is one of them and could be considered: http://graphite.wikidot.com/faq#toc0

It stores the data, but also has a full set of APIs to render it (png, json, etc.)

Xavier