Making a map view distinct


#1

Hi,

I’ve already asked the same question on Google Groups (https://groups.google.com/forum/?hl=en-GB#!topic/couchbase/Q_iVbL6eqig), but I’ll ask it here as well.

I have a problem with a view and/or its rereduce. It’s hard to explain what it needs to do, but let me try:
I have historical data, going back in time, with some attributes that I need to summarise and aggregate to end up with a nice-looking graph as the end result.

So my documents look something like this:

{
  "id": "20111003-140324-053-VODK0200000760010001.xml",
  "assetId": "1234",
  "date": 20121003140324052,
  "url": "https://blah.dk/sfsdf",
  "platform": "TEST",
  "lastModified": "2011-11-12T08:55:23.181Z",
  "licenseStart": "2011-11-15T00:00:00.000+02:00",
  "licenseEnd": "2013-09-14T22:58:58.000+02:00",
  "availability": "public"
}

So what I need in the end is a graph (in Graphite) that shows which of these assets (documents) were “public” and within their license dates at a given time, for each “platform”.
I need to summarise this for, e.g., each month or each day (probably too much). My data goes back 2-4 years and I have about a million documents in the bucket.

For this I have written a view function (with help from Tug) that emits the platform and the dates in an array.

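function (doc, meta) {
  if (meta.type == "json") {
    // Only publicly available documents that have a platform are counted.
    if (doc.platform && doc.availability === 'public') {
      var startDate = new Date(doc.licenseStart);
      var endDate = new Date(doc.licenseEnd);

      // Emit one row per day of the license period.
      for (var d = startDate; d <= endDate; d.setDate(d.getDate() + 1)) {
        var dateAsArray = dateToArray(d);
        // Put the platform first so the key starts with the platform.
        dateAsArray.unshift(doc.platform);
        emit(dateAsArray, null);
      }
    }
  }
}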

This is all well and good, and it gives me roughly what I need, but I end up with duplicates in my map output and thus the wrong end result.
This is because I store all the old versions of each document, going back through history.
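
As a rough illustration (plain JavaScript run outside Couchbase, with hypothetical data and a simplified stand-in for dateToArray that keeps only year, month and day), the sketch below shows how two stored versions of the same asset produce identical keys, so a plain row count doubles:

```javascript
// Simplified stand-in for the view engine's dateToArray helper
// (assumption: only year, month and day matter for this illustration).
function dateToArraySketch(d) {
  return [d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate()];
}

// Collects the keys the map function would emit for one document.
function mapKeys(doc) {
  var keys = [];
  if (doc.platform && doc.availability === 'public') {
    var d = new Date(doc.licenseStart);
    var end = new Date(doc.licenseEnd);
    for (; d <= end; d.setUTCDate(d.getUTCDate() + 1)) {
      keys.push([doc.platform].concat(dateToArraySketch(d)));
    }
  }
  return keys;
}

// Two stored versions of the same asset (hypothetical data).
var versions = [
  { assetId: '1234', platform: 'TEST', availability: 'public',
    licenseStart: '2011-11-15T00:00:00.000Z',
    licenseEnd: '2011-11-17T00:00:00.000Z' },
  { assetId: '1234', platform: 'TEST', availability: 'public',
    licenseStart: '2011-11-15T00:00:00.000Z',
    licenseEnd: '2011-11-17T00:00:00.000Z' }
];

// Count rows per key, the way a plain row-counting reduce would.
var rowsPerKey = {};
versions.forEach(function (doc) {
  mapKeys(doc).forEach(function (key) {
    var k = JSON.stringify(key);
    rowsPerKey[k] = (rowsPerKey[k] || 0) + 1;
  });
});

console.log(rowsPerKey);
```

Each of the three day keys ends up with two rows here, although only one asset is involved.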

This means that a document with id “20111003-140324-053-VODK0200000760010001.xml” (with e.g. assetId: 1234) can be in Couchbase several times in different states, with different keys of course (but with the same assetId). This means that duplicates are present. Is there a way to make the view “distinct” over assetId, so that rows with the same assetId only count once?

I can’t really see how, so I’ve written a really dumb rereduce function:

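function (key, values, rereduce) {
  var count = 0;
  var uniqueList = [];

  // First pass: count the rows emitted by the map function.
  if (!rereduce) {
    return values.length;
  }

  // Rereduce pass: values holds the results of earlier reduce calls.
  // Count how many distinct entries it contains.
  for (var i = 0; i < values.length; i++) {
    // Reset the flag for every value; otherwise one duplicate would
    // mark all following values as duplicates too.
    var unique = true;
    for (var j = 0; j < uniqueList.length; j++) {
      if (values[i] === uniqueList[j]) {
        unique = false;
        break;
      }
    }

    if (unique) {
      uniqueList.push(values[i]);
      count++;
    }
  }
  return count;
}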

It’s really ugly-looking code and NOT optimal, and I am not even sure that it will work. Does anybody have an idea how to make my map unique with respect to “assetId”?
Somewhat like DISTINCT in the SQL world. :)

Any ideas?

/Steffen


#2

Your reduce function is one approach.
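
A minimal sketch of how a reduce-based distinct count could look, assuming the map function is changed to emit doc.assetId as the value (the map function in the question does not emit one). The name distinctReduce and the sample ids are made up for illustration, and the code runs as plain JavaScript outside Couchbase. Instead of a number, the reduce returns the list of unique assetIds, so the rereduce step can merge lists without counting an asset twice; the caller takes the length of the final list:

```javascript
// Hypothetical reduce: assumes the map emits doc.assetId as the value.
function distinctReduce(key, values, rereduce) {
  var seen = {};
  var unique = [];
  // On rereduce, each value is a list returned by an earlier reduce call,
  // so flatten the lists into one array of ids first.
  var ids = rereduce ? [].concat.apply([], values) : values;
  for (var i = 0; i < ids.length; i++) {
    if (!Object.prototype.hasOwnProperty.call(seen, ids[i])) {
      seen[ids[i]] = true;
      unique.push(ids[i]);
    }
  }
  return unique;
}

// Two partial reductions over made-up ids, then a rereduce merging them.
var partA = distinctReduce(null, ['1234', '1234', '5678'], false);
var partB = distinctReduce(null, ['5678', '9999'], false);
var merged = distinctReduce(null, [partA, partB], true);
var distinctCount = merged.length;

console.log(distinctCount);
```

The returned list grows with the number of distinct assets per key, so this may not scale well to a bucket with about a million documents.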