Couchbase for aggregating Telecom CDRs


#1

Hello

I am new to NoSQL and checking if it is good to move to it from mysql.

Mostly horizontal growing and replication across different servers/sites is what i liked as it is a problem in our current solution , but still got some worries about queries .

My company work in telecom and on daily bases we got around 6 million CDR Call Detail Record (document) , after saving these record to database we need to extract reports by aggregating the data .

Each record got different properties for example : DateTime,Customer,Vendor,Duration,Country,Operator,Type,…

What is the best way to get this aggregated data (Sum of duration),(Count of CDRs),(Count Per Type) with different criteria combination .

For example how can i create a view that will get me the result of Customer A and Vendor B at the same time .

Is this where n1ql come , if yes should we do warehousing as we do in MySQL as the table will have more than 100 million record ?

Is their any pre aggregation as i read in MangoDB

Best Regards
Shahbour


#2

Hi @shahbour, great to hear that you are evaluating different solutions.

Horizontal scalability with replication and performance is definitely the sweet spot of Couchbase Server, so you’ve started your investigations in the right place.

If you have a write rate of 6 million documents per day, this (let’s assume its more or less evenly distributed) is around 70 inserts per second, so a very very moderate rate. Couchbase Server can handle writes in the hundreds of thousands of ops/s. For example check this recent blog post out if you are interested: http://blog.couchbase.com/pure-storage-and-couchbase.

So my assumption is that you will be more concerned about the number of documents and querying capabilities than the insert rate (?). Note that while dependent on the actual document size and your scaling requirements, 100m documents doesn’t sound like a lot to me.

The first thing you need to make sure is that you store the data as JSON if you want to query it (this holds true for both Views and the upcoming N1QL). If you just want to use Couchbase Server as the firehose and do the analysis somewhere else, you can do that with any type of data (even some arbitrary binary data).

Views are great for aggregations like you describe, for counting and sums we even have pre-defined reduce functions to make your life easier. What you need to do is write those views upfront, then they are stored and every document inserted is processed. This is done in a distributed fashion, so adding more nodes also provides linear scalability.

To give you actual queries, we need some sample data so we can write a simple view with a use case.
What “pre aggregation” in MongoDB are you referring to?

Maybe for you it would be a great opportunity to get in touch with our sales and presales folks, they have resources to help you with your use case and give you the best experience to go from an idea/PoC to a finished application that you’re happy with :slight_smile: We already have customers in the same area, so this is definitely something we can help you with (http://www.couchbase.com/contact).


#3

Hello

Sorry for late reply but i wanted to provide some data and now i have them .

Below is a sample of my data

{
   "disconnectDate": "01/03/2015",
   "recordSequenceNumber": "1986102064",
   "egressSignalingType": "00",
   "egressIPCircuitEndPoints": "46.17.72.40:12440/0.0.0.0:0",
   "routeSelectedEgressTrunkGroupName": "TG_XXXX_UK_IP",
   "routeSelected": "LONXXX01:TG_XXXX_UK_IP",
   "callingNumber": "4164824896",
   "vendorId": "199",
   "billingNumber": "6473084200",
   "dialedNumberNOA": "02",
   "routeSelectedEgressGateway": "LONXXX01",
   "psxIndex": "1",
   "psxProcessingTime": "06",
   "calledNumber": "6473084200",
   "terminatedWithScript": "0",
   "egressCodecType": "",
   "callServiceDuration": 0,
   "ingressSignalingType": "012",
   "callDisconnectReasonTXEgress": "0",
   "ingressPSTNCircuitEndPoints": "",
   "vendorName": "XXXXX",
   "callDisconnectReason": "041",
   "customerName": "EAD",
   "disconnectInitiator": "02",
   "serviceProvider": "WHOLESALE",
   "scriptName": "TANDEM",
   "selectedRouteType": "7",
   "callDisconnectReasonTXIngress": "0503",
   "startDate": "01/03/2015",
   "egressRemoteSignalingIPAddr": "XXXXXXX",
   "chargeFlag": "0",
   "routeSelectedEgressData": "LONGSX01:TG_XXXX_UK_IP",
   "routeIndexUsed": "01",
   "ingressCodecType": "",
   "dialedNumber": "6473084200",
   "egressLocalSignalingIPAddr": "XXXXXXX",
   "ingressTrunkGroupName": "TG_XXX_UK_IP",
   "callDisconnectLocation": "9",
   "overloadStatus": "0",
   "customerId": "279",
   "startTime": "00:14:19.8",
   "gsxCallID": "0x7E0D77E9",
   "calledPartyNOA": "03",
   "egressProtocolVariantSpecData": "SIP,2114811881_130391831@XXXXXX,<sip:4164824896@46.17.72.7:5060>;tag=gK0d4d737a,<sip:6473084200@27.111.15.22:5060>,0,,,,sip:6473084200@27.111.15.22:5060,,,,sip:4164824896@XXXXXXX5060,,,,,,503,,0,0,,0,0,,,,,,,,1,0,0,0,,,,",
   "callingPartyNOA": "03",
   "egressTrunkGroupName": "TG_XXXX_UK_IP",
   "ingressIPCircuitEndPoints": "XXXXXXXX:22324/91.220.75.175:28686",
   "gatewayName": "LONGSX01",
   "timeElapsedRXAlert": "00",
   "recordType": "ATTEMPT",
   "disconnectTime": "00:14:20.2",
   "routeAttemptNumber": "1",
   "ingressRemoteSignalingIPAddr": "XXXXXXXX",
   "ingressLocalSignalingIPAddr": "XXXXXXXX",
   "timeElapsedDiscRXCompofCall": "30",
   "callSetupDelay": "22,327,3,352",
   "incomingCallingNumberNOA": "02",
   "callingName": "",
   "routeLabel": "RL_WS_64",
   "egressPSTNCircuitEndPoints": "",
   "timeElapsedSetupMsgRXLastCallRteAtt": "20",
   "incomingCallingNumber": "4164824896",
   "ingressProtocolVariantSpecData": "SIP,154200731-1-376759418@91.220.75.174,<sip:4164824896@XXXXXXXX>;tag=sansay336316389rdb3714,<sip:6473084200@XXXXXX>;tag=gK0dcd7228,0,,,,sip:6473084200@46.17.72.7:5060,4164824896@91.220.75.175,,,sip:4164824896@XXXXXXXX:5060,,,,,,503,,0,0,,0,0,,,,,,,,1,0,0,0,,,,",
   "timeElapsedRXPSXRsp": "10"
}

i created a view like this

Map

function (doc, meta) {
if(doc.recordType == “ATTEMPT” || doc.recordType == ‘STOP’) {
emit(doc.startDate + ‘-’ + doc.customerId + ‘-’ + doc.vendorId, [doc.recordType, doc.callServiceDuration]);
}
}

reduce

function(key, values, rereduce) {
var result = {attempts: 0, connected: 0, duration: 0};
for(i=0; i < values.length; i++) {
result.attempts = result.attempts + 1
if(values[i][0] == ‘STOP’) {
result.connected == result.connected + 1;
result.duration = result.duration + values[i][1];
}
}
return(result);
}

The output of the view reduced is some thing similar to the below

{“rows”:[
{“key”:“01/02/2015-12-3”,“value”:{“attempts”:1,“connected”:0,“duration”:1004590}},
{“key”:“01/02/2015-128-4”,“value”:{“attempts”:1,“connected”:1,“duration”:2261840}},
{“key”:“01/02/2015-13-2”,“value”:{“attempts”:3,“connected”:2,“duration”:3975750}},
{“key”:“01/02/2015-147-230”,“value”:{“attempts”:5,“connected”:0,“duration”:7071290}},
{“key”:“01/02/2015-15-50”,“value”:{“attempts”:40,“connected”:0,“duration”:40725270}},
{“key”:“01/02/2015-150-1”,“value”:{“attempts”:11,“connected”:1,“duration”:17223900}},
{“key”:“01/02/2015-156-12”,“value”:{“attempts”:4,“connected”:1,“duration”:5573920}},
{“key”:“01/02/2015-164-13”,“value”:{“attempts”:1,“connected”:0,“duration”:642840}},
{“key”:“01/02/2015-169-1”,“value”:{“attempts”:3,“connected”:0,“duration”:6826640}},
{“key”:“01/02/2015-186-4”,“value”:{“attempts”:1,“connected”:0,“duration”:2051030}}
]
}

This is very good for now , almost every thing i do when importing - warehousing using relational database .

Still querying the view , for example i want to get sum of all (attempts,connected,duration) for customerid 150 and day 01/02/2015 -> the key will be 01/02/2015-150 or sum of data for vendorid 50 .

For the second query i can’t use grouping level as in my key the customer is before vendorid and i need all customer .

I simplified the key here to 3 fields (date,customer,vendor) while in reality it is 7 fields .

Is N1QL useful here and does it use views indexing or it has its own index ?

BR
Shahbour


#4

N1QL would definitely be useful here and currently the developer previews use map reduced based views.