Question: Cached count


#1

I am building a tenant-based service. Each tenant has a dashboard that shows some stats, such as the user count.
How can I keep a cached count of users per tenant? I don’t want to query Couchbase for the count each time.

What is the best way? I have hundreds of thousands of tenants and hundreds of millions of users.


Views and count distinct
#2

How about having each tenant store its user count as an atomic counter? https://docs.couchbase.com/server/4.0/developer-guide/counters.html


#3

@vsr1, is there an easier way?
How can I ensure consistency? My counter must be eventually consistent. What should I do if the user count and the atomic counter get out of sync, for instance if I add a user but fail to increment the counter?


#4

JavaScript views would work well for this.

Views store intermediate reduce results (here, the counts of the items that qualify) in the inner nodes of the index tree. The index is built asynchronously, so when you query the aggregation you get a cached copy of the count as of the most recent index update.


#5

@ingenthr Can you show me an example?


#6

@ingenthr reply please


#7

@ingenthr I apologize for the mention, but I need help. Please help.


#8

Sorry for the delay, just a quick note to say I saw this and will get you an example shortly.


#9

Given a set of documents which represent users where the key has the userid and tenant ID embedded like this…

key: u0:t0

{
  "tenant": 0,
  "user": 0,
  "name": "Groucho"
}

And then given a View with this index code that matches the keys to a pattern, grabbing the IDs out and emitting an entry for each tenant ID into the results…

function (doc, meta) {
  // Keys look like "u0:t0": user ID first, then tenant ID.
  var re = /^u(\d+):t(\d+)$/;
  var matched = re.exec(meta.id); // null when the key does not match

  if (matched) {
    emit("tenant" + matched[2], null); // the tenant matched
  }
}

… and given a reduce using the built-in _count, a query against this view gives you the count of users in tenant0 and tenant1. This is the query string I used from my browser against a bucket named test (in the app I’d use an SDK, making sure I specify “group”):

http://localhost:8092/test/_design/dev_agg/_view/usercount?limit=6&stale=false&connection_timeout=60000&inclusive_end=true&skip=0&full_set=true&group=true

{"rows":[
{"key":"tenant0","value":2},
{"key":"tenant1","value":2}
]
}

My bucket had four users across two tenants and I get a count of two from each.
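
To make the result above easier to follow, here is a small local simulation of what the map function plus the built-in _count reduce with group=true produce. It does not talk to Couchbase at all. It assumes keys of the form u0:t0 as in the sample document; runView is a hypothetical helper, emit is passed in as a parameter (in a real view it is a global), and the user names other than Groucho are made up.

```javascript
// Map step: same idea as the view's map function, with emit injected for local use.
function mapUser(doc, meta, emit) {
  // Assumed key format "u<userId>:t<tenantId>", e.g. "u0:t0".
  var matched = /^u(\d+):t(\d+)$/.exec(meta.id);
  if (matched) {
    emit("tenant" + matched[2], null);
  }
}

// Hypothetical helper: applies the map function and counts emitted rows per key,
// which is what _count with group=true returns.
function runView(docs, mapFn) {
  var counts = {};
  docs.forEach(function (entry) {
    mapFn(entry.doc, { id: entry.id }, function (key) {
      counts[key] = (counts[key] || 0) + 1;
    });
  });
  // One row per distinct key, sorted by key.
  return Object.keys(counts).sort().map(function (key) {
    return { key: key, value: counts[key] };
  });
}

// Four users across two tenants (names are placeholders).
var docs = [
  { id: "u0:t0", doc: { tenant: 0, user: 0, name: "Groucho" } },
  { id: "u1:t0", doc: { tenant: 0, user: 1, name: "Harpo" } },
  { id: "u2:t1", doc: { tenant: 1, user: 2, name: "Chico" } },
  { id: "u3:t1", doc: { tenant: 1, user: 3, name: "Zeppo" } }
];

console.log(JSON.stringify(runView(docs, mapUser)));
// [{"key":"tenant0","value":2},{"key":"tenant1","value":2}]
```

The real view engine keeps these counts inside the index rather than recomputing them from the documents on each query; the loop here only illustrates the shape of the result.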

I intentionally made this one quite compact. It depends on being able to determine a user from the key, but you could also just match on a field of the document if you didn’t want to use the key pattern. Just change the logic in the function to emit whatever tenant keys and values make sense in your resulting dataset. This should perform quite well: the count aggregation is only re-performed upon request, and if you specify a range in the view query, only the count for that tenant is recalculated.


#10

As I understand it, the reduce function is executed each time I run the query, right?
If the reduce function runs each time, I have two issues:

  1. How can I get the count for tenant0 only? I don’t want to calculate it for all tenants. How can I filter?
  2. Assume a tenant has millions of users. If the reducer runs each time, it must count millions of documents, which does not seem optimal.

#11

No. The views engine will execute it only when data is changed and will store summaries on the interior of the index, so the cost is minimal.

You can specify ranges when querying the view. That’ll constrain it to the count of the range of interest, even if that’s just one value. See the docs on view querying.
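
As a rough sketch of constraining the query to a single tenant, the helper below builds a view query URL with the key parameter set to one tenant. tenantCountUrl is a hypothetical helper name; the base URL is the one from the earlier post, and key, group, stale and full_set are view query parameters (full_set appears in the earlier URL because the design document is a dev_ one). In an application you would normally pass the same options through an SDK instead.

```javascript
// Hypothetical helper: builds a view query URL that returns the count for one tenant.
function tenantCountUrl(baseUrl, tenantKey, stale) {
  var params = new URLSearchParams({
    key: JSON.stringify(tenantKey), // view keys are sent JSON-encoded
    group: "true",                  // one row per distinct emitted key
    stale: stale,                   // e.g. "false", as in the earlier query
    full_set: "true"                // as in the earlier query against dev_agg
  });
  return baseUrl + "?" + params.toString();
}

var url = tenantCountUrl(
  "http://localhost:8092/test/_design/dev_agg/_view/usercount",
  "tenant0",
  "false"
);
console.log(url);
```

With the sample data from the earlier post, the response should then contain only the tenant0 row.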

The reduce will only be run for a subset containing changed documents.

I might recommend running a quick benchmark to prove it to yourself. Add a million docs. Measure the time it takes to run the first view query. Then time subsequent requests. Then change a subset of docs randomly… maybe 5%… then run a view query again.

I expect you’ll see initial high cost, then subsequent low cost as the aggregation is summarized in the index for a subset of the data. As long as it isn’t changing quickly, even a stale=false request shouldn’t be too expensive.


#12

@ingenthr How can I group the count by day?

{
  "tenant": 0,
  "user": 0,
  "name": "Groucho",
  "registerDate": 1552112926
}

I want to get the count per day, per tenant.
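
One possible approach, sketched here under assumptions rather than confirmed by anyone in the thread: emit an array key of [tenant, year, month, day] from the map function and keep the built-in _count reduce. The sketch assumes registerDate is a Unix timestamp in seconds and that days are bucketed in UTC; mapByDay is a hypothetical name, and emit is passed in as a parameter so the function can be run locally (in a real view it is a global).

```javascript
// Map function sketch: one row per user, keyed by tenant and registration day (UTC).
function mapByDay(doc, meta, emit) {
  if (typeof doc.registerDate !== "number") {
    return; // skip documents without a numeric registerDate
  }
  var d = new Date(doc.registerDate * 1000); // assumed epoch seconds, converted to ms
  emit([doc.tenant, d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate()], null);
}

// Local check with the sample document from this post.
var emitted = [];
mapByDay(
  { tenant: 0, user: 0, name: "Groucho", registerDate: 1552112926 },
  { id: "u0:t0" },
  function (key, value) { emitted.push(key); }
);
console.log(JSON.stringify(emitted[0])); // [0,2019,3,9]
```

Querying such a view with group_level=4 would give one count per tenant per day, while group_level=1 would collapse it to one count per tenant. A startkey/endkey range such as [0,2019,3,1] to [0,2019,3,31] would restrict the result to one tenant and one month.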


#13

Help needed!!!..