Session storage with events (aka timers, aka destructors)

paf · February 11, 2016, 12:13pm

Colleagues,
we’ve been using Couchbase in live production for about 5 years now.
We have different clusters with different tasks.
One of those is session storage.
We store about 10M sessions, and update them at 1K requests/second/node.

One of the problems we’re facing is that we need to perform some actions when a session is abandoned.
Like a destructor.
More generally, we need to do something with a session when we feel it’s time to do that.

Our current approach is to

accumulate all actions we plan to do at a certain second in the future and stuff their IDs, in one APPEND operation, into a special $BUCKET.TIMEOUT, where the number of the second (since EPOCH) works as the key.
scan the timeline, one second at a time, and process a block of 1000 events from each second at a time: GETL, extract 1000 events, CAS whatever events are left, repeat. This runs on several client nodes.
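
The two steps above can be sketched roughly as follows. This is only an in-memory model in Python, not Couchbase SDK code: the `TimelineStore` class, its method names and the comma-separated value format are assumptions made for illustration, and the locking that GETL provides between several client nodes is not modeled.

```python
BATCH = 1000  # block size mentioned in the post


class TimelineStore:
    """Hypothetical in-memory stand-in for $BUCKET.TIMEOUT."""

    def __init__(self):
        # key: epoch second -> string of comma-terminated session IDs
        self._data = {}

    def append(self, second, session_ids):
        # Models one APPEND carrying all IDs planned for that second
        chunk = "".join(sid + "," for sid in session_ids)
        self._data[second] = self._data.get(second, "") + chunk

    def take_batch(self, second, limit=BATCH):
        # Models GETL + CAS: read the value, cut off up to `limit` IDs,
        # write back whatever is left (or drop the key when empty)
        ids = [s for s in self._data.get(second, "").split(",") if s]
        batch, rest = ids[:limit], ids[limit:]
        if rest:
            self._data[second] = "".join(s + "," for s in rest)
        else:
            self._data.pop(second, None)
        return batch


def scan(store, from_second, to_second, handler):
    # Walk the timeline one second at a time, draining each second in blocks
    for second in range(from_second, to_second + 1):
        while True:
            batch = store.take_batch(second)
            if not batch:
                break
            for sid in batch:
                handler(second, sid)
```

In this model a second holding 2500 IDs is drained in three passes (1000, 1000, 500) before the scan moves on to the next second.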

There are problems with this approach; I will spare everyone the details.

After 2 weeks of investigating the internals, we feel that the following approach may be convenient.
I don’t feel this is ready for a Jira issue yet; I wanted to discuss it first.
I welcome everybody’s comments. Especially those of dear Creators, but not limited to!

Add a new command to the SDK/memcached protocol:
TIMER key time
It would add that time to the item’s metadata.
Periodically, a special pager would visit all entries (like the exp_pager_stime infrastructure) and, for each timer that is due, generate an event
$bucket $key $time happened
and put that event into some external message queue broker.
The IP of that broker would be configurable via the web UI.

Client application could subscribe to those events.
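
To make the proposal more concrete, here is a small Python simulation of the idea. Nothing here exists in Couchbase: `TimerBucket`, `timer()` and `run_pager()` are hypothetical names, the timer "metadata" is a plain dict, and the standard-library `queue.Queue` stands in for the external message broker.

```python
import queue


class TimerBucket:
    """Hypothetical model of a bucket with the proposed TIMER command."""

    def __init__(self, name, broker):
        self.name = name
        self._timers = {}      # key -> fire time, standing in for item metadata
        self._broker = broker  # stand-in for the external message queue

    def timer(self, key, when):
        # Models "TIMER key time": remember when the event should fire
        self._timers[key] = when

    def run_pager(self, now):
        # Models one pager pass: visit all entries, emit an event for each
        # timer that is due, and clear that timer
        due = [k for k, t in self._timers.items() if t <= now]
        for key in due:
            when = self._timers.pop(key)
            self._broker.put((self.name, key, when))
        return len(due)


if __name__ == "__main__":
    broker = queue.Queue()
    bucket = TimerBucket("sessions", broker)
    bucket.timer("session::42", 1455192780)
    bucket.run_pager(now=1455192800)
    print(broker.get_nowait())  # the subscriber side would read events here
```

Note that an event is only emitted when the pager runs, which is exactly the precision limit listed among the minuses.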

MINUS: these timers would not tick very precisely; an event fires no sooner than the pager reaches the item
MINUS: integration with an external piece of software is required, and different people might prefer different protocols
PLUS: we continue to have one storage = Couchbase, “all in one place, data and scheduled events inside”, and do not have to invent a supporting timer infrastructure.

paf · February 11, 2016, 12:21pm

Our investigation shows that something can be done with the Views infrastructure.
Timer points could be stored inside the JSON document,
and a View function could look at those and insert those points into the view’s index.

A client can then query a view
?startkey=last_time_client_processed_something&endkey=now
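
The idea can be modeled in a few lines of Python. This is not a real view definition (Couchbase view map functions are written in JavaScript); the `timers` field name, the integer epoch-second keys and all function names are assumptions for illustration, and the range is treated as inclusive on both ends.

```python
import bisect


def map_timers(doc_id, doc):
    # Plays the role of the view's map function: emit one index row
    # per timer point stored in the document
    for t in doc.get("timers", []):
        yield (t, doc_id)


def build_index(docs):
    # The index is simply all emitted rows, sorted by (time, doc id)
    rows = [row for doc_id, doc in docs.items() for row in map_timers(doc_id, doc)]
    return sorted(rows)


def query(index, startkey, endkey):
    # Range scan over the sorted index: startkey <= time <= endkey
    lo = bisect.bisect_left(index, (startkey,))
    hi = bisect.bisect_left(index, (endkey + 1,))
    return index[lo:hi]


if __name__ == "__main__":
    docs = {
        "s1": {"timers": [100, 130]},
        "s2": {"timers": [110]},
        "s3": {},  # a session without timers emits nothing
    }
    index = build_index(docs)
    print(query(index, 100, 110))  # rows due between the last run and "now"
```

A client would remember the `endkey` of its last query and use the next second as the following `startkey`.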

This approach is feasible, but it costs.
Our investigations show that for our use case, tested with realistic data and on life-like hardware,
it’ll cost us about a 1.77x increase in CPU usage (not counting increased storage requirements).

We’re thinking about that, but it seems to be more costly than what is suggested in the topic.

paf · February 11, 2016, 12:22pm

The same can be done with the new GSI indexes.
Same idea.
But the CPU cost with those, our tests show, is even worse: about a 3x increase in CPU usage.

paf · February 11, 2016, 12:25pm

We’re also considering
TAP streams and
DCP streams

To accumulate a copy of the knowledge of those session events and process that knowledge separately,
while still considering Couchbase as the “master storage”.

But that also costs, this time in network bandwidth, because it would involve transferring 100% of the bytes while only a fraction of those is really needed.

This hints at the GSI approach with its projector+indexer, but that does not look ready for our needs.

perry · February 12, 2016, 5:41pm

Hey PAF, although I’m not PM, I would tend to shy away from your first suggestion because of the challenges with making a feature generic enough for the masses while still functional for your needs…without getting into over-customization for very specific cases.

As you pointed out, other methods would incur some extra costs but in reality, any feature or additional capability is going to come with a similar tradeoff. It just depends on where that cost is being incurred, either time/space/CPU/manpower/etc.

Today, using a view or GSI for this seems like the most natural way and gives you a nice balance between simplicity and flexibility. At 1k/sec/node you’re probably okay with either method; GSI will tend to give you a faster query response whereas views will tend to give you faster updating (though either can be scaled to meet the other…again, depends on resources).

I might also suggest that you look at our Kafka plugin, which lets you stream the incoming results out of Couchbase and do “whatever” processing you want on them. It will save you from having to poke around with TAP/DCP yourself (which I would strongly discourage). If there are certain features or improvements that we can build into that connector/plugin to make it more generally useful, that would be very well received. Your point about the network bandwidth is already known and being looked at. I can’t promise anything in the near term, but it’s something we’re looking at improving.

Hope that helps…

Perry

paf · February 15, 2016, 9:47am

Hi, Perry.
I knew this would be the first natural reaction; that’s exactly why I put my figures up front:

views - CPU hungry (1.77x)
GSI – CPU super-hungry (3x in our life-like tests).
We simply cannot afford such expenditure.
We cannot spend those resources on events.
Those are just extra $$ our Customer is not willing to pay = we’d have [to continue] to wiggle and squirm to make timed events continue to work.

My point is that events are a natural thing for session storage.
And that Couchbase is a very convenient and reliable storage, as we know from 5 years of experience.
And I feel it’s high time it gained timer support, which God knows is easy to add and will not eat resources:

storing timed events inside the existing metadata infrastructure;
triggering events by the paging infrastructure (similar to the existing expiration pager, which does not eat CPU at all in our case (keyword: exp_pager_stime));
will not spend much manpower/CPU.
Nowhere near “similar”, as you put it.
Also network consumption will be small, only IDs…

Maybe there are other options on reporting events.
Suggested above is just one option.

My current feeling is that CPU consumption is being thrown to the wolves (= sacrificed for features):

the current implementation of GSI, though architecturally beautiful, proved to be much slower than views. Not to speak of other minuses for us (like https://issues.couchbase.com/browse/MB-18016);
same sad story with the dcp_proxy approach, where about 50% of resources were plainly next-to-wasted (see https://issues.couchbase.com/browse/MB-17896).

Maybe our PMs need to refresh their memory of the maxim that “conservation of resources is also a feature”.

yours,
PAF

venkat · June 22, 2018, 9:03am

Hi PAF,

We are launching the Eventing Service in the Couchbase 5.5 release. The Beta build is already available on the website.

There is an example (Example #2) which shows how to manage expiries.

Documentation : https://developer.couchbase.com/documentation/server/5.5/eventing/eventing-overview.html

I would highly recommend taking it for a spin and letting us know your thoughts/feedback.

venkat · August 9, 2018, 7:20am

Eventing Service is now GA with Couchbase Server 5.5.

Overview and Demo Talk : https://www.youtube.com/watch?v=SXa6PJEuaHY
Products Page : https://www.couchbase.com/products/eventing
Blog: https://blog.couchbase.com/eventing/
Documentation : https://developer.couchbase.com/documentation/server/5.5/eventing/eventing-overview.html

paf · August 28, 2018, 9:54am

For Google’s sake: the original APPEND approach has a small problem (easy to patch):
http://jira.teligent.ru/browse/CB-51

venkat · August 28, 2018, 10:30am

We also introduced Timers in 6.0 Beta
https://blog.couchbase.com/timers-couchbase-functions/

paf · August 28, 2018, 11:56am

Which was noted right away and is very much appreciated.
We’re considering switching from our home-grown patch to those.