Guarantee of execution of eventing functions


#1

Hi!

On a theoretical level, so without assuming outages of any kind or broken network connections, are eventing functions (e.g. onUpdate) guaranteed 100% to run (get triggered at least once) for each document update? Or is there a maximum “load” that could limit their execution?

Thank you,
Alberto


#2

We have at-least-once semantics in place. Under rollback conditions, some scenarios will produce duplicate events.
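Because at-least-once delivery means the same event can arrive more than once, handler logic is usually written to be idempotent. Here is a minimal sketch of that idea in plain Python. It is not the Eventing API: the `IdempotentHandler` class and the `version` field are hypothetical, and the sketch assumes the application stores a monotonically increasing version in each document.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentHandler:
    """Hypothetical consumer that tolerates redelivered events."""

    # Highest application-level version already applied, per document key.
    applied: dict = field(default_factory=dict)
    # Stand-in for real side effects (notifications, writes, ...).
    side_effects: list = field(default_factory=list)

    def on_update(self, key: str, version: int, value: str) -> bool:
        """Apply the event once; return False if it was a duplicate."""
        if self.applied.get(key, -1) >= version:
            return False  # already processed, e.g. a redelivered event
        self.side_effects.append((key, value))
        self.applied[key] = version
        return True


handler = IdempotentHandler()
# The third event is a redelivery of the second one.
events = [("doc1", 1, "a"), ("doc1", 2, "b"), ("doc1", 2, "b")]
results = [handler.on_update(*event) for event in events]
print(results)               # [True, True, False]
print(handler.side_effects)  # [('doc1', 'a'), ('doc1', 'b')]
```

The duplicate is detected and skipped, so the side effect runs only once per version.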

Also, note that not every update to a document is propagated onto DCP. Updates are deduplicated (coalesced): if many updates happen to a document in a short time window, only a few of them will appear in DCP.


#3

Yes, to expand on what @venkat said, Eventing has “at least once” semantics. The data service itself has a behaviour where multiple updates to the same document are coalesced (de-duplicated) into a single update carrying the newest value. This means that if a document is updated multiple times in rapid succession, the event handler will fire only on the latest value.
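As a rough illustration of what coalescing means for a consumer, here is a toy model in Python. It is only a sketch of the observable effect, not how the data service implements de-duplication; the `coalesce` function and the sample keys are made up for the example.

```python
def coalesce(mutations):
    """Keep only the newest value per key, ordered by each key's last write."""
    latest = {}
    for key, value in mutations:
        latest.pop(key, None)  # re-insert so ordering follows the last write
        latest[key] = value
    return list(latest.items())


# Three rapid updates to doc1 and one update to doc2 in the same window.
burst = [("doc1", "v1"), ("doc2", "x"), ("doc1", "v2"), ("doc1", "v3")]
print(coalesce(burst))  # [('doc2', 'x'), ('doc1', 'v3')]
```

A handler fed from the coalesced stream never sees `v1` or `v2` for `doc1`, only `v3`.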

Rapid mutations to the same document, where later mutations overwrite the content of earlier ones, are deduplicated for performance reasons. If your use case requires every mutation to be seen, regardless of how short-lived it is (for example, auditing changes), please let us know.

We can consider that as a possible enhancement to explore further.


#4

Thank you both for the explanation :)

I do not yet have a scenario where I need to track every possible mutation, but I can see how it could be a required feature for some business logic.

The question then would be: what is the maximum delay that a function will experience before being triggered? For how long are the mutations of a document “accumulated”?


#5

It depends on the size of the event queue for a Function. If a deployed Function is slow to execute and the source bucket generates more events per second than the Function processes, the backlog of unprocessed events will grow.
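To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The rates are invented for illustration and `backlog_after` is a hypothetical helper, not an Eventing stat; it just assumes constant arrival and processing rates.

```python
def backlog_after(seconds, arrival_rate, processing_rate, initial=0):
    """Estimate the backlog size after `seconds`, given constant rates (events/s)."""
    growth_per_second = arrival_rate - processing_rate
    # The backlog cannot go below zero once the handler catches up.
    return max(0, initial + growth_per_second * seconds)


# Source bucket produces 500 events/s, the Function handles 400 events/s.
print(backlog_after(60, arrival_rate=500, processing_rate=400))  # 6000

# Once arrivals drop below the processing rate, the backlog drains.
print(backlog_after(60, arrival_rate=300, processing_rate=400, initial=1000))  # 0
```

The delay a given event experiences is then roughly the backlog ahead of it divided by the processing rate.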

The Admin UI has a stat counter that captures the size of the backlog of events waiting to be fired. Because Eventing provides an at-least-once model, events in the backlog queue will eventually be fired.

De-duplication is an artifact of the data service. Eventing and other services such as index, FTS, views, and XDCR all consume events from the data service, so all of them observe this behavior. De-duplication doesn’t affect the other components as much as Eventing, because for indexing and replication it is fine to work with the latest copy of the data without knowing the previous revisions of the document.

As for how mutations are accumulated, it depends on how long the data service keeps a data snapshot open. I don’t want to bog you down with the details of DCP, but if you’re interested, you could have a look at the Couchbase Connect session on DCP.

We would be interested to know more about the use case you have, for which you’re exploring Eventing.


#6

Thanks for the extensive explanation!

Still one more question then:

then backlog of events that are yet to be processed will grow.

I wonder, does this backlog reside in the eventing service node? Does it have a maximum length, or dropping of old events?

We would be interested to know more about the use case you have, for which you’re exploring Eventing.

I’m still inspecting the capabilities of the eventing service to figure out which parts of the business logic would be worth moving to it. The feature I’m most interested in is definitely the availability of CURL triggers, especially the ability to bind document expiry/deletion to a notification sent to the back end.

I’ll happily report back once I manage to set up some business logic and test it!

Thanks a lot!
Alberto


#7

Eventing does keep a queue of the events it should process next. The queue has two types of caps:
(a) Count of events: default value 100K per worker
(b) Memory used by the queue: default value 1GB per worker

If either of these caps is reached, Eventing waits for enqueued events to be processed before adding the next batch of events to the queue. As events are processed, the Eventing framework updates checkpoints via an ack mechanism, to avoid reprocessing the same event.
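The behaviour described above can be pictured with a small Python model. This is only a sketch under assumptions: `BoundedEventQueue` and its methods are hypothetical names, the defaults mirror the 100K and 1GB figures (using 2**30 bytes as an approximation of 1GB), and the real Eventing internals are certainly more involved.

```python
from collections import deque


class BoundedEventQueue:
    """Toy per-worker queue with a count cap, a memory cap, and a checkpoint."""

    def __init__(self, max_events=100_000, max_bytes=1 << 30):
        self.max_events = max_events
        self.max_bytes = max_bytes
        self.events = deque()
        self.bytes_used = 0
        self.checkpoint = 0  # sequence number of the last acknowledged event

    def try_enqueue(self, seqno: int, payload: bytes) -> bool:
        """Accept the event unless a cap is hit; the producer must then wait."""
        if len(self.events) >= self.max_events:
            return False
        if self.bytes_used + len(payload) > self.max_bytes:
            return False
        self.events.append((seqno, payload))
        self.bytes_used += len(payload)
        return True

    def process_next(self) -> int:
        """Process the oldest event, then ack it by advancing the checkpoint."""
        seqno, payload = self.events.popleft()
        self.bytes_used -= len(payload)
        self.checkpoint = seqno
        return seqno


q = BoundedEventQueue(max_events=2, max_bytes=10)
print(q.try_enqueue(1, b"aaaa"))  # True
print(q.try_enqueue(2, b"bbbb"))  # True
print(q.try_enqueue(3, b"cc"))    # False: count cap reached, nothing is dropped
q.process_next()
print(q.try_enqueue(3, b"cc"))    # True: room again after processing
print(q.checkpoint)               # 1
```

Note that a rejected event is not discarded: the producer simply retries later, which matches the "wait, then enqueue the next batch" description.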

Those are the limits. Eventing doesn’t drop events, and it tries to provide an at-least-once scheme for event execution. If an Eventing node fails, a new node picks up processing from the last checkpoints.