Possible bug: eventing function stuck on undeploying


#1

Hi, while playing with eventing functions, I managed to get a function into a state where it is marked as undeployed and paused (green tick), but it can neither be deleted (the button is grayed out) nor deployed (CB says the function is being undeployed).

I’m using CB 6.0.0-beta on Docker (Windows host).

Once I restarted CB, the function was marked as paused (orange tick) and could apparently be deleted, but on page reload the function appears again.

I also previously managed to cause some weird desynchronization between the eventing service and metakv… can I send you my logs somewhere?

What I see in eventing.log is:

2018-09-02T07:56:13.037+00:00 [Error] util::ReadAppContent App: db_eventing_integration_test_testFn unmarshal failed for checksum
2018-09-02T07:56:13.037+00:00 [Error] Producer::metakvAppCallback [db_eventing_integration_test_testFn:0] Failed to lookup path: /eventing/apps/ from metakv, err: unexpected end of JSON input

#2

Can you let us know how many nodes are being used, and which services are configured on each node?


#3

I’m testing on my local machine, with CB running in Docker. So, a single node, with the data/index/query/eventing services all together. The test is basic in any case, so no indexes are being generated and only the eventing service is tested.

After running the test a number of times, I noticed that this issue starts happening when:

  1. The function is bootstrapping.
  2. I POST the setting to undeploy the function multiple times.

#4

Hi,

Does any cbcollect attached to the thread "Eventing timer function takes long time to get triggered" capture this issue? If not, could you share a cbcollect from the setup when you get a chance?

Based on that, I could provide some suggestions.

Thanks.


#5

So, when I managed to see this issue I manually saved the logs folder. I’m not able to reproduce the issue at the moment, but I have attached the folder content from that time.

logs_stuckonundeployed_mask.zip (732.6 KB)

If I manage to reproduce the issue, I’ll also post the properly collected logs.


#6

Thanks, I will review the logs and share an update here.


#7

Oh, I managed to reproduce it!

Steps:

  • Create a function
  • Set the setting to deploy/undeploy repeatedly (e.g. 100 times, i % 2 == 0 ? deploy : undeploy)
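
The steps above could be sketched roughly as follows. This is a minimal sketch, not the original test code. It assumes that the eventing REST API listens on port 8096 and that settings are changed via /api/v1/functions/<name>/settings with deployment_status and processing_status flags; the host, credentials and function name are placeholders.

```python
import base64
import json
import urllib.request

# Assumed, not taken from the thread: eventing REST port and settings endpoint.
BASE_URL = "http://localhost:8096"


def settings_payloads(iterations):
    """Build alternating payloads: even i deploys, odd i undeploys."""
    payloads = []
    for i in range(iterations):
        deploy = i % 2 == 0
        payloads.append({"deployment_status": deploy, "processing_status": deploy})
    return payloads


def post_settings(name, payload, user="Administrator", password="password"):
    """POST one settings payload and return the HTTP status code."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{BASE_URL}/api/v1/functions/{name}/settings",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def hammer(name, iterations, send=post_settings):
    """Send the payloads back to back, without waiting for each change to finish."""
    for payload in settings_payloads(iterations):
        send(name, payload)
```

Calling hammer("testFn", 100) against a disposable single-node instance would fire the 100 alternating requests; the send parameter exists only so the loop can be exercised without a running server.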

Outcome:

  • Function is shown in CB UI as undeployed, paused.
  • Function cannot be deleted in UI, and REST API returns: ERR_APP_NOT_UNDEPLOYED.
  • Errors in eventing.log.

Logs: https://s3.amazonaws.com/cb-customers/Alberto+Marchetti/collectinfo-2018-09-04T182436-ns_1%40127.0.0.1.zip


#8

That seems unrealistic in the real world, i.e. you would not normally deploy and undeploy every few seconds. Deploy and undeploy operations have a cost associated with them.

Some of the overheads during deploy:

  • Get the state of vbucket distribution across data nodes
  • Open change streams from the relevant data nodes for the vbuckets that they are hosting
  • Generate a plan on the eventing nodes to distribute the workload evenly across available nodes
  • Depending on the state of the source bucket and the feed_boundary of the eventing function, streaming some or all items from disk on the data nodes can be expensive.

During undeploy:

  • Change streams are again opened from the data nodes for the metadata bucket to clear up all system-related metadata.

That said, if you feel there is an actual use case that necessitates frequent deploy and undeploy operations, please feel free to let us know. We could prioritize the request accordingly.

Thanks,
Abhishek


#9

I guess this is a really extreme, brute-force misuse of Couchbase that does not reflect real-world usage. The issue came up mostly through testing (because there was no endpoint to tell me whether the function was still deploying), so I think this problem will be solved with a new build :)


#10

Yes, with new builds we’ve exposed /api/v1/status, which summarizes the state of all functions within the cluster.

Thanks.
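
A test could use that endpoint to wait for a function to settle before sending the next deploy or undeploy request. The following is a minimal sketch under assumptions not confirmed in this thread: that the eventing REST API is on port 8096, and that the response is a JSON object with an "apps" list whose entries carry "name" and "composite_status" fields. The helper names and credentials are hypothetical.

```python
import base64
import json
import time
import urllib.request

# Assumed port; the /api/v1/status path is the one mentioned in the thread.
STATUS_URL = "http://localhost:8096/api/v1/status"


def fetch_status(user="Administrator", password="password"):
    """GET the status summary and return it as a parsed JSON document."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        STATUS_URL, headers={"Authorization": f"Basic {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def composite_status(status_doc, name):
    """Return the status string for one function, or None if it is not listed.

    The "apps" / "composite_status" field names are assumptions.
    """
    for app in status_doc.get("apps") or []:
        if app.get("name") == name:
            return app.get("composite_status")
    return None


def wait_for_status(fetch, name, target, timeout=120.0, interval=1.0, sleep=time.sleep):
    """Poll fetch() until the function reports target; False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if composite_status(fetch(), name) == target:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

For example, wait_for_status(fetch_status, "testFn", "deployed") would block until the function reports that state or two minutes have passed; the fetch and sleep parameters are injectable so the polling logic can be tested without a cluster.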