AWS Lambda Node Timeout Issue

Hi

I have a simple AWS Lambda script running that uses the Node SDK (3.0) to do an insert.

Everything works fine for about 2 hours, then I get the following unhandled promise rejection error:

Error: LCB_ERR_TIMEOUT (201): A request cannot be completed until the user-defined timeout fired

Re-deploying the lambda function solves the problem for another 2 hours, then the same problem with the same error re-occurs.

In the lambda function, I’m creating references to my cluster, bucket and collection at the top of the function, outside of the handler function - this is recommended as far as I can see.

I can’t see any way to ‘release’ the connection once the function concludes.

I’m not really sure what’s going on here…

Can anyone help please?

Just to say - since this first post yesterday, I’ve gone through a lot of stuff, none of it has made any difference.

The Lambda function, once started, stays ‘alive’ between invocations - all global variables are maintained, including the couchbase DB cluster connection, and the resulting bucket and collection references.

Each subsequent invocation uses the existing connection, without having to re-open a new connection.

I’m setting context.callbackWaitsForEmptyEventLoop = false which helps the invocation exit properly.
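
For readers following along, the setup described above might look roughly like the sketch below. It is not the original poster's code. `makeHandler`, `connect`, `event.id` and `event.doc` are names made up for illustration. `connect` stands in for whatever opens the cluster and returns a collection. The only SDK call assumed is `collection.insert(key, value)`.

```javascript
// Sketch only: `connect` is a hypothetical function you supply. With the
// Couchbase SDK it would open the cluster and resolve to a collection.
function makeHandler(connect) {
  // Module-level state like this survives between invocations for as long
  // as Lambda keeps the container alive.
  let collectionPromise = null;

  return async function handler(event, context) {
    // Let the invocation finish even though the DB sockets stay open.
    context.callbackWaitsForEmptyEventLoop = false;

    if (collectionPromise === null) {
      // First invocation in this container: connect once and cache it.
      // If connecting fails, clear the cache so the next invocation retries.
      collectionPromise = Promise.resolve()
        .then(connect)
        .catch((err) => {
          collectionPromise = null;
          throw err;
        });
    }

    const collection = await collectionPromise;
    // `event.id` and `event.doc` are assumed field names, not from the thread.
    await collection.insert(event.id, event.doc);
    return { inserted: event.id };
  };
}
```

Clearing the cached promise on failure is a design choice in this sketch, so that one failed connection attempt does not poison every later invocation in the same container.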

The error 201 I’m getting from couchbase seems to imply that it’s waiting for something to be concluded, but there are no user defined timeouts that I’m aware of?

Some further details.

I’m deploying the code to the lambda function, and it runs perfectly for about 2 hours, roughly 82 invocations based around 1 container.

Then Lambda dumps the container (perfectly normal) and a new container is created, obviously based around the same nodejs code. This time however, right from the first invocation, I’m getting the LCB_ERR_TIMEOUT error.

I’m racking my brains, but can’t see why this would happen.

Is couchbase running out of connections? Is the old connection somehow being held open with no timeout? Can I just ‘hack’ this and open and close the connection with every invocation? I’m using SDK 3.0 and there doesn’t seem to be any command for closing a cluster connection.

It might be AWS related. They introduced some new networking for Lambda functions.

With it came a new idle connection timeout of 350 seconds (about 6 minutes). If a connection sits idle for 350 seconds or more, Lambda purges it, so any attempt to use that connection on a later invocation can trigger a connection reset. By default, the Couchbase SDK keep-alive heartbeat is set to 2 hours. You might try reducing TcpKeepAliveTime to a shorter interval, such as 30 seconds.
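
If there is no keep-alive setting to tune, one possible workaround is to track when the connection was last used and treat it as suspect once it nears the 350-second window. The sketch below is an assumption-laden illustration, not an SDK feature. `isLikelyStale`, `IDLE_LIMIT_MS` and the 30-second safety margin are hypothetical.

```javascript
// 350 seconds is the idle timeout quoted above. The margin is an arbitrary
// safety buffer chosen for this sketch.
const IDLE_LIMIT_MS = 350 * 1000;

// Returns true when the gap since the last use is close enough to the idle
// limit that the connection may already have been purged.
function isLikelyStale(lastUsedMs, nowMs, marginMs = 30 * 1000) {
  return nowMs - lastUsedMs >= IDLE_LIMIT_MS - marginMs;
}
```

A handler could record `Date.now()` after each successful operation and open a fresh connection when `isLikelyStale(lastUsed, Date.now())` returns true.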

That’s interesting, thank you.

How do I change this setting? Is it a server setting or something done when the cluster connection is established? I’m using the Node SDK v3.0.

I couldn’t find TcpKeepAlive for Node.js. It’s definitely there in the client settings for the Java and .NET SDKs. In the Node.js 2.6 SDK you could specify some timeout settings for operations:

https://docs.couchbase.com/nodejs-sdk/2.6/client-settings.html

The 201 error is coming from the libcouchbase C library, so enabling logging for the app might be the best route to troubleshoot the issue.

https://docs.couchbase.com/nodejs-sdk/2.6/handling-error-sample-code.html#monitoring

Thanks - it would be really helpful to understand fully what this error means.

What is being referred to by “the user-defined timeout”?

When I do a bucket.insert, I’m putting a “timeout:10000” option into the request. Is this the ‘user-defined timeout’?
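
Whichever timeout the message refers to, the "unhandled promise rejection" part can be avoided by catching the error around the insert. Here is a minimal sketch, assuming the collection object exposes `insert(key, value, options)` with a `timeout` option in milliseconds, as in the question. `safeInsert` and its return shape are made up for illustration.

```javascript
// Wraps a single insert so a timeout is reported instead of surfacing as
// an unhandled promise rejection. `collection` is whatever object your
// code already inserts into.
async function safeInsert(collection, key, doc, timeoutMs = 10000) {
  try {
    await collection.insert(key, doc, { timeout: timeoutMs });
    return { ok: true };
  } catch (err) {
    // Log and hand the error back to the caller rather than rethrowing.
    console.error(`insert of ${key} failed:`, err.message);
    return { ok: false, error: err };
  }
}
```

Returning the error instead of rethrowing lets the handler decide whether to retry, reconnect, or fail the invocation.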

@mat01, thanks for making this post. I was having this issue intermittently for a long time, and your post led me in the right direction. The source of the error (although really poorly documented and written) is covered here: https://docs.couchbase.com/nodejs-sdk/2.6/client-settings.html (I am also using 3.0, but it’s the same.)

"If lcb_get_bootstrap_status is returning with LCB_ETIMEDOUT and you are running on a slow network, modifying this setting may increase the chances of success."

I appended config_total_timeout=25 to my connection string:
couchbase://test.com/bucketName?config_total_timeout=25
and that fixed it. It looks like the Couchbase client is failing to get the connection config from the server in time. You may have a connectivity issue reaching your Couchbase server.
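
If the connection string is assembled in code, the option can be appended with a small helper. This is only a sketch. `withConnectionOption` is a hypothetical name, and the only option taken from this thread is `config_total_timeout`.

```javascript
// Appends one name=value option to a connection string, using "?" for the
// first option and "&" for any that follow.
function withConnectionOption(connStr, name, value) {
  const separator = connStr.includes('?') ? '&' : '?';
  return `${connStr}${separator}${name}=${encodeURIComponent(String(value))}`;
}

// Rebuilds the connection string quoted above.
const connStr = withConnectionOption(
  'couchbase://test.com/bucketName',
  'config_total_timeout',
  25
);
```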

Hey Jeff, thank you so much for this.

Really helpful.