Random Idle Timeouts SDK Connection Keep-Alive

Hello,

We have an application hosted in Azure, and it seems there is a conflict between how the SDK handles keeping connections open, and Azure’s firewall.

As I understand it (or at least what I see), the SDK keeps connections open for as long as an apache process is kept alive. The problem is if the process is kept alive for longer than 4 minutes, the connection is black-holed by Azure, and any subsequent request is blocked for a certain period of time.

Im experiencing this now and it is becoming problematic as you can imagine. When the timeout occurs, I get these typical errors:

[Sat Aug 05 01:43:00.843975 2017] [php7:notice] [pid 3848] [client 127.0.0.1:45630] [cb,EROR] (cccp L:162 I:1) <NOHOST:NOPORT> Could not get configuration: LCB_ETIMEDOUT (0x17)
[Sat Aug 05 01:43:01.342937 2017] [php7:notice] [pid 3848] [client 127.0.0.1:45630] [cb,WARN] (server L:484 I:1) <my.server:11210> (SRV=0x55db520a7bd0,IX=0) Failing command (pkt=0x55db520a8970, opaque=110, opcode=0xb5) with error LCB_ETIMEDOUT (0x17)
[Sat Aug 05 01:43:01.343023 2017] [php7:notice] [pid 3848] [client 127.0.0.1:45630] [cb,EROR] (server L:559 I:1) <my.server:11210> (SRV=0x55db520a7bd0,IX=0) Server timed out. Some commands have failed
[Sat Aug 05 01:43:03.843580 2017] [php7:notice] [pid 3848] [client 127.0.0.1:45630] [cb,WARN] (server L:484 I:1) <my.server:11210> (SRV=0x55db520a7bd0,IX=0) Failing command (pkt=0x55db520a8970, opaque=111, opcode=0x5) with error LCB_ETIMEDOUT (0x17)
[Sat Aug 05 01:43:03.843637 2017] [php7:notice] [pid 3848] [client 127.0.0.1:45630] [cb,EROR] (server L:559 I:1) <my.server:11210> (SRV=0x55db520a7bd0,IX=0) Server timed out. Some commands have failed
[Sat Aug 05 01:43:06.569803 2017] [php7:notice] [pid 3848] [client 127.0.0.1:45630] [cb,WARN] (server L:484 I:1) <my.server:11210> (SRV=0x55db520a7bd0,IX=0) Failing command (pkt=0x55db520a8970, opaque=112, opcode=0x5) with error LCB_ETIMEDOUT (0x17)
[Sat Aug 05 01:43:06.569859 2017] [php7:notice] [pid 3848] [client 127.0.0.1:45630] [cb,EROR] (server L:559 I:1) <my.server:11210> (SRV=0x55db520a7bd0,IX=0) Server timed out. Some commands have failed
[Sat Aug 05 01:43:09.311300 2017] [php7:notice] [pid 3848] [client 127.0.0.1:45630] [cb,WARN] (server L:484 I:1) <my.server:11210> (SRV=0x55db520a7bd0,IX=0) Failing command (pkt=0x55db520a8970, opaque=113, opcode=0x5) with error LCB_ETIMEDOUT (0x17)
[Sat Aug 05 01:43:09.311358 2017] [php7:notice] [pid 3848] [client 127.0.0.1:45630] [cb,EROR] (server L:559 I:1) <my.server:11210> (SRV=0x55db520a7bd0,IX=0) Server timed out. Some commands have failed

I even got some weirder ones:

[Sat Aug 05 01:58:08.517953 2017] [php7:notice] [pid 3848] [client 127.0.0.1:47842] [cb,WARN] (lcbio L:103 I:1) OS errno 110 (Connection timed out) does not have a direct client error code equivalent. Using NETWORK_ERROR
[Sat Aug 05 01:58:08.518034 2017] [php7:notice] [pid 3848] [client 127.0.0.1:47842] [cb,WARN] (server L:719 I:1) <my.server:11210> (SRV=0x55db520a7bd0,IX=0) Got socket error LCB_NETWORK_ERROR (0x10)
[Sat Aug 05 01:59:23.520392 2017] [php7:notice] [pid 3848] [client 127.0.0.1:47842] [cb,WARN] (pcbc/n1ql L:51) Failed to decode N1QL row as JSON: json_last_error=4. I=0x55db520777b0
[Sat Aug 05 01:59:23.520440 2017] [php7:notice] [pid 3848] [client 127.0.0.1:47842] [cb,EROR] (pcbc/n1ql L:88) Failed to perform N1QL query. 0: . I=0x55db520777b0

Apart from telling Azure to keep connections alive for 30 minutes (not ideal, and the max amount of time Azure allows), what are my other options? Is any part of the SDK configurable?

Hello,

I am bumping this thread again, because this seems like a critical issue when an application sits outside of the couchbase server/network.

I am providing more information here. I now doubt it is the fault of Azure, alone. These timeouts only seem to happen when doing key/value operations.

[Mon Sep 25 14:25:25.240623 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,DEBG] (pcbc/cluster L:47) Initialize Cluster. C=0x7f3582e6a480 connstr="couchbase://SERVER ONE,SERVER TWO,SERVER THREE?detailed_errcodes=1&operation_timeout=10&http_poolsize=0&config_cache=/tmp/cb_config_cache_companydata"
[Mon Sep 25 14:25:25.241100 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,DEBG] (pcbc/pool L:159) Rewrite connection string from "couchbase://SERVER ONE,SERVER TWO,SERVER THREE?detailed_errcodes=1&operation_timeout=10&http_poolsize=0&config_cache=/tmp/cb_config_cache_companydata" to "couchbase://SERVER ONE,SERVER TWO,SERVER THREE/companydata?detailed_errcodes=1&operation_timeout=10&http_poolsize=0&config_cache=/tmp/cb_config_cache_companydata"
[Mon Sep 25 14:25:25.241258 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,DEBG] (pcbc/pool L:334) cachehit: type=0, connstr=couchbase://SERVER ONE,SERVER TWO,SERVER THREE/companydata?detailed_errcodes=1&operation_timeout=10&http_poolsize=0&config_cache=/tmp/cb_config_cache_companydata, bucketname=companydata, auth_hash=a4d95b77f7a3044f13446d1127a88956, lcb=0x561e1629b4e0, refs=1. I=0x561e1629b4e0
[Mon Sep 25 14:25:25.244564 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,TRAC] (pcbc/bucket L:300) N1QL: {"statement":"SELECT id, entity_id FROM companydata WHERE project_id = 1 AND type = 8 AND id >= 100000  AND processed = false ORDER BY entity_id asc, id asc LIMIT 5000","scan_consistency":"request_plus"}. I=0x561e1629b4e0
[Mon Sep 25 14:25:25.244855 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,TRAC] (http-io L:228 I:2) <SERVER THREE:8093> POST  http://SERVER THREE:8093/query/service. Body=227 bytes
[Mon Sep 25 14:25:25.244975 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,DEBG] (lcbio_mgr L:413 I:2) <SERVER THREE:8093> (HE=0x561e1631f9f0) Creating new connection because none are available in the pool
[Mon Sep 25 14:25:25.245552 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,TRAC] (lcbio_mgr L:328 I:2) <SERVER THREE:8093> (HE=0x561e1631f9f0) New pool entry: I=0x561e162aa600
[Mon Sep 25 14:25:25.245581 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,INFO] (connection L:465 I:2) <SERVER THREE:8093> (SOCK=0x561e162c5830) Starting. Timeout=75000000us
[Mon Sep 25 14:25:25.308194 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,DEBG] (connection L:230 I:2) <SERVER THREE:8093> (SOCK=0x561e162c5830) Created new socket with FD=23
[Mon Sep 25 14:25:25.308283 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,TRAC] (connection L:332 I:2) <SERVER THREE:8093> (SOCK=0x561e162c5830) Scheduling I/O watcher for asynchronous connection completion.
[Mon Sep 25 14:25:25.308306 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,EROR] (htconfig L:322 I:2) <SERVER TWO:8091> HTTP Provider timed out waiting for I/O
[Mon Sep 25 14:25:25.308312 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,DEBG] (ioctx L:151 I:2) <SERVER TWO:8091> (CTX=0x561e162c5b30,bc_http) Destroying context. Pending Writes=0, Entered=false, Socket Refcount=1
[Mon Sep 25 14:25:25.308364 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,INFO] (connection L:465 I:2) <SERVER THREE:8091> (SOCK=0x561e16304030) Starting. Timeout=2000000us
[Mon Sep 25 14:25:25.316701 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,DEBG] (connection L:230 I:2) <SERVER THREE:8091> (SOCK=0x561e16304030) Created new socket with FD=22
[Mon Sep 25 14:25:25.316798 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,TRAC] (connection L:332 I:2) <SERVER THREE:8091> (SOCK=0x561e16304030) Scheduling I/O watcher for asynchronous connection completion.
[Mon Sep 25 14:25:25.316869 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,TRAC] (server L:576 I:2) <SERVER TWO:11210> (SRV=0x561e162d45a0,IX=1) Scheduling next timeout for 10000 ms. This is not an error
[Mon Sep 25 14:25:25.316887 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,TRAC] (server L:576 I:2) <SERVER ONE:11210> (SRV=0x561e1631f780,IX=0) Scheduling next timeout for 10000 ms. This is not an error
[Mon Sep 25 14:25:25.316892 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,TRAC] (server L:576 I:2) <SERVER THREE:11210> (SRV=0x561e162ca4f0,IX=2) Scheduling next timeout for 10000 ms. This is not an error
[Mon Sep 25 14:25:25.348825 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,INFO] (connection L:139 I:2) <SERVER THREE:8093> (SOCK=0x561e162c5830) Connected established
[Mon Sep 25 14:25:25.348859 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,DEBG] (connection L:98 I:2) <SERVER THREE:8093> (SOCK=0x561e162c5830) Successfully set TCP_NODELAY
[Mon Sep 25 14:25:25.348866 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,DEBG] (connection L:98 I:2) <SERVER THREE:8093> (SOCK=0x561e162c5830) Successfully set TCP_KEEPALIVE
[Mon Sep 25 14:25:25.348871 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,DEBG] (lcbio_mgr L:287 I:2) <SERVER THREE:8093> (HE=0x561e1631f9f0) Received result for I=0x561e162aa600,C=(nil); E=0x0
[Mon Sep 25 14:25:25.348876 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,DEBG] (lcbio_mgr L:247 I:2) <SERVER THREE:8093> (HE=0x561e1631f9f0) Assigning R=0x561e162c8fd0 SOCKET=0x561e162c5830
[Mon Sep 25 14:25:25.348883 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,DEBG] (ioctx L:101 I:2) <SERVER THREE:8093> (CTX=0x561e162c9060,unknown) Pairing with SOCK=0x561e162c5830
[Mon Sep 25 14:25:25.366606 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,INFO] (connection L:139 I:2) <SERVER THREE:8091> (SOCK=0x561e16304030) Connected established
[Mon Sep 25 14:25:25.366637 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,DEBG] (connection L:98 I:2) <SERVER THREE:8091> (SOCK=0x561e16304030) Successfully set TCP_NODELAY
[Mon Sep 25 14:25:25.366644 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,DEBG] (connection L:98 I:2) <SERVER THREE:8091> (SOCK=0x561e16304030) Successfully set TCP_KEEPALIVE
[Mon Sep 25 14:25:25.366649 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,DEBG] (htconfig L:298 I:2) Successfuly connected to REST API SERVER THREE:8091
[Mon Sep 25 14:25:25.366662 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,DEBG] (ioctx L:101 I:2) <SERVER THREE:8091> (CTX=0x561e162c8330,unknown) Pairing with SOCK=0x561e16304030
[Mon Sep 25 14:25:25.431296 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,TRAC] (htconfig L:209 I:2) <SERVER THREE:8091> Received 200 bytes on HTTP stream
[Mon Sep 25 14:25:25.433089 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,TRAC] (htconfig L:209 I:2) <SERVER THREE:8091> Received 1448 bytes on HTTP stream
[Mon Sep 25 14:25:25.433390 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,TRAC] (htconfig L:209 I:2) <SERVER THREE:8091> Received 1448 bytes on HTTP stream
[Mon Sep 25 14:25:25.433893 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,TRAC] (htconfig L:209 I:2) <SERVER THREE:8091> Received 1448 bytes on HTTP stream
[Mon Sep 25 14:25:25.434241 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,TRAC] (htconfig L:209 I:2) <SERVER THREE:8091> Received 1448 bytes on HTTP stream
[Mon Sep 25 14:25:25.434555 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,TRAC] (htconfig L:209 I:2) <SERVER THREE:8091> Received 1448 bytes on HTTP stream
[Mon Sep 25 14:25:25.435807 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,TRAC] (htconfig L:209 I:2) <SERVER THREE:8091> Received 2919 bytes on HTTP stream
[Mon Sep 25 14:25:25.436393 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,DEBG] (htconfig L:224 I:2) <SERVER THREE:8091> Generation 5 -> 6
[Mon Sep 25 14:25:25.436438 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,DEBG] (confmon L:150 I:2) Not applying configuration received via HTTP. No changes detected. A.rev=4946, B.rev=4948
[Mon Sep 25 14:25:25.495191 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,DEBG] (ioctx L:151 I:2) <SERVER THREE:8093> (CTX=0x561e162c9060,mgmt/capi) Destroying context. Pending Writes=0, Entered=true, Socket Refcount=1
[Mon Sep 25 14:25:25.566012 2017] [php7:notice] [pid 6786] [client 127.0.0.1:38204] [cb,DEBG] (pcbc/cluster L:47) Initialize Cluster. C=0x7f3582e73380 connstr="couchbase://SERVER ONE,SERVER TWO,SERVER THREE?detailed_errcodes=1&operation_timeout=10&http_poolsize=0&config_cache=/tmp/cb_config_cache_companydata"
[Mon Sep 25 14:25:25.566072 2017] [php7:notice] [pid 6786] [client 127.0.0.1:38204] [cb,DEBG] (pcbc/pool L:159) Rewrite connection string from "couchbase://SERVER ONE,SERVER TWO,SERVER THREE?detailed_errcodes=1&operation_timeout=10&http_poolsize=0&config_cache=/tmp/cb_config_cache_companydata" to "couchbase://SERVER ONE,SERVER TWO,SERVER THREE/companydata?detailed_errcodes=1&operation_timeout=10&http_poolsize=0&config_cache=/tmp/cb_config_cache_companydata"
[Mon Sep 25 14:25:25.566093 2017] [php7:notice] [pid 6786] [client 127.0.0.1:38204] [cb,DEBG] (pcbc/pool L:334) cachehit: type=0, connstr=couchbase://SERVER ONE,SERVER TWO,SERVER THREE/companydata?detailed_errcodes=1&operation_timeout=10&http_poolsize=0&config_cache=/tmp/cb_config_cache_companydata, bucketname=companydata, auth_hash=a4d95b77f7a3044f13446d1127a88956, lcb=0x561e162dd820, refs=1. I=0x561e162dd820
[Mon Sep 25 14:25:25.566132 2017] [php7:notice] [pid 6786] [client 127.0.0.1:38204] [cb,TRAC] (server L:576 I:3) <SERVER ONE:11210> (SRV=0x561e162bb1f0,IX=0) Scheduling next timeout for 10000 ms. This is not an error
[Mon Sep 25 14:25:25.566150 2017] [php7:notice] [pid 6786] [client 127.0.0.1:38204] [cb,TRAC] (server L:576 I:3) <SERVER THREE:11210> (SRV=0x561e162b74a0,IX=2) Scheduling next timeout for 10000 ms. This is not an error
[Mon Sep 25 14:25:25.566164 2017] [php7:notice] [pid 6786] [client 127.0.0.1:38204] [cb,TRAC] (server L:576 I:3) <SERVER TWO:11210> (SRV=0x561e162da050,IX=1) Scheduling next timeout for 9999 ms. This is not an error


[Mon Sep 25 14:25:35.575834 2017] [php7:notice] [pid 6786] [client 127.0.0.1:38204] [cb,TRAC] (server L:576 I:3) <SERVER TWO:11210> (SRV=0x561e162da050,IX=1) Scheduling next timeout for 10000 ms. This is not an error
[Mon Sep 25 14:25:35.575874 2017] [php7:notice] [pid 6786] [client 127.0.0.1:38204] [cb,TRAC] (server L:576 I:3) <SERVER ONE:11210> (SRV=0x561e162bb1f0,IX=0) Scheduling next timeout for 10000 ms. This is not an error
[Mon Sep 25 14:25:35.575880 2017] [php7:notice] [pid 6786] [client 127.0.0.1:38204] [cb,TRAC] (server L:576 I:3) <SERVER THREE:11210> (SRV=0x561e162b74a0,IX=2) Scheduling next timeout for 48 ms. This is not an error
[Mon Sep 25 14:25:35.624403 2017] [php7:notice] [pid 6786] [client 127.0.0.1:38204] [cb,WARN] (server L:497 I:3) <SERVER THREE:11210> (SRV=0x561e162b74a0,IX=2) Failing command (pkt=0x561e162aa550, opaque=29, opcode=0x0) with error LCB_ETIMEDOUT (0x17)
[Mon Sep 25 14:25:35.624444 2017] [php7:notice] [pid 6786] [client 127.0.0.1:38204] [cb,TRAC] (confmon L:272 I:3) Refreshing current cluster map
[Mon Sep 25 14:25:35.624450 2017] [php7:notice] [pid 6786] [client 127.0.0.1:38204] [cb,EROR] (server L:572 I:3) <SERVER THREE:11210> (SRV=0x561e162b74a0,IX=2) Server timed out. Some commands have failed
[Mon Sep 25 14:25:35.624454 2017] [php7:notice] [pid 6786] [client 127.0.0.1:38204] [cb,TRAC] (server L:576 I:3) <SERVER THREE:11210> (SRV=0x561e162b74a0,IX=2) Scheduling next timeout for 10000 ms. This is not an error
[Mon Sep 25 14:25:35.624643 2017] [php7:notice] [pid 6786] [client 127.0.0.1:38204] [cb,DEBG] (confmon L:150 I:3) Not applying configuration received via CCCP. No changes detected. A.rev=4946, B.rev=4948
[Mon Sep 25 14:25:35.624667 2017] [php7:notice] [pid 6786] [client 127.0.0.1:38204] [cb,DEBG] (confmon L:150 I:3) Not applying configuration received via HTTP. No changes detected. A.rev=4946, B.rev=4948
[Mon Sep 25 14:25:35.624672 2017] [php7:notice] [pid 6786] [client 127.0.0.1:38204] [cb,TRAC] (confmon L:259 I:3) Attempting to retrieve cluster map via FILE
[Mon Sep 25 14:25:35.624800 2017] [php7:notice] [pid 6786] [client 127.0.0.1:38204] [cb,WARN] (bc_file L:85 I:3) (cache=/tmp/cb_config_cache_companydata) Modification time too old
[Mon Sep 25 14:25:35.624830 2017] [php7:notice] [pid 6786] [client 127.0.0.1:38204] [cb,INFO] (confmon L:182 I:3) Provider 'FILE' failed
[Mon Sep 25 14:25:35.624837 2017] [php7:notice] [pid 6786] [client 127.0.0.1:38204] [cb,DEBG] (confmon L:223 I:3) Will try next provider in 100000us
[Mon Sep 25 14:25:35.656919 2017] [php7:notice] [pid 6786] [client 127.0.0.1:38204] [cb,DEBG] (pcbc/pool L:211) cachedel: type=0, connstr=couchbase://SERVER ONE,SERVER TWO,SERVER THREE/companydata?detailed_errcodes=1&operation_timeout=10&http_poolsize=0&config_cache=/tmp/cb_config_cache_companydata, bucketname=companydata, auth_hash=a4d95b77f7a3044f13446d1127a88956, lcb=0x561e162dd820, refs=0. I=0x561e162dd820
[Mon Sep 25 14:25:35.657739 2017] [php7:notice] [pid 6939] [client 127.0.0.1:38196] [cb,DEBG] (pcbc/pool L:211) cachedel: type=0, connstr=couchbase://SERVER ONE,SERVER TWO,SERVER THREE/companydata?detailed_errcodes=1&operation_timeout=10&http_poolsize=0&config_cache=/tmp/cb_config_cache_companydata, bucketname=companydata, auth_hash=a4d95b77f7a3044f13446d1127a88956, lcb=0x561e1629b4e0, refs=0. I=0x561e1629b4e0