Regular timeouts on nodejs sdk


#1

Hi,
we have an issue where our NodeJS clients are receiving Server timeouts every half a minute. Requests are running well, but then fail for a few seconds.

Our setup:
Server side:
4.0.0-4051 Community Edition (build-4051)
Debian Wheezy
no replicas
XDCR, one server per location

Client side:
Node v4.4.1
couchnode: 2.14.

I did run the application with LCB_LOGLEVEL=5 and the initial output looks like this:

0ms [I0] {21516} [INFO] (instance - L:385) Version=2.5.6, Changeset=fa6ce043d715a3b0fd25712673c4763f782b08ee
0ms [I0] {21516} [INFO] (instance - L:386) Effective connection string: couchbase://10.240.0.9/default. Bucket=default
0ms [I0] {21516} [DEBUG] (instance - L:65) Adding host 10.240.0.9:8091 to initial HTTP bootstrap list
0ms [I0] {21516} [DEBUG] (instance - L:65) Adding host 10.240.0.9:11210 to initial CCCP bootstrap list
2ms [I0] {21516} [DEBUG] (confmon - L:89) Preparing providers (this may be called multiple times)
2ms [I0] {21516} [DEBUG] (confmon - L:99) Provider FILE is DISABLED
2ms [I0] {21516} [DEBUG] (confmon - L:96) Provider CCCP is ENABLED
2ms [I0] {21516} [DEBUG] (confmon - L:96) Provider HTTP is ENABLED
2ms [I0] {21516} [DEBUG] (confmon - L:99) Provider MCRAW is DISABLED
2ms [I0] {21516} [TRACE] (confmon - L:292) Start refresh requested
3ms [I0] {21516} [TRACE] (confmon - L:271) Current provider is CCCP
3ms [I0] {21516} [INFO] (cccp - L:118) Requesting connection to node 10.240.0.9:11210 for CCCP configuration
3ms [I0] {21516} [DEBUG] (lcbio_mgr - L:416) <10.240.0.9:11210> (HE=0x35cac70) Creating new connection because none are available in the pool
3ms [I0] {21516} [DEBUG] (lcbio_mgr - L:321) <10.240.0.9:11210> (HE=0x35cac70) Starting connection on I=0x356f1b0
3ms [I0] {21516} [INFO] (connection - L:450) <10.240.0.9:11210> (SOCK=0x356f330) Starting. Timeout=2000000us
3ms [I0] {21516} [TRACE] (connection - L:344) <10.240.0.9:11210> (SOCK=0x356f330) Received completion handler. Status=0. errno=0
3ms [I0] {21516} [INFO] (connection - L:116) <10.240.0.9:11210> (SOCK=0x356f330) Connected
3ms [I0] {21516} [WARN] (lcbio - L:103) FIXME: Unknown iops/os error code 95. Using NETWORK_ERROR
3ms [I0] {21516} [INFO] (lcbio - L:112) Translating errno=95, lcb=0x10 to NETWORK_ERROR
3ms [I0] {21516} [INFO] (connection - L:121) <10.240.0.9:11210> (SOCK=0x356f330) Couldn’t set TCP_NODELAY
3ms [I0] {21516} [DEBUG] (lcbio_mgr - L:271) <10.240.0.9:11210> (HE=0x35cac70) Received result for I=0x356f1b0,C=0x356f330; E=0x0
3ms [I0] {21516} [DEBUG] (lcbio_mgr - L:223) <10.240.0.9:11210> (HE=0x35cac70) Assigning R=0x34b5ed0 SOCKET=0x356f330
3ms [I0] {21516} [DEBUG] (ioctx - L:101) <10.240.0.9:11210> (CTX=0x35cb8c0,unknown) Pairing with SOCK=0x356f330
4ms [I0] {21516} [DEBUG] (negotiation - L:355) <10.240.0.9:11210> (SASLREQ=0x35630f0) Found feature 0x3 (TCP NODELAY)
4ms [I0] {21516} [DEBUG] (ioctx - L:151) <10.240.0.9:11210> (CTX=0x35cb8c0,sasl) Destroying. PND=0,ENT=1,SORC=1
4ms [I0] {21516} [DEBUG] (ioctx - L:101) <10.240.0.9:11210> (CTX=0x35cbae0,unknown) Pairing with SOCK=0x356f330
4ms [I0] {21516} [DEBUG] (ioctx - L:151) <10.240.0.9:11210> (CTX=0x35cbae0,bc_cccp) Destroying. PND=0,ENT=1,SORC=1
4ms [I0] {21516} [INFO] (lcbio_mgr - L:491) <10.240.0.9:11210> (HE=0x35cac70) Placing socket back into the pool. I=0x356f1b0,C=0x356f330
4ms [I0] {21516} [INFO] (confmon - L:174) Setting new configuration. Received via CCCP
4ms [I0] {21516} [DEBUG] (bootstrap - L:55) Instance configured!
4ms [I0] {21516} [DEBUG] (confmon - L:89) Preparing providers (this may be called multiple times)
4ms [I0] {21516} [DEBUG] (confmon - L:99) Provider FILE is DISABLED
4ms [I0] {21516} [DEBUG] (confmon - L:96) Provider CCCP is ENABLED
4ms [I0] {21516} [DEBUG] (confmon - L:99) Provider HTTP is DISABLED
4ms [I0] {21516} [DEBUG] (confmon - L:99) Provider MCRAW is DISABLED
10004ms [I0] {21516} [DEBUG] (lcbio_mgr - L:464) <10.240.0.9:11210> (HE=0x35cac70) Idle connection expired
ready
10527ms [I0] {21516} [DEBUG] (lcbio_mgr - L:416) <10.240.0.9:11210> (HE=0x35cac70) Creating new connection because none are available in the pool
10527ms [I0] {21516} [DEBUG] (lcbio_mgr - L:321) <10.240.0.9:11210> (HE=0x35cac70) Starting connection on I=0x356f010
10527ms [I0] {21516} [INFO] (connection - L:450) <10.240.0.9:11210> (SOCK=0x356f070) Starting. Timeout=2500000us
10528ms [I0] {21516} [TRACE] (connection - L:344) <10.240.0.9:11210> (SOCK=0x356f070) Received completion handler. Status=0. errno=0
10528ms [I0] {21516} [INFO] (connection - L:116) <10.240.0.9:11210> (SOCK=0x356f070) Connected
10528ms [I0] {21516} [WARN] (lcbio - L:103) FIXME: Unknown iops/os error code 95. Using NETWORK_ERROR
10528ms [I0] {21516} [INFO] (lcbio - L:112) Translating errno=95, lcb=0x10 to NETWORK_ERROR
10528ms [I0] {21516} [INFO] (connection - L:121) <10.240.0.9:11210> (SOCK=0x356f070) Couldn’t set TCP_NODELAY
10528ms [I0] {21516} [DEBUG] (lcbio_mgr - L:271) <10.240.0.9:11210> (HE=0x35cac70) Received result for I=0x356f010,C=0x356f070; E=0x0
10528ms [I0] {21516} [DEBUG] (lcbio_mgr - L:223) <10.240.0.9:11210> (HE=0x35cac70) Assigning R=0x36021a0 SOCKET=0x356f070
10528ms [I0] {21516} [TRACE] (server - L:522) <10.240.0.9:11210> (SRV=0x35cb800) Session not yet negotiated. Negotiating
10528ms [I0] {21516} [DEBUG] (ioctx - L:101) <10.240.0.9:11210> (CTX=0x351cb80,unknown) Pairing with SOCK=0x356f070
10529ms [I0] {21516} [DEBUG] (negotiation - L:355) <10.240.0.9:11210> (SASLREQ=0x356f290) Found feature 0x3 (TCP NODELAY)
10529ms [I0] {21516} [DEBUG] (ioctx - L:151) <10.240.0.9:11210> (CTX=0x351cb80,sasl) Destroying. PND=0,ENT=1,SORC=1
10529ms [I0] {21516} [DEBUG] (ioctx - L:101) <10.240.0.9:11210> (CTX=0x3578880,unknown) Pairing with SOCK=0x356f070
10529ms [I0] {21516} [DEBUG] (server - L:543) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Setting initial timeout=2498ms

Afterwards the application is running fine until these message pop up and requests timeout:

246856ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x3890920, opaque=42726, opcode=0x0) with error 0x17
246857ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x3890970, opaque=42727, opcode=0x0) with error 0x17
246857ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x38909c0, opaque=42728, opcode=0x0) with error 0x17
246857ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x3890a10, opaque=42729, opcode=0x0) with error 0x17
246857ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x3890a60, opaque=42730, opcode=0x0) with error 0x17
246857ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x3890ab0, opaque=42731, opcode=0x0) with error 0x17
246857ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x3890b00, opaque=42732, opcode=0x0) with error 0x17
246857ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x3890b50, opaque=42733, opcode=0x0) with error 0x17
246857ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x3890ba0, opaque=42734, opcode=0x0) with error 0x17
246857ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x3890bf0, opaque=42735, opcode=0x0) with error 0x17
246857ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x3890c40, opaque=42736, opcode=0x0) with error 0x17
246857ms [I0] {21516} [TRACE] (confmon - L:292) Start refresh requested
246857ms [I0] {21516} [ERROR] (server - L:463) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Server timed out. Some commands have failed
246857ms [I0] {21516} [DEBUG] (server - L:467) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Scheduling next timeout for 711 ms
246857ms [I0] {21516} [INFO] (confmon - L:166) Not applying configuration received via CCCP. No changes detected. A.rev=33, B.rev=33
246857ms [I0] {21516} [TRACE] (confmon - L:271) Current provider is CCCP
246857ms [I0] {21516} [INFO] (cccp - L:110) Re-Issuing CCCP Command on server struct 0x35cb800 (10.240.0.9:11210)
246858ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=42726)
246858ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=42727)
246858ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=42728)
246858ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=42729)
246858ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=42730)
246858ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=42731)
246858ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=42732)
246858ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x1, SEQ=42733)
246858ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=42734)
246858ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=42735)
246858ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=42736)
247393ms [I0] {21516} [INFO] (confmon - L:166) Not applying configuration received via CCCP. No changes detected. A.rev=33, B.rev=33

or

375263ms [I0] {21516} [DEBUG] (server - L:467) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Scheduling next timeout for 2500 ms
378064ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x3ef7790, opaque=80545, opcode=0x0) with error 0x17
378064ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x3ef77e0, opaque=80546, opcode=0x0) with error 0x17
378065ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x3ef7830, opaque=80547, opcode=0x0) with error 0x17
378065ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x3ef7880, opaque=80548, opcode=0x0) with error 0x17
378065ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x3ef78d0, opaque=80549, opcode=0x0) with error 0x17
378065ms [I0] {21516} [TRACE] (confmon - L:292) Start refresh requested
378065ms [I0] {21516} [ERROR] (server - L:463) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Server timed out. Some commands have failed
378065ms [I0] {21516} [DEBUG] (server - L:467) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Scheduling next timeout for 4 ms
378066ms [I0] {21516} [INFO] (confmon - L:166) Not applying configuration received via CCCP. No changes detected. A.rev=33, B.rev=33
378066ms [I0] {21516} [TRACE] (confmon - L:271) Current provider is CCCP
378066ms [I0] {21516} [INFO] (cccp - L:110) Re-Issuing CCCP Command on server struct 0x35cb800 (10.240.0.9:11210)
378067ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=80545)
378067ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=80546)
378067ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=80547)
378067ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=80548)
378067ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=80549)
378069ms [I0] {21516} [INFO] (confmon - L:166) Not applying configuration received via CCCP. No changes detected. A.rev=33, B.rev=33

or

519179ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x409eb40, opaque=114050, opcode=0x0) with error 0x17
519179ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x409eb90, opaque=114051, opcode=0x0) with error 0x17
519179ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x409ebe0, opaque=114052, opcode=0x0) with error 0x17
519179ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x409ec30, opaque=114053, opcode=0x0) with error 0x17
519179ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x409ec80, opaque=114054, opcode=0x0) with error 0x17
519179ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x409ecd0, opaque=114055, opcode=0x0) with error 0x17
519179ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x409ed20, opaque=114056, opcode=0x0) with error 0x17
519179ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x409ed70, opaque=114057, opcode=0x0) with error 0x17
519179ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x409edc0, opaque=114058, opcode=0x0) with error 0x17
519179ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x409ee10, opaque=114059, opcode=0x0) with error 0x17
519179ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x409ee60, opaque=114060, opcode=0x0) with error 0x17
519180ms [I0] {21516} [WARN] (server - L:380) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Failing command (pkt=0x409eeb0, opaque=114061, opcode=0x0) with error 0x17
519180ms [I0] {21516} [TRACE] (confmon - L:292) Start refresh requested
519180ms [I0] {21516} [ERROR] (server - L:463) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Server timed out. Some commands have failed
519180ms [I0] {21516} [DEBUG] (server - L:467) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Scheduling next timeout for 2500 ms
519180ms [I0] {21516} [INFO] (confmon - L:166) Not applying configuration received via CCCP. No changes detected. A.rev=33, B.rev=33
519180ms [I0] {21516} [TRACE] (confmon - L:271) Current provider is CCCP
519180ms [I0] {21516} [INFO] (cccp - L:110) Re-Issuing CCCP Command on server struct 0x35cb800 (10.240.0.9:11210)
519180ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=114050)
519180ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=114051)
519180ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=114052)
519180ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=114053)
521314ms [I0] {21516} [ERROR] (cccp - L:133) NOHOST:NOPORT Got I/O Error=0x17
521314ms [I0] {21516} [INFO] (confmon - L:202) Provider ‘CCCP’ failed
521314ms [I0] {21516} [TRACE] (confmon - L:226) Maximum provider reached. Resetting index
521315ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=114054)
521315ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=114055)
521315ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=114056)
521315ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=114057)
521315ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=114058)
521315ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=114059)
521315ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=114060)
521315ms [I0] {21516} [WARN] (server - L:240) <10.240.0.9:11210> (SRV=0x35cb800,IX=0) Found stale packet (OP=0x0, RC=0x0, SEQ=114061)
521316ms [I0] {21516} [INFO] (confmon - L:166) Not applying configuration received via CCCP. No changes detected. A.rev=33, B.rev=33

Hope you could help me on this. Did a lot of research but could not find any solution.

Thanks,
Jens


#2

Looking at the logs, it doesn’t seem the packets are genuinely timing out. It appears that the responses for the commands are received immediately after the timeout. My guess would be that the event loop is being suspended by blocking operations (outside couchnode) for a given amount of time, which would explain the proximity of the timeout and the actual responses.