Sometimes getting "Couchbase::Error::Network (failed to execute HTTP request, Network failure ..."

#1

I am running my app on the following environment.

  • Couchbase server => Community v.2.2
  • Ruby SDK => v.1.3

Sometimes I am getting the following error when I am fetching the data from couchbase.

Couchbase::Error::Network (failed to execute HTTP request, Network failure (key=" ... %error=0x10))

This sometimes happens, it is not always happening.
Most of times, the same action works fine.

What could be the issue in such situation? Maybe, a compatibility issue?

#2

Do you have some script I can run and reproduce the issue after some time?

#3

Hmm… It is quite a difficult to write a reproducing sample script for me. :frowning:

I have tested the same code on Server 2.5 Enterprise and Server 3.0 Community, and those two versions do not spit the network error.
It seems like the error comes from the libcouchbase client, but still does not know what is the exact problem.

#4

Without knowing what kind of operation is being performed it is difficult to have an idea of what might be wrong. I suggest you run with logging (Set LCB_LOGLEVEL=5 in your environment), and with more detailed error codes (set LCB_OPTIONS=detailed_errcodes=1) in your environment as well.

Then, run your script with those variables.

#5

After applying the LCB options, I got the following error messages…

12:09:17 web.1          | 2057555ms [I0] {43} [INFO] (lcbio_mgr - L:391) <10.10.36.126:8092> (HE=0x8811380) Found ready connection in pool. Reusing socket and not creating new connection
12:09:17 web.1          | 2057555ms [I0] {43} [DEBUG] (lcbio_mgr - L:448) <10.10.36.127:8092> (HE=0x849bcb0) Idle connection expired
12:09:17 web.1          | 2057555ms [I0] {43} [DEBUG] (lcbio_mgr - L:448) <10.10.36.125:8092> (HE=0x860b4a0) Idle connection expired
12:09:17 web.1          | 2057555ms [I0] {43} [DEBUG] (lcbio_mgr - L:223) <10.10.36.126:8092> (HE=0x8811380) Assigning R=0xc97c400 SOCKET=0xb8b6300
12:09:17 web.1          | 2057556ms [I0] {43} [DEBUG] (ioctx - L:101) <10.10.36.126:8092> (CTX=0xc7c07a0,unknown) Pairing with SOCK=0xb8b6300
12:09:17 web.1          | 2057556ms [I0] {43} [ERROR] (http-io - L:181) <10.10.36.126:8092>Got error while performing I/O on HTTP stream. Err=0x2d
12:09:17 web.1          | 2057556ms [I0] {43} [TRACE] (confmon - L:284) Start refresh requested
12:09:17 web.1          | 2057557ms [I0] {43} [DEBUG] (ioctx - L:151) <10.10.36.126:8092> (CTX=0xc7c07a0,mgmt/capi) Destroying. PND=0,ENT=0,SORC=1
12:09:17 web.1          | Completed 500 Internal Server Error in 429ms
12:09:17 web.1          |
12:09:17 web.1          | Couchbase::Error::Libcouchbase (failed to execute HTTP request, The remote host closed the connection (key="...............error=0x2d)):
12:09:17 web.1          |   app/controllers/userhabit_console/head_app_views_controller.rb:369:in `to_a'
12:09:17 web.1          |   app/controllers/userhabit_console/head_app_views_controller.rb:369:in `index'

12:08:05 web.1          | 1985631ms [I0] {88} [DEBUG] (lcbio_mgr - L:400) <10.10.36.126:8092> (HE=0x82efed0) Creating new connection because none are available in the pool
12:08:05 web.1          | 1985631ms [I0] {88} [DEBUG] (lcbio_mgr - L:318) <10.10.36.126:8092> (HE=0x82efed0) Starting connection on I=0x9ff7dd0
12:08:05 web.1          | 1985632ms [I0] {88} [INFO] (connection - L:440) <10.10.36.126:8092> (SOCK=0x8000010) Starting. Timeout=75000000us
12:08:05 web.1          | 1985632ms [I0] {88} [TRACE] (connection - L:258) <10.10.36.126:8092> (SOCK=0x8000010) Got event handler for new connection
12:08:05 web.1          | 1985633ms [I0] {88} [TRACE] (connection - L:305) <10.10.36.126:8092> (SOCK=0x8000010) Scheduling asynchronous watch for socket.
12:08:05 web.1          | 1985633ms [I0] {88} [ERROR] (htconfig - L:375) <10.10.36.127:8091> HTTP Provider timed out waiting for I/O
#6

interesting what kind of load are you using?

#7

I’m sorry but what do you mean by “what kind of load”?

#8

From the log, I see that it seems like View query fails with connection closed error. Maybe you are iterating through the large view results and it takes some time to process first results so that when it comes to read next chunk view connection has been closed or something like that.

Do you have a sample script which can reproduce the issue?

#9

Actually, I am iterating through very large view results (sometimes, result row count > 100000).
And I am querying for large view results 2~3 times sequentially.

P.S. I don’t have any sample script which can reproduce the issue.
I am running a web application, so the couchbase querying lines are inside a complicated code block.
So it is a bit difficult to write a sample script for me now.

But I do something like…

hash1 = CouchbaseModel.view_1(filter).to_a
hash1.each do |row|
  key = row.key
  val = row.value
  id = key[key.size - 1]
  @return_hash_1[id] = @return_hash_1[id] ? @return_hash_1[id] + val : val
end

hash2 = CouchbaseModel.view_2(filter).to_a
hash2.each do |row|
  key = row.key
  val = row.value
  id = key[key.size - 1]
  @return_hash_2[id] = @return_hash_2[id] ? @return_hash_2[id] + val : val
end

hash3 = CouchbaseModel.view_3(filter).to_a
hash3.each do |row|
  key = row.key
  val = row.value
  id = key[key.size - 1]
  @return_hash_3[id] = @return_hash_3[id] ? @return_hash_3[id] + val : val
end
#10

You can try disabling view connection pooling via http_poolsize=0 in the same LCB_OPTIONS environment variable (separate the options with a & character).

This may be a case of the server closing an idle connection.

@avsej - Do you have a NULL body for the request structure? I recall seeing something similar a few weeks back. This is fixed in libcouchbase 2.4.3, but would be broken in older versions. See http://review.couchbase.org/42170

#11

After setting http_poolsize=0, the problem has gone.
What exactly does http_poolsize=0 do? I couldn’t find any manual for this option.

P.S. My libcouchbase is 2.4.3


The below problem solved after I reinstalled couchbase gem.

Plus, do you know what this error means?
I was trying to set up the same environment variables on my staging server, and it spit the error.
This happens with http_poolsize=0. (This worked fine on the production server.)

Couchbase::Error::BadEnvironment: failed to create libcouchbase instance, The value for an environment variable recognized by libcouchbase was specified in an incorrect format. Check your environment for entries starting with 'LCB_' or 'LIBCOUCHBASE_' (error=0x24)
/var/lib/gems/2.1.0/gems/couchbase-1.3.10/lib/couchbase.rb:63:in `initialize'
/var/lib/gems/2.1.0/gems/couchbase-1.3.10/lib/couchbase.rb:63:in `new'
/var/lib/gems/2.1.0/gems/couchbase-1.3.10/lib/couchbase.rb:63:in `connect'
/var/lib/gems/2.1.0/gems/couchbase-1.3.10/lib/couchbase.rb:138:in `bucket'
#12

It disables connection pooling. http_poolsize is the same as LCB_CNTL_HTTP_POOLSIZE (which is documented http://docs.couchbase.com/sdk-api/couchbase-c-client-2.4.3/group___l_c_b___c_n_t_l.html#ga23142cc1e082926148dcca86b841d3c2)

The http_poolsize string option is undocumented however; but is the only way to set this variable without writing some C code. Unless ruby exposes lcb_cntl directly.

Of course, another thing that warrants investigation is; the pool should be working fine. How long are you waiiting in between view queries? It might be possible that an idle connectin would time out

#13

Time between view queries?

As you can see the above sample script, I iterate through the query result where the result sometimes contains more than 50000 rows. Of course, this takes less than 1min (60000ms).

If I disable the pool size, doesn’t this mean view querying is slower than that with pool size enabled?

#14

See https://www.couchbase.com/issues/browse/CCBC-546 and the related patch set for more information

1 Like
#15

Hi,
I have the same problem but I don’t know how to and where should I set ‘http_poolsize = 0’? Can you help me on this?
@mnunberg @Dynamicscope

#16

It depends on the Operating System that you are running.
If linux,

root$ LCB_OPTIONS http_poolsize=0

and then, you can check with…

root$ set | grep LCB_OPTIONS

Cheers,

#17

I’m on Ubuntu 14.04, but It won’t work

root@server:~# LCB_OPTIONS http_poolsize=0
LCB_OPTIONS: command not found

#19

@Dynamicscope Should I add LCB_OPTIONS command to path? or run this command from specific path?

#20

@avsej @mnunberg
Finally, upgraded to libcouchbase 2.4.6 and removed http_poolsize option.
It seems like it is working fine when first connected to the server.
But after a few minutes when the first connection seems to be timed out.

The client often (4 out of 5 request button clicks) get the following errors, again.

[I0] {99} [INFO] (lcbio_mgr - L:407) <10.10.36.124:8092> (HE=0x7feed50) Found ready connection in pool. Reusing socket and not creating new connection
[I0] {99} [DEBUG] (lcbio_mgr - L:223) <10.10.36.124:8092> (HE=0x7feed50) Assigning R=0x8e18b60 SOCKET=0x5da6f50
[I0] {99} [DEBUG] (ioctx - L:101) <10.10.36.124:8092> (CTX=0x8e1be60,unknown) Pairing with SOCK=0x5da6f50
[I0] {99} [ERROR] (http-io - L:207) <10.10.36.124:8092>Got error while performing I/O on HTTP stream. Err=0x2d
[I0] {99} [TRACE] (confmon - L:292) Start refresh requested
[I0] {99} [DEBUG] (ioctx - L:151) <10.10.36.124:8092> (CTX=0x8e1be60,mgmt/capi) Destroying. PND=0,ENT=0,SORC=1

Completed 500 Internal Server Error in 58ms

Couchbase::Error::Libcouchbase (failed to execute HTTP request, The remote host closed the connection (key="_design/screen_analysis_screen_info/_view/object_tap_count_hash?connection_timeout=75000&wrapper_class=ScreenAnalysisScreenInfo&group=true&start_key=%5B432%2C%22%24PORTRAIT%24com.uplus.onphone.page.hdtv_full_player%22%2C%221440x2392%22%2C2010%2C1%2C1%5D&end_key=%5B432%2C%22%24PORTRAIT%24com.uperror=0x2d)):

It is still getting the dead connection from the pool.

FYI, I am using…
libcouchbase (2.4.6)
gem couchbase (1.3.11)
gem couchbase-model (0.5.4)

#21

Unfortunately there’s nothing changed in 2.4.6 to help this issue. I believe it might be something related to the server end closing the connection at random points in time. The dead connection detection may still help – but there’s a chance that the server ends up closing the connection only after data had already been sent.

@avsej – does the Ruby client cancel the request in any case? Even though we will only place a request back into the pool if the parser state indicates the response has been received in full, and that we certainly do check the socket has been closed before returning it back from the pool.

Dead connection detection depends on a more recent version of Ruby (I believe 1.3.11?)

In addition, a packet trace indicating why the server has closed the connection would be helpful. I’ll see if I can get some more info on that.