Couldn't bootstrap from cluster


#1

I connect to the default bucket, and I am sure the connection string is correct, but it fails. The error message is "Client-Side timeout exceeded for operation. Inspect network conditions or increase the timeout". Connecting to other buckets works! It is very strange.

create_options.version = 3;
create_options.v.v3.connstr = "couchbase://10.80.16.191/default";

Any help will be appreciated!


#2

v3 API is technically a preview. Does it work fine with v2?


#4

v2 has the same problem. I create 200 threads and connect in each thread. Could this cause the problem?


#5

@ingenthr the v3 isn’t part of “api3”, simply version 3 of the connection structure. “API3” mainly refers to the API to be stabilized in a future version 3 of the library :smile:

@lcb_create 200 threads may end up loading and slowing down the network quite a bit. You can adjust the operation timeout (see http://docs.couchbase.com/sdk-api/couchbase-c-client-2.4.4/group__lcb-cntl-settings.html#gac834a2c0c1e34d9b9547fd1dbc5d81bf), and also perhaps enable logging (search for LCB_LOGLEVEL in the same link) to see what exactly is timing out.


#6

Additionally, since you have asked quite a few questions here, it might be helpful to post larger examples of your code, so that we can give better advice on things :smile:


#8

Haha, the reason I didn't post my code is that I have no confidence in it. :smile:

I'm an entry-level programmer in China, what we call a "code farmer".


#9
static void* client_operator(void* threadId){
        /* connect couchbase */
        lcb_error_t err;
        lcb_t instance;
        struct lcb_create_st create_options;
        lcb_get_cmd_t gcmd;
        const lcb_get_cmd_t *gcmdlist[1];

        /* zero the structures so unused fields do not contain garbage */
        memset(&create_options, 0, sizeof(create_options));
        memset(&gcmd, 0, sizeof(gcmd));

        create_options.version = 3;
        create_options.v.v3.connstr = "couchbase://10.80.16.191/default";
        err = lcb_create(&instance, &create_options);
        if (err != LCB_SUCCESS) {
                die(NULL, "Couldn't create couchbase handle", err);
        }
        err = lcb_connect(instance);
        if (err != LCB_SUCCESS) {
                die(instance, "Couldn't schedule connection", err);
        }
        lcb_wait(instance);

        /* adjust operation timeout (value is in microseconds);
         * set here, after the connect has been waited on, it only
         * affects the operations scheduled below */
        lcb_U32 timeout = 3500000;
        lcb_cntl(instance, LCB_CNTL_SET, LCB_CNTL_OP_TIMEOUT, &timeout);
        lcb_cntl(instance, LCB_CNTL_GET, LCB_CNTL_OP_TIMEOUT, &timeout);

        err = lcb_get_bootstrap_status(instance);
        if (err != LCB_SUCCESS) {
                die(instance, "Couldn't bootstrap from cluster", err);
                return NULL;
        }
        /* Assign the handlers to be called for the operation types */
        lcb_set_get_callback(instance, get_callback);

        /* build a random lowercase key; one extra byte for the
         * terminating NUL so that strlen() is safe */
        char p[KEY_STR_NUM + 1];
        for(int i = 0; i < KEY_STR_NUM; i++){
                p[i]  = 'a' + rand()%26;
        }
        p[KEY_STR_NUM] = '\0';

        gcmd.v.v0.key = p;
        gcmd.v.v0.nkey = strlen(p);
        gcmdlist[0] = &gcmd;
        err = lcb_get(instance, NULL, 1, gcmdlist);
        if (err != LCB_SUCCESS) {
                die(instance, "Couldn't schedule retrieval operation", err);
        }
        /* Likewise, the get_callback is invoked from here */
        fprintf(stderr, "Will wait to retrieve item..\n");
        lcb_wait(instance);

        /* Now that we're all done, close down the connection handle */
        lcb_destroy(instance);
        return NULL;
}

#10

There are 500 threads in the code I posted.

There are still many errors after I adjusted the operation timeout: Couldn't bootstrap from cluster. Received code 0x17 (Client-Side timeout exceeded for operation. Inspect network conditions or increase the timeout)

And there is another error reported by Linux:
[warn] epoll_create: Too many open files
[err] evsig_init: socketpair: Too many open files

Doesn't the C SDK close its file descriptors?

= =


#11

The library does a fine job at closing open file descriptors, and thank you for posting your code, or a reproducible example thereof.

On many systems, the default per-process FD limit will be something around 1024. Use ulimit -n in your shell to verify:

mnunberg@csure:~$ ulimit -n
1024
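
The same limit can also be read from inside the process, which is handy for logging it at startup. A minimal sketch using the POSIX getrlimit call; the helper name fd_soft_limit is made up for illustration and is not part of libcouchbase:

```c
#include <limits.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: returns the soft limit on open file descriptors
 * for the current process, LLONG_MAX if the limit is unlimited, or -1
 * if the limit cannot be read. */
static long long fd_soft_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        return LLONG_MAX;
    }
    return (long long)rl.rlim_cur;
}
```

Calling fd_soft_limit() before spawning any threads and printing the result should show the same number that ulimit -n reports in the shell that launched the program.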

You are creating 500 threads, each with its own connection handle. Assuming you have 4 nodes in your cluster, this means about 2000 sockets (500 × 4), plus any additional descriptors that your application may have open.
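
As a rough back-of-the-envelope check, the count can be written down as a small function. This is only a sketch under the simplifying assumption of one socket per node per thread; the function names are invented here, and the real number is likely somewhat higher because each handle may also hold descriptors for its event loop:

```c
/* Rough estimate of the descriptors needed: one socket per node for
 * every thread (each thread owns one handle), plus whatever else the
 * application keeps open. An assumption-laden lower bound, not an
 * exact figure. */
static long estimate_fds(long threads, long nodes, long other_fds)
{
    return threads * nodes + other_fds;
}

/* Returns 1 if the estimate fits under the given descriptor limit,
 * 0 otherwise. */
static int fits_under_limit(long threads, long nodes, long other_fds,
                            long limit)
{
    return estimate_fds(threads, nodes, other_fds) <= limit;
}
```

With 500 threads and 4 nodes the estimate is 2000 descriptors, which does not fit under a limit of 1024; with 100 threads it is 400, which does.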

To be fair, creating 500 (or even 50) threads in a C application to perform network I/O is not a good idea, and the problems you are seeing right now are only the beginning of the issues you will eventually face in your application (unless of course you have something like 128 cores! And even then you still have file descriptor limits to worry about).

If you really do wish to proceed with this design choice, you will need to:

  1. Increase the default timeouts in the library. There is LCB_CNTL_OP_TIMEOUT, but there is also LCB_CNTL_CONFIGURATION_TIMEOUT, and LCB_CNTL_CONFIG_NODE_TIMEOUT. If you’re using views there’s LCB_CNTL_VIEWS_TIMEOUT, and finally for durability requirements, LCB_CNTL_DURABILITY_TIMEOUT. See the full listing of settings here: http://docs.couchbase.com/sdk-api/couchbase-c-client-2.4.4/group__lcb-cntl-settings.html
  2. Increase your OS level file descriptor limit. Depending on your platform this may or may not work: ulimit -n unlimited && ./your_cb_exe
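
For the second item, the soft limit can also be raised from within the program instead of from the shell. A minimal sketch, assuming a POSIX system; the helper name raise_fd_limit_to_hard is hypothetical. It only lifts the soft limit up to the existing hard limit, since raising the hard limit itself usually needs extra privileges, and some platforms may reject the request:

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft limit on open file descriptors
 * to the current hard limit. Returns 0 on success, -1 on failure. */
static int raise_fd_limit_to_hard(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }

    rl.rlim_cur = rl.rlim_max;  /* soft limit := hard limit */

    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

This would be called once at startup, before any threads or connection handles are created, so that every thread inherits the higher limit.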

#13

I really appreciate your patient help. I have solved my problem by decreasing the number of threads to 100. :smile: