Couchbase protocol slower than http

Hi,

I’m using the node sdk. Opening a couchbase cluster using the call:

new couchbase.Cluster('couchbase://localhost');

takes much longer than with:

new couchbase.Cluster('http://localhost:8091');

What is the practical difference between the two protocols and why is one taking so much longer than the other? With the second method, everything still seems to work as normal and I can read/write docs, make view queries etc. Is there a good reason why I should be using the former method instead?

Thanks,
Giles

I don’t think that’s expected. The reality with the second one is that we’ll try more methods (there’s a long story as to why), so I’d expect the second one to take longer.

The first method is our primary recommendation as it basically explicitly recommends a scheme that we have the flexibility to change. In the majority of cases, it should be faster and more reliable since it’s using our binary wire protocol to one or more nodes. It also allows us to instruct the client with certain common behaviors.

What would be great is if you can turn up the log level and just do a simple op with both, outputting timings. My guess is it could have something to do with DNS resolution (as we do a DNS SRV request in the former case), but that’d be a defect if so.

@brett19: any other thoughts?

Hi @ingenthr,

Thanks for the reply. I will try turning up the log level today and providing some debug with timings.

When you say “turn up the log level” do you mean as per the instructions under “Changing log levels” on this page? I assume the “cluster” component is most relevant? Looking in the static_config file, loglevel_cluster is already set to debug (the default) so I think it’s already at the correct level.

Assuming the log level is at the correct level, which is the most relevant log file? I have looked at all the logs /opt/couchbase/var/lib/couchbase/logs when running my test but none of them seem to be particularly relevant to this. In fact, a grep for “couchbase:” is empty.

I’ve written a simple test app that just opens the couchbase cluster with one of the two protocols, connects to a bucket, runs a simple ViewQuery (limited to one result, rather than 0, to prove it’s actually connected) and closes the bucket. I have used console.time to time how long it takes between “new couchbase.Cluster” and “bucket.on(‘connect’)”. I can attach the test app if required. The results are as follows:

$ node cbTest.js http
Connecting to couchbase with url http://localhost:8091
cbtest: 2.991ms
Bucket open
Query returned 1 IDs

$ node cbTest.js
Connecting to couchbase with url couchbase://localhost
cbtest: 5841.208ms
Bucket open
Query returned 1 IDs

As you can see, using http it takes around 3ms, whereas using couchbase: takes around 6 seconds. Quite a difference!
This test is repeatable and the results are always similar.

If you can let me know which log file to check I can upload it.

Thanks,
Giles

p.s. As an aside, I have run this test code on three other machines, one Macbook Pro, and two Linux servers, both nodes hosted by Linode. None of these servers shows the same huge difference (I agree with the initial assessment that there’s something strange with the DNS lookup on the first machine, which is also a Linux server) but on all three machines, connecting via http is quicker than connecting via couchbase.

Macbook Pro: http 5.493ms, couchbase 24.734ms
Linux server 1: http: 6.466ms, couchbase 10.857ms
Linux server 2: http: 7.065ms, couchbase 13.638ms

The difference in timing with these is negligible and I’m not going to lose sleep over them, but I wanted to post them as it goes against @ingenthr’s initial assessment that the couchbase method should be quicker in the majority of cases.

Just tried to upload the test app but I’m not able to upload js files. Therefore it’s just cut-n-pasted below…

var couchbase = require('couchbase');
var ViewQuery = couchbase.ViewQuery;


const settings = {
  DATABASE: "xxx",
  PASSWORD: "xxx",
  DDOC: "xxx",           
  VIEW: "xxx",       
};


// Main processing.
(function() {
  var admincb;
  var bucket;
  var url;

  if (process.argv.length > 2 && process.argv[2] === "http") {
    url = 'http://localhost:8091';
  } else {
    url = 'couchbase://localhost';
  }  
  console.log("Connecting to couchbase with url", url);
  console.time("cbtest"); 
  admincb = new couchbase.Cluster(url);
  bucket = admincb.openBucket(settings.DATABASE, settings.PASSWORD);

  bucket.on('error', function(error) {
    console.log("err opening bucket: ", error);
  });
  bucket.on('connect', function() {
    console.timeEnd("cbtest");
    console.log("Bucket open");
    var query = ViewQuery.from(settings.DDOC, settings.VIEW).limit(1);

    bucket.query(query, function(err, rows) {
      console.log("Query returned %d IDs", rows.length);
      bucket.disconnect();
      if (err) {
        console.log("Query returned error", err);
      }
    });
  });
})();

Hi @giles,

Since it’s the SDK side that’s exhibiting the unexpected behavior, I was actually suggesting the log level on the node.js SDK as covered in the Collecting Information and Logging section of the developer docs for node. Most of that will be libcouchbase logging to stderr.

What you should see is the specific steps during bootstrap.

Thanks for digging further. I don’t think this should hurt anything, but I hate to waste a few ms of your day! :slight_smile:

Hi @ingenthr,

I’m not too worried about a few ms of my day - life’s short but not that short! However, almost 6000ms every time I open a bucket is more than a few…

Thanks for the link. I have rerun my test with LCB_LOGLEVEL=5. The output is as follows:

Test with couchbase://localhost

0ms [I0] {25536} [INFO] (instance - L:351) Version=2.6.3, Changeset=822615ccbeecb9107bc4c02616b41aed6d9938c0
0ms [I0] {25536} [INFO] (instance - L:352) Effective connection string: couchbase://localhost/yopublic?client_string=couchnode%2F2.2.4. Bucket=yopublic
0ms [I0] {25536} [DEBUG] (instance - L:223) Applying initial cntl client_string=couchnode/2.2.4
0ms [I0] {25536} [DEBUG] (instance - L:66) Adding host localhost:8091 to initial HTTP bootstrap list
0ms [I0] {25536} [DEBUG] (instance - L:66) Adding host localhost:11210 to initial CCCP bootstrap list
5838ms [I0] {25536} [DEBUG] (confmon - L:89) Preparing providers (this may be called multiple times)
5838ms [I0] {25536} [DEBUG] (confmon - L:99) Provider FILE is DISABLED
5838ms [I0] {25536} [DEBUG] (confmon - L:96) Provider CCCP is ENABLED
5838ms [I0] {25536} [DEBUG] (confmon - L:96) Provider HTTP is ENABLED
5838ms [I0] {25536} [DEBUG] (confmon - L:99) Provider MCRAW is DISABLED
5838ms [I0] {25536} [TRACE] (confmon - L:292) Start refresh requested
5839ms [I0] {25536} [TRACE] (confmon - L:271) Current provider is CCCP
5839ms [I0] {25536} [INFO] (cccp - L:118) Requesting connection to node localhost:11210 for CCCP configuration

[...continues...]

Test with http://localhost:8091

0ms [I0] {25545} [INFO] (instance - L:351) Version=2.6.3, Changeset=822615ccbeecb9107bc4c02616b41aed6d9938c0
0ms [I0] {25545} [INFO] (instance - L:352) Effective connection string: http://localhost:8091/yopublic?client_string=couchnode%2F2.2.4. Bucket=yopublic
0ms [I0] {25545} [DEBUG] (instance - L:223) Applying initial cntl client_string=couchnode/2.2.4
0ms [I0] {25545} [DEBUG] (instance - L:66) Adding host localhost:8091 to initial HTTP bootstrap list
0ms [I0] {25545} [DEBUG] (instance - L:66) Adding host localhost:11210 to initial CCCP bootstrap list
0ms [I0] {25545} [DEBUG] (confmon - L:89) Preparing providers (this may be called multiple times)
0ms [I0] {25545} [DEBUG] (confmon - L:99) Provider FILE is DISABLED
0ms [I0] {25545} [DEBUG] (confmon - L:96) Provider CCCP is ENABLED
0ms [I0] {25545} [DEBUG] (confmon - L:96) Provider HTTP is ENABLED
0ms [I0] {25545} [DEBUG] (confmon - L:99) Provider MCRAW is DISABLED
0ms [I0] {25545} [TRACE] (confmon - L:292) Start refresh requested
0ms [I0] {25545} [TRACE] (confmon - L:271) Current provider is CCCP
0ms [I0] {25545} [INFO] (cccp - L:118) Requesting connection to node localhost:11210 for CCCP configuration

[... continues...]

I don’t know if you can glean any clues from that. Both outputs look near identical to me, other than the inexplicable delay of 5838ms.

Any ideas?

Thanks,
Giles

Indeed, that’s statistically significant, if you invoke the script a lot. :slight_smile:

Unfortunately, I don’t spot anything looking at that. @mnunberg: can you advise? How would we find this? Should we ltrace to figure out if it’s DNS related?

@giles: if you want to try that, add to your invocation ltrace -c -f -l libresolv in front of the command. That’ll count the number and time of calls into libresolv, which does DNS lookups for the process. If there’s a difference between the two, we’ll have found the culprit.

You could try our new experimental project that @brett19 has been working on to make it easier to diagnose environmental problems. Its working name at the moment is SDK doctor. Maybe try running it there to see if it identifies anything?
You’ll find pre-built binaries on the release page.

Oh, one other thing… this looks like a relatively old lcb and that makes me think it’s an old couchnode. Maybe a more recent one won’t exhibit the same problem?

And finally, thanks for iterating with us on this!

Thanks Matt.

I suspect it is DNS related. I’m not totally sure how the DNS is set up on that machine but it takes a long time to ssh and scp to it as well, which I’m sure isn’t coincidence.

I’ll try the ltrace test later and also try out SDK doctor when I’ve got a bit of time. I’ll also look at updating couchnode. “npm list couchbase” says that I’m using couchbase@2.2.4.

Thanks,
Giles

btw, couchbase://localhost and http://localhost:8091 are equivalent. So I’m rather surprised there’s any kind of difference between them.

@brett19 - does node (in that version) preparse the connection string?

Getting the same behaviour on Node v6, windows 8, couchbase 4.5.1-2845 Community Edition

new couchbase.Cluster(‘couchbase://127.0.0.1’) takes 4 seconds
new couchbase.Cluster(‘http://127.0.0.1’) takes 2 ms

interestingly however

new couchbase.Cluster(‘couchbase://127.0.0.1:8091’) only takes a couple ms

Seems like including port is a work around

Hello,

Just thought I would share that I have tried this test again on CB CE 5.0.1 and still see the same situation. Opening a connection on “couchbase://localhost” takes longer than “http://localhost:8091”. I know that theoretically these are the same, but there must be some difference in implementation.

However, as @jarred.filmer found, using “couchbase://…:8091” takes the same time as http. Therefore I will use this version.

Interesting, @avsej, any thoughts on what’s happening here?

I don’t observe this behaviour, and from code perspective, I don’t see anything that could slow things down.

if it is possible, could @giles provide output of the strace -Ttt and ltrace -Ttt of the script running? That should show more context on what is going on.

Hi @avsej,

I just want to mention that I no longer see the huge (~6000ms) opening times when using the couchbase protocol. I’m not sure why or what’s changed. However, that protocol takes definitely takes longer (around 30ms) compared to the http method (around 4ms). This is repeatable.

I have uploaded a zip here containing the following files:
strace_cb.txt
strace_http.txt
strace_cb8091.txt
ltrace_cb.txt
ltrace_http.txt (fails)
ltrace_cb8091.txt (fails)

The *_cb files open the cluster with “couchbase://localhost”. The *_http files open the cluster with http://localhost:8091. The *_cb8091 files open the cluster with “couchbase://localhost:8091”

The two failure (ltrace with localhost:8091) gave a timeout error when opening the cluster:

err opening bucket: { CouchbaseError: Client-Side timeout exceeded for operation. Inspect network conditions or increase the timeout
message: ‘Client-Side timeout exceeded for operation. Inspect network conditions or increase the timeout’,
code: 23 }

This was repeatable with those protocols. It didn’t happen when running ltrace with “couchbase://localhost”. I’m not sure how to increase the timeout when opening a bucket.

I hope you can glean some useful information from this. Good luck!

Thanks for the details @giles! That’ll surely help in understanding where the time is going. I’ve opened CCBC-896 to track this.