Why can different environments show different throughput?


#1

Dear all,
Hi, I am testing the throughput of synchronous operations on a single thread. I installed couchbase-server in two environments and ran the same synchronous, single-threaded Java code in each, but the throughput differs: about 20k ops/s in one and about 8k ops/s in the other. I don’t know what causes the difference. Could it be the CPU frequency? The two environments are:
20k ops/s environment: RedHat 6.4 (64-bit), 4 CPUs at 2.5 GHz, 4 GB RAM
8k ops/s environment: RedHat 6.4 (64-bit), 4 CPUs at 1.8 GHz, 4 GB RAM

The code is shown below:

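import java.net.URI;
import java.util.Arrays;
import java.util.List;
import com.couchbase.client.CouchbaseClient;

// Cluster nodes to bootstrap from
List<URI> hosts = Arrays.asList(
    new URI("http://localhost:8091/pools")
);
// Name of the bucket to connect to
String bucket = "default";
// Password of the bucket (empty string if none)
String password = "";
// Connect to the cluster
CouchbaseClient client = new CouchbaseClient(hosts, bucket, password);
// Issue synchronous gets back to back on a single thread
while (true) {
    String value = (String) client.get("a0");
    if (value == null) {
        System.out.println("error");
    }
}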


#2

Hi,

What you are testing here is effectively network latency. At those specs, both client and server are mostly idle if you just loop on a single thread and wait for synchronous results. Your limiting factor here is definitely the network, or something else in between that can hold up packets going back and forth.

I understand you want to test this, but I assume it does not reflect your actual production load. You definitely want to run this test multi-threaded to get better batching effects on the I/O side and to push the system a little closer to its limits.
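A minimal sketch of what a multi-threaded version of the test could look like, assuming Java 8 or later. The class name `MultiThreadedLoad` and the `run` helper are made up for illustration, and the operation is a placeholder: in the real test it would be the `client.get("a0")` call from the original code, issued against one shared client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MultiThreadedLoad {
    // Runs `op` opsPerThread times on each of `threads` worker threads and
    // returns the total number of completed operations.
    static long run(int threads, int opsPerThread, Runnable op) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                Callable<Long> worker = () -> {
                    long done = 0;
                    for (int i = 0; i < opsPerThread; i++) {
                        op.run();
                        done++;
                    }
                    return done;
                };
                results.add(pool.submit(worker));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // blocks until that worker has finished
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder operation (assumption): replace with client.get("a0").
        Runnable op = () -> Math.sqrt(42.0);
        long start = System.nanoTime();
        long total = run(4, 100_000, op);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d ops in %.3f s (%.0f ops/s)%n", total, seconds, total / seconds);
    }
}
```

Varying the thread count (1, 2, 4, 8, ...) and comparing the ops/s figures on both machines should show whether the gap persists once more than one request is in flight.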


#3

Thank you for the information. I know this test does not match the characteristics of the production workload, but I have a requirement that makes it necessary. Because I run the Java code on the server itself, I should be able to rule out the network as the cause. But I can’t find any other reason. Are there other factors or configuration settings that could affect the result?


#4

Well, let’s do the math quickly. If you can do 8k ops/s on one thread, then each back-to-back request takes 1/8000 s = 0.000125 s, so 125 µs.
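The same arithmetic for both reported throughputs, as a small sketch (the class and method names are made up for illustration). It assumes a strictly sequential client with exactly one request in flight at a time, which is what the loop in the original code does.

```java
public class PerOpLatency {
    // Average time per operation, in microseconds, for a client that issues
    // requests strictly one after another at the given throughput.
    static double microsPerOp(double opsPerSecond) {
        return 1_000_000.0 / opsPerSecond;
    }

    public static void main(String[] args) {
        System.out.println("8k ops/s:  " + microsPerOp(8_000) + " us per op");   // 125.0
        System.out.println("20k ops/s: " + microsPerOp(20_000) + " us per op");  // 50.0
    }
}
```

So the question is where the extra 75 µs per request goes on the slower machine.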

Depending on which operating system, network stack, application server, ... you use (especially if virtualized), that might be pretty good. What is different between those machines? Some operating systems still send their requests through the regular TCP/IP stack, some short-circuit it.

For example, when I ping localhost on Mac OS X I still get

64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.046 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.078 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.080 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.086 ms
64 bytes from 127.0.0.1: icmp_seq=4 ttl=64 time=0.075 ms
64 bytes from 127.0.0.1: icmp_seq=5 ttl=64 time=0.069 ms
64 bytes from 127.0.0.1: icmp_seq=6 ttl=64 time=0.055 ms

which is latency you need to account for even on localhost.

You can identify what differs between the two servers and rule those differences out one by one, as well as do more app-level profiling with tools like VisualVM or YourKit.


#5

When I ping localhost in the two environments:

20k ops/s environment:
64 bytes from localhost (127.0.0.1): icmp_seq=58 ttl=64 time=0.019 ms
64 bytes from localhost (127.0.0.1): icmp_seq=59 ttl=64 time=0.031 ms
64 bytes from localhost (127.0.0.1): icmp_seq=60 ttl=64 time=0.031 ms
64 bytes from localhost (127.0.0.1): icmp_seq=61 ttl=64 time=0.024 ms
64 bytes from localhost (127.0.0.1): icmp_seq=62 ttl=64 time=0.014 ms
64 bytes from localhost (127.0.0.1): icmp_seq=63 ttl=64 time=0.027 ms
64 bytes from localhost (127.0.0.1): icmp_seq=64 ttl=64 time=0.040 ms
64 bytes from localhost (127.0.0.1): icmp_seq=65 ttl=64 time=0.030 ms
64 bytes from localhost (127.0.0.1): icmp_seq=66 ttl=64 time=0.021 ms
64 bytes from localhost (127.0.0.1): icmp_seq=67 ttl=64 time=0.017 ms
64 bytes from localhost (127.0.0.1): icmp_seq=68 ttl=64 time=0.022 ms
--- localhost ping statistics ---
68 packets transmitted, 68 received, 0% packet loss, time 67851ms
rtt min/avg/max/mdev = 0.013/0.022/0.040/0.008 ms

8k ops/s environment:
64 bytes from localhost (127.0.0.1): icmp_seq=14 ttl=64 time=0.036 ms
64 bytes from localhost (127.0.0.1): icmp_seq=15 ttl=64 time=0.037 ms
64 bytes from localhost (127.0.0.1): icmp_seq=16 ttl=64 time=0.036 ms
64 bytes from localhost (127.0.0.1): icmp_seq=17 ttl=64 time=0.033 ms
64 bytes from localhost (127.0.0.1): icmp_seq=18 ttl=64 time=0.042 ms
64 bytes from localhost (127.0.0.1): icmp_seq=19 ttl=64 time=0.037 ms
64 bytes from localhost (127.0.0.1): icmp_seq=20 ttl=64 time=0.035 ms
64 bytes from localhost (127.0.0.1): icmp_seq=21 ttl=64 time=0.042 ms
64 bytes from localhost (127.0.0.1): icmp_seq=22 ttl=64 time=0.039 ms
64 bytes from localhost (127.0.0.1): icmp_seq=23 ttl=64 time=0.042 ms
^C
--- localhost ping statistics ---
23 packets transmitted, 23 received, 0% packet loss, time 22724ms
rtt min/avg/max/mdev = 0.030/0.036/0.054/0.005 ms


The average round-trip times in the two environments are close:
20k ops/s: avg 0.022 ms
8k ops/s: avg 0.036 ms

So I don’t think this is the reason.
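A rough check of that conclusion, as a sketch (the class and method names are made up for illustration). It assumes each get costs about one loopback round trip and that the ICMP ping time is a fair stand-in for it, which is only an approximation since the client talks TCP.

```java
public class PingShare {
    // Fraction of the per-operation latency gap between two machines that
    // the difference in average ping round-trip time could account for.
    static double shareOfGap(double slowOpsPerSec, double fastOpsPerSec,
                             double slowPingMicros, double fastPingMicros) {
        double slowPerOp = 1_000_000.0 / slowOpsPerSec; // microseconds per op
        double fastPerOp = 1_000_000.0 / fastOpsPerSec;
        return (slowPingMicros - fastPingMicros) / (slowPerOp - fastPerOp);
    }

    public static void main(String[] args) {
        // 8k vs 20k ops/s, average ping 36 us vs 22 us (figures from above)
        double share = shareOfGap(8_000, 20_000, 36.0, 22.0);
        System.out.printf("ping explains about %.0f%% of the gap%n", share * 100);
    }
}
```

The ping difference is 14 µs, while the per-request difference is 75 µs (125 µs versus 50 µs), so under these assumptions loopback latency accounts for less than a fifth of the gap.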


#6

Alright, the next step would be to do application-level profiling and see what’s different inside your JVMs. Use YourKit or VisualVM for that.
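Alongside a profiler, a rough first look could be to time individual calls and compare the latency distribution on both machines. The following is only a sketch, assuming Java 8 or later: `LatencySampler` and its methods are hypothetical names, and the operation is a placeholder for the real `client.get("a0")` call.

```java
import java.util.Arrays;

public class LatencySampler {
    // Times `samples` sequential calls of `op` and returns the per-call
    // latencies in nanoseconds, sorted in ascending order.
    static long[] sample(int samples, Runnable op) {
        long[] nanos = new long[samples];
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            op.run();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        return nanos;
    }

    // Nearest-rank percentile (p in (0, 100]) of an ascending-sorted array.
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        // Placeholder operation (assumption): replace with client.get("a0").
        Runnable op = () -> Math.sqrt(42.0);
        sample(100_000, op); // warm-up run so the JIT has compiled the loop
        long[] latencies = sample(100_000, op);
        System.out.println("p50 = " + percentile(latencies, 50) + " ns");
        System.out.println("p99 = " + percentile(latencies, 99) + " ns");
        System.out.println("max = " + latencies[latencies.length - 1] + " ns");
    }
}
```

If the median is similar on both machines but the tail is much worse on the slower one, that points toward pauses (GC, scheduling) rather than a uniformly slower request path.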