Setting priority for push and pull

I have tested different sync_gateway configurations and found that, for our use case, the winning configuration is to run separate Sync Gateway instances for pull and push, configured so that the push SG process runs at high OS priority and the pull SG at low priority.

We are seeing a 10x performance improvement with this setup: push events are now handled at higher priority than pulls, so they stop adding more “entropy” to the whole system later on, when every client is just pulling. I am testing with about 500 clients.

I am wondering whether SG already has a configuration option that could be used to adjust push and pull priority according to the load the application's use case generates?

Another interesting observation from this test is that sync_gateway uses about 80% of the CPU the whole time while Couchbase Server is doing very little work (around 7% CPU)! Why is sync_gateway such a bottleneck?

Hi there. This is a really interesting post! Am I right in thinking you have 2 separate sync-gateway processes? Are they on 2 different machines? And for replication each couchbase lite client connects to both sync-gateways, one for outbound changes, and one for inbound changes?

Yes, exactly. Even having two separate SGs for pull and push on the same machine helps a lot.

This is how I start them on Windows:

```
start /high sync_gateway.exe sync_gateway_push.config
start /low sync_gateway.exe sync_gateway_pull.config
```

And in the configs, make sure the admin and REST API ports do not collide.
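
For illustration, the push config might look something like this (the ports, server URL, and bucket name here are placeholders, not the actual values from my setup):

```
{
  "interface": ":4984",
  "adminInterface": ":4985",
  "databases": {
    "db": {
      "server": "http://localhost:8091",
      "bucket": "sync_gateway"
    }
  }
}
```

The pull config would be identical apart from the ports (e.g. ":4986" and ":4987"), so both processes can bind their listeners side by side against the same bucket.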

Have you found any problems having multiple sync-gateways connecting to the same bucket?

None so far, at least. And because of the “CouchDB-style replication”, I think there shouldn’t be any problems; it should allow arbitrary topologies.

(I hope I have understood this correctly :slight_smile: My understanding is that this is how you scale sync_gateway horizontally: just add more instances against the same bucket?)

What kind of throughput are you testing with (i.e. what’s the read/write load for each of your 500 clients?)

It’s true that push and pull functionality are basically distinct code paths within Sync Gateway, and when running as a single process they are going to compete for CPU and memory. Based on that, splitting the load across two different machines is going to show improvement. However, I’m surprised that you’re seeing a significant benefit when running multiple sync_gateway instances on a single machine. If you can share some more specifics, I’d like to dig into this a bit. Can you share:

  1. The machine specifications
  2. What version of Sync Gateway you’re running
  3. The read/write load (writes/second, reads/second)
  4. What metric you’re using for the “10x performance improvement”

Thanks!

  1. Clients: OS X. Server side: a single Windows 10 machine with an Intel i7-4770K, 16 GB RAM, and an SSD
  2. 1.2.0
  3. (have to check this later)
  4. I measure the time it takes to run the whole test, including the final wait until all 500 DBs have replicated to an equal state

The test generates 1 doc in each of the 500 DBs, replicates everything to every node, and then every node makes a change. So there will be a ton of conflicts and a lot of replication traffic.

My current understanding is that setting the process priority is the key to why the 2-SG setup is so much faster. Every push that reaches the server causes 499 more pulls, so if all the pushes are done first at the highest priority, the whole process finishes much faster… The main reason for this post was that I started wondering whether this would be a generally useful setup. Is it the case in many use cases that a single push causes the whole distributed system to ripple?

That helps clarify the scenario - a few more follow-up questions, if you don’t mind. I’m trying to sort out whether there’s something else going on in the sequence of updates that’s causing the difference in performance, aside from just the Sync Gateway split.

  1. When you say the pushes are ‘done first as highest priority’ - are you actually giving the push replication any priority from the client side? Or do you just mean that there’s an SG node dedicated solely to push replications?
  2. Are all your clients making the same change to the docs they pull (so that duplicates get ignored), or is the expected result 500 docs with ~498 conflicts per doc?
  1. Only dedicated SG node.
  2. All make a random change. The test document is just {"testValue": 1} and every node increments this value by a random amount.

Did some new tests and these are the results:

Single SG, 50 DBs, 50 docs each: test done in 260.387 seconds
Double SG, 50 DBs, 50 docs each: test done in 192.618 seconds

Single SG, 300 DBs, 1 doc each: test done in 2033.9 seconds
Double SG, 300 DBs, 1 doc each: test done in 591.634 seconds

So in the 300x1 test the double-SG setup is about 3.4x faster (2033.9 s vs. 591.634 s), while in the 50x50 test the improvement was only about 35%. The effect seems to grow much faster than linearly with the number of clients.

Did you have any difficulty starting 2 sync_gateway processes? I’m running on Ubuntu and don’t seem to be able to start a second one.
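
In case it helps, here is a rough Linux equivalent of the Windows commands above. This is only a sketch: it assumes sync_gateway is on the PATH and that the two configs bind non-colliding ports, since a port collision is the usual reason a second instance refuses to start.

```
# Rough Linux equivalent of "start /high" / "start /low".
# Assumes the two configs use different interface/adminInterface ports.
sudo nice -n -5 sync_gateway sync_gateway_push.config &   # higher priority (negative nice needs root)
nice -n 10 sync_gateway sync_gateway_pull.config &        # lower priority
```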