Sync gateway hangs after 5 hours of usage

Hi everyone!

I’m using Couchbase sync gateway to receive data from 75 pouchdb clients. After that I process the data and send the processed data to mobile devices.

Every minute the client send an new document to the sync gateway. The sync gateway receives the data, processes alll the received data once every 5 mins and sends the new/updated docs to the mobile devices. So there are about 4500 docs added every hour. The total size of the database is rapidly expanding, so that’s why old docs get removed to reduce database size every 6 hours.

Currently I have allot of issues regarding the sync gateway being non responsive after several hours of usage. There’s clearly no weird stuff going on in the logs as far as I can see. I’m using NodeJS to create new docs based on the minute data I receive.

When an mobile device updates, it only has to see minute doc data from the last 24 hours.

There are currently 11 production views (~3 per design).

After several hours of usage the sync_gateway just becomes non-responsive. How can I investigate on what the cause of the problem is. Logs don’t say much, except the error i’m throwing regarding deletion of docs. And where to look at?

I’m using Couchbase 4.1.1 and Sync gateway 1.3.1

{
   "interface":"0.0.0.0:4984",
   "adminInterface":"0.0.0.0:4985",
   "log":
      [
       
      ]
   ,
   "logFilePath":"/home/sync_gateway/sg_error.log",
   "pretty":true,
   "maxCouchbaseConnections":10,
   "databases":{  
      "mobile":{  
         "server":"http://localhost:8091",
         "bucket":"smart",
         "username":"username",
         "password":"password",
         "users":{  
            "GUEST":{  
               "disabled":false,
               "admin_channels":[  
                  "*"
               ]
            }
         },
         "sync":`
   function (doc, oldDoc) {
    if (doc._deleted) {
        if(doc.type === 'minute' || doc.type === 'smart'){

             channel("*");
        }else {
        requireRole("admin");
        throw({forbidden : "CANNOT DELETE DOC"});
        return;
      }
    }
    if (doc.type) {

        if (doc.type === 'minute') {

            if (doc.timestamp > (Date.now() / 1000) - 86400) {
                channel("channel-" + doc.uuid);
            } else
                channel("*");
        } else if (doc.type === 'hour') {

            if (doc.timestamp > (Date.now() / 1000) - 86400) {
                channel("channel-" + doc.uuid);
            }else
                channel("*");


        } else if (doc.type === 'day' ||
                   doc.type === 'Ebalance') {

            channel("channel-" + doc.uuid);

        } else if (doc.type === 'Etransfer') {

            channel("channel-" + doc.sender);
            channel("channel-" + doc.receiver);

        } else if (doc.type === 'Erequest') {

            channel("Erequest-" + doc.uuid);

        } else if (doc.type === 'wijkminute'
            || doc.type === 'wijkday') {
            if (doc.timestamp > (Date.now() / 1000) - 86400)
                channel("wijk");
            else
                channel("*");

        } else if (doc.type === 'forecast') {
            channel('forecast');
        } else  if (doc.type === 'ads'){
            if(!doc.trash)
              channel('ads');
        }
        else  {
            channel("*");
        }
    }
};

            `,
         "shadow":{  
            "server":"http://localhost:8091",
            "bucket":"app",
            "username":"username",
            "password":"password"
         }
      },
      "smart":{  
         "server":"http://localhost:8091",
         "bucket":"smart",
         "username":"username",
         "password":"password",
         "maxCouchbaseOverflow":64,

         "users":{  
            "GUEST":{  
               "disabled":false,
               "admin_channels":["*"]
            }
         },
         "sync":`
              function(doc, oldDoc) {

                if (oldDoc && doc._deleted) {
                    requireRole("admin");
                    throw({forbidden : "cannot delete"});
                    return;
                }
                channel(doc.channels);
            }
            `,
         "shadow":{  
            "server":"http://localhost:8091",
            "bucket":"app",
            "username":"app",
            "password":"password"
         }
         
        
      }
   }
   ,"CORS": {
    "Origin": ["http://localhost:8100"],
    "LoginOrigin": ["http://localhost:8100"],
    "Headers": ["Content-Type", "Accept", "Authorization", "Origin", "Referer","X-CSRF-Token"],
    "MaxAge": 17280000
   }
}

Hi,

As a first guess I think you may be running your SG instances out of memory. If they start using virtual memory (swapping) they will slow way down.

From your description, it sounds like you may be updating docs, which leads to doc revisions. Can you check to see how big your docs are getting? Especially if you have documents with conflicts that remain unresolved, they may be getting quite large.

You also say you have 11 production views. The views may be taking a large amount of space depending how how they’re built and what’s included.

Your log configuration that you show is turning off logging. You might want to turn logging on. You can see the possible parameters here.

Hi hod.greeley,

Thanks for your response. I’ve removed a few views and i’ve mentioned that the sync gateway freezes when couchbase starts indexing. When indexing is finished, sync gateway goes further with syncing. When I look at the sync gateway log, I see this:

    19:47:34.794808 2016-12-06T19:47:34.794+01:00 WARNING: Skipped Sequence 6141363 didn't show up in MaxChannelLogMissingWaitTime, and isn't available from the * channel view.  If it's a valid sequence, it won't be replicated until Sync Gateway is restarted. -- db.(*changeCache).CleanSkippedSequenceQueue.func1() at change_cache.go:220
19:47:34.801888 2016-12-06T19:47:34.801+01:00 WARNING: Skipped Sequence 6141365 didn't show up in MaxChannelLogMissingWaitTime, and isn't available from the * channel view.  If it's a valid sequence, it won't be replicated until Sync Gateway is restarted. -- db.(*changeCache).CleanSkippedSequenceQueue.func1() at change_cache.go:220
19:47:34.808905 2016-12-06T19:47:34.808+01:00 WARNING: Skipped Sequence 6141368 didn't show up in MaxChannelLogMissingWaitTime, and isn't available from the * channel view.  If it's a valid sequence, it won't be replicated until Sync Gateway is restarted. -- db.(*changeCache).CleanSkippedSequenceQueue.func1() at change_cache.go:220
19:47:34.817015 2016-12-06T19:47:34.817+01:00 WARNING: Skipped Sequence 6141370 didn't show up in MaxChannelLogMissingWaitTime, and isn't available from the * channel view.  If it's a valid sequence, it won't be replicated until Sync Gateway is restarted. -- db.(*changeCache).CleanSkippedSequenceQueue.func1() at change_cache.go:220
19:47:34.823984 2016-12-06T19:47:34.823+01:00 WARNING: Skipped Sequence 6141373 didn't show up in MaxChannelLogMissingWaitTime, and isn't available from the * channel view.  If it's a valid sequence, it won't be replicated until Sync Gateway is restarted. -- db.(*changeCache).CleanSkippedSequenceQueue.func1() at change_cache.go:220
19:47:34.831350 2016-12-06T19:47:34.831+01:00 WARNING: Skipped Sequence 6141376 didn't show up in MaxChannelLogMissingWaitTime, and isn't available from the * channel view.  If it's a valid sequence, it won't be replicated until Sync Gateway is restarted. -- db.(*changeCache).CleanSkippedSequenceQueue.func1() at change_cache.go:220
19:47:34.838350 2016-12-06T19:47:34.838+01:00 WARNING: Skipped Sequence 6141379 didn't show up in MaxChannelLogMissingWaitTime, and isn't available from the * channel view.  If it's a valid sequence, it won't be replicated until Sync Gateway is restarted. -- db.(*changeCache).CleanSkippedSequenceQueue.func1() at change_cache.go:220
19:47:34.845279 2016-12-06T19:47:34.845+01:00 WARNING: Skipped Sequence 6141381 didn't show up in MaxChannelLogMissingWaitTime, and isn't available from the * channel view.  If it's a valid sequence, it won't be replicated until Sync Gateway is restarted. -- db.(*changeCache).CleanSkippedSequenceQueue.func1() at change_cache.go:220
19:47:34.852858 2016-12-06T19:47:34.852+01:00 WARNING: Skipped Sequence 6141384 didn't show up in MaxChannelLogMissingWaitTime, and isn't available from the * channel view.  If it's a valid sequence, it won't be replicated until Sync Gateway is restarted. -- db.(*changeCache).CleanSkippedSequenceQueue.func1() at change_cache.go:220
19:47:34.859714 2016-12-06T19:47:34.859+01:00 WARNING: Skipped Sequence 6141386 didn't show up in MaxChannelLogMissingWaitTime, and isn't available from the * channel view.  If it's a valid sequence, it won't be replicated until Sync Gateway is restarted. -- db.(*changeCache).CleanSkippedSequenceQueue.func1() at change_cache.go:220

Now i’ve mentioned that there are docs missing, but i’ve double checked and it should be in an channel. Here’s an example of an doc that is not synced:

hour:9bb771f7-1262-4163-b46c-ab7ffa00d6be:1481036400
{
  "E": 0,
  "_sync": {
    "rev": "2-aba4d1764853d11e3604aba10b536d22",
    "sequence": 6239762,
    "recent_sequences": [
      6212718,
      6215357,
      6218697,
      6220541,
      6223324,
      6227096,
      6229486,
      6232715,
      6235943,
      6239762
    ],
    "history": {
      "revs": [
        "2-aba4d1764853d11e3604aba10b536d22",
        "1-7822eb9ca82d854ee272726264c946d7"
      ],
      "parents": [
        1,
        -1
      ],
      "channels": [
        [
          "channel-9bb771f7-1262-4163-b46c-ab7ffa00d6be"
        ],
        [
          "channel-9bb771f7-1262-4163-b46c-ab7ffa00d6be"
        ]
      ]
    },
    "upstream_cas": 1481048585673572400,
    "upstream_rev": "2-aba4d1764853d11e3604aba10b536d22",
    "time_saved": "2016-12-06T19:23:11.310825733+01:00"
  },
  "date": "2016-12-06T15:00:00.000Z",
  "electricity": {
    "delivered": {
      "actual": {
        "reading": 0,
        "unit": "kW"
      },
      "tariff1": {
        "reading": 555.366,
        "unit": "kWh"
      },
      "tariff2": {
        "reading": 736.381,
        "unit": "kWh"
      }
    },
    "received": {
      "actual": {
        "reading": 8.77,
        "unit": "kW"
      },
      "tariff1": {
        "reading": 6947.47,
        "unit": "kWh"
      },
      "tariff2": {
        "reading": 8901.207,
        "unit": "kWh"
      }
    }
  },
  "t": 1481036400,
  "timestamp": 1481036400,
  "ts": 1481036880,
  "type": "hour",
  "uuid": "9bb771f7-1262-4163-b46c-ab7ffa00d6be"
}

And an doc that is synced:

hour:9bb771f7-1262-4163-b46c-ab7ffa00d6be:1481025600
{
  "E": 0,
  "_sync": {
    "rev": "2-c826d88b80c0c85337ec549bb0a568c5",
    "sequence": 6237956,
    "recent_sequences": [
      6229600,
      6232790,
      6235952,
      6237956
    ],
    "history": {
      "revs": [
        "1-e3dd5791aa98fd8a6d46a55a31998bd6",
        "2-c826d88b80c0c85337ec549bb0a568c5"
      ],
      "parents": [
        -1,
        0
      ],
      "channels": [
        [
          "channel-9bb771f7-1262-4163-b46c-ab7ffa00d6be"
        ],
        [
          "channel-9bb771f7-1262-4163-b46c-ab7ffa00d6be"
        ]
      ]
    },
    "channels": {
      "channel-9bb771f7-1262-4163-b46c-ab7ffa00d6be": null
    },
    "upstream_cas": 1481048585674293200,
    "upstream_rev": "2-c826d88b80c0c85337ec549bb0a568c5",
    "time_saved": "2016-12-06T19:23:09.092023373+01:00"
  },
  "date": "2016-12-06T12:00:00.000Z",
  "electricity": {
    "delivered": {
      "actual": {
        "reading": 0,
        "unit": "kW"
      },
      "tariff1": {
        "reading": 555.366,
        "unit": "kWh"
      },
      "tariff2": {
        "reading": 736.381,
        "unit": "kWh"
      }
    },
    "received": {
      "actual": {
        "reading": 5.978,
        "unit": "kW"
      },
      "tariff1": {
        "reading": 6947.47,
        "unit": "kWh"
      },
      "tariff2": {
        "reading": 8876.015,
        "unit": "kWh"
      }
    }
  },
  "t": 1481025600,
  "timestamp": 1481025600,
  "ts": 1481026080,
  "type": "hour",
  "uuid": "9bb771f7-1262-4163-b46c-ab7ffa00d6be"
}

in the sync gateway, the timestamp calculation looks like this: doc.timestamp > (Date.now() / 1000) - 86400 Both of them return true.

The amount of changed docs is very limited.

Currently there are 1.2 million docs in the sync gateway

There is no swap initialized. And when I look at ram and CPU usage, there are no crazy peaks. The server has 6 cores and 16 gb ram.

I think that I found the bottleneck. It’s the large amount of docs sent by pouchdb (120k when sync gateway is flushed times 8 = 960k docs ) from the clients. After replication is finished, the amount becomes 8 docs per minute. When that is limited, the sync gateway works proper.

When I look at the github from Pouchdb, I see conflict errors caused by sync in the past https://github.com/pouchdb/pouchdb/issues/5793 Maybe the Pouchdb sync gateway is not a good idea.

1 Like

Thanks for the update. We’ll have to take a look at this.