CouchbaseError: Temporary failure received from server. Try again later


#1

I keep getting this error from time to time when writing heavily to the server.
One local node, 4 indexes.

I am saving a few batches of 5000 (light) docs every minute.

What do I need to check / improve to get rid of this error?

Thanks,
Ady.


#2

Any idea what I can check to prevent this?


#3

@ady.shimony can you please give me a bit more detail on the following?

  1. What is the CBServer version?
  2. Which SDK are you using?
  3. A cbcollect from around the time you are seeing this issue.

I have a suspicion that you need to throttle your writes, but if you can provide more details on the above it would be easier to figure out the problem.
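
As a rough illustration of what throttling could look like on the client, here is a minimal sketch that caps the number of writes in flight at once. `writeThrottled`, `writeOne`, and the default `limit` are hypothetical names and values chosen for the example (for instance, `writeOne` could be a promisified wrapper around the SDK's upsert call); this is not taken from the SDK.

```javascript
// Sketch only: write `docs` with at most `limit` operations in flight.
// `writeOne(doc)` is a hypothetical function that returns a Promise.
async function writeThrottled(docs, writeOne, limit = 100) {
  let next = 0; // index of the next doc to hand out

  // Each worker pulls the next doc, writes it, and repeats until none remain.
  // JavaScript runs this code on a single thread, so `next++` is not racy.
  async function worker() {
    while (next < docs.length) {
      const doc = docs[next++];
      await writeOne(doc);
    }
  }

  // Start up to `limit` workers and wait for all of them to finish.
  // If any write rejects, the returned Promise rejects with that error.
  const workerCount = Math.min(limit, docs.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
}
```

Compared with firing a whole batch at once, the in-flight cap keeps the load on the server steadier; the right value for `limit` would have to be found by experiment.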


#4

Thank you raju.

I am using Community Edition 5.1.1 build 5723
Node.js SDK

Metadata overhead warning. Over 66% of RAM allocated to bucket “tx-history” on node “127.0.0.1” is taken up by keys and metadata. (repeated 11 times)
menelaus_web_alerts_srv 000
ns_1@127.0.0.1
7:37:17 PM Sat Sep 15, 2018
Service ‘memcached’ exited with status 137. Restarting. Messages:
2018-09-15T19:35:25.071834Z WARNING (tx-history) Slow runtime for ‘Running a flusher loop: shard 1’ on thread writer_worker_3: 673 ms
2018-09-15T19:35:25.072057Z WARNING (tx-history) Slow runtime for ‘Running a flusher loop: shard 2’ on thread writer_worker_2: 648 ms
2018-09-15T19:35:25.089031Z WARNING (tx-history) Slow runtime for ‘Running a flusher loop: shard 0’ on thread writer_worker_1: 651 ms
2018-09-15T19:35:30.311795Z WARNING (tx-history) Slow runtime for ‘Running a flusher loop: shard 2’ on thread writer_worker_2: 702 ms
2018-09-15T19:35:30.318865Z WARNING (tx-history) Slow runtime for ‘Running a flusher loop: shard 1’ on thread writer_worker_3: 772 ms
2018-09-15T19:35:30.371961Z WARNING (tx-history) Slow runtime for ‘Running a flusher loop: shard 0’ on thread writer_worker_1: 778 ms
2018-09-15T19:35:30.411040Z WARNING (tx-history) Slow runtime for ‘Running a flusher loop: shard 3’ on thread writer_worker_0: 630 ms
2018-09-15T19:36:50.177296Z WARNING (tx-history) Slow runtime for ‘Adjusting hash table sizes.’ on thread nonIO_worker_1: 60 ms
ns_log 000
ns_1@127.0.0.1
7:36:52 PM Sat Sep 15, 2018
Control connection to memcached on ‘ns_1@127.0.0.1’ disconnected: {badmatch,
{error,
closed}}
ns_memcached 000
ns_1@127.0.0.1
7:36:52 PM Sat Sep 15, 2018
Metadata overhead warning. Over 66% of RAM allocated to bucket “tx-history” on node “127.0.0.1” is taken up by keys and metadata.

Server: [screenshot not preserved]

Bucket: [screenshot not preserved]


#5

I am saving 5000 docs in parallel, waiting for the success responses, and repeating that in a loop around 20-50 times (100K ~ 250K docs total).

Everything works most of the time, and then I get this error. Bucket memory is 16 GB, which should be enough.

BTW, saving only 1000 in parallel makes it worse; I get this error much more often.
It looks like Couchbase is having a hard time saving so many docs, as if the write queue is running out of memory?


#6

At a glance, your cluster is under-provisioned: a resident ratio of 0.927% shows that heavy eviction has occurred. When you receive the errors on write, it is most likely because the bucket has exceeded its high water mark, and writes will fail until enough memory can be freed.

https://docs.couchbase.com/server/5.5/understanding-couchbase/buckets-memory-and-storage/memory.html#ejection
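
Since the failure should clear once memory has been freed, one option on the client side is to wait and retry. A minimal sketch with exponential backoff follows. `withBackoff`, `op`, and the option names are hypothetical, and the default `isTemporary` check simply matches the error text quoted in this thread, which is an assumption rather than an SDK API.

```javascript
// Sketch only: retry `op` while it fails with a temporary failure.
// `op` is a hypothetical function that returns a Promise.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withBackoff(op, {
  retries = 5,        // maximum number of retries after the first attempt
  baseDelayMs = 50,   // delay before the first retry
  maxDelayMs = 2000,  // upper bound on any single delay
  // Assumption: detect the error by its message text, as seen in this thread.
  isTemporary = (err) => /temporary failure/i.test(String(err && err.message)),
} = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Give up on other errors, or once the retry budget is spent.
      if (!isTemporary(err) || attempt >= retries) throw err;
      // Double the delay on each attempt, capped at maxDelayMs.
      const delay = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      await sleep(delay);
    }
  }
}
```

Retrying does not remove the underlying memory pressure; it only gives the server time to get back under the high water mark.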

Consider increasing the memory available to the bucket, either by adding more nodes or by giving more RAM to the existing bucket. Also consider full-eviction mode if you really need to store more in the bucket than the available RAM.
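
To get a feel for why the metadata overhead warning appears, here is a rough back-of-the-envelope sketch of the RAM that keys and metadata alone would need when they must stay resident (value-only eviction). The 56 bytes of metadata per document is the figure I recall from Couchbase's sizing guidance, so treat it as an assumption; the function names and the input numbers are hypothetical and not taken from your cluster.

```javascript
// Sketch only: estimate RAM used by keys + metadata when both must stay in memory.
// Assumption: ~56 bytes of metadata per document (check the sizing docs for your version).
const METADATA_BYTES_PER_DOC = 56;

function estimateKeyMetadataBytes(docCount, avgKeyBytes, metadataBytes = METADATA_BYTES_PER_DOC) {
  return docCount * (avgKeyBytes + metadataBytes);
}

// Fraction of the bucket quota that keys + metadata would occupy.
function metadataShareOfQuota(docCount, avgKeyBytes, quotaBytes) {
  return estimateKeyMetadataBytes(docCount, avgKeyBytes) / quotaBytes;
}

// Hypothetical inputs: 200 million docs with 40-byte keys in a 16 GiB bucket.
const GiB = 1024 ** 3;
const share = metadataShareOfQuota(200e6, 40, 16 * GiB);
console.log(`keys + metadata: ~${(share * 100).toFixed(0)}% of the bucket quota`);
```

If the share comes out near or above the quota, no amount of value ejection can free enough memory, which is the case where full eviction or more RAM is needed.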


#7

Thanks, that’s what I did. I moved to full-eviction and it’s gone, for now.


#8

@ady.shimony Is your issue resolved now?


#9

I think so. I added memory to the machine (56 GB) and moved to full-eviction.
But now I have indexing issues… I ran out of disk space, so I stopped the process, but that is for another topic I guess.

I will update soon when I return to run the process.

Thanks.


#10

@ady.shimony yes, please post your issue with indexing; we will do our best to answer your question.


#11

OK. Clean DB.
New machine. 2 TB SSD for indexes.
Full ejection.
30 GB for the bucket.
4 GB for indexing.

Writing batches of 5000 docs, around 10-50 batches.

Getting this error again.
Is 30 GB not enough?

Why does it work most of the time, but fail from time to time?

I would prefer to solve it rather than retry in the code; this is an unacceptable “feature” of Couchbase…