Running out of disk space


#1

Hi,
We’re having a problem where we are almost running out of disk space.
Looking at the number of items we are storing in the bucket, it should not be using anywhere near the disk space that it is using and that the monitoring console reports.
Our data has a maximum lifetime of 60 minutes and the number of items has remained pretty consistent over time, yet the active disk usage has risen steadily.
I’ve taken a screenshot of our monitoring page for the bucket mentioned, the view is for a month.
Screenshot : http://www.use.com/9a1c5e8df6972fa92903
So basically the question is: if my items have a TTL of 60 minutes, the number of items is not increasing over time and the size of the items going in is not changing, why is the disk usage going up?
I don’t know if this can be related to my other post (SAN Storage and VM Usage)?
Any help would be appreciated, please let me know if I can provide any more information.
Thanks,
Marvin


#2

Hi Tug,
Thanks for responding. I have read about the way Couchbase uses append-only storage and actually this should be perfect for us. The items we store are all valid for 60 minutes from the time they are entered, so in theory when the compaction runs all the records at the start of the file would be invalid up to a certain point.
I think once every 60 minutes should be fine for the compaction to run. Is there any way to see how long a compaction process is taking?
I’ve tried playing with the fragmentation % before compaction, but the higher we have it the quicker we get to the point of the disks nearly being full. I tried 60% but have now gone back down to the default 30%. I fear that if I go any lower it will be compacting 100% of the time.
With this append-only model and how we are using it (every record being valid for 60 minutes) I would have expected the disk usage to be stepped: an increase followed by a drop when compaction runs. Assuming a steady and consistent rate of input, after a few hours of running I would expect the disk usage to follow the same pattern, but I just don’t understand how the disk usage can keep increasing over a month.
The only explanation I can think of would be that the compaction is taking longer than 60 minutes; basically, it is not able to expire and rewrite the files fast enough. This is kind of the reason for my other question regarding the use of a SAN: could our SAN be too slow? Maybe if I can see how long the compaction is taking I can determine whether this is the issue. If it’s not the issue then I’m kind of stumped.
Regards,
Marvin


#3

Hello,
The expiration of data stored in Couchbase can be viewed as a two-step process:

  • when the TTL is reached, the metadata is flagged in memory and the element is no longer accessible (but the document is not “yet” removed from disk)
  • then a separate process runs to remove all expired elements from disk

Couchbase does lazy expiration to be sure that it does not impact overall performance.
This is why you still see some space used on disk.
Take a look at this chapter in the documentation, it should help you:
https://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-admin-cbep
Remember that Couchbase uses an append-only approach, so you need to configure the compaction too:
http://blog.couchbase.com/compaction-magic-couchbase-server-20
Regards
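
To make the two steps above concrete, here is a minimal toy sketch in Python. It is not Couchbase code: the class `LazyExpiryStore` and its methods are hypothetical names used only for illustration, and the "disk" is just a dictionary. It assumes expired items are hidden on read but only removed when a separate purge pass runs.

```python
import time


class LazyExpiryStore:
    """Toy model of lazy expiration (hypothetical, not the Couchbase API)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._disk = {}  # key -> (value, expires_at); stands in for the data file

    def set(self, key, value, ttl_seconds):
        self._disk[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._disk.get(key)
        # Step 1: once the TTL has passed the item is no longer accessible,
        # but nothing is removed from "disk" here.
        if entry is None or entry[1] <= self._clock():
            return None
        return entry[0]

    def items_on_disk(self):
        return len(self._disk)

    def purge_expired(self):
        # Step 2: a separate pass removes expired items and frees the space.
        now = self._clock()
        expired = [k for k, (_, exp) in self._disk.items() if exp <= now]
        for k in expired:
            del self._disk[k]
        return len(expired)
```

Between the TTL passing and `purge_expired()` running, `get()` returns nothing while `items_on_disk()` still counts the item, which is the gap described above.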

#4

jaceq wrote:
Hi Marvin,
Have a look at this topic: http://www.couchbase.com/forums/thread/persistence-becoming-issue-can-it-be-disabled

Thanks jaceq!
A few posts near the bottom sound very similar to the problem we are experiencing.
The unsafe purge sounds interesting, might have to give that a go in the morning.
Regarding the bug that dipti mentions, should this be fixed by now?
Thanks again.
Marvin


#5

Hi,
Our servers were alerting with disk usage >90% again so I gave the unsafePurgeBucket command a go.
curl -u Administrator:****** -X POST http://localhost:8091/pools/default/buckets/ResultCache/controller/unsaf
Wow, what a difference: the disk usage has dropped from 80 GB to 17 GB.
You can see my bucket monitor summary below.
http://www.use.com/a3c1829074aaca75075e
The thread that jaceq refers to mentions that this command purges metadata that optimises XDCR. Surely there has to be some downside to that?
Thanks again jaceq! This issue has stopped us pushing more load onto this cluster and has also prevented us from building out a new cluster for moving a much larger MySQL cache to Couchbase.
Regards,
Marvin.


#6

Just to add to this… sorry
Our bucket currently has 170,000 items in it
The Disk Usage is 76.4 Gig
The Data Usage is 64.4 Gig
The items going in have a key length of 32 and an average data length of 50,000 bytes
The bucket is set to use 1 replica.
The numbers just don’t add up to me.
The data with 0% fragmentation should be around 8.5 GB, which is what I expect since I sized the VMs so that all items would be served from RAM: 4 nodes with 3 GB each allocated to this bucket (12 GB).
When I look at the bucket overview it shows I have 12 GB RAM for my bucket, with 8 GB in use and 3.22 GB free.
So the RAM is working exactly as I expect it to.
Of course, since we have 1 replica set of data, this doubles the amount of disk needed in the cluster.
So the total data on disk should be around 17 GB with 0% fragmentation. Even with 50% fragmentation it should be getting nowhere near the amount that is actually in use.
Am I misunderstanding something?
Regards,
Marvin
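
The sizing arithmetic in this post can be checked with a short Python sketch. The function name `expected_disk_bytes` is hypothetical. It assumes that per-item metadata overhead is ignored, that 1 GB is 10^9 bytes, and that fragmentation is the stale fraction of the file, i.e. (file size - live data) / file size.

```python
def expected_disk_bytes(items, key_bytes, value_bytes, replicas, fragmentation=0.0):
    """Estimate on-disk size for a bucket.

    Assumptions (not from the Couchbase docs): no per-item metadata overhead,
    and fragmentation = (file size - live data) / file size.
    """
    if not 0.0 <= fragmentation < 1.0:
        raise ValueError("fragmentation must be in [0, 1)")
    # Each item is stored once per copy: the active copy plus each replica.
    live = items * (key_bytes + value_bytes) * (1 + replicas)
    return live / (1.0 - fragmentation)


if __name__ == "__main__":
    GB = 1e9
    for frag in (0.0, 0.3, 0.5):
        size = expected_disk_bytes(170_000, 32, 50_000, replicas=1, fragmentation=frag)
        print(f"fragmentation {frag:.0%}: {size / GB:.1f} GB")
```

Under these assumptions the figures quoted above (170,000 items, 32-byte keys, 50,000-byte values, 1 replica) give about 17 GB of live data, and about 34 GB at 50% fragmentation, still well below the 76.4 GB reported.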


#7

Hi,
In my case it was similar: data on disk dropped from 60 GB to 10 GB.
So I had 5x more metadata than actual data, and I don’t have XDCR set up at all!
Also, I was wondering: if I had XDCR, could I still use this command or would it break it?
Anyway, I think something is wrong here and one of the developers should look at this.


#8

Hi,
I’m still trying to get my head around where our disk space is going.
Can you validate this assumption…
The trigger for compaction is 30% fragmentation.
Suppose we insert records at the same pace for a day and the records are valid for 60 minutes.
After the first hour all data is of course still valid, but about 18 minutes into the 2nd hour the fragmentation should reach 30%, at which point the auto-compaction would run at its next scheduled run time.
Of course, this assumes the threshold applies to the complete bucket.
Does the 30% fragmentation apply to the full bucket or to each individual vbucket? I think the latter?
Does the auto-compaction only compact the vbuckets that are 30% fragmented?
When running a manual compaction the disk space and overall fragmentation has a bigger drop, does this mean the manual compaction compacts all vbuckets even if they are <30% fragmented?
Our system is heavy on writes and I think 60 minutes is a relatively short TTL, we’ve tried to use a higher fragmentation limit than the default of 30% but that didn’t seem to work. Our servers are not loaded so do you think a smaller level of fragmentation could be used as the threshold, maybe 15/20%?
Last question: I think I read that the auto-compaction runs every 60 minutes. Is this at the same time every hour, or is it 60 minutes from when the last compaction process ended?
Sorry for all the questions.
Regards,
Marvin
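
The timing estimate in this post can be sketched in Python under stated assumptions: a constant write rate, an append-only file that is never compacted, items counted as stale the moment their TTL passes, and fragmentation measured as (file size - live data) / file size. The function names are hypothetical. How the server actually measures fragmentation, and whether per bucket or per vbucket, is exactly what the questions above ask, so treat this only as a model.

```python
def fragmentation(elapsed_minutes, ttl_minutes):
    """Stale fraction of an uncompacted append-only file under a constant write rate."""
    if elapsed_minutes <= ttl_minutes:
        return 0.0  # nothing has expired yet
    # Written so far is proportional to elapsed time; live data to the TTL window.
    return (elapsed_minutes - ttl_minutes) / elapsed_minutes


def minutes_until_threshold(ttl_minutes, threshold):
    """Solve (t - ttl) / t = threshold for t."""
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    return ttl_minutes / (1.0 - threshold)


if __name__ == "__main__":
    t = minutes_until_threshold(60, 0.30)  # roughly 85.7 minutes
    print(f"30% reached after {t:.1f} min ({t - 60:.1f} min into the 2nd hour)")
```

With this definition the 30% threshold is reached roughly 26 minutes into the second hour. The 18-minute figure corresponds to stale data being 30% of the live data rather than 30% of the total file size.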


#9

Hi Marvin,
Have a look at this topic: http://www.couchbase.com/forums/thread/persistence-becoming-issue-can-it