Auto-compaction and Fragmentation


#1

What does fragmentation percentage mean?

From the documentation, I understand that all out-of-date and deleted data is considered fragmented. However, it does not seem to say what the percentage represents. If fragmentation is 70%, does that mean 70% of total data usage? If so, does this figure only consider data on disk?

We are trying to use Couchbase to replace our Memcache. However, with the 30% setting we ran into constant compaction, which caused refused connections. We are currently testing at 90%, but given our caching strategy that threshold is reached fairly quickly. When 90% is hit, compaction only brings fragmentation down to around 70%, so it reaches 90% again much faster.

Is there an option to trigger compaction based on total disk usage, for example when less than 30% of disk space is left?

Are there any other metrics I should pay attention to? Our current setup is a 3-node Couchbase cluster with the following spec per node: 4 vCPU, 8 GB RAM, 30 GB disk. We're only using Key-Value, with no Views or Indexes.


#2

Hi @RobGThai,
For KV data, the fragmentation percentage represents the share of bytes in the data file that are orphaned/fragmented (because writes are append-only). So 30% fragmentation means that 30% of the file's size is wasted, fragmented space.
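As a rough illustration of that definition, the percentage can be computed from two sizes for a single data file: the bytes the file occupies on disk and the bytes still holding live document versions. The function and argument names below are hypothetical; Couchbase reports comparable per-bucket statistics, but check its stats reference for the exact names.

```python
def fragmentation_percent(file_bytes: int, live_bytes: int) -> float:
    """Percentage of an append-only data file that no longer holds live data.

    file_bytes: total size of the data file on disk (hypothetical input)
    live_bytes: bytes belonging to current, non-deleted document versions
    """
    if file_bytes <= 0:
        return 0.0
    if not 0 <= live_bytes <= file_bytes:
        raise ValueError("live_bytes must be between 0 and file_bytes")
    stale_bytes = file_bytes - live_bytes  # overwritten or deleted versions
    return 100.0 * stale_bytes / file_bytes


GB = 1024 ** 3
# A 10 GB file holding 7 GB of live data is 30% fragmented.
print(fragmentation_percent(10 * GB, 7 * GB))
```

Because the ratio is taken against the file on disk, a 70% reading would mean 70% of that file's bytes, not 70% of the data held in memory.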

The issue is that the file gets much larger when you wait until 90% fragmentation. That makes compaction take longer, so by the time compaction finishes, new fragmentation has already accumulated from writes made during the compaction itself.
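To put numbers on "much larger": if the amount of live data stays constant, a file reaches a fragmentation fraction p when its total size is live / (1 - p). A minimal sketch, assuming a hypothetical 5 GB of live data:

```python
def file_size_at_threshold(live_bytes: float, threshold_percent: float) -> float:
    """Size the data file reaches when fragmentation hits threshold_percent,
    assuming the amount of live data does not change in the meantime."""
    if not 0 <= threshold_percent < 100:
        raise ValueError("threshold must be in [0, 100)")
    return live_bytes / (1 - threshold_percent / 100)


live_gb = 5  # hypothetical amount of live data
for threshold in (30, 70, 90):
    size = file_size_at_threshold(live_gb, threshold)
    print(f"{threshold}% -> file of about {size:.1f} GB for {live_gb} GB of live data")
```

Under this simplified model, the file waiting to be compacted is about seven times larger at 90% (50 GB) than at 30% (about 7.1 GB) for the same live data.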

You can also have compaction triggered by size (MB) or by time under Settings. Keep in mind that because compaction requires a file copy, you cannot use more than 10-12 GB of the 30 GB available on your drive. Depending on how fast your files grow and on your I/O performance, you may find that you need to lower the compaction trigger to below 10 GB.
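The same thresholds can also be set outside the web console through the cluster REST API. The sketch below only builds the form body for `POST /controller/setAutoCompaction`; the values are placeholders, and the unit of the `[size]` field (believed to be bytes in the REST API, whereas the console shows MB) is an assumption to confirm in the REST reference for your Couchbase version before use.

```python
from typing import Optional
from urllib.parse import urlencode


def auto_compaction_form(percent: Optional[int] = None,
                         size_bytes: Optional[int] = None) -> str:
    """Build a form body for POST /controller/setAutoCompaction.

    percent: fragmentation percentage trigger
    size_bytes: fragmentation size trigger (unit assumed to be bytes)
    """
    # Included because the endpoint is documented as requiring this field.
    params = {"parallelDBAndViewCompaction": "false"}
    if percent is not None:
        params["databaseFragmentationThreshold[percentage]"] = str(percent)
    if size_bytes is not None:
        params["databaseFragmentationThreshold[size]"] = str(size_bytes)
    return urlencode(params)


# Hypothetical 10 GB size trigger, in line with the advice above.
print(auto_compaction_form(size_bytes=10 * 1024 ** 3))
```

The resulting body can be sent as `application/x-www-form-urlencoded` data with any HTTP client, authenticated as a cluster administrator.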

Could you give me more details on exactly what error you are seeing? I'd expect increased latency here rather than refused connections.
Thanks
-cihan


#3

Thanks for the response.

So is the MB compaction trigger based on the current disk usage of the whole cluster? If I want to trigger compaction when total disk usage reaches 30 GB, can I put 30,000 MB in the setting?

Right now, at 50% capacity, our disk usage grows by 1 GB per minute. We are using cron to run compaction manually through the Compact API every 20 minutes.
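For reference, a cron-driven job like this could call the bucket compaction endpoint `POST /pools/default/buckets/<bucket>/controller/compactBucket`. This is a minimal standard-library sketch, not necessarily how the poster's job is written; the host, port, bucket name, and credentials are placeholders.

```python
import base64
from urllib.parse import quote
from urllib.request import Request, urlopen


def compact_bucket_request(host: str, bucket: str, user: str, password: str) -> Request:
    """Build a POST request asking the cluster to compact one bucket."""
    url = (f"http://{host}:8091/pools/default/buckets/"
           f"{quote(bucket)}/controller/compactBucket")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(url, data=b"", method="POST",
                   headers={"Authorization": f"Basic {token}"})


def compact_bucket(host: str, bucket: str, user: str, password: str) -> int:
    """Send the compaction request and return the HTTP status code."""
    request = compact_bucket_request(host, bucket, user, password)
    with urlopen(request, timeout=30) as response:
        return response.status
```

A script calling `compact_bucket("localhost", "cache", "Administrator", "password")` (placeholder values) could then be scheduled with a crontab entry such as `*/20 * * * *` to run every 20 minutes.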

I believe the earlier issue was that 30% fragmentation is too low for our growth rate, which caused too many back-to-back compactions and led to refused connections. This makes me unsure whether a 30 GB disk-usage trigger might cause back-to-back compactions again.
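Under a simplified model, the gap between percentage-based triggers can be estimated. Assume the live data size L stays constant and overwrites or deletes add stale bytes at a steady rate s. Fragmentation stale / (L + stale) then reaches a fraction p once stale = pL / (1 - p), which takes pL / ((1 - p)s) minutes. The inputs below are hypothetical (the 1 GB per minute above is total disk growth, not necessarily stale bytes), and the model ignores writes made while compaction itself runs.

```python
def minutes_until_trigger(live_gb: float, stale_gb_per_min: float,
                          threshold_percent: float) -> float:
    """Minutes after a compaction until fragmentation reaches the threshold.

    Assumes a freshly compacted file (0% fragmentation), constant live data,
    and a constant rate of stale bytes; all inputs are hypothetical.
    """
    p = threshold_percent / 100
    if not 0 <= p < 1:
        raise ValueError("threshold must be in [0, 100)")
    stale_at_trigger = p * live_gb / (1 - p)
    return stale_at_trigger / stale_gb_per_min


# Hypothetical: 5 GB of live data, 1 GB of stale data written per minute.
for threshold in (30, 90):
    print(threshold, round(minutes_until_trigger(5, 1, threshold), 1))
```

With these made-up inputs, a 30% trigger fires roughly every 2 minutes and a 90% trigger roughly every 45 minutes, which is consistent with the back-to-back compactions seen at 30%.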

When I set compaction to 90% fragmentation, fragmentation had already climbed back to 75% by the time compaction finished.

What’s the best method of auto-compaction in this scenario?