I’m curious why the data service uses its own managed cache as opposed to using the page cache (assuming a Linux OS)?
Hi @mr_x ,
First, I am not intimately familiar with Couchbase internals. I do agree somewhat with your premise: why not just let the page cache fetch and manage the data? I might even lean towards memory mapping everything to avoid double copies. However, there are real, substantial issues that crop up with all databases.
Page caches are typically easy to use and advantageous, but there are considerations. I will list a few in a generic fashion:
- They suffer from double-copy overhead, since data is copied from kernel space to user space.
- They can suffer from fragmentation and from the non-sequential access patterns typical of databases.
- The page size is typically fixed at 4K, 8K, or 64K (ZFS uses a 128K default record size), which creates an impedance mismatch with the data sizes.
- Consider 1 M “hot” 256-byte records randomly spread across 10TB of data. If your Linux page size is 4K and each hot record lands in its own page, relying on the page cache uses 16X the space the records themselves need. This gets worse as the page size increases.
- If Linux needs more memory for other applications than is currently available, it will automatically evict pages it considers no longer in use. This could force unnecessary removal of “hot” data or some critical system control data.
- When relying on a page cache you will see latency spikes when the OS performs file or memory management that is outside of the realm of the database.
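To make the 16X figure concrete, here is a quick back-of-the-envelope calculation in Python. It assumes the worst case described above, where every hot record sits alone in its own page; the function name and the list of page sizes are just for illustration.

```python
def page_cache_overhead(record_bytes, page_bytes, hot_records):
    """Worst case: each hot record occupies its own page in the page cache."""
    useful = record_bytes * hot_records      # bytes the application actually wants
    resident = page_bytes * hot_records      # bytes the page cache must keep in RAM
    return useful, resident, resident / useful


# 1 M hot records of 256 bytes each, for a few common page sizes.
for page_bytes in (4096, 8192, 65536):
    useful, resident, factor = page_cache_overhead(256, page_bytes, 1_000_000)
    print(f"page={page_bytes:>6} B  useful={useful / 1e6:.0f} MB  "
          f"resident={resident / 1e6:.0f} MB  overhead={factor:.0f}x")
```

With 4K pages, 256 MB of hot data ends up pinning roughly 4 GB of page cache. A cache that stores individual records only pays for the records plus its own bookkeeping.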
I just touched on a few issues above. For further insight I would recommend reading the following. Note that I simply did a Google search on:
"linux page cache" issues databases
- Do not underestimate performance impacts of swapping on NUMA database systems | FromDual
- Optimizing Linux Memory Management for Low-latency / High-throughput Databases | LinkedIn Engineering
The other thing I’d mention is the higher up you are in the application stack, the more intelligent you can be about memory management. Couchbase Server knows the access patterns and can therefore manage memory much more intelligently than the page cache could. Just one example: when we read something from disk for replication purposes via XDCR, we know that we don’t need to keep it in memory any longer than it takes to send it across the network and get acknowledgment. That’s one of many things the Data Service (a.k.a. KV engine) does when managing memory carefully.
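As a rough illustration of that idea (not Couchbase's actual implementation, which I have not seen), here is a toy record cache in Python where the caller can say "read this, but don't bother keeping it". The class name `ObjectCache`, the `keep` flag, and the LRU policy are all assumptions made up for this sketch.

```python
from collections import OrderedDict


class ObjectCache:
    """Toy application-level LRU cache; the caller decides what is worth keeping."""

    def __init__(self, capacity, read_from_disk):
        self.capacity = capacity              # maximum number of cached records
        self.read_from_disk = read_from_disk  # hypothetical callable: key -> value
        self.items = OrderedDict()            # least recently used entry comes first

    def get(self, key, keep=True):
        if key in self.items:
            self.items.move_to_end(key)       # mark as most recently used
            return self.items[key]
        value = self.read_from_disk(key)
        if keep:                              # normal client read: cache the record
            self.items[key] = value
            if len(self.items) > self.capacity:
                self.items.popitem(last=False)  # evict the least recently used
        return value                          # keep=False: hand back without caching


disk = {"a": 1, "b": 2, "c": 3}
cache = ObjectCache(capacity=2, read_from_disk=disk.__getitem__)
cache.get("a")
cache.get("b")
cache.get("c", keep=False)   # e.g. a one-off read made only to ship data elsewhere
print(list(cache.items))     # the hot keys "a" and "b" are still resident
```

The OS page cache has no equivalent of that `keep=False` hint coming from the database's own logic, so a one-off read like this could push hot data out.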
Hope that helps!
Thanks Jon. I will check out those links.