Streaming the results of a database query

natoman · August 27, 2020, 5:26pm

I’m attempting to query a database with a high volume of documents, likely several gigabytes after Couchbase Lite’s compression has been accounted for. Is there any way to stream the results of a Couchbase Lite query one by one from the database or does IQuery.Execute() inherently load all the resulting documents into memory?

natoman · August 27, 2020, 5:30pm

Actually, would IWhere.Limit() allow me to page the database a set of documents at a time?

jens · August 27, 2020, 5:59pm

Queries do preload the results into memory (not the entire documents, just the result columns you define.)

One workaround is to page the query. Offset/limit is easy but generally not the best way. For one thing, it gets expensive as the offset increases, because the query engine usually has to compute and then throw away all the preceding rows. For another, adding/removing documents can throw off the offset, so you’re not paging evenly anymore.

A better solution is to do a range query based on the same criteria you’re sorting on. For example if you’re sorting by modDate, query ... WHERE modDate > $LastDate LIMIT $PageSize. Here you use a parameter LastDate that starts at zero (or an empty string), and each time you run the query you set it to the modDate of the last row.

(If multiple rows might have the same value, which is possible with modDate, you need to change > to >= and weed out initial rows that might be dups of the last ones from the prior query.)

Hope that makes sense. This technique is pretty general to any kind of database, so you should be able to find more detailed treatments of it in books or online references.

natoman · August 27, 2020, 6:13pm

That’s very helpful, thank you! I don’t know why I didn’t think of it once I realized all the results were being loaded into memory, but I could also run a query that selects less data (i.e. just the document id) and load all the query results into memory. Then I could just fetch each document by id. Of course there’s always the possibility of having so many documents that even loading the id’s into memory becomes very expensive. The option you outlined is more work but doesn’t have that deficiency.

jens · August 27, 2020, 8:53pm

Yeah, I almost brought up “just load the ID” but I had to go to a meeting…

That approach is going to be less performant because there are economies of scale in getting a whole bunch of results at once in a query. But if the number of items you’re fetching / displaying at once isn’t too big, it should work well.

I don’t know about .NET, but UIKit’s table view manages the row objects for you, so you only end up populating small numbers of rows at a time. That makes it very convenient to bind a Document to a row, since it’ll be freed when the table view frees the row.

natoman · September 17, 2020, 3:49pm

So we chose to go with the “just load the id’s” approach because it’s less invasive and there won’t really ever be a situation where we have so many documents that the id query will run out of memory. We purge documents that are too old and based on expected usage, we should never have more than a few MBs of documents before that purge happens. Finally, we’ll have a limit clause on the query to avoid an OOM.

However, I’m hammering this thing without the limit clause just to see what happens. I expected the process to reach about 4GB (it’s a 32-bit process) of memory usage and then throw an OOM and crash. Instead I’m getting a Lite Core “bad allocation” exception when the process reaches 2GB. I saw some Github issues related to “bad allocation” errors and it looks like they occur when Lite Core runs out of memory. So this exception isn’t all that out of the ordinary given what I’m doing, but it’s curious that it happens with only 2GB of memory used. Most of that memory (as I can see in Visual Studio) is .NET and other 3rd party libraries. Very little of it is Lite Core. Any thoughts on this?

natoman · September 17, 2020, 4:21pm

The top answer here might indicate why:

jens · September 17, 2020, 5:03pm

Instead I’m getting a Lite Core “bad allocation” exception when the process reaches 2GB

Collections with contiguous storage have a problem where the ‘grow’ operation temporarily requires over 2x the size of the collection. They need to allocate a new buffer of size 2n and copy the data into it before they can free the previous buffer of size n, so the high water mark is 3n. (Often the grow factor is less than 2, like 1.5, but that still results in memory usage greater than 2n.)

Another nasty problem that strikes in highly-loaded 32-bit processes is address space fragmentation. The process may be using only half of its 4GB address space, but the largest contiguous free space could be only tens of MB. (I remember Chrome running into this back in the day.)

natoman · September 17, 2020, 5:14pm

Aaaah. Yes, that’d do it. I assume that the Couchbase query uses contiguous storage just because it’s faster and really large queries should be split up like you suggested earlier. That explains why Couchbase was the one throwing the exception and not .NET. It might be worth determining which dll’s are forcing our project to compile as a 32-bit executable and seeing if we can replace them or isolate them in a separate process.

natoman · September 18, 2020, 1:45pm

Is the sort of “bad allocation” exception thrown by Lite Core recoverable? Does Lite Core reset itself automatically and free its memory? Or do I need to restart my application in the event that I encounter this exception?

jens · September 18, 2020, 6:21pm

Yes, it’s recoverable.