Issue retrieving a huge amount of data


#1

I use the Java client's query method to retrieve an entire view in Couchbase.
The data set has over 10 million items, and the query always fails with this error:
java.lang.OutOfMemoryError: GC overhead limit exceeded.

I believe the heap is overflowing because my computer's memory is limited. Can anyone
give me some suggestions?


#2

hi @cuishucheng,
Your best bet is to use pagination. The idea is to introduce a limit in your query (say 20 elements) and then issue another query to get the next page.
Each next-page query uses the last key of the previous page as its starting point.
See the Pagination section in the view query documentation.
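
To make the idea concrete, here is a minimal sketch of key-based pagination that runs without a Couchbase cluster. A `TreeMap` stands in for the sorted view index, and `queryPage` is a hypothetical stand-in for a single view query with a start key, skip, and limit (it is not an SDK call). It assumes keys are unique.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class KeyPaginationSketch {

    // Hypothetical stand-in for one view query: returns up to `limit` keys
    // that are >= startKey, after skipping the first `skip` matches.
    // A null startKey means "start at the beginning of the index".
    static List<Long> queryPage(NavigableMap<Long, String> index,
                                Long startKey, int skip, int limit) {
        NavigableMap<Long, String> tail =
                (startKey == null) ? index : index.tailMap(startKey, true);
        List<Long> page = new ArrayList<>();
        int skipped = 0;
        for (Long key : tail.keySet()) {
            if (skipped < skip) {
                skipped++;
                continue;
            }
            if (page.size() == limit) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    // Walks the whole index one page at a time. Keys are collected here only
    // so the result can be inspected; real code would process each page and
    // then let it be garbage collected.
    static List<Long> readAll(NavigableMap<Long, String> index, int pageSize) {
        List<Long> seen = new ArrayList<>();
        Long lastKey = null;
        while (true) {
            // After the first page, restart at the last key and skip it,
            // because that row was already returned.
            int skip = (lastKey == null) ? 0 : 1;
            List<Long> page = queryPage(index, lastKey, skip, pageSize);
            if (page.isEmpty()) {
                break; // an empty page means the previous one was the last
            }
            seen.addAll(page);
            lastKey = page.get(page.size() - 1);
        }
        return seen;
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> index = new TreeMap<>();
        for (long k = 1; k <= 7; k++) {
            index.put(k, "doc-" + k);
        }
        System.out.println(readAll(index, 3)); // prints [1, 2, 3, 4, 5, 6, 7]
    }
}
```

Only one page of rows is held per query, so memory use is bounded by the page size rather than by the size of the view.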


#3

Hi, is there a pretty way to do this in Java client 2.x?
The code below seems to work but feels ugly.

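    // Fetch one page at a time instead of the whole view.
    ViewQuery query = ViewQuery.from("x", "x")
            .stale(Stale.FALSE)
            .reduce(false)
            .limit(PAGINATION_SIZE)
            .startKey(from);

    ViewResult result = bucket.query(query);
    Iterator<ViewRow> rows = result.rows();
    // An empty page means the previous page was the last one.
    while (rows.hasNext()) {
        Long lastSeenKey = null;
        while (rows.hasNext()) {
            final ViewRow row = rows.next();
            lastSeenKey = (Long) row.key();
            row.document();
            // use doc :)
        }
        // Restart at the last key seen and skip that row, since it was already processed.
        // This assumes keys are unique; rows sharing a key could be returned more than once.
        query = query
                .startKey(lastSeenKey)
                .skip(1);
        rows = bucket.query(query).rows();
    }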

#4

hi @Forsman, there’s no utility support that “wraps” this kind of code yet in the 2.x SDK, but that’s surely something we’d like to add back at some point.

It was available and wrapped in the 1.4.x client as a Paginator, but the underlying approach is the same. If you’re motivated, you could even contribute a port of it to the new SDK :smile:
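
For anyone who wants to hide the paging loop in the meantime, here is a rough sketch of what such a wrapper could look like. It is not the 1.4.x `Paginator` API: `PagedIterator` and `PageFetcher` are hypothetical names, and the sketch is SDK-independent so it runs on its own. The fetcher is assumed to return the rows that come strictly after `lastKey` (for a view, that would be the `startKey(lastKey).skip(1)` query from the post above), with `null` meaning the first page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

public class PagedIterator<R, K> implements Iterator<R> {

    /** Hypothetical callback: fetch up to `limit` rows after `lastKey` (null = first page). */
    public interface PageFetcher<R, K> {
        List<R> fetch(K lastKey, int limit);
    }

    private final PageFetcher<R, K> fetcher;
    private final Function<R, K> keyOf;
    private final int pageSize;
    private Iterator<R> current = Collections.emptyIterator();
    private K lastKey = null;
    private boolean exhausted = false;

    public PagedIterator(PageFetcher<R, K> fetcher, Function<R, K> keyOf, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.fetcher = fetcher;
        this.keyOf = keyOf;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        // Load the next page lazily, only when the current one is used up.
        if (!current.hasNext() && !exhausted) {
            List<R> page = fetcher.fetch(lastKey, pageSize);
            // A short (or empty) page means there is nothing left to fetch.
            exhausted = page.size() < pageSize;
            current = page.iterator();
        }
        return current.hasNext();
    }

    @Override
    public R next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        R row = current.next();
        lastKey = keyOf.apply(row); // remember where the next page should resume
        return row;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a sorted view; assumes unique keys.
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        PageFetcher<Integer, Integer> fetcher = (last, limit) -> {
            List<Integer> page = new ArrayList<>();
            for (Integer v : data) {
                if ((last == null || v > last) && page.size() < limit) {
                    page.add(v);
                }
            }
            return page;
        };
        Iterator<Integer> it = new PagedIterator<>(fetcher, v -> v, 3);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

Calling code then just iterates, and at most one page of rows is held in memory at a time.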