We’re currently building a bulk data product using N1QL and have run into some limitations.
We have around 200 GB of data spread across roughly 60 million documents. We need to subdivide the data into ~600 parts for an API. The data set is growing over time and we expect even more additions going forward. We continually ingest data for all 600 feeds (every 15-60 minutes, depending on the feed).
We first tried a few large indexes covering everything, but queries were too slow. We then switched to giving each part its own set of indexes, but that left us with over 3,000 GSIs to manage.
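To make the per-part setup concrete, here's a sketch of what one part's indexes look like (bucket, field, and index names here are hypothetical stand-ins, not our actual schema):

```
-- One index per part, filtered on that part's id; repeated ~600 times,
-- multiplied by the handful of fields each feed queries on.
CREATE INDEX idx_part_0042_updated
  ON `feeds`(`updated_at`)
  WHERE `part_id` = 42;
```

Multiply that pattern across ~600 parts and several query fields per part and you arrive at the 3,000+ GSIs we're managing now.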
We're reaching out to see if there is something we are missing, or if there are other potential approaches to scaling this.