Bulk insert in Couchbase

Hi, I have a scenario where I have some 20 million userIds.

I want to push all these ids into Couchbase, but the ids should be looked up in constant time. Hence I am thinking of creating those 20 million documents with empty data inside, so that adding and removing those userIds can be done in constant time. Is this approach right? I do not want to push these userIds inside a single document, because in that case the lookup would be slower. Is this understanding correct?

If you’re referring to having documents in Couchbase with the key/id as the userId and the content of the document being a JSON doc or an empty doc, then as long as it’s in the in-memory working set it will be accessed on average in O(1). In fact, this is one of the advantages of Couchbase. Given a key, the client knows exactly which node to go to. There are no metadata lookups and there is very little metadata synchronization.
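To make the "client knows exactly which node to go to" point concrete, here is a small Python sketch of the key-to-partition routing idea. It assumes the commonly documented Couchbase scheme (CRC32 of the key, top 16 bits, modulo the default 1024 vBuckets); treat the exact constants as illustrative rather than authoritative:

```python
import zlib

NUM_VBUCKETS = 1024  # Couchbase's default vBucket count (assumed here)

def vbucket_for_key(key: str) -> int:
    """Map a document key to a vBucket id, sketching how SDKs route requests.

    The client hashes the key with CRC32, keeps the top 16 bits, and takes
    that modulo the vBucket count. Each vBucket is owned by exactly one
    node, so key -> node resolution is a pure O(1) computation on the
    client: no server-side metadata lookup is needed.
    """
    crc = zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF
    return (crc >> 16) & (NUM_VBUCKETS - 1)

# The mapping is deterministic: the same userId always routes the same way.
for user_id in ["user::1001", "user::1002", "user::1003"]:
    print(user_id, "-> vBucket", vbucket_for_key(user_id))
```

Because the mapping is a fixed function of the key, looking up any one of the 20 million userIds costs the same no matter how many ids exist.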

The size of the document doesn’t matter so much, as long as you have the resources to keep it in memory.

If you exhaust memory and eject values to disk, or you change the eviction policy to full eviction, then it can become O(log n). Removing or testing for existence is still O(1).

20 million isn’t that many items with modern systems, so it should be fine. Hope that helps!
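To illustrate the trade-off the question raises, here is a plain-Python sketch (no Couchbase involved) contrasting the two layouts: one key per userId versus all ids packed inside a single document. The dict stands in for keyed documents, the list for an ids array inside one document:

```python
# Hypothetical data set standing in for the userIds.
user_ids = [f"user::{i}" for i in range(100_000)]

# Layout A: one (empty) document per userId, keyed by the id.
# Membership is a hash lookup: O(1) on average.
per_key = {uid: {} for uid in user_ids}

# Layout B: every id stored inside one document's array.
# Membership is a linear scan of the array: O(n).
single_doc = {"ids": user_ids}

assert "user::42" in per_key           # constant-time hash lookup
assert "user::42" in single_doc["ids"] # scans the list until found
```

This mirrors the answer above: the key-per-userId approach keeps add, remove, and existence checks at constant cost, whereas packing ids into one document forces a scan (and a rewrite of the whole document on every change).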
