Querying large documents is extremely slow from the SDK, but fast from the web UI or cbq

I’m storing lists of commonly used passwords (brute-force lists) and want to quickly check whether a password is contained in one of these lists when a user registers.
The passwords are stored in text files, one password per line. For each text file, a Couchbase document is created that contains all of its passwords in an array. The documents are inserted with the following UPSERT query:

UPSERT INTO `blacklist`
VALUES ("password-blacklist::" || $name, {
   "type" : "password-blacklist",
   "name": $name,
   "passwords" :$passwords
})

$name is the name of the imported file (without the extension), and $passwords is the array of all passwords from that text file. I’ve uploaded the lists for you; they are also loaded in the enterprise container that I’ve already uploaded, so you could export them from there if you’ve got it running.
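
For context, the registration-time lookup is just a membership test over the passwords array. A minimal sketch of what that query looks like, assuming a named parameter $password for the candidate password (my actual statement may differ slightly):

/* returns the names of any blacklists containing $password ($password is an assumed named parameter) */
SELECT RAW b.name
FROM `blacklist` b
WHERE b.type = "password-blacklist"
AND ANY p IN b.passwords SATISFIES p = $password END
LIMIT 1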

So here are some more details about the files:

  • The arrays of most documents contain exactly 1,500,000 passwords. The others contain between 20,000 and ~800,000 elements
  • Password lengths vary. Since these are commonly used passwords, they aren’t too long: almost all should be under 20 characters. A few passwords are up to 255 characters long; these exceptions are mostly bad data (e.g. fragments of an HTML page) that somehow got into the dataset (they can be spotted with a length filter, as sketched after this list).
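
For what it’s worth, a sketch of the kind of query that would surface those oversized entries (the 100-character threshold is an arbitrary choice of mine):

/* per blacklist, list entries longer than 100 characters (arbitrary threshold) */
SELECT b.name, ARRAY p FOR p IN b.passwords WHEN LENGTH(p) > 100 END AS suspicious
FROM `blacklist` b
WHERE b.type = "password-blacklist"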

Thanks for sharing the information (and samples). I see that many files have close to 1.5 million items in the array! 1.5M is indeed a large number of items per array. Let me try this on my setups (6.0 and 6.5 Community Edition) and get back to you.