When to use SQL vs Key Value Operations

Hi, I am just starting out with Couchbase and I’m figuring out how to build my system. I am using the python SDK. I am wondering when to use SQL vs Key Value Operations.

It looks to me like all the Key Value Operations require you to know the document keys in advance. I am not seeing a way to search for documents and return a list of keys. I don’t see a way to iterate over all the documents in a collect.

For example, lets say I want to delete all the documents in a collection and I want to use the remove method on the collection class. Can I find all the keys or do I need to use SQL for this? Is there a way to search for documents using a SQL where clause and get a list of document keys?

Thanks,
Chris King
Spire Animation Studios

Key-value operations are much faster than query operations. If you asked me for a number, I would say 20 times faster.

I am not seeing a way to search for documents and return a list of keys. I don’t see a way to iterate over all the documents in a collect.

Here is one way to get document keys - How to quickly list 1M document identifiers?

Here is another - python - how to get all the key from one bucket in couchbase? - Stack Overflow ( using map/reduce ).

Detailed instructions for using map/reduce :
Here’s the easy/quick way to get the ids.

  1. go to the couchbase webconsole
  2. select “Views”
  3. Click on “Add View”
  4. Complete Design Document Name with “allids” (giving design/dev)
  5. Enter View Name as “getids”
  6. Click on Save
  7. Click on Publish for the View.
  8. curl -u Administrator:password ‘http://localhost:8092/my_bucket/_design/allids/_view/getids?reduce=false’ > allids

The default Map without any Reduce gives all the ids in the bucket.

{"id":"999993","key":"999993","value":null}, {"id":"999994","key":"999994","value":null}, {"id":"999995","key":"999995","value":null}, {"id":"999996","key":"999996","value":null}, {"id":"999997","key":"999997","value":null}, {"id":"999998","key":"999998","value":null},

There is an upcoming feature - range-scan - that will also provide document keys.

It really depends on your use case.

If you always search by keys then Key Value is the fastest.

If you want to search with other fields in your document say by name or state or you want to run an aggregate and then group a certain way, or use expressions to evaluate or maybe limit the number of results you get, or say join 2 collections, then N1QL is the way to go.

You could also use a hybrid approach wherein you use expressions in N1QL to find the document id and then use KV to fetch the document.

Also, if you have used RDBMS in the past, and have SQL queries that you want to use and/or knowledge of SQL then again N1QL helps in smooth transitioning

1 Like

I haven’t found anything in the couchbase documentation that gives clues how to do this. I did find a forum post that linked to a stack overflow that told me how to find document ids from a SQL search.

SELECT META().id FROM container WHERE…

I will experiment with this.

The same stack overview post talks about using views to access the keys directly but I haven’t found any documentation that give any clues how to do that.

N1QL META().id gives document key.

https://couchbase.live/ might give u quick start modify one of query and META().id as projection and see.

@cking

I’m not sure this would really be a benefit, as you could do the whole thing in N1QL directly, and N1QL at worst would do essentially these same two steps internally, while at best depending on what you are trying to access it might have a faster path available to it, e.g. if you only need to retrieve certain fields and there is a covering index, then the fields can be served directly from the index without ever having to access the document itself.

In other words, if you already have a document key in hand, then looking it up by key is the fastest way to get that document, but if you have to do a N1QL query first to get the key, it seems likely you may get faster results by rolling the rest of what you want to do into that same N1QL query rather than going back to fetch the document yourself as a second step.

Original question was about deleting documents, but I think the same will hold: at worst N1QL has to get all the keys and then delete them all from KV, but it doesn’t have to go back and forth to the “outside world,” i.e. your application, so it will likely be faster.

’*

m not sure this would really be a benefit, as you could do the whole thing in N1QL directly, and N1QL at worst would do essentially these same two steps internally, while at best depending on what you are trying to access it might have a faster path available to it,

If you use SDKs to get the document from KV after getting meta().id thru N1QL(n1ql query of document keys must be covered), you can get the documents asynchronously and in parallel. It avoids huge data transfer by avoiding to go through two hops (Data node to query node to client).
*

2 Likes

Deleting few documents via N1QL is fine. If all the documents in collection and millions of them fastest approach will be drop and create collection and corresponding indexes.
EE Delete of lot of documents eventing can be option Examples: Using the Eventing Service | Couchbase Docs

Some useful documents:
https://docs.couchbase.com/python-sdk/current/concept-docs/data-services.html#querying

Program would be something on the lines of:

#!/usr/bin/python3
import sys
from couchbase.cluster import Cluster, ClusterOptions
from couchbase.auth import PasswordAuthenticator
pa = PasswordAuthenticator('username', 'password')
cluster = Cluster('couchbase://127.0.0.1', ClusterOptions(pa))
bucket = cluster.bucket('travel-sample')
collection = bucket.default_collection()
query = """
  SELECT meta().id as id
  FROM `travel-sample` h
  WHERE h.type = 'hotel'
    AND h.city = 'Malibu' LIMIT 5
"""
try:
  result = cluster.query(query)
  id1 = []
  for row in result:
    # each row is an instance of the query call
    id1.append(row['id'])
  result = collection.remove_multi(id1)
except:
  print("exception:", sys.exc_info()[0])

using views to access the keys directly but I haven’t found any documentation that give any clues how to do that.

Literally the first reply to your first post.