Primary index is it keys only or also values?

n1ql
index

#1

Is the primary index keys only or also values I see conflicting messages over the internet:

https://blog.couchbase.com/create-right-index-get-right-performance/:slight_smile:

The primary index is simply the index on the document key on the whole bucket

So according to this it’s on keys only and not on values (documents).

The primary index is the simplest index, indexing all of the documents in travel-sample

“all of the documents” ? is it indexing also the values / documents?!?!

So is primary index indexing keys only or also values(documents)?


#2

It only indexes the keys.

(If it also indexed the complete value it would just be a duplicate of the Bucket data in the Data Service).


#3

This makes sense to me but how do you explain the following:

I start a new couchbase cluster. Create new bucket named “mybucket”

I create 2 new documents:

key: transaction::1 value: {"docvalkey1":"docvalkey1-val1","docvalkey2":"docvalkey2-val1"}
key: transaction::2 value: {"docvalkey1":"docvalkey1-val2","docvalkey2":"docvalkey2-val2"} 

In couchbase administration I goto: localhost:8091 then to Query screen.

I type the following query that queries by value:

select docvalkey1 FROMmybucketWHERE docvalkey1 = 'docvalkey1-val1'

I get back as result:

[
  {
    "code": 4000,
    "msg": "No index available on keyspace mybucket that matches your query. Use CREATE INDEX or CREATE PRIMARY INDEX to create an index, or check that your expected index is online.",
    "query_from_user": "select docvalkey1 FROM `mybucket` WHERE docvalkey1 = 'docvalkey1-val1'"
  }
]

I then create a primary index:

CREATE PRIMARY INDEXmybucket-primary-indexONmybucket` USING GSI;

I then rerun the above query on values:

select docvalkey1 FROMmybucketWHERE docvalkey1 = 'docvalkey1-val1'

and low and behold this time I do get the results, the only change is that I create the primary index this time:

[
  {
    "docvalkey1": "docvalkey1-val1"
  }
]

How come only after adding the primary index it agreed to do a query on the values? doesn’t that means that primary index indexes values? if not why does it force me to create it to query by values?


#4

No, it means that query performs a scan of all documents (“table scan” in ralatinal terminology), fetches each from the Data Service and checks your predicate.


#5

I see I see also in the explain i don’t see its showing that the index is used when doing the flitering on values, it mentions the index only for scan and I hope it means its the scan of the keys:

{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "PrimaryScan",
        "index": "mybucket-primary-index",
        "keyspace": "mybucket",
        "namespace": "default",
        "using": "gsi"
      },
      {
        "#operator": "Fetch",
        "keyspace": "mybucket",
        "namespace": "default"
      },
      {
        "#operator": "Parallel",
        "~child": {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "Filter",
              "condition": "((`mybucket`.`docvalkey1`) = \"docvalkey1-val1\")"
            },
            {
              "#operator": "InitialProject",
              "result_terms": [
                {
                  "expr": "(`mybucket`.`docvalkey1`)"
                }
              ]
            },
            {
              "#operator": "FinalProject"
            }
          ]
        }
      }
    ]
  },
  "text": "select docvalkey1 FROM mybucket WHERE docvalkey1 = 'docvalkey1-val1'"
}

#6

docvalkey1 is field in the document. It uses primary index and get all the document keys and does Fetch of the document from DataNode and apply the filter.

If the index has all the information query required it will avoid fetch and uses covered index.
If you need the document key use META().id
SELEC META().id FROM mybucket WHERE META().id LIKE “xyz%”;


#7

I see, I actually did want to filter by the value and wondered why that was not possible without the primary index, as I thought couchbase must be able to do a full scan without primary index. But maybe I was wrong and for couchbase to do a full scan of keys it must have the primary index which is wierd to me, all it needed to do is scan all docs in the bucket and then apply the filters on value, which sounds trivial to me for cb to do without the primary index, just scan all docs in bucket…


#8

Couchbase works on (key, value) look ups. If you know the key you can access documents with out any index. If you don’t know the key and required to use query you required primary index or secondary index. Even Full Scan on table it required primary index which has all the keys in the bucket.

https://developer.couchbase.com/documentation/server/current/architecture/querying-data-and-query-data-service.html

https://developer.couchbase.com/documentation/server/current/architecture/architecture-intro.html


#9

I read the first linked page and I cite:

Unlike views, N1QL does not require a dedicated index and can issue ad-hoc queries on data

Well this is definitely incorrect as I have just learned that N1QL requires primary index. I can understand that it does not require index on the value, but for the sake of documentation preciseness, they must have said that primary index is a must. They cannot just say no index is required for N1QL as it’s simply incorrect.

I take it when they say “dedicated” they mean it’s own index. And that primary index is more like a global index which is not dedicated to the specific queries in N1QL. However I think it would only proper to mention here that although it does not require a dedicated index it does require the primary index at minimum in order to do a full scan and without primary index no full scan is possible by N1QL, would it be possible to clarify the documentation?


#10

In a limited use case, when you know the document keys, you can directly fetch and manipulate the data without going through the index.

SELECT * FROM b USE KEYS [“K1”, “K2”];