I have some confusion about key generation and indexes

query

#1

I have imported a dataset into Couchbase that looks like so:

{
  "CLUSTER": "M1M",
  "CLUSTER_NAME": "MARTIN MARIETTA",
  "PRIMARY": "",
  "SET_NUM": "10000163",
  SHORTENED_NAME": "MARTIN MARIETTA MATERIALS",
  "TYPE": "SET",
  "_class": "com.company.aad.xref.model.ClusterCodeXref"
}

I had to provide a key-generation strategy, and I made the strategy what I ultimately want my index to look like; %SET_NUM%::%TYPE%. So I have a few questions:

  • Does the key-generation automatically create a field called ID with those 2 elements, or do I need to create an ID column in my CSV dataset?
  • How can I create an index out of those 2 fields? I understand how to use the CREATE INDEX command with composite fields, but will that index look like the key generated by %SET_NUM%::%TYPE%? I need them to be the same.

I hope my question is clear! Would appreciate any help.


#2

The key can be accessed in query or index using META().id

My suggestion is switch the key to %TYPE%::%SET_NUM% so that it makes easier for query and efficient.
Example : Your bucket name is default
Primary index means index on document key on whole bucket.

            CREATE PRIMARY INDEX ON default;
            SELECT META().id, * FROM default WHERE META().id LIKE "SET::%";
            This is more efficient because "SET::" is prefix and can be passed to indexer and indexer 
                 produces only entries started with "SET::".
            If "::SET" is postfix index needs produce all index entries and query need to apply filter 

You can also create Secondary Index

CREATE INDEX ix1 ON default (`SET_NUM`) WHERE `TYPE` = "SET";
SELECT META().id * FROM default WHERE `TYPE` = "SET" AND `SET_NUM` > "10000163";

#3

In cbimport, the --generate-key option takes the expression to generate a DOCUMENT KEY, which lives outside the document. The key generation takes the expression to create the DOCUMENT KEY for each document imported. You simply need the SET_NUM and TYPE fields in your CSV file, which you have.

The PRIMARY INDEX on the imported data already looks like your key expression.

You can also create a separate index (if you like) with the expression, essentially creating the same index with duplicate (value, rowid). No real good reason to do that.

CREATE INDEX i1 ON bucket(SET_NUM || “::” || TYPE)


#4

Thank you both for such clear explanations!

The thing is, I’m really working with 2 use cases. I’m doing an initial load of Couchbase with cbimport, and thereafter data will be manually entered and queried by an application written with spring-data-couchbase.

Spring requires a document ID for the findById method. That’s why I thought I needed the generated key to live in both the document and the index (for faster retrieval). Does that make any sense to do?


#5

@dickersonc - Couchbase started out long ago as a key-value store, where each document has a key, and separate from that a value that could be a string (such as JSON), or even binary data.

When you put a document into Couchbase, you must give it a key, and the very fastest way to retrieve a document is to identify it with the key. That is why following a consistent key pattern is valuable.

There is no need to put the key into the document itself, though you can if you really want to. Even though the key isn’t inside the document, it is automatically tied to the document. When you create a “primary” index, that index looks at the key, not the document.

For your two use cases, use the same key pattern for cbimport and for inserting/upserting new documents with Spring.


#6

Thank you Eben! You’ve saved me a lot of work.

Can you tell me this though: As we insert new documents, how are their keys created? Is that something you specify in N1QL (I’m totally new to Couchbase)?


#7

If you insert documents through any of the SDKs, one of the required parameters is the document key.

If you insert documents through N1QL, the syntax also includes a key along with the document. E.g.,

INSERT INTO `travel-sample` (KEY,VALUE) 
VALUES ( "airline_4444", 
    { "callsign": "MY-AIR",
      "country": "United States",
      "iata": "Z1",
      "icao": "AQZ",
      "name": "80-My Air",
      "id": "4444",
      "type": "airline"} ),
VALUES ( "4445", { "callsign": "AIR-X",
      "country": "United States",
      "iata": "X1",
      "icao": "ARX",
      "name": "10-AirX",
      "id": "4445",
      "type": "airline"} ) ;