Create GSI on a Couchbase cluster

knightnyc · August 5, 2016, 3:50pm

I have a Couchbase 4.5 cluster set up with 3 nodes, each running these services:

IP1 - Data/Index/Query
IP2 - Data/Index/Query
IP3 - Data/Index/Query

I have a Node.js app that runs N1QL statements to create some indexes:

CREATE INDEX ix_1 ON test_bucket (field1) USING GSI;
CREATE INDEX ix_2 ON test_bucket (field2) USING GSI;
CREATE INDEX ix_3 ON test_bucket (field3) USING GSI;

In the Indexes tab of the web console, I see:

test_bucket IP1 ix_1
test_bucket IP2 ix_2
test_bucket IP3 ix_3

Does this mean that ix_1 exists only on IP1, and not on IP2 or IP3? If a query is executed against IP2 or IP3, does a table scan happen? How do I explicitly create an index on a particular node from the Node.js driver?

My code:
var couchbase = require('couchbase');
var cluster = new couchbase.Cluster('couchbase://IP1,IP2,IP3');
var bucket = cluster.openBucket('test_bucket');
var query = 'CREATE INDEX ix_1 ON test_bucket (field1) USING GSI';
var n1qlQuery = couchbase.N1qlQuery.fromString(query);
bucket.query(n1qlQuery, function (err, result) {
  if (err) console.error(err);
});

Much appreciated

brett19 · August 5, 2016, 7:03pm

Hey @knightnyc,

The node specification when creating an index (the WITH {"nodes": [...]} option of CREATE INDEX) determines which node in your cluster holds the data for that index. You are free to query any node in the cluster, and it will still read from that index (i.e., a query sent to Node2 will read an index that lives on Node1).
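A minimal sketch of the node specification, assuming the WITH {"nodes": [...]} option of N1QL CREATE INDEX and the host:8091 address form used in the Couchbase 4.x documentation. createIndexOnNode is a hypothetical helper, not part of the Node.js SDK; it only builds the statement string, which can then be run with N1qlQuery.fromString and bucket.query as in the question. The sketch pins the index to a single node.

```javascript
// Hypothetical helper (not an SDK function): builds a CREATE INDEX statement
// that pins a GSI index to one node via the WITH {"nodes": [...]} option.
function createIndexOnNode(indexName, bucketName, fields, node) {
  // Backtick-quote names so N1QL treats them as identifiers.
  const quote = (id) => '`' + id + '`';
  const keys = fields.map(quote).join(', ');
  // JSON.stringify yields the {"nodes":["host:port"]} object literal.
  const withClause = JSON.stringify({ nodes: [node] });
  return 'CREATE INDEX ' + quote(indexName) + ' ON ' + quote(bucketName) +
    '(' + keys + ') USING GSI WITH ' + withClause;
}

// 'IP1:8091' assumes the host:8091 form shown in the Couchbase 4.x docs.
const stmt = createIndexOnNode('ix_1', 'test_bucket', ['field1'], 'IP1:8091');
console.log(stmt);

// With the 2.x SDK from the question, the string could then be executed as:
//   bucket.query(couchbase.N1qlQuery.fromString(stmt), function (err, result) {});
```

Building the statement as a plain string keeps the placement decision in one place; the same helper can be called once per index with a different node each time.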

Cheers, Brett

knightnyc · August 5, 2016, 10:34pm

Thanks for the reply.

I am still a bit confused. I thought a GSI index was located on each node in the cluster that runs the Query service. If querying node2 reads from an index located on node1, wouldn’t that require an extra network hop?

ingenthr · August 6, 2016, 3:42pm

In Enterprise Edition (EE) releases, each node can run an independent set of services. A query node does not need to run the index service.

You are correct about the extra network hop. It’s my understanding that there is not currently an optimization for using the closest index when the same index definition is in multiple locations.
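When the same index definition should live in several locations, one way to get that in 4.5 is to create identically defined indexes under different names, each pinned to a different node; the query service can then choose between them. The sketch below assumes the nodes and defer_build options of the CREATE INDEX WITH clause and the BUILD INDEX statement. equivalentIndexStatements and the ix_1_0, ix_1_1, ... naming scheme are hypothetical.

```javascript
// Hypothetical helper: one identically defined GSI index per node, each with a
// unique name (baseName_0, baseName_1, ...), created deferred, then built.
function equivalentIndexStatements(baseName, bucketName, fields, nodes) {
  const quote = (id) => '`' + id + '`';
  const keys = fields.map(quote).join(', ');
  const names = nodes.map((_, i) => baseName + '_' + i);
  // defer_build: true (assumed WITH option) creates the definition only.
  const creates = nodes.map((node, i) =>
    'CREATE INDEX ' + quote(names[i]) + ' ON ' + quote(bucketName) +
    '(' + keys + ') USING GSI WITH ' +
    JSON.stringify({ nodes: [node], defer_build: true }));
  // One BUILD INDEX statement then builds all of the listed deferred indexes.
  const build = 'BUILD INDEX ON ' + quote(bucketName) +
    '(' + names.map(quote).join(', ') + ') USING GSI';
  return creates.concat([build]);
}

const stmts = equivalentIndexStatements(
  'ix_1', 'test_bucket', ['field1'], ['IP1:8091', 'IP2:8091', 'IP3:8091']);
stmts.forEach((s) => console.log(s));
```

The statements are meant to be run one at a time and in order (every CREATE INDEX before the BUILD INDEX), for example with one bucket.query call each.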

Siri · August 6, 2016, 5:28pm

Yes. The query service uses the best-fit index regardless of where that index is located. When several indexes are equally good fits (for example, identically defined indexes), it load-balances across them. N1QL talks to GSI over an efficient binary protocol and maintains a connection pool, so local vs. remote indexes should not have a noticeable impact unless the interconnect is unusually slow. For EE users (especially with the memory-optimized indexes in 4.5), running the index service on separate node(s) is usually the best topology choice.
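To check which index a query actually picks, and whether it falls back to a primary scan (the closest N1QL analogue of a table scan), the query can be prefixed with EXPLAIN. The sketch below walks the returned plan and lists every operator whose name ends in "Scan". findScans is a hypothetical helper, and samplePlan is a hand-written, simplified plan in the general shape of 4.x EXPLAIN output, not output captured from a cluster; the field names #operator, index and ~children are assumed from that shape.

```javascript
// Hypothetical helper: recursively collects every operator whose name ends in
// "Scan" (e.g. IndexScan, PrimaryScan), however deeply the plan is nested.
function findScans(node, found = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => findScans(child, found));
  } else if (node !== null && typeof node === 'object') {
    const op = node['#operator'];
    if (typeof op === 'string' && op.endsWith('Scan')) {
      found.push({ operator: op, index: node.index });
    }
    // Descend into every nested value ("~children", "~child", ...).
    Object.values(node).forEach((child) => findScans(child, found));
  }
  return found;
}

// Illustrative, simplified plan (assumed shape, not real cluster output).
const samplePlan = {
  '#operator': 'Sequence',
  '~children': [
    { '#operator': 'IndexScan', index: 'ix_1', using: 'gsi' },
    { '#operator': 'Fetch', keyspace: 'test_bucket' },
  ],
};

const scans = findScans(samplePlan);
console.log(scans);
```

With the 2.x SDK, the plan would come from running something like EXPLAIN SELECT field1 FROM test_bucket WHERE field1 = "x" through bucket.query and reading the plan field of the first returned row (an assumption about the result shape). Walking the whole tree, rather than a fixed path, avoids depending on exactly how the plan is nested.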

(Note: It is possible to configure the load balancing to use a local index over a remote one if the interconnect is measurably slow; but it is almost never necessary, and may skew load balancing, so I won’t dive into that in detail)