Index cannot be built or deleted

scott · January 24, 2019, 11:50am

Software: Couchbase Community Edition Version 5.1.1 build 5723
Hardware: AWS EC2 ts.large (2 CPUs, 8 GB RAM) (3 Nodes in cluster)

I created an index in my cluster yesterday, but got an error message saying, in effect, that the request had timed out and would probably finish later. Here is the N1QL I used to create the index:

create index account_from_subscription 
on `account-details`(si) 
where tp = "account" 
USING GSI WITH {"nodes":["172.31.19.229:8091"]};

The index was created, and I can see it both in the Indexes tab of the console and by running a query against system:indexes. In the console it just says "created", and in the query result its state is ‘pending’.
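The same state check can be scripted. This is a minimal sketch, assuming the Query service's REST endpoint on its default port 8093 (`/query/service`) with HTTP basic auth. The host, user and password are placeholders, and `not_online` is a hypothetical helper that only filters the returned rows:

```python
import base64
import json
import urllib.parse
import urllib.request

# Placeholder host; assumes the Query service listens on its default port 8093.
QUERY_URL = "http://127.0.0.1:8093/query/service"

STATEMENT = (
    "SELECT name, state FROM system:indexes "
    "WHERE keyspace_id = 'account-details'"
)


def run_query(url, statement, user, password, timeout=30):
    """POST a N1QL statement to the Query REST API and return the decoded JSON body."""
    data = urllib.parse.urlencode({"statement": statement}).encode()
    req = urllib.request.Request(url, data=data)  # data present -> POST, form-encoded
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def not_online(body):
    """Return {index name: state} for every returned index whose state is not 'online'."""
    return {
        row["name"]: row["state"]
        for row in body.get("results", [])
        if row.get("state") != "online"
    }
```

For example, `print(not_online(run_query(QUERY_URL, STATEMENT, "Administrator", "password")))` should list the stuck index with its `pending` state in the situation described above.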

[Update: I just tried creating another index without specifying a node, and it worked fine and I could delete it. That index was created on a different node from the one I had specified before. Then I tried creating yet another index on that node and am getting the same issues. So this must be related to the node.]
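Since the problem seems tied to one node, it can help to see which host each index lives on. A minimal sketch, assuming the JSON returned by the cluster manager's `/indexStatus` endpoint on port 8091 has an `indexes` list whose entries carry `bucket`, `index`, `status` and `hosts` fields. That response shape is an assumption, so check it against your own output before relying on it:

```python
import json
from collections import defaultdict


def indexes_by_host(index_status_json):
    """Group (bucket, index, status) tuples by the host that holds each index.

    Assumes a response shaped like
    {"indexes": [{"bucket": ..., "index": ..., "status": ..., "hosts": [...]}, ...]}.
    """
    grouped = defaultdict(list)
    for entry in json.loads(index_status_json).get("indexes", []):
        for host in entry.get("hosts", []):
            grouped[host].append(
                (entry.get("bucket"), entry.get("index"), entry.get("status"))
            )
    return dict(grouped)
```

If every index that is not ready sits under the same host key, that points at the node rather than at the index definition.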

So then I thought that maybe I needed to build it, so I ran a BUILD INDEX query and got the following response:

[
  {
    "code": 5000,
    "msg": "BuildIndexes - cause: Request timed out. Index server may still be processing this request. Please check the status after sometime or retry.\n",
    "query_from_user": "build index on `account-details`(account_from_subscription)\nUSING GSI;"
  }
]

Checking the list in the console: no change. When I run the query against system:indexes, the state is still pending.
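Rather than re-issuing BUILD INDEX after a timeout, one option is to poll the index state until it reaches `online` or a deadline passes. A generic sketch: `get_state` is a hypothetical callable you would back with a `system:indexes` query, and the timeout and interval values are arbitrary:

```python
import time


def wait_for_state(get_state, target="online", timeout_s=3600.0, interval_s=30.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns target or timeout_s elapses.

    Returns the last observed state, so the caller can tell an index that
    reached 'online' apart from one that never moved (e.g. stuck at 'pending').
    """
    deadline = clock() + timeout_s
    state = get_state()
    while state != target and clock() < deadline:
        sleep(interval_s)
        state = get_state()
    return state
```

Returning the last state instead of raising makes it easy to distinguish an index that is merely slow (`building`) from one that never left `pending`, as here. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.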

So then I decided to drop the index and ran the command to do that. The result was:

[
  {
    "code": 5000,
    "msg": "GSI Drop() - cause: Fail to drop index on some indexer nodes.  Error=Request timed out. Index server may still be processing this request. Please check the status after sometime or retry.\n.  If cluster or indexer is currently unavailable, the operation will automaticaly retry after cluster is back to normal.",
    "query_from_user": "drop index `account-details`.account_from_subscription;"
  }
]
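Both error messages say the Index server "may still be processing this request", so a script can treat that case as "check again later" rather than as a hard failure. A minimal sketch that parses an error array shaped like the ones above; matching on code 5000 plus the "Request timed out" text is an assumption based only on these two messages:

```python
import json

TIMEOUT_HINT = "Request timed out"


def classify_ddl_errors(response_text):
    """Split N1QL error objects into (timeouts, other_errors).

    A code-5000 error whose message contains 'Request timed out' is treated as
    'possibly still in progress, check the index state later'; anything else
    is returned as a hard failure.
    """
    timeouts, others = [], []
    for err in json.loads(response_text):
        if err.get("code") == 5000 and TIMEOUT_HINT in err.get("msg", ""):
            timeouts.append(err)
        else:
            others.append(err)
    return timeouts, others
```

In this thread the "timeout" branch never resolved on its own, so a timeout classification is only a cue to go and check the state, not evidence that the operation will finish.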

So I figured it might just need more time. I decided to look at some logs to see what was going on, and the indexer log shows an interesting loop: it recognizes an index delete token and says that it should handle this during cleanup, but never actually does anything.

I have let this sit for about 11-12 hours, so it should have finished by now, but it hasn’t. I have attached two excerpts from the indexer.log file, taken about 10 hours apart, where you can see the line
Clean up deleted index 3353678333526950753 during periodic cleanup
repeated several times in each.
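To check whether that cleanup loop is still spinning, the matching lines can be counted per index id. A small sketch; the regular expression is built from the single log line quoted above, and whatever timestamp or level prefix real indexer.log lines carry is simply ignored:

```python
import re
from collections import Counter

# Built from the quoted log line; any prefix before it is ignored by search().
CLEANUP_RE = re.compile(r"Clean up deleted index (\d+) during periodic cleanup")


def count_cleanup_loops(lines):
    """Count how many periodic-cleanup lines mention each index id."""
    counts = Counter()
    for line in lines:
        match = CLEANUP_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Usage: `with open("indexer.log") as f: print(count_cleanup_loops(f).most_common(5))`. Comparing the counts from two excerpts taken hours apart shows whether the same index id keeps coming back.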

The Activity indicator in the console has never shown this index being built. By contrast, I created another index on another bucket, and it showed up in the activity monitor and even finished building.

Can someone help me with this? Is there some other way to check whether something is going on, or to take some action?

indexer_logs.zip (13.0 KB)

Thanks,
Scott

prathibha · January 25, 2019, 9:35am

Hi @scott, please share the full cbcollect of the cluster so I can get more details. Specifically, I want to look at the indexer logs from the time of the index create and drop, which I cannot find in the current logs.

Thanks,
Prathibha

scott · January 25, 2019, 11:17am

Hello Prathibha,

Thanks for helping out. I have uploaded all three cbcollect archives below, plus the indexer log from the specific timeframe you are looking for, so you can start with that one. The relevant server is 172.31.19.229, so those are the logs I would look at next.

I appreciate any input. In the end, what I did to get around the issue was to fail over that node and re-add it, thus losing the index. However, when I tried to create the index again, it actually worked and started indexing, but then the node became unresponsive again (this was last night, the 24th, around 22:00 server time). I could not even SSH into the server to see what was going on in the logs. I let it run like that overnight, thinking (hoping?!) that it might finish and come back, but it didn’t. I surmise that the index is just too big for the server to handle (17 million rows on a 2-CPU / 8 GB machine), but I am not sure.
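To sanity-check the "too big for the server" guess, a back-of-envelope estimate multiplies the number of index entries by a per-entry size. Because the index is partial (`where tp = "account"`), the entry count is the number of account documents, which may be fewer than the total row count. The per-entry byte values are invented placeholders, not Couchbase figures, and whether this data has to fit in RAM depends on the index storage mode:

```python
def estimate_index_bytes(num_entries: int, avg_key_bytes: int,
                         avg_docid_bytes: int, per_entry_overhead_bytes: int) -> int:
    """Back-of-envelope raw size of a secondary index.

    Every argument is an input you must measure or guess yourself; none of
    them are Couchbase-published constants.
    """
    return num_entries * (avg_key_bytes + avg_docid_bytes + per_entry_overhead_bytes)
```

With placeholder values of 20, 30 and 50 bytes, `estimate_index_bytes(17_000_000, 20, 30, 50)` returns 1,700,000,000 bytes (about 1.6 GiB), before any indexer working memory is counted.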

Again any input you have to this issue would help me out going forward and would be much appreciated.

https://media.smallcubed.com.s3.amazonaws.com/data/collectinfo-2019-01-25T104813-ns_1%40172.31.19.229.zip
https://media.smallcubed.com.s3.amazonaws.com/data/collectinfo-2019-01-25T104813-ns_1%40172.31.21.51.zip
https://media.smallcubed.com.s3.amazonaws.com/data/collectinfo-2019-01-25T104813-ns_1%40172.31.29.140.zip
https://media.smallcubed.com.s3.amazonaws.com/data/indexer.log.1.zip

Scott

varun.velamuri · January 30, 2019, 4:24am

Hi Scott,

Thanks for sharing the logs. We are looking into this issue to understand what might be going wrong. I will come back to you with updates soon.

Thanks,
Varun

varun.velamuri · January 31, 2019, 5:27pm

Hi Scott,

From the logs, it looks like the indexer is stuck, which is why indexes are not being built or dropped. Unfortunately, I couldn’t determine from the logs the root cause of why the indexer might be stuck.

Also, I observe “panics” in the logs because the runtime is out of memory. The servers seem underpowered and are likely not sufficient for 17M docs.

Thanks,
Varun