We have several environments set up. On some of them, N1QL queries are working correctly. On others, they are not working. All environments are running the same version (4.1.0). After searching this site and others, I tracked the issue down to nothing listening on the query port (8093).
I tried connecting to it with cbq on the box where Couchbase is installed and got this:
cbq> select * from default limit 1; ERROR 5000 : Post http://localhost:8093/query: dial tcp 127.0.0.1:8093: ConnectEx tcp: No connection could be made because the target machine actively refused it.
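The "actively refused" error above can be reproduced without cbq by attempting a plain TCP connection. This is a minimal sketch in Python, assuming Python is available on the box; the helper name port_is_listening is made up for illustration and is not part of any Couchbase tooling.

```python
import socket


def port_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections as well as timeouts.
        return False


if __name__ == "__main__":
    # 8091 is the admin port seen in the logs, 8093 is the query port.
    for port in (8091, 8093):
        state = "listening" if port_is_listening("127.0.0.1", port) else "not reachable"
        print(port, state)
```

Running it on a working node and on a failing node should show whether 8093 is the only port that differs.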
I looked through the logs but couldn’t find where anything relevant was recorded. Both servers where 8093 is closed had this in their logs, but it’s unclear if it’s related:
Service 'goxdcr' exited with status 1. Restarting. Messages: MetadataService 2016-11-30T04:41:08.908Z [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: ConnectEx tcp: No connection could be made because the target machine actively refused it., num_of_retry=3
MetadataService 2016-11-30T04:41:08.908Z [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: ConnectEx tcp: No connection could be made because the target machine actively refused it., num_of_retry=4
RemoteClusterService 2016-11-30T04:41:08.908Z [ERROR] Failed to get all entries, err=metakv failed for max number of retries = 5
Error starting remote cluster service. err=metakv failed for max number of retries = 5
[goport] 2016/11/30 04:41:08 c:/Program Files/Couchbase/Server/bin/goxdcr.exe terminated: exit status 1
Where is a good place to begin troubleshooting this issue?
What version of Windows are you running?
Note that Windows 10 Anniversary Update is not currently a supported platform, and Couchbase doesn’t work on it yet. The 4.6.0 Developer Preview build does work, and Windows 10 AU will be supported once 4.6.0 is released.
We’re running Windows Server 2012.
Is it a security issue? Do you need to explicitly enable these ports on Windows? Is there an equivalent of ps to check if the query service process is running?
The rest of Couchbase is running. 8091 is listening, 8092 is listening, etc. Everything except the N1QL/query port seems to be working.
The Couchbase service is running. If there are other services that the query port is tied to, then I didn’t see them in the services list even on the servers where everything is working.
What is the name of the process that the query port is tied to?
There are no firewalls or other security features that would get in the way on this box.
There should be a process called cbq-engine.
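As a rough equivalent of ps on Windows, the built-in tasklist command can be filtered by image name. The sketch below wraps it in Python. It assumes the executable is named cbq-engine.exe (the thread only gives the process name cbq-engine), and the helper names are hypothetical.

```python
import csv
import io
import subprocess


def image_is_listed(tasklist_csv, image_name):
    """Return True if image_name appears in `tasklist /FO CSV /NH` output."""
    wanted = image_name.lower()
    rows = csv.reader(io.StringIO(tasklist_csv))
    # The first CSV column is the image name; blank lines give empty rows.
    return any(bool(row) and row[0].lower() == wanted for row in rows)


def process_is_running(image_name="cbq-engine.exe"):
    """Windows only: ask tasklist whether a process with this image name exists.

    The default image name is an assumption; adjust it if your install differs.
    """
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=False,
    )
    return image_is_listed(result.stdout, image_name)
```

Calling process_is_running() on a node where 8093 is closed should return False if the query engine never started.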
I am not seeing this process. Are there logs or event log messages that should be tied to this process failing to start or stopping early?
Hi @isha, can you check for the cbq-engine process on Windows? Thanks.
When you check Task Manager, you should see a cbq-engine process running. If it is running, you will also be able to see the query.log file.
If there is no query log, and there are no HTTP requests from cbq-engine in http_access_internal.log, then there is no query service running.
If there were a query service, there would be a query log.
Could you please upload the query.log and the http_access_internal.log? I can then take a look at why you cannot connect to 8093.
The logs should be in (default installation location)
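The two log checks described above can be scripted. This is only a sketch: the log directory is passed in by the caller because the path is not given here, and matching on the literal text "cbq-engine" in http_access_internal.log is an assumption about how those requests are recorded.

```python
from pathlib import Path


def query_log_evidence(log_dir):
    """Report whether query.log exists and how many access-log lines mention cbq-engine."""
    log_dir = Path(log_dir)
    query_log = log_dir / "query.log"
    access_log = log_dir / "http_access_internal.log"

    hits = 0
    if access_log.is_file():
        # Read line by line; the access log can be many megabytes.
        with access_log.open("r", encoding="utf-8", errors="replace") as fh:
            hits = sum(1 for line in fh if "cbq-engine" in line)

    return {"query_log_exists": query_log.is_file(), "cbq_engine_lines": hits}
```

If query_log_exists is False and cbq_engine_lines is 0, that matches the "no query service running" case described above.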
query.log has not been created.
The forum’s saying “Sorry, new users can not upload attachments.”, so I can’t send the 15MB http_access_internal.log.
Is there somewhere else I could send it? It’s 593KB zipped, so might fit through email filters…
If there is no query log then there is no query service running. Hence you get the above error from cbq.
Did you check the box for query when installing Couchbase?
Our Couchbase installs were all done by a scripted process that was run in the same manner on all of the boxes. It is working on some and not others.
When I ran the installer for 4.1.0 on my development box, it did not give any options for a partial install. It installed everything including query. In fact, the only option presented at all was the directory the files are copied to. After the software is installed and the initial setup webpages showed up from http://localhost:8091/, there was no mention of query. Despite this, the port is open and it accepts queries.
Are you thinking that the query parts were not installed at all? Or more that they were installed and are failing to start somehow?
Is there a configuration file that might show if query is enabled?
It might be that query was never installed at all, but I can’t verify that until I see the logs. If the query service is enabled, then you will have query logs.
A couple of questions
For the nodes that didn’t install query, even though the script is the same, can you open the UI on that node and see which services have been enabled?
When you say “scripted process” on “all of the boxes”, what does “boxes” refer to? Docker instances? VMs? AWS instances? Local machines?
If you see the Data and/or Index services, that means only the query service was not brought up. In that case, can you upload all the logs for the nodes where query was not successfully installed?
(Maybe you can upload them to Dropbox and add a link here so that I can download them and take a look.)
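Besides the UI, the enabled services can probably be read from the REST API on port 8091. The sketch below assumes that GET /pools/default returns a "nodes" list where each node carries a "services" array, and that the query service appears there as "n1ql". Please verify those field names against your version. The function names and the credentials in the usage note are placeholders.

```python
import base64
import json
import urllib.request


def nodes_missing_service(pool_info, service="n1ql"):
    """Return hostnames of nodes whose "services" list lacks the given service."""
    return [
        node.get("hostname", "?")
        for node in pool_info.get("nodes", [])
        if service not in node.get("services", [])
    ]


def fetch_pool_info(base_url, user, password):
    """GET <base_url>/pools/default with HTTP basic auth and decode the JSON body."""
    request = urllib.request.Request(base_url.rstrip("/") + "/pools/default")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

For example, nodes_missing_service(fetch_pool_info("http://localhost:8091", "Administrator", "password")) would list the nodes that report no query service. An empty list on a node where 8093 is still closed would suggest the service is configured but failing to start.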
Sounds like the installation doesn’t have credentials to the file system.