N1QL Queries Very Slow Though Fields Are Indexed

Hi,

I have Couchbase Community 6.6.0 one node cluster running on a Digitalocean VM (4 cores, 8 GB RAM) and I used the Python SDK to load about 2 mil documents from some CSV files. I also created indexes for the fields I want to query but, when using N1QL they are painfully slow.

This is the structure of a document:

  • active (number)
  • address (object), child type:
    • cp (string)
    • adress_details (string)
    • county (string, indexed)
    • loc (string, indexed)
    • no (string)
    • sect (string)
    • str (string)
  • vat_code (string, indexed)
  • date_status (string)
  • financial_data:2019 (object), child type:
    • CAEN (string)
    • I1 (number)
    • I10 (number)
    • I11 (number)
    • I12 (number)
    • I13 (number)
    • I14 (number)
    • I15 (number)
    • I16 (number)
    • I17 (number)
    • I18 (number)
    • I19 (number)
    • I2 (number)
    • I20 (number)
    • I21 (number)
    • I22 (number)
    • I23 (number)
    • I24 (number)
    • I25 (number)
    • I26 (number)
    • I27 (number)
    • I28 (number)
    • I29 (number)
    • I3 (number)
    • I30 (number)
    • I31 (number)
    • I32 (number)
    • I33 (number)
    • I34 (number)
    • I35 (number)
    • I36 (number)
    • I37 (number)
    • I38 (number)
    • I39 (number)
    • I4 (number)
    • I40 (number)
    • I41 (number)
    • I42 (number)
    • I43 (number)
    • I44 (number)
    • I45 (number)
    • I46 (number)
    • I5 (number)
    • I6 (number)
    • I7 (number)
    • I8 (number)
    • I9 (number)
  • imp_micro (string)
  • imp_proffit (string)
  • cc_number (object), child type:
    • year_cc (string)
    • county_cc (string)
    • no_cc (string)
    • cc_number_string (string)
  • name (string, indexed)
  • status (string)
  • balance_sheet_type (string, indexed)
  • vat (string, indexed)
  • type (string, indexed)

All the documents are mainly the same, they differ slightly in that some of them might not have all the elements inside “financial_data:2019”.

Some example of the queries I ran for testing purposes:

SELECT address.county AS county,
COUNT(name) AS number
FROM companies
WHERE balance_sheet_type = “uu”
OR balance_sheet_type = “bl_bs_sl”
OR balance_sheet_type = “ir”
GROUP BY address.county;
—> this query takes about 45 seconds

SELECT DISTINCT balance_sheet_type FROM companies;
—> this won’t run without a primary index although the balance_sheet_type is indexed; when there is a primary index defined, it takes about 20-30 seconds to yield the results

SELECT * FROM companies WHERE name LIKE “%AMAZO%”
—> it takes about 2 minutes to yield the result

SELECT * FROM companies WHERE name=“AMAZON”
—> this one actually runs pretty fast, it returns results in 6.4ms

I want to use this database in a Django application but the response times are very high. The types of queries the app needs to make are almost exclusively reads and they vary from extracting information about a single company, which seems to go fine when extracting by full or partial (using FTS) name or identification code. But it seem to hit a wall every time it needs to filter the dataset based on various criteria like extracting all companies in a given county or city, or when I try to extract data for a specific kind of company type. Aggregation queries like counts are also very slow.

In the future, I will also need to be able to query by financial (numerical) data, maybe by ranges of values and I guess it will weigh even more if I can’t optimize the queries.

I tried using the External Query Advisor but nothing of note came of it. According to it everything is ok.

A bit more information about the current setup:

  • memory allocation:
    • Data: 4904 MB
    • Index: 1024 MB
    • Search: 1024 MB
  • bucket:
    • there is only one bucket called “companies” and it is of the type “Couchbase”
    • no replicas enabled (as for now there is only one server)
    • Ejection Method: Full

I did not change anything else in the settings.

As a mention, until now, I only ran the queries inside the Couchbase GUI and did not try reads from some Python app.

Could you please help me with advice on how to get better performance with my queries and point me in the right direction?

Thanks,
Calin

create right indexes using https://index-advisor.couchbase.com/indexadvisor/