Full Text Search LIMIT using JAVA SDK


#1

Hi, I'm using Couchbase 5.1 Enterprise Edition. I defined a full text index for my locations and I’m using it through the Java SDK to get all our locations with country “Italy”.

I expect 15403 results according to my N1QL query, but with my full text index via the Java SDK I have to set a limit of at most 10000. If I set a limit greater than 10000, my request fails.

My question is: is this a true limitation of full text search with the Java SDK, or is it a configuration problem with my Couchbase server or SDK?

Please help


#2

@romero.pedro.17 can you please show the code you are executing for the query? thanks!

Also, it would be super helpful if you could enable TRACE level logging for the environment and paste the snippet showing how the generated query looked on the network.


#3

OK, this could be hard, but let's start:

I have my class CouchbaseTest where I define this init:

public void init() {

    val env = DefaultCouchbaseEnvironment.builder() //
            .kvTimeout(10000) //
            .queryTimeout(60000) //
            .maxRequestLifetime(70000) //
            .connectTimeout(40000) //
            .retryStrategy(BestEffortRetryStrategy.INSTANCE) //
            .retryDelay(Delay.exponential(TimeUnit.MICROSECONDS, 50, 50));

    val keyStore = KeyStore.getInstance("jks");
    keyStore.load(IOUtils.getResource("keystore.jks"), "changeit".toCharArray());
    env.sslEnabled(true).sslKeystore(keyStore);
    cluster = CouchbaseCluster
            .create(env.build(),
                    "nodeData1,nodeData2,nodeData3"
                            .split(",")) //
            .authenticate("admin", "qwerty");

}

Here I create and configure my cluster environment.

Next there’s fts, a method where I execute my query 100 times and average the results.

public void fts() {

    val executor = Executors.newFixedThreadPool(1);
    val bucket = cluster.openBucket("CONTENT");
    val a = System.currentTimeMillis();
    val it = 100;
    val timesList = new ArrayList<Long>();

    for (int i = 0; i < it; ++i) {
        val j = i;
        executor.execute(() -> {
            execute(bucket, j, timesList); //HERE I EXECUTE MY QUERY
        });

    }

    executor.shutdown();
    executor.awaitTermination(1, TimeUnit.HOURS);
    System.out.println(System.currentTimeMillis() - a);
    System.out.println(timesList.stream() //
            .mapToInt(x -> x.intValue()) //
            .average().getAsDouble());

    System.out.println(timesList.stream() //
            .mapToInt(x -> x.intValue()) //
            .min().getAsInt());

    System.out.println(timesList.stream() //
            .mapToInt(x -> x.intValue()) //
            .max().getAsInt());

}

and finally, my execute method:

private void execute(Bucket bucket, int i, List<Long> timesList) {

    val query = new SearchQuery("locations", new TermQuery("location").field("main.type")); //HERE I CALL MY FTS INDEX "LOCATIONS"
    query.fields("id"); 
    query.addFacet("main.features", SearchFacet.term("main.features", 191));
    query.skip(0);
    query.limit(10000); //HERE I SET THE LIMIT. IF I DON'T SET IT, ONLY 10 RESULTS ARE RETURNED, BUT IF IT IS GREATER THAN 10000 I GET 0 RESULTS
    val result = bucket.query(query);

    val time = result.metrics().took() / 1000000;
    timesList.add(time);

    System.out.println("INDEX " + "LOCATIONS");
    System.out.println("time " + time + "ms");
    System.out.println("total " + result.metrics().totalHits());
    System.out.println("results " + result.hits().size());
    System.out.println(result.facets());

    System.out.println();

}

OK, and this is my final output:

WITH LIMIT <= 10000

INDEX LOCATIONS
time 55ms
total 15120 //YOU CAN SEE IT MATCHES MY 15120 RESULTS, BUT I CAN ONLY GET 10000
results 10000
{main.features=TermFacetResult{name='main.features', field='main.features', total=0, missing=15120, other=0, terms=[]}}

INDEX LOCATIONS
time 55ms
total 15120
results 10000
{main.features=TermFacetResult{name='main.features', field='main.features', total=0, missing=15120, other=0, terms=[]}}

INDEX LOCATIONS
time 84ms
total 15120
results 10000
{main.features=TermFacetResult{name='main.features', field='main.features', total=0, missing=15120, other=0, terms=[]}}

But if LIMIT > 10000:

INDEX LOCATIONS
time 0 ms
total 0
results 0
{}

INDEX LOCATIONS
time 0 ms
total 0
results 0
{}

INDEX LOCATIONS
time 0 ms
total 0
results 0
{}

Please help. Thanks


#4

Hi, I’m taking a look at our bug list to see if we’ve hit this before and it seems we have, but only with fuzzy queries. We have a 10,000 limit by default; I believe you can override it with an environment variable.

What if you remove the limit qualifier? You’ll still get max 10k, but it shouldn’t fail. I’m hunting down if/when that bug was fixed and will find the override option for you. Thanks for calling it out!

It would be good to see your logs too.
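
In the meantime, one way to avoid the empty result on the client side is to clamp the requested limit before building the query. This is only a minimal sketch, not part of the SDK: the class name FtsLimitGuard and the method clampLimit are hypothetical, the 10,000 default is the one mentioned above, and it assumes the window applies to skip + limit together, which has not been confirmed in this thread.

```java
// Hypothetical helper (not part of the Couchbase SDK): keeps an FTS
// skip/limit pair inside an assumed maximum result window.
public final class FtsLimitGuard {

    // Default window of 10,000 as mentioned in this thread; the real value
    // depends on the server configuration.
    public static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    private FtsLimitGuard() {
    }

    // Returns the largest limit <= requestedLimit such that
    // skip + limit <= maxWindow (ASSUMPTION: the server checks skip + limit).
    public static int clampLimit(int skip, int requestedLimit, int maxWindow) {
        if (skip < 0 || requestedLimit < 0 || maxWindow < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        int remaining = Math.max(0, maxWindow - skip);
        return Math.min(requestedLimit, remaining);
    }

    public static void main(String[] args) {
        // A request for 20,000 hits is reduced to the assumed window size.
        System.out.println(clampLimit(0, 20_000, DEFAULT_MAX_RESULT_WINDOW));
        // A small request passes through unchanged.
        System.out.println(clampLimit(0, 50, DEFAULT_MAX_RESULT_WINDOW));
    }
}
```

With the code from the post above, this would be used as query.limit(FtsLimitGuard.clampLimit(0, 20000, FtsLimitGuard.DEFAULT_MAX_RESULT_WINDOW)). It only prevents the failing request; it does not return more than the window allows.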


#5

The limit on max number of results can be increased using this environment variable:
CBFT_ENV_OPTIONS=bleveMaxResultWindow=10000000


#6

OK, thanks. But is there a way to set this environment variable from Java, or do I have to do it exclusively on my Couchbase server? Can you give me an example using my init() method?


#7

@romero.pedro.17 No, this needs to be done on the server side at startup.


#8

Is there any documentation about this CBFT_ENV_OPTIONS parameter in general? I would like to read more about all the options it offers.