Need Help In Indexing In Couchbase

n1ql

#1

Hi All, We are building a service which will allow users to buy/sell/rent houses.

We are hoping to use Couchbase for our cache layer; we also plan to use Postgres as our permanent datastore. Our use case is as follows.

Every house post on our site will have the following attributes:

  • rent/buy/sell
  • apartment/house/villa
  • city
  • locality
  • posted by broker/nobroker
  • 1 BHK / 2 BHK / 3 BHK … etc. (number of bedrooms, hall and kitchen)
  • price range
  • area

Our users must always select a city and the rent/buy/sell flag; after that they can choose any of the other attributes as filters.

Based on this understanding we decided to do the following

Create the following doctype: Housing--<RENT/BUY/SELL>OUT

and then create views on other attributes such as locality.

But this means that if a user selects multiple attributes as filters, we either need to use multiple views to serve the query, which may be slow, or need to go to our backend database, Postgres.
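To make the concern concrete, here is a minimal sketch in Python of what serving one search from several single-attribute views might look like on the client side. The view results are stand-in dictionaries rather than real Couchbase calls, and the attribute values and document IDs are made up for illustration.

```python
# Hypothetical stand-ins for the rows returned by three single-attribute views.
# Each maps an attribute value to the set of matching document IDs.
VIEW_BY_LOCALITY = {"indiranagar": {"h1", "h2", "h3"}, "whitefield": {"h4"}}
VIEW_BY_BHK = {2: {"h2", "h3", "h4"}, 3: {"h1"}}
VIEW_BY_POSTED_BY = {"broker": {"h1", "h2"}, "nobroker": {"h3", "h4"}}


def search(locality=None, bhk=None, posted_by=None):
    """Intersect the ID sets of every view whose filter was selected."""
    selected = []
    if locality is not None:
        selected.append(VIEW_BY_LOCALITY.get(locality, set()))
    if bhk is not None:
        selected.append(VIEW_BY_BHK.get(bhk, set()))
    if posted_by is not None:
        selected.append(VIEW_BY_POSTED_BY.get(posted_by, set()))
    if not selected:
        return set()  # no filter chosen; a real service would list by city instead
    # One view query per filter, then a client-side intersection of the results.
    return set.intersection(*selected)


print(sorted(search(locality="indiranagar", bhk=2)))
```

Every extra filter adds another view query and another intermediate result set to merge, which is the part we expect to be slow.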

Our questions are

  • What is the best way to think about views and to design them? In the above situation, which views should we create?

  • Are there obvious downsides to using multiple views to serve a query? Are there
    any guidelines on how complex a query can be served from Couchbase, for example only queries that need at most 2 views?

  • Can we run N1QL queries on top of the output of views to serve the query? Will that be better than serving the data from properly indexed tables in Postgres?

Thanks in advance for your help.


#2

Hi, your use case looks a lot like faceted search. Most of our customers do that through Elasticsearch. If you don’t require any full-text features, then you should use N1QL and GSI directly instead of views.
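As a rough sketch of what that could look like: one composite GSI index that leads with the two mandatory filters, plus a small Python helper that assembles a parameterized N1QL statement from whichever optional filters the user picked. The bucket name `housing`, the index name, and all field names are assumptions for illustration, not taken from your schema, and the helper only builds the statement text; running it through an SDK is left out.

```python
# Assumed bucket, index, and field names; adjust to the real document layout.
CREATE_INDEX = (
    "CREATE INDEX idx_housing_search "
    "ON `housing`(city, listing_type, locality, bhk, price) "
    "USING GSI"
)

# Optional filters mapped to the N1QL predicate each one contributes.
OPTIONAL_PREDICATES = {
    "property_type": "h.property_type = $property_type",
    "locality": "h.locality = $locality",
    "posted_by": "h.posted_by = $posted_by",
    "bhk": "h.bhk = $bhk",
    "min_price": "h.price >= $min_price",
    "max_price": "h.price <= $max_price",
}


def build_query(city, listing_type, **filters):
    """Return (statement, named_params) for a search with optional filters."""
    unknown = set(filters) - set(OPTIONAL_PREDICATES)
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    # city and listing_type are always present, matching the index's leading keys.
    predicates = ["h.city = $city", "h.listing_type = $listing_type"]
    params = {"city": city, "listing_type": listing_type}
    for name in OPTIONAL_PREDICATES:  # fixed order keeps the statement stable
        if filters.get(name) is not None:
            predicates.append(OPTIONAL_PREDICATES[name])
            params[name] = filters[name]
    statement = "SELECT h.* FROM `housing` AS h WHERE " + " AND ".join(predicates)
    return statement, params


statement, params = build_query("Bangalore", "rent", bhk=2, max_price=30000)
print(statement)
print(params)
```

Since the city and the rent/buy/sell flag are always supplied, keeping them as the leading index keys should let every variant of the query use the same index, rather than one view per attribute.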


#3

Thank you Idoguin, and many thanks for your response. This is indeed very helpful. Some more details on our use cases:

Our data set will contain several different types of documents, but in total it will be in the range of ~100M documents of 1 to 10 KB each (roughly 100 GB to 1 TB of raw data). We can broadly split the data set into two parts:

  1. Documents generated by users for their own purposes, such as a user profile. We will also have use cases where a portion of a user's profile needs to be accessed by other users of the system.

  2. Documents generated by users which will be queried/searched/filtered by other users, for example the house posts in the use case described earlier in this thread.

As I understand it, for case (1) above we will probably be fine with views, and for case (2) we should use N1QL queries with GSI indexing. This may vary with the search pattern of each use case, but broadly, would you agree with this understanding?

For storing/serving the documents discussed in this thread (case 2), as a starting point we are running a performance test on a data set of 1M documents using N1QL/GSI on a 2-core / 4 GB machine. We will keep our question on this forum live and update it with our test results.

Thanks again for your help.