Performance on CB4


#1

Hi,

I would like to use CB4 for a big data project with 10 billion documents in a single bucket.
Each document contains up to 100 integer name/value properties, one thumbnail image (20 KB) and one binary file (1 KB).
I mainly need to filter documents by date, time and other integer properties (no full-text search needed).
I use .NET and Java SDKs.

My questions:

  • Is Couchbase the right technology?
  • Can CB4 hold that many documents in a single bucket?
  • How many nodes do I need (without replication)?
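The figures above allow a rough estimate of the raw data volume before any Couchbase overhead. This Java sketch assumes about 25 bytes per integer name/value pair once serialized as JSON (a made-up figure that depends on property-name lengths) and counts a single copy of the data, with no replicas, indexes or per-document metadata.

```java
// Back-of-envelope raw storage estimate for the dataset described in the question.
// ASSUMPTION: ~25 bytes per integer name/value pair in JSON (hypothetical).
// Excludes replicas, indexes and per-document metadata.
public class RawSizeEstimate {
    static final long DOCS = 10_000_000_000L;       // 10 billion documents
    static final long THUMBNAIL_BYTES = 20 * 1024;  // 20 KB thumbnail image
    static final long BINARY_BYTES = 1024;          // 1 KB binary file
    static final long INT_PROPS = 100;              // up to 100 integer properties
    static final long BYTES_PER_PROP = 25;          // assumed serialized size per pair

    static long bytesPerDoc() {
        return THUMBNAIL_BYTES + BINARY_BYTES + INT_PROPS * BYTES_PER_PROP;
    }

    static long totalBytes() {
        return DOCS * bytesPerDoc();
    }

    public static void main(String[] args) {
        double tib = totalBytes() / Math.pow(1024, 4);
        System.out.printf("bytes/doc = %d, total = %d bytes (~%.0f TiB)%n",
                bytesPerDoc(), totalBytes(), tib);
    }
}
```

Under these assumptions it prints roughly 218 TiB. If the thumbnail and binary file are embedded in the JSON as Base64, their size grows by about a third.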

Thanks

Oren


#2

What’s your expected ‘active’ dataset and can you tolerate higher latencies? (Resident ratio less than 100%, some cache misses.)
Can you tolerate negative lookups going to disk? (Value vs Full ejection, higher latency and IO utilization)
What’s the insert/update/index rate?

Technically, a single Couchbase node with enough storage and full ejection can “hold” the data; it’s all about how you’re going to use it.
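The value vs. full ejection trade-off can be made concrete. With value ejection, keys and per-document metadata stay in RAM even when the values live on disk, so at 10 billion documents that overhead alone is large; full ejection also evicts keys and metadata, which is why a node can "hold" more data at the price of disk reads, including for negative lookups. This sketch assumes about 56 bytes of metadata per document (the figure used in older Couchbase sizing guides; verify it for your version) and hypothetical 36-byte keys such as UUID strings, and ignores replicas.

```java
// Rough RAM needed only for keys + per-document metadata under value ejection,
// where keys and metadata remain resident even if values are ejected to disk.
// ASSUMPTIONS: ~56 bytes metadata per document, 36-byte keys (hypothetical).
public class MetadataRamEstimate {
    static final long DOCS = 10_000_000_000L;  // 10 billion documents
    static final long METADATA_BYTES = 56;     // assumed per-document metadata
    static final long KEY_BYTES = 36;          // assumed key length (e.g. a UUID string)

    static long metadataRamBytes(long docs, long keyBytes) {
        return docs * (METADATA_BYTES + keyBytes);
    }

    public static void main(String[] args) {
        long bytes = metadataRamBytes(DOCS, KEY_BYTES);
        System.out.printf("value ejection: ~%.0f GiB of RAM for keys + metadata%n",
                bytes / Math.pow(1024, 3));
    }
}
```

Under these assumptions it prints about 857 GiB, before any document values are cached in memory.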


#3

Thanks for the response. Here is some additional info, since I’m getting bad performance.

I have 2M documents, each containing 30 properties.
I use a single node on a powerful machine with 40 logical CPUs and 32 GB RAM.

When I filter by a time range and 6 additional properties, the query takes more than 1 minute.
All relevant properties are indexed (GSI).

The N1QL query looks something like:

[screenshot of the N1QL query]
It returns 100 documents after 1 minute.

How can I improve it? On SQL Server the same query takes 5-10 seconds.


#4

SELECT id, time FROM suspectentity
WHERE time >= '2016-01-19T09:00' AND time <= '2016-01-19T11:00'
  AND channelId = 12
  AND metadata.faceHat = true
  AND metadata.faceBeard = false
  AND metadata.clothingShirtColor = 1
  AND metadata.clothingPantsColor = 1
LIMIT 10000;


#5

I have no experience with N1QL, but most likely you’ll also need to post what your indexes and views look like for an expert to help diagnose this.
If I had to guess, though, I’d say your indexes are not defined properly for your needs.
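On the index point: with several single-field indexes, the engine typically scans one of them (or intersects a few) and then has to fetch and filter documents for the remaining predicates. This Java sketch is an in-memory model of a composite index, not Couchbase code: with the equality fields first and the range field (time) last, every match sits in one contiguous key range that a single ordered scan reads. In N1QL terms this would correspond to a hypothetical GSI index such as `CREATE INDEX idx_suspect_filter ON suspectentity(channelId, metadata.faceHat, metadata.faceBeard, metadata.clothingShirtColor, metadata.clothingPantsColor, time) USING GSI;` (the index name is made up). Whether your Couchbase version pushes the predicates on all of those keys down to the index is worth confirming with EXPLAIN. The sketch also assumes every stored `time` value uses the same ISO-8601 string format, so that string order matches chronological order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory model of a composite index: equality keys first, range key (time) last.
// Matching entries are contiguous, so one subMap range scan finds all of them.
public class CompositeIndexSketch {
    // Hypothetical index key: channelId, four metadata fields, then time.
    static final class Key {
        final int channelId;
        final boolean faceHat;
        final boolean faceBeard;
        final int shirtColor;
        final int pantsColor;
        final String time; // ISO-8601 string; assumes one fixed format for all values

        Key(int channelId, boolean faceHat, boolean faceBeard,
            int shirtColor, int pantsColor, String time) {
            this.channelId = channelId;
            this.faceHat = faceHat;
            this.faceBeard = faceBeard;
            this.shirtColor = shirtColor;
            this.pantsColor = pantsColor;
            this.time = time;
        }
    }

    // Lexicographic order over the key fields, in index-key order.
    static final Comparator<Key> ORDER = Comparator
            .comparingInt((Key k) -> k.channelId)
            .thenComparing((Key k) -> k.faceHat)      // Boolean order: false < true
            .thenComparing((Key k) -> k.faceBeard)
            .thenComparingInt((Key k) -> k.shirtColor)
            .thenComparingInt((Key k) -> k.pantsColor)
            .thenComparing((Key k) -> k.time);        // range key last

    // Each index entry maps to the IDs of the documents that share that key.
    private final NavigableMap<Key, List<String>> index = new TreeMap<>(ORDER);

    void add(String docId, int channelId, boolean faceHat, boolean faceBeard,
             int shirtColor, int pantsColor, String time) {
        Key key = new Key(channelId, faceHat, faceBeard, shirtColor, pantsColor, time);
        index.computeIfAbsent(key, k -> new ArrayList<>()).add(docId);
    }

    // Equality on the leading keys plus an inclusive time range: a single range scan.
    List<String> scan(int channelId, boolean faceHat, boolean faceBeard,
                      int shirtColor, int pantsColor, String from, String to) {
        Key lo = new Key(channelId, faceHat, faceBeard, shirtColor, pantsColor, from);
        Key hi = new Key(channelId, faceHat, faceBeard, shirtColor, pantsColor, to);
        List<String> ids = new ArrayList<>();
        for (List<String> docIds : index.subMap(lo, true, hi, true).values()) {
            ids.addAll(docIds);
        }
        return ids;
    }

    public static void main(String[] args) {
        CompositeIndexSketch idx = new CompositeIndexSketch();
        idx.add("doc1", 12, true, false, 1, 1, "2016-01-19T09:30");
        idx.add("doc2", 12, true, false, 1, 1, "2016-01-19T12:00");  // after the range
        idx.add("doc3", 12, false, false, 1, 1, "2016-01-19T10:00"); // faceHat differs
        idx.add("doc4", 7, true, false, 1, 1, "2016-01-19T10:15");   // other channel
        idx.add("doc5", 12, true, false, 1, 1, "2016-01-19T11:00");  // inclusive upper bound
        System.out.println(idx.scan(12, true, false, 1, 1,
                "2016-01-19T09:00", "2016-01-19T11:00")); // prints [doc1, doc5]
    }
}
```

If the range field came first instead, the scan would have to walk every entry in the time window and discard the ones that fail the equality checks, which is the kind of extra work a poorly ordered or single-field index causes.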