Any idea about big latency in some exact time


#1

Our couchbase server version is 4.5.

We have 52 nodes with totally 2T RAM (about 1.3T in use).

We use 6 16 core vps as client using go sdk.

But some strange happened, all our 6 client has big latency in every hh:mm:46.

We restart system with diff time, and also does the client. Try to eliminate this it not caused by GC. But it still has big latency in every 46 secodes.

Any idea about why this happened?


#2

Hey @sdrzlyz,

Are you getting the latency spikes every 46s, or once a minute at exactly 46s after the minute?

Cheers, Brett


#3

“once a minute at exactly 46s after the minute?”

not exactly a minute. this is out log:

most of latency happens on hh:mm:46

2018/03/13 10:55:46 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=5136A7DA2F94E7CB8581F86EA76211E3, len=3243, using=144.030273ms)
2018/03/13 10:55:57 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=385E9A713FA716FBC8A8013BD36D1C94, len=50000, using=208.557947ms)
2018/03/13 10:56:01 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=7903A3C84F2607D8D1C8BDA8D77CC96E, len=16585, using=218.107185ms)
2018/03/13 10:56:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=90BEF66484376E905E4C571F789C9A62, len=7820, using=346.834758ms)
2018/03/13 10:56:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=DDF2A3BB4DE4D73E68706D548AA1A962, len=3701, using=244.263723ms)
2018/03/13 10:56:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=3162B1BE27968C0B1E9CC7DFA611AE22, len=8252, using=212.183201ms)
2018/03/13 10:57:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=151527725400200070, len=20, using=514.295794ms)
2018/03/13 10:57:46 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=08D30ABC9FA40C782CFEA86085693634, len=1267, using=265.080094ms)
2018/03/13 10:58:15 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=9FCE20276105CA19364CCD5DCEA90968, len=4321, using=224.236146ms)
2018/03/13 10:58:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=4D8090606EF1265504274CAF61AC2FA8, len=10452, using=237.205692ms)
2018/03/13 10:58:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=151855930900100080, len=3119, using=148.492957ms)
2018/03/13 10:59:44 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=3DD3FF7D847EE445E4910D4011E6F382, len=19886, using=447.173572ms)
2018/03/13 11:01:40 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=6B797AE803726C758C0163AF086DA0B3, len=33087, using=218.828686ms)
2018/03/13 11:02:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=5A9339A41ECC3C86BD908E8128C3120B, len=7072, using=392.399193ms)
2018/03/13 11:02:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=9E5386450A1E4E9D180E4108E24D17DC, len=1204, using=106.8154ms)
2018/03/13 11:02:46 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=3781DE3C159376DD3D1E9D1A9B4B8B5D, len=2302, using=468.003032ms)
2018/03/13 11:02:46 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=915837053BF3EC4B1A0F363F9B015E9F, len=3812, using=402.645059ms)
2018/03/13 11:02:46 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=709B1DE20F783F12E09ED0F9BFDE60E5, len=4218, using=250.856134ms)
2018/03/13 11:02:46 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=D267AB803488E0AA2FEC973A2EC831A8, len=4505, using=242.349954ms)
2018/03/13 11:02:46 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=4D622E0C8BD02E8FA6EDD6762A5C393C, len=1580, using=377.564875ms)
2018/03/13 11:03:02 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=2A5C706EE728F922E933C65D476542B9, len=20082, using=215.607699ms)
2018/03/13 11:03:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=90F9C76840CE762CECE7B075748783DB, len=14533, using=147.978343ms)
2018/03/13 11:04:20 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=78A0AD518BF8ED9622246A25DF127112, len=35966, using=210.356457ms)
2018/03/13 11:04:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=E09EFD682E674BD1BD12D2ED629C9E89, len=35500, using=186.34326ms)
2018/03/13 11:04:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=51F076AFD74EAD3666BD8DE9D7851A30, len=368, using=413.787265ms)
2018/03/13 11:04:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=A2F10801E58EB16256D31E0BF0C88CA9, len=5011, using=248.741517ms)
2018/03/13 11:05:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=FAA61C9A19FCCE1108BD1C8BB1C0C154, len=2825, using=137.096713ms)
2018/03/13 11:06:46 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=151519971600200060, len=160, using=170.589706ms)
2018/03/13 11:09:03 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=931F034601F4E8F1B2D6205ABAB2A714, len=43061, using=166.663092ms)
2018/03/13 11:11:44 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=7F8A89F95E173680E365FF9F8809C273, len=8265, using=180.929513ms)
2018/03/13 11:12:15 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=36779EFEFAEA18D4513AEDEDD07315A6, len=45496, using=142.710739ms)
2018/03/13 11:14:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=968E79CB66A1C32A06362B373641B242, len=19355, using=322.824013ms)
2018/03/13 11:15:46 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=FB40C3AC80B4D7B671AAA40EF2CFA24B, len=5289, using=112.915831ms)
2018/03/13 11:16:00 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=DBD5642B3B32C97E1A9423509B057C1F, len=1290, using=204.183217ms)
2018/03/13 11:17:46 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=8B70370684E5943AF188256C8DBF5679, len=4894, using=393.481354ms)
2018/03/13 11:17:46 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=150740932900100090, len=3465, using=266.825237ms)
2018/03/13 11:18:30 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=A569561C5F5FE4AF73997E3D6BE643F6, len=770, using=161.230683ms)
2018/03/13 11:18:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=C45FB25B52321B87A5DE23180FADA10D, len=2066, using=366.023731ms)
2018/03/13 11:18:46 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=9D56DAB701940944ED2667D0CBF7391A, len=4784, using=409.214641ms)
2018/03/13 11:19:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=6F704F0406AC2E1727FE20EF4552E338, len=1003, using=334.490226ms)
2018/03/13 11:19:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=442131B951DA2356E4CC1A1F7C53ECFF, len=5763, using=292.441241ms)
2018/03/13 11:21:43 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=6292D19F377A56259BB6A2B9C7AA1C21, len=16662, using=124.055954ms)
2018/03/13 11:22:09 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=32F9B5E5C147026F580396B8715781D6, len=14130, using=159.970836ms)
2018/03/13 11:23:12 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=11F2084C884275D1A3DD8C8000204B7F, len=43887, using=185.904493ms)
2018/03/13 11:23:46 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=ED4DC4775975EB3AD02A1B6C32C56637, len=2457, using=193.988506ms)
2018/03/13 11:25:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=9B317156E6CEA81B0F650469A128D1C7, len=9145, using=417.873934ms)
2018/03/13 11:26:56 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=391C6144EA73C88F06BF81AC924AF451, len=50000, using=203.74487ms)
2018/03/13 11:27:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=A4CA09AC8F8AECACC2D3DE53353ECAFD, len=50000, using=319.487328ms)
2018/03/13 11:27:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=446D6EEB3F5BFBE8B54E1FA5B604A84B, len=584, using=232.849808ms)
2018/03/13 11:27:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=562BD080FA909F1E9C7E3521B5A39CE8, len=458, using=213.542463ms)
2018/03/13 11:27:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=2A573CEC9093CF7A676BB3F988000641, len=3071, using=134.862658ms)
2018/03/13 11:27:45 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=CCB6F8EFE47B059398EAB2B75403B75C, len=8232, using=205.792066ms)
2018/03/13 11:27:46 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=94264E9AB14C6253A4233F08F79C9083, len=890, using=402.530854ms)
2018/03/13 11:27:46 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=2E1C94106A94E8AB4C872ECACCA5A215, len=6295, using=388.646203ms)
2018/03/13 11:27:46 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=7B7EEB0EB2D3A0EBA2BBF44558CD5DD3, len=1947, using=320.278341ms)
2018/03/13 11:27:46 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=94525B06BC730D70F34134F188C15028, len=5082, using=226.585353ms)
2018/03/13 11:27:46 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=9E8D59DDEC9555F9EED3F0A77AFDC71E, len=646, using=433.21981ms)
2018/03/13 11:27:46 [W] [GetVideo]xxxxx.get.latency gt than 100ms(id=F89C40D041D930CF0D602D13E95E7B31, len=12132, using=130.521243ms)

#4

Is this caused by cccpLooper because we have too many nodes?


#5

It’s likely that this is caused by some kind of polling event. Either internal, external or remote (for instance, a Go timer running every second, another process incurring CPU load every minute, or a service generating high-load against your service every minute, respectively). Note that CCCPLooper actually only contacts a single node per iteration, and those iterations are every 2.5s typically. I think this will require more deep diagnosis of the service to provide any insightful aid. You should start by trying to watch the load on the application server and the Couchbase server, and see if the loads there correlate to your timeouts at all. This will at least give us a direction to look in.

Cheers, Brett


#6
func main() {
	cluster, err := gocb.Connect(cbAddr)
	if err != nil {
		log.Fatalf("gocb connect err: %v", err)
	}

	bkt, err := cluster.OpenBucket("BKTNAME", "HIDDENPWD")
	if err != nil {
		log.Fatalf(err.Error())
	}

	f, err := os.Open("idlist")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	r := bufio.NewReader(f)

	for {
		data, _, err := r.ReadLine()
		if err != nil {
			if err != io.EOF {
				fmt.Println(err)
			}
			break
		}

		key := string(data)
		err = fetch(bkt, key)
		if err != nil {
			fmt.Println(key, err)
		}
	}

	fmt.Println("all done")
}

func fetch(bkt *gocb.Bucket, key string) error {
        begin := time.Now()

	var resAll []int64
	cas, err := bkt.Get(key, &resAll)
	if err != nil {
		return err
	}

	if using := time.Since(begin); using > 100*time.Millisecond {
		log.Println(key, len(resAll), using)
	}

	return nil
}

I wrote an sample code, the output slows obvious on hh:mm:46~. I run this in a new computer.

OutLog:

2018/03/13 13:15:46 0482996A71209792AEA18364CFBC5CAA 1889 191.905876ms
2018/03/13 13:18:46 0CC3E55348258E068906B7617BCF49F4 683 428.717665ms
2018/03/13 13:18:46 0CC3EE7BC1E5BEDE1419BEBA9F0E73F2 32 283.442529ms
2018/03/13 13:19:45 0F803C833B53F86E8EB31CF84397B7C6 3407 325.174088ms
2018/03/13 13:19:46 0F8045C1AC472309A9D1325EECBA0F83 753 264.142864ms
2018/03/13 13:19:46 0F80477D14628FACA9C42B4629850A94 88 327.937484ms
2018/03/13 13:20:46 1254A5843B40CE3A6F7AF395FCDCF6D8 867 111.385336ms
2018/03/13 13:21:46 151F88B3BD9393C2C51EDE74B49E853D 527 236.131847ms
2018/03/13 13:22:46 17E592846BFDBEA7F057798C74FEBD9A 16 349.096649ms
2018/03/13 13:22:51 181F78740336486331DBDA27CE4A0CF2 24 141.099989ms
2018/03/13 13:23:00 187C6ADF66333F62DE28D8F0631013F1 643 119.311609ms
2018/03/13 13:23:01 188BE0CCC819FB34E6726A862D629444 15 164.884969ms
2018/03/13 13:23:02 189914F098E0F78343578EDA7A9DDEEC 1739 179.862595ms
2018/03/13 13:23:05 18B786D6ED8D5EA371F5603534A8C6F8 48 143.584642ms
2018/03/13 13:23:09 18E1D9702761C239312E45EC7E5E3F06 3 193.567502ms
2018/03/13 13:23:18 1936016329D3EDDE72707182EF9AA3FE 7342 100.676584ms
2018/03/13 13:23:45 1A6F605ABD6AD9E18EA9A61FDFCE4AEA 747 320.885497ms
2018/03/13 13:24:46 1D1C9BC8DDEA07564AB510653431F238 120 107.740635ms
2018/03/13 13:24:46 1D1C9D55DE8B55D837D71C41D39F0B81 248 195.092573ms
2018/03/13 13:24:57 1D9BE360D9BE668E0F058448B7D3FD3B 499 176.549094ms
2018/03/13 13:24:59 1DAB4BC0F091D6F7876EF70852ABBAD4 40 102.988922ms
2018/03/13 13:25:45 1FBE9E0C80D4F36CAE3E88086804AD88 24 468.368547ms
2018/03/13 13:26:46 227C21EFB9DA1FEB7C4453ABFF7028A4 162 444.850454ms
2018/03/13 13:29:10 28D0CEA7CDC4183B627505980CB328E2 151 184.89026ms
2018/03/13 13:29:18 2924D65ED77308E89F209A55FB88D73B 334 100.634019ms
2018/03/13 13:29:46 2A59716CF0BC60F38929CB25C0BEC2A5 31 397.5426ms
2018/03/13 13:30:49 2D17BC21E9FE057864EBAC34D73CAEEA 160 179.036252ms
2018/03/13 13:30:54 2D56E596B00EDA23B9F9669723B14074 4697 165.560864ms
2018/03/13 13:31:03 2DB30EC4F032681B14D6E0490598876D 13 106.165433ms
2018/03/13 13:31:05 2DC562AFA96AA8161F75083F29C6A7EE 1712 200.460079ms
2018/03/13 13:31:10 2DF24632DE1B66FA2C80AB27C484163B 782 137.410401ms
2018/03/13 13:31:45 2F6F7F94E376F9748B91C0E510377974 329 382.913412ms
2018/03/13 13:32:46 321B7D522EE6F65EA7A0443F6E2D9207 1710 471.936111ms
2018/03/13 13:33:46 34A94114DD8D7CDF1A2C77113070CBCA 519 479.487448ms
2018/03/13 13:34:45 373916E0E3A964D6606E6DDA69B588F1 1144 133.013811ms
2018/03/13 13:35:46 39D9BE0100E5882FA97A60CCA45306ED 32 493.156687ms
2018/03/13 13:36:46 3C81DD0E2870A930835A7C46B8B70C7C 16 318.365395ms
2018/03/13 13:37:47 3F2BF7969275BB559BD2E903EDE04CBF 7 125.864678ms
2018/03/13 13:37:55 3F804EB334011E54175C5018DC5BE6E4 439 181.56622ms
2018/03/13 13:37:58 3F9E1432EB7DC8ECFBE62033220C4A2D 15340 125.711557ms
2018/03/13 13:38:00 3FB675070557AC713B8CE82CE9E2D1FF 144 107.852665ms
2018/03/13 13:38:01 3FC07B7B28B412DB324177592298564E 1846 124.091357ms
2018/03/13 13:38:02 3FC56709847E0FB7529A4BB0EE271B30 1097 111.710949ms
2018/03/13 13:38:09 4011ED5505D9472BB01E2FBD4209BAB5 470 154.967449ms
2018/03/13 13:38:24 40B73BD6FE614279BED53D83255EF6BB 29 106.095159ms
2018/03/13 13:39:46 4443A2C785E789006ADA3E5674BB5418 436 458.083389ms
2018/03/13 13:40:45 46E3465D4BA57A07FAD443F0923D624E 48 415.760964ms
2018/03/13 13:40:53 473E59CA893F08118C8847D0D5B05C13 40 151.916716ms
2018/03/13 13:41:44 4970ED9598858F13E4FF2443AAD4B960 456 138.385198ms
2018/03/13 13:41:46 49849DE1C62D96DBDC71E6C24F0D6832 316 467.449948ms
2018/03/13 13:41:46 49897E113654C1485509987D36BB1CEF 6642 108.476388ms

#7

Anything more detail do I need to provide?


#8

Hey @sdrzlyz,
Can you take a look at the CPU load of your application server and Couchbase Server (though based on your report that running that app on a new server also exhibits the same behaviour, I would tend to think its an issue with your Couchbase node)?