ep_num_eject_failures is much higher (up to 1000x) than ep_num_value_ejects


#1

Worth noting that this may be caused by the bug that I reported before, or by this bug: http://www.couchbase.com/communities/q-and-a/ram-stats-do-not-reconcile-and-evictions-seems-be-not-working-expected or this one: http://www.couchbase.com/communities/q-and-a/data-loss-251

The server works for some time, then stops ejecting items and throws TEMP OOM errors. Only a restart helps.


#2

We are having problems with ejections, and the stats show that ejections are barely working. What is the likely reason, and which stats/logs should I check first?


#3

cbstats output from one of the servers:

accepting_conns: 1
auth_cmds: 4
auth_errors: 0
bucket_active_conns: 1
bucket_conns: 57
bytes: 2923459464
bytes_read: 50697857251
bytes_written: 3663478624
cas_badval: 0
cas_hits: 0
cas_misses: 0
cmd_flush: 0
cmd_get: 9378
cmd_set: 241113
conn_yields: 1947
connection_structures: 5000
curr_connections: 259
curr_conns_on_port_11209: 111
curr_conns_on_port_11210: 146
curr_items: 3715265
curr_items_tot: 7066672
curr_temp_items: 0
daemon_connections: 4
decr_hits: 0
decr_misses: 0
delete_hits: 0
delete_misses: 0
ep_access_scanner_last_runtime: 0
ep_access_scanner_num_items: 0
ep_access_scanner_task_time: 2014-04-26 10:00:00
ep_allow_data_loss_during_shutdown: 1
ep_alog_block_size: 4096
ep_alog_path: /media/data/default/access.log
ep_alog_sleep_time: 1440
ep_alog_task_time: 10
ep_backend: couchdb
ep_bg_fetch_delay: 0
ep_bg_fetched: 3505
ep_bg_load: 65111098
ep_bg_load_avg: 18576
ep_bg_max_load: 467888
ep_bg_max_wait: 177918
ep_bg_meta_fetched: 0
ep_bg_min_load: 102
ep_bg_min_wait: 30
ep_bg_num_samples: 3505
ep_bg_remaining_jobs: 0
ep_bg_wait: 2618920
ep_bg_wait_avg: 747
ep_chk_max_items: 5000
ep_chk_period: 1800
ep_chk_persistence_remains: 0
ep_chk_persistence_timeout: 10
ep_chk_remover_stime: 5
ep_commit_num: 221903
ep_commit_time: 11
ep_commit_time_total: 2955151
ep_config_file:
ep_conflict_resolution_type: seqno
ep_couch_bucket: default
ep_couch_host: 127.0.0.1
ep_couch_port: 11213
ep_couch_reconnect_sleeptime: 250
ep_couch_response_timeout: 180000
ep_data_traffic_enabled: 0
ep_dbname: /media/data/default
ep_degraded_mode: 0
ep_diskqueue_drain: 3670099
ep_diskqueue_fill: 3670101
ep_diskqueue_items: 2
ep_diskqueue_memory: 64
ep_diskqueue_pending: 234
ep_exp_pager_stime: 3600
ep_expired_access: 0
ep_expired_pager: 0
ep_expiry_window: 3
ep_failpartialwarmup: 0
ep_flush_all: false
ep_flush_duration_total: 3868
ep_flushall_enabled: 0
ep_flusher_state: running
ep_flusher_todo: 1
ep_getl_default_timeout: 15
ep_getl_max_timeout: 30
ep_ht_locks: 5
ep_ht_size: 3079
ep_inconsistent_slave_chk: 0
ep_initfile:
ep_io_num_read: 332114
ep_io_num_write: 3734036
ep_io_read_bytes: 3429644805
ep_io_write_bytes: 41108043159
ep_item_begin_failed: 0
ep_item_commit_failed: 0
ep_item_flush_expired: 0
ep_item_flush_failed: 0
ep_item_num_based_new_chk: 1
ep_items_rm_from_checkpoints: 4054
ep_keep_closed_chks: 0
ep_klog_block_size: 4096
ep_klog_compactor_queue_cap: 500000
ep_klog_compactor_stime: 3600
ep_klog_flush: commit2
ep_klog_max_entry_ratio: 10
ep_klog_max_log_size: 2147483647
ep_klog_path:
ep_klog_sync: commit2
ep_kv_size: 2737261629
ep_max_bg_remaining_jobs: 0
ep_max_checkpoints: 2
ep_max_data_size: 7340032000
ep_max_item_size: 20971520
ep_max_num_workers: 4
ep_max_size: 7340032000
ep_max_txn_size: 10000
ep_max_vbuckets: 1024
ep_mem_high_wat: 3039027200
ep_mem_low_wat: 1039027200
ep_mem_tracker_enabled: true
ep_meta_data_memory: 962414800
ep_mlog_compactor_runs: 0
ep_mutation_mem_threshold: 95
ep_num_access_scanner_runs: 0
ep_num_eject_failures: 331083391
ep_num_expiry_pager_runs: 0
ep_num_non_resident: 6907022
ep_num_not_my_vbuckets: 2596
ep_num_ops_del_meta: 0
ep_num_ops_del_meta_res_fail: 0
ep_num_ops_del_ret_meta: 0
ep_num_ops_get_meta: 0
ep_num_ops_get_meta_on_set_meta: 0
ep_num_ops_set_meta: 0
ep_num_ops_set_meta_res_fail: 0
ep_num_ops_set_ret_meta: 0
ep_num_pager_runs: 87
ep_num_value_ejects: 3363851
ep_oom_errors: 0
ep_overhead: 60788945
ep_pager_active_vb_pcnt: 40
ep_pending_ops: 0
ep_pending_ops_max: 0
ep_pending_ops_max_duration: 0
ep_pending_ops_total: 0
ep_postInitfile:
ep_queue_size: 2
ep_startup_time: 1398483585
ep_storage_age: 0
ep_storage_age_highwat: 183
ep_tap_ack_grace_period: 300
ep_tap_ack_initial_sequence_number: 1
ep_tap_ack_interval: 1000
ep_tap_ack_window_size: 10
ep_tap_backfill_resident: 0.9
ep_tap_backlog_limit: 5000
ep_tap_backoff_period: 5
ep_tap_bg_fetch_requeued: 0
ep_tap_bg_fetched: 332
ep_tap_bg_max_pending: 500
ep_tap_keepalive: 300
ep_tap_noop_interval: 20
ep_tap_requeue_sleep_time: 0.1
ep_tap_throttle_cap_pcnt: 10
ep_tap_throttle_queue_cap: 1000000
ep_tap_throttle_threshold: 90
ep_tmp_oom_errors: 0
ep_total_cache_size: 2650348920
ep_total_del_items: 0
ep_total_enqueued: 3734172
ep_total_new_items: 3484029
ep_total_persisted: 3734036
ep_uncommitted_items: 1
ep_uuid: 1feea58bbdfcacf8382f2716b0dc097c
ep_value_size: 1736474231
ep_vb0: 0
ep_vb_snapshot_total: 861
ep_vb_total: 342
ep_vbucket_del: 185
ep_vbucket_del_avg_walltime: 494345
ep_vbucket_del_fail: 0
ep_vbucket_del_max_walltime: 2164792
ep_version: 2.5.1_1083_rel
ep_waitforwarmup: 0
ep_warmup: 1
ep_warmup_batch_size: 1000
ep_warmup_dups: 0
ep_warmup_min_items_threshold: 100
ep_warmup_min_memory_threshold: 100
ep_warmup_oom: 0
ep_warmup_thread: complete
ep_warmup_time: 280564496
ep_workload_optimization: read
get_hits: 8988
get_misses: 390
incr_hits: 0
incr_misses: 0
libevent: 2.0.11-stable
limit_maxbytes: 67108864
listen_disabled_num: 0
max_conns_on_port_11209: 1000
max_conns_on_port_11210: 9000
mem_used: 2923459464
pid: 31212
pointer_size: 64
rejected_conns: 0
rusage_system: 453.715641
rusage_user: 1519.307413
tap_checkpoint_end_received: 171
tap_checkpoint_end_sent: 171
tap_checkpoint_start_received: 957
tap_checkpoint_start_sent: 1127
tap_connect_received: 28
tap_mutation_received: 4239027
tap_mutation_sent: 163939
tap_opaque_received: 428
tap_opaque_sent: 56
tcp_nodelay: enable
threads: 4
time: 1398486612
total_connections: 8124
uptime: 3037
vb_active_curr_items: 3715265
vb_active_eject: 14446
vb_active_expired: 0
vb_active_ht_memory: 33298080
vb_active_itm_memory: 581201744
vb_active_meta_data_memory: 505693856
vb_active_num: 171
vb_active_num_non_resident: 3710822
vb_active_ops_create: 2492
vb_active_ops_delete: 0
vb_active_ops_reject: 0
vb_active_ops_update: 112874
vb_active_perc_mem_resident: 0
vb_active_queue_age: 0
vb_active_queue_drain: 162948
vb_active_queue_fill: 162950
vb_active_queue_memory: 64
vb_active_queue_pending: 234
vb_active_queue_size: 2
vb_dead_num: 0
vb_pending_curr_items: 0
vb_pending_eject: 0
vb_pending_expired: 0
vb_pending_ht_memory: 0
vb_pending_itm_memory: 0
vb_pending_meta_data_memory: 0
vb_pending_num: 0
vb_pending_num_non_resident: 0
vb_pending_ops_create: 0
vb_pending_ops_delete: 0
vb_pending_ops_reject: 0
vb_pending_ops_update: 0
vb_pending_perc_mem_resident: 0
vb_pending_queue_age: 0
vb_pending_queue_drain: 0
vb_pending_queue_fill: 0
vb_pending_queue_memory: 0
vb_pending_queue_pending: 0
vb_pending_queue_size: 0
vb_replica_curr_items: 3351407
vb_replica_eject: 3209177
vb_replica_expired: 0
vb_replica_ht_memory: 26370112
vb_replica_itm_memory: 2069147176
vb_replica_meta_data_memory: 456720944
vb_replica_num: 171
vb_replica_num_non_resident: 3196200
vb_replica_ops_create: 3351407
vb_replica_ops_delete: 0
vb_replica_ops_reject: 0
vb_replica_ops_update: 137134
vb_replica_perc_mem_resident: 4
vb_replica_queue_age: 0
vb_replica_queue_drain: 3507151
vb_replica_queue_fill: 3507151
vb_replica_queue_memory: 0
vb_replica_queue_pending: 0
vb_replica_queue_size: 0
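
To make the numbers in a dump like this easier to compare, here is a minimal Python sketch that parses the "name: value" lines and computes a few ratios relevant to ejection. The function names (parse_cbstats, eviction_summary) and the choice of ratios are my own assumptions for illustration, not part of any Couchbase tool; the sample values are copied from the dump above.

```python
def parse_cbstats(text):
    """Parse 'name: value' lines into a dict; values are kept as strings."""
    stats = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as timestamps survive.
        name, sep, value = line.partition(":")
        if sep:
            stats[name.strip()] = value.strip()
    return stats


def eviction_summary(stats):
    """Compute illustrative ratios (hypothetical helper, not a Couchbase API)."""
    failures = int(stats["ep_num_eject_failures"])
    ejects = int(stats["ep_num_value_ejects"])
    mem_used = int(stats["mem_used"])
    high_wat = int(stats["ep_mem_high_wat"])
    meta = int(stats["ep_meta_data_memory"])
    return {
        # Failed ejection attempts per successful value ejection.
        "failures_per_eject": failures / ejects if ejects else float("inf"),
        # Memory in use relative to the high watermark.
        "mem_used_pct_of_high_wat": 100.0 * mem_used / high_wat,
        # Share of used memory taken by item metadata.
        "meta_data_pct_of_mem_used": 100.0 * meta / mem_used,
    }


# Values copied from the cbstats dump in this post.
sample = """
ep_mem_high_wat: 3039027200
ep_mem_low_wat: 1039027200
ep_meta_data_memory: 962414800
ep_num_eject_failures: 331083391
ep_num_value_ejects: 3363851
mem_used: 2923459464
"""

summary = eviction_summary(parse_cbstats(sample))
for key, value in summary.items():
    print(f"{key}: {value:.1f}")
```

On the dump above this gives roughly 98 failed ejection attempts per successful one, with mem_used at about 96% of ep_mem_high_wat and metadata taking about a third of the used memory. To use it on a whole node, pass the full cbstats text to parse_cbstats instead of the short sample.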


#4

I think I gave logs from the wrong server; use this instead: https://gist.github.com/buger/35677e83ec4b1abe1644


#5

If you haven’t already seen it, there’s a good blog post on monitoring a Couchbase cluster at: http://blog.couchbase.com/how-many-nodes-part-4-monitoring-sizing