Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-7987

Lustre 2.8 OSS with zfs 0.6.5 backend hitting most schedule_timeout

Details

    • Bug
    • Resolution: Low Priority
    • Critical
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      While trying to test for LU-7404 OST timeout the one of the OSS nodes became unaccessible via ssh and conman.

      This was running with 2 OSS nodes running 1 zfs zpool on each, 1 mds, and 4 clients running IOR. The first message that was seen was fairly soon after starting. About 2 hours after running lustre stopped responding to lustre clients and static scripts running on the OSS stopped running.

      First message seen was:
      Apr 4 21:06:29 wolf-4 kernel: LNet: Service thread pid 12256 was inactive for 200.30s. The thread might be hung, or it might only be slow an
      d will resume later. Dumping the stack trace for debugging purposes:

      Apr  4 21:06:29 wolf-4 kernel: Pid: 12256, comm: ll_ost02_000
      Apr  4 21:06:29 wolf-4 kernel: #012Call Trace:
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffff8163a9e9>] schedule+0x29/0x70
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffff816386d9>] schedule_timeout+0x209/0x2d0
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffffa012dc0c>] ? vdev_mirror_io_start+0xac/0x190 [zfs]
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffffa012d150>] ? vdev_mirror_child_done+0x0/0x30 [zfs]
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffff810d814c>] ? ktime_get_ts64+0x4c/0xf0
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffff8163a01e>] io_schedule_timeout+0xae/0x130
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffff810a6896>] ? prepare_to_wait_exclusive+0x56/0x90
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffff8163a0b8>] io_schedule+0x18/0x20
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffffa00116e0>] cv_wait_common+0xb0/0x150 [spl]
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffff810a6ae0>] ? autoremove_wake_function+0x0/0x40
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffffa00117d8>] __cv_wait_io+0x18/0x20 [spl]
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffffa016db83>] zio_wait+0x123/0x210 [zfs]
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffffa00c44ad>] dbuf_read+0x30d/0x930 [zfs]
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffffa00c60eb>] dmu_buf_will_dirty+0x4b/0xa0 [zfs]
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffffa00deb46>] dnode_dirty_l1+0x36/0x50 [zfs]
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffffa00e146a>] dnode_free_range+0x4da/0x6b0 [zfs]
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffffa00cb829>] dmu_free_long_range+0x1d9/0x280 [zfs]
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffffa11fe468>] osd_unlinked_object_free+0x28/0x280 [osd_zfs]
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffffa11fe7d3>] osd_unlinked_list_emptify+0x63/0xa0 [osd_zfs]
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffffa120112a>] osd_trans_stop+0x31a/0x5b0 [osd_zfs]
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffffa12fef9f>] ofd_trans_stop+0x1f/0x60 [ofd]
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffffa1302bb2>] ofd_object_destroy+0x2b2/0x890 [ofd]
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffffa12fb8f7>] ofd_destroy_by_fid+0x307/0x510 [ofd]
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffffa0fce170>] ? ldlm_blocking_ast+0x0/0x170 [ptlrpc]
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffffa0fc8730>] ? ldlm_completion_ast+0x0/0x910 [ptlrpc]
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffffa12edc67>] ofd_destroy_hdl+0x267/0xa50 [ofd]
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffffa105af65>] tgt_request_handle+0x915/0x1320 [ptlrpc]
      Apr  4 21:06:29 wolf-4 kernel: [<ffffffffa1007beb>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
      Apr  4 21:06:30 wolf-4 kernel: [<ffffffffa0cec578>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
      Apr  4 21:06:30 wolf-4 kernel: [<ffffffffa1006498>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      Apr  4 21:06:30 wolf-4 kernel: [<ffffffff810b8c22>] ? default_wake_function+0x12/0x20
      Apr  4 21:06:30 wolf-4 kernel: [<ffffffff810af028>] ? __wake_up_common+0x58/0x90
      Apr  4 21:06:30 wolf-4 kernel: [<ffffffffa100bc90>] ptlrpc_main+0xa90/0x1db0 [ptlrpc]
      Apr  4 21:06:30 wolf-4 kernel: [<ffffffffa100b200>] ? ptlrpc_main+0x0/0x1db0 [ptlrpc]
      Apr  4 21:06:30 wolf-4 kernel: [<ffffffff810a5aef>] kthread+0xcf/0xe0
      Apr  4 21:06:30 wolf-4 kernel: [<ffffffff810a5a20>] ? kthread+0x0/0xe0
      Apr  4 21:06:30 wolf-4 kernel: [<ffffffff81645998>] ret_from_fork+0x58/0x90
      Apr  4 21:06:30 wolf-4 kernel: [<ffffffff810a5a20>] ? kthread+0x0/0xe0
      Apr  4 21:06:30 wolf-4 kernel: 
      Apr  4 21:06:30 wolf-4 kernel: LustreError: dumping log to /tmp/lustre-log.1459803990.12256
      Apr  4 21:07:12 wolf-4 ntpd_intres[3829]: ntp_intres.request: permission denied
      Apr  4 21:07:12 wolf-4 ntpd_intres[3829]: ntp_intres.request: permission denied
      Apr  4 21:07:19 wolf-4 kernel: LNet: Service thread pid 12256 completed after 249.57s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
      

      But the system appeared to keep running past that and then come to a grinding halt.

      Run began: Mon Apr  4 23:25:07 2016
      Command line used: /usr/bin/IOR -i 3 -v -b 50G -t 128k
      Machine: Linux wolf-6.wolf.hpdd.intel.com
      Start time skew across all tasks: 8.97 sec
      Path: /mnt/lustre
      FS: 37.9 TiB   Used FS: 0.6%   Inodes: 171.8 Mi   Used Inodes: 32.0%
      Participating tasks: 4
      
      Summary:
              api                = POSIX
              test filename      = testFile
              access             = single-shared-file
              pattern            = segmented (1 segment)
              ordering in a file = sequential offsets
              ordering inter file= no tasks offsets
              clients            = 4 (1 per node)
              repetitions        = 3
              xfersize           = 131072 bytes
              blocksize          = 50 GiB
              aggregate filesize = 200 GiB
      

      No messages appeared on wolf-4 OST1 but it was noticed by clients:

      Apr  4 23:37:12 wolf-6 kernel: Lustre: 2811:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1459813025/real 0]  req@ffff880d8570ad00 x1530711614167736/t0(0) o400->coral-OST0001-osc-ffff881030662800@192.168.1.4@o2ib:28/4 lens 224/224 e 0 to 1 dl 1459813032 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
      Apr  4 23:37:12 wolf-6 kernel: Lustre: coral-OST0001-osc-ffff881030662800: Connection to coral-OST0001 (at 192.168.1.4@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      Apr  4 23:37:16 wolf-6 kernel: Lustre: 2751:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1459812992/real 1459812992]  req@ffff880266e0bc00 x1530711614167672/t0(0) o3->coral-OST0001-osc-ffff881030662800@192.168.1.4@o2ib:6/4 lens 608/432 e 0 to 1 dl 1459813036 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
      Apr  4 23:37:16 wolf-6 kernel: Lustre: 2751:0:(client.c:2063:ptlrpc_expire_one_request()) Skipped 8 previous similar messages
      Apr  4 23:37:18 wolf-6 kernel: Lustre: 2742:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1459813032/real 0]  req@ffff880bbe376900 x1530711614167740/t0(0) o8->coral-OST0001-osc-ffff881030662800@192.168.1.4@o2ib:28/4 lens 520/544 e 0 to 1 dl 1459813038 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
      Apr  4 23:37:37 wolf-6 kernel: Lustre: 2742:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1459813057/real 1459813057]  req@ffff880bbe375d00 x1530711614167756/t0(0) o8->coral-OST0001-osc-ffff881030662800@192.168.1.4@o2ib:28/4 lens 520/544 e 0 to 1 dl 1459813068 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Apr  4 23:38:02 wolf-6 kernel: Lustre: 2742:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1459813082/real 1459813082]  req@ffff880bbe375100 x1530711614167772/t0(0) o8->coral-OST0001-osc-ffff881030662800@192.168.1.4@o2ib:28/4 lens 520/544 e 0 to 1 dl 1459813098 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Apr  4 23:38:11 wolf-6 ntpd_intres[2008]: ntp_intres.request: permission denied
      Apr  4 23:38:11 wolf-6 ntpd_intres[2008]: ntp_intres.request: permission denied
      Apr  4 23:38:27 wolf-6 kernel: Lustre: 2742:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1459813107/real 1459813107]  req@ffff880bbe374500 x1530711614167788/t0(0) o8->coral-OST0001-osc-ffff881030662800@192.168.1.4@o2ib:28/4 lens 520/544 e 0 to 1 dl 1459813128 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Apr  4 23:38:52 wolf-6 kernel: Lustre: 2742:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1459813132/real 1459813132]  req@ffff880bbe373900 x1530711614167804/t0(0) o8->coral-OST0001-osc-ffff881030662800@192.168.1.4@o2ib:28/4 lens 520/544 e 0 to 1 dl 1459813158 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      

      From stats being captured on OST1:

          time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c  
      23:30:53     0     0      0     0    0     0    0     0    0   200M   62G  
          time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c  
      23:31:29     0     0      0     0    0     0    0     0    0   200M   62G  
          time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c  
      23:32:06     0     0      0     0    0     0    0     0    0   200M   62G  
          time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c  
      23:32:42     0     0      0     0    0     0    0     0    0   200M   62G  
          time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c  
      23:33:19     0     0      0     0    0     0    0     0    0   200M   62G  
          time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c  
      23:33:55     0     0      0     0    0     0    0     0    0    62G   62G  
          time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c  
      23:34:36     0     0      0     0    0     0    0     0    0    62G   62G  
          time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c  
      23:35:18    10    10    100    10  100     0    0     0    0    62G   62G  
          time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c  
      23:35:58    91    91    100    47  100    44  100     1  100    63G   62G  
          time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c  
      23:36:39     5     5    100     5  100     0    0     0    0    62G   62G  
      
      6 1 0x01 91 4368 5368758848 527202119883608
      name                            type data
      hits                            4    91379399
      misses                          4    70802556
      demand_data_hits                4    62879518
      demand_data_misses              4    52414186
      demand_metadata_hits            4    23738570
      demand_metadata_misses          4    1105675
      prefetch_data_hits              4    4757199
      prefetch_data_misses            4    17281282
      prefetch_metadata_hits          4    4112
      prefetch_metadata_misses        4    1413
      mru_hits                        4    62351920
      mru_ghost_hits                  4    917736
      mfu_hits                        4    24266169
      mfu_ghost_hits                  4    632285
      deleted                         4    116650113
      mutex_miss                      4    447
      evict_skip                      4    184
      evict_not_enough                4    62
      evict_l2_cached                 4    0
      evict_l2_eligible               4    15019191377920
      evict_l2_ineligible             4    408751927296
      evict_l2_skip                   4    0
      hash_elements                   4    1059177
      hash_elements_max               4    1126648
      hash_collisions                 4    11087755
      hash_chains                     4    31686
      hash_chain_max                  4    4
      p                               4    63241464064
      c                               4    67457282048
      c_min                           4    33554432
      c_max                           4    67457282048
      size                            4    67456702536
      hdr_size                        4    396803920
      data_size                       4    66823651328
      metadata_size                   4    76845056
      other_size                      4    159402232
      anon_size                       4    39337984
      anon_evictable_data             4    0
      anon_evictable_metadata         4    0
      mru_size                        4    64780652544
      mru_evictable_data              4    64752189440
      mru_evictable_metadata          4    327680
      mru_ghost_size                  4    2679063040
      mru_ghost_evictable_data        4    2082865152
      mru_ghost_evictable_metadata    4    596197888
      mfu_size                        4    2080505856
      mfu_evictable_data              4    2024931328
      mfu_evictable_metadata          4    3112960
      mfu_ghost_size                  4    56092389888
      mfu_ghost_evictable_data        4    56057790464
      mfu_ghost_evictable_metadata    4    34599424
      l2_hits                         4    0
      l2_misses                       4    0
      l2_feeds                        4    0
      l2_rw_clash                     4    0
      l2_read_bytes                   4    0
      l2_write_bytes                  4    0
      l2_writes_sent                  4    0
      l2_writes_done                  4    0
      l2_writes_error                 4    0
      l2_writes_lock_retry            4    0
      l2_evict_lock_retry             4    0
      l2_evict_reading                4    0
      l2_evict_l1cached               4    0
      l2_free_on_write                4    0
      l2_cdata_free_on_write          4    0
      l2_abort_lowmem                 4    0
      l2_cksum_bad                    4    0
      l2_io_error                     4    0
      l2_size                         4    0
      l2_asize                        4    0
      l2_hdr_size                     4    0
      l2_compress_successes           4    0
      l2_compress_zeros               4    0
      l2_compress_failures            4    0
      memory_throttle_count           4    0
      duplicate_buffers               4    0
      duplicate_buffers_size          4    0
      duplicate_reads                 4    0
      memory_direct_count             4    0
      memory_indirect_count           4    0
      arc_no_grow                     4    0
      arc_tempreserve                 4    0
      arc_loaned_bytes                4    0
      arc_prune                       4    0
      arc_meta_used                   4    633051832
      arc_meta_limit                  4    50592961536
      arc_meta_max                    4    809615080
      arc_meta_min                    4    16777216
      arc_need_free                   4    0
      arc_sys_free                    4    2108039168 
      
      
      proc_arcstats.2016-04-04:21:52:28:misses                          4    60302081
      proc_arcstats.2016-04-04:21:53:05:misses                          4    60302081
      proc_arcstats.2016-04-04:21:53:41:misses                          4    60302081
      proc_arcstats.2016-04-04:21:54:18:misses                          4    60302081
      proc_arcstats.2016-04-04:21:54:55:misses                          4    60302081
      proc_arcstats.2016-04-04:21:55:31:misses                          4    60302081
      proc_arcstats.2016-04-04:21:56:08:misses                          4    60302081
      proc_arcstats.2016-04-04:21:56:44:misses                          4    60302081
      proc_arcstats.2016-04-04:21:57:21:misses                          4    60302081
      proc_arcstats.2016-04-04:21:57:57:misses                          4    60302081
      proc_arcstats.2016-04-04:21:58:34:misses                          4    60302081
      proc_arcstats.2016-04-04:21:59:11:misses                          4    60302081
      proc_arcstats.2016-04-04:21:59:47:misses                          4    60302099
      proc_arcstats.2016-04-04:22:00:25:misses                          4    60302137
      proc_arcstats.2016-04-04:22:01:06:misses                          4    60374920
      proc_arcstats.2016-04-04:22:01:47:misses                          4    60513681
      proc_arcstats.2016-04-04:22:02:27:misses                          4    60644796
      proc_arcstats.2016-04-04:22:03:08:misses                          4    60782163
      proc_arcstats.2016-04-04:22:03:49:misses                          4    60921202
      proc_arcstats.2016-04-04:22:04:30:misses                          4    61049984
      proc_arcstats.2016-04-04:22:05:10:misses                          4    61131959
      proc_arcstats.2016-04-04:22:05:51:misses                          4    61133558
      proc_arcstats.2016-04-04:22:06:32:misses                          4    61203715
      proc_arcstats.2016-04-04:22:07:13:misses                          4    61317428
      proc_arcstats.2016-04-04:22:07:54:misses                          4    61406420
      proc_arcstats.2016-04-04:22:08:35:misses                          4    61520867
      proc_arcstats.2016-04-04:22:09:15:misses                          4    61659833
      proc_arcstats.2016-04-04:22:09:56:misses                          4    61796365
      proc_arcstats.2016-04-04:22:10:37:misses                          4    61905346
      proc_arcstats.2016-04-04:22:11:17:misses                          4    61977003
      proc_arcstats.2016-04-04:22:11:56:misses                          4    61977043
      proc_arcstats.2016-04-04:22:12:37:misses                          4    62083132
      proc_arcstats.2016-04-04:22:13:17:misses                          4    62215655
      proc_arcstats.2016-04-04:22:13:57:misses                          4    62348643
      proc_arcstats.2016-04-04:22:14:38:misses                          4    62494512
      proc_arcstats.2016-04-04:22:15:18:misses                          4    62630944
      proc_arcstats.2016-04-04:22:15:59:misses                          4    62727592
      proc_arcstats.2016-04-04:22:16:38:misses                          4    62806912
      proc_arcstats.2016-04-04:22:17:18:misses                          4    62809594
      proc_arcstats.2016-04-04:22:17:59:misses                          4    62832835
      proc_arcstats.2016-04-04:22:18:40:misses                          4    62874008
      proc_arcstats.2016-04-04:22:19:20:misses                          4    62915527
      proc_arcstats.2016-04-04:22:20:00:misses                          4    62956780
      proc_arcstats.2016-04-04:22:20:40:misses                          4    62998788
      proc_arcstats.2016-04-04:22:21:20:misses                          4    63041800
      proc_arcstats.2016-04-04:22:22:01:misses                          4    63083422
      proc_arcstats.2016-04-04:22:22:41:misses                          4    63125355
      proc_arcstats.2016-04-04:22:23:21:misses                          4    63167419
      proc_arcstats.2016-04-04:22:24:01:misses                          4    63210702
      proc_arcstats.2016-04-04:22:24:41:misses                          4    63252813
      proc_arcstats.2016-04-04:22:25:21:misses                          4    63295877
      proc_arcstats.2016-04-04:22:26:01:misses                          4    63339516
      proc_arcstats.2016-04-04:22:26:41:misses                          4    63383531
      proc_arcstats.2016-04-04:22:27:22:misses                          4    63417625
      proc_arcstats.2016-04-04:22:28:02:misses                          4    63418160
      proc_arcstats.2016-04-04:22:28:44:misses                          4    63418448
      proc_arcstats.2016-04-04:22:29:25:misses                          4    63446807
      proc_arcstats.2016-04-04:22:30:06:misses                          4    63605174
      proc_arcstats.2016-04-04:22:30:46:misses                          4    63787444
      proc_arcstats.2016-04-04:22:31:27:misses                          4    63991757
      proc_arcstats.2016-04-04:22:32:08:misses                          4    64186064
      proc_arcstats.2016-04-04:22:32:48:misses                          4    64370507
      proc_arcstats.2016-04-04:22:33:29:misses                          4    64544830
      proc_arcstats.2016-04-04:22:34:10:misses                          4    64705200
      proc_arcstats.2016-04-04:22:34:51:misses                          4    64890908
      proc_arcstats.2016-04-04:22:35:31:misses                          4    65051186
      proc_arcstats.2016-04-04:22:36:12:misses                          4    65093562
      proc_arcstats.2016-04-04:22:36:48:misses                          4    65093566
      proc_arcstats.2016-04-04:22:37:25:misses                          4    65093566
      proc_arcstats.2016-04-04:22:38:02:misses                          4    65093566
      proc_arcstats.2016-04-04:22:38:38:misses                          4    65093566
      proc_arcstats.2016-04-04:22:39:15:misses                          4    65093566
      proc_arcstats.2016-04-04:22:39:51:misses                          4    65093566
      proc_arcstats.2016-04-04:22:40:28:misses                          4    65093566
      proc_arcstats.2016-04-04:22:41:05:misses                          4    65093566
      proc_arcstats.2016-04-04:22:41:41:misses                          4    65093566
      proc_arcstats.2016-04-04:22:42:18:misses                          4    65093566
      proc_arcstats.2016-04-04:22:42:54:misses                          4    65093566
      proc_arcstats.2016-04-04:22:43:31:misses                          4    65093566
      proc_arcstats.2016-04-04:22:44:07:misses                          4    65093566
      proc_arcstats.2016-04-04:22:44:44:misses                          4    65093593
      proc_arcstats.2016-04-04:22:45:24:misses                          4    65093635
      proc_arcstats.2016-04-04:22:46:05:misses                          4    65093688
      proc_arcstats.2016-04-04:22:46:47:misses                          4    65204992
      proc_arcstats.2016-04-04:22:47:27:misses                          4    65369668
      proc_arcstats.2016-04-04:22:48:08:misses                          4    65543276
      proc_arcstats.2016-04-04:22:48:49:misses                          4    65700706
      proc_arcstats.2016-04-04:22:49:29:misses                          4    65835289
      proc_arcstats.2016-04-04:22:50:10:misses                          4    65978426
      proc_arcstats.2016-04-04:22:50:51:misses                          4    66102294
      proc_arcstats.2016-04-04:22:51:32:misses                          4    66237487
      proc_arcstats.2016-04-04:22:52:12:misses                          4    66372029
      proc_arcstats.2016-04-04:22:52:53:misses                          4    66491789
      proc_arcstats.2016-04-04:22:53:33:misses                          4    66606048
      proc_arcstats.2016-04-04:22:54:14:misses                          4    66728350
      proc_arcstats.2016-04-04:22:54:54:misses                          4    66759150
      proc_arcstats.2016-04-04:22:55:35:misses                          4    66783980
      proc_arcstats.2016-04-04:22:56:16:misses                          4    66882343
      proc_arcstats.2016-04-04:22:56:56:misses                          4    66986649
      proc_arcstats.2016-04-04:22:57:37:misses                          4    67099214
      proc_arcstats.2016-04-04:22:58:18:misses                          4    67236074
      proc_arcstats.2016-04-04:22:58:58:misses                          4    67359854
      proc_arcstats.2016-04-04:22:59:39:misses                          4    67479208
      proc_arcstats.2016-04-04:23:00:19:misses                          4    67603703
      proc_arcstats.2016-04-04:23:01:00:misses                          4    67605334
      proc_arcstats.2016-04-04:23:01:42:misses                          4    67687472
      proc_arcstats.2016-04-04:23:02:22:misses                          4    67823336
      proc_arcstats.2016-04-04:23:03:03:misses                          4    67950500
      proc_arcstats.2016-04-04:23:03:44:misses                          4    68083660
      proc_arcstats.2016-04-04:23:04:24:misses                          4    68221409
      proc_arcstats.2016-04-04:23:05:04:misses                          4    68335712
      proc_arcstats.2016-04-04:23:05:45:misses                          4    68441748
      proc_arcstats.2016-04-04:23:06:24:misses                          4    68443307
      proc_arcstats.2016-04-04:23:07:05:misses                          4    68454637
      proc_arcstats.2016-04-04:23:07:46:misses                          4    68578571
      proc_arcstats.2016-04-04:23:08:27:misses                          4    68690534
      proc_arcstats.2016-04-04:23:09:07:misses                          4    68808692
      proc_arcstats.2016-04-04:23:09:48:misses                          4    68930912
      proc_arcstats.2016-04-04:23:10:28:misses                          4    69068534
      proc_arcstats.2016-04-04:23:11:09:misses                          4    69192434
      proc_arcstats.2016-04-04:23:11:50:misses                          4    69283782
      proc_arcstats.2016-04-04:23:12:30:misses                          4    69284522
      proc_arcstats.2016-04-04:23:13:12:misses                          4    69315159
      proc_arcstats.2016-04-04:23:13:52:misses                          4    69373955
      proc_arcstats.2016-04-04:23:14:33:misses                          4    69449303
      proc_arcstats.2016-04-04:23:15:13:misses                          4    69523499
      proc_arcstats.2016-04-04:23:15:53:misses                          4    69599070
      proc_arcstats.2016-04-04:23:16:34:misses                          4    69674993
      proc_arcstats.2016-04-04:23:17:14:misses                          4    69750733
      proc_arcstats.2016-04-04:23:17:55:misses                          4    69827307
      proc_arcstats.2016-04-04:23:18:35:misses                          4    69847427
      proc_arcstats.2016-04-04:23:19:16:misses                          4    69848231
      proc_arcstats.2016-04-04:23:19:57:misses                          4    69884019
      proc_arcstats.2016-04-04:23:20:37:misses                          4    69946465
      proc_arcstats.2016-04-04:23:21:18:misses                          4    70016284
      proc_arcstats.2016-04-04:23:21:58:misses                          4    70081329
      proc_arcstats.2016-04-04:23:22:39:misses                          4    70148143
      proc_arcstats.2016-04-04:23:23:19:misses                          4    70213937
      proc_arcstats.2016-04-04:23:23:59:misses                          4    70283057
      proc_arcstats.2016-04-04:23:24:40:misses                          4    70353187
      proc_arcstats.2016-04-04:23:25:21:misses                          4    70415880
      proc_arcstats.2016-04-04:23:26:00:misses                          4    70415996
      proc_arcstats.2016-04-04:23:26:37:misses                          4    70415996
      proc_arcstats.2016-04-04:23:27:13:misses                          4    70415996
      proc_arcstats.2016-04-04:23:27:50:misses                          4    70415996
      proc_arcstats.2016-04-04:23:28:26:misses                          4    70415996
      proc_arcstats.2016-04-04:23:29:03:misses                          4    70415996
      proc_arcstats.2016-04-04:23:29:40:misses                          4    70415996
      proc_arcstats.2016-04-04:23:30:16:misses                          4    70415996
      proc_arcstats.2016-04-04:23:30:53:misses                          4    70415996
      proc_arcstats.2016-04-04:23:31:29:misses                          4    70415996
      proc_arcstats.2016-04-04:23:32:06:misses                          4    70415996
      proc_arcstats.2016-04-04:23:32:42:misses                          4    70415996
      proc_arcstats.2016-04-04:23:33:19:misses                          4    70415996
      proc_arcstats.2016-04-04:23:33:55:misses                          4    70416034
      proc_arcstats.2016-04-04:23:34:36:misses                          4    70416109
      proc_arcstats.2016-04-04:23:35:18:misses                          4    70417565
      proc_arcstats.2016-04-04:23:35:58:misses                          4    70597975
      proc_arcstats.2016-04-04:23:36:39:misses                          4    70802556
      
      

      Attachments

        1. lustre-log.1459803990.12256.gz
          11.15 MB
        2. wolf-3-oss0-messages-20160403.gz
          256 kB
        3. wolf-4-oss1-messages.gz
          60 kB
        4. wolf-4-oss1-messages-20160403.gz
          141 kB
        5. wolf-4-ost1-stats.tgz
          3.91 MB
        6. wolf-5-mds-messages-20160404.gz
          92 kB
        7. wolf-6-client-messages.gz
          49 kB
        8. wolf-7-client-messages-20160403.gz
          157 kB

        Issue Links

          Activity

            [LU-7987] Lustre 2.8 OSS with zfs 0.6.5 backend hitting most schedule_timeout
            donut-crowd Donut Crowd (Inactive) made changes -
            Remote Link Original: This issue links to "Page (HPDD Community Wiki)" [ 17876 ]
            jsalians_intel John Salinas (Inactive) made changes -
            Resolution New: Low Priority [ 10100 ]
            Status Original: Open [ 1 ] New: Closed [ 6 ]
            jsalians_intel John Salinas (Inactive) made changes -
            Priority Original: Major [ 3 ] New: Critical [ 2 ]
            jsalians_intel John Salinas (Inactive) made changes -
            Remote Link New: This issue links to "Page (HPDD Community Wiki)" [ 17876 ]
            jsalians_intel John Salinas (Inactive) made changes -
            Summary Original: Lustre 2.8 OSS with zfs 0.6.5 backend stopped responding due to apparent low memory New: Lustre 2.8 OSS with zfs 0.6.5 backend hitting most schedule_timeout
            jay Jinshan Xiong (Inactive) made changes -
            Link New: This issue is related to LU-7404 [ LU-7404 ]
            jlevi Jodi Levi (Inactive) made changes -
            Assignee Original: WC Triage [ wc-triage ] New: Jinshan Xiong [ jay ]
            jsalians_intel John Salinas (Inactive) made changes -
            Summary Original: Lustre 2.8 OSS with zfs 0.6.5 backend stopped responding after schedule_timeout New: Lustre 2.8 OSS with zfs 0.6.5 backend stopped responding due to apparent low memory
            jay Jinshan Xiong (Inactive) made changes -
            Description Original: While trying to test for LU-7404 OST timeout the one of the OSS nodes became unaccessible via ssh and conman.

            This was running with 2 OSS nodes running 1 zfs zpool on each, 1 mds, and 4 clients running IOR. The first message that was seen was fairly soon after starting. About 2 hours after running lustre stopped responding to lustre clients and static scripts running on the OSS stopped running.

            First message seen was:
            Apr 4 21:06:29 wolf-4 kernel: LNet: Service thread pid 12256 was inactive for 200.30s. The thread might be hung, or it might only be slow an
            d will resume later. Dumping the stack trace for debugging purposes:
            Apr 4 21:06:29 wolf-4 kernel: Pid: 12256, comm: ll_ost02_000
            Apr 4 21:06:29 wolf-4 kernel: #012Call Trace:
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffff8163a9e9>] schedule+0x29/0x70
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffff816386d9>] schedule_timeout+0x209/0x2d0
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa012dc0c>] ? vdev_mirror_io_start+0xac/0x190 [zfs]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa012d150>] ? vdev_mirror_child_done+0x0/0x30 [zfs]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffff810d814c>] ? ktime_get_ts64+0x4c/0xf0
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffff8163a01e>] io_schedule_timeout+0xae/0x130
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffff810a6896>] ? prepare_to_wait_exclusive+0x56/0x90
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffff8163a0b8>] io_schedule+0x18/0x20
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa00116e0>] cv_wait_common+0xb0/0x150 [spl]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffff810a6ae0>] ? autoremove_wake_function+0x0/0x40
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa00117d8>] __cv_wait_io+0x18/0x20 [spl]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa016db83>] zio_wait+0x123/0x210 [zfs]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa00c44ad>] dbuf_read+0x30d/0x930 [zfs]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa00c60eb>] dmu_buf_will_dirty+0x4b/0xa0 [zfs]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa00deb46>] dnode_dirty_l1+0x36/0x50 [zfs]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa00e146a>] dnode_free_range+0x4da/0x6b0 [zfs]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa00cb829>] dmu_free_long_range+0x1d9/0x280 [zfs]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa11fe468>] osd_unlinked_object_free+0x28/0x280 [osd_zfs]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa11fe7d3>] osd_unlinked_list_emptify+0x63/0xa0 [osd_zfs]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa120112a>] osd_trans_stop+0x31a/0x5b0 [osd_zfs]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa12fef9f>] ofd_trans_stop+0x1f/0x60 [ofd]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa1302bb2>] ofd_object_destroy+0x2b2/0x890 [ofd]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa12fb8f7>] ofd_destroy_by_fid+0x307/0x510 [ofd]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa0fce170>] ? ldlm_blocking_ast+0x0/0x170 [ptlrpc]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa0fc8730>] ? ldlm_completion_ast+0x0/0x910 [ptlrpc]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa12edc67>] ofd_destroy_hdl+0x267/0xa50 [ofd]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa105af65>] tgt_request_handle+0x915/0x1320 [ptlrpc]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa1007beb>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
            Apr 4 21:06:30 wolf-4 kernel: [<ffffffffa0cec578>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
            Apr 4 21:06:30 wolf-4 kernel: [<ffffffffa1006498>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
            Apr 4 21:06:30 wolf-4 kernel: [<ffffffff810b8c22>] ? default_wake_function+0x12/0x20
            Apr 4 21:06:30 wolf-4 kernel: [<ffffffff810af028>] ? __wake_up_common+0x58/0x90
            Apr 4 21:06:30 wolf-4 kernel: [<ffffffffa100bc90>] ptlrpc_main+0xa90/0x1db0 [ptlrpc]
            Apr 4 21:06:30 wolf-4 kernel: [<ffffffffa100b200>] ? ptlrpc_main+0x0/0x1db0 [ptlrpc]
            Apr 4 21:06:30 wolf-4 kernel: [<ffffffff810a5aef>] kthread+0xcf/0xe0
            Apr 4 21:06:30 wolf-4 kernel: [<ffffffff810a5a20>] ? kthread+0x0/0xe0
            Apr 4 21:06:30 wolf-4 kernel: [<ffffffff81645998>] ret_from_fork+0x58/0x90
            Apr 4 21:06:30 wolf-4 kernel: [<ffffffff810a5a20>] ? kthread+0x0/0xe0
            Apr 4 21:06:30 wolf-4 kernel:
            Apr 4 21:06:30 wolf-4 kernel: LustreError: dumping log to /tmp/lustre-log.1459803990.12256
            Apr 4 21:07:12 wolf-4 ntpd_intres[3829]: ntp_intres.request: permission denied
            Apr 4 21:07:12 wolf-4 ntpd_intres[3829]: ntp_intres.request: permission denied
            Apr 4 21:07:19 wolf-4 kernel: LNet: Service thread pid 12256 completed after 249.57s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).

            But the system appeared to keep running past that and then come to a grinding halt.
            Run began: Mon Apr 4 23:25:07 2016
            Command line used: /usr/bin/IOR -i 3 -v -b 50G -t 128k
            Machine: Linux wolf-6.wolf.hpdd.intel.com
            Start time skew across all tasks: 8.97 sec
            Path: /mnt/lustre
            FS: 37.9 TiB Used FS: 0.6% Inodes: 171.8 Mi Used Inodes: 32.0%
            Participating tasks: 4

            Summary:
                    api = POSIX
                    test filename = testFile
                    access = single-shared-file
                    pattern = segmented (1 segment)
                    ordering in a file = sequential offsets
                    ordering inter file= no tasks offsets
                    clients = 4 (1 per node)
                    repetitions = 3
                    xfersize = 131072 bytes
                    blocksize = 50 GiB
                    aggregate filesize = 200 GiB

            No messages appeared on wolf-4 OST1 but it was noticed by clients:
            Apr 4 23:37:12 wolf-6 kernel: Lustre: 2811:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1459813025/real 0] req@ffff880d8570ad00 x1530711614167736/t0(0) o400->coral-OST0001-osc-ffff881030662800@192.168.1.4@o2ib:28/4 lens 224/224 e 0 to 1 dl 1459813032 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
            Apr 4 23:37:12 wolf-6 kernel: Lustre: coral-OST0001-osc-ffff881030662800: Connection to coral-OST0001 (at 192.168.1.4@o2ib) was lost; in progress operations using this service will wait for recovery to complete
            Apr 4 23:37:16 wolf-6 kernel: Lustre: 2751:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1459812992/real 1459812992] req@ffff880266e0bc00 x1530711614167672/t0(0) o3->coral-OST0001-osc-ffff881030662800@192.168.1.4@o2ib:6/4 lens 608/432 e 0 to 1 dl 1459813036 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
            Apr 4 23:37:16 wolf-6 kernel: Lustre: 2751:0:(client.c:2063:ptlrpc_expire_one_request()) Skipped 8 previous similar messages
            Apr 4 23:37:18 wolf-6 kernel: Lustre: 2742:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1459813032/real 0] req@ffff880bbe376900 x1530711614167740/t0(0) o8->coral-OST0001-osc-ffff881030662800@192.168.1.4@o2ib:28/4 lens 520/544 e 0 to 1 dl 1459813038 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
            Apr 4 23:37:37 wolf-6 kernel: Lustre: 2742:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1459813057/real 1459813057] req@ffff880bbe375d00 x1530711614167756/t0(0) o8->coral-OST0001-osc-ffff881030662800@192.168.1.4@o2ib:28/4 lens 520/544 e 0 to 1 dl 1459813068 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            Apr 4 23:38:02 wolf-6 kernel: Lustre: 2742:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1459813082/real 1459813082] req@ffff880bbe375100 x1530711614167772/t0(0) o8->coral-OST0001-osc-ffff881030662800@192.168.1.4@o2ib:28/4 lens 520/544 e 0 to 1 dl 1459813098 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            Apr 4 23:38:11 wolf-6 ntpd_intres[2008]: ntp_intres.request: permission denied
            Apr 4 23:38:11 wolf-6 ntpd_intres[2008]: ntp_intres.request: permission denied
            Apr 4 23:38:27 wolf-6 kernel: Lustre: 2742:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1459813107/real 1459813107] req@ffff880bbe374500 x1530711614167788/t0(0) o8->coral-OST0001-osc-ffff881030662800@192.168.1.4@o2ib:28/4 lens 520/544 e 0 to 1 dl 1459813128 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            Apr 4 23:38:52 wolf-6 kernel: Lustre: 2742:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1459813132/real 1459813132] req@ffff880bbe373900 x1530711614167804/t0(0) o8->coral-OST0001-osc-ffff881030662800@192.168.1.4@o2ib:28/4 lens 520/544 e 0 to 1 dl 1459813158 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
             
            From stats being captured on OST1:
                time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
            23:30:53 0 0 0 0 0 0 0 0 0 200M 62G
                time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
            23:31:29 0 0 0 0 0 0 0 0 0 200M 62G
                time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
            23:32:06 0 0 0 0 0 0 0 0 0 200M 62G
                time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
            23:32:42 0 0 0 0 0 0 0 0 0 200M 62G
                time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
            23:33:19 0 0 0 0 0 0 0 0 0 200M 62G
                time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
            23:33:55 0 0 0 0 0 0 0 0 0 62G 62G
                time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
            23:34:36 0 0 0 0 0 0 0 0 0 62G 62G
                time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
            23:35:18 10 10 100 10 100 0 0 0 0 62G 62G
                time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
            23:35:58 91 91 100 47 100 44 100 1 100 63G 62G
                time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
            23:36:39 5 5 100 5 100 0 0 0 0 62G 62G

            6 1 0x01 91 4368 5368758848 527202119883608
            name type data
            hits 4 91379399
            misses 4 70802556
            demand_data_hits 4 62879518
            demand_data_misses 4 52414186
            demand_metadata_hits 4 23738570
            demand_metadata_misses 4 1105675
            prefetch_data_hits 4 4757199
            prefetch_data_misses 4 17281282
            prefetch_metadata_hits 4 4112
            prefetch_metadata_misses 4 1413
            mru_hits 4 62351920
            mru_ghost_hits 4 917736
            mfu_hits 4 24266169
            mfu_ghost_hits 4 632285
            deleted 4 116650113
            mutex_miss 4 447
            evict_skip 4 184
            evict_not_enough 4 62
            evict_l2_cached 4 0
            evict_l2_eligible 4 15019191377920
            evict_l2_ineligible 4 408751927296
            evict_l2_skip 4 0
            hash_elements 4 1059177
            hash_elements_max 4 1126648
            hash_collisions 4 11087755
            hash_chains 4 31686
            hash_chain_max 4 4
            p 4 63241464064
            c 4 67457282048
            c_min 4 33554432
            c_max 4 67457282048
            size 4 67456702536
            hdr_size 4 396803920
            data_size 4 66823651328
            metadata_size 4 76845056
            other_size 4 159402232
            anon_size 4 39337984
            anon_evictable_data 4 0
            anon_evictable_metadata 4 0
            mru_size 4 64780652544
            mru_evictable_data 4 64752189440
            mru_evictable_metadata 4 327680
            mru_ghost_size 4 2679063040
            mru_ghost_evictable_data 4 2082865152
            mru_ghost_evictable_metadata 4 596197888
            mfu_size 4 2080505856
            mfu_evictable_data 4 2024931328
            mfu_evictable_metadata 4 3112960
            mfu_ghost_size 4 56092389888
            mfu_ghost_evictable_data 4 56057790464
            mfu_ghost_evictable_metadata 4 34599424
            l2_hits 4 0
            l2_misses 4 0
            l2_feeds 4 0
            l2_rw_clash 4 0
            l2_read_bytes 4 0
            l2_write_bytes 4 0
            l2_writes_sent 4 0
            l2_writes_done 4 0
            l2_writes_error 4 0
            l2_writes_lock_retry 4 0
            l2_evict_lock_retry 4 0
            l2_evict_reading 4 0
            l2_evict_l1cached 4 0
            l2_free_on_write 4 0
            l2_cdata_free_on_write 4 0
            l2_abort_lowmem 4 0
            l2_cksum_bad 4 0
            l2_io_error 4 0
            l2_size 4 0
            l2_asize 4 0
            l2_hdr_size 4 0
            l2_compress_successes 4 0
            l2_compress_zeros 4 0
            l2_compress_failures 4 0
            memory_throttle_count 4 0
            duplicate_buffers 4 0
            duplicate_buffers_size 4 0
            duplicate_reads 4 0
            memory_direct_count 4 0
            memory_indirect_count 4 0
            arc_no_grow 4 0
            arc_tempreserve 4 0
            arc_loaned_bytes 4 0
            arc_prune 4 0
            arc_meta_used 4 633051832
            arc_meta_limit 4 50592961536
            arc_meta_max 4 809615080
            arc_meta_min 4 16777216
            arc_need_free 4 0
            arc_sys_free 4 2108039168

            proc_arcstats.2016-04-04:21:52:28:misses 4 60302081
            proc_arcstats.2016-04-04:21:53:05:misses 4 60302081
            proc_arcstats.2016-04-04:21:53:41:misses 4 60302081
            proc_arcstats.2016-04-04:21:54:18:misses 4 60302081
            proc_arcstats.2016-04-04:21:54:55:misses 4 60302081
            proc_arcstats.2016-04-04:21:55:31:misses 4 60302081
            proc_arcstats.2016-04-04:21:56:08:misses 4 60302081
            proc_arcstats.2016-04-04:21:56:44:misses 4 60302081
            proc_arcstats.2016-04-04:21:57:21:misses 4 60302081
            proc_arcstats.2016-04-04:21:57:57:misses 4 60302081
            proc_arcstats.2016-04-04:21:58:34:misses 4 60302081
            proc_arcstats.2016-04-04:21:59:11:misses 4 60302081
            proc_arcstats.2016-04-04:21:59:47:misses 4 60302099
            proc_arcstats.2016-04-04:22:00:25:misses 4 60302137
            proc_arcstats.2016-04-04:22:01:06:misses 4 60374920
            proc_arcstats.2016-04-04:22:01:47:misses 4 60513681
            proc_arcstats.2016-04-04:22:02:27:misses 4 60644796
            proc_arcstats.2016-04-04:22:03:08:misses 4 60782163
            proc_arcstats.2016-04-04:22:03:49:misses 4 60921202
            proc_arcstats.2016-04-04:22:04:30:misses 4 61049984
            proc_arcstats.2016-04-04:22:05:10:misses 4 61131959
            proc_arcstats.2016-04-04:22:05:51:misses 4 61133558
            proc_arcstats.2016-04-04:22:06:32:misses 4 61203715
            proc_arcstats.2016-04-04:22:07:13:misses 4 61317428
            proc_arcstats.2016-04-04:22:07:54:misses 4 61406420
            proc_arcstats.2016-04-04:22:08:35:misses 4 61520867
            proc_arcstats.2016-04-04:22:09:15:misses 4 61659833
            proc_arcstats.2016-04-04:22:09:56:misses 4 61796365
            proc_arcstats.2016-04-04:22:10:37:misses 4 61905346
            proc_arcstats.2016-04-04:22:11:17:misses 4 61977003
            proc_arcstats.2016-04-04:22:11:56:misses 4 61977043
            proc_arcstats.2016-04-04:22:12:37:misses 4 62083132
            proc_arcstats.2016-04-04:22:13:17:misses 4 62215655
            proc_arcstats.2016-04-04:22:13:57:misses 4 62348643
            proc_arcstats.2016-04-04:22:14:38:misses 4 62494512
            proc_arcstats.2016-04-04:22:15:18:misses 4 62630944
            proc_arcstats.2016-04-04:22:15:59:misses 4 62727592
            proc_arcstats.2016-04-04:22:16:38:misses 4 62806912
            proc_arcstats.2016-04-04:22:17:18:misses 4 62809594
            proc_arcstats.2016-04-04:22:17:59:misses 4 62832835
            proc_arcstats.2016-04-04:22:18:40:misses 4 62874008
            proc_arcstats.2016-04-04:22:19:20:misses 4 62915527
            proc_arcstats.2016-04-04:22:20:00:misses 4 62956780
            proc_arcstats.2016-04-04:22:20:40:misses 4 62998788
            proc_arcstats.2016-04-04:22:21:20:misses 4 63041800
            proc_arcstats.2016-04-04:22:22:01:misses 4 63083422
            proc_arcstats.2016-04-04:22:22:41:misses 4 63125355
            proc_arcstats.2016-04-04:22:23:21:misses 4 63167419
            proc_arcstats.2016-04-04:22:24:01:misses 4 63210702
            proc_arcstats.2016-04-04:22:24:41:misses 4 63252813
            proc_arcstats.2016-04-04:22:25:21:misses 4 63295877
            proc_arcstats.2016-04-04:22:26:01:misses 4 63339516
            proc_arcstats.2016-04-04:22:26:41:misses 4 63383531
            proc_arcstats.2016-04-04:22:27:22:misses 4 63417625
            proc_arcstats.2016-04-04:22:28:02:misses 4 63418160
            proc_arcstats.2016-04-04:22:28:44:misses 4 63418448
            proc_arcstats.2016-04-04:22:29:25:misses 4 63446807
            proc_arcstats.2016-04-04:22:30:06:misses 4 63605174
            proc_arcstats.2016-04-04:22:30:46:misses 4 63787444
            proc_arcstats.2016-04-04:22:31:27:misses 4 63991757
            proc_arcstats.2016-04-04:22:32:08:misses 4 64186064
            proc_arcstats.2016-04-04:22:32:48:misses 4 64370507
            proc_arcstats.2016-04-04:22:33:29:misses 4 64544830
            proc_arcstats.2016-04-04:22:34:10:misses 4 64705200
            proc_arcstats.2016-04-04:22:34:51:misses 4 64890908
            proc_arcstats.2016-04-04:22:35:31:misses 4 65051186
            proc_arcstats.2016-04-04:22:36:12:misses 4 65093562
            proc_arcstats.2016-04-04:22:36:48:misses 4 65093566
            proc_arcstats.2016-04-04:22:37:25:misses 4 65093566
            proc_arcstats.2016-04-04:22:38:02:misses 4 65093566
            proc_arcstats.2016-04-04:22:38:38:misses 4 65093566
            proc_arcstats.2016-04-04:22:39:15:misses 4 65093566
            proc_arcstats.2016-04-04:22:39:51:misses 4 65093566
            proc_arcstats.2016-04-04:22:40:28:misses 4 65093566
            proc_arcstats.2016-04-04:22:41:05:misses 4 65093566
            proc_arcstats.2016-04-04:22:41:41:misses 4 65093566
            proc_arcstats.2016-04-04:22:42:18:misses 4 65093566
            proc_arcstats.2016-04-04:22:42:54:misses 4 65093566
            proc_arcstats.2016-04-04:22:43:31:misses 4 65093566
            proc_arcstats.2016-04-04:22:44:07:misses 4 65093566
            proc_arcstats.2016-04-04:22:44:44:misses 4 65093593
            proc_arcstats.2016-04-04:22:45:24:misses 4 65093635
            proc_arcstats.2016-04-04:22:46:05:misses 4 65093688
            proc_arcstats.2016-04-04:22:46:47:misses 4 65204992
            proc_arcstats.2016-04-04:22:47:27:misses 4 65369668
            proc_arcstats.2016-04-04:22:48:08:misses 4 65543276
            proc_arcstats.2016-04-04:22:48:49:misses 4 65700706
            proc_arcstats.2016-04-04:22:49:29:misses 4 65835289
            proc_arcstats.2016-04-04:22:50:10:misses 4 65978426
            proc_arcstats.2016-04-04:22:50:51:misses 4 66102294
            proc_arcstats.2016-04-04:22:51:32:misses 4 66237487
            proc_arcstats.2016-04-04:22:52:12:misses 4 66372029
            proc_arcstats.2016-04-04:22:52:53:misses 4 66491789
            proc_arcstats.2016-04-04:22:53:33:misses 4 66606048
            proc_arcstats.2016-04-04:22:54:14:misses 4 66728350
            proc_arcstats.2016-04-04:22:54:54:misses 4 66759150
            proc_arcstats.2016-04-04:22:55:35:misses 4 66783980
            proc_arcstats.2016-04-04:22:56:16:misses 4 66882343
            proc_arcstats.2016-04-04:22:56:56:misses 4 66986649
            proc_arcstats.2016-04-04:22:57:37:misses 4 67099214
            proc_arcstats.2016-04-04:22:58:18:misses 4 67236074
            proc_arcstats.2016-04-04:22:58:58:misses 4 67359854
            proc_arcstats.2016-04-04:22:59:39:misses 4 67479208
            proc_arcstats.2016-04-04:23:00:19:misses 4 67603703
            proc_arcstats.2016-04-04:23:01:00:misses 4 67605334
            proc_arcstats.2016-04-04:23:01:42:misses 4 67687472
            proc_arcstats.2016-04-04:23:02:22:misses 4 67823336
            proc_arcstats.2016-04-04:23:03:03:misses 4 67950500
            proc_arcstats.2016-04-04:23:03:44:misses 4 68083660
            proc_arcstats.2016-04-04:23:04:24:misses 4 68221409
            proc_arcstats.2016-04-04:23:05:04:misses 4 68335712
            proc_arcstats.2016-04-04:23:05:45:misses 4 68441748
            proc_arcstats.2016-04-04:23:06:24:misses 4 68443307
            proc_arcstats.2016-04-04:23:07:05:misses 4 68454637
            proc_arcstats.2016-04-04:23:07:46:misses 4 68578571
            proc_arcstats.2016-04-04:23:08:27:misses 4 68690534
            proc_arcstats.2016-04-04:23:09:07:misses 4 68808692
            proc_arcstats.2016-04-04:23:09:48:misses 4 68930912
            proc_arcstats.2016-04-04:23:10:28:misses 4 69068534
            proc_arcstats.2016-04-04:23:11:09:misses 4 69192434
            proc_arcstats.2016-04-04:23:11:50:misses 4 69283782
            proc_arcstats.2016-04-04:23:12:30:misses 4 69284522
            proc_arcstats.2016-04-04:23:13:12:misses 4 69315159
            proc_arcstats.2016-04-04:23:13:52:misses 4 69373955
            proc_arcstats.2016-04-04:23:14:33:misses 4 69449303
            proc_arcstats.2016-04-04:23:15:13:misses 4 69523499
            proc_arcstats.2016-04-04:23:15:53:misses 4 69599070
            proc_arcstats.2016-04-04:23:16:34:misses 4 69674993
            proc_arcstats.2016-04-04:23:17:14:misses 4 69750733
            proc_arcstats.2016-04-04:23:17:55:misses 4 69827307
            proc_arcstats.2016-04-04:23:18:35:misses 4 69847427
            proc_arcstats.2016-04-04:23:19:16:misses 4 69848231
            proc_arcstats.2016-04-04:23:19:57:misses 4 69884019
            proc_arcstats.2016-04-04:23:20:37:misses 4 69946465
            proc_arcstats.2016-04-04:23:21:18:misses 4 70016284
            proc_arcstats.2016-04-04:23:21:58:misses 4 70081329
            proc_arcstats.2016-04-04:23:22:39:misses 4 70148143
            proc_arcstats.2016-04-04:23:23:19:misses 4 70213937
            proc_arcstats.2016-04-04:23:23:59:misses 4 70283057
            proc_arcstats.2016-04-04:23:24:40:misses 4 70353187
            proc_arcstats.2016-04-04:23:25:21:misses 4 70415880
            proc_arcstats.2016-04-04:23:26:00:misses 4 70415996
            proc_arcstats.2016-04-04:23:26:37:misses 4 70415996
            proc_arcstats.2016-04-04:23:27:13:misses 4 70415996
            proc_arcstats.2016-04-04:23:27:50:misses 4 70415996
            proc_arcstats.2016-04-04:23:28:26:misses 4 70415996
            proc_arcstats.2016-04-04:23:29:03:misses 4 70415996
            proc_arcstats.2016-04-04:23:29:40:misses 4 70415996
            proc_arcstats.2016-04-04:23:30:16:misses 4 70415996
            proc_arcstats.2016-04-04:23:30:53:misses 4 70415996
            proc_arcstats.2016-04-04:23:31:29:misses 4 70415996
            proc_arcstats.2016-04-04:23:32:06:misses 4 70415996
            proc_arcstats.2016-04-04:23:32:42:misses 4 70415996
            proc_arcstats.2016-04-04:23:33:19:misses 4 70415996
            proc_arcstats.2016-04-04:23:33:55:misses 4 70416034
            proc_arcstats.2016-04-04:23:34:36:misses 4 70416109
            proc_arcstats.2016-04-04:23:35:18:misses 4 70417565
            proc_arcstats.2016-04-04:23:35:58:misses 4 70597975
            proc_arcstats.2016-04-04:23:36:39:misses 4 70802556
            New: While trying to test for LU-7404 OST timeout the one of the OSS nodes became unaccessible via ssh and conman.

            This was running with 2 OSS nodes running 1 zfs zpool on each, 1 mds, and 4 clients running IOR. The first message that was seen was fairly soon after starting. About 2 hours after running lustre stopped responding to lustre clients and static scripts running on the OSS stopped running.

            First message seen was:
            Apr 4 21:06:29 wolf-4 kernel: LNet: Service thread pid 12256 was inactive for 200.30s. The thread might be hung, or it might only be slow an
            d will resume later. Dumping the stack trace for debugging purposes:

            {noformat}
            Apr 4 21:06:29 wolf-4 kernel: Pid: 12256, comm: ll_ost02_000
            Apr 4 21:06:29 wolf-4 kernel: #012Call Trace:
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffff8163a9e9>] schedule+0x29/0x70
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffff816386d9>] schedule_timeout+0x209/0x2d0
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa012dc0c>] ? vdev_mirror_io_start+0xac/0x190 [zfs]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa012d150>] ? vdev_mirror_child_done+0x0/0x30 [zfs]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffff810d814c>] ? ktime_get_ts64+0x4c/0xf0
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffff8163a01e>] io_schedule_timeout+0xae/0x130
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffff810a6896>] ? prepare_to_wait_exclusive+0x56/0x90
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffff8163a0b8>] io_schedule+0x18/0x20
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa00116e0>] cv_wait_common+0xb0/0x150 [spl]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffff810a6ae0>] ? autoremove_wake_function+0x0/0x40
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa00117d8>] __cv_wait_io+0x18/0x20 [spl]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa016db83>] zio_wait+0x123/0x210 [zfs]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa00c44ad>] dbuf_read+0x30d/0x930 [zfs]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa00c60eb>] dmu_buf_will_dirty+0x4b/0xa0 [zfs]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa00deb46>] dnode_dirty_l1+0x36/0x50 [zfs]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa00e146a>] dnode_free_range+0x4da/0x6b0 [zfs]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa00cb829>] dmu_free_long_range+0x1d9/0x280 [zfs]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa11fe468>] osd_unlinked_object_free+0x28/0x280 [osd_zfs]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa11fe7d3>] osd_unlinked_list_emptify+0x63/0xa0 [osd_zfs]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa120112a>] osd_trans_stop+0x31a/0x5b0 [osd_zfs]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa12fef9f>] ofd_trans_stop+0x1f/0x60 [ofd]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa1302bb2>] ofd_object_destroy+0x2b2/0x890 [ofd]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa12fb8f7>] ofd_destroy_by_fid+0x307/0x510 [ofd]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa0fce170>] ? ldlm_blocking_ast+0x0/0x170 [ptlrpc]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa0fc8730>] ? ldlm_completion_ast+0x0/0x910 [ptlrpc]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa12edc67>] ofd_destroy_hdl+0x267/0xa50 [ofd]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa105af65>] tgt_request_handle+0x915/0x1320 [ptlrpc]
            Apr 4 21:06:29 wolf-4 kernel: [<ffffffffa1007beb>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
            Apr 4 21:06:30 wolf-4 kernel: [<ffffffffa0cec578>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
            Apr 4 21:06:30 wolf-4 kernel: [<ffffffffa1006498>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
            Apr 4 21:06:30 wolf-4 kernel: [<ffffffff810b8c22>] ? default_wake_function+0x12/0x20
            Apr 4 21:06:30 wolf-4 kernel: [<ffffffff810af028>] ? __wake_up_common+0x58/0x90
            Apr 4 21:06:30 wolf-4 kernel: [<ffffffffa100bc90>] ptlrpc_main+0xa90/0x1db0 [ptlrpc]
            Apr 4 21:06:30 wolf-4 kernel: [<ffffffffa100b200>] ? ptlrpc_main+0x0/0x1db0 [ptlrpc]
            Apr 4 21:06:30 wolf-4 kernel: [<ffffffff810a5aef>] kthread+0xcf/0xe0
            Apr 4 21:06:30 wolf-4 kernel: [<ffffffff810a5a20>] ? kthread+0x0/0xe0
            Apr 4 21:06:30 wolf-4 kernel: [<ffffffff81645998>] ret_from_fork+0x58/0x90
            Apr 4 21:06:30 wolf-4 kernel: [<ffffffff810a5a20>] ? kthread+0x0/0xe0
            Apr 4 21:06:30 wolf-4 kernel:
            Apr 4 21:06:30 wolf-4 kernel: LustreError: dumping log to /tmp/lustre-log.1459803990.12256
            Apr 4 21:07:12 wolf-4 ntpd_intres[3829]: ntp_intres.request: permission denied
            Apr 4 21:07:12 wolf-4 ntpd_intres[3829]: ntp_intres.request: permission denied
            Apr 4 21:07:19 wolf-4 kernel: LNet: Service thread pid 12256 completed after 249.57s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
            {noformat}

            But the system appeared to keep running past that and then come to a grinding halt.

            {noformat}
            Run began: Mon Apr 4 23:25:07 2016
            Command line used: /usr/bin/IOR -i 3 -v -b 50G -t 128k
            Machine: Linux wolf-6.wolf.hpdd.intel.com
            Start time skew across all tasks: 8.97 sec
            Path: /mnt/lustre
            FS: 37.9 TiB Used FS: 0.6% Inodes: 171.8 Mi Used Inodes: 32.0%
            Participating tasks: 4

            Summary:
                    api = POSIX
                    test filename = testFile
                    access = single-shared-file
                    pattern = segmented (1 segment)
                    ordering in a file = sequential offsets
                    ordering inter file= no tasks offsets
                    clients = 4 (1 per node)
                    repetitions = 3
                    xfersize = 131072 bytes
                    blocksize = 50 GiB
                    aggregate filesize = 200 GiB
            {noformat}

            No messages appeared on wolf-4 OST1 but it was noticed by clients:

            {noformat}
            Apr 4 23:37:12 wolf-6 kernel: Lustre: 2811:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1459813025/real 0] req@ffff880d8570ad00 x1530711614167736/t0(0) o400->coral-OST0001-osc-ffff881030662800@192.168.1.4@o2ib:28/4 lens 224/224 e 0 to 1 dl 1459813032 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
            Apr 4 23:37:12 wolf-6 kernel: Lustre: coral-OST0001-osc-ffff881030662800: Connection to coral-OST0001 (at 192.168.1.4@o2ib) was lost; in progress operations using this service will wait for recovery to complete
            Apr 4 23:37:16 wolf-6 kernel: Lustre: 2751:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1459812992/real 1459812992] req@ffff880266e0bc00 x1530711614167672/t0(0) o3->coral-OST0001-osc-ffff881030662800@192.168.1.4@o2ib:6/4 lens 608/432 e 0 to 1 dl 1459813036 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
            Apr 4 23:37:16 wolf-6 kernel: Lustre: 2751:0:(client.c:2063:ptlrpc_expire_one_request()) Skipped 8 previous similar messages
            Apr 4 23:37:18 wolf-6 kernel: Lustre: 2742:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1459813032/real 0] req@ffff880bbe376900 x1530711614167740/t0(0) o8->coral-OST0001-osc-ffff881030662800@192.168.1.4@o2ib:28/4 lens 520/544 e 0 to 1 dl 1459813038 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
            Apr 4 23:37:37 wolf-6 kernel: Lustre: 2742:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1459813057/real 1459813057] req@ffff880bbe375d00 x1530711614167756/t0(0) o8->coral-OST0001-osc-ffff881030662800@192.168.1.4@o2ib:28/4 lens 520/544 e 0 to 1 dl 1459813068 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            Apr 4 23:38:02 wolf-6 kernel: Lustre: 2742:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1459813082/real 1459813082] req@ffff880bbe375100 x1530711614167772/t0(0) o8->coral-OST0001-osc-ffff881030662800@192.168.1.4@o2ib:28/4 lens 520/544 e 0 to 1 dl 1459813098 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            Apr 4 23:38:11 wolf-6 ntpd_intres[2008]: ntp_intres.request: permission denied
            Apr 4 23:38:11 wolf-6 ntpd_intres[2008]: ntp_intres.request: permission denied
            Apr 4 23:38:27 wolf-6 kernel: Lustre: 2742:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1459813107/real 1459813107] req@ffff880bbe374500 x1530711614167788/t0(0) o8->coral-OST0001-osc-ffff881030662800@192.168.1.4@o2ib:28/4 lens 520/544 e 0 to 1 dl 1459813128 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            Apr 4 23:38:52 wolf-6 kernel: Lustre: 2742:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1459813132/real 1459813132] req@ffff880bbe373900 x1530711614167804/t0(0) o8->coral-OST0001-osc-ffff881030662800@192.168.1.4@o2ib:28/4 lens 520/544 e 0 to 1 dl 1459813158 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            {noformat}
             
            From stats being captured on OST1:
            {noformat}
                time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
            23:30:53 0 0 0 0 0 0 0 0 0 200M 62G
                time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
            23:31:29 0 0 0 0 0 0 0 0 0 200M 62G
                time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
            23:32:06 0 0 0 0 0 0 0 0 0 200M 62G
                time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
            23:32:42 0 0 0 0 0 0 0 0 0 200M 62G
                time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
            23:33:19 0 0 0 0 0 0 0 0 0 200M 62G
                time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
            23:33:55 0 0 0 0 0 0 0 0 0 62G 62G
                time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
            23:34:36 0 0 0 0 0 0 0 0 0 62G 62G
                time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
            23:35:18 10 10 100 10 100 0 0 0 0 62G 62G
                time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
            23:35:58 91 91 100 47 100 44 100 1 100 63G 62G
                time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
            23:36:39 5 5 100 5 100 0 0 0 0 62G 62G
            {noformat}

            {noformat}

            6 1 0x01 91 4368 5368758848 527202119883608
            name type data
            hits 4 91379399
            misses 4 70802556
            demand_data_hits 4 62879518
            demand_data_misses 4 52414186
            demand_metadata_hits 4 23738570
            demand_metadata_misses 4 1105675
            prefetch_data_hits 4 4757199
            prefetch_data_misses 4 17281282
            prefetch_metadata_hits 4 4112
            prefetch_metadata_misses 4 1413
            mru_hits 4 62351920
            mru_ghost_hits 4 917736
            mfu_hits 4 24266169
            mfu_ghost_hits 4 632285
            deleted 4 116650113
            mutex_miss 4 447
            evict_skip 4 184
            evict_not_enough 4 62
            evict_l2_cached 4 0
            evict_l2_eligible 4 15019191377920
            evict_l2_ineligible 4 408751927296
            evict_l2_skip 4 0
            hash_elements 4 1059177
            hash_elements_max 4 1126648
            hash_collisions 4 11087755
            hash_chains 4 31686
            hash_chain_max 4 4
            p 4 63241464064
            c 4 67457282048
            c_min 4 33554432
            c_max 4 67457282048
            size 4 67456702536
            hdr_size 4 396803920
            data_size 4 66823651328
            metadata_size 4 76845056
            other_size 4 159402232
            anon_size 4 39337984
            anon_evictable_data 4 0
            anon_evictable_metadata 4 0
            mru_size 4 64780652544
            mru_evictable_data 4 64752189440
            mru_evictable_metadata 4 327680
            mru_ghost_size 4 2679063040
            mru_ghost_evictable_data 4 2082865152
            mru_ghost_evictable_metadata 4 596197888
            mfu_size 4 2080505856
            mfu_evictable_data 4 2024931328
            mfu_evictable_metadata 4 3112960
            mfu_ghost_size 4 56092389888
            mfu_ghost_evictable_data 4 56057790464
            mfu_ghost_evictable_metadata 4 34599424
            l2_hits 4 0
            l2_misses 4 0
            l2_feeds 4 0
            l2_rw_clash 4 0
            l2_read_bytes 4 0
            l2_write_bytes 4 0
            l2_writes_sent 4 0
            l2_writes_done 4 0
            l2_writes_error 4 0
            l2_writes_lock_retry 4 0
            l2_evict_lock_retry 4 0
            l2_evict_reading 4 0
            l2_evict_l1cached 4 0
            l2_free_on_write 4 0
            l2_cdata_free_on_write 4 0
            l2_abort_lowmem 4 0
            l2_cksum_bad 4 0
            l2_io_error 4 0
            l2_size 4 0
            l2_asize 4 0
            l2_hdr_size 4 0
            l2_compress_successes 4 0
            l2_compress_zeros 4 0
            l2_compress_failures 4 0
            memory_throttle_count 4 0
            duplicate_buffers 4 0
            duplicate_buffers_size 4 0
            duplicate_reads 4 0
            memory_direct_count 4 0
            memory_indirect_count 4 0
            arc_no_grow 4 0
            arc_tempreserve 4 0
            arc_loaned_bytes 4 0
            arc_prune 4 0
            arc_meta_used 4 633051832
            arc_meta_limit 4 50592961536
            arc_meta_max 4 809615080
            arc_meta_min 4 16777216
            arc_need_free 4 0
            arc_sys_free 4 2108039168

            {noformat}

            {noformat}

            proc_arcstats.2016-04-04:21:52:28:misses 4 60302081
            proc_arcstats.2016-04-04:21:53:05:misses 4 60302081
            proc_arcstats.2016-04-04:21:53:41:misses 4 60302081
            proc_arcstats.2016-04-04:21:54:18:misses 4 60302081
            proc_arcstats.2016-04-04:21:54:55:misses 4 60302081
            proc_arcstats.2016-04-04:21:55:31:misses 4 60302081
            proc_arcstats.2016-04-04:21:56:08:misses 4 60302081
            proc_arcstats.2016-04-04:21:56:44:misses 4 60302081
            proc_arcstats.2016-04-04:21:57:21:misses 4 60302081
            proc_arcstats.2016-04-04:21:57:57:misses 4 60302081
            proc_arcstats.2016-04-04:21:58:34:misses 4 60302081
            proc_arcstats.2016-04-04:21:59:11:misses 4 60302081
            proc_arcstats.2016-04-04:21:59:47:misses 4 60302099
            proc_arcstats.2016-04-04:22:00:25:misses 4 60302137
            proc_arcstats.2016-04-04:22:01:06:misses 4 60374920
            proc_arcstats.2016-04-04:22:01:47:misses 4 60513681
            proc_arcstats.2016-04-04:22:02:27:misses 4 60644796
            proc_arcstats.2016-04-04:22:03:08:misses 4 60782163
            proc_arcstats.2016-04-04:22:03:49:misses 4 60921202
            proc_arcstats.2016-04-04:22:04:30:misses 4 61049984
            proc_arcstats.2016-04-04:22:05:10:misses 4 61131959
            proc_arcstats.2016-04-04:22:05:51:misses 4 61133558
            proc_arcstats.2016-04-04:22:06:32:misses 4 61203715
            proc_arcstats.2016-04-04:22:07:13:misses 4 61317428
            proc_arcstats.2016-04-04:22:07:54:misses 4 61406420
            proc_arcstats.2016-04-04:22:08:35:misses 4 61520867
            proc_arcstats.2016-04-04:22:09:15:misses 4 61659833
            proc_arcstats.2016-04-04:22:09:56:misses 4 61796365
            proc_arcstats.2016-04-04:22:10:37:misses 4 61905346
            proc_arcstats.2016-04-04:22:11:17:misses 4 61977003
            proc_arcstats.2016-04-04:22:11:56:misses 4 61977043
            proc_arcstats.2016-04-04:22:12:37:misses 4 62083132
            proc_arcstats.2016-04-04:22:13:17:misses 4 62215655
            proc_arcstats.2016-04-04:22:13:57:misses 4 62348643
            proc_arcstats.2016-04-04:22:14:38:misses 4 62494512
            proc_arcstats.2016-04-04:22:15:18:misses 4 62630944
            proc_arcstats.2016-04-04:22:15:59:misses 4 62727592
            proc_arcstats.2016-04-04:22:16:38:misses 4 62806912
            proc_arcstats.2016-04-04:22:17:18:misses 4 62809594
            proc_arcstats.2016-04-04:22:17:59:misses 4 62832835
            proc_arcstats.2016-04-04:22:18:40:misses 4 62874008
            proc_arcstats.2016-04-04:22:19:20:misses 4 62915527
            proc_arcstats.2016-04-04:22:20:00:misses 4 62956780
            proc_arcstats.2016-04-04:22:20:40:misses 4 62998788
            proc_arcstats.2016-04-04:22:21:20:misses 4 63041800
            proc_arcstats.2016-04-04:22:22:01:misses 4 63083422
            proc_arcstats.2016-04-04:22:22:41:misses 4 63125355
            proc_arcstats.2016-04-04:22:23:21:misses 4 63167419
            proc_arcstats.2016-04-04:22:24:01:misses 4 63210702
            proc_arcstats.2016-04-04:22:24:41:misses 4 63252813
            proc_arcstats.2016-04-04:22:25:21:misses 4 63295877
            proc_arcstats.2016-04-04:22:26:01:misses 4 63339516
            proc_arcstats.2016-04-04:22:26:41:misses 4 63383531
            proc_arcstats.2016-04-04:22:27:22:misses 4 63417625
            proc_arcstats.2016-04-04:22:28:02:misses 4 63418160
            proc_arcstats.2016-04-04:22:28:44:misses 4 63418448
            proc_arcstats.2016-04-04:22:29:25:misses 4 63446807
            proc_arcstats.2016-04-04:22:30:06:misses 4 63605174
            proc_arcstats.2016-04-04:22:30:46:misses 4 63787444
            proc_arcstats.2016-04-04:22:31:27:misses 4 63991757
            proc_arcstats.2016-04-04:22:32:08:misses 4 64186064
            proc_arcstats.2016-04-04:22:32:48:misses 4 64370507
            proc_arcstats.2016-04-04:22:33:29:misses 4 64544830
            proc_arcstats.2016-04-04:22:34:10:misses 4 64705200
            proc_arcstats.2016-04-04:22:34:51:misses 4 64890908
            proc_arcstats.2016-04-04:22:35:31:misses 4 65051186
            proc_arcstats.2016-04-04:22:36:12:misses 4 65093562
            proc_arcstats.2016-04-04:22:36:48:misses 4 65093566
            proc_arcstats.2016-04-04:22:37:25:misses 4 65093566
            proc_arcstats.2016-04-04:22:38:02:misses 4 65093566
            proc_arcstats.2016-04-04:22:38:38:misses 4 65093566
            proc_arcstats.2016-04-04:22:39:15:misses 4 65093566
            proc_arcstats.2016-04-04:22:39:51:misses 4 65093566
            proc_arcstats.2016-04-04:22:40:28:misses 4 65093566
            proc_arcstats.2016-04-04:22:41:05:misses 4 65093566
            proc_arcstats.2016-04-04:22:41:41:misses 4 65093566
            proc_arcstats.2016-04-04:22:42:18:misses 4 65093566
            proc_arcstats.2016-04-04:22:42:54:misses 4 65093566
            proc_arcstats.2016-04-04:22:43:31:misses 4 65093566
            proc_arcstats.2016-04-04:22:44:07:misses 4 65093566
            proc_arcstats.2016-04-04:22:44:44:misses 4 65093593
            proc_arcstats.2016-04-04:22:45:24:misses 4 65093635
            proc_arcstats.2016-04-04:22:46:05:misses 4 65093688
            proc_arcstats.2016-04-04:22:46:47:misses 4 65204992
            proc_arcstats.2016-04-04:22:47:27:misses 4 65369668
            proc_arcstats.2016-04-04:22:48:08:misses 4 65543276
            proc_arcstats.2016-04-04:22:48:49:misses 4 65700706
            proc_arcstats.2016-04-04:22:49:29:misses 4 65835289
            proc_arcstats.2016-04-04:22:50:10:misses 4 65978426
            proc_arcstats.2016-04-04:22:50:51:misses 4 66102294
            proc_arcstats.2016-04-04:22:51:32:misses 4 66237487
            proc_arcstats.2016-04-04:22:52:12:misses 4 66372029
            proc_arcstats.2016-04-04:22:52:53:misses 4 66491789
            proc_arcstats.2016-04-04:22:53:33:misses 4 66606048
            proc_arcstats.2016-04-04:22:54:14:misses 4 66728350
            proc_arcstats.2016-04-04:22:54:54:misses 4 66759150
            proc_arcstats.2016-04-04:22:55:35:misses 4 66783980
            proc_arcstats.2016-04-04:22:56:16:misses 4 66882343
            proc_arcstats.2016-04-04:22:56:56:misses 4 66986649
            proc_arcstats.2016-04-04:22:57:37:misses 4 67099214
            proc_arcstats.2016-04-04:22:58:18:misses 4 67236074
            proc_arcstats.2016-04-04:22:58:58:misses 4 67359854
            proc_arcstats.2016-04-04:22:59:39:misses 4 67479208
            proc_arcstats.2016-04-04:23:00:19:misses 4 67603703
            proc_arcstats.2016-04-04:23:01:00:misses 4 67605334
            proc_arcstats.2016-04-04:23:01:42:misses 4 67687472
            proc_arcstats.2016-04-04:23:02:22:misses 4 67823336
            proc_arcstats.2016-04-04:23:03:03:misses 4 67950500
            proc_arcstats.2016-04-04:23:03:44:misses 4 68083660
            proc_arcstats.2016-04-04:23:04:24:misses 4 68221409
            proc_arcstats.2016-04-04:23:05:04:misses 4 68335712
            proc_arcstats.2016-04-04:23:05:45:misses 4 68441748
            proc_arcstats.2016-04-04:23:06:24:misses 4 68443307
            proc_arcstats.2016-04-04:23:07:05:misses 4 68454637
            proc_arcstats.2016-04-04:23:07:46:misses 4 68578571
            proc_arcstats.2016-04-04:23:08:27:misses 4 68690534
            proc_arcstats.2016-04-04:23:09:07:misses 4 68808692
            proc_arcstats.2016-04-04:23:09:48:misses 4 68930912
            proc_arcstats.2016-04-04:23:10:28:misses 4 69068534
            proc_arcstats.2016-04-04:23:11:09:misses 4 69192434
            proc_arcstats.2016-04-04:23:11:50:misses 4 69283782
            proc_arcstats.2016-04-04:23:12:30:misses 4 69284522
            proc_arcstats.2016-04-04:23:13:12:misses 4 69315159
            proc_arcstats.2016-04-04:23:13:52:misses 4 69373955
            proc_arcstats.2016-04-04:23:14:33:misses 4 69449303
            proc_arcstats.2016-04-04:23:15:13:misses 4 69523499
            proc_arcstats.2016-04-04:23:15:53:misses 4 69599070
            proc_arcstats.2016-04-04:23:16:34:misses 4 69674993
            proc_arcstats.2016-04-04:23:17:14:misses 4 69750733
            proc_arcstats.2016-04-04:23:17:55:misses 4 69827307
            proc_arcstats.2016-04-04:23:18:35:misses 4 69847427
            proc_arcstats.2016-04-04:23:19:16:misses 4 69848231
            proc_arcstats.2016-04-04:23:19:57:misses 4 69884019
            proc_arcstats.2016-04-04:23:20:37:misses 4 69946465
            proc_arcstats.2016-04-04:23:21:18:misses 4 70016284
            proc_arcstats.2016-04-04:23:21:58:misses 4 70081329
            proc_arcstats.2016-04-04:23:22:39:misses 4 70148143
            proc_arcstats.2016-04-04:23:23:19:misses 4 70213937
            proc_arcstats.2016-04-04:23:23:59:misses 4 70283057
            proc_arcstats.2016-04-04:23:24:40:misses 4 70353187
            proc_arcstats.2016-04-04:23:25:21:misses 4 70415880
            proc_arcstats.2016-04-04:23:26:00:misses 4 70415996
            proc_arcstats.2016-04-04:23:26:37:misses 4 70415996
            proc_arcstats.2016-04-04:23:27:13:misses 4 70415996
            proc_arcstats.2016-04-04:23:27:50:misses 4 70415996
            proc_arcstats.2016-04-04:23:28:26:misses 4 70415996
            proc_arcstats.2016-04-04:23:29:03:misses 4 70415996
            proc_arcstats.2016-04-04:23:29:40:misses 4 70415996
            proc_arcstats.2016-04-04:23:30:16:misses 4 70415996
            proc_arcstats.2016-04-04:23:30:53:misses 4 70415996
            proc_arcstats.2016-04-04:23:31:29:misses 4 70415996
            proc_arcstats.2016-04-04:23:32:06:misses 4 70415996
            proc_arcstats.2016-04-04:23:32:42:misses 4 70415996
            proc_arcstats.2016-04-04:23:33:19:misses 4 70415996
            proc_arcstats.2016-04-04:23:33:55:misses 4 70416034
            proc_arcstats.2016-04-04:23:34:36:misses 4 70416109
            proc_arcstats.2016-04-04:23:35:18:misses 4 70417565
            proc_arcstats.2016-04-04:23:35:58:misses 4 70597975
            proc_arcstats.2016-04-04:23:36:39:misses 4 70802556

            {noformat}
            jsalians_intel John Salinas (Inactive) created issue -

            People

              jay Jinshan Xiong (Inactive)
              jsalians_intel John Salinas (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: