LU-3109: ZFS - very slow reads, OST watchdogs

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Blocker
    • None
    • Lustre 2.4.0
    • Hyperion/LLNL
    • 3
    • 7557

    Description

      Running IOR file-per-process, we observe that one or two of the OSTs have an excessive load compared to the others (load average of 112 vs. 0.1 on the rest).
      The system log shows a large number of watchdog timeouts. I/O is not failing, but rates are very slow.
      First watchdog (log attached):

      2013-04-04 11:26:05 LNet: Service thread pid 8074 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      2013-04-04 11:26:05 Pid: 8074, comm: ll_ost_io00_018
      2013-04-04 11:26:05
      2013-04-04 11:26:05 Call Trace:
      2013-04-04 11:26:05  [<ffffffffa056cd40>] ? arc_read_nolock+0x530/0x810 [zfs]
      2013-04-04 11:26:05  [<ffffffffa04e45ac>] cv_wait_common+0x9c/0x1a0 [spl]
      2013-04-04 11:26:05  [<ffffffff81090990>] ? autoremove_wake_function+0x0/0x40
      2013-04-04 11:26:05  [<ffffffffa04e46e3>] __cv_wait+0x13/0x20 [spl]
      2013-04-04 11:26:05  [<ffffffffa060633b>] zio_wait+0xeb/0x160 [zfs]
      2013-04-04 11:26:05  [<ffffffffa057106d>] dbuf_read+0x3fd/0x720 [zfs]
      2013-04-04 11:26:06  [<ffffffffa0572c1b>] dbuf_prefetch+0x10b/0x2b0 [zfs]
      2013-04-04 11:26:06  [<ffffffffa0586381>] dmu_zfetch_dofetch+0xf1/0x160 [zfs]
      2013-04-04 11:26:06  [<ffffffffa0570280>] ? dbuf_read_done+0x0/0x110 [zfs]
      2013-04-04 11:26:06  [<ffffffffa0587211>] dmu_zfetch+0xaa1/0xe40 [zfs]
      2013-04-04 11:26:06  [<ffffffffa05710fa>] dbuf_read+0x48a/0x720 [zfs]
      2013-04-04 11:26:06  [<ffffffffa0578bc9>] dmu_buf_hold_array_by_dnode+0x179/0x570 [zfs]
      2013-04-04 11:26:06  [<ffffffffa0579b28>] dmu_buf_hold_array_by_bonus+0x68/0x90 [zfs]
      2013-04-04 11:26:06  [<ffffffffa0d4c95d>] osd_bufs_get+0x49d/0x9a0 [osd_zfs]
      2013-04-04 11:26:06  [<ffffffff81270f7c>] ? put_dec+0x10c/0x110
      2013-04-04 11:26:06  [<ffffffffa0723736>] ? lu_object_find+0x16/0x20 [obdclass]
      2013-04-04 11:26:06  [<ffffffffa0ded49f>] ofd_preprw_read+0x13f/0x7e0 [ofd]
      2013-04-04 11:26:06  [<ffffffffa0dedec5>] ofd_preprw+0x385/0x1190 [ofd]
      2013-04-04 11:26:06  [<ffffffffa0da739c>] obd_preprw+0x12c/0x3d0 [ost]
      2013-04-04 11:26:06  [<ffffffffa0dace80>] ost_brw_read+0xd00/0x12e0 [ost]
      2013-04-04 11:26:06  [<ffffffff812739b6>] ? vsnprintf+0x2b6/0x5f0
      2013-04-04 11:26:06  [<ffffffffa035127b>] ? cfs_set_ptldebug_header+0x2b/0xc0 [libcfs]
      2013-04-04 11:26:06  [<ffffffffa0361bdb>] ? libcfs_debug_vmsg2+0x50b/0xbb0 [libcfs]
      2013-04-04 11:26:06  [<ffffffffa08a2f4c>] ? lustre_msg_get_version+0x8c/0x100 [ptlrpc]
      2013-04-04 11:26:06  [<ffffffffa08a30a8>] ? lustre_msg_check_version+0xe8/0x100 [ptlrpc]
      2013-04-04 11:26:06  [<ffffffffa0db3a63>] ost_handle+0x2b53/0x46f0 [ost]
      2013-04-04 11:26:06  [<ffffffffa035e0e4>] ? libcfs_id2str+0x74/0xb0 [libcfs]
      2013-04-04 11:26:06  [<ffffffffa08b21ac>] ptlrpc_server_handle_request+0x41c/0xdf0 [ptlrpc]
      2013-04-04 11:26:06  [<ffffffffa03525de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      2013-04-04 11:26:06  [<ffffffffa08a97e9>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      2013-04-04 11:26:06  [<ffffffff81052223>] ? __wake_up+0x53/0x70
      2013-04-04 11:26:06  [<ffffffffa08b36f5>] ptlrpc_main+0xb75/0x1870 [ptlrpc]
      2013-04-04 11:26:06  [<ffffffffa08b2b80>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
      2013-04-04 11:26:06  [<ffffffff8100c0ca>] child_rip+0xa/0x20
      2013-04-04 11:26:06  [<ffffffffa08b2b80>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
      2013-04-04 11:26:06  [<ffffffffa08b2b80>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
      2013-04-04 11:26:06  [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      2013-04-04 11:26:06
      2013-04-04 11:26:06 LustreError: dumping log to /tmp/lustre-log.1365099965.8074
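
      The trace shows the ll_ost_io thread blocked in zio_wait() beneath dmu_zfetch()/dbuf_prefetch(), i.e. the ZFS file-level prefetcher waiting on synchronous reads in the bulk-read path. A quick way to check whether prefetch is implicated is to disable it on one affected OSS and repeat the read phase (a sketch, assuming the standard zfs_prefetch_disable module parameter is present in this ZFS build):

          # Current setting: 0 = prefetch enabled, 1 = disabled
          cat /sys/module/zfs/parameters/zfs_prefetch_disable

          # Temporarily disable file-level prefetch, re-run the IOR read phase, then restore
          echo 1 > /sys/module/zfs/parameters/zfs_prefetch_disable

          # Prefetch and ARC activity during the run
          cat /proc/spl/kstat/zfs/zfetchstats
          head -30 /proc/spl/kstat/zfs/arcstats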
      

      Attachments

      Issue Links

      Activity


            adilger Andreas Dilger added a comment -

            This is probably the same as LU-5278, which has patches that improve ZFS read performance significantly.

            morrone Christopher Morrone (Inactive) added a comment -

            rc10 is old; you should definitely upgrade to the latest rc of 0.6.0. 0.6.1 is a little TOO new, because packaging has changed there, and Lustre will need a little tweaking to find the new paths and things automatically. You can build by hand by giving the spl and zfs paths, but the latest 0.6.0 rc will just be easier.
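
            For reference, a minimal sketch of the "build by hand" option, pointing Lustre's configure at already-built SPL and ZFS source trees (the paths below are illustrative, not taken from this ticket):

                # Hypothetical locations of the SPL/ZFS source trees built on this node
                ./configure --with-spl=/usr/src/spl-0.6.0 \
                            --with-zfs=/usr/src/zfs-0.6.0
                make rpms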
            keith Keith Mannthey (Inactive) added a comment (edited) -

            oprofile didn't just work. LU-3111 filed.


            keith Keith Mannthey (Inactive) added a comment -

            It seems our zfs is 0.6.0-rc10, and there is a 0.6.1.
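
            To confirm which ZFS/SPL builds are actually loaded on the servers, something like the following should work (a sketch; exact package names may differ on this system):

                # Version of the loaded kernel modules
                modinfo zfs | grep -iw version
                modinfo spl | grep -iw version

                # Installed packages, if the modules came from RPMs
                rpm -qa | grep -E '^(zfs|spl)'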

            "normal"

            /proc/fs/lustre/obdfilter/lustre-OST0001/stats @ 1365124587.752293
            Name                      Cur.Count  Cur.Rate   #Events   Unit           last        min          avg        max
            read_bytes                0          0          1625295   [bytes]           0       4096    393794.95    1048576
            write_bytes               6568       437        652523    [bytes]  6880755712       4096   1047962.14    1048576
            get_info                  73         4          1538      
            set_info_async            0          0          3         
            connect                   0          0          125       
            statfs                    3          0          1226      
            create                    0          0          12        
            destroy                   34         2          768       
            sync                      39         2          768       
            preprw                    6568       437        2277818   
            commitrw                  6568       437        2277818   
            ping                      108        7          23612 
            

            "bad"

            /proc/fs/lustre/obdfilter/lustre-OST0003/stats @ 1365124582.616412
            Name                      Cur.Count  Cur.Rate   #Events   Unit           last        min          avg        max
            read_bytes                0          0          1226305   [bytes]           0       4096    521965.14    1048576
            write_bytes               5602       373        653713    [bytes]  5861138432       4096   1042516.96    1048576
            get_info                  127        8          1426      
            set_info_async            0          0          2         
            connect                   0          0          125       
            statfs                    3          0          1223      
            create                    0          0          12        
            destroy                   63         4          712       
            sync                      64         4          712       
            preprw                    5602       373        1880018   
            commitrw                  5602       373        1880018   
            ping                      95         6          23601   
            

            Looks like the "normal" server is doing more work (more total read_bytes and preprw/commitrw events over the same period).
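
            One way to track this divergence while IOR runs is to sample the obdfilter stats on each OSS at a fixed interval and compare the counters across OSTs (a sketch; the target name pattern and interval are illustrative):

                # Sample the read/write and preprw/commitrw counters for every OST on this OSS
                while true; do
                    date +%s.%N
                    for f in /proc/fs/lustre/obdfilter/lustre-OST*/stats; do
                        echo "== $f"
                        grep -E '^(read_bytes|write_bytes|preprw|commitrw)' "$f"
                    done
                    sleep 10
                done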


            cliffw Cliff White (Inactive) added a comment -

            We are using zfs-0.6.0-rc10.

            cliffw Cliff White (Inactive) added a comment -

            Alex, this is consistently reproducible on Hyperion - what data should we be looking at?

            People

              Assignee: wc-triage WC Triage
              Reporter: cliffw Cliff White (Inactive)
              Votes: 0
              Watchers: 7
