
LU-4696: Test timeout on sanity test_51ba: nlink before: 70002, created before: 70000

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.6.0
    • Severity: 3
    • 12908

    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/27fddb2c-a070-11e3-9f3a-52540035b04c.

      The sub-test test_51ba failed with the following error:

      test failed to respond and timed out

      Info required for matching: sanity 51ba
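      The "nlink before: 70002, created before: 70000" message is the test's own bookkeeping: a directory's link count is 2 ('.' plus its entry in the parent) plus one per subdirectory, so a parent holding 70000 test subdirectories reports nlink 70002. A minimal bash sketch of the step that timed out (an approximation pieced together from the error text and the unlinkmany trace in the comments below, not quoted from sanity.sh; the d51ba path is hypothetical, and $DIR and error() are Lustre test-framework conventions):

          NUMTEST=70000
          # parent of the test subdirectories: expect NUMTEST + 2 links
          nlink=$(stat -c %h $DIR/d51ba)
          echo "nlink before: $nlink, created before: $NUMTEST"
          # remove every subdirectory again; this unlinkmany run is the part
          # that stopped responding and tripped the test timeout
          unlinkmany -d $DIR/d51ba/t $NUMTEST || error "unlinkmany failed"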

    Activity

            adilger Andreas Dilger added a comment - Close old bug

            adilger Andreas Dilger added a comment - This may just be caused by ZFS slowness.

            adilger Andreas Dilger added a comment -

            It looks like the client is doing an unlink:

            16:39:05:unlinkmany
            16:39:05:Call Trace:
            16:39:05: [<ffffffff81528a42>] schedule_timeout+0x192/0x2e0
            16:39:05: [<ffffffffa0777a2a>] ptlrpc_set_wait+0x2da/0x860 [ptlrpc]
            16:39:05: [<ffffffffa0778037>] ptlrpc_queue_wait+0x87/0x220 [ptlrpc]
            16:39:05: [<ffffffffa09c4bf5>] mdc_reint+0x75/0x3b0 [mdc]
            16:39:05: [<ffffffffa09c5c30>] mdc_unlink+0x1b0/0x500 [mdc]
            16:39:05: [<ffffffffa09783ca>] lmv_unlink+0x31a/0x8b0 [lmv]
            16:39:05: [<ffffffffa0b7b6db>] ll_rmdir+0x15b/0x5d0 [lustre]
            16:39:05: [<ffffffff81197ab0>] vfs_rmdir+0xc0/0xf0
            16:39:05: [<ffffffff8119a9b4>] do_rmdir+0x184/0x1f0
            16:39:05: [<ffffffff8119aa76>] sys_rmdir+0x16/0x20
            

            and the server is doing an unlink:

            16:39:06:mdt00_004
            16:39:06:Call Trace:
            16:39:06: [<ffffffff81528823>] io_schedule+0x73/0xc0
            16:39:06: [<ffffffffa0142e7c>] cv_wait_common+0x8c/0x100 [spl]
            16:39:06: [<ffffffffa0142f08>] __cv_wait_io+0x18/0x20 [spl]
            16:39:06: [<ffffffffa02864ab>] zio_wait+0xfb/0x1b0 [zfs]
            16:39:06: [<ffffffffa01f2fdd>] dbuf_read+0x3fd/0x750 [zfs]
            16:39:06: [<ffffffffa01f34b9>] __dbuf_hold_impl+0x189/0x480 [zfs]
            16:39:06: [<ffffffffa01f382f>] dbuf_hold_impl+0x7f/0xb0 [zfs]
            16:39:06: [<ffffffffa01f48e0>] dbuf_hold+0x20/0x30 [zfs]
            16:39:06: [<ffffffffa01fa6e7>] dmu_buf_hold+0x97/0x1d0 [zfs]
            16:39:06: [<ffffffffa0255e17>] zap_lockdir+0x57/0x730 [zfs]
            16:39:06: [<ffffffffa02579a4>] zap_cursor_retrieve+0x1e4/0x2f0 [zfs]
            16:39:06: [<ffffffffa0e91278>] osd_index_retrieve_skip_dots+0x28/0x60 [osd_zfs]
            16:39:06: [<ffffffffa0e91888>] osd_dir_it_next+0x98/0x120 [osd_zfs]
            16:39:06: [<ffffffffa1081431>] lod_it_next+0x21/0x90 [lod]
            16:39:06: [<ffffffffa10d5fe9>] mdd_may_delete+0x519/0x9f0 [mdd]
            16:39:06: [<ffffffffa10d6505>] mdd_unlink_sanity_check+0x45/0x100 [mdd]
            16:39:06: [<ffffffffa10dc453>] mdd_unlink+0x233/0xd00 [mdd]
            16:39:06: [<ffffffffa0fbb4e8>] mdo_unlink+0x18/0x50 [mdt]
            16:39:06: [<ffffffffa0fc1470>] mdt_reint_unlink+0xa10/0x1170 [mdt]
            16:39:06: [<ffffffffa0fbb1e1>] mdt_reint_rec+0x41/0xe0 [mdt]
            16:39:06: [<ffffffffa0fa0e13>] mdt_reint_internal+0x4c3/0x780 [mdt]
            16:39:06: [<ffffffffa0fa165b>] mdt_reint+0x6b/0x120 [mdt]
            16:39:06: [<ffffffffa09ef43c>] tgt_request_handle+0x23c/0xac0 [ptlrpc]
            16:39:06: [<ffffffffa099e6ea>] ptlrpc_main+0xd1a/0x1980 [ptlrpc]
            

            It isn't clear if this is ZFS just being slow, or if these threads are hung.
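            One way to tell the two apart (a generic debugging sketch, not a step recorded in this ticket) is to sample the blocked MDT thread's kernel stack twice, a short interval apart: if the ZFS frames advance between samples the thread is making slow progress, while two identical stacks point at a genuine hang.

                # run on the MDS as root; mdt00_004 is the service thread
                # named in the trace above
                pid=$(pgrep mdt00_004)
                cat /proc/$pid/stack
                sleep 30
                cat /proc/$pid/stack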

    People

      Assignee: WC Triage
      Reporter: Maloo
      Votes: 0
      Watchers: 2

    Dates

      Created:
      Updated:
      Resolved: