performance-sanity test_6: mdsrate hung (sys_mknod)


    Description

      performance-sanity test_6 hung as follows on client:

      13:08:35:Lustre: DEBUG MARKER: ===== mdsrate-lookup-10dirs.sh Test preparation: creating 10 dirs with 26778 files.
      13:08:35:INFO: task mdsrate:19365 blocked for more than 120 seconds.
      13:08:35:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      13:08:35:mdsrate       D 0000000000000001     0 19365  19361 0x00000080
      13:08:35: ffff88005eba1ac8 0000000000000082 0000000000000000 ffff88006e406800
      13:08:35: ffff88005eba1a88 ffffffffa0681629 ffff88007a0aee00 0000000000000003
      13:08:35: ffff88006dfd2638 ffff88005eba1fd8 000000000000fb88 ffff88006dfd2638
      13:08:35:Call Trace:
      13:08:35: [<ffffffffa0681629>] ? __ptlrpc_request_bufs_pack+0x349/0x3c0 [ptlrpc]
      13:08:35: [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
      13:08:35: [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
      13:08:35: [<ffffffffa084a8dc>] mdc_reint+0x3c/0x3b0 [mdc]
      13:08:35: [<ffffffffa084beae>] mdc_create+0x20e/0x780 [mdc]
      13:08:35: [<ffffffffa0ae6196>] lmv_create+0x316/0x700 [lmv]
      13:08:35: [<ffffffffa09dbf46>] ll_new_node+0x176/0x640 [lustre]
      13:08:35: [<ffffffffa09dc578>] ll_mknod_generic+0x168/0x280 [lustre]
      13:08:35: [<ffffffff8118e0e3>] ? generic_permission+0x23/0xb0
      13:08:35: [<ffffffffa09e2ef1>] ll_create_nd+0x971/0xe80 [lustre]
      13:08:35: [<ffffffff8118e4c2>] ? __lookup_hash+0x102/0x160
      13:08:35: [<ffffffff8118fbd4>] vfs_create+0xb4/0xe0
      13:08:35: [<ffffffff81192800>] sys_mknodat+0x280/0x2a0
      13:08:35: [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
      13:08:35: [<ffffffff8119283a>] sys_mknod+0x1a/0x20
      13:08:35: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      

      Maloo report: https://maloo.whamcloud.com/test_sets/fc281616-73c4-11e3-b4ff-52540035b04c
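
When triaging a hung-task report like the one above, it can help to pull out the first reliable Lustre frame above the mutex wait. The following is a minimal sketch, not part of any Lustre tooling: the helper names `parse_frames` and `first_lustre_caller` and the module list are assumptions made for illustration, and `SAMPLE` is an excerpt of the trace quoted in this ticket.

```python
import re

# Excerpt of the call trace quoted in this ticket.
SAMPLE = """\
13:08:35: [<ffffffffa0681629>] ? __ptlrpc_request_bufs_pack+0x349/0x3c0 [ptlrpc]
13:08:35: [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
13:08:35: [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
13:08:35: [<ffffffffa084a8dc>] mdc_reint+0x3c/0x3b0 [mdc]
13:08:35: [<ffffffffa084beae>] mdc_create+0x20e/0x780 [mdc]
13:08:35: [<ffffffffa0ae6196>] lmv_create+0x316/0x700 [lmv]
"""

# One stack frame: address, optional "?" (frame the unwinder could not
# confirm), symbol+offset/size, and an optional [module] suffix.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return a list of dicts, one per stack frame found in log_text."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append({
                "symbol": m.group("symbol"),
                # Frames without a [module] suffix belong to the core kernel.
                "module": m.group("module") or "kernel",
                "reliable": m.group("unreliable") is None,
            })
    return frames


def first_lustre_caller(frames, lustre_modules=("ptlrpc", "mdc", "lmv", "lustre")):
    """Return the first reliable frame from a Lustre module, or None.

    The module list is an assumption based on the modules seen in this trace.
    """
    for frame in frames:
        if frame["reliable"] and frame["module"] in lustre_modules:
            return frame
    return None


if __name__ == "__main__":
    caller = first_lustre_caller(parse_frames(SAMPLE))
    print(caller["symbol"], caller["module"])
```

For the trace in this ticket, the sketch skips the unreliable `__ptlrpc_request_bufs_pack` frame and the two kernel mutex frames and reports `mdc_reint`, the function that called `mutex_lock`.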

          Activity

            [LU-4427] performance-sanity test_6: mdsrate hung (sys_mknod)
            utopiabound Nathaniel Clark added a comment - edited

            The most recent failure of this type is from 2015-03-24 (all other perf-sanity/6 timeouts are LU-3786)
            https://testing.hpdd.intel.com/test_sets/d817106c-d325-11e4-a357-5254006e85c2

            bzzz Alex Zhuravlev added a comment -

            I can't find any problem in the report above. Could it be that ZFS is just making progress too slowly?

            yujian Jian Yu added a comment -

            Does this still happen?

            By searching on Maloo, I found that the latest performance-sanity test_6 timeout failures were all LU-3786.

            simmonsja James A Simmons added a comment -

            Does this still happen?

            sarah Sarah Liu added a comment - another instance: https://testing.hpdd.intel.com/test_sets/5ccde712-1251-11e5-bec9-5254006e85c2
            yujian Jian Yu added a comment -

            FYI, by searching on Maloo, I found that the failure did not occur on the master branch. I also tried to reproduce the failure on master build #2820 but could not hit the issue. Here are the test reports on master build #2820:

            performance-sanity:
            https://testing.hpdd.intel.com/test_sets/b60591f8-9d6d-11e4-9d48-5254006e85c2 (ldiskfs)
            https://testing.hpdd.intel.com/test_sets/b915d632-9d6d-11e4-9d48-5254006e85c2 (ldiskfs)
            https://testing.hpdd.intel.com/test_sets/34af27f2-9fe5-11e4-a43c-5254006e85c2 (zfs)
            https://testing.hpdd.intel.com/test_sets/98bf8674-9fe5-11e4-a43c-5254006e85c2 (zfs)

            parallel-scale:
            https://testing.hpdd.intel.com/test_sets/bc82d59a-9d6d-11e4-9d48-5254006e85c2 (ldiskfs)
            https://testing.hpdd.intel.com/test_sets/c5ccc930-9d6d-11e4-9d48-5254006e85c2 (ldiskfs)
            https://testing.hpdd.intel.com/test_sets/7af16a0c-9fc4-11e4-a450-5254006e85c2 (zfs)
            https://testing.hpdd.intel.com/test_sets/a98eeeb6-9fc4-11e4-a450-5254006e85c2 (zfs)
            https://testing.hpdd.intel.com/test_sets/bffa1554-9fc4-11e4-a450-5254006e85c2 (zfs)
            https://testing.hpdd.intel.com/test_sets/f702fb8e-9ef6-11e4-a23e-5254006e85c2 (zfs)
            https://testing.hpdd.intel.com/test_sets/fd7dc124-9ef6-11e4-a23e-5254006e85c2 (zfs)

            utopiabound Nathaniel Clark added a comment -

            performance-sanity/6 hangs seem to fall into two categories:
            1) performance-sanity/5 fails (LU-4428 or LU-5146), then mdsrate hangs (this bug)
            2) test_5 passes and mkdir hangs (LU-3786)

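
The two categories above can be written down as a small triage rule. This is only an illustrative sketch: `classify_hang` and its parameters are hypothetical names, and the rule encodes nothing beyond the two cases described in the comment above.

```python
def classify_hang(test5_passed, hung_process):
    """Map a performance-sanity test_6 hang to the ticket it most likely matches.

    test5_passed: whether performance-sanity test_5 passed in the same run.
    hung_process: name of the process reported as blocked, e.g. "mdsrate".
    Both parameters are hypothetical inputs a triager would fill in by hand.
    """
    if not test5_passed and hung_process == "mdsrate":
        # test_5 failed (LU-4428 or LU-5146), then mdsrate hung: this bug.
        return "LU-4427"
    if test5_passed and hung_process == "mkdir":
        # test_5 passed and mkdir hung.
        return "LU-3786"
    # Anything else is outside the two categories described in this ticket.
    return "unclassified"


if __name__ == "__main__":
    print(classify_hang(False, "mdsrate"))
    print(classify_hang(True, "mkdir"))
```

Any combination not covered by the two cases is deliberately left as "unclassified" rather than guessed at.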
            yujian Jian Yu added a comment -

            Lustre build: http://build.whamcloud.com/job/lustre-b2_5/61/
            FSTYPE=zfs

            The same failure occurred:
            https://maloo.whamcloud.com/test_sets/50a6f7f0-ebe5-11e3-82b2-52540035b04c

            pjones Peter Jones added a comment -

            Nathaniel

            Could you please look into this one?

            Thanks

            Peter

            yujian Jian Yu added a comment -

            Lustre build: http://build.whamcloud.com/job/lustre-b2_5/47/
            FSTYPE=zfs

            The same failure occurred:
            https://maloo.whamcloud.com/test_sets/99e3e576-cb70-11e3-95c9-52540035b04c

            People

              utopiabound Nathaniel Clark
              yujian Jian Yu