LU-7172: replay-single test_70d hung on MDT unmount


Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • None
    • Lustre 2.8.0
    • autotest
    • 3

    Description

      replay-single test_70d hangs while unmounting one of the MDTs in a DNE configuration. Logs for this failure are at https://testing.hpdd.intel.com/test_sets/0c5cca82-5c61-11e5-9065-5254006e85c2 .

      The MDS2, MDS3, and MDS4 console logs show umount blocked in obd_exports_barrier():

      00:57:23:Lustre: lustre-MDT0001 is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 5. Is it stuck?
      00:57:23:Lustre: lustre-MDT0001 is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 5. Is it stuck?
      00:57:23:Lustre: lustre-MDT0001: Not available for connect from 10.2.4.158@tcp (stopping)
      00:57:23:Lustre: Skipped 336 previous similar messages
      00:57:23:INFO: task umount:4687 blocked for more than 120 seconds.
      00:57:23:      Not tainted 2.6.32-573.3.1.el6_lustre.gde57418.x86_64 #1
      00:57:23:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      00:57:23:umount        D 0000000000000000     0  4687   4686 0x00000080
      00:57:23: ffff88005cbefa38 0000000000000082 0000000000000000 ffff88005cbef9d8
      00:57:23: ffff88005cbef998 000200000a0204a1 00000440a839357a 0000000000000000
      00:57:23: ffff880056458044 000000010042c434 ffff8800688e9068 ffff88005cbeffd8
      00:57:23:Call Trace:
      01:05:22: [<ffffffff81539a62>] schedule_timeout+0x192/0x2e0
      01:05:22: [<ffffffff81089c10>] ? process_timeout+0x0/0x10
      01:05:22: [<ffffffffa05ce6b6>] obd_exports_barrier+0xb6/0x190 [obdclass]
      01:05:22: [<ffffffffa0e9c145>] mdt_device_fini+0x475/0xf40 [mdt]
      01:05:22: [<ffffffffa05d0496>] ? class_disconnect_exports+0x116/0x2f0 [obdclass]
      01:05:22: [<ffffffffa05ea8b2>] class_cleanup+0x572/0xd30 [obdclass]
      01:05:23: [<ffffffffa05cb156>] ? class_name2dev+0x56/0xe0 [obdclass]
      01:05:23: [<ffffffffa05ecf46>] class_process_config+0x1ed6/0x2840 [obdclass]
      01:05:23: [<ffffffffa04b2b61>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      01:05:23: [<ffffffff8117892c>] ? __kmalloc+0x21c/0x230
      01:05:23: [<ffffffffa05edd6f>] class_manual_cleanup+0x4bf/0x8e0 [obdclass]
      01:05:23: [<ffffffffa05cb156>] ? class_name2dev+0x56/0xe0 [obdclass]
      01:05:23: [<ffffffffa062721c>] server_put_super+0xa0c/0xed0 [obdclass]
      01:05:23: [<ffffffff811b0176>] ? invalidate_inodes+0xf6/0x190
      01:05:23: [<ffffffff811943bb>] generic_shutdown_super+0x5b/0xe0
      01:05:23: [<ffffffff811944a6>] kill_anon_super+0x16/0x60
      01:05:23: [<ffffffffa05f0c26>] lustre_kill_super+0x36/0x60 [obdclass]
      01:05:23: [<ffffffff81194c47>] deactivate_super+0x57/0x80
      01:05:23: [<ffffffff811b4adf>] mntput_no_expire+0xbf/0x110
      01:05:23: [<ffffffff811b562b>] sys_umount+0x7b/0x3a0
      01:05:23: [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
      01:05:23:Lustre: lustre-MDT0001 is waiting for obd_unlinked_exports more than 128 seconds. The obd refcount = 5. Is it stuck?
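
The "waiting for obd_unlinked_exports" warnings above are the quickest signature to look for when triaging other runs. The following is a minimal sketch, not part of the test framework, of how a console log could be scanned for them. The function names, the `min_seconds` threshold, and the assumption that each line begins with an `HH:MM:SS:` console timestamp are illustrative; the sample lines are copied from the log above.

```python
import re

# Matches the console warning quoted in this ticket. The leading HH:MM:SS:
# prefix is assumed to be the console timestamp added by the test harness.
BARRIER_RE = re.compile(
    r"^\s*(?P<stamp>\d{2}:\d{2}:\d{2}):Lustre: (?P<target>\S+) is waiting for "
    r"obd_unlinked_exports more than (?P<seconds>\d+) seconds\. "
    r"The obd refcount = (?P<refcount>\d+)\."
)


def find_barrier_waits(lines):
    """Return (target, seconds, refcount) for each barrier warning, in log order."""
    hits = []
    for line in lines:
        m = BARRIER_RE.match(line)
        if m:
            hits.append(
                (m.group("target"), int(m.group("seconds")), int(m.group("refcount")))
            )
    return hits


def stuck_targets(lines, min_seconds=64):
    """Map each target to its longest reported wait, keeping only waits of
    at least min_seconds (a hypothetical triage threshold)."""
    longest = {}
    for target, seconds, _refcount in find_barrier_waits(lines):
        longest[target] = max(seconds, longest.get(target, 0))
    return {t: s for t, s in longest.items() if s >= min_seconds}


# Sample lines taken from the MDS console log in this ticket.
SAMPLE = [
    "00:57:23:Lustre: lustre-MDT0001 is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 5. Is it stuck?",
    "00:57:23:Lustre: lustre-MDT0001 is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 5. Is it stuck?",
    "00:57:23:Lustre: lustre-MDT0001: Not available for connect from 10.2.4.158@tcp (stopping)",
    "01:05:23:Lustre: lustre-MDT0001 is waiting for obd_unlinked_exports more than 128 seconds. The obd refcount = 5. Is it stuck?",
]

if __name__ == "__main__":
    for target, seconds in stuck_targets(SAMPLE).items():
        print(f"{target}: waited more than {seconds} seconds")
```

A refcount that stays constant while the reported wait keeps growing, as it does here (5 at 32, 64, and 128 seconds), suggests that the remaining exports are not being released rather than draining slowly.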
      

      Other instances of this failure:
      2015-09-08 23:27:49 - https://testing.hpdd.intel.com/test_sets/fe085252-56cb-11e5-84d0-5254006e85c2
      2015-09-11 10:01:48 - https://testing.hpdd.intel.com/test_sets/23ba7c20-58cd-11e5-baa0-5254006e85c2
      2015-09-11 19:25:28 - https://testing.hpdd.intel.com/test_sets/a55b755e-5904-11e5-a4d9-5254006e85c2

      A few other runs of this test fail with the same stack trace but different MDS console messages:
      2015-09-09 05:56:14 - https://testing.hpdd.intel.com/test_sets/3dda199c-56fe-11e5-8947-5254006e85c2
      2015-09-12 19:16:06 - https://testing.hpdd.intel.com/test_sets/53eb4ab0-59d0-11e5-825b-5254006e85c2

      All of these failures are in the review-dne-part-2 test group.

            People

              laisiyao Lai Siyao
              jamesanunez James Nunez (Inactive)