Lustre / LU-3632

insanity 0 hung when unmounting an OST

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • None
    • None
    • 3
    • 9354

    Description

      This issue was created by maloo for Li Wei <liwei@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/ceba5fc4-f46f-11e2-b8a2-52540035b04c.

      The sub-test test_0 failed with the following error:

      test failed to respond and timed out

      Info required for matching: insanity 0

          Activity

            Nathaniel Clark (utopiabound) added a comment: I believe this is a duplicate of LU-3230.

            From the OSS console:

            23:42:46:Lustre: DEBUG MARKER: umount -d /mnt/ost3
            23:42:46:Lustre: Failing over lustre-OST0002
            23:42:46:Lustre: Skipped 2 previous similar messages
            23:42:46:Lustre: lustre-OST0002: Not available for connect from 10.10.16.107@tcp (stopping)
            23:42:46:Lustre: Skipped 2 previous similar messages
            23:42:46:Lustre: lustre-OST0002 is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 5. Is it stuck?
            23:42:46:Lustre: lustre-OST0002 is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 5. Is it stuck?
            23:42:46:Lustre: lustre-OST0002 is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 5. Is it stuck?
            23:42:46:Lustre: lustre-OST0002 is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 5. Is it stuck?
            23:42:46:Lustre: lustre-OST0002 is waiting for obd_unlinked_exports more than 128 seconds. The obd refcount = 5. Is it stuck?
            23:42:46:INFO: task umount:6586 blocked for more than 120 seconds.
            23:42:46:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            23:42:46:umount D 0000000000000000 0 6586 6585 0x00000080
            23:42:46: ffff880047e09aa8 0000000000000082 ffffffff00000010 ffff880047e09a58
            23:42:46: ffff880047e09a18 ffff88006cd5ec00 ffffffffa078e717 0000000000000000
            23:42:46: ffff88007c601098 ffff880047e09fd8 000000000000fb88 ffff88007c601098
            23:42:46:Call Trace:
            23:42:46: [<ffffffff8150ee42>] schedule_timeout+0x192/0x2e0
            23:42:46: [<ffffffff810810e0>] ? process_timeout+0x0/0x10
            23:42:46: [<ffffffffa05d662d>] cfs_schedule_timeout_and_set_state+0x1d/0x20 [libcfs]
            23:42:46: [<ffffffffa070f548>] obd_exports_barrier+0x98/0x170 [obdclass]
            23:42:46: [<ffffffffa0e42962>] ofd_device_fini+0x42/0x230 [ofd]
            23:42:46: [<ffffffffa073ae67>] class_cleanup+0x577/0xda0 [obdclass]
            23:42:46: [<ffffffffa07116f6>] ? class_name2dev+0x56/0xe0 [obdclass]
            23:42:46: [<ffffffffa073c74c>] class_process_config+0x10bc/0x1c80 [obdclass]
            23:42:46: [<ffffffffa0736133>] ? lustre_cfg_new+0x2d3/0x6e0 [obdclass]
            23:42:46: [<ffffffffa073d489>] class_manual_cleanup+0x179/0x6f0 [obdclass]
            23:42:46: [<ffffffffa07116f6>] ? class_name2dev+0x56/0xe0 [obdclass]
            23:42:46: [<ffffffffa077893c>] server_put_super+0x5ec/0xf60 [obdclass]
            23:50:45: [<ffffffff811833ab>] generic_shutdown_super+0x5b/0xe0
            23:50:45: [<ffffffff81183496>] kill_anon_super+0x16/0x60
            23:50:45: [<ffffffffa073f336>] lustre_kill_super+0x36/0x60 [obdclass]
            23:50:45: [<ffffffff81183c37>] deactivate_super+0x57/0x80
            23:50:45: [<ffffffff811a1c8f>] mntput_no_expire+0xbf/0x110
            23:50:45: [<ffffffff811a26fb>] sys_umount+0x7b/0x3a0
            23:50:45: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
            23:50:45:Lustre: lustre-OST0002 is waiting for obd_unlinked_exports more than 256 seconds. The obd refcount = 5. Is it stuck?
            23:50:45:Lustre: lustre-OST0002: Not available for connect from 10.10.16.107@tcp (stopping)
            23:50:45:Lustre: Skipped 308 previous similar messages

            Li Wei (liwei, Inactive) added a comment containing the OSS console excerpt above.
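In the console log, the "waiting for obd_unlinked_exports" warning repeats at 8, 16, 32, 64, 128 and 256 seconds while the obd refcount stays at 5, meaning the total wait doubles between warnings and the unmount never gets past the barrier. The sketch below is a hypothetical model of that cadence only. The function name `barrier_warnings` and its parameters (`still_pending`, `give_up_after`, `first_wait`) are illustrative assumptions, not Lustre's `obd_exports_barrier()`. Time is simulated, and unlike the real hang the loop stops after `give_up_after` seconds so it always terminates.

```python
from typing import Callable, List


def barrier_warnings(
    name: str,
    still_pending: Callable[[int], bool],
    refcount: int,
    give_up_after: int,
    first_wait: int = 8,
) -> List[str]:
    """Hypothetical model: warn at doubling total wait times while exports are pending.

    still_pending(t) reports whether unlinked exports remain after t simulated
    seconds. This reproduces the console message cadence; it is not Lustre source.
    """
    messages: List[str] = []
    waited = first_wait  # total simulated seconds slept so far
    while waited <= give_up_after and still_pending(waited):
        messages.append(
            f"{name} is waiting for obd_unlinked_exports more than {waited} seconds. "
            f"The obd refcount = {refcount}. Is it stuck?"
        )
        # Sleep as long again as already waited, so the total wait doubles.
        waited *= 2
    return messages


# Exports never drain: six warnings, 8 s through 256 s, as in the log above.
for line in barrier_warnings("lustre-OST0002", lambda t: True, 5, 300):
    print(line)
```

If the exports instead drained after about 20 seconds (for example, `still_pending=lambda t: t < 20`), the model would emit only the 8 s and 16 s warnings before the barrier returned.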

            People

              Assignee: WC Triage (wc-triage)
              Reporter: Maloo (maloo)
              Votes: 0
              Watchers: 2
