Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.8.0
    • Labels: None
    • Environment: autotest
    • Severity: 3

    Description

      ost-pools hangs on unmount of an OST. No individual test fails, and the last test run, ost-pools test 26, reports no errors; only the unmount of one of the OSTs during test cleanup hangs. Logs are at https://testing.hpdd.intel.com/test_sets/ea392e2a-776b-11e5-a00c-5254006e85c2

      The last thing we see in the suite_stdout log is:

      16:16:45:Stopping /mnt/ost7 (opts:-f) on shadow-6vm7
      16:16:45:CMD: shadow-6vm7 umount -d -f /mnt/ost7
      16:16:56:CMD: shadow-6vm7 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
      16:16:56:CMD: shadow-6vm7 grep -c /mnt/ost8' ' /proc/mounts
      16:16:56:Stopping /mnt/ost8 (opts:-f) on shadow-6vm7
      16:16:56:CMD: shadow-6vm7 umount -d -f /mnt/ost8
      17:14:51:********** Timeout by autotest system **********
      

      This failure occurred on the master branch in review-dne-part-2. Similar hangs during OST unmount in ost-pools have occurred in some 'full' group test sessions. Logs for these are at:
      2015-10-03 02:46:25 - https://testing.hpdd.intel.com/test_sets/a436aad2-69ed-11e5-9fbf-5254006e85c2
      2015-10-07 04:38:39 - https://testing.hpdd.intel.com/test_sets/c7a37340-6d19-11e5-ab7f-5254006e85c2
      2015-10-07 05:32:11 - https://testing.hpdd.intel.com/test_sets/4398325c-6cd8-11e5-96b4-5254006e85c2

          Activity

            [LU-7326] ost-pools hangs on OST unmount

            James Nunez (Inactive) added a comment: With the stack trace, we can see this is a duplicate of LU-7038.

            James Nunez (Inactive) added a comment: I found the console logs that capture the activity between the completion of the last test and the start of the next test suite. In the attached file, ost-pools.test_complete.console.shadow-6vm7.log, we can see the stack trace from umount:

            16:17:07:Lustre: DEBUG MARKER: umount -d -f /mnt/ost8
            16:17:07:LustreError: 6705:0:(lu_object.c:1224:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 3
            16:17:07:LustreError: 6705:0:(lu_object.c:1224:lu_device_fini()) LBUG
            16:17:07:Pid: 6705, comm: umount
            16:17:07:
            16:17:07:Call Trace:
            16:17:07: [<ffffffffa049b875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
            16:17:07: [<ffffffffa049be77>] lbug_with_loc+0x47/0xb0 [libcfs]
            16:17:07: [<ffffffffa05f0618>] lu_device_fini+0xb8/0xc0 [obdclass]
            16:17:07: [<ffffffffa05d1efd>] ls_device_put+0x7d/0x2e0 [obdclass]
            16:17:07: [<ffffffffa05d22d2>] local_oid_storage_fini+0x172/0x410 [obdclass]
            16:17:07: [<ffffffffa0dc876f>] lfsck_instance_cleanup+0x20f/0x7e0 [lfsck]
            16:17:07: [<ffffffffa0dcaf7b>] lfsck_degister+0x4b/0x60 [lfsck]
            16:17:07: [<ffffffffa0e935cb>] ofd_device_fini+0xab/0x260 [ofd]
            16:17:07: [<ffffffffa05dfb82>] class_cleanup+0x572/0xd20 [obdclass]
            16:17:07: [<ffffffffa05e2206>] class_process_config+0x1ed6/0x2830 [obdclass]
            16:17:07: [<ffffffffa04a7b61>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
            16:17:07: [<ffffffff811788ec>] ? __kmalloc+0x21c/0x230
            16:17:07: [<ffffffffa05e301f>] class_manual_cleanup+0x4bf/0x8e0 [obdclass]
            16:17:07: [<ffffffffa05c0746>] ? class_name2dev+0x56/0xe0 [obdclass]
            16:17:07: [<ffffffffa061aeec>] server_put_super+0xa0c/0xed0 [obdclass]
            16:17:07: [<ffffffff811b0116>] ? invalidate_inodes+0xf6/0x190
            16:17:07: [<ffffffff8119437b>] generic_shutdown_super+0x5b/0xe0
            16:17:07: [<ffffffff81194466>] kill_anon_super+0x16/0x60
            16:17:07: [<ffffffffa05e5ed6>] lustre_kill_super+0x36/0x60 [obdclass]
            16:17:07: [<ffffffff81194c07>] deactivate_super+0x57/0x80
            16:17:07: [<ffffffff811b4a7f>] mntput_no_expire+0xbf/0x110
            16:17:07: [<ffffffff811b55cb>] sys_umount+0x7b/0x3a0
            16:17:07: [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
            16:17:07:
            16:17:07:Kernel panic - not syncing: LBUG
            16:17:07:Pid: 6705, comm: umount Not tainted 2.6.32-573.7.1.el6_lustre.gef63c03.x86_64 #1
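
            The assertion in lu_device_fini() checks that no references to the device remain when it is finalized; "Refcount is 3" means three references were still held when ls_device_put(), reached from the LFSCK cleanup path, tried to tear the device down. As a rough illustration only, the C sketch below models that invariant with a hypothetical demo_device type and hypothetical helper functions. None of these names are Lustre's API, and unlike the real code, which LBUGs and panics the node (as in the log above), the sketch simply reports the leftover count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical device with a reference count. Loosely modeled on the
 * ld_ref counter named in the assertion; this is NOT struct lu_device. */
struct demo_device {
    atomic_int ref;
};

/* Initialize the device with no references held. */
static void demo_device_init(struct demo_device *d)
{
    atomic_init(&d->ref, 0);
}

/* Take a reference (a user of the device starts relying on it). */
static void demo_device_get(struct demo_device *d)
{
    atomic_fetch_add(&d->ref, 1);
}

/* Drop a reference; dropping one that was never taken is a bug. */
static void demo_device_put(struct demo_device *d)
{
    int old = atomic_fetch_sub(&d->ref, 1);
    assert(old > 0);
}

/* Teardown check: every reference must have been dropped by now.
 * Returns the number of leaked references (0 means teardown is safe).
 * Lustre's lu_device_fini() instead asserts ld_ref == 0 and LBUGs. */
static int demo_device_fini_check(struct demo_device *d)
{
    int refs = atomic_load(&d->ref);
    if (refs != 0)
        fprintf(stderr,
                "ASSERTION( ref == 0 ) failed: Refcount is %d\n", refs);
    return refs;
}
```

            For example, taking three references with demo_device_get() and never dropping them makes demo_device_fini_check() return 3, mirroring the "Refcount is 3" message; in the real bug, the fix belongs in whichever code path failed to drop its references.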
            

            People

              Assignee: WC Triage
              Reporter: James Nunez (Inactive)
