[LU-7326] ost-pools hangs on OST unmount

| Created: | 21/Oct/15 | Updated: | 22/Oct/15 | Resolved: | 22/Oct/15 |
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Environment: | autotest |
| Attachments: | ost-pools.test_complete.console.shadow-6vm7.log |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
ost-pools hangs on unmount of an OST. No individual test fails and there are no errors in the last test run (test 26 in ost-pools); only the unmounting of one of the OSTs during test cleanup hangs. Logs are at https://testing.hpdd.intel.com/test_sets/ea392e2a-776b-11e5-a00c-5254006e85c2

The last thing we see in the suite_stdout log is:

16:16:45:Stopping /mnt/ost7 (opts:-f) on shadow-6vm7
16:16:45:CMD: shadow-6vm7 umount -d -f /mnt/ost7
16:16:56:CMD: shadow-6vm7 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
16:16:56:CMD: shadow-6vm7 grep -c /mnt/ost8' ' /proc/mounts
16:16:56:Stopping /mnt/ost8 (opts:-f) on shadow-6vm7
16:16:56:CMD: shadow-6vm7 umount -d -f /mnt/ost8
17:14:51:********** Timeout by autotest system **********

This failure was on the master branch in review-dne-part-2. There are other similar hangs on unmount of the OSTs in ost-pools for some 'full' group test sessions. Logs for these are at
| Comments |
| Comment by James Nunez (Inactive) [ 22/Oct/15 ] |
I found the console logs that capture the activity between the last test completing and the start of the new test suite. In the attached file, ost-pools.test_complete.console.shadow-6vm7.log, we can see the stack trace from umount:

16:17:07:Lustre: DEBUG MARKER: umount -d -f /mnt/ost8
16:17:07:LustreError: 6705:0:(lu_object.c:1224:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 3
16:17:07:LustreError: 6705:0:(lu_object.c:1224:lu_device_fini()) LBUG
16:17:07:Pid: 6705, comm: umount
16:17:07:
16:17:07:Call Trace:
16:17:07: [<ffffffffa049b875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
16:17:07: [<ffffffffa049be77>] lbug_with_loc+0x47/0xb0 [libcfs]
16:17:07: [<ffffffffa05f0618>] lu_device_fini+0xb8/0xc0 [obdclass]
16:17:07: [<ffffffffa05d1efd>] ls_device_put+0x7d/0x2e0 [obdclass]
16:17:07: [<ffffffffa05d22d2>] local_oid_storage_fini+0x172/0x410 [obdclass]
16:17:07: [<ffffffffa0dc876f>] lfsck_instance_cleanup+0x20f/0x7e0 [lfsck]
16:17:07: [<ffffffffa0dcaf7b>] lfsck_degister+0x4b/0x60 [lfsck]
16:17:07: [<ffffffffa0e935cb>] ofd_device_fini+0xab/0x260 [ofd]
16:17:07: [<ffffffffa05dfb82>] class_cleanup+0x572/0xd20 [obdclass]
16:17:07: [<ffffffffa05e2206>] class_process_config+0x1ed6/0x2830 [obdclass]
16:17:07: [<ffffffffa04a7b61>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
16:17:07: [<ffffffff811788ec>] ? __kmalloc+0x21c/0x230
16:17:07: [<ffffffffa05e301f>] class_manual_cleanup+0x4bf/0x8e0 [obdclass]
16:17:07: [<ffffffffa05c0746>] ? class_name2dev+0x56/0xe0 [obdclass]
16:17:07: [<ffffffffa061aeec>] server_put_super+0xa0c/0xed0 [obdclass]
16:17:07: [<ffffffff811b0116>] ? invalidate_inodes+0xf6/0x190
16:17:07: [<ffffffff8119437b>] generic_shutdown_super+0x5b/0xe0
16:17:07: [<ffffffff81194466>] kill_anon_super+0x16/0x60
16:17:07: [<ffffffffa05e5ed6>] lustre_kill_super+0x36/0x60 [obdclass]
16:17:07: [<ffffffff81194c07>] deactivate_super+0x57/0x80
16:17:07: [<ffffffff811b4a7f>] mntput_no_expire+0xbf/0x110
16:17:07: [<ffffffff811b55cb>] sys_umount+0x7b/0x3a0
16:17:07: [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
16:17:07:
16:17:07:Kernel panic - not syncing: LBUG
16:17:07:Pid: 6705, comm: umount Not tainted 2.6.32-573.7.1.el6_lustre.gef63c03.x86_64 #1
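The assertion that fires is ASSERTION( atomic_read(&d->ld_ref) == 0 ) in lu_device_fini(): three references to the local storage device are still held when the lfsck_degister -> lfsck_instance_cleanup -> local_oid_storage_fini -> ls_device_put path tears it down, so the device cannot be finalized and umount panics with an LBUG. As a rough illustration only, here is a minimal userspace sketch of that refcount-assertion pattern; demo_device, demo_get, demo_put and demo_device_fini are hypothetical names, not Lustre code, and only the ld_ref == 0 check and the "Refcount is 3" situation come from the log above.

    /*
     * Minimal userspace sketch (not Lustre source) of the refcount pattern
     * behind the assertion: finalization requires ld_ref == 0, so any get()
     * without a matching put() before teardown trips the check.
     */
    #include <assert.h>
    #include <stdio.h>

    struct demo_device {
            int ld_ref;             /* outstanding references, like lu_device::ld_ref */
    };

    static void demo_get(struct demo_device *d) { d->ld_ref++; }
    static void demo_put(struct demo_device *d) { d->ld_ref--; }

    static void demo_device_fini(struct demo_device *d)
    {
            /* Analogue of ASSERTION(atomic_read(&d->ld_ref) == 0) in lu_device_fini(). */
            if (d->ld_ref != 0) {
                    fprintf(stderr, "ASSERTION failed: Refcount is %d\n", d->ld_ref);
                    assert(d->ld_ref == 0); /* in the kernel this is an LBUG and panic */
            }
    }

    int main(void)
    {
            struct demo_device dev = { .ld_ref = 0 };

            /* Three get()s with no matching put()s, leaving the same
             * "Refcount is 3" state the trace reports at unmount. */
            demo_get(&dev);
            demo_get(&dev);
            demo_get(&dev);

            demo_put(&dev);         /* only one reference is released */

            demo_device_fini(&dev); /* aborts here, as umount LBUGs above */
            return 0;
    }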
| Comment by James Nunez (Inactive) [ 22/Oct/15 ] |
With the stack trace, we can see this is a duplicate of