Details
- Bug
- Resolution: Fixed
- Major
- Lustre 2.4.0, Lustre 2.1.3
- 3
- 6217
Description
Hi,
When trying to unmount a Lustre client, we got the following problem:
Lustre: DEBUG MARKER: Wed Nov 21 06:25:01 2012
LustreError: 11559:0:(ldlm_lock.c:1697:ldlm_lock_cancel()) ### lock still has references ns: ptmp-MDT0000-mdc-ffff88030871bc00 lock: ffff88060dbd2d80/0x4618f3ec8d79d8be lrc: 4/0,1 mode: PW/PW res: 8590405073/266 rrc: 2 type: FLK pid: 4414 [0->551] flags: 0x22002890 remote: 0xc8980c051f8f6afd expref: -99 pid: 4414 timeout: 0
LustreError: 11559:0:(ldlm_lock.c:1698:ldlm_lock_cancel()) LBUG
Pid: 11559, comm: umount
Call Trace:
[<ffffffffa040d7f5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa040de07>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa063343d>] ldlm_lock_cancel+0x1ad/0x1b0 [ptlrpc]
[<ffffffffa064d245>] ldlm_cli_cancel_local+0xb5/0x380 [ptlrpc]
[<ffffffffa06510b8>] ldlm_cli_cancel+0x58/0x3b0 [ptlrpc]
[<ffffffffa063ae18>] cleanup_resource+0x168/0x300 [ptlrpc]
[<ffffffffa063afda>] ldlm_resource_clean+0x2a/0x50 [ptlrpc]
[<ffffffffa041e28f>] cfs_hash_for_each_relax+0x17f/0x380 [libcfs]
[<ffffffffa063afb0>] ? ldlm_resource_clean+0x0/0x50 [ptlrpc]
[<ffffffffa063afb0>] ? ldlm_resource_clean+0x0/0x50 [ptlrpc]
[<ffffffffa041fcaf>] cfs_hash_for_each_nolock+0x7f/0x1c0 [libcfs]
[<ffffffffa0637a69>] ldlm_namespace_cleanup+0x29/0xb0 [ptlrpc]
[<ffffffffa0638adb>] __ldlm_namespace_free+0x4b/0x540 [ptlrpc]
[<ffffffffa06502d0>] ? ldlm_cli_hash_cancel_unused+0x0/0xa0 [ptlrpc]
[<ffffffffa06502d0>] ? ldlm_cli_hash_cancel_unused+0x0/0xa0 [ptlrpc]
[<ffffffffa06502d0>] ? ldlm_cli_hash_cancel_unused+0x0/0xa0 [ptlrpc]
[<ffffffffa041fcb7>] ? cfs_hash_for_each_nolock+0x87/0x1c0 [libcfs]
[<ffffffffa063903f>] ldlm_namespace_free_prior+0x6f/0x230 [ptlrpc]
[<ffffffffa063fc4c>] client_disconnect_export+0x23c/0x460 [ptlrpc]
[<ffffffffa0b42a44>] lmv_disconnect+0x644/0xc70 [lmv]
[<ffffffffa0a470bc>] client_common_put_super+0x46c/0xe80 [lustre]
[<ffffffffa0a47ba0>] ll_put_super+0xd0/0x360 [lustre]
[<ffffffff8117e01c>] ? dispose_list+0x11c/0x140
[<ffffffff8117e4a8>] ? invalidate_inodes+0x158/0x1a0
[<ffffffff8116542b>] generic_shutdown_super+0x5b/0x110
[<ffffffff81165546>] kill_anon_super+0x16/0x60
[<ffffffffa050897a>] lustre_kill_super+0x4a/0x60 [obdclass]
[<ffffffff811664e0>] deactivate_super+0x70/0x90
[<ffffffff811826bf>] mntput_no_expire+0xbf/0x110
[<ffffffff81183188>] sys_umount+0x78/0x3c0
[<ffffffff810030f2>] system_call_fastpath+0x16/0x1b
Kernel panic - not syncing: LBUG
Pid: 11559, comm: umount Not tainted 2.6.32-220.23.1.bl6.Bull.28.8.x86_64 #1
Call Trace:
[<ffffffff81484650>] ? panic+0x78/0x143
[<ffffffffa040de5b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
[<ffffffffa063343d>] ? ldlm_lock_cancel+0x1ad/0x1b0 [ptlrpc]
[<ffffffffa064d245>] ? ldlm_cli_cancel_local+0xb5/0x380 [ptlrpc]
[<ffffffffa06510b8>] ? ldlm_cli_cancel+0x58/0x3b0 [ptlrpc]
[<ffffffffa063ae18>] ? cleanup_resource+0x168/0x300 [ptlrpc]
[<ffffffffa063afda>] ? ldlm_resource_clean+0x2a/0x50 [ptlrpc]
[<ffffffffa041e28f>] ? cfs_hash_for_each_relax+0x17f/0x380 [libcfs]
[<ffffffffa063afb0>] ? ldlm_resource_clean+0x0/0x50 [ptlrpc]
[<ffffffffa063afb0>] ? ldlm_resource_clean+0x0/0x50 [ptlrpc]
[<ffffffffa041fcaf>] ? cfs_hash_for_each_nolock+0x7f/0x1c0 [libcfs]
[<ffffffffa0637a69>] ? ldlm_namespace_cleanup+0x29/0xb0 [ptlrpc]
[<ffffffffa0638adb>] ? __ldlm_namespace_free+0x4b/0x540 [ptlrpc]
[<ffffffffa06502d0>] ? ldlm_cli_hash_cancel_unused+0x0/0xa0 [ptlrpc]
[<ffffffffa06502d0>] ? ldlm_cli_hash_cancel_unused+0x0/0xa0 [ptlrpc]
[<ffffffffa06502d0>] ? ldlm_cli_hash_cancel_unused+0x0/0xa0 [ptlrpc]
[<ffffffffa041fcb7>] ? cfs_hash_for_each_nolock+0x87/0x1c0 [libcfs]
[<ffffffffa063903f>] ? ldlm_namespace_free_prior+0x6f/0x230 [ptlrpc]
[<ffffffffa063fc4c>] ? client_disconnect_export+0x23c/0x460 [ptlrpc]
[<ffffffffa0b42a44>] ? lmv_disconnect+0x644/0xc70 [lmv]
[<ffffffffa0a470bc>] ? client_common_put_super+0x46c/0xe80 [lustre]
[<ffffffffa0a47ba0>] ? ll_put_super+0xd0/0x360 [lustre]
[<ffffffff8117e01c>] ? dispose_list+0x11c/0x140
[<ffffffff8117e4a8>] ? invalidate_inodes+0x158/0x1a0
[<ffffffff8116542b>] ? generic_shutdown_super+0x5b/0x110
[<ffffffff81165546>] ? kill_anon_super+0x16/0x60
[<ffffffffa050897a>] ? lustre_kill_super+0x4a/0x60 [obdclass]
[<ffffffff811664e0>] ? deactivate_super+0x70/0x90
[<ffffffff811826bf>] ? mntput_no_expire+0xbf/0x110
[<ffffffff81183188>] ? sys_umount+0x78/0x3c0
[<ffffffff810030f2>] ? system_call_fastpath+0x16/0x1b
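For anyone triaging similar dumps, here is a minimal sketch that pulls the key fields out of the "lock still has references" line in the log above. It assumes that the `lrc:` field reads as refcount/readers,writers and that `mode:` reads as granted/requested, which is how I read the LDLM debug format; the function name `parse_lock_dump` and the regular expression are purely illustrative and not part of any Lustre tool.

```python
import re

# Lock dump line copied from the log above (split only for readability).
LINE = (
    "### lock still has references ns: ptmp-MDT0000-mdc-ffff88030871bc00 "
    "lock: ffff88060dbd2d80/0x4618f3ec8d79d8be lrc: 4/0,1 mode: PW/PW "
    "res: 8590405073/266 rrc: 2 type: FLK pid: 4414 [0->551] "
    "flags: 0x22002890 remote: 0xc8980c051f8f6afd expref: -99 "
    "pid: 4414 timeout: 0"
)

# Assumed field layout: "lrc: <refcount>/<readers>,<writers>" and
# "mode: <granted>/<requested>". This is an interpretation, not a spec.
PATTERN = re.compile(
    r"ns: (?P<ns>\S+) .*?"
    r"lrc: (?P<refc>\d+)/(?P<readers>\d+),(?P<writers>\d+) "
    r"mode: (?P<granted>[\w-]+)/(?P<requested>[\w-]+) .*?"
    r"type: (?P<type>\w+)"
)


def parse_lock_dump(line):
    """Return a dict of selected lock fields, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the three counters to integers so they can be compared.
    for key in ("refc", "readers", "writers"):
        fields[key] = int(fields[key])
    return fields


info = parse_lock_dump(LINE)
print(info)
```

Under that reading, the dump shows an FLK (flock) lock in PW mode that still has one writer reference at unmount time, which matches the "lock still has references" message printed just before the LBUG.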
This issue is exactly the same as the one described in LU-1429, which is a duplicate of LU-1328, which itself seems to be related to LU-1421.
The issue seems to be resolved, but it is unclear to me which patches are needed to fix it completely.
Note that we also need a fix for b2_1.
Can you please advise?
TIA,
Sebastien.
Issue Links
- is related to
- LU-3701 Failure on test suite posix subtest test_1: fcntl.18/fcntl.35 Unresolved
- Resolved