[LU-3607] Interop 2.3.0<->2.5 failure on test suite racer test_1: ASSERTION( !lustre_handle_is_used(&lhc->mlh_reg_lh) ) failed Created: 17/Jul/13 Updated: 04/Nov/13 Resolved: 04/Nov/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Emoly Liu |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Environment: | server: 2.3.0 |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9153 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>.

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/0ebc1474-eda3-11e2-8e3a-52540035b04c.

The sub-test test_1 failed with the following error:
MDT console:

20:53:46:LustreError: 20637:0:(mdt_open.c:1463:mdt_reint_open()) ASSERTION( !lustre_handle_is_used(&lhc->mlh_reg_lh) ) failed:
20:53:46:LustreError: 20637:0:(mdt_open.c:1463:mdt_reint_open()) LBUG
20:53:46:Pid: 20637, comm: mdt00_018
20:53:46:
20:53:46:Call Trace:
20:53:46: [<ffffffffa071a905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
20:53:46: [<ffffffffa071af17>] lbug_with_loc+0x47/0xb0 [libcfs]
20:53:46: [<ffffffffa0c8a0c3>] mdt_reint_open+0x15b3/0x18a0 [mdt]
20:53:46: [<ffffffffa0c122be>] ? md_ucred+0x1e/0x60 [mdd]
20:53:46: [<ffffffffa0c57235>] ? mdt_ucred+0x15/0x20 [mdt]
20:53:46: [<ffffffffa0c73151>] mdt_reint_rec+0x41/0xe0 [mdt]
20:53:46: [<ffffffffa0c6c9aa>] mdt_reint_internal+0x50a/0x810 [mdt]
20:53:46: [<ffffffffa0c6cf7d>] mdt_intent_reint+0x1ed/0x500 [mdt]
20:53:46: [<ffffffffa0c69191>] mdt_intent_policy+0x371/0x6a0 [mdt]
20:53:46: [<ffffffffa098f881>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
20:53:46: [<ffffffffa09b79bf>] ldlm_handle_enqueue0+0x48f/0xf70 [ptlrpc]
20:53:46: [<ffffffffa0c69506>] mdt_enqueue+0x46/0x130 [mdt]
20:53:46: [<ffffffffa0c60802>] mdt_handle_common+0x922/0x1740 [mdt]
20:53:46: [<ffffffffa0c616f5>] mdt_regular_handle+0x15/0x20 [mdt]
20:53:46: [<ffffffffa09e7b3c>] ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
20:53:46: [<ffffffffa071b65e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
20:53:46: [<ffffffffa09def37>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
20:53:46: [<ffffffff810533f3>] ? __wake_up+0x53/0x70
20:53:46: [<ffffffffa09e9111>] ptlrpc_main+0xbf1/0x19e0 [ptlrpc]
20:53:46: [<ffffffffa09e8520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
20:53:46: [<ffffffff8100c14a>] child_rip+0xa/0x20
20:53:46: [<ffffffffa09e8520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
20:53:46: [<ffffffffa09e8520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
20:53:46: [<ffffffff8100c140>] ? child_rip+0x0/0x20
20:53:46:
20:53:46:Kernel panic - not syncing: LBUG
20:53:46:Pid: 20637, comm: mdt00_018 Not tainted 2.6.32-279.5.1.el6_lustre.gb16fe80.x86_64 #1
20:53:46:Call Trace:
20:53:46: [<ffffffff814fd58a>] ? panic+0xa0/0x168
20:53:46: [<ffffffffa071af6b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
20:53:46: [<ffffffffa0c8a0c3>] ? mdt_reint_open+0x15b3/0x18a0 [mdt]
20:53:46: [<ffffffffa0c122be>] ? md_ucred+0x1e/0x60 [mdd]
20:53:46: [<ffffffffa0c57235>] ? mdt_ucred+0x15/0x20 [mdt]
20:53:46: [<ffffffffa0c73151>] ? mdt_reint_rec+0x41/0xe0 [mdt]
20:53:46: [<ffffffffa0c6c9aa>] ? mdt_reint_internal+0x50a/0x810 [mdt]
20:53:46: [<ffffffffa0c6cf7d>] ? mdt_intent_reint+0x1ed/0x500 [mdt]
20:53:46: [<ffffffffa0c69191>] ? mdt_intent_policy+0x371/0x6a0 [mdt]
20:53:46: [<ffffffffa098f881>] ? ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
20:53:46: [<ffffffffa09b79bf>] ? ldlm_handle_enqueue0+0x48f/0xf70 [ptlrpc]
20:53:46: [<ffffffffa0c69506>] ? mdt_enqueue+0x46/0x130 [mdt]
20:53:46: [<ffffffffa0c60802>] ? mdt_handle_common+0x922/0x1740 [mdt]
20:53:46: [<ffffffffa0c616f5>] ? mdt_regular_handle+0x15/0x20 [mdt]
20:53:46: [<ffffffffa09e7b3c>] ? ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
20:53:46: [<ffffffffa071b65e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
20:53:46: [<ffffffffa09def37>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
20:53:46: [<ffffffff810533f3>] ? __wake_up+0x53/0x70
20:53:46: [<ffffffffa09e9111>] ? ptlrpc_main+0xbf1/0x19e0 [ptlrpc]
20:53:46: [<ffffffffa09e8520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
20:53:46: [<ffffffff8100c14a>] ? child_rip+0xa/0x20
20:53:46: [<ffffffffa09e8520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
20:53:46: [<ffffffffa09e8520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
20:53:46: [<ffffffff8100c140>] ? child_rip+0x0/0x20
20:53:46:Initializing cgroup subsys cpuset
20:53:46:Initializing cgroup subsys cpu |
| Comments |
| Comment by Prakash Surya (Inactive) [ 08/Oct/13 ] |
|
Looks like we just hit this on Grove's MDS (Sequoia's FS). |
| Comment by Christopher Morrone [ 08/Oct/13 ] |
|
The Grove MDS was running Lustre 2.4.0-RC2_10chaos. Clients are 2.4.0ish and 2.1.4ish. |
| Comment by Peter Jones [ 11/Oct/13 ] |
|
Emoly, could you please comment on this one? Thanks, Peter |
| Comment by Emoly Liu [ 16/Oct/13 ] |
|
I can't open the logs in that maloo test report link. I will try to reproduce it. Or, Prakash, could you please upload the logs if you have them? Thanks! |
| Comment by Christopher Morrone [ 28/Oct/13 ] |
|
No, this was on our secure network. We cannot provide logs. |
| Comment by Emoly Liu [ 29/Oct/13 ] |
|
Does the problem still happen? I failed to reproduce it locally, and I will ask the admin to restore the maloo logs. BTW, Sarah, could you please help reproduce this LBUG? Thanks. |
| Comment by Emoly Liu [ 31/Oct/13 ] |
|
Joshua and Mike, could you please restore the maloo logs at http://maloo.whamcloud.com/test_sets/0ebc1474-eda3-11e2-8e3a-52540035b04c ? Thanks. |
| Comment by Peter Jones [ 04/Nov/13 ] |
|
duplicate of |