[LU-3607] Interop 2.3.0<->2.5 failure on test suite racer test_1: ASSERTION( !lustre_handle_is_used(&lhc->mlh_reg_lh) ) failed Created: 17/Jul/13  Updated: 04/Nov/13  Resolved: 04/Nov/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Emoly Liu
Resolution: Duplicate Votes: 0
Labels: None
Environment:

server: 2.3.0
client: lustre-master build #1560


Issue Links:
Related
is related to LU-4179 LBUG ASSERTION( !lustre_handle_is_use... Resolved
Severity: 3
Rank (Obsolete): 9153

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/0ebc1474-eda3-11e2-8e3a-52540035b04c.

The sub-test test_1 failed with the following error:

test failed to respond and timed out

MDT console:

20:53:46:LustreError: 20637:0:(mdt_open.c:1463:mdt_reint_open()) ASSERTION( !lustre_handle_is_used(&lhc->mlh_reg_lh) ) failed: 
20:53:46:LustreError: 20637:0:(mdt_open.c:1463:mdt_reint_open()) LBUG
20:53:46:Pid: 20637, comm: mdt00_018
20:53:46:
20:53:46:Call Trace:
20:53:46: [<ffffffffa071a905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
20:53:46: [<ffffffffa071af17>] lbug_with_loc+0x47/0xb0 [libcfs]
20:53:46: [<ffffffffa0c8a0c3>] mdt_reint_open+0x15b3/0x18a0 [mdt]
20:53:46: [<ffffffffa0c122be>] ? md_ucred+0x1e/0x60 [mdd]
20:53:46: [<ffffffffa0c57235>] ? mdt_ucred+0x15/0x20 [mdt]
20:53:46: [<ffffffffa0c73151>] mdt_reint_rec+0x41/0xe0 [mdt]
20:53:46: [<ffffffffa0c6c9aa>] mdt_reint_internal+0x50a/0x810 [mdt]
20:53:46: [<ffffffffa0c6cf7d>] mdt_intent_reint+0x1ed/0x500 [mdt]
20:53:46: [<ffffffffa0c69191>] mdt_intent_policy+0x371/0x6a0 [mdt]
20:53:46: [<ffffffffa098f881>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
20:53:46: [<ffffffffa09b79bf>] ldlm_handle_enqueue0+0x48f/0xf70 [ptlrpc]
20:53:46: [<ffffffffa0c69506>] mdt_enqueue+0x46/0x130 [mdt]
20:53:46: [<ffffffffa0c60802>] mdt_handle_common+0x922/0x1740 [mdt]
20:53:46: [<ffffffffa0c616f5>] mdt_regular_handle+0x15/0x20 [mdt]
20:53:46: [<ffffffffa09e7b3c>] ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
20:53:46: [<ffffffffa071b65e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
20:53:46: [<ffffffffa09def37>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
20:53:46: [<ffffffff810533f3>] ? __wake_up+0x53/0x70
20:53:46: [<ffffffffa09e9111>] ptlrpc_main+0xbf1/0x19e0 [ptlrpc]
20:53:46: [<ffffffffa09e8520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
20:53:46: [<ffffffff8100c14a>] child_rip+0xa/0x20
20:53:46: [<ffffffffa09e8520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
20:53:46: [<ffffffffa09e8520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
20:53:46: [<ffffffff8100c140>] ? child_rip+0x0/0x20
20:53:46:
20:53:46:Kernel panic - not syncing: LBUG
20:53:46:Pid: 20637, comm: mdt00_018 Not tainted 2.6.32-279.5.1.el6_lustre.gb16fe80.x86_64 #1
20:53:46:Call Trace:
20:53:46: [<ffffffff814fd58a>] ? panic+0xa0/0x168
20:53:46: [<ffffffffa071af6b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
20:53:46: [<ffffffffa0c8a0c3>] ? mdt_reint_open+0x15b3/0x18a0 [mdt]
20:53:46: [<ffffffffa0c122be>] ? md_ucred+0x1e/0x60 [mdd]
20:53:46: [<ffffffffa0c57235>] ? mdt_ucred+0x15/0x20 [mdt]
20:53:46: [<ffffffffa0c73151>] ? mdt_reint_rec+0x41/0xe0 [mdt]
20:53:46: [<ffffffffa0c6c9aa>] ? mdt_reint_internal+0x50a/0x810 [mdt]
20:53:46: [<ffffffffa0c6cf7d>] ? mdt_intent_reint+0x1ed/0x500 [mdt]
20:53:46: [<ffffffffa0c69191>] ? mdt_intent_policy+0x371/0x6a0 [mdt]
20:53:46: [<ffffffffa098f881>] ? ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
20:53:46: [<ffffffffa09b79bf>] ? ldlm_handle_enqueue0+0x48f/0xf70 [ptlrpc]
20:53:46: [<ffffffffa0c69506>] ? mdt_enqueue+0x46/0x130 [mdt]
20:53:46: [<ffffffffa0c60802>] ? mdt_handle_common+0x922/0x1740 [mdt]
20:53:46: [<ffffffffa0c616f5>] ? mdt_regular_handle+0x15/0x20 [mdt]
20:53:46: [<ffffffffa09e7b3c>] ? ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
20:53:46: [<ffffffffa071b65e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
20:53:46: [<ffffffffa09def37>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
20:53:46: [<ffffffff810533f3>] ? __wake_up+0x53/0x70
20:53:46: [<ffffffffa09e9111>] ? ptlrpc_main+0xbf1/0x19e0 [ptlrpc]
20:53:46: [<ffffffffa09e8520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
20:53:46: [<ffffffff8100c14a>] ? child_rip+0xa/0x20
20:53:46: [<ffffffffa09e8520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
20:53:46: [<ffffffffa09e8520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
20:53:46: [<ffffffff8100c140>] ? child_rip+0x0/0x20
20:53:46:Initializing cgroup subsys cpuset
20:53:46:Initializing cgroup subsys cpu
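
For context, the assertion that fired is in mdt_reint_open() on the intent-open path shown in the trace (ldlm_handle_enqueue0 -> mdt_intent_policy -> mdt_intent_reint -> mdt_reint_open). Below is a minimal, self-contained C sketch of what the check does. The struct layouts are simplified from the Lustre headers, LASSERT is stood in by assert() (the real macro calls LBUG(), which panics the node as seen above), and open_lock_precondition() is a hypothetical wrapper added here for illustration, not the real function.

#include <assert.h>
#include <stdint.h>

/* Stand-in for the real LASSERT(), which calls LBUG() on failure. */
#define LASSERT(cond) assert(cond)

/* Simplified from the Lustre headers: a lock handle is "used" once
 * its cookie has been filled in by an LDLM enqueue. */
struct lustre_handle {
        uint64_t cookie;
};

struct mdt_lock_handle {
        struct lustre_handle mlh_reg_lh;   /* regular LDLM lock handle */
};

static inline int lustre_handle_is_used(struct lustre_handle *lh)
{
        return lh->cookie != 0;
}

/* Hypothetical stand-in for the check at mdt_open.c:1463:
 * mdt_reint_open() expects a fresh, never-enqueued lock handle at
 * this point, so a handle left over from an earlier enqueue in the
 * same intent request trips the assertion and the MDS LBUGs. */
static void open_lock_precondition(struct mdt_lock_handle *lhc)
{
        LASSERT(!lustre_handle_is_used(&lhc->mlh_reg_lh));
        /* ... proceed to enqueue the open/lookup lock ... */
}

int main(void)
{
        struct mdt_lock_handle lhc = { .mlh_reg_lh = { .cookie = 0 } };

        open_lock_precondition(&lhc);       /* fresh handle: check holds */

        lhc.mlh_reg_lh.cookie = 0xdeadbeef; /* simulate a stale handle */
        open_lock_precondition(&lhc);       /* aborts, as the MDS did */
        return 0;
}

In other words, the LBUG indicates that some earlier step in handling the same intent request filled in mlh_reg_lh and never released or cleared it before the open code ran.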


 Comments   
Comment by Prakash Surya (Inactive) [ 08/Oct/13 ]

Looks like we just hit this on Grove's MDS (Sequoia's FS).

Comment by Christopher Morrone [ 08/Oct/13 ]

Grove MDS was running Lustre 2.4.0-RC2_10chaos. Clients are 2.4.0ish and 2.1.4ish.

Comment by Peter Jones [ 11/Oct/13 ]

Emoly

Could you please comment on this one?

Thanks

Peter

Comment by Emoly Liu [ 16/Oct/13 ]

I can't open the logs in that maloo test report link. I will try to reproduce it.

Or, Prakash, could you please upload the logs if you have them? Thanks!

Comment by Christopher Morrone [ 28/Oct/13 ]

No, this was on our secure network. We cannot provide logs.

Comment by Emoly Liu [ 29/Oct/13 ]

Does the problem still happen? I failed to reproduce it locally, so I will ask the admin to restore the maloo logs.

BTW, Sarah, could you please help reproduce this LBUG? Thanks.

Comment by Emoly Liu [ 31/Oct/13 ]

Joshua and Mike,

Could you please restore the maloo logs in http://maloo.whamcloud.com/test_sets/0ebc1474-eda3-11e2-8e3a-52540035b04c ?

Thanks.

Comment by Peter Jones [ 04/Nov/13 ]

duplicate of LU-4179
