[LU-5934] mdt_intent_reint()) ASSERTION( rc == 0 ) failed: Error occurred but lock handle is still in use, rc = -2 Created: 19/Nov/14  Updated: 22/Feb/17  Resolved: 11/Mar/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0, Lustre 2.5.4
Fix Version/s: Lustre 2.7.0

Type: Bug Priority: Major
Reporter: Johann Lombardi (Inactive) Assignee: Liang Zhen (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-5930 mdt_reint_open()) @@@ OPEN & CREAT no... Closed
is duplicated by LU-6706 :mdt_intent_reint()) ASSERTION( rc ==... Resolved
Related
is related to LU-6189 LustreError: (mdt_handler.c:4078:mdt_... Resolved
Severity: 3
Rank (Obsolete): 16571

 Description   

Bug hit with soak tests while testing MDT failover:

LDISKFS-fs warning (device dm-2): ldiskfs_multi_mount_protect: MMP interval 42 higher than expected, please wait.

LDISKFS-fs (dm-2): recovery complete
LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. quota=on. Opts: 
Lustre: soaked-MDT0000: used disk, loading
LustreError: 11-0: soaked-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
Lustre: soaked-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
LDISKFS-fs (dm-3): recovery complete
LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. quota=on. Opts: 
Lustre: home-MDT0001: used disk, loading
Lustre: home-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
Lustre: soaked-MDT0000: Will be in recovery for at least 2:30, or until 9 clients reconnect
LustreError: 4283:0:(mdt_open.c:1652:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid.  req@ffff88081283c800 x14844821187
66556/t0(77329207333) o101->e0f06817-48a4-23ba-7529-dacbe1c5c96b@192.168.1.116@o2ib100:0/0 lens 1408/3512 e 0 to 0 dl 1416337962 ref 
1 fl Interpret:/4/0 rc 0/0
Lustre: home-MDT0001: Will be in recovery for at least 2:30, or until 9 clients reconnect
LustreError: 4653:0:(mdt_open.c:1652:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid.  req@ffff8804122ef800 x14844820751
33836/t0(77310248349) o101->8fe2e201-b336-c2b3-1690-2b8671ccb3ac@192.168.1.121@o2ib100:0/0 lens 568/3512 e 0 to 0 dl 1416337966 ref 1
 fl Interpret:/4/0 rc 0/0
LustreError: 4653:0:(mdt_open.c:1652:mdt_reint_open()) Skipped 6 previous similar messages
LustreError: 4275:0:(mdt_open.c:1652:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid.  req@ffff88041ec27c00 x14844821336
81580/t0(77313286391) o101->5af8cf20-f59d-b674-4342-95d4f9187d5e@192.168.1.126@o2ib100:0/0 lens 1744/3512 e 0 to 0 dl 1416337976 ref 
1 fl Interpret:/4/0 rc 0/0
LustreError: 4275:0:(mdt_open.c:1652:mdt_reint_open()) Skipped 26 previous similar messages
LustreError: 4284:0:(mdt_open.c:1652:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid.  req@ffff880836688000 x14844821165
12776/t0(77315789759) o101->0a6d6736-23b0-a177-bd78-0dc387bbcfed@192.168.1.124@o2ib100:0/0 lens 1792/3512 e 0 to 0 dl 1416337994 ref 
1 fl Interpret:/4/0 rc 0/0
LustreError: 4284:0:(mdt_open.c:1652:mdt_reint_open()) Skipped 2 previous similar messages
LustreError: 4689:0:(mdt_open.c:1652:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid.  req@ffff880410c2b400 x14844823276
10436/t0(77382041571) o101->c9895ebb-5dd8-614d-e10a-fe88fed518aa@192.168.1.127@o2ib100:0/0 lens 808/3512 e 0 to 0 dl 1416337998 ref 1
 fl Interpret:/4/0 rc 0/0
LustreError: 4689:0:(mdt_open.c:1652:mdt_reint_open()) Skipped 8 previous similar messages
Lustre: home-MDT0001: Recovery over after 0:39, of 9 clients 9 recovered and 0 were evicted.
Lustre: soaked-MDT0000: Recovery over after 1:24, of 9 clients 9 recovered and 0 were evicted.
LustreError: 0:0:(ldlm_lockd.c:344:waiting_locks_callback()) ### lock callback timer expired after 151s: evicting client at 192.168.1
.124@o2ib100  ns: mdt-soaked-MDT0000_UUID lock: ffff88055ce43700/0xeb9796ef206ff833 lrc: 4/0,0 mode: CR/CR res: [0x2ac0000bd6:0xe1db:
0x0].0 bits 0x2 rrc: 6 type: IBT flags: 0x60200000000020 nid: 192.168.1.124@o2ib100 remote: 0x55621c70e3b62760 expref: 76246 pid: 469
4 timeout: 4340788641 lvb_type: 0
LustreError: 4725:0:(mdt_handler.c:4078:mdt_intent_reint()) ASSERTION( rc == 0 ) failed: Error occurred but lock handle is still in use, rc = -2
LustreError: 4725:0:(mdt_handler.c:4078:mdt_intent_reint()) LBUG
Pid: 4725, comm: mdt03_011

Call Trace:
 [<ffffffffa0550895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0550e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa103d79a>] mdt_intent_reint+0x51a/0x520 [mdt]
 [<ffffffffa103ac4e>] mdt_intent_policy+0x3ae/0x770 [mdt]
 [<ffffffffa07ed2e5>] ldlm_lock_enqueue+0x135/0x980 [ptlrpc]
 [<ffffffffa0816dab>] ldlm_handle_enqueue0+0x51b/0x10c0 [ptlrpc]
 [<ffffffff81069f15>] ? enqueue_entity+0x125/0x450
 [<ffffffffa103b116>] mdt_enqueue+0x46/0xe0 [mdt]
 [<ffffffffa103fcda>] mdt_handle_common+0x52a/0x1470 [mdt]
 [<ffffffffa107d455>] mds_regular_handle+0x15/0x20 [mdt]
 [<ffffffffa08470d5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
 [<ffffffffa05514ce>] ? cfs_timer_arm+0xe/0x10 [libcfs]
 [<ffffffffa05623b5>] ? lc_watchdog_touch+0x65/0x170 [libcfs]
 [<ffffffffa083e779>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
 [<ffffffff810546b9>] ? __wake_up_common+0x59/0x90
 [<ffffffffa084843d>] ptlrpc_main+0xaed/0x1770 [ptlrpc]
 [<ffffffffa0847950>] ? ptlrpc_main+0x0/0x1770 [ptlrpc]
 [<ffffffff8109abf6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ab60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20

Kernel panic - not syncing: LBUG

Please note that LU-5930 was already created for the "OPEN & CREAT not in open replay/by_fid" message.



 Comments   
Comment by Johann Lombardi (Inactive) [ 19/Nov/14 ]

For the record, the LBUG was hit on lola-8, but crash dump did not work

hush: syntax error: unexpected (
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: init Not tainted 2.6.32-431.29.2.el6_lustre.g36cd22b.x86_64 #1
Call Trace:
 [<ffffffff81528ffc>] ? panic+0xa7/0x16f
 [<ffffffff81077332>] ? do_exit+0x862/0x870
 [<ffffffff81077447>] ? sys_exit+0x17/0x20
 [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b

See /scratch/logs/console/lola-8.log

Comment by Gerrit Updater [ 20/Nov/14 ]

Comment by Gerrit Updater [ 20/Nov/14 ]

Comment by Gerrit Updater [ 24/Nov/14 ]

Comment by Gerrit Updater [ 24/Nov/14 ]

Liang Zhen (liang.zhen@intel.com) uploaded a new patch: http://review.whamcloud.com/12828
Subject: LU-5934 mdt: faulty assertion in mdt_intent_reint
Project: fs/lustre-release
Branch: b2_5
Current Patch Set: 1
Commit: efaf3ff7318d4d2684204f97c4773d69c083b0f7

Comment by Gerrit Updater [ 24/Dec/14 ]

Liang Zhen (liang.zhen@intel.com) uploaded a new patch: http://review.whamcloud.com/13179
Subject: LU-5934 mdt: faulty assertion in mdt_intent_reint
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8cc1cee2449b0b8bb09b17f1a08e225632d9f496

Comment by Gerrit Updater [ 24/Dec/14 ]

Liang Zhen (liang.zhen@intel.com) uploaded a new patch: http://review.whamcloud.com/13180
Subject: LU-5934 mdt: cleanup for mdt_object_open_lock
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9b0b4a381a6930ca7fa12cd47629d2d712e32afa

Comment by Gerrit Updater [ 10/Jan/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13179/
Subject: LU-5934 mdt: faulty assertion in mdt_intent_reint
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 392fb6e4b8c471b387b6dfb6a802a88949cd9d89

Comment by Peter Jones [ 11/Mar/15 ]

Landed for 2.7

Comment by Jay Lan (Inactive) [ 10/Jun/15 ]

Is #13180 is a required fix to address this problem on b2_5?

Generated at Sat Feb 10 01:55:46 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.