Details
Bug
Resolution: Cannot Reproduce
Major
None
Lustre 2.4.2
Lustre 2.4.2-14chaos (see github.com/chaos/lustre)
3
15745
Description
2014-09-11 21:10:30 LustreError: 0:0:(ldlm_lockd.c:402:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 192.168.120.199@o2ib7 ns: mdt-lsd-MDT0000_UUID lock: ffff880321a4a480/0x6bd4680b789ee41f lrc: 4/0,0 mode: PR/PR res: [0x2000112f3:0xf:0x0].0 bits 0x13 rrc: 4 type: IBT flags: 0x200000000020 nid: 192.168.120.199@o2ib7 remote: 0xf350c14aff003b28 expref: 30 pid: 17248 timeout: 6838410913 lvb_type: 0 used 0
2014-09-11 21:10:30 LustreError: 15075:0:(mdt_handler.c:1423:mdt_getattr_name_lock()) ASSERTION( lock != NULL ) failed: Invalid lock handle 0x6bd4680b789ee41f
2014-09-11 21:10:30 LustreError: 15075:0:(mdt_handler.c:1423:mdt_getattr_name_lock()) LBUG
2014-09-11 21:10:30 Pid: 15075, comm: mdt00_069
The backtrace is:
PID: 15075 TASK: ffff880d7001f540 CPU: 2 COMMAND: "mdt00_069"
 #0 [ffff880d70021938] machine_kexec+0x18b at ffffffff810391ab
 #1 [ffff880d70021998] crash_kexec+0x72 at ffffffff810c5ee2
 #2 [ffff880d70021a68] panic+0xae at ffffffff8152b247
 #3 [ffff880d70021ae8] lbug_with_loc+0x9b at ffffffffa0601f4b [libcfs]
 #4 [ffff880d70021b08] mdt_getattr_name_lock+0x18d0 at ffffffffa0e99900 [mdt]
 #5 [ffff880d70021bc8] mdt_intent_getattr+0x29d at ffffffffa0e99c5d [mdt]
 #6 [ffff880d70021c28] mdt_intent_policy+0x39e at ffffffffa0e86fde [mdt]
 #7 [ffff880d70021c68] ldlm_lock_enqueue+0x361 at ffffffffa08b8911 [ptlrpc]
 #8 [ffff880d70021cc8] ldlm_handle_enqueue0+0x4ef at ffffffffa08e1a7f [ptlrpc]
 #9 [ffff880d70021d38] mdt_enqueue+0x46 at ffffffffa0e87466 [mdt]
#10 [ffff880d70021d58] mdt_handle_common+0x647 at ffffffffa0e8c0d7 [mdt]
#11 [ffff880d70021da8] mds_regular_handle+0x15 at ffffffffa0ec7c75 [mdt]
#12 [ffff880d70021db8] ptlrpc_server_handle_request+0x398 at ffffffffa0912188 [ptlrpc]
#13 [ffff880d70021eb8] ptlrpc_main+0xace at ffffffffa091351e [ptlrpc]
#14 [ffff880d70021f48] child_rip+0xa at ffffffff8100c24a
This looks like the same assertion as LU-5579 (where it fired in mdt_check_resent_lock()), but that issue was presumably hit on Lustre 2.6 or later.
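Both log lines name the same lock handle, 0x6bd4680b789ee41f: the lock callback timer evicted the client and destroyed the lock, and thread mdt00_069 then resolved that handle and got NULL. The following is a minimal single-threaded sketch of that ordering and is not Lustre code: the handle table, struct toy_lock, and every toy_* function are hypothetical. toy_getattr_strict() mirrors the failing ASSERTION( lock != NULL ) pattern, while toy_getattr_checked() shows the alternative of returning -ESTALE, the option weighed in the comments below.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model, NOT Lustre's LDLM code: a lock handle is an
 * opaque 64-bit cookie that the server resolves back to a lock object. */
#define TOY_MAX_LOCKS 16

struct toy_lock {
    uint64_t cookie;  /* handle value the client holds */
    int      in_use;  /* cleared once the lock is destroyed */
};

static struct toy_lock toy_table[TOY_MAX_LOCKS];

/* Grant a lock under the given cookie; returns 0 or -ENOSPC. */
static int toy_lock_grant(uint64_t cookie)
{
    for (size_t i = 0; i < TOY_MAX_LOCKS; i++) {
        if (!toy_table[i].in_use) {
            toy_table[i].cookie = cookie;
            toy_table[i].in_use = 1;
            return 0;
        }
    }
    return -ENOSPC;
}

/* Eviction path (callback timer expired): destroy the lock.
 * Returns the number of locks destroyed. */
static int toy_evict_lock(uint64_t cookie)
{
    int n = 0;
    for (size_t i = 0; i < TOY_MAX_LOCKS; i++) {
        if (toy_table[i].in_use && toy_table[i].cookie == cookie) {
            toy_table[i].in_use = 0;
            n++;
        }
    }
    return n;
}

/* Resolve a handle; NULL means the lock no longer exists. */
static struct toy_lock *toy_handle2lock(uint64_t cookie)
{
    for (size_t i = 0; i < TOY_MAX_LOCKS; i++)
        if (toy_table[i].in_use && toy_table[i].cookie == cookie)
            return &toy_table[i];
    return NULL;
}

/* Mirrors the crashing pattern: aborts if the handle is stale. */
static int toy_getattr_strict(uint64_t cookie)
{
    struct toy_lock *lock = toy_handle2lock(cookie);
    assert(lock != NULL); /* analogous to ASSERTION( lock != NULL ) + LBUG */
    return 0;
}

/* Alternative: report the stale handle to the caller instead. */
static int toy_getattr_checked(uint64_t cookie)
{
    struct toy_lock *lock = toy_handle2lock(cookie);
    if (lock == NULL)
        return -ESTALE;
    return 0;
}
```

On a real MDS, eviction and request handling run on different threads, so the lookup would also need proper locking and reference counting; the toy omits that and only shows that a NULL lookup result is reachable once the lock has been destroyed.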
Issue Links
- is related to LU-5579 MDS crashed by "mdt_check_resent_lock()) ASSERTION( lock != NULL ) failed" (Resolved)
I suspect ESTALE would propagate all the way up to userspace.
On the other hand, if the invalid lock handle is due to eviction of that same client, it does not matter, because that client will get a flood of EIO and other errors anyway.
In the race Vitaly described, where a resend is processed in parallel with the delayed delivery of the RPC that was resent, the ESTALE will simply be dropped, because the client is no longer waiting for this duplicate reply.
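The last point relies on the client matching each reply to a request it is still waiting for. The sketch below assumes, for illustration only, that the original send and its resend share one request identifier, called an XID here; the struct, the states, and the toy_client_* functions are hypothetical and are not ptlrpc code. Whichever reply arrives first completes the request; a later reply carrying -ESTALE for the same XID finds nobody waiting and is discarded.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of client-side reply matching, NOT ptlrpc code. */
#define TOY_MAX_PENDING 8

enum toy_state { TOY_FREE = 0, TOY_WAITING, TOY_DONE };

struct toy_request {
    uint64_t       xid;   /* assumed shared by the original send and its resend */
    enum toy_state state;
    int            rc;    /* status from the reply that completed the request */
};

static struct toy_request toy_pending[TOY_MAX_PENDING];

/* Send a request and start waiting. A resend reuses the same slot,
 * so it does not call this again. Returns 0 or -ENOSPC. */
static int toy_client_send(uint64_t xid)
{
    for (size_t i = 0; i < TOY_MAX_PENDING; i++) {
        if (toy_pending[i].state == TOY_FREE) {
            toy_pending[i].xid = xid;
            toy_pending[i].state = TOY_WAITING;
            toy_pending[i].rc = 0;
            return 0;
        }
    }
    return -ENOSPC;
}

/* Deliver a server reply. Returns 1 if it completed a waiting request,
 * 0 if nothing is waiting for this xid and the reply was dropped. */
static int toy_client_deliver(uint64_t xid, int rc)
{
    for (size_t i = 0; i < TOY_MAX_PENDING; i++) {
        if (toy_pending[i].state == TOY_WAITING && toy_pending[i].xid == xid) {
            toy_pending[i].state = TOY_DONE;
            toy_pending[i].rc = rc;
            return 1;
        }
    }
    return 0; /* duplicate or late reply: discard it */
}

/* Status the caller saw for a completed request, or -ENOENT if none. */
static int toy_client_rc(uint64_t xid)
{
    for (size_t i = 0; i < TOY_MAX_PENDING; i++)
        if (toy_pending[i].state == TOY_DONE && toy_pending[i].xid == xid)
            return toy_pending[i].rc;
    return -ENOENT;
}
```

In this toy, the caller only ever observes the status from the first reply; the ESTALE from the duplicate never reaches it.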