[LU-8860] lock callback errors after client umount Created: 22/Nov/16  Updated: 06/Feb/17  Resolved: 06/Feb/17

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Mikhail Pershin Assignee: Mikhail Pershin
Resolution: Won't Fix Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-9066 ior ERROR: read() failed, Input/outpu... Closed
Related
is related to LU-9066 ior ERROR: read() failed, Input/outpu... Closed
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

During performance testing I noticed error messages like the following from AST callbacks:

LustreError: 12408:0:(client.c:1164:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff880051aa6520 x1551685286400384/t0(0) o104->lustre-MDT0000@0@lo:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
LustreError: 12408:0:(client.c:1164:ptlrpc_import_delay_req()) Skipped 2 previous similar messages
LustreError: 12408:0:(ldlm_lockd.c:687:ldlm_handle_ast_error()) ### client (nid 0@lo) failed to reply to blocking AST (req@ffff880051aa6520 x1551685286400384 status 0 rc -5), evict it ns: mdt-lustre-MDT0000_UUID

This happens when a client that was actively running tests is remounted. The cause is the old, disconnected export: it has not been fully cleaned up yet, but the new connection is already established. As a result, locks from that old export still exist, and new operations may conflict with some of them and trigger a blocking AST. When the AST request is prepared, ptlrpc_import_delay_req() finds its import in the CLOSED state and returns -EIO (the "rc -5" in the log above). ldlm_handle_ast_error() then treats this as a client failure and evicts the old export, even though it is already disconnected.
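
The sequence above can be modeled in a few lines. This is a toy Python sketch, not Lustre code: `Export`, `Lock`, `ImportState` and all helper functions are hypothetical stand-ins that only mimic the behavior described for ptlrpc_import_delay_req() and ldlm_handle_ast_error(), and the PR/EX conflict rule is deliberately simplified. It illustrates why an export that is already disconnected still ends up evicted.

```python
import errno
from dataclasses import dataclass
from enum import Enum


class ImportState(Enum):
    FULL = "FULL"      # connection usable
    CLOSED = "CLOSED"  # client unmounted; import already closed


@dataclass
class Export:
    """Toy server-side view of one client connection (hypothetical)."""
    nid: str
    import_state: ImportState
    disconnected: bool = False
    evicted: bool = False


@dataclass
class Lock:
    owner: Export
    mode: str  # "PR" or "EX" in this toy model


def conflicting_locks(granted, requested):
    """Toy rule: a granted lock conflicts if either side is EX."""
    return [lk for lk in granted if lk.mode == "EX" or requested == "EX"]


def import_delay_req(exp):
    """Models the described check: a request on a CLOSED import fails with -EIO."""
    if exp.import_state is ImportState.CLOSED:
        return -errno.EIO
    return 0


def handle_ast_error(exp, rc):
    """Models the described reaction: any AST failure evicts the export,
    even one that is already disconnected."""
    if rc != 0:
        exp.evicted = True


def send_blocking_ast(lock):
    rc = import_delay_req(lock.owner)
    handle_ast_error(lock.owner, rc)
    return rc


if __name__ == "__main__":
    # Export left over from before the remount: disconnected, import CLOSED.
    old = Export("0@lo", ImportState.CLOSED, disconnected=True)
    granted = [Lock(old, "PR")]
    # A new operation from the remounted client asks for an EX lock.
    for lk in conflicting_locks(granted, "EX"):
        rc = send_blocking_ast(lk)
        print(rc, lk.owner.evicted)  # prints "-5 True" on Linux
```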

I think it makes little sense to perform all these actions. It would be better to recognize that the export has failed or disconnected at the point where the lock is found, and not treat that lock as a blocking lock at all.
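
The proposal can be sketched the same way. This is a hypothetical Python model, not the contents of the patch below: the `disconnected`/`failed` flags, `export_is_outdated()`, `find_blocking_locks()` and the simplified PR/EX conflict rule are all assumptions made for illustration. Conflicting locks whose owner export is already gone are set aside instead of producing a blocking AST, so neither the -EIO nor the eviction occurs.

```python
from dataclasses import dataclass


@dataclass
class Export:
    """Hypothetical server-side export; field names are illustrative."""
    nid: str
    disconnected: bool = False
    failed: bool = False


@dataclass
class Lock:
    owner: Export
    mode: str  # "PR" or "EX" in this toy model


def modes_conflict(held, requested):
    # Toy compatibility rule: only EX conflicts.
    return held == "EX" or requested == "EX"


def export_is_outdated(exp):
    return exp.disconnected or exp.failed


def find_blocking_locks(granted, requested):
    """Return (blockers, skipped): locks that need a blocking AST, and
    conflicting locks ignored because their export is already gone."""
    blockers, skipped = [], []
    for lk in granted:
        if not modes_conflict(lk.mode, requested):
            continue
        if export_is_outdated(lk.owner):
            # Proposed behavior (sketch): no AST is sent, so no -EIO and no
            # eviction. This model assumes the lock is released when the old
            # export's cleanup finishes.
            skipped.append(lk)
        else:
            blockers.append(lk)
    return blockers, skipped


if __name__ == "__main__":
    old = Export("0@lo", disconnected=True)
    live = Export("192.168.1.10@tcp")
    blockers, skipped = find_blocking_locks([Lock(old, "PR"), Lock(live, "PR")], "EX")
    print([lk.owner.nid for lk in blockers], [lk.owner.nid for lk in skipped])
```

Returning the skipped locks separately, instead of dropping them silently, keeps the decision visible to the caller in this sketch.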



 Comments   
Comment by Gerrit Updater [ 23/Nov/16 ]

Mike Pershin (mike.pershin@intel.com) uploaded a new patch: http://review.whamcloud.com/23921
Subject: LU-8860 ldlm: don't send AST for outdated locks
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: fbb1cb6894f2a2505fe13ca354d3d1bb89d9993c

Comment by Mikhail Pershin [ 23/Nov/16 ]

This is just one possible patch that solves the problem.

Comment by Mikhail Pershin [ 23/Nov/16 ]

Oleg mentioned that this situation should already be handled by the older patch http://review.whamcloud.com/5843, and that no ASTs should be sent on disconnected exports. It is worth investigating why that is not happening here.

Comment by Vitaly Fertman [ 24/Nov/16 ]

I think the reason is LU-6271, which landed later and kills client locks locally on umount. Before that, locks could remain on stalled exports only if the export had been evicted.

Comment by Joseph Gmitter (Inactive) [ 28/Nov/16 ]

Assigning to Mike as he already has a patch in flight.

Comment by Mikhail Pershin [ 06/Feb/17 ]

After discussing it with Vitaly, I agree this is not an issue; it should be handled properly by the current code.

Generated at Sat Feb 10 02:21:11 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.