[LU-8306] lost BL AST during failover Created: 20/Jun/16  Updated: 14/Mar/17  Resolved: 14/Mar/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Minor
Reporter: Andriy Skulysh Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Duplicate
Related
is related to LU-9195 Improve flock locks reconstruct/recov... Open
is related to LU-8347 granting conflicting locks Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

A replayed waiting lock doesn't check whether a BL AST was sent.

We need to check for conflicting locks during every reprocess call.



 Comments   
Comment by Gerrit Updater [ 20/Jun/16 ]

Andriy Skulysh (andriy.skulysh@seagate.com) uploaded a new patch: http://review.whamcloud.com/20883
Subject: LU-8306 ldlm: lost BL AST during failover
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: af0bab5b2f81648160555c79860653c3ce1c9db7

Comment by Vitaly Fertman [ 11/Oct/16 ]

The patch fixes the following case:

  • cl1 has a granted lock;
  • cl2 has a waiting lock; a BL AST is sent but lost on the way;
  • failover happens; locks are replayed and applied on the server in the correct order;
  • the waiting lock is simply added to the resource: no new BL AST is sent, so no timeout can fire for the granted lock on the server, and there is no timeout for the waiting lock on the client;
    => cl2 will hang for a long time, until cl1 cancels its aged lock; this may lead to cl2 eviction.
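
The case above can be reproduced with a toy model. This is only an illustrative sketch, not the Lustre LDLM code and not the patch itself: the `Lock`, `Resource`, `replay` and `reprocess` names, the "read"/"write" modes and the `check_conflicts` switch are all hypothetical. It assumes that the server's record of which blocking ASTs were already sent does not survive failover, so that only a conflict check during reprocess can cause one to be sent again.

```python
from dataclasses import dataclass, field


@dataclass
class Lock:
    owner: str
    mode: str                  # "read" or "write": toy modes, not real LDLM modes
    bl_ast_sent: bool = False  # has the holder been asked to cancel?


@dataclass
class Resource:
    granted: list = field(default_factory=list)
    waiting: list = field(default_factory=list)
    sent_asts: list = field(default_factory=list)  # log of (holder, waiter) pairs


def conflicts(a, b):
    # Toy rule: locks of different owners conflict if either one is a write lock.
    return a.owner != b.owner and "write" in (a.mode, b.mode)


def replay(res, lock, granted):
    # Replay only restores the lock on the matching queue; nothing is sent here.
    (res.granted if granted else res.waiting).append(lock)


def reprocess(res, check_conflicts):
    """Try to grant waiting locks; optionally send missing blocking ASTs."""
    still_waiting = []
    for w in res.waiting:
        blockers = [g for g in res.granted if conflicts(g, w)]
        if not blockers:
            res.granted.append(w)
            continue
        if check_conflicts:
            for g in blockers:
                if not g.bl_ast_sent:  # send at most once per holder
                    g.bl_ast_sent = True
                    res.sent_asts.append((g.owner, w.owner))
        still_waiting.append(w)
    res.waiting = still_waiting


def failover_scenario(check_conflicts):
    res = Resource()  # fresh server state: earlier BL AST bookkeeping is gone
    replay(res, Lock("cl1", "write"), granted=True)
    replay(res, Lock("cl2", "write"), granted=False)
    reprocess(res, check_conflicts)
    return res


if __name__ == "__main__":
    print(failover_scenario(check_conflicts=False).sent_asts)
    print(failover_scenario(check_conflicts=True).sent_asts)
```

With `check_conflicts=False` the waiting lock stays queued and nothing is ever sent to cl1, which is the hang described above. With `check_conflicts=True` the reprocess pass notices the conflict and queues a blocking AST for cl1; a repeated reprocess does not send it again because the flag on the granted lock is already set.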
Comment by Gerrit Updater [ 05/Jan/17 ]

Niu Yawei (yawei.niu@intel.com) uploaded a new patch: https://review.whamcloud.com/24716
Subject: LU-8306 ldlm: send blocking ASTs after lock replay
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7a77fed963b1e9b0cbf6ac9097bef0f56269ec68

Comment by Gerrit Updater [ 06/Jan/17 ]

Niu Yawei (yawei.niu@intel.com) uploaded a new patch: https://review.whamcloud.com/24737
Subject: LU-8306 ldlm: reduce lock reprocess after recovery
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2f1739d01e34c13f9e212f065c95864be2f5e5dc

Comment by Gerrit Updater [ 14/Mar/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24716/
Subject: LU-8306 ldlm: send blocking ASTs after lock replay
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2ae4276ed35c0d89bde8029982e089b916620ae3

Comment by Peter Jones [ 14/Mar/17 ]

Landed for 2.10
