[LU-8765] dead loop in ptlrpc_replay_next() Created: 27/Oct/16  Updated: 03/Sep/18  Resolved: 03/Jan/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Minor
Reporter: Niu Yawei (Inactive) Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

A defect in the handling of imp_replay_cursor can cause ptlrpc_replay_next() to run into a dead loop:

  • During replay, imp_replay_cursor moves to an open request A.
  • The client closes the file, so rq_replay of open request A is cleared.
  • ptlrpc_replay_next() is called to continue replay. It calls ptlrpc_free_committed() to remove committed/closed requests from the replay/committed lists, and request A is removed from the committed list. (The open request is still held by the pending close request, so it is not freed.)
  • ptlrpc_replay_next() then tries to move imp_replay_cursor to the next entry, but the next entry of A is now A itself, so the cursor never advances and the loop never terminates.
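
The sequence above can be reproduced outside Lustre with a small stand-alone sketch. It is not the Lustre code: the list type mimics the kernel's list_head, and walk_to_head(), unlink_with_cursor(), demo_stuck() and demo_fixed() are hypothetical names introduced only for illustration. It assumes the removed entry is re-pointed at itself on unlink, as the kernel's list_del_init() does, which matches the "next is itself" behaviour described above. The walk is bounded by max_steps so the demo terminates where the real loop would spin.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct list_node {
    struct list_node *next, *prev;
};

static void list_init(struct list_node *n)
{
    n->next = n;
    n->prev = n;
}

static void list_add_tail(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink n and re-point it at itself, like list_del_init(). */
static void list_del_init(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/*
 * Hypothetical stand-in for the cursor walk: step forward until the list
 * head is reached. Returns the number of steps taken, or -1 if max_steps
 * is exhausted (an unbounded loop would never return in that case).
 */
static int walk_to_head(struct list_node *cursor,
                        const struct list_node *head, int max_steps)
{
    int steps = 0;

    while (cursor != head) {
        if (steps == max_steps)
            return -1;
        cursor = cursor->next;
        steps++;
    }
    return steps;
}

/*
 * Sketch of the idea in the patch subject ("update replay cursor when
 * close during replay"): move the cursor off the entry before unlinking.
 * This is an assumption about the approach, not the actual patch.
 */
static void unlink_with_cursor(struct list_node *n, struct list_node **cursor)
{
    if (*cursor == n)
        *cursor = n->next;
    list_del_init(n);
}

/* Cursor sits on A, A is unlinked: the walk can never reach the head. */
static int demo_stuck(void)
{
    struct list_node head, a, b;
    struct list_node *cursor;

    list_init(&head);
    list_add_tail(&a, &head);
    list_add_tail(&b, &head);
    cursor = &a;
    list_del_init(&a);            /* a.next == &a from here on */
    return walk_to_head(cursor, &head, 1000);
}

/* Same setup, but the cursor is moved to B before A is unlinked. */
static int demo_fixed(void)
{
    struct list_node head, a, b;
    struct list_node *cursor;

    list_init(&head);
    list_add_tail(&a, &head);
    list_add_tail(&b, &head);
    cursor = &a;
    unlink_with_cursor(&a, &cursor);
    return walk_to_head(cursor, &head, 1000);
}
```

Calling demo_stuck() exhausts the step budget because the cursor keeps landing on the unlinked entry, while demo_fixed() reaches the list head after stepping past B.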


 Comments   
Comment by Gerrit Updater [ 27/Oct/16 ]

Niu Yawei (yawei.niu@intel.com) uploaded a new patch: http://review.whamcloud.com/23418
Subject: LU-8765 ptlrpc: update replay cursor when close during replay
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f09f17661323a0f134d9ef02044b863693ee0a9c

Comment by Gerrit Updater [ 01/Jan/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23418/
Subject: LU-8765 ptlrpc: update replay cursor when close during replay
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1e8dfacb6f58d875d7840eb89a3af3e780659367

Comment by Peter Jones [ 03/Jan/17 ]

Landed for 2.10
