[LU-9075] frequent mdt_hsm_update_request_state()/mdt_coordinator_cb() couple of error msgs when CDT has to deal with a huge backlog of actions Created: 03/Feb/17  Updated: 04/Aug/17  Resolved: 23/Apr/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Minor
Reporter: Bruno Faccini (Inactive) Assignee: Bruno Faccini (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Pairs of error messages such as "(mdt_coordinator.c:1473:mdt_hsm_update_request_state()) ... Cannot find running request for cookie ..." and "(mdt_coordinator.c:339:mdt_coordinator_cb()) ... cannot cleanup timed out request ..." appear frequently and seem to be a consequence of the CDT being busy parsing a huge number of action LLOG records. A likely explanation is that the messages concern active requests that have completed and have therefore already been removed from memory in mdt_hsm_update_request_state() (via mdt_cdt_remove_request(), in the context of an MDT thread handling the CT's MDS_HSM_PROGRESS requests), while the corresponding action LLOG record update is still stuck in mdt_agent_record_update(), waiting for the CDT to release cdt_llog_lock.

A possible fix is to call mdt_agent_record_update() before mdt_cdt_remove_request() in mdt_hsm_update_request_state(), so that the request stays in memory until its action LLOG record has been updated.
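
The ordering issue described above can be illustrated with a small, deterministic model. This is only a sketch under stated assumptions and not Lustre code: every name in it (struct model, coordinator_scan(), complete_request(), and so on) is hypothetical, and the lock contention is modelled simply by letting a coordinator scan run at the point where the record update would be blocked waiting for cdt_llog_lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; none of these names are Lustre APIs. */
enum rec_state { REC_STARTED, REC_DONE };

struct model {
	bool in_memory;       /* request still present in the in-memory list */
	enum rec_state rec;   /* state stored in the action's llog record */
	int errors;           /* "cannot find request" style errors reported */
};

/*
 * Stand-in for the coordinator walking the llog while holding the lock:
 * a record still marked STARTED with no in-memory request is an error.
 */
static void coordinator_scan(struct model *m)
{
	if (m->rec == REC_STARTED && !m->in_memory)
		m->errors++;
}

static void remove_request(struct model *m) { m->in_memory = false; }
static void update_record(struct model *m)  { m->rec = REC_DONE; }

enum order { REMOVE_THEN_UPDATE, UPDATE_THEN_REMOVE };

/*
 * Complete one request using the given ordering. In both orderings the
 * coordinator scan is placed immediately before update_record(), which
 * models the update being blocked on the lock while the scan runs.
 * Returns the number of errors the coordinator reported.
 */
static int complete_request(enum order o)
{
	struct model m = { .in_memory = true, .rec = REC_STARTED, .errors = 0 };

	if (o == REMOVE_THEN_UPDATE) {
		remove_request(&m);
		coordinator_scan(&m);   /* sees STARTED record, no request */
		update_record(&m);
	} else {
		coordinator_scan(&m);   /* request is still in memory */
		update_record(&m);
		remove_request(&m);
	}

	coordinator_scan(&m);           /* a later scan, after completion */
	return m.errors;
}
```

In this model, complete_request(REMOVE_THEN_UPDATE) reports one error and complete_request(UPDATE_THEN_REMOVE) reports none, which matches the reasoning behind the proposed reordering. Whether the real code paths behave exactly this way depends on details not given in this ticket.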



 Comments   
Comment by Gerrit Updater [ 03/Feb/17 ]

Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: https://review.whamcloud.com/25243
Subject: LU-9075 mdt: avoid race causing mdt_coordinator_cb() err msgs
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d1a5a1cdd52614596da3aa6f3e0b3a87ef69af9f

Comment by Gerrit Updater [ 23/Apr/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/25243/
Subject: LU-9075 mdt: avoid race causing mdt_coordinator_cb() err msgs
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d1535dc90b01770e56a0c79c7bb1e7c9cd8f1c6a

Comment by Peter Jones [ 23/Apr/17 ]

Landed for 2.10
