  1. Lustre
  2. LU-9075

Frequent pairs of mdt_hsm_update_request_state()/mdt_coordinator_cb() error messages when the CDT has to deal with a huge backlog of actions

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.10.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      The following pair of messages is logged frequently:

      "(mdt_coordinator.c:1473:mdt_hsm_update_request_state()) ... Cannot find running request for cookie ..."
      "(mdt_coordinator.c:339:mdt_coordinator_cb()) ... cannot cleanup timed out request ..."

      This seems to be a consequence of the CDT being busy parsing a huge number of action LLOG records. The messages probably concern active requests that have completed and have therefore already been removed from memory in mdt_hsm_update_request_state() (using mdt_cdt_remove_request(), in the context of an MDT thread handling the CT's MDS_HSM_PROGRESS requests), while the corresponding action LLOG record update is stuck in mdt_agent_record_update(), waiting for the CDT to release cdt_llog_lock.

      A possible fix is to call mdt_agent_record_update() before mdt_cdt_remove_request() in mdt_hsm_update_request_state().
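
      The ordering issue can be illustrated with a small single-threaded model. This is only a sketch, not the Lustre code: the struct, the enum and every function name below are hypothetical stand-ins, and it assumes that the coordinator scan runs exactly while the completion handler is waiting for the LLOG lock and that the scanned record is already considered timed out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real Lustre structures. */
enum rec_status { REC_STARTED, REC_SUCCEED };

struct model {
    bool in_memory;        /* request still in the CDT in-memory list */
    enum rec_status llog;  /* status stored in the action LLOG record */
    int spurious_errors;   /* "cannot cleanup timed out request" count */
};

/* Models the coordinator scan looking at one (timed out) LLOG record. */
static void coordinator_scan(struct model *m)
{
    assert(m != NULL);
    if (m->llog != REC_STARTED)
        return;                 /* record already finalized: skip it */
    if (!m->in_memory)
        m->spurious_errors++;   /* STARTED record, no running request */
}

/* Models the LLOG record update: the caller first has to wait for the
 * lock, and in this model the coordinator scan runs during that wait. */
static void update_record(struct model *m)
{
    coordinator_scan(m);
    m->llog = REC_SUCCEED;
}

/* Models dropping the request from the in-memory list. */
static void remove_request(struct model *m)
{
    assert(m != NULL);
    m->in_memory = false;
}

/* Runs one request completion in the chosen order and returns the
 * number of spurious errors the coordinator scan reported. */
static int count_spurious_errors(bool update_llog_first)
{
    struct model m = { true, REC_STARTED, 0 };

    if (update_llog_first) {
        update_record(&m);      /* proposed order */
        remove_request(&m);
    } else {
        remove_request(&m);     /* order described in this ticket */
        update_record(&m);
    }
    return m.spurious_errors;
}
```

      In this model, count_spurious_errors(false) returns 1 because the scan finds a STARTED record whose request is already gone, while count_spurious_errors(true) returns 0 because the request is still in memory for as long as its record is STARTED.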

          People

            bfaccini Bruno Faccini (Inactive)