Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-7920

hsm coordinator request_count and max_requests not used consistently

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.9.0
    • Lustre 2.8.0
    • 3
    • 9223372036854775807

    Description

      I've always wondered why sometimes max_requests has an effect and other times it appears to be completely ignored.

      The hsm coordinator receives new actions from user space in action lists, and there can be between and ~50 actions in the list. As the coordinator sends the actions (aka "requests") to agents, it tracks how many are being processed in cdt->cdt_request_count. There is also a tunable parameter, cdt->cdt_max_requests, that is presumably intended to limit the number of requests sent to the agents. The count is compared against max_requests in the main cdt loop prior to processing each action list:

      	/* still room for work ? */
      	if (atomic_read(&cdt->cdt_request_count) ==
      	    cdt->cdt_max_requests)
      		break;
      

      Note it is checking for equality there.

      Since this check occurs prior to processing an action list, and there can be multiple requests per list, it is easy to see that request_count can very easily greatly exceed max_requests, and when the happens the coordinator continues to send all available requests to the agents, which might actually be the right thing to do anyway.

      If we really want a limit, then a simple workaround here is to change "==" to ">=" to provide some kind of limit, but it really seems wrong to have a single global limit regardless of the number of agents and archives. Ideally the agents should be able to maximize the throughput to each of the archives they are handling. I believe there have been some discussions on how best to do this, but I don't think we've reached a consensus yet.

      Attachments

        Issue Links

          Activity

            People

              utopiabound Nathaniel Clark
              rread Robert Read (Inactive)
              Votes:
              1 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: