  1. Lustre
  2. LU-14069

OBD_FAIL_LDLM_CANCEL_BL_CB_RACE is buggy in ldlm_handle_cp_callback

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.6
    • Labels:
      None
    • Severity:
      3
    • Rank (Obsolete):
      9223372036854775807

      Description

      There's this code in ldlm_handle_cp_callback:

              if (OBD_FAIL_CHECK(OBD_FAIL_LDLM_CANCEL_BL_CB_RACE)) {
                      long to = cfs_time_seconds(1);
      
                      ldlm_callback_reply(req, 0);
      
                      while (to > 0) {
                              schedule_timeout_interruptible(to);
                              if (ldlm_is_granted(lock) ||
                                  ldlm_is_destroyed(lock))
                                      break;
                      }
              }
      

      This looks like it was supposed to be a time-bound wait, and indeed, when it was introduced (commit 022b1022, bz 11300), the variable "to" was assigned from the return value of schedule_timeout.

      This got broken by commit adde80ff, which is some squashed HEAD commit that lost the assignment to "to", so the timeout never counts down and the wait is no longer bounded.
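
      For illustration, here is a minimal userspace sketch of what the time-bound wait presumably looked like, assuming the only thing needed is to store the sleep's return value (the time remaining) back into the timeout variable. sim_schedule_timeout, struct sim_lock, bounded_wait and WAKE_EVERY are hypothetical stand-ins made up for this sketch, not Lustre or kernel APIs; in the kernel the sleep would be schedule_timeout_interruptible.

```c
#include <stdbool.h>

/* Hypothetical: every simulated sleep is woken early after this many ticks. */
enum { WAKE_EVERY = 10 };

/*
 * Hypothetical stand-in for schedule_timeout_interruptible(): "sleeps" for
 * at most WAKE_EVERY ticks and returns the ticks still remaining, which is
 * 0 once the full timeout has elapsed.
 */
static long sim_schedule_timeout(long timeout)
{
	long slept = timeout < WAKE_EVERY ? timeout : WAKE_EVERY;

	return timeout - slept;
}

/* Hypothetical lock carrying only the two states the loop checks. */
struct sim_lock {
	bool granted;
	bool destroyed;
};

/*
 * Time-bound wait: loops until the lock is granted or destroyed, or until
 * the timeout is used up. Returns the number of wakeups it took.
 */
static int bounded_wait(const struct sim_lock *lock, long to)
{
	int wakeups = 0;

	while (to > 0) {
		/* The assignment is what makes the wait bounded. */
		to = sim_schedule_timeout(to);
		wakeups++;
		if (lock->granted || lock->destroyed)
			break;
	}

	return wakeups;
}
```

      Without the assignment, the timeout variable keeps its initial value, and the loop can only exit through the granted/destroyed check.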

      Now this seems to be breaking the LU-13692 patch.

              People

              • Assignee:
                green Oleg Drokin
                Reporter:
                green Oleg Drokin