Lustre / LU-9063

hsm: race on the coordinator's state

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

      Description

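      Nothing prevents the coordinator from being restarted after it has been shut down. This is a problem at cleanup time, for example when unmounting an MDT: if the coordinator is re-enabled during cleanup, hsm_cdt_procfs_fini() can find it in a state other than CDT_STOPPED and trip its assertion.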

      The race is hard to trigger. The simplest way to reproduce it is to add a delay in mdt_hsm_cdt_stop() right after the coordinator's state is set to CDT_STOPPED, which widens the window in which the coordinator can be restarted.

      After applying the patch (see attachment), run the following two commands concurrently:

      # Wait until the coordinator reports "stopped", then immediately re-enable it.
      while [ "$(lctl get_param -n mdt.lustre-MDT0000.hsm_control)" != "stopped" ]; do
          sleep 1
      done
      lctl set_param mdt.lustre-MDT0000.hsm_control=enabled

      and

      lustre/tests/llmountcleanup.sh

      This should trigger the following:
      kernel:LustreError: 20570:0:(mdt_coordinator.c:391:hsm_cdt_procfs_fini()) ASSERTION( cdt->cdt_state == CDT_STOPPED ) failed:
      kernel:LustreError: 20570:0:(mdt_coordinator.c:391:hsm_cdt_procfs_fini()) LBUG
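
      A minimal user-space sketch of the missing guard, assuming a mutex-protected state plus a one-way "shutting down" flag that cleanup sets before the state is checked. All names here (struct cdt_model, cdt_model_start(), cdt_model_shutdown(), etc.) are hypothetical and are not Lustre's coordinator API. It replays the interleaving above sequentially: the unguarded start flips the state back to running after shutdown, which is the condition the assertion in hsm_cdt_procfs_fini() catches, while the guarded start rejects the late request.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the coordinator state; not Lustre's real API. */
enum cdt_model_state { MODEL_STOPPED, MODEL_RUNNING };

struct cdt_model {
	pthread_mutex_t lock;        /* serializes every state transition */
	enum cdt_model_state state;
	bool shutting_down;          /* set once at cleanup, never cleared */
};

void cdt_model_init(struct cdt_model *c)
{
	assert(c != NULL);
	pthread_mutex_init(&c->lock, NULL);
	c->state = MODEL_STOPPED;
	c->shutting_down = false;
}

/* Racy version: nothing stops a restart after shutdown (the reported bug). */
void cdt_model_start_unguarded(struct cdt_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->state = MODEL_RUNNING;
	pthread_mutex_unlock(&c->lock);
}

/* Guarded version: the check and the transition happen under one lock,
 * so a start request that loses the race against cleanup is rejected. */
int cdt_model_start(struct cdt_model *c)
{
	int rc = 0;

	pthread_mutex_lock(&c->lock);
	if (c->shutting_down)
		rc = -1;     /* kernel code would return a negative errno here */
	else
		c->state = MODEL_RUNNING;
	pthread_mutex_unlock(&c->lock);
	return rc;
}

/* Cleanup: stop the coordinator and forbid any later restart atomically. */
void cdt_model_shutdown(struct cdt_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->shutting_down = true;
	c->state = MODEL_STOPPED;
	pthread_mutex_unlock(&c->lock);
}

/* Mirrors the condition asserted in hsm_cdt_procfs_fini(). */
bool cdt_model_fini_ok(struct cdt_model *c)
{
	bool ok;

	pthread_mutex_lock(&c->lock);
	ok = (c->state == MODEL_STOPPED);
	pthread_mutex_unlock(&c->lock);
	return ok;
}
```

      Calling cdt_model_shutdown() and then cdt_model_start_unguarded() leaves the model in MODEL_RUNNING, so cdt_model_fini_ok() returns false; with cdt_model_start() the late restart returns -1 and the check still passes. The design point is that the shutdown flag only ever goes one way, so no re-enable request arriving after cleanup has begun can undo it.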

        Attachments


            People

            • Assignee: Hongchao Zhang (hongchao.zhang)
              Reporter: CEA (cealustre)
            • Votes: 0
              Watchers: 7
