  Lustre / LU-9063

hsm: race on the coordinator's state

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • None
    • None
    • 3

    Description

      Nothing prevents a coordinator from restarting after it has been shut down. This is a problem at cleanup time, for example when unmounting an MDT.

      The race is somewhat difficult to trigger. The simplest way to reproduce it is to add a delay in mdt_hsm_cdt_stop() right after the coordinator's state is set to CDT_STOPPED.

      After applying the patch (see attachment), run the following concurrently:

      # Wait until the coordinator reports "stopped", then immediately re-enable it.
      while [ "$(lctl get_param -n mdt.lustre-MDT0000.hsm_control)" != "stopped" ]; do
          sleep 1
      done
      lctl set_param mdt.lustre-MDT0000.hsm_control=enabled

      and

      lustre/tests/llmountcleanup.sh
      
      

      This should trigger the following:
      kernel:LustreError: 20570:0:(mdt_coordinator.c:391:hsm_cdt_procfs_fini()) ASSERTION( cdt->cdt_state == CDT_STOPPED ) failed:
      kernel:LustreError: 20570:0:(mdt_coordinator.c:391:hsm_cdt_procfs_fini()) LBUG


        Activity

          spitzcor Cory Spitz added a comment -

          And I'm guessing that https://review.whamcloud.com/c/fs/lustre-release/+/25170 can be abandoned.

          spitzcor Cory Spitz added a comment -

          Seems that this is an old issue that can be resolved. Quentin B. was probably right.


          bougetq Quentin Bouget (Inactive) added a comment -

          I think this patch would be a fix: https://review.whamcloud.com/#/c/22667

          gerrit Gerrit Updater added a comment -

          Vinayak (vinayakswami.hariharmath@seagate.com) uploaded a new patch: https://review.whamcloud.com/25269
          Subject: LU-9063 tests: patch to create the race
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 3f97bdee4ae56a2e89c6a6eff13ba2aff52bc36b

          gerrit Gerrit Updater added a comment -

          Vinayak (vinayakswami.hariharmath@seagate.com) uploaded a new patch: https://review.whamcloud.com/25170
          Subject: LU-9063 hsm: protect cdt_state with mutex lock
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 52e08b6ec58240ce6f88f4ddbcdb3dd4216797bf

          pjones Peter Jones added a comment -

          Hongchao

          Could you please advise on this one?

          Thanks

          Peter


          People

            Assignee: hongchao.zhang Hongchao Zhang
            Reporter: cealustre CEA
            Votes: 0
            Watchers: 8
