Details

    • Technical task
    • Resolution: Fixed
    • Major
    • Lustre 2.6.0, Lustre 2.5.1
    • Lustre 2.5.0
    • 9297

    Description

      Using the Jul 22, 2013 HSM stack, executing a released file (and thereby triggering a restore) leaves the file writable while it's being executed.

      # cd /mnt/lustre
      # cp /bin/sleep SLEEP
      # lfs hsm_archive SLEEP
      # sleep 1
      # lfs hsm_release SLEEP
      # ./SLEEP 10 && echo DONE &
      [1] 4243
      # sleep 1
      # pgrep -l SLEEP
      4244 SLEEP
      # cd /mnt/lustre2
      # echo 'Hi!' > SLEEP
      # cat SLEEP
      Hi!
      # -bash: line 238:  4244 Bus error               (core dumped) ./SLEEP 10
      
      [1]+  Exit 135                ./SLEEP 10 && echo DONE  (wd: /mnt/lustre)
      (wd now: /mnt/lustre2)
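
The "Exit 135" status in the transcript above can be decoded with shell arithmetic. This is a minimal sketch, assuming the usual shell convention that a process terminated by signal N is reported with exit status 128 + N; the variable names are illustrative only.

```shell
# Exit status the shell reported for ./SLEEP in the transcript.
status=135
# By convention, a status above 128 means "terminated by signal (status - 128)".
signum=$((status - 128))
echo "terminated by signal $signum"
# On Linux x86 and ARM, signal 7 is SIGBUS; `kill -l "$signum"` prints the
# name used on the local platform.
```

Signal 7 is consistent with the "Bus error" message, which is what one would expect when the file backing a running binary is truncated underneath it.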
      

        Activity

          [LU-3616] HSM restore for execute allows writes to file
          pjones Peter Jones added a comment -

          Yes it is being tracked for 2.5.1.


          adegremont Aurelien Degremont (Inactive) added a comment -

          This should also be considered for 2.5.1
          pjones Peter Jones added a comment -

          Landed for 2.6


          bfaccini Bruno Faccini (Inactive) added a comment -

          Patch-set #2 of Change #7636 successfully passed auto-tests, including its own new sanity-hsm/test_30c sub-test.

          Restore on exec() continues to work, but any write attempted during exec() is now refused and fails.

          BTW, while reading the code of sub-tests test_30[a,b], which cover the same exec()-on-released-file area, I was surprised to find the following comment at their beginning:

          # restore at exec cannot work on agent node (because of Linux kernel
          # protection of executables)
          needclients 2 || return 0
          ...

          Are the comment and "needclients 2" still relevant? As per my latest tests, restore at exec() also works on the agent node (I tested on a single, full node ...).
          bfaccini Bruno Faccini (Inactive) added a comment - - edited

          1st patch-set of http://review.whamcloud.com/7636 successfully passed auto-tests and also did not trigger the original problem when running John's reproducer.

          I will submit patch-set #2 with the same code plus a new, specific sub-test in sanity-hsm, based on John's reproducer.


          bfaccini Bruno Faccini (Inactive) added a comment -

          1st patch attempt is at http://review.whamcloud.com/7636. Build is ok but auto-tests never started ...
          So, I just re-triggered auto-tests.

          adegremont Aurelien Degremont (Inactive) added a comment -

          I did not write this part of the patch, but it seems it could be changed. I'm trusting Oleg regarding this.
          If this fixes the code snippet you've posted, I'm fine. Just ensure restore at exec is still working.

          bfaccini Bruno Faccini (Inactive) added a comment -

          This behavior was introduced in both the mdt_mfd_open() and mdt_object_open_lock() routines (in lustre/mdt/mdt_open.c) by commit c42b426c87c3d3b1dc9eda612cc831293dc80d68 from Gerrit patch/Change-Id Ic8f82ddc9a56206307c2e5be2523fb7ce42b8638 (at http://review.whamcloud.com/3035) for the LU-1338 (now HSM-5) ticket.

          And Oleg already warned about this in his Change comment!

          I wonder if I can simply revert these changes to get the correct behavior, and I would like to get feedback on this from Aurelien, since he is the original change author.

          bfaccini Bruno Faccini (Inactive) added a comment -

          Normal behavior (without HSM actions/cmds) would be for "echo 'Hi!' > SLEEP" to fail with "Text file busy"/ETXTBSY.

          And access through the second mount (/mnt/lustre2) is the key ...

          I am walking through the code to see where we missed something during hsm_release.
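
For comparison, the expected non-HSM behavior described above can be checked on a local filesystem. This is a rough sketch and not part of the Lustre test suite: it assumes a Linux host with /bin/sleep, a temporary directory that permits execution, and illustrative variable names (workdir, pid, result).

```shell
# Copy a binary, run the copy, then try to overwrite it while it is running.
workdir=$(mktemp -d)
cp /bin/sleep "$workdir/SLEEP"
"$workdir/SLEEP" 5 &
pid=$!
sleep 1
# The subshell keeps the shell's own redirection error off the terminal.
if ( echo 'Hi!' > "$workdir/SLEEP" ) 2>/dev/null; then
    result="write allowed"
else
    result="write refused"
fi
# Clean up the background process and the scratch directory.
kill "$pid" 2>/dev/null
wait "$pid" 2>/dev/null
rm -rf "$workdir"
echo "$result"
```

On a kernel that protects running executables, the write should be refused with "Text file busy"; "write allowed" would point to the kind of gap reported in this ticket, or to the copy not actually having been running.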

          People

            bfaccini Bruno Faccini (Inactive)
            jhammond John Hammond