HSM _not only_ small fixes and to do list goes here
(LU-3647)
|
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.5.1 |
| Type: | Technical task | Priority: | Major |
| Reporter: | John Hammond | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HSM | ||
| Rank (Obsolete): | 9297 |
| Description |
|
Using the Jul 22, 2013 HSM stack, executing a released file (and thereby triggering a restore) leaves the file writable while it's being executed.

    # cd /mnt/lustre
    # cp /bin/sleep SLEEP
    # lfs hsm_archive SLEEP
    # sleep 1
    # lfs hsm_release SLEEP
    # ./SLEEP 10 && echo DONE &
    [1] 4243
    # sleep 1
    # pgrep -l SLEEP
    4244 SLEEP
    # cd /mnt/lustre2
    # echo 'Hi!' > SLEEP
    # cat SLEEP
    Hi!
    # -bash: line 238: 4244 Bus error (core dumped) ./SLEEP 10
    [1]+ Exit 135 ./SLEEP 10 && echo DONE (wd: /mnt/lustre)
    (wd now: /mnt/lustre2)
|
| Comments |
| Comment by Bruno Faccini (Inactive) [ 05/Sep/13 ] |
|
Normal behavior (without HSM actions/commands) would be for "echo 'Hi!' > SLEEP" to fail with "Text file busy" (ETXTBSY). Access through the second mount (/mnt/lustre2) is the key ... I am walking through the code to see what we missed during hsm_release. |
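For comparison, the expected ETXTBSY behavior can be observed on a local filesystem with a short script. This is only a sketch: it involves no Lustre or HSM commands, the temporary directory and variable names are illustrative and not taken from the ticket, and it assumes the temporary directory permits execution.

```shell
#!/bin/bash
# Sketch: the kernel should refuse to open a running executable for writing.
# Local filesystem only; names below are illustrative assumptions.

workdir=$(mktemp -d)
# Keep the name "sleep" so multi-call binaries (e.g. busybox) still work.
cp /bin/sleep "$workdir/sleep"

# Start the copy in the background so it is still executing during the write.
"$workdir/sleep" 5 &
pid=$!
sleep 1

# The redirection is wrapped in braces so its error message can be discarded.
if { echo 'Hi!' > "$workdir/sleep"; } 2>/dev/null; then
    result="write succeeded"
else
    result="write refused"
fi
echo "$result"

# Clean up the background process and the temporary directory.
kill "$pid" 2>/dev/null || true
wait "$pid" 2>/dev/null || true
rm -rf "$workdir"
```

On a kernel that enforces the usual protection of running executables, the script should report that the write was refused. In the reproducer above, the equivalent write through the second mount succeeded instead.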
| Comment by Bruno Faccini (Inactive) [ 11/Sep/13 ] |
|
This behavior was introduced in both the mdt_mfd_open() and mdt_object_open_lock() routines (in lustre/mdt/mdt_open.c) by commit c42b426c87c3d3b1dc9eda612cc831293dc80d68 from Gerrit Change-Id Ic8f82ddc9a56206307c2e5be2523fb7ce42b8638 (at http://review.whamcloud.com/3035) for the LU-1338 (now HSM-5) ticket. Oleg already warned about this in his comment on the Change! I wonder whether I can simply revert these changes to get the correct behavior, and I would like Aurelien's feedback on this, since he is the original author of the change. |
| Comment by Aurelien Degremont (Inactive) [ 12/Sep/13 ] |
|
I did not write this part of the patch, but it seems it could be changed. I trust Oleg on this. |
| Comment by Bruno Faccini (Inactive) [ 25/Sep/13 ] |
|
A first patch attempt is at http://review.whamcloud.com/7636. The build is OK, but the auto-tests never started ... |
| Comment by Bruno Faccini (Inactive) [ 02/Oct/13 ] |
|
Patch-set #1 of http://review.whamcloud.com/7636 successfully passed auto-tests and did not trigger the original problem when running John's reproducer. I will submit patch-set #2 with the same code plus a new, specific sub-test in sanity-hsm, based on John's reproducer. |
| Comment by Bruno Faccini (Inactive) [ 11/Oct/13 ] |
|
Patch-set #2 of Change #7636 successfully passed auto-tests, including its own new sanity-hsm/test_30c sub-test. Restore on exec() continues to work, but any write attempted during the exec() now fails. BTW, reading the code of sub-tests test_30[a,b], which cover the same area (exec() on released files), I was surprised by the following comment at their beginning:

    # restore at exec cannot work on agent node (because of Linux kernel
    # protection of executables)
    needclients 2 || return 0
|
| Comment by Peter Jones [ 25/Oct/13 ] |
|
Landed for 2.6 |
| Comment by Aurelien Degremont (Inactive) [ 12/Nov/13 ] |
|
This should also be considered for 2.5.1 |
| Comment by Peter Jones [ 12/Nov/13 ] |
|
Yes it is being tracked for 2.5.1. |