HSM _not only_ small fixes and to do list goes here
(LU-3647)
|
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.5.0 |
| Type: | Technical task | Priority: | Blocker |
| Reporter: | John Hammond | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HB, HSM |
| Rank (Obsolete): | 9850 |
| Description |
|
Non-root users cannot archive their files, and root cannot archive files owned by other users. Moreover, the lfs hsm_archive command fails silently. To reproduce, start HSM and do the following:

    # cd /mnt/lustre
    # dd if=/dev/zero of=Asterix bs=1M count=10
    10+0 records in
    10+0 records out
    10485760 bytes (10 MB) copied, 0.0633262 s, 166 MB/s
    # chown sanity: Asterix
    # lfs hsm_archive Asterix
    # echo $?
    0

Running the same operation with trace enabled shows that the failure originates from:

    00000004:00000001:3.0:1377109173.525959:0:15687:0:(mdd_object.c:911:mdd_xattr_sanity_check()) Process leaving (rc=18446744073709551615 : -1 : ffffffffffffffff)
    00000004:00000001:3.0:1377109173.525959:0:15687:0:(mdd_object.c:1034:mdd_xattr_set()) Process leaving (rc=18446744073709551615 : -1 : ffffffffffffffff)
    00000004:00000001:3.0:1377109173.525964:0:15687:0:(mdt_hsm.c:77:mdt_hsm_attr_set()) Process leaving (rc=18446744073709551615 : -1 : ffffffffffffffff)

The failure in mdd_xattr_sanity_check() occurs because the file is not owned by root and the coordinator runs with no capabilities. The failed request leaves an orphaned agent action according to proc:

    # lfs path2fid /mnt/lustre/Asterix
    [0x200000400:0x1:0x0]
    # cat /proc/fs/lustre/mdt/lustre-MDT0000/hsm/agent_actions
    lrh=[type=10680000 len=136 idx=73] fid=[0x200000400:0x1:0x0] dfid=[0x200000400:0x1:0x0] compound/cookie=0x52150414/0x52150414 action=ARCHIVE archive#=0 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[]

Similarly, a non-root user cannot archive any files:

    # cd /mnt/lustre
    # mkdir sanity
    # chown sanity: sanity
    # cd sanity
    # su sanity
    $ dd if=/dev/zero of=Obelix bs=1M count=50
    $ ls -l Obelix
    -rw-rw-r-- 1 sanity sanity 52428800 Aug 21 14:03 Obelix
    $ lfs hsm_archive Obelix
    $ echo $?
    0
    $ sleep 60
    $ lfs hsm_state Obelix
    Obelix: (0x00000000) |
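For anyone checking a system for the same symptom, the two pieces of output quoted in the description can be decoded with a few lines of Python. This is only a sketch: the helper names (`to_signed64`, `errno_name`, `parse_agent_action`) are made up for illustration, and the field layout is assumed from the single `agent_actions` line shown in this ticket, not from any documented format.

```python
import errno
import re

def to_signed64(value):
    """Interpret an unsigned 64-bit trace rc value as a signed integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value

def errno_name(rc):
    """Map a negative-errno return code to its symbolic name, if known."""
    signed = to_signed64(rc)
    if signed >= 0:
        return "OK"
    return errno.errorcode.get(-signed, "UNKNOWN")

# key=value pairs; a value is either a bracketed group or a run of non-spaces.
# Assumption: this matches the agent_actions line quoted in the description.
FIELD_RE = re.compile(r"(\S+?)=(\[[^\]]*\]|\S+)")

def parse_agent_action(line):
    """Split one agent_actions line into a dict of raw string fields."""
    return {key: value for key, value in FIELD_RE.findall(line)}

sample = (
    "lrh=[type=10680000 len=136 idx=73] fid=[0x200000400:0x1:0x0] "
    "dfid=[0x200000400:0x1:0x0] compound/cookie=0x52150414/0x52150414 "
    "action=ARCHIVE archive#=0 flags=0x0 extent=0x0-0xffffffffffffffff "
    "gid=0x0 datalen=0 status=WAITING data=[]"
)

action = parse_agent_action(sample)
rc_name = errno_name(18446744073709551615)
```

Here `rc_name` evaluates to `EPERM`, which is consistent with the permission failure described above, and `action["status"]` is `WAITING` for the orphaned request.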
| Comments |
| Comment by Jinshan Xiong (Inactive) [ 22/Aug/13 ] |
|
Good catch. For the command-line tool, only root can archive and release files; normal users should be able to restore their own files, either with `lfs hsm_restore' or by accessing them directly. |
| Comment by jacques-charles lafoucriere [ 25/Aug/13 ] |
|
IMO there is no reason to distinguish an explicit request (through the command line) from an implicit request. Also, archive/restore must be doable by the user. One use case is a job epilog in which the user archives all the data produced by the run. In any case there is no security issue; the only issue is the interaction with quotas, which is an open problem. For me, enforced quotas are incompatible with HSM. |
| Comment by Andreas Dilger [ 26/Aug/13 ] |
|
I think there are two problems here:
|
| Comment by John Hammond [ 26/Aug/13 ] |
|
Andreas, indeed. Supporting HSM archive by normal users requires raising the capabilities of the coordinator, which we need to do anyway. Then there is the issue that an unprivileged user may archive files they don't own. Should normal users be allowed to release their own files? If so, then there is also the problem that, as implemented, HSM release requires that the user possess create privileges in the MDT's local root. I believe that we all agree that implicit restore should be handled transparently. This can be fixed in the copytool. What about explicit restore? There are some weak security issues here, since user space can just give an arbitrary FID to llite. On the permissions question, I think this is policy and so should be deferred to the site admins, via proc tunables (yucky but whatever) or a setuid() binary (less yucky but not something we usually do). |
| Comment by Jinshan Xiong (Inactive) [ 26/Aug/13 ] |
I tend to think we need to distinguish requests from the command line from requests from the policy engine. Archive and release should be privileged operations because they consume system resources. Otherwise, the system could easily be subjected to a DoS attack by malicious users. This is the usual approach in similar products. For epilog cases, I think we should do some work on the policy engine so that it can be picked. |
| Comment by John Hammond [ 27/Aug/13 ] |
|
Please see http://review.whamcloud.com/7461. |
| Comment by jacques-charles lafoucriere [ 27/Aug/13 ] |
|
| Comment by John Hammond [ 27/Aug/13 ] |
|
Also note that there is not enough permission checking in mdt_hsm_request() and friends. A non-root user can cancel HSM requests, do hsm_remove, ... |
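The permissions question in this thread can be made concrete with a small sketch of a per-action check driven by a site-tunable table. Nothing here reflects the actual MDT code or the patch under review: the function `may_request`, the rule constants, and the action strings are hypothetical, and the default table merely encodes one position voiced above (root archives and releases; users restore their own files).

```python
# Hypothetical rule labels, invented for this sketch.
ROOT_ONLY = "root"
OWNER_OR_ROOT = "owner"

# Assumed default policy: one of the positions argued in the comments,
# not the behavior of any Lustre release.
DEFAULT_POLICY = {
    "ARCHIVE": ROOT_ONLY,
    "RELEASE": ROOT_ONLY,
    "RESTORE": OWNER_OR_ROOT,
    "REMOVE": ROOT_ONLY,
    "CANCEL": ROOT_ONLY,
}

def may_request(action, caller_uid, file_owner_uid, policy=DEFAULT_POLICY):
    """Return True if the caller may submit the given HSM action."""
    rule = policy.get(action)
    if rule is None:
        # Unknown actions are refused rather than silently accepted.
        return False
    if caller_uid == 0:
        return True
    return rule == OWNER_OR_ROOT and caller_uid == file_owner_uid

# A site that wants users to archive their own files (the job epilog use
# case) would override a single entry instead of changing code.
EPILOG_POLICY = dict(DEFAULT_POLICY, ARCHIVE=OWNER_OR_ROOT)
```

Keeping the rules in a table mirrors the suggestion that this is site policy: the check itself stays fixed while admins choose which actions are open to file owners.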
| Comment by Thomas LEIBOVICI - CEA (Inactive) [ 28/Aug/13 ] |
Release is not really resource-consuming; it is resource-freeing (it is a way for the user to do a kind of posix_fadvise(DONTNEED) with HSM meaning). I'd rather say restore is the most resource-consuming operation (it uses Lustre disk space + copy bandwidth + tape drive), followed by archive, which uses copy bandwidth.
A DoS is still easy on a HSM system by restoring all readable files.
Indeed, this is dangerous... |
| Comment by Jinshan Xiong (Inactive) [ 28/Aug/13 ] |
Think about a case where an archive already exists: a malicious user can keep releasing and restoring a file in an infinite loop. |
| Comment by John Hammond [ 30/Aug/13 ] |
|
I thought it best to restart the permissions debate in a new ticket. Please see |
| Comment by John Hammond [ 04/Sep/13 ] |
|
Patch landed to master. |