HSM _not only_ small fixes and to do list goes here (LU-3647)

[LU-3811] non-root users cannot archive files, root cannot archive non-root users' files Created: 21/Aug/13  Updated: 21/Oct/13  Resolved: 04/Sep/13

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: Lustre 2.5.0

Type: Technical task
Priority: Blocker
Reporter: John Hammond
Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed
Votes: 0
Labels: HB, HSM

Issue Links:
Duplicate
duplicates LU-3825 mdt_hsm_release() clobbers ma_valid Closed
Related
is related to LU-3866 missing permission checks on HSM oper... Resolved
Rank (Obsolete): 9850

 Description   

Non-root users cannot archive their files and root cannot archive files owned by other users. Moreover the lfs hsm_archive command fails silently. To reproduce, start HSM and do the following:

# cd /mnt/lustre
# dd if=/dev/zero of=Asterix bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.0633262 s, 166 MB/s
# chown sanity: Asterix 
# lfs hsm_archive Asterix 
# echo $?
0

Running the same operation with tracing enabled shows that the failure originates from:

00000004:00000001:3.0:1377109173.525959:0:15687:0:(mdd_object.c:911:mdd_xattr_sanity_check()) Process leaving (rc=18446744073709551615 : -1 : ffffffffffffffff)
00000004:00000001:3.0:1377109173.525959:0:15687:0:(mdd_object.c:1034:mdd_xattr_set()) Process leaving (rc=18446744073709551615 : -1 : ffffffffffffffff)
00000004:00000001:3.0:1377109173.525964:0:15687:0:(mdt_hsm.c:77:mdt_hsm_attr_set()) Process leaving (rc=18446744073709551615 : -1 : ffffffffffffffff)

mdd_xattr_sanity_check() fails because the file is not owned by root and the coordinator runs with no capabilities.
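Each trace line above prints the return code three ways: as an unsigned 64-bit decimal, as a signed decimal, and in hex. The following is a minimal sketch of that conversion, not code from Lustre; the helper names decode_rc and errno_name are hypothetical. Errno 1 is EPERM, so rc = -1 is consistent with a permission failure.

```python
import errno


def decode_rc(unsigned_rc: int, bits: int = 64) -> int:
    """Reinterpret an unsigned value as a two's-complement signed integer."""
    if unsigned_rc >= 1 << (bits - 1):
        return unsigned_rc - (1 << bits)
    return unsigned_rc


def errno_name(rc: int) -> str:
    """Map a negative kernel-style return code to its errno symbol."""
    if rc >= 0:
        return "OK"
    return errno.errorcode.get(-rc, "UNKNOWN")


# Value copied from the "Process leaving" trace lines in this ticket.
rc = decode_rc(18446744073709551615)
print(rc, errno_name(rc))
```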

The failed request leaves an orphaned agent action, according to proc:

# lfs path2fid /mnt/lustre/Asterix 
[0x200000400:0x1:0x0]
# cat /proc/fs/lustre/mdt/lustre-MDT0000/hsm/agent_actions 
lrh=[type=10680000 len=136 idx=73] fid=[0x200000400:0x1:0x0] dfid=[0x200000400:0x1:0x0] compound/cookie=0x52150414/0x52150414 action=ARCHIVE archive#=0 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[]
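Orphaned actions like the one above can be spotted by parsing the agent_actions text. This is a rough sketch that assumes every entry follows the key=value layout of the single line shown in this ticket; the function names are hypothetical and the format may differ between Lustre versions.

```python
import re

# A value is either a [bracketed group] or a bare whitespace-free token.
_FIELD_RE = re.compile(r"([\w/#]+)=(\[[^\]]*\]|\S+)")


def parse_agent_action(line):
    """Return the top-level key=value fields of one agent_actions line."""
    return {key: value for key, value in _FIELD_RE.findall(line)}


def waiting_actions(text, fid):
    """Return the parsed entries for `fid` whose status is still WAITING."""
    entries = (parse_agent_action(l) for l in text.splitlines() if l.strip())
    return [e for e in entries
            if e.get("fid") == fid and e.get("status") == "WAITING"]


# Sample line copied from the ticket.
SAMPLE = (
    "lrh=[type=10680000 len=136 idx=73] fid=[0x200000400:0x1:0x0] "
    "dfid=[0x200000400:0x1:0x0] compound/cookie=0x52150414/0x52150414 "
    "action=ARCHIVE archive#=0 flags=0x0 extent=0x0-0xffffffffffffffff "
    "gid=0x0 datalen=0 status=WAITING data=[]"
)

for entry in waiting_actions(SAMPLE, "[0x200000400:0x1:0x0]"):
    print(entry["action"], entry["fid"], entry["status"])
```

In practice the text would be read from the agent_actions file shown above and the FID taken from lfs path2fid.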

Similarly a non-root user cannot archive any files.

# cd /mnt/lustre
# mkdir sanity
# chown sanity: sanity
# cd sanity
# su sanity
$ dd if=/dev/zero of=Obelix bs=1M count=50
$ ls -l Obelix
-rw-rw-r-- 1 sanity sanity 52428800 Aug 21 14:03 Obelix
$ lfs hsm_archive Obelix 
$ echo $?
0
$ sleep 60
$ lfs hsm_state Obelix
Obelix: (0x00000000)
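Because lfs hsm_archive exits with 0 even when the request fails, the exit status cannot be used to confirm the archive. The sketch below checks the flags printed by lfs hsm_state instead. It assumes the "name: (0xflags)" output format shown above, and it assumes that the archived bit is 0x8 (HS_ARCHIVED in lustre_user.h, to be verified for your release); the function names are hypothetical.

```python
import re

# Assumption: bit value of HS_ARCHIVED; check lustre_user.h for your release.
HS_ARCHIVED = 0x00000008

_STATE_RE = re.compile(r"^(?P<path>.+): \((?P<flags>0x[0-9a-fA-F]+)\)")


def parse_hsm_state(line):
    """Split one line of `lfs hsm_state` output into (path, flags)."""
    match = _STATE_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized hsm_state output: %r" % line)
    return match.group("path"), int(match.group("flags"), 16)


def archive_took_effect(line):
    """Return True only if the archived bit is set in the reported flags."""
    _, flags = parse_hsm_state(line)
    return bool(flags & HS_ARCHIVED)


# Output copied from the reproducer: no flags are set after 60 seconds.
print(archive_took_effect("Obelix: (0x00000000)"))
```

A wrapper script could run the archive request, wait, and then treat a False result as a failure rather than trusting the exit status.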


 Comments   
Comment by Jinshan Xiong (Inactive) [ 22/Aug/13 ]

Good catch. For the command-line tool, only root can archive and release files; normal users should be able to restore their own files, either with `lfs hsm_restore' or by accessing them directly.

Comment by jacques-charles lafoucriere [ 25/Aug/13 ]

IMO there is no reason to distinguish an explicit request (through the command line) from an implicit request. Also, archive/restore must be doable by users. One use case is a job epilog in which the user archives all the data produced by the run. In any case there is no security issue; the only issue is the interaction with quotas, which is an open problem. For me, enforced quotas are incompatible with HSM.

Comment by Andreas Dilger [ 26/Aug/13 ]

I think there are two problems here:

  • the volatile files created should use the uid/gid of the creating process for the new file, as with any regular file create
  • the root copytool should chown/chmod the file to the correct ownership before trying to swap the layout

Comment by John Hammond [ 26/Aug/13 ]

Andreas, indeed.

Supporting HSM archive by normal users requires raising the capabilities of the coordinator, which we need to do anyway. Then there is the issue that an unprivileged user may archive files they don't own.

Should normal users be allowed to release their own files? If so then there is also the problem that, as implemented, HSM release requires that the user possess create privileges in the MDT's local root.

I believe that we all agree that implicit restore should be handled transparently. This can be fixed in the copytool. What about explicit restore? There are some weak security issues here, since user space can just give an arbitrary FID to llite.

On the permissions question, I think this is policy and so should be deferred to the site admins, via proc tunables (yucky but whatever) or a setuid() binary (less yucky but not something we usually do).
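To make the "policy deferred to site admins" idea concrete, here is a purely illustrative sketch of an admin-configurable table of which HSM actions each class of requester may submit. Nothing here is taken from Lustre: the class HsmPolicy, its fields, and its defaults are hypothetical, and the defaults simply mirror the position that non-root users may only restore their own files.

```python
from dataclasses import dataclass

ACTIONS = ("ARCHIVE", "RESTORE", "RELEASE", "REMOVE", "CANCEL")


@dataclass(frozen=True)
class HsmPolicy:
    """Hypothetical site policy: actions allowed per class of requester."""
    owner: frozenset = frozenset({"RESTORE"})  # requester owns the file
    other: frozenset = frozenset()             # requester does not own it

    def allows(self, action, req_uid, file_uid):
        if action not in ACTIONS:
            raise ValueError("unknown HSM action: %r" % action)
        if req_uid == 0:
            return True  # root may always submit requests in this sketch
        allowed = self.owner if req_uid == file_uid else self.other
        return action in allowed


default = HsmPolicy()
print(default.allows("ARCHIVE", req_uid=1000, file_uid=1000))

# A site that lets users archive and release their own files, as in the
# job-epilog use case raised earlier in the thread.
permissive = HsmPolicy(owner=frozenset({"ARCHIVE", "RESTORE", "RELEASE"}))
print(permissive.allows("ARCHIVE", req_uid=1000, file_uid=1000))
```

Keeping the decision in one table means the same check could apply to explicit and implicit requests alike, whichever way the debate is settled.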

Comment by Jinshan Xiong (Inactive) [ 26/Aug/13 ]

"IMO there is no reason to distinguish an explicit request (through the command line) from an implicit request. Also, archive/restore must be doable by users. One use case is a job epilog in which the user archives all the data produced by the run. In any case there is no security issue; the only issue is the interaction with quotas, which is an open problem. For me, enforced quotas are incompatible with HSM."

I tend to think we need to distinguish requests from the command line from requests from the policy engine. Archive and release should be a privileged operation because it will consume system resources. Otherwise, the system may easily be DoS attacked by malicious users. This is the usual case for similar kinds of products.

For epilog cases, I think we should do some work on the policy engine so that such files can be picked up.

Comment by John Hammond [ 27/Aug/13 ]

Please see http://review.whamcloud.com/7461.

Comment by jacques-charles lafoucriere [ 27/Aug/13 ]

  • "the volatile files created should use the uid/gid of the creating process for the new file, as with any regular file create"

So it will not be possible to restore a file in a read-only directory. There is no reason for such a restriction.

Comment by John Hammond [ 27/Aug/13 ]

Also note that there is not enough permission checking in mdt_hsm_request() and friends. A non-root user can cancel HSM requests, do hsm_remove, ...

Comment by Thomas LEIBOVICI - CEA (Inactive) [ 28/Aug/13 ]

"Archive and release should be a privileged operation because it will consume system resources."

Release is not really resource consuming, it is resource freeing (it is a way for the user to do a kind of posix_fadvise(DONTNEED) with HSM meaning).

I'd rather say restore is the most resource consuming (it uses Lustre disk space + copy bandwidth + tape drive), followed by archive, which uses copy bandwidth.

"Otherwise, the system may easily be DoS attacked by malicious users."

A DoS is still easy on an HSM system by restoring all readable files.
For archive and restore, it is more a QoS issue of preventing a single user from getting all the bandwidth.

"A non-root user can cancel HSM requests, do hsm_remove, ..."

Indeed, this is dangerous...

Comment by Jinshan Xiong (Inactive) [ 28/Aug/13 ]

"Release is not really resource consuming, it is resource freeing (it is a way for the user to do a kind of posix_fadvise(DONTNEED) with HSM meaning)."

Think about the case where an archive already exists, so a malicious user can keep releasing and restoring a file in an infinite loop.

Comment by John Hammond [ 30/Aug/13 ]

I thought it best to restart the permissions debate in a new ticket. Please see LU-3866.

Comment by John Hammond [ 04/Sep/13 ]

Patch landed to master.
