  1. Lustre
  2. LU-3647 HSM _not only_ small fixes and to do list goes here
  3. LU-3811

non-root users cannot archive files, root cannot archive non-root users' files

Details

    • Technical task
    • Resolution: Fixed
    • Blocker
    • Lustre 2.5.0
    • Lustre 2.5.0
    • 9850

    Description

      Non-root users cannot archive their files and root cannot archive files owned by other users. Moreover the lfs hsm_archive command fails silently. To reproduce, start HSM and do the following:

      # cd /mnt/lustre
      # dd if=/dev/zero of=Asterix bs=1M count=10
      10+0 records in
      10+0 records out
      10485760 bytes (10 MB) copied, 0.0633262 s, 166 MB/s
      # chown sanity: Asterix 
      # lfs hsm_archive Asterix 
      # echo $?
      0
      

      Running the same operation with trace enabled shows that the failure originates from:

      00000004:00000001:3.0:1377109173.525959:0:15687:0:(mdd_object.c:911:mdd_xattr_sanity_check()) Process leaving (rc=18446744073709551615 : -1 : ffffffffffffffff)
      00000004:00000001:3.0:1377109173.525959:0:15687:0:(mdd_object.c:1034:mdd_xattr_set()) Process leaving (rc=18446744073709551615 : -1 : ffffffffffffffff)
      00000004:00000001:3.0:1377109173.525964:0:15687:0:(mdt_hsm.c:77:mdt_hsm_attr_set()) Process leaving (rc=18446744073709551615 : -1 : ffffffffffffffff)
      

      The failure in mdd_xattr_sanity_check() is because the file ownership is not root and the coordinator runs with no capabilities.
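The rc triple in each trace line above is the same return value printed as unsigned decimal, signed decimal, and hex. As a minimal sketch, the Python below reinterprets the unsigned 64-bit value as signed and maps it to an errno name. It assumes the usual Linux kernel convention of returning negated errno values, under which -1 is -EPERM. The regular expression and the function name are illustrative and not part of Lustre.

```python
import errno
import re

# Matches the "(rc=<unsigned> : <signed> : <hex>)" suffix of a trace line.
RC_PATTERN = re.compile(r"\(rc=(\d+) : (-?\d+) : ([0-9a-f]+)\)")

def decode_rc(trace_line):
    """Return (signed_rc, errno_name) for a 'Process leaving' trace line."""
    match = RC_PATTERN.search(trace_line)
    if match is None:
        raise ValueError("no rc triple found")
    unsigned = int(match.group(1))
    # Reinterpret the unsigned 64-bit value as two's-complement signed.
    signed = unsigned - (1 << 64) if unsigned >= (1 << 63) else unsigned
    # Assumption: negative return codes are negated errno values.
    name = errno.errorcode.get(-signed) if signed < 0 else None
    return signed, name

line = ("(mdd_object.c:911:mdd_xattr_sanity_check()) "
        "Process leaving (rc=18446744073709551615 : -1 : ffffffffffffffff)")
print(decode_rc(line))  # (-1, 'EPERM')
```

An EPERM result fits the explanation above: the xattr sanity check rejects the caller because it neither owns the file nor holds the needed capability.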

      The failed request leaves an orphaned agent action, according to proc:

      # lfs path2fid /mnt/lustre/Asterix 
      [0x200000400:0x1:0x0]
      # cat /proc/fs/lustre/mdt/lustre-MDT0000/hsm/agent_actions 
      lrh=[type=10680000 len=136 idx=73] fid=[0x200000400:0x1:0x0] dfid=[0x200000400:0x1:0x0] compound/cookie=0x52150414/0x52150414 action=ARCHIVE archive#=0 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[]
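To spot orphaned actions like the one above without reading the records by eye, the key=value layout of an agent_actions line can be split into fields. The Python below is a rough sketch that assumes every record follows the layout shown here (space-separated key=value pairs, with bracketed values allowed to contain spaces). The helper names are hypothetical.

```python
import re

# A key, "=", then either a bracketed value (may contain spaces) or a bare token.
FIELD_PATTERN = re.compile(r"(\S+?)=(\[[^\]]*\]|\S+)")

def parse_agent_action(record):
    """Split one agent_actions record into a dict of its fields."""
    return dict(FIELD_PATTERN.findall(record))

def waiting_actions(lines, fid):
    """Return parsed records for `fid` whose status is still WAITING."""
    result = []
    for line in lines:
        rec = parse_agent_action(line)
        if rec.get("fid") == fid and rec.get("status") == "WAITING":
            result.append(rec)
    return result

sample = ("lrh=[type=10680000 len=136 idx=73] fid=[0x200000400:0x1:0x0] "
          "dfid=[0x200000400:0x1:0x0] compound/cookie=0x52150414/0x52150414 "
          "action=ARCHIVE archive#=0 flags=0x0 extent=0x0-0xffffffffffffffff "
          "gid=0x0 datalen=0 status=WAITING data=[]")

stuck = waiting_actions([sample], "[0x200000400:0x1:0x0]")
print([rec["action"] for rec in stuck])  # ['ARCHIVE']
```

In practice the lines would come from reading the agent_actions file shown above, and the FID from `lfs path2fid`.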
      

      Similarly, a non-root user cannot archive any files:

      # cd /mnt/lustre
      # mkdir sanity
      # chown sanity: sanity
      # cd sanity
      # su sanity
      $ dd if=/dev/zero of=Obelix bs=1M count=50
      $ ls -l Obelix
      -rw-rw-r-- 1 sanity sanity 52428800 Aug 21 14:03 Obelix
      $ lfs hsm_archive Obelix 
      $ echo $?
      0
      $ sleep 60
      $ lfs hsm_state Obelix
      Obelix: (0x00000000)
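Because `lfs hsm_archive` exits with 0 even when the request fails, the exit status cannot confirm that an archive happened. The flag word printed by `lfs hsm_state` is the more reliable signal. The Python below is a minimal sketch of that check. It assumes the flag bits match the HS_* constants in Lustre's lustre_user.h (HS_EXISTS = 0x1, HS_ARCHIVED = 0x8), and the helper names are hypothetical.

```python
import re

# Assumed to mirror the HS_* flag bits in Lustre's lustre_user.h.
HS_EXISTS = 0x1
HS_DIRTY = 0x2
HS_RELEASED = 0x4
HS_ARCHIVED = 0x8

def parse_hsm_flags(state_line):
    """Extract the hex flag word from one line of `lfs hsm_state` output."""
    match = re.search(r"\((0x[0-9a-fA-F]+)\)", state_line)
    if match is None:
        raise ValueError("no HSM flag word in: %r" % state_line)
    return int(match.group(1), 16)

def is_archived(state_line):
    """True only if the state line reports an existing, archived copy."""
    flags = parse_hsm_flags(state_line)
    wanted = HS_EXISTS | HS_ARCHIVED
    return (flags & wanted) == wanted

# The state reported above, 60 seconds after the archive request.
print(is_archived("Obelix: (0x00000000)"))  # False
```

A script that needs to know whether an archive request took effect could poll `lfs hsm_state` and apply `is_archived()` to the result, rather than trusting the exit status.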
      

          Activity

            jhammond John Hammond added a comment -

            Patch landed to master.

            jhammond John Hammond added a comment -

            I thought it best to restart the permissions debate in a new ticket. Please see LU-3866.


            jay Jinshan Xiong (Inactive) added a comment -

            "Release is not really resource consuming, it is resource freeing (it is a way for the user to do a kind of posix_fadvise(DONTNEED) with HSM meaning)."

            Think about the case where an archive already exists: a malicious user can then keep releasing and restoring a file in an infinite loop.

            leibovici-cea Thomas LEIBOVICI - CEA (Inactive) added a comment - edited

            "Archive and release should be privileged operations because they consume system resources."

            Release is not really resource consuming, it is resource freeing (it is a way for the user to do a kind of posix_fadvise(DONTNEED) with HSM meaning).

            I'd rather say restore is the most resource-consuming operation (it uses Lustre disk space + copy bandwidth + tape drive), followed by archive, which uses copy bandwidth.

            "Otherwise, the system may easily be DoS attacked by malicious users."

            A DoS is still easy on an HSM system: just restore all readable files.
            For archive and restore, it is more of a QoS issue: preventing a single user from taking all the bandwidth.

            "A non-root user can cancel HSM requests, do hsm_remove, ..."

            Indeed, this is dangerous...

            jhammond John Hammond added a comment -

            Also note that there is not enough permission checking in mdt_hsm_request() and friends. A non-root user can cancel HSM requests, do hsm_remove, ...

            jcl jacques-charles lafoucriere added a comment -

            "the volatile files created should use the uid/gid of the creating process for the new file, as with any regular file create"

            So it will not be possible to restore a file in a read-only directory. There is no reason for such a restriction.

            jhammond John Hammond added a comment - Please see http://review.whamcloud.com/7461 .

            jay Jinshan Xiong (Inactive) added a comment -

            "IMO there is no reason to distinguish explicit requests (through the command line) from implicit requests. Also, archive/restore must be doable by users. One use case is a job epilog in which the user archives all the data produced by the run. In any case there is no security issue; the only issue is the interaction with quotas, which is an open problem. For me, enforced quotas are incompatible with HSM."

            I tend to think we need to distinguish requests from the command line from requests from the policy engine. Archive and release should be privileged operations because they consume system resources. Otherwise, the system may easily be DoS attacked by malicious users. This is the usual case for similar kinds of products.

            For epilog cases, I think we should do some work on the policy engine so that the files can be picked up.

            jhammond John Hammond added a comment -

            Andreas, indeed.

            Supporting HSM archive by normal users requires raising the capabilities of the coordinator, which we need to do anyway. Then there is the issue that an unprivileged user may archive files they don't own.

            Should normal users be allowed to release their own files? If so then there is also the problem that, as implemented, HSM release requires that the user possess create privileges in the MDT's local root.

            I believe that we all agree that implicit restore should be handled transparently. This can be fixed in the copytool. What about explicit restore? There are some weak security issues here, since user space can just give an arbitrary FID to llite.

            On the permissions question, I think this is policy and so should be deferred to the site admins, via proc tunables (yucky but whatever) or a setuid() binary (less yucky but not something we usually do).


            adilger Andreas Dilger added a comment -

            I think there are two problems here:

            • the volatile files created should use the uid/gid of the creating process for the new file, as with any regular file create
            • the root copytool should chown/chmod the file to the correct ownership before trying to swap the layout

            People

              jay Jinshan Xiong (Inactive)
              jhammond John Hammond