Use "setfacl" to set "default" setting fail when nodemap enabled

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.11.0, Lustre 2.10.2
    • Component/s: None
    • Labels: None
    • Affects Version/s: Lustre 2.9
    • Severity: 3

    Description

      Hi,

      When we set a default ACL on a Lustre directory with setfacl, getfacl afterwards shows the fs-side (unmapped) group id instead of the client-side id. Running the same setfacl command a second time then fails with "Operation not permitted".

      Please help us fix this problem. Thanks!

      The detailed information is listed below.

      1. cat /etc/passwd
      user1:x:1001:1001::/home/user1:/bin/bash
      2. nodemap setting
      nodemap.21b7e9f04fed448e.idmap=
      [
      .....
      { idtype: gid, client_id: 1001, fs_id: 23501 },
      .....
      ]
      3. setfacl steps
      [root@hsm client]# mkdir hadoop3
      [root@hsm client]# getfacl /mnt/client/hadoop3
      getfacl: Removing leading '/' from absolute path names
      # file: mnt/client/hadoop3
      # owner: root
      # group: root
      user::rwx
      group::r-x
      other::r-x
      [root@hsm client]# setfacl -R -d -m group:user1:rwx /mnt/client/hadoop3
      [root@hsm client]# getfacl /mnt/client/hadoop3
      getfacl: Removing leading '/' from absolute path names
      # file: mnt/client/hadoop3
      # owner: root
      # group: root
      user::rwx
      group::r-x
      other::r-x
      default:user::rwx
      default:group::r-x
      default:group:23501:rwx
      default:mask::rwx
      default:other::r-x
      [root@hsm client]# setfacl -R -d -m group:user1:rwx /mnt/client/hadoop3
      setfacl: /mnt/client/hadoop3: Operation not permitted

      Attachments

        1. aclclient.log
          3.59 MB
        2. aclserver.log
          3.13 MB


          Activity

            [LU-9929] "setfacl" fails to set a "default" ACL when nodemap is enabled
            pjones Peter Jones added a comment -

            Landed for 2.11


            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/29010/
            Subject: LU-9929 nodemap: add default ACL unmapping handling
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 62fee20556a4c90361bd28edb903dc77c9540133
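            For orientation, a minimal sketch of the shape such a fix could take, based on Emoly's note further down that adding default-ACL unmapping code to mdt_getxattr() makes the problem disappear. nodemap_get_from_exp(), nodemap_map_acl() and NODEMAP_FS_TO_CLIENT are existing Lustre symbols, but the placement, variable names, and surrounding code here are assumptions, not the actual patch:

            /* Hedged sketch, not the merged patch: unmap the default ACL in
             * the getxattr reply path so the client sees its own ids.
             * name/exp/buf/rc are illustrative locals, not the real ones. */
            if (strcmp(name, XATTR_NAME_ACL_DEFAULT) == 0) {
                    struct lu_nodemap *nodemap = nodemap_get_from_exp(exp);

                    if (!IS_ERR(nodemap)) {
                            /* map fs-side ids (e.g. 23501) back to client
                             * ids (e.g. 1001) before replying */
                            rc = nodemap_map_acl(nodemap, buf, rc,
                                                 NODEMAP_FS_TO_CLIENT);
                            nodemap_putref(nodemap);
                    }
            }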


            Emoly Liu (emoly.liu@intel.com) uploaded a new patch: https://review.whamcloud.com/29010
            Subject: LU-9929 nodemap: add unmapping process for default ACLs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0c7cc39d930d7ac7b2b0614d623103c5caddb2e6

            emoly.liu Emoly Liu added a comment - edited

            When the ACL default-unmapping code is added to mdt_getxattr(), the EPERM issue disappears too.

            The issue happened because, after the first setfacl, a wrong default ACL was cached on the client side. When setfacl was run again, the filesystem did not know this wrongly unmapped gid (23501), so it treated it as a squash id, and that squash-id entry (8 bytes) was skipped in nodemap_map_acl(). That is why we saw the EPERM error.

            I will submit a patch later.
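            To make the "8 bytes" above concrete: the POSIX ACL xattr wire format is a 4-byte header followed by 8-byte entries, so the rc=44 vs xattr_len=52 mismatch discussed below corresponds to six entries sent and five kept. A small self-contained sketch; the struct mirrors the Linux posix_acl_xattr_entry layout, and the entry counts are inferred from this ticket rather than read from the logs:

            #include <stdio.h>

            /* 8 bytes per entry, matching the uapi posix_acl_xattr_entry */
            struct acl_entry { unsigned short e_tag, e_perm; unsigned int e_id; };

            int main(void)
            {
                    /* 2nd setfacl: user::, group::, group:1001, the stale
                     * cached group:23501, mask::, other:: -> six entries */
                    size_t xattr_len = 4 + 6 * sizeof(struct acl_entry); /* 52 */
                    /* server drops the one unmappable entry -> five entries */
                    size_t rc = 4 + 5 * sizeof(struct acl_entry);        /* 44 */

                    printf("xattr_len=%zu rc=%zu diff=%zu\n",
                           xattr_len, rc, xattr_len - rc);
                    return 0;
            }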

            emoly.liu Emoly Liu added a comment -

            Hi sebg-crd-pm,
            As you said, there are two issues:

            • wrong mapped gid (getfacl): according to my following debugging information, the second tree_type should be 0 (NODEMAP_FS_TO_CLIENT) instead. This should be fixed soon.
              Sep 13 15:03:02 centos7-2 kernel: id=1001, id_type=1, tree_type=1
              Sep 13 15:03:02 centos7-2 kernel: id=23501, id_type=1, tree_type=1
            • EPERM (setfacl): the difference of 52-44=8 bytes is caused by the ACL entry whose nm_squash_id is 99. I still need some time to investigate where this squash id (99) comes from and why the entry needs to be skipped, and, if we skip this check, why ldiskfs returns EINVAL instead. (A sketch of the skipping mechanics follows at the end of this comment.)

            I will give an update later.
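            As a companion to the numbers above, a hedged sketch of how one unmappable entry shrinks the mapped ACL by exactly 8 bytes and so trips the rc != xattr_len check quoted further down. This loop is illustrative only, not the real nodemap_map_acl(); map_id() is a hypothetical stand-in for the nodemap id lookup:

            #include <sys/types.h>

            struct acl_entry { unsigned short e_tag, e_perm; unsigned int e_id; };

            extern unsigned int map_id(unsigned int id);    /* hypothetical */

            ssize_t map_acl_sketch(char *buf, size_t size, unsigned int squash_id)
            {
                    struct acl_entry *in  = (struct acl_entry *)(buf + 4);
                    struct acl_entry *end = (struct acl_entry *)(buf + size);
                    struct acl_entry *out = in;

                    for (; in < end; in++) {
                            unsigned int id = map_id(in->e_id);

                            if (id == squash_id)
                                    continue;       /* dropped: -8 bytes */
                            *out = *in;
                            out->e_id = id;
                            out++;
                    }
                    /* returns 44 instead of 52 when one of six entries drops */
                    return (char *)out - buf;
            }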


            Hi Emoly,

            Do you have any update?

            Thanks!

            emoly.liu Emoly Liu added a comment -

            The following log

            00000004:00000001:2.0:1504689607.481641:0:22011:0:(mdt_xattr.c:327:mdt_reint_setxattr()) Process leaving via out (rc=18446744073709551615 : -1 : 0xffffffffffffffff)
            
            

            shows the error comes from the following code

            int mdt_reint_setxattr() {
            ...
                            /* map client-side ids in the ACL to fs-side ids;
                             * on success rc is the length of the mapped ACL,
                             * which shrinks if any entry was dropped */
                            rc = nodemap_map_acl(nodemap, rr->rr_eadata, xattr_len,
                                                 NODEMAP_CLIENT_TO_FS);
                            nodemap_putref(nodemap);
                            if (rc < 0)
                                    GOTO(out, rc);

                            /* ACLs were mapped out, return an error so the user knows */
                            if (rc != xattr_len)
                                    GOTO(out, rc = -EPERM);
            ...
            }
            
            

            The debugging information shows rc(44) != xattr_len(52). I will see what's wrong here.

            emoly.liu Emoly Liu added a comment -

            Thanks, I can see this issue now. I will investigate it.


            Hi Emoly,

            Did you add your client IP to nodemap_test (lctl nodemap_add_range)?
            The client will be in nodemap.default if you did not add it to nodemap_test.
            Please also check nodemap.active=1 (lctl get_param nodemap.*) and run "lctl set_param nodemap.nodemap_test.admin_nodemap=1" for root access permission. Thanks

            Which nodes are there in your system and which nodes have user "user1"?
            >> One server node 172.20.110.212(mgt/mdt/ost), One client node 172.20.110.211
            Which steps did you run on which node?
            >> Please see below
            Can you provide all your nodemap information by the command "lctl get_param nodemap.$your_nodemap.*"?
            >> Please see below
            Can you collect some lustre logs on the MGS node and the client node during your test by the following commands:
            >> See attached files.
            What is your detailed lustre version?
            >> Tested in Lustre 2.10 (lustre-release-58fd06e.tar.gz), one node (mgs/mds/oss), following these steps.

            [setup lustre]
            lctl set_param nodemap.active=1
            lctl nodemap_add nodemap_test
            lctl set_param nodemap.nodemap_test.admin_nodemap=1
            lctl nodemap_add_idmap --name nodemap_test --idtype gid --idmap 1001:23501
            lctl nodemap_add_range --name nodemap_test --range 172.20.110.[211-211]@o2ib
            lctl get_param nodemap.*
            lctl get_param nodemap.nodemap_test.*

            [output]
            nodemap.active=1
            nodemap.nodemap_test.admin_nodemap=1
            nodemap.nodemap_test.deny_unknown=0
            nodemap.nodemap_test.exports=
            [
             { nid: 172.20.110.211@o2ib, uuid: 2a980ffd-962a-eef8-37aa-cedf34253b31 },
             { nid: 172.20.110.211@o2ib, uuid: 2a980ffd-962a-eef8-37aa-cedf34253b31 },
            ]
            nodemap.nodemap_test.fileset=

            nodemap.nodemap_test.id=1
            nodemap.nodemap_test.idmap=[
             { idtype: gid, client_id: 1001, fs_id: 23501 }
            ]
            nodemap.nodemap_test.map_mode=both
            nodemap.nodemap_test.ranges=
            [
             { id: 1, start_nid: 172.20.110.211@o2ib, end_nid: 172.20.110.211@o2ib }
            ]

            [client]
            # client node user ppp (logs attached: aclserver.log, aclclient.log)
            ppp:x:1001:1001::/home/ppp:/bin/bash

            [root@hsm mnt]# mount.lustre 172.20.110.212@o2ib:/jlustre /mnt/lustre
            [root@hsm mnt]# mkdir -p /mnt/lustre/hadoop3
            [root@hsm mnt]# setfacl -R -d -m group:ppp:rwx /mnt/lustre/hadoop3
            [root@hsm mnt]# getfacl /mnt/lustre/hadoop3
            getfacl: Removing leading '/' from absolute path names

            # file: mnt/lustre/hadoop3
            # owner: root
            # group: root
            user::rwx
            group::r-x
            other::r-x
            default:user::rwx
            default:group::r-x
            default:group:23501:rwx
            default:mask::rwx
            default:other::r-x

            [root@hsm mnt]# setfacl -R -d -m group:ppp:rwx /mnt/lustre/hadoop3
            setfacl: /mnt/lustre/hadoop3: Operation not permitted

            emoly.liu Emoly Liu added a comment -

            sebg-crd-pm,
            I can't reproduce this issue in single-node or multi-node tests. Here are my steps and the output of my test:

            #On client node
            + groupadd -g 1001 user1
            + useradd -g user1 -u 1001 user1
            + cat /etc/passwd
            + grep user1
            user1:x:1001:1001::/home/user1:/bin/bash
            
            #On MGS node
            + lctl nodemap_add nodemap_test
            + lctl nodemap_add_idmap --name nodemap_test --idtype gid --idmap 1001:23501
            + lctl get_param 'nodemap.nodemap_test.*'
            nodemap.nodemap_test.admin_nodemap=0
            nodemap.nodemap_test.deny_unknown=0
            nodemap.nodemap_test.exports=[
            
            ]
            nodemap.nodemap_test.fileset=
            
            nodemap.nodemap_test.id=5
            nodemap.nodemap_test.idmap=[
             { idtype: gid, client_id: 1001, fs_id: 23501 }
            ]
            nodemap.nodemap_test.map_mode=both
            nodemap.nodemap_test.ranges=[
            
            ]
            nodemap.nodemap_test.squash_gid=99
            nodemap.nodemap_test.squash_uid=99
            nodemap.nodemap_test.trusted_nodemap=0
            
            #On client node
            + mkdir -p /mnt/lustre/hadoop3
            + getfacl /mnt/lustre/hadoop3
            getfacl: Removing leading '/' from absolute path names
            # file: mnt/lustre/hadoop3
            # owner: root
            # group: root
            user::rwx
            group::r-x
            other::r-x
            
            + echo '1st: setfacl'
            1st: setfacl
            + setfacl -R -d -m group:user1:rwx /mnt/lustre/hadoop3
            + getfacl /mnt/lustre/hadoop3
            getfacl: Removing leading '/' from absolute path names
            # file: mnt/lustre/hadoop3
            # owner: root
            # group: root
            user::rwx
            group::r-x
            other::r-x
            default:user::rwx
            default:group::r-x
            default:group:user1:rwx
            default:mask::rwx
            default:other::r-x
            
            + echo '2nd: setfacl'
            2nd: setfacl
            + setfacl -R -d -m group:user1:rwx /mnt/lustre/hadoop3
            + getfacl /mnt/lustre/hadoop3
            getfacl: Removing leading '/' from absolute path names
            # file: mnt/lustre/hadoop3
            # owner: root
            # group: root
            user::rwx
            group::r-x
            other::r-x
            default:user::rwx
            default:group::r-x
            default:group:user1:rwx
            default:mask::rwx
            default:other::r-x
            
            

            Could you provide the following information?

            • Which nodes are there in your system and which nodes have user "user1"?
            • Which steps did you run on which node?
            • Can you provide all your nodemap information by the command "lctl get_param nodemap.$your_nodemap.*"?
            • Can you collect some lustre logs on the MGS node and the client node during your test by the following commands:
              #before test
              lctl set_param debug=-1 debug_mb=1000
              lctl dk > /dev/null
              #testing ...
              #after test
              lctl dk > $logfile
              Then please upload the logfile here.
              
              
            • What is your detailed lustre version?

            Thanks!

            emoly.liu Emoly Liu added a comment -

            I will look into this issue.


            People

              emoly.liu Emoly Liu
              sebg-crd-pm sebg-crd-pm (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: