[LU-9929] Use "setfacl" to set "default" setting fail when nodemap enabled Created: 30/Aug/17  Updated: 12/Oct/17  Resolved: 30/Sep/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.11.0, Lustre 2.10.2

Type: Bug Priority: Critical
Reporter: sebg-crd-pm (Inactive) Assignee: Emoly Liu
Resolution: Fixed Votes: 0
Labels: None
Environment:

Lustre 2.9


Attachments: File aclclient.log     File aclserver.log    
Issue Links:
Duplicate
duplicates LU-9759 setfacl default setting fail when nod... Closed
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Hi,

When we set a default ACL with setfacl on a Lustre directory, getfacl shows an unmapped group ID after the first setfacl. Running the same setfacl command again then fails with "Operation not permitted".

Please help us fix this problem. Thanks!

Details are listed below.

1. cat /etc/passwd
user1:x:1001:1001::/home/user1:/bin/bash
2. nodemap setting
nodemap.21b7e9f04fed448e.idmap=[
.....
 { idtype: gid, client_id: 1001, fs_id: 23501 },
.....
]
3. setfacl steps
[root@hsm client]# mkdir hadoop3
[root@hsm client]# getfacl /mnt/client/hadoop3
getfacl: Removing leading '/' from absolute path names
# file: mnt/client/hadoop3
# owner: root
# group: root
user::rwx
group::r-x
other::r-x
[root@hsm client]# setfacl -R -d -m group:user1:rwx /mnt/client/hadoop3
[root@hsm client]# getfacl /mnt/client/hadoop3
getfacl: Removing leading '/' from absolute path names
# file: mnt/client/hadoop3
# owner: root
# group: root
user::rwx
group::r-x
other::r-x
default:user::rwx
default:group::r-x
default:group:23501:rwx
default:mask::rwx
default:other::r-x
[root@hsm client]# setfacl -R -d -m group:user1:rwx /mnt/client/hadoop3
setfacl: /mnt/client/hadoop3: Operation not permitted



 Comments   
Comment by Peter Jones [ 30/Aug/17 ]

Emoly

Could you please assist with this one?

Thanks

Peter

Comment by sebg-crd-pm (Inactive) [ 01/Sep/17 ]

Hi Emoly,

Do you have any comment about this issue?
We really need your help.

Thanks!

Comment by Emoly Liu [ 02/Sep/17 ]

I will look into this issue.

Comment by Emoly Liu [ 04/Sep/17 ]

sebg-crd-pm,
I can't reproduce this issue in either single-node or multi-node tests. Here are my steps and the output of my test:

#On client node
+ groupadd -g 1001 user1
+ useradd -g user1 -u 1001 user1
+ cat /etc/passwd
+ grep user1
user1:x:1001:1001::/home/user1:/bin/bash

#On MGS node
+ lctl nodemap_add nodemap_test
+ lctl nodemap_add_idmap --name nodemap_test --idtype gid --idmap 1001:23501
+ lctl get_param 'nodemap.nodemap_test.*'
nodemap.nodemap_test.admin_nodemap=0
nodemap.nodemap_test.deny_unknown=0
nodemap.nodemap_test.exports=[

]
nodemap.nodemap_test.fileset=

nodemap.nodemap_test.id=5
nodemap.nodemap_test.idmap=[
 { idtype: gid, client_id: 1001, fs_id: 23501 }
]
nodemap.nodemap_test.map_mode=both
nodemap.nodemap_test.ranges=[

]
nodemap.nodemap_test.squash_gid=99
nodemap.nodemap_test.squash_uid=99
nodemap.nodemap_test.trusted_nodemap=0

#On client node
+ mkdir -p /mnt/lustre/hadoop3
+ getfacl /mnt/lustre/hadoop3
getfacl: Removing leading '/' from absolute path names
# file: mnt/lustre/hadoop3
# owner: root
# group: root
user::rwx
group::r-x
other::r-x

+ echo '1st: setfacl'
1st: setfacl
+ setfacl -R -d -m group:user1:rwx /mnt/lustre/hadoop3
+ getfacl /mnt/lustre/hadoop3
getfacl: Removing leading '/' from absolute path names
# file: mnt/lustre/hadoop3
# owner: root
# group: root
user::rwx
group::r-x
other::r-x
default:user::rwx
default:group::r-x
default:group:user1:rwx
default:mask::rwx
default:other::r-x

+ echo '2nd: setfacl'
2nd: setfacl
+ setfacl -R -d -m group:user1:rwx /mnt/lustre/hadoop3
+ getfacl /mnt/lustre/hadoop3
getfacl: Removing leading '/' from absolute path names
# file: mnt/lustre/hadoop3
# owner: root
# group: root
user::rwx
group::r-x
other::r-x
default:user::rwx
default:group::r-x
default:group:user1:rwx
default:mask::rwx
default:other::r-x

Could you provide the following information?

  • Which nodes are in your system, and which of them have the user "user1"?
  • Which steps did you run on which node?
  • Can you provide all your nodemap information with the command "lctl get_param nodemap.$your_nodemap.*"?
  • Can you collect Lustre debug logs on the MGS node and the client node during your test with the following commands:
    #before test
    lctl set_param debug=-1 debug_mb=1000
    lctl dk > /dev/null
    #testing ...
    #after test
    lctl dk > $logfile
    Then please upload the logfile here.
  • What is your exact Lustre version?

Thanks!

Comment by sebg-crd-pm (Inactive) [ 06/Sep/17 ]

Hi Emoly,

Did you add your client IP to nodemap_test (lctl nodemap_add_range)?
The client will be in the default nodemap if you did not add it to nodemap_test.
Please also check that nodemap.active=1 (lctl get_param nodemap.*) and run "lctl set_param nodemap.nodemap_test.admin_nodemap=1" to give root access permission. Thanks

Which nodes are there in your system and which nodes have user "user1"?
>> One server node 172.20.110.212 (mgt/mdt/ost), one client node 172.20.110.211
Which steps did you run on which node?
>> Please see below
Can you provide all your nodemap information by the command "lctl get_param nodemap.$your_nodemap.*"?
>> Please see below
Can you collect some lustre logs on mgs node and client node during your test by the following commands:
>> See attached files.
What is your detailed lustre version?
>> Tested with Lustre 2.10 (lustre-release-58fd06e.tar.gz) on one server node (mgs/mds/oss), following these steps.

[setup lustre]
lctl set_param nodemap.active=1
lctl nodemap_add nodemap_test
lctl set_param nodemap.nodemap_test.admin_nodemap=1
lctl nodemap_add_idmap --name nodemap_test --idtype gid --idmap 1001:23501
lctl nodemap_add_range --name nodemap_test --range 172.20.110.[211-211]@o2ib
lctl get_param nodemap.*
lctl get_param nodemap.nodemap_test.*

[output]
nodemap.active=1
nodemap.nodemap_test.admin_nodemap=1
nodemap.nodemap_test.deny_unknown=0
nodemap.nodemap_test.exports=[
 { nid: 172.20.110.211@o2ib, uuid: 2a980ffd-962a-eef8-37aa-cedf34253b31 },
 { nid: 172.20.110.211@o2ib, uuid: 2a980ffd-962a-eef8-37aa-cedf34253b31 },
]
nodemap.nodemap_test.fileset=

nodemap.nodemap_test.id=1
nodemap.nodemap_test.idmap=[
 { idtype: gid, client_id: 1001, fs_id: 23501 }
]
nodemap.nodemap_test.map_mode=both
nodemap.nodemap_test.ranges=[
 { id: 1, start_nid: 172.20.110.211@o2ib, end_nid: 172.20.110.211@o2ib }
]

[client]
//client node user ppp (attached logs: aclserver.log, aclclient.log)
ppp:x:1001:1001::/home/ppp:/bin/bash

[root@hsm mnt]# mount.lustre 172.20.110.212@o2ib:/jlustre /mnt/lustre
[root@hsm mnt]# mkdir -p /mnt/lustre/hadoop3
[root@hsm mnt]# setfacl -R -d -m group:ppp:rwx /mnt/lustre/hadoop3
[root@hsm mnt]# getfacl /mnt/lustre/hadoop3
getfacl: Removing leading '/' from absolute path names

# file: mnt/lustre/hadoop3
# owner: root
# group: root
user::rwx
group::r-x
other::r-x
default:user::rwx
default:group::r-x
default:group:23501:rwx
default:mask::rwx
default:other::r-x

[root@hsm mnt]# setfacl -R -d -m group:ppp:rwx /mnt/lustre/hadoop3
setfacl: /mnt/lustre/hadoop3: Operation not permitted

Comment by Emoly Liu [ 07/Sep/17 ]

Thanks, I can see this issue now. I will investigate it.

Comment by Emoly Liu [ 07/Sep/17 ]

The following log

00000004:00000001:2.0:1504689607.481641:0:22011:0:(mdt_xattr.c:327:mdt_reint_setxattr()) Process leaving via out (rc=18446744073709551615 : -1 : 0xffffffffffffffff)

shows the error comes from the following code

int mdt_reint_setxattr() {
...
                rc = nodemap_map_acl(nodemap, rr->rr_eadata, xattr_len,
                                     NODEMAP_CLIENT_TO_FS);
                nodemap_putref(nodemap);
                if (rc < 0)
                        GOTO(out, rc);

                /* ACLs were mapped out, return an error so the user knows */
                if (rc != xattr_len)
                        GOTO(out, rc = -EPERM);
...
}

The debugging information shows rc(44) != xattr_len(52). I will see what's wrong here.
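The two lengths are consistent with the Linux POSIX ACL xattr layout, which is a 4-byte version header followed by 8-byte entries (16-bit tag, 16-bit permissions, 32-bit ID). Under that layout 52 bytes holds six entries and 44 bytes holds five. The following is a minimal Python sketch of that arithmetic, not Lustre code; the helper name `pack_acl` and the exact entry list are illustrative assumptions about what the second setfacl request contained.

```python
import struct

# Linux POSIX ACL xattr layout: little-endian u32 version header,
# then one (u16 tag, u16 perm, u32 id) record per entry.
ACL_VERSION = 0x0002
ACL_USER_OBJ, ACL_GROUP_OBJ = 0x01, 0x04
ACL_GROUP, ACL_MASK, ACL_OTHER = 0x08, 0x10, 0x20
ACL_UNDEFINED_ID = 0xFFFFFFFF  # used for entries that carry no ID


def pack_acl(entries):
    """Serialize (tag, perm, id) tuples into an ACL xattr blob."""
    blob = struct.pack("<I", ACL_VERSION)
    for tag, perm, ident in entries:
        blob += struct.pack("<HHI", tag, perm, ident)
    return blob


# Assumed content of the second request: the default ACL as seen on the
# client (including group 23501) plus the entry for client group 1001.
sent = [
    (ACL_USER_OBJ, 7, ACL_UNDEFINED_ID),
    (ACL_GROUP_OBJ, 5, ACL_UNDEFINED_ID),
    (ACL_GROUP, 7, 1001),
    (ACL_GROUP, 7, 23501),
    (ACL_MASK, 7, ACL_UNDEFINED_ID),
    (ACL_OTHER, 5, ACL_UNDEFINED_ID),
]

xattr_len = len(pack_acl(sent))          # six entries
rc = len(pack_acl(sent[:3] + sent[4:]))  # one named-group entry dropped
print(xattr_len, rc, xattr_len - rc)     # 52 44 8
```

So a mismatch of exactly 8 bytes points at a single ACL entry being removed during mapping.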

Comment by sebg-crd-pm (Inactive) [ 13/Sep/17 ]

Hi Emoly,

Do you have any update?

Thanks!

Comment by Emoly Liu [ 13/Sep/17 ]

Hi sebg-crd-pm,
As you said, there are two issues:

  • Wrong gid mapping (getfacl): according to the debugging output below, the tree_type on the second line should be 0 (NODEMAP_FS_TO_CLIENT) instead of 1. This should be fixed soon.
    Sep 13 15:03:02 centos7-2 kernel: id=1001, id_type=1, tree_type=1
    Sep 13 15:03:02 centos7-2 kernel: id=23501, id_type=1, tree_type=1
    
    
  • EPERM (setfacl): the difference of 52-44=8 bytes is caused by the ACL entry whose ID is the squash ID 99 (nm_squash_id). I still need some time to investigate where this squash ID (99) comes from and why the entry needs to be skipped, and why ldiskfs returns EINVAL instead if we skip this check.
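The direction matters because an ID is only meaningful when it is looked up on the correct side of the idmap. Below is a toy Python model of the two lookups, assuming the single gid mapping 1001:23501 and squash ID 99 from this ticket. `map_gid` is a hypothetical stand-in, not the Lustre `nodemap_map_id()` implementation, and the value 1 for NODEMAP_CLIENT_TO_FS is inferred from the debug output above.

```python
NODEMAP_FS_TO_CLIENT = 0
NODEMAP_CLIENT_TO_FS = 1  # assumed value, inferred from the debug lines
SQUASH_GID = 99

# client_id -> fs_id, as configured with --idmap 1001:23501
CLIENT_TO_FS = {1001: 23501}
FS_TO_CLIENT = {fs: client for client, fs in CLIENT_TO_FS.items()}


def map_gid(gid, tree_type):
    """Look up gid in the table for the given direction; squash on a miss."""
    if tree_type == NODEMAP_CLIENT_TO_FS:
        table = CLIENT_TO_FS
    else:
        table = FS_TO_CLIENT
    return table.get(gid, SQUASH_GID)


print(map_gid(1001, NODEMAP_CLIENT_TO_FS))   # 23501: client gid as stored
print(map_gid(23501, NODEMAP_FS_TO_CLIENT))  # 1001: fs gid shown to client
print(map_gid(23501, NODEMAP_CLIENT_TO_FS))  # 99: fs gid looked up the wrong way
```

In this model, a filesystem gid that is pushed through the client-to-fs table has no entry and falls back to the squash ID, which matches the value 99 seen in the debugging.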

I will give an update later.

Comment by Emoly Liu [ 15/Sep/17 ]

Once default ACL unmapping code is added to mdt_getxattr(), the EPERM issue disappears too.

The issue happened because, after the first setfacl, a wrong default ACL was cached on the client side. When setfacl was run again, the FS did not recognize the wrong, unmapped gid (23501) and treated it as a squash ID, so this squash ID entry (8 bytes) was skipped in nodemap_map_acl(). That is why we saw the EPERM error.
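The sequence described above can be modelled roughly as follows. This is a simplified Python sketch under stated assumptions, not the Lustre implementation: `map_acl_client_to_fs` and `setxattr_acl` are hypothetical names, only named-group entries are mapped, and the entry lists are guesses at what each setfacl request carried.

```python
import errno

ACL_USER_OBJ, ACL_GROUP_OBJ = 0x01, 0x04
ACL_GROUP, ACL_MASK, ACL_OTHER = 0x08, 0x10, 0x20
HEADER_LEN, ENTRY_LEN = 4, 8
SQUASH_ID = 99
GID_CLIENT_TO_FS = {1001: 23501}


def map_acl_client_to_fs(entries):
    """Return the entries that survive mapping; squashed ones are dropped."""
    kept = []
    for tag, perm, ident in entries:
        if tag == ACL_GROUP:
            ident = GID_CLIENT_TO_FS.get(ident, SQUASH_ID)
            if ident == SQUASH_ID:
                continue  # skipped entry shrinks the ACL by 8 bytes
        kept.append((tag, perm, ident))
    return kept


def setxattr_acl(entries):
    """Mimic the length check quoted from mdt_reint_setxattr()."""
    xattr_len = HEADER_LEN + ENTRY_LEN * len(entries)
    rc = HEADER_LEN + ENTRY_LEN * len(map_acl_client_to_fs(entries))
    return -errno.EPERM if rc != xattr_len else rc


# First request: only the client gid 1001 appears as a named group.
first = [
    (ACL_USER_OBJ, 7, None),
    (ACL_GROUP_OBJ, 5, None),
    (ACL_GROUP, 7, 1001),
    (ACL_MASK, 7, None),
    (ACL_OTHER, 5, None),
]
# Second request: the client also sends back the unmapped gid 23501
# that it cached from the earlier, incorrectly unmapped default ACL.
second = first[:3] + [(ACL_GROUP, 7, 23501)] + first[3:]

print(setxattr_acl(first))   # 44
print(setxattr_acl(second))  # -1, i.e. -EPERM
```

With correct unmapping on the getxattr side, the client would never hold gid 23501, so the second request would look like the first one and pass the length check.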

I will submit a patch later.

Comment by Gerrit Updater [ 15/Sep/17 ]

Emoly Liu (emoly.liu@intel.com) uploaded a new patch: https://review.whamcloud.com/29010
Subject: LU-9929 nodemap: add unmapping process for default ACLs
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0c7cc39d930d7ac7b2b0614d623103c5caddb2e6

Comment by Gerrit Updater [ 30/Sep/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/29010/
Subject: LU-9929 nodemap: add default ACL unmapping handling
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 62fee20556a4c90361bd28edb903dc77c9540133

Comment by Peter Jones [ 30/Sep/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 09/Oct/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/29369
Subject: LU-9929 nodemap: add default ACL unmapping handling
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 6b798de0455d056d77ac5570187d3f550be52210

Comment by Gerrit Updater [ 11/Oct/17 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/29369/
Subject: LU-9929 nodemap: add default ACL unmapping handling
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 2ee62fbbf14e055d0134eb0859999be394909f8f
