[LU-16054] Changes in LU-14797 create issues with older clients accessing fs through nodemaps Created: 28/Jul/22 Updated: 29/Sep/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Beal | Assignee: | Sebastien Buisson |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Attachments: | |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
We have a client accessing a file system via a nodemap.
ubuntu@lus2526-tcp1:~$ cat /sys/fs/lustre/version
2.14.0_2_gb280f22
ubuntu@lus2526-tcp1:~$ df /lustre/scratch12{5..6}
Filesystem 1K-blocks Used Available Use% Mounted on
10.160.40.37@tcp1:10.160.40.36@tcp1:/lus25 5161226796192 21176 5109139362568 1% /lustre/scratch125
10.160.42.37@tcp1:10.160.42.36@tcp1:/lus26 4301022330160 17604 4257616135516 1% /lustre/scratch126
ubuntu@lus2526-tcp1:~$ id
uid=1000(ubuntu) gid=1000(ubuntu) groups=1000(ubuntu),4(adm),20(dialout),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),117(netdev),118(lxd)
ubuntu@lus2526-tcp1:~$ echo > /lustre/scratch125/$(uname -n) ; echo > /lustre/scratch126/$(uname -n)
ubuntu@lus2526-tcp1:~$ ls -l /lustre/scratch12*/*
-rw-rw-r-- 1 ubuntu ubuntu 1 Jul 28 07:45 /lustre/scratch125/lus2526-tcp1
-rw-rw-r-- 1 ubuntu ubuntu 1 Jul 28 07:45 /lustre/scratch126/lus2526-tcp1
The external default client sees:
jb23@gen3-os0000011:~$ id jb23
uid=12296(jb23) gid=1105(team94) groups=1105(team94),15283(sag-secure-fileshare),1415(ssg-confluence),1400(www-pagesmith),15016(isg-dcim),4999(jb23test),15141(sag-mfa),1533(ssg-isg),1490(docker),706(ssg),15404(dir-admins),15456(IDS),15264(sag-mso365-apps)
jb23@gen3-os0000011:~$ find /lustre/scratch12[56]/admin/team94/jb23/tcp* -type f -ls
144115339507007490 4 -rw-rw-r-- 1 99 acedbdoc 1 Jul 28 08:45 /lustre/scratch125/admin/team94/jb23/tcp1/lus2526-tcp1
144115406599094274 4 -rw-rw-r-- 1 99 acedbdoc 1 Jul 28 08:45 /lustre/scratch126/admin/team94/jb23/tcp1/lus2526-tcp1
jb23@gen3-os0000011:~$ ls -l /lustre/scratch12*/admin/team94/jb23/tcp1/lus2526-tcp1
-rw-rw-r-- 1 99 acedbdoc 1 Jul 28 08:45 /lustre/scratch125/admin/team94/jb23/tcp1/lus2526-tcp1
-rw-rw-r-- 1 99 acedbdoc 1 Jul 28 08:45 /lustre/scratch126/admin/team94/jb23/tcp1/lus2526-tcp1
jb23@gen3-os0000011:~$ getent group acedbdoc
acedbdoc:*:99:image
jb23@gen3-os0000011:~$ ls -l .bashrc
-rw-r--r-- 1 jb23 team94 1130 May 26 12:47 .bashrc
The nodemap is configured:
[root@lus25-mds1 ~]# cat /sys/fs/lustre/version
2.12.6_ddn66
[root@lus25-mds1 lustre_casm16]# for i in *
> do
> echo $i $(cat $i)
> done
admin_nodemap 0
audit_mode 1
deny_unknown 0
exports [ { nid: 10.177.127.35@tcp1, uuid: 720c5dc0-efbf-4d55-9340-8a2bde26d039 }, ]
fileset /admin/team94/jb23/tcp1
id 1
idmap [ { idtype: uid, client_id: 1000, fs_id: 12296 }, { idtype: gid, client_id: 1000, fs_id: 1105 } ]
map_mode all
ranges [ { id: 1, start_nid: 10.177.126.0@tcp1, end_nid: 10.177.127.255@tcp1 } ]
sepol
squash_gid 1105
squash_projid
squash_uid 12296
trusted_nodemap 0
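For reference, a nodemap with the properties shown above would typically be built with a sequence along these lines. This is only a sketch: the nodemap name lustre_casm16 is inferred from the shell prompt above, and the exact command sequence is an assumption about how it was created, not a record of what was run.
# sketch: commands that would produce a nodemap like the one shown
lctl nodemap_add lustre_casm16
lctl nodemap_add_range --name lustre_casm16 --range '10.177.[126-127].[0-255]@tcp1'
lctl nodemap_add_idmap --name lustre_casm16 --idtype uid --idmap 1000:12296
lctl nodemap_add_idmap --name lustre_casm16 --idtype gid --idmap 1000:1105
lctl nodemap_set_fileset --name lustre_casm16 --fileset '/admin/team94/jb23/tcp1'
lctl nodemap_modify --name lustre_casm16 --property squash_uid --value 12296
lctl nodemap_modify --name lustre_casm16 --property squash_gid --value 1105
lctl nodemap_modify --name lustre_casm16 --property admin --value 0
lctl nodemap_modify --name lustre_casm16 --property trusted --value 0 |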
| Comments |
| Comment by James Beal [ 28/Jul/22 ] |
|
root@gen3-os0000011:~# ls -l /lustre/scratch125/admin/team94/
root@gen3-os0000011:~# ls -ld /lustre/scratch125/admin/team94/jb23/lfs_test
drwxrwxrwx 2 99 acedbdoc 4096 Jul 27 15:30 /lustre/scratch125/admin/team94/jb23/lfs_test
root@gen3-os0000011:~# chown jb23 /lustre/scratch125/admin/team94/jb23/lfs_test
chown: changing ownership of '/lustre/scratch125/admin/team94/jb23/lfs_test': Operation not permitted
root@gen3-os0000011:~#
root@gen3-os0000011:~# lctl list_nids
172.27.71.121@tcp

The default nodemap appears to have been changed. This may be unrelated; however, it is not something I have done deliberately. Another system we are commissioning has the same issue.

[root@lus25-mds1 ost-survey]# lctl nodemap_test_nid 172.27.71.121@tcp
default
[root@lus25-mds1 exports]# lctl get_param -R 'nodemap.default'
nodemap.default.admin_nodemap=0
nodemap.default.audit_mode=1
nodemap.default.exports=
[ { nid: 10.177.161.188@tcp5, uuid: 12ebc53b-9e45-427d-8e30-5a2439569728 },
  { nid: 172.27.71.121@tcp, uuid: a64511ef-8ced-4702-892e-50a74c631d98 },
  { nid: 10.160.40.12@tcp, uuid: lus25-MDT0000-lwp-OST0008_UUID },
  { nid: 10.160.40.12@tcp, uuid: lus25-MDT0000-lwp-OST0009_UUID },
  { nid: 10.160.40.13@tcp, uuid: lus25-MDT0000-lwp-OST000a_UUID },
  { nid: 10.160.40.10@tcp, uuid: lus25-MDT0000-lwp-OST0005_UUID },
  { nid: 10.160.40.11@tcp, uuid: lus25-MDT0000-lwp-OST0006_UUID },
  { nid: 10.160.40.9@tcp, uuid: lus25-MDT0000-lwp-OST0002_UUID },
  { nid: 10.160.40.8@tcp, uuid: lus25-MDT0000-lwp-OST0001_UUID },
  { nid: 10.160.40.9@tcp, uuid: lus25-MDT0000-lwp-OST0003_UUID },
  { nid: 10.160.40.8@tcp, uuid: lus25-MDT0000-lwp-OST0000_UUID },
  { nid: 10.160.40.5@tcp, uuid: lus25-MDT0000-lwp-MDT0001_UUID },
  { nid: 10.160.40.5@tcp, uuid: lus25-MDT0001-mdtlov_UUID },
  { nid: 0@lo, uuid: lus25-MDT0000-lwp-MDT0000_UUID },
  { nid: 10.160.40.6@tcp, uuid: lus25-MDT0000-lwp-MDT0002_UUID },
  { nid: 10.160.40.6@tcp, uuid: lus25-MDT0002-mdtlov_UUID },
  { nid: 10.160.40.7@tcp, uuid: lus25-MDT0000-lwp-MDT0003_UUID },
  { nid: 10.160.40.7@tcp, uuid: lus25-MDT0003-mdtlov_UUID },
  { nid: 10.160.40.13@tcp, uuid: lus25-MDT0000-lwp-OST000b_UUID },
  { nid: 10.160.40.10@tcp, uuid: lus25-MDT0000-lwp-OST0004_UUID },
  { nid: 10.160.40.11@tcp, uuid: lus25-MDT0000-lwp-OST0007_UUID }, ]
nodemap.default.fileset=
nodemap.default.id=0
nodemap.default.squash_gid=99
nodemap.default.squash_projid=99
nodemap.default.squash_uid=99
nodemap.default.trusted_nodemap=0
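One way to confirm the squashing from the server side is to ask the MDS how an ID maps for the client's NID. A sketch, assuming lctl nodemap_test_nid's companion command nodemap_test_id is available on this release; the NID and UID are the ones from the output above:
# sketch: with admin=0, trusted=0 and no idmap defined on 'default',
# every incoming ID should squash to squash_uid/squash_gid (99 here)
[root@lus25-mds1 ~]# lctl nodemap_test_id --nid 172.27.71.121@tcp --idtype uid --id 12296
99 |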
| Comment by James Beal [ 28/Jul/22 ] |
|
I have attached the full nodemap information from lus25 as an attachment. For comparison, here is another system we have:
[root@lus24-mds1 ~]# lctl get_param -R 'nodemap.default'
nodemap.default.admin_nodemap=0
nodemap.default.audit_mode=1
nodemap.default.exports=[ ]
nodemap.default.fileset=/null
nodemap.default.id=0
nodemap.default.squash_gid=65534
nodemap.default.squash_uid=65534
nodemap.default.trusted_nodemap=0
I am wondering if the issue is not that writes are being poorly translated via nodemaps, but that the default nodemap has somehow been damaged: all files appear to be owned by user 99 because I am accessing the file system via the default nodemap, which has UID squashing turned on. |
| Comment by Sebastien Buisson [ 28/Jul/22 ] |
|
Indeed, for lus25 you have the admin and trusted properties set to 0, so all accesses from nodes considered part of this 'default' nodemap will be squashed to the ID 99 that you defined.
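If the intent is for default-nodemap clients to see real IDs, the usual remedy is to raise the trusted (and, where root access is acceptable, admin) properties on the default nodemap. A sketch; whether this is appropriate depends on the trust model for unmapped clients:
# sketch: let clients in the 'default' nodemap use filesystem IDs directly
lctl nodemap_modify --name default --property trusted --value 1
# only if root on unmapped clients should not be squashed:
lctl nodemap_modify --name default --property admin --value 1 |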
| Comment by Sebastien Buisson [ 28/Jul/22 ] |
|
I think the first thing to do would be to clean up the situation regarding …
|
| Comment by James Beal [ 28/Jul/22 ] |
|
Do all clients, including those using the default nodemap, need those patches? Are the patches in community release 2.15.0? |
| Comment by Sebastien Buisson [ 28/Jul/22 ] |
Yes, no matter which nodemap they are in, they can hit the problem.
The patches are included in 2.15.0 (as well as EXA 5.2.5 and EXA 6.1). |
| Comment by James Beal [ 28/Jul/22 ] |
|
Comparing lus24, which works how we would expect with no mapping, and lus25: I can see the squashed uid and gid are different, but the admin and trusted settings are the same?
[root@lus24-mds1 ~]# lctl nodemap_test_nid 10.10.10.1@tcp0
Native
[root@lus25-mds1 exports]# lctl nodemap_test_nid 10.10.10.1@tcp
default
I think this is an important difference. |
| Comment by Sebastien Buisson [ 28/Jul/22 ] |
|
I can see no mapping for node 10.10.10.1@tcp in your definitions for lus25. Is it intended to behave differently from lus24, which maps that node to the 'Native' nodemap? |
| Comment by James Beal [ 28/Jul/22 ] |
|
What I am saying is that the two systems are different, and the behaviour that is useful is the one lus24 has.
(How do we say all of @tcp0 is native? See the sketch below.)
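One way to get the lus24 behaviour, anticipating the commands used in the 29/Sep/23 comment below, is a catch-all 'Native' nodemap covering every IP on the network, with admin and trusted enabled. A sketch, assuming the clients are on @tcp0:
# sketch: map all of @tcp0 to a privileged 'Native' nodemap
lctl nodemap_add Native
lctl nodemap_add_range --name Native --range '[0-255].[0-255].[0-255].[0-255]@tcp0'
lctl nodemap_modify --name Native --property admin --value 1
lctl nodemap_modify --name Native --property trusted --value 1 |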
| Comment by Sebastien Buisson [ 02/Aug/22 ] |
The client part of the patches is actually optional to have. |
| Comment by Sebastien Buisson [ 02/Aug/22 ] |
What I meant is … |
| Comment by James Beal [ 26/Sep/23 ] |
|
I have just run into this again on a newly reinstalled system...
The server is:
[root@lus22-mds1 secure-lustre]# cat /sys/fs/lustre/version
2.12.9_ddn8
The client is:
ubuntu@lus2526-tcp15:~$ cat /sys/fs/lustre/version
2.15.3
I see:
[root@lus22-mds1 secure-lustre]# lctl nodemap_test_nid 10.10.10.1@tcp
default
|
| Comment by Peter Jones [ 28/Sep/23 ] |
|
James, this is puzzling. I understand that you've opened a support ticket to track this. Sébastien will need to review the logs supplied there and get back to you.
Peter |
| Comment by James Beal [ 29/Sep/23 ] |
|
Support pointed out: "He has spotted one issue with the nodemap configuration though: there is no privileged nodemap, as explained in the Lustre Operations Manual: 'For proper operations, the Lustre file system requires to have a privileged group that …'" Hopefully next time we make that mistake this comment will remind me.
[root@lus22-mds1 secure-lustre]# lctl nodemap_add Native
[root@lus22-mds1 secure-lustre]# lctl nodemap_add_range --name Native --range "[0-255].[0-255].[0-255].[0-255]@tcp"
[root@lus22-mds1 secure-lustre]# lctl nodemap_modify --name Native --property admin --value 1
[root@lus22-mds1 secure-lustre]# lctl nodemap_modify --name Native --property trusted --value 1
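If this works as intended, the earlier test should now resolve to the new nodemap instead of 'default'. A sketch of the expected result, using the nodemap_test_nid output format shown earlier (the actual output depends on the configured ranges):
# sketch: the test NID should now fall into 'Native' rather than 'default'
[root@lus22-mds1 secure-lustre]# lctl nodemap_test_nid 10.10.10.1@tcp
Native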
Thank you and sorry for the confusion. |