Details
Improvement
Resolution: Unresolved
Major
Lustre 2.10.3
CentOS 7.4
Description
Hi!
On our Oak storage system, a global storage system with limited access, we use the nodemap feature for GIDs, and the UIDs known to Oak are only a subset of those available on the different client clusters. A recent compatibility issue with Singularity (https://github.com/singularityware/singularity/issues/1313 if you're interested in the whole story) led us to discover that stat() on the client /oak mount point fails from time to time. This had not been a problem until we hit it with Singularity. Correct me if I'm wrong, but the issue is that the MDT refuses to answer any RPC from an unknown UID, so stat() on /oak fails with a permission error (EACCES, "Permission denied", in the output below). For unknown UIDs, the result looks like this:
[user@sh-104-49 ~]$ ls -l /
ls: cannot access /oak: Permission denied
total 44
...
d?????????? ? ? ? ? ? oak
...
I said "from time to time" because if a user with Oak access had previously run Singularity on the compute node, thereby (I believe) populating the client inode cache, stat() would then work even for unknown users. Because it could not be reproduced reliably, the issue was painful to troubleshoot.
Anyway, we recently worked around the issue by patching our fork of l_getidentity.c so that unknown UIDs can query the MDT and stat() on the '/oak' mount point doesn't fail:
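For anyone who wants to check a node for this behaviour, here is a minimal sketch of a probe that calls stat() on a path and reports the errno it failed with. The helper name probe_stat is hypothetical and not part of Lustre or Singularity; the sketch only assumes a POSIX system.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: call stat() on `path` and return 0 on success,
 * or the errno value stat() failed with.
 */
static int probe_stat(const char *path)
{
	struct stat st;

	assert(path != NULL);

	if (stat(path, &st) == 0)
		return 0;

	return errno;
}
```

Calling probe_stat("/oak") as a user whose UID is unknown to the MDT, on a freshly mounted client, should show whether the mount point can be stat()ed; passing the result to strerror() gives the same message that ls prints.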
diff --git a/lustre/utils/l_getidentity.c b/lustre/utils/l_getidentity.c
index 6aca6dc..72896a8 100644
--- a/lustre/utils/l_getidentity.c
+++ b/lustre/utils/l_getidentity.c
@@ -111,9 +111,11 @@ int get_groups_local(struct identity_downcall_data *data,
 	pw = getpwuid(data->idd_uid);
 	if (!pw) {
-		errlog("no such user %u\n", data->idd_uid);
-		data->idd_err = errno ? errno : EIDRM;
-		return -1;
+		/* Stanford limited client trust: all uid are mapped with primary group 37 */
+		errlog("warning: no secondary groups for unknown user %u\n", data->idd_uid);
+		data->idd_gid = 37;
+		data->idd_ngroups = 0;
+		return 0;
 	}
 	data->idd_gid = pw->pw_gid;
Because all access control is done using UIDs and secondary GIDs, we should be good. stat() now works on the mount point on every host, which makes Singularity happy to run with autofs.
So I wanted to raise the issue here to find out what you think. Maybe Lustre filesystems should allow the stat RPC on the root directory from unknown users? Or would it make sense to add this kind of limited UID trust to l_getidentity?
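The fallback logic of the patch can be sketched in isolation as follows. This is a simplified illustration, not the actual l_getidentity code: struct demo_identity is a hypothetical stand-in that only borrows the field names seen in the diff, resolve_identity is a made-up name, and the fallback GID is passed as a parameter instead of being hard-coded to 37. Secondary group lookup for known users is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <pwd.h>
#include <sys/types.h>

/* Hypothetical, reduced stand-in for Lustre's identity downcall data. */
struct demo_identity {
	uid_t        idd_uid;     /* UID being resolved */
	gid_t        idd_gid;     /* primary GID reported back */
	unsigned int idd_ngroups; /* number of secondary groups reported */
};

/*
 * Resolve the primary GID for id->idd_uid.
 * Known users get their passwd primary GID. Unknown users are not
 * rejected; they get `fallback_gid` and no secondary groups.
 * Always returns 0 in this sketch.
 */
static int resolve_identity(struct demo_identity *id, gid_t fallback_gid)
{
	struct passwd *pw;

	assert(id != NULL);

	pw = getpwuid(id->idd_uid);
	if (pw == NULL) {
		/* Unknown UID: limited trust, primary group only. */
		id->idd_gid = fallback_gid;
		id->idd_ngroups = 0;
		return 0;
	}

	/* Known UID: secondary groups would be looked up here. */
	id->idd_gid = pw->pw_gid;
	id->idd_ngroups = 0;
	return 0;
}
```

With this shape, an unknown UID ends up with only the fallback primary group, so it gains no access through secondary GIDs; whether that is acceptable depends on how the fallback group is used on the filesystem.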
Thanks!
Stephane
Hi John - Thanks for your reply, that's useful.
Note that we're not using nodemap for UIDs, only for GIDs (we use gid_only), so I don't think that will work. What we would need is some kind of default squash_uid.
I also agree with the statement "If you can see it then you should be able to stat it."
Thanks,
Stephane