[LU-12244] lfs check subcommand no longer works as non-root user Created: 29/Apr/19  Updated: 15/Jul/19  Resolved: 03/Jul/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.6
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Cameron Harr Assignee: Peter Jones
Resolution: Fixed Votes: 0
Labels: llnl
Environment:

Client/Server: lustre-2.10.6_2.chaos-1.ch6.x86_64
kernel: 3.10.0-957.5.1.3chaos.ch6.x86_64


Issue Links:
Related
is related to LU-11850 Relocating /proc/fs/lustre/ost to /sy... In Progress
is related to LU-8066 Move lustre procfs handling to sysfs ... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

With the Lustre 2.10 client, non-root users appear to no longer be able to run "lfs check <servers|osts|mds>". Strace of the command shows a "Permission denied" error accessing /sys/kernel/debug/lustre/devices, resulting in the following error at runtime: error: check: mds status failed

Our Operations staff, who do not have root access, need "lfs check ..." functionality to monitor and fix file system issues, so fixing this issue would be helpful for us.

statfs("/sys/kernel/debug/", {f_type=DEBUGFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
stat("/sys/fs/lnet/devices", 0x7fffffff8760) = -1 ENOENT (No such file or directory)
stat("/sys/fs/lustre/devices", 0x7fffffff8760) = -1 ENOENT (No such file or directory)
stat("/sys/kernel/debug/lnet/devices", 0x7fffffff8740) = -1 EACCES (Permission denied)
stat("/sys/kernel/debug/lustre/devices", 0x7fffffff8740) = -1 EACCES (Permission denied)
stat("/proc/fs/lnet/devices", 0x7fffffff8760) = -1 ENOENT (No such file or directory)
stat("/proc/fs/lustre/devices", 0x7fffffff8760) = -1 ENOENT (No such file or directory)
stat("/proc/sys/lnet/devices", 0x7fffffff8760) = -1 ENOENT (No such file or directory)
stat("/proc/sys/lustre/devices", 0x7fffffff8760) = -1 ENOENT (No such file or directory)
write(2, "error: check: mds status failed\n", 32) = 32
exit_group(2)                           = ?
+++ exited with 2 +++
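
The lookup order visible in the strace output can be reproduced without lfs. The following is a minimal sketch, not the actual lfs implementation: it stats the Lustre candidate paths copied from the trace and reports the errno name for each, so a non-root user can see whether a failure is ENOENT or EACCES. The function name probe and the list name CANDIDATE_PATHS are hypothetical.

```python
import errno
import os

# Candidate locations copied from the strace output, in the order shown there.
CANDIDATE_PATHS = [
    "/sys/fs/lustre/devices",
    "/sys/kernel/debug/lustre/devices",
    "/proc/fs/lustre/devices",
    "/proc/sys/lustre/devices",
]


def probe(paths):
    """Return (path, status) pairs; status is "ok" or an errno name such as "ENOENT"."""
    results = []
    for path in paths:
        try:
            os.stat(path)
            results.append((path, "ok"))
        except OSError as exc:
            # Map the numeric errno to its symbolic name, as strace prints it.
            results.append((path, errno.errorcode.get(exc.errno, str(exc.errno))))
    return results


if __name__ == "__main__":
    for path, status in probe(CANDIDATE_PATHS):
        print(f"{status:8} {path}")
```

Run as the affected user, an EACCES on the debugfs path alongside ENOENT everywhere else would match the failure described here.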


 Comments   
Comment by James A Simmons [ 30/Apr/19 ]

Fixes for this have landed in newer Lustre versions. You need this patch:

https://review.whamcloud.com/33799

Comment by Andreas Dilger [ 30/Apr/19 ]

The move of the /proc/fs/lustre/devices file from procfs to debugfs was done as part of patch https://review.whamcloud.com/23428 "LU-8066 obdclass: move lustre sysctl to sysfs" landed for 2.9.56, so it has been in all 2.10 releases.

I suspect the reason it is a problem now is that Red Hat has backported a change from newer kernels to their kernel that makes debugfs root-only.
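
One way to check this suspicion on a client is to look at the permission bits of the debugfs mount point. This is a rough sketch that assumes the restriction shows up as a mode granting nothing to group or other on /sys/kernel/debug (the path from the strace output); the helper name is_root_only is hypothetical.

```python
import os
import stat


def is_root_only(mode):
    """True if the permission bits grant nothing to group or other."""
    return (stat.S_IMODE(mode) & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    path = "/sys/kernel/debug"  # debugfs mount point seen in the strace output
    try:
        st = os.stat(path)
    except OSError as exc:
        print(f"{path}: {exc.strerror}")
    else:
        mode = stat.S_IMODE(st.st_mode)
        print(f"{path}: mode {mode:04o}, root-only: {is_root_only(st.st_mode)}")
```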

Comment by Andreas Dilger [ 30/Apr/19 ]

I've cherry-picked the patch to b2_10:

https://review.whamcloud.com/34782

Comment by Cameron Harr [ 30/Apr/19 ]

Thank you both. My search for related tickets missed LU-11850, which is very similar.

To spare Red Hat some of the blame: we only recently started rolling out 2.10 clients (from 2.8), so this is only affecting us now.

Comment by James A Simmons [ 01/May/19 ]

LU-11850 only impacts 2.12 LTS users. 

Comment by Cameron Harr [ 01/May/19 ]

... And I had searched only on 2.10. Thanks.

Comment by Andreas Dilger [ 01/May/19 ]

Cameron, did you try out the patch for b2_10? Did it solve your problem? It only affects the userspace tools on the client, so you wouldn't need to upgrade all of the kernel modules or take an outage to install it.

Comment by Olaf Faaland [ 01/May/19 ]

Andreas, we haven't tried it, but we can do so today and post back.

Comment by Olaf Faaland [ 01/May/19 ]

Yep, that patch worked without any other patches or modification.

Comment by Olaf Faaland [ 01/May/19 ]

For our own recordkeeping, our local ticket: TOSS-4503

Comment by Peter Jones [ 03/Jun/19 ]

I believe that this issue is fixed in both 2.10.8 and 2.12.2 so the ticket can now be considered RESOLVED

Comment by Olaf Faaland [ 10/Jun/19 ]

Peter,
I do not see that this patch landed to 2.12.2 even though it was backported to 2.10.

[faaland1@hefe branch:2.10.6-llnl lustre-210] $git lg wcrev/b2_12 | grep LU-8066 | grep iterate
[faaland1@hefe branch:2.10.6-llnl lustre-210] $git lg wcrev/b2_10 | grep LU-8066 | grep iterate
* e55cd4f LU-8066 utils: have llapi_target_iterate use sysfs tree

https://review.whamcloud.com/#/c/34781/

Has a +2 but has not landed due to undiagnosed test failures, last activity May 8.
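
The branch check above can also be scripted. The sketch below assumes a local clone with the same remote branch names (wcrev/b2_10, wcrev/b2_12) and uses plain `git log --oneline` in place of the `git lg` alias, whose definition is not shown here. The function names matching_commits and branch_log are hypothetical.

```python
import subprocess


def matching_commits(log_lines, ticket, keyword):
    """Keep the one-line log entries that mention both the ticket and the keyword."""
    return [line for line in log_lines if ticket in line and keyword in line]


def branch_log(branch):
    """Return `git log --oneline` output for a branch as a list of lines."""
    result = subprocess.run(
        ["git", "log", "--oneline", branch],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Branch names taken from the comment; adjust to the remotes in your clone.
    for branch in ("wcrev/b2_12", "wcrev/b2_10"):
        hits = matching_commits(branch_log(branch), "LU-8066", "iterate")
        print(branch, hits if hits else "not landed")
```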

Comment by Peter Jones [ 10/Jun/19 ]

You are correct - sorry.

Comment by Peter Jones [ 01/Jul/19 ]

I think that once the current b2_12-next branch lands, this really will be complete.

Comment by Peter Jones [ 03/Jul/19 ]

Ok - now this is fixed on b2_12
