[LU-9067] lctl dl command fails on el6 Created: 30/Jan/17 Updated: 01/Mar/17 Resolved: 01/Mar/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Bob Glossman (Inactive) | Assignee: | James A Simmons |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
This problem has been seen on el6.8. The command 'lctl dl' fails. This appears to be due to the "devices" entry used by the command being missing. There isn't any lustre "devices" file anywhere in /proc or /sys. Don't know exactly why not. |
| Comments |
| Comment by Andreas Dilger [ 30/Jan/17 ] |
|
What version of Lustre is this? It must either be the upstream kernel client, or something from recent master, for it to be in /sys/*, but they are using an old (pre 2.8) version of lctl. That was added in patch http://review.whamcloud.com/17468 |
| Comment by Bob Glossman (Inactive) [ 30/Jan/17 ] |
|
this is the current tip of master, tag 2.9.52. The lctl command is definitely looking in all the new locations, including /sys. |
| Comment by Andreas Dilger [ 30/Jan/17 ] |
|
Could be fallout from patch https://review.whamcloud.com/23427 "LU-8066 obd: Add debugfs root" since part of the description is:
|
| Comment by Bob Glossman (Inactive) [ 30/Jan/17 ] |
|
Andreas, |
| Comment by Bob Glossman (Inactive) [ 30/Jan/17 ] |
|
failure isn't only on prereleases. Same symptom: no lustre "devices" file created anywhere in /sys or /proc. |
| Comment by Bob Glossman (Inactive) [ 30/Jan/17 ] |
|
fwiw, on failing systems there is a /sys/kernel/debug dir, but there isn't any /sys/kernel/debug/lustre dir. Don't know if that's a useful clue or not. |
| Comment by Dmitry Eremin (Inactive) [ 31/Jan/17 ] |
|
Bob,
mount -t debugfs none /sys/kernel/debug
You can add an equivalent /etc/fstab line to automatically mount it. |
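A minimal sketch of how one might check whether debugfs is already mounted before running the command above. The function name `debugfs_mounted` and its optional arguments are illustrative, not part of Lustre or of any patch in this ticket. The fstab line in the comment is the conventional form and is an assumption; the thread does not spell it out.

```shell
# Sketch only: report whether debugfs is mounted at a given mount point.
# debugfs_mounted and its arguments are hypothetical names.
debugfs_mounted() {
    mnt="${1:-/sys/kernel/debug}"     # mount point to look for
    mounts="${2:-/proc/mounts}"       # mounts table to scan
    # Fields: device, mount point, fs type, options, dump, pass.
    awk -v m="$mnt" '$2 == m && $3 == "debugfs" { found = 1 }
                     END { exit !found }' "$mounts"
}

# Possible use (mounting requires root):
#   debugfs_mounted || mount -t debugfs none /sys/kernel/debug
#
# Assumed /etc/fstab equivalent:
#   debugfs  /sys/kernel/debug  debugfs  defaults  0 0
```

The function returns 0 when a matching entry exists and 1 otherwise, so it can be used directly in an `if` or with `||`.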
| Comment by Bob Glossman (Inactive) [ 31/Jan/17 ] |
|
Dmitry, This begs some questions though: 2) if mount of debugfs is now a requirement for lustre to operate correctly, how do we ensure that it is always done on all distros? |
| Comment by Andreas Dilger [ 31/Jan/17 ] |
|
I agree with Bob - if this changes the behavior out of the box then it will be problematic for users. Either we need to automatically mount debugfs in lctl if "dl" (or other commands that need debugfs content) is used, revert the patch moving this over to debugfs until the problem is fixed, or figure out some other way to handle this. I'm surprised that this didn't cause any test failures during landing, but I guess that means there is no test that checks the output of "lctl dl" (yet). In ancient days we used to get the "dl" content via ioctl(), but for large filesystems that caused problems due to the size of the output. |
| Comment by James A Simmons [ 31/Jan/17 ] |
|
Ugh. I would suggest that we call mount() in lctl.c to handle this. Too many patches have already landed for this to be reverted. Let me patch it up. We might need to backport this to earlier lustre versions as well. |
| Comment by Joseph Gmitter (Inactive) [ 31/Jan/17 ] |
|
Hi James, Assigning to you per your commentary above. Thanks. |
| Comment by Gerrit Updater [ 31/Jan/17 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/25182 |
| Comment by Bob Glossman (Inactive) [ 31/Jan/17 ] |
|
found the answer to my question 1) above. For el6 the same thing could probably be done in an init.d file for debugfs |
| Comment by Bob Glossman (Inactive) [ 31/Jan/17 ] |
|
James, |
| Comment by Bob Glossman (Inactive) [ 31/Jan/17 ] |
|
I note that on el6 mount for /proc is accomplished with a line in /etc/fstab. |
| Comment by James A Simmons [ 31/Jan/17 ] |
|
The init.d solution only handles node bring up. What happens if for some reason debugfs is umounted long after the node has been up? That is why I did my approach. |
| Comment by Bob Glossman (Inactive) [ 31/Jan/17 ] |
|
afaik nothing prevents a root privileged user from unmounting /proc either. I just prefer an approach that is less intrusive than doing something extra on nearly every lfs or lctl invocation. I'm thinking of maybe altering /etc/fstab at installation time, and doing it only on el6 and only if not already done. If a debugfs line is added then also do a 'mount debugfs' right then. It then is mounted at boot time ever after. No impact on lustre libs or utils. Just a suggestion. |
| Comment by James A Simmons [ 01/Feb/17 ] |
|
True, an admin can umount /proc too. Okay, let's go with the special script at startup. I can test it on a RHEL6 client for you. |
| Comment by James A Simmons [ 02/Feb/17 ] |
|
Bob have you come up with a boot script yet? |
| Comment by Bob Glossman (Inactive) [ 02/Feb/17 ] |
|
James, My thinking is no boot script is needed. Once /etc/fstab is modified debugfs would always get mounted at boot time by current existing scripts. |
| Comment by James A Simmons [ 02/Feb/17 ] |
|
Okay. I misunderstood. I updated the lnet init.d startup script for RHEL6.8 to mount debugfs. Can you try the last version of my previous patch? |
| Comment by Bob Glossman (Inactive) [ 02/Feb/17 ] |
|
James,
1) not everybody uses the lnet script to start up lustre
# service lnet start
mount: none already mounted or /sys/kernel/debug busy
mount: according to mtab, debugfs is already mounted on /sys/kernel/debug
LNET configured |
| Comment by James A Simmons [ 02/Feb/17 ] |
|
Does anyone use any of those scripts? It's just that we cannot replace fstab when installing lustre. fstab can be very site specific. Somewhere, somehow, debugfs has to be mounted. Suggestions besides the whole idea of nuking a site's fstab file? Also we need debugfs available in the case of routers, which will only have lnet installed. Perhaps my libcfs code is the best option. |
| Comment by Bob Glossman (Inactive) [ 02/Feb/17 ] |
|
I favor the idea of editing /etc/fstab on the fly at install time, adding a line for debugfs. Do it only on el6, do it only if such a line is not already there. This preserves any local site fstab edits or changes. Just not sure how to accomplish that. I don't favor complete replacement of /etc/fstab ever. |
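The install-time edit described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `ensure_debugfs_fstab` is a hypothetical name, the appended line uses the conventional debugfs fstab form, and the el6-only check Bob mentions is deliberately left out.

```shell
# Sketch only: add a debugfs line to an fstab-format file if none exists.
# ensure_debugfs_fstab is a hypothetical helper, not part of Lustre.
ensure_debugfs_fstab() {
    fstab="${1:-/etc/fstab}"
    # An existing, uncommented entry with fs type "debugfs" means no change.
    if awk '$1 !~ /^#/ && $3 == "debugfs" { found = 1 }
            END { exit !found }' "$fstab"; then
        return 0
    fi
    # If the file does not end in a newline, add one before appending.
    if [ -n "$(tail -c 1 "$fstab")" ]; then
        echo >> "$fstab"
    fi
    printf 'debugfs\t/sys/kernel/debug\tdebugfs\tdefaults\t0 0\n' >> "$fstab"
}
```

Because the function only appends and never rewrites, existing site-specific entries are preserved, and running it twice leaves a single debugfs line. It does not help where fstab is regenerated by external tooling.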
| Comment by James A Simmons [ 02/Feb/17 ] |
|
So most people don't use the provided startup scripts that come with lustre? |
| Comment by Bob Glossman (Inactive) [ 02/Feb/17 ] |
|
I believe actual practice in real installations varies quite a bit. I have seen many sites that don't use them at all. Personally I don't in my own test setups. I think they were initially done as examples, not as required must use features. They have existed for a long time. afaik, the most common use of the lnet startup script is on routers. It's an easy way to get the needed kernel modules loaded reliably at boot time. On routers there are typically no mount or other lustre activities that would get modules loaded otherwise. |
| Comment by James A Simmons [ 02/Feb/17 ] |
|
Looking at our own systems we manage fstab with puppet so any changes at install time will be stomped on soon after. I don't think modifying fstab is going to work. If people don't use the startup script then we are going to have to go with the libcfs library mounting debugfs for us. We just need to do it one time. |
| Comment by Bob Glossman (Inactive) [ 02/Feb/17 ] |
|
not sure what "manage fstab with puppet" means. if you have external methods to change and maintain fstab, how do you do other site specific changes, for example adding nfs client mounts? maybe in such cases a debugfs mount can be added by an admin. |
| Comment by James A Simmons [ 03/Feb/17 ] |
|
Does SLES11 have this issue also? I see on my Cray system it's mounted, but I wonder in general. I have an idea!!! What about calling sys_mount when the libcfs module loads? We can make it conditional only for RHEL6 and that way it only would happen at module load. Does that sound reasonable? |
| Comment by Bob Glossman (Inactive) [ 03/Feb/17 ] |
|
not an issue on SLES11 or SLES12. debugfs mounted there. el6 is the only context I can find where it's not mounted by default. In sles11 it's mounted via fstab. |
| Comment by Bob Glossman (Inactive) [ 03/Feb/17 ] |
|
James,
I see nothing that restricts it to el6 or happening only at module load time, as mentioned in your comment (above). |
| Comment by James A Simmons [ 03/Feb/17 ] |
|
Newer distros symlink mtab to /proc/mounts, but that is not the case for RHEL6. Luckily there is a function to add entries to mtab. |
| Comment by Bob Glossman (Inactive) [ 03/Feb/17 ] |
|
yes, el6 is old school. Still maintains mtab as a real, separate file. Not linked to /proc/mounts. All distros used to be that way. |
| Comment by James A Simmons [ 06/Feb/17 ] |
|
I updated the patch so if mtab is a real file it will add a debugfs entry. |
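The decision described here (only touch mtab when it is a real file) can be illustrated with a small shell sketch. This is not the code from the patch, which is not shown in this ticket; `record_debugfs_in_mtab` is a hypothetical name and the entry format is an assumption based on the usual mtab layout.

```shell
# Sketch only: record a debugfs mount in mtab when mtab is a regular file.
# record_debugfs_in_mtab is a hypothetical helper, not the patch's code.
record_debugfs_in_mtab() {
    mtab="${1:-/etc/mtab}"
    # A symlink (for example to /proc/mounts) reflects kernel state; skip it.
    if [ -L "$mtab" ] || [ ! -f "$mtab" ]; then
        return 0
    fi
    # Skip if a debugfs entry is already recorded.
    if awk '$3 == "debugfs" { found = 1 } END { exit !found }' "$mtab"; then
        return 0
    fi
    printf 'debugfs /sys/kernel/debug debugfs rw 0 0\n' >> "$mtab"
}
```

The `-L` test comes first because `-f` follows symlinks and would otherwise treat a link to a regular file as a real mtab.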
| Comment by Bob Glossman (Inactive) [ 27/Feb/17 ] |
|
more on master: I think this problem is blocking all el6 tests on master atm. |
| Comment by James A Simmons [ 27/Feb/17 ] |
|
Should be landing very soon |
| Comment by Gerrit Updater [ 01/Mar/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/25182/ |
| Comment by James A Simmons [ 01/Mar/17 ] |
|
Patch has landed |