[LU-12529] How to remove bad config values from MGS logs Created: 09/Jul/19 Updated: 07/Aug/19 Resolved: 07/Aug/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Mahmoud Hanafi | Assignee: | Patrick Farrell (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre 2.10 |
||
| Description |
|
We have some bad config values.
Jul 9 16:13:23 testpbs systemd-udevd[98107]: Process '/usr/sbin/lctl set_param 'lov.nbp7-MDT0000-mdtlov.stripecount=1'' failed with exit code 2.
Jul 9 16:13:23 testpbs systemd-udevd[98110]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST0026-osc-MDT0000.active=0'' failed with exit code 2.
Jul 9 16:13:23 testpbs systemd-udevd[98110]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST002a-osc-MDT0000.active=0'' failed with exit code 2.
Jul 9 16:13:23 testpbs systemd-udevd[98107]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST0046-osc-MDT0000.active=0'' failed with exit code 2.
Jul 9 16:13:23 testpbs systemd-udevd[98107]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST003e-osc-MDT0000.active=0'' failed with exit code 2.
Jul 9 16:13:23 testpbs systemd-udevd[98107]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST003a-osc-MDT0000.active=0'' failed with exit code 2.
Jul 9 16:13:23 testpbs systemd-udevd[98107]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST002e-osc-MDT0000.active=0'' failed with exit code 2.
Jul 9 16:13:23 testpbs systemd-udevd[98107]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST0032-osc-MDT0000.active=0'' failed with exit code 2.
Jul 9 16:13:23 testpbs systemd-udevd[98107]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST004e-osc-MDT0000.active=0'' failed with exit code 2.
Jul 9 16:13:23 testpbs systemd-udevd[98107]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST0042-osc-MDT0000.active=0'' failed with exit code 2.
Jul 9 16:13:23 testpbs systemd-udevd[98107]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST0052-osc-MDT0000.active=0'' failed with exit code 2.
Jul 9 16:13:23 testpbs systemd-udevd[98107]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST004a-osc-MDT0000.active=0'' failed with exit code 2.
Jul 9 16:13:23 testpbs systemd-udevd[98107]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST0036-osc-MDT0000.active=0'' failed with exit code 2.
How do we remove these? I tried like this. |
| Comments |
| Comment by Patrick Farrell (Inactive) [ 10/Jul/19 ] |
|
Mahmoud,
Did you run that on the MGS? If not, you must. Additionally, I think you probably need something like:
lctl set_param -P -d osc.*.active
As you probably didn't specify each OST individually when setting this up? |
| Comment by Mahmoud Hanafi [ 10/Jul/19 ] |
|
That doesn't work. Looks like there is a log entry for one:
nbp7-mds1 ~ # llog_reader /tmp/params | grep osc.nbp7-OST0026-osc-MDT0000.active
#68 (224)marker 2 (flags=0x01, v2.7.3.0) general 'osc.nbp7-OST0026-osc-MDT0000.active' Tue Jul 10 08:28:03 2018-
#69 (128)set_param 0:general 1:osc.nbp7-OST0026-osc-MDT0000.active=0 2:lctl
#70 (224)END marker 2 (flags=0x02, v2.7.3.0) general 'osc.nbp7-OST0026-osc-MDT0000.active' Tue Jul 10 08:28:03 2018-
#274 (224)SKIP START marker 5 (flags=0x05, v2.10.6.0) nbp7-OST-OST0026 'osc.nbp7-OST0026-osc-MDT0000.active' Tue Jul 9 15:48:34 2019-Tue Jul 9 16:02:23 2019
#275 (144)SKIP set_param 0:nbp7-OST-OST0026 1:osc.nbp7-OST0026-osc-MDT0000.active=0= 2:lctl
#276 (224)SKIP END marker 5 (flags=0x06, v2.10.6.0) nbp7-OST-OST0026 'osc.nbp7-OST0026-osc-MDT0000.active' Tue Jul 9 15:48:34 2019-Tue Jul 9 16:02:23 2019
Running this has no effect:
lctl set_param -P -d osc.nbp7-OST0026-osc-MDT0000.active |
| Comment by Patrick Farrell (Inactive) [ 11/Jul/19 ] |
|
Can you collect debug logs for this?
DEBUGMB=`lctl get_param -n debug_mb`
lctl set_param *debug=-1 debug_mb=10000
lctl clear
lctl mark "before"
lctl set_param -P -d osc.nbp7-OST0026-osc-MDT0000.active
#Write out the log
lctl dk > /tmp/log
#Set debug back to defaults
lctl set_param debug="super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck"
lctl set_param debug_mb=$DEBUGMB
And then also, separately, can you please collect an strace of the command?
strace -v lctl set_param -P -d osc.nbp7-OST0026-osc-MDT0000.active |
| Comment by Andreas Dilger [ 11/Jul/19 ] |
|
You could also run on the MGS "lctl --device MGS llog_print nbp7-client" or "... params" to dump the client config llogs while the system is mounted instead of the "params" log. You can use "lctl --device MGS llog_cancel params 69" to cancel that record number. |
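As a rough illustration of how the record numbers for llog_cancel could be picked out, the sketch below parses text in the llog_reader format shown earlier in this ticket and builds one cancel command per live set_param record. The names find_set_param_records and cancel_commands are hypothetical, the sample text is copied from the ticket, and records already marked SKIP are ignored on the assumption that they have been cancelled. It only prints the commands; it does not run them.

```python
import re

# Matches live (non-SKIP) records such as:
#   #69 (128)set_param 0:general 1:osc.nbp7-OST0026-osc-MDT0000.active=0 2:lctl
# SKIP records have "SKIP " between "(len)" and "set_param", so they do not match.
RECORD_RE = re.compile(r"^#(\d+) \(\d+\)set_param \S+ 1:(\S+?)=(\S*)")


def find_set_param_records(llog_text, param):
    """Return the indices of live set_param records that set `param`."""
    indices = []
    for line in llog_text.splitlines():
        m = RECORD_RE.match(line.strip())
        if m and m.group(2) == param:
            indices.append(int(m.group(1)))
    return indices


def cancel_commands(logname, indices):
    """Build (but do not run) one llog_cancel command per record index."""
    return [f"lctl --device MGS llog_cancel {logname} {i}" for i in indices]


# Sample records taken from the llog_reader output in this ticket.
SAMPLE = """\
#68 (224)marker 2 (flags=0x01, v2.7.3.0) general 'osc.nbp7-OST0026-osc-MDT0000.active' Tue Jul 10 08:28:03 2018-
#69 (128)set_param 0:general 1:osc.nbp7-OST0026-osc-MDT0000.active=0 2:lctl
#70 (224)END marker 2 (flags=0x02, v2.7.3.0) general 'osc.nbp7-OST0026-osc-MDT0000.active' Tue Jul 10 08:28:03 2018-
#275 (144)SKIP set_param 0:nbp7-OST-OST0026 1:osc.nbp7-OST0026-osc-MDT0000.active=0= 2:lctl
"""

found = find_set_param_records(SAMPLE, "osc.nbp7-OST0026-osc-MDT0000.active")
for cmd in cancel_commands("params", found):
    print(cmd)
```

The log name passed to cancel_commands has to be the log the records were read from ("params" here), and the printed commands are meant to be reviewed by hand before being run on the MGS.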
| Comment by Patrick Farrell (Inactive) [ 11/Jul/19 ] |
|
Mahmoud, Even if Andreas' suggestion works (you should definitely try it), could you please gather those logs I asked for? They may be helpful in identifying a problem or problems. |
| Comment by Mahmoud Hanafi [ 11/Jul/19 ] |
|
On nbp7 I already ran Andreas' command and it worked. But we have 2 other filesystems that also have bad records. On those I get:
nbp2-mds ~ # lctl --device MGS llog_print params
OBD_IOC_LLOG_PRINT failed: Inappropriate ioctl for device
They both are running server versions.
[3115652.139607] LustreError: 35937:0:(llog_ioctl.c:254:llog_print_cb()) not enough space for print log records
|
| Comment by Patrick Farrell (Inactive) [ 11/Jul/19 ] |
|
What do you mean by "server versions"? These commands need to be run on the MGS. |
| Comment by Mahmoud Hanafi [ 11/Jul/19 ] |
|
They are all running the same Lustre version.
nbp7-mds1 ~ # modinfo lustre
filename: /lib/modules/3.10.0-693.21.1.el7.20180508.x86_64.lustre2106/extra/lustre/fs/lustre.ko
license: GPL
version: 2.10.6
description: Lustre Client File System
author: OpenSFS, Inc. <http://www.lustre.org/>
retpoline: Y
rhelversion: 7.4
srcversion: E459483EA54C83D0585ECA3
depends: obdclass,ptlrpc,libcfs,lnet,lmv,mdc,lov
vermagic: 3.10.0-693.21.1.el7.20180508.x86_64.lustre2106 SMP mod_unload modversions
nbp2-mds ~ # modinfo lustre
filename: /lib/modules/3.10.0-693.21.1.el7.20180508.x86_64.lustre2106/extra/lustre/fs/lustre.ko
license: GPL
version: 2.10.6
description: Lustre Client File System
author: OpenSFS, Inc. <http://www.lustre.org/>
retpoline: Y
rhelversion: 7.4
srcversion: E459483EA54C83D0585ECA3
depends: obdclass,ptlrpc,libcfs,lnet,lmv,mdc,lov
vermagic: 3.10.0-693.21.1.el7.20180508.x86_64.lustre2106 SMP mod_unload modversions |
| Comment by Andreas Dilger [ 12/Jul/19 ] |
|
mhanafi the bad config records may appear in either the "params" log or in the "fsname-client" config log. The name of the logfile passed to "lctl llog_cancel" needs to match. If "llog_print" complains about the log file being too large, you can grab the lctl binary from 2.10.7 or later, or apply the patch https://review.whamcloud.com/3381 " |
| Comment by Mahmoud Hanafi [ 13/Jul/19 ] |
|
What about the Inappropriate ioctl error?
nbp2-mds ~ # lctl --device MGS llog_print params
OBD_IOC_LLOG_PRINT failed: Inappropriate ioctl for device
brk(0x67d000) = 0x67d000
brk(NULL) = 0x67d000
open("/dev/obd", O_RDWR) = 3
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x66, 0x7f, 0x08), 0x7fffffffc9f0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x66, 0xc0, 0x08), 0x7fffffffc9f0) = -1 ENOTTY (Inappropriate ioctl for device)
write(2, "OBD_IOC_LLOG_PRINT failed: Inapp"..., 58OBD_IOC_LLOG_PRINT failed: Inappropriate ioctl for device
) = 58
rt_sigaction(SIGINT, {0x405ef0, ~[RTMIN RT_1], SA_RESTORER|SA_RESTART, 0x7fffed6d65d0}, NULL, 8) = 0
exit_group(1) = ?
+++ exited with 1 +++ |
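To make the strace output above easier to read, here is a minimal sketch of how the _IOC(...) fields combine into the 32-bit request number, assuming the generic Linux ioctl layout used on x86_64 (8-bit nr, 8-bit type, 14-bit size, 2-bit direction). The helper names ioc and decode are illustrative only and are not part of Lustre. The two calls in the trace share the same type (0x66) and size and differ only in nr: 0x7f succeeded, while 0xc0 returned ENOTTY, the errno conventionally returned when the handler does not recognize the request.

```python
# Generic Linux ioctl number layout (asm-generic/ioctl.h), as used on x86_64.
IOC_NRBITS, IOC_TYPEBITS, IOC_SIZEBITS = 8, 8, 14
IOC_NRSHIFT = 0
IOC_TYPESHIFT = IOC_NRSHIFT + IOC_NRBITS      # 8
IOC_SIZESHIFT = IOC_TYPESHIFT + IOC_TYPEBITS  # 16
IOC_DIRSHIFT = IOC_SIZESHIFT + IOC_SIZEBITS   # 30
IOC_WRITE, IOC_READ = 1, 2


def ioc(direction, ioc_type, nr, size):
    """Pack direction/type/nr/size into a request number, like the C _IOC macro."""
    return ((direction << IOC_DIRSHIFT) | (size << IOC_SIZESHIFT)
            | (ioc_type << IOC_TYPESHIFT) | (nr << IOC_NRSHIFT))


def decode(request):
    """Split a request number back into its four fields."""
    return {
        "dir": (request >> IOC_DIRSHIFT) & 0x3,
        "size": (request >> IOC_SIZESHIFT) & ((1 << IOC_SIZEBITS) - 1),
        "type": (request >> IOC_TYPESHIFT) & 0xFF,
        "nr": (request >> IOC_NRSHIFT) & 0xFF,
    }


# The failing call from the trace: _IOC(_IOC_READ|_IOC_WRITE, 0x66, 0xc0, 0x08)
failing = ioc(IOC_READ | IOC_WRITE, 0x66, 0xC0, 0x08)
print(hex(failing), decode(failing), "type char:", chr(0x66))
```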
| Comment by Andreas Dilger [ 18/Jul/19 ] |
I tested this out locally on my 2.10 system and got the same error. I thought that llog_print was working for the params file, and there is even a regression test for this (conf-sanity.sh test_123ab()), so it isn't clear why this isn't working. |
| Comment by Andreas Dilger [ 18/Jul/19 ] |
|
It looks like the MGS needs patch https://review.whamcloud.com/34250: LU-4939 obdclass: llog_print params file
Allow llog_print to handle the params file in yaml
which was landed to b2_10 as v2_10_6-45-gfb77f09ac8, so it would be included in 2.10.7 and later. My home/test server is running only 2.10.5. |
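The reasoning from the tag to "2.10.7 and later" can be sketched as follows. The string v2_10_6-45-gfb77f09ac8 has the git-describe shape of base tag, number of commits after that tag, and abbreviated commit hash. The helpers parse_describe and release_has_commit are hypothetical, and the comparison assumes releases cut from the same branch (b2_10 here) with linear history; it says nothing about other branches.

```python
import re

# vMAJOR_MINOR_PATCH, optionally followed by -<commits ahead>-g<abbrev hash>
DESCRIBE_RE = re.compile(r"^v(\d+)_(\d+)_(\d+)(?:-(\d+)-g([0-9a-f]+))?$")


def parse_describe(tag):
    """Split a describe-style tag into (base version, commits ahead, hash)."""
    m = DESCRIBE_RE.match(tag)
    if not m:
        raise ValueError(f"unrecognized tag: {tag!r}")
    major, minor, patch, ahead, sha = m.groups()
    return (int(major), int(minor), int(patch)), int(ahead or 0), sha


def release_has_commit(release, landed_tag):
    """Assumption: same branch, linear history.

    A commit that landed N > 0 commits after base tag X is missing from
    release X itself and present in any later release on that branch.
    """
    base, ahead, _ = parse_describe(landed_tag)
    return release > base if ahead else release >= base


LANDED = "v2_10_6-45-gfb77f09ac8"
for release in [(2, 10, 5), (2, 10, 6), (2, 10, 7)]:
    print(release, release_has_commit(release, LANDED))
```

Under these assumptions, the servers in this ticket (2.10.6) and the 2.10.5 test system both predate the patch, which matches the ENOTTY result reported above.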
| Comment by Andreas Dilger [ 07/Aug/19 ] |
|
The various issues in this ticket have been resolved, or are fixed in a newer release. |