[LU-11566] sanity test_60aa: llog_print_cb()) not enough space for print log records Created: 24/Oct/18 Updated: 01/Mar/22 Resolved: 09/Sep/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0, Lustre 2.10.6 |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.3 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Ben Evans (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||||||
| Severity: | 3 | ||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||
| Description |
|
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com> This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/473e7234-d712-11e8-ad90-52540065bddc test_60aa produced the following stack trace: Lustre: DEBUG MARKER: /usr/sbin/lctl --device %MGS llog_print \$lustre-client ------------[ cut here ]------------ WARNING: CPU: 1 PID: 5457 at lib/vsprintf.c:1741 vsnprintf+0x691/0x6a0 CPU: 1 PID: 5457 Comm: llog_process_th Kernel: 3.10.0-862.14.4.el7_lustre.x86_64 #1 Call Trace: dump_stack+0x19/0x1b warn_slowpath_null+0x1d/0x20 vsnprintf+0x691/0x6a0 snprintf+0x49/0x70 class_config_yaml_output+0x22a/0x430 [obdclass] llog_print_cb+0x415/0x4f0 [obdclass] llog_process_thread+0x892/0x15a0 [obdclass] llog_process_thread_daemonize+0x9f/0xe0 [obdclass] kthread+0xd1/0xe0 ---[ end trace a7e0036714283adf ]--- LustreError: 5457:0:(llog_ioctl.c:264:llog_print_cb()) not enough space for print log records It looks like this is a problem in how class_config_yaml_output() is written: �������� if (LUSTRE_CFG_BUFLEN(lcfg, 0) > 0) ����������������ptr += snprintf(ptr, end - ptr, ", device: %s", ��������������������������������lustre_cfg_string(lcfg, 0)); If ptr ever exceeds end then a negative value is passed to snprintf(), and triggers this WARN_ONCE(). This can happen if the message is too long, because snprintf() will return the total length that would have been printed, not the actual length, so it keeps growing. Two fixes are needed here:
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV |
| Comments |
| Comment by Andreas Dilger [ 24/Oct/18 ] |
|
This looks a bit worse than I thought. It seems that the current mechanism for printing the config logs is to print the whole log into a single buffer in the kernel. This means any config with a lot of records is going to overflow the 8192-byte buffer (I thought it was holding at most a single line of output). It seems there are a couple of options here:
Probably the third option is best. This would involve creating a debugfs entry for each config record that generates the YAML-formatted output, then have llog_print access that file (for compatibility, users could also then use "lctl get_param mgs.param.<config>" or similar. Note that the MGS should only generate debugfs entries for well-known config files, like param, <fsname>-MDTxxxx, <fsname>-OSTxxxx}, and whatever the sptlrpc file is named, and not any cruft in the CONFIGS directory such as backups or other files in that directory. |
| Comment by Andreas Dilger [ 25/Oct/18 ] |
|
Ben, it looks like this will affect your patch to backup/restore the config logs. If the config log is large, then the supplied buffer will not be large enough to hold all of the config records. As a short term solution, it might be enough to increase MAX_IOC_BUFLEN to e.g. 16KB, but I don't think that is a long-term solution. See my previous comment about using debugfs for the longer-term solution. I think that would be more suitable for 2.13, since it will involve some non-trivial changes to this code. |
| Comment by Gerrit Updater [ 30/Oct/18 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33511 |
| Comment by James A Simmons [ 30/Oct/18 ] |
|
I do like the idea of using debugfs but remember its limited to root only. Will this matter to admins or for logging tools? In case we do need non root, but still privileged access, we can use the generic netlink API. For the generic netlink case we can send up to 32K size packets and each packet could contain a seq number so the packets can be pieced together. Just another option for the 2.13 window. |
| Comment by Andreas Dilger [ 31/Oct/18 ] |
|
We already have an ioctl, I don't think netlink is an improvement for this case. For llog_print this should really be a root-only operation anyway, so debugfs is OK. |
| Comment by Andreas Dilger [ 20/Nov/18 ] |
|
Ben, is there any chance you could look into this? I think that without some kind of fix here, your (IMHO very useful) |
| Comment by Ben Evans (Inactive) [ 20/Nov/18 ] |
|
Yep, it's on my list of things to do. I think the short-term solution is to read each file in chunks. Long term, debugfs should work fine. |
| Comment by Andreas Dilger [ 20/Nov/18 ] |
|
If you can work up a mechanism to read the file in chunks, that is fine too. Probably the easiest would be to print a single record per call, so that the buffer is always large enough. A bit more complex would be to pack multiple whole records into the buffer, and stop when the last one doesn't fit. Splitting a single record across multiple calls would be hard to handle in both the kernel and userspace. Strangely, I didn't know that llog_print allows specifying the start and end record, so that makes the solution relatively easy to implement. Just specify a series of start/end pairs internally until there are no more records. It would be relatively safe to assume e.g. 256 bytes/record and then plan accordingly. Note that there is a minor bug in jt_llog_print() in that it doesn't allow specifying only the starting record (i.e. argc == 3). There is also a small bug in the lctl-llog_print.8 man page, in that it should show "lctl llog_print <logname> [start_index [end_index]]" (i.e. "end_index" is only allowed if "start_index" is specified, instead of implying that "end_index" can be used independently). It would also be useful if llog_print checked for named arguments like --start and --end (or similar) instead of depending on positional parameters, since that makes parsing more complex as new options are added. |
| Comment by Gerrit Updater [ 10/Dec/18 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33815 |
| Comment by Gerrit Updater [ 14/Dec/18 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33863 |
| Comment by Gerrit Updater [ 10/Jan/19 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34004 |
| Comment by Gerrit Updater [ 30/Jan/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33815/ |
| Comment by Gerrit Updater [ 23/Feb/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33863/ |
| Comment by Gerrit Updater [ 28/Mar/19 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34523 |
| Comment by Gerrit Updater [ 08/Apr/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34004/ |
| Comment by Gerrit Updater [ 10/May/19 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34850 |
| Comment by Gerrit Updater [ 08/Jun/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34850/ |
| Comment by Patrick Farrell (Inactive) [ 09/Sep/19 ] |
|
Main patches landed, moved remaining cleanup to LU-12737. |
| Comment by Gerrit Updater [ 11/Feb/22 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/46510 |