Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.12.0, Lustre 2.10.6
Description
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/473e7234-d712-11e8-ad90-52540065bddc
test_60aa produced the following stack trace:
Lustre: DEBUG MARKER: /usr/sbin/lctl --device %MGS llog_print \$lustre-client
------------[ cut here ]------------
WARNING: CPU: 1 PID: 5457 at lib/vsprintf.c:1741 vsnprintf+0x691/0x6a0
CPU: 1 PID: 5457 Comm: llog_process_th Kernel: 3.10.0-862.14.4.el7_lustre.x86_64 #1
Call Trace:
dump_stack+0x19/0x1b
warn_slowpath_null+0x1d/0x20
vsnprintf+0x691/0x6a0
snprintf+0x49/0x70
class_config_yaml_output+0x22a/0x430 [obdclass]
llog_print_cb+0x415/0x4f0 [obdclass]
llog_process_thread+0x892/0x15a0 [obdclass]
llog_process_thread_daemonize+0x9f/0xe0 [obdclass]
kthread+0xd1/0xe0
---[ end trace a7e0036714283adf ]---
LustreError: 5457:0:(llog_ioctl.c:264:llog_print_cb()) not enough space for print log records
It looks like this is a problem in how class_config_yaml_output() is written:
        if (LUSTRE_CFG_BUFLEN(lcfg, 0) > 0)
                ptr += snprintf(ptr, end - ptr, ", device: %s",
                                lustre_cfg_string(lcfg, 0));
If ptr ever advances past end, then end - ptr is negative, so an invalid size is passed to snprintf(), which triggers this WARN_ONCE() in vsnprintf(). This can happen when the message is too long for the buffer: snprintf() returns the length that would have been printed had the buffer been large enough, not the number of bytes actually written, so ptr keeps growing beyond end.
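The return-value behaviour can be seen in a small userspace sketch. This is an illustration only, not Lustre code: remaining_after_append() is a hypothetical helper, and it tracks the remaining space as a signed number instead of advancing a pointer, so the overshoot is visible without invalid pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the Lustre tree): format one field into
 * buf[size] and report how much space "remains" if the return value of
 * snprintf() is trusted blindly, as the ptr += snprintf(...) pattern does.
 */
static long remaining_after_append(char *buf, size_t size, const char *device)
{
	int n;

	assert(buf != NULL && size > 0);

	/* snprintf() returns the length it WOULD have written, not what fit. */
	n = snprintf(buf, size, ", device: %s", device);

	/* Goes negative as soon as the output was truncated. */
	return (long)size - n;
}
```

With an 8-byte buffer and the device name "lustre-MDT0000", the formatted text needs 24 characters, so the helper reports -16 bytes remaining even though only 7 characters and a terminating NUL were stored.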
Two fixes are needed here:
- increase the buffer size so we don't overflow it, and userspace gets the records it wants
- fix the code to stop printing, or compute the remaining buffer space differently so it doesn't become negative
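One possible shape for the second fix is sketched below in userspace C. It is not the patch that landed: buf_append() and its offset-based interface are assumptions made for illustration. The idea is to clamp the write position on truncation so the remaining size can never become negative, and to return an error so the caller can stop printing. In kernel code, scnprintf() gives similar protection because it returns the number of characters actually written.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the Lustre tree): append formatted text at
 * offset *off in buf[size]. Returns 0 on success and -1 if the output was
 * truncated or formatting failed. *off never exceeds size - 1, so the
 * remaining space size - *off always stays positive.
 */
static int buf_append(char *buf, size_t size, size_t *off, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf != NULL && off != NULL && size > 0 && *off < size);

	va_start(ap, fmt);
	n = vsnprintf(buf + *off, size - *off, fmt, ap);
	va_end(ap);

	if (n < 0)
		return -1;		/* formatting error, offset unchanged */

	if ((size_t)n >= size - *off) {
		*off = size - 1;	/* truncated: park on the final NUL */
		return -1;
	}

	*off += (size_t)n;
	return 0;
}
```

A caller would check the return value after each field and stop appending on the first -1, then report that the record did not fit instead of continuing with a corrupted length.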
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_60b - CDEBUG_LIMIT not limiting messages (103)