[LU-16800] Improve Lustre API error reporting Created: 05/May/23  Updated: 16/Jul/23

Status: In Progress
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Alexandre Ioffe Assignee: Alexandre Ioffe
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Blocker
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
  • Add rate control for info and error messages
  • Add the ability to print the file name, function name, and line number where the message originates
  • Stop splitting one message across several printf() calls. In a multithreaded application, such a split can leave the parts of one error message on widely separated lines in the log
  • Fix this code:
    lustre\utils\liblustreapi.c : error_callback_default()
	/*
	 * Remove trailing linefeed so error string can be appended.
	 * @fmt is a const string, so we can't modify it directly.
	 */
	if (has_nl && (newfmt = strdup(fmt)))
		*strrchr(newfmt, '\n') = '\0';

This code trims the end of the line if the format contains more than one '\n'.
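One possible way to make the stripping independent of where strrchr() lands is to inspect only the last character of the format. The sketch below assumes the intent is to drop a single newline that ends the format and to leave embedded newlines alone. The helper name dup_without_trailing_nl() is hypothetical and is not part of liblustreapi.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of liblustreapi): return a heap copy of
 * @fmt with one trailing '\n' removed. Newlines elsewhere in the string
 * are left untouched. Returns NULL on allocation failure; the caller
 * frees the result.
 */
static char *dup_without_trailing_nl(const char *fmt)
{
	size_t len = strlen(fmt);
	char *copy = malloc(len + 1);

	if (copy == NULL)
		return NULL;

	memcpy(copy, fmt, len + 1);	/* includes the terminating NUL */
	if (len > 0 && copy[len - 1] == '\n')
		copy[len - 1] = '\0';

	return copy;
}
```

Because the check looks at the final byte only, a format such as "first\nsecond" is copied unchanged, and "first\nsecond\n" loses just its last newline.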

 

We can use the level parameter to pass file/function/line format specifiers.
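As a rough illustration of the file/line and single-call items from the description, the sketch below builds the whole message in one buffer and hands it to stdio in one call. All names here (demo_format_msg(), DEMO_LOG, DEMO_MSG_MAX) are hypothetical and do not exist in liblustreapi. For simplicity it passes the file and line directly rather than encoding the choice in the level parameter.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical buffer size for one message. */
#define DEMO_MSG_MAX 512

/*
 * Format "file:line:" followed by the caller's message into @buf.
 * Returns the length the full message needs (as snprintf() does),
 * or a negative value on a formatting error.
 */
static int demo_format_msg(char *buf, size_t size, const char *file, int line,
			   const char *fmt, ...)
{
	va_list ap;
	int off, rc;

	off = snprintf(buf, size, "%s:%d:", file, line);
	if (off < 0 || (size_t)off >= size)
		return off;	/* error, or prefix alone filled the buffer */

	va_start(ap, fmt);
	rc = vsnprintf(buf + off, size - (size_t)off, fmt, ap);
	va_end(ap);

	return rc < 0 ? rc : off + rc;
}

/*
 * Capture the call site and emit the message with a single fputs().
 * ##__VA_ARGS__ is a GNU extension, in the same spirit as the ##args
 * form used elsewhere in this ticket.
 */
#define DEMO_LOG(fmt, ...)						\
do {									\
	char demo_buf[DEMO_MSG_MAX];					\
									\
	demo_format_msg(demo_buf, sizeof(demo_buf), __FILE__, __LINE__,	\
			fmt, ##__VA_ARGS__);				\
	fputs(demo_buf, stderr);					\
} while (0)
```

POSIX stdio calls lock the stream for the duration of each call, so emitting one message per call should keep it contiguous among threads of the same process. Messages longer than DEMO_MSG_MAX are truncated in this sketch.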

All uses of the level parameter need to be reviewed. Review this code:

lustre\utils\liblustreapi_hsm.c :  llapi_hsm_log_error()

 

    real_level = level & LLAPI_MSG_NO_ERRNO;
    real_level = real_level > 0 ? level - LLAPI_MSG_NO_ERRNO : level;
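If more flag bits are packed into level, as proposed above, subtracting LLAPI_MSG_NO_ERRNO would leave the other flags in real_level. The sketch below contrasts the subtraction with a severity mask. The constants are stand-ins for illustration only (DEMO_MSG_FILE in particular is a made-up flag value); the real definitions are in lustreapi.h and should be checked before drawing conclusions.

```c
/*
 * Stand-in constants for illustration only. The real definitions live in
 * lustreapi.h and may differ from the values assumed here.
 */
enum {
	DEMO_MSG_INFO     = 5,		/* a severity value */
	DEMO_MSG_MASK     = 0x07,	/* low bits hold the severity */
	DEMO_MSG_NO_ERRNO = 0x10,	/* flag bit above the severity bits */
	DEMO_MSG_FILE     = 0x20,	/* hypothetical additional flag */
};

/* Same logic as the quoted llapi_hsm_log_error() lines. */
static int demo_subtract_no_errno(int level)
{
	int real_level = level & DEMO_MSG_NO_ERRNO;

	return real_level > 0 ? level - DEMO_MSG_NO_ERRNO : level;
}

/* Alternative: keep only the severity bits, whatever flags are set. */
static int demo_severity(int level)
{
	return level & DEMO_MSG_MASK;
}
```

With level = DEMO_MSG_INFO | DEMO_MSG_NO_ERRNO | DEMO_MSG_FILE, the subtraction yields DEMO_MSG_INFO | DEMO_MSG_FILE, while the mask yields DEMO_MSG_INFO.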
 

 



 Comments   
Comment by Alexandre Ioffe [ 26/May/23 ]

To test and demonstrate different aspects of the new error logging API, I have integrated it into lamigo/lpurge. This example does not include rate control.
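For reference, rate control could take the form of a simple fixed-window limiter. The following is only a sketch of that idea, not the design under review: struct demo_ratelimit and demo_ratelimit_ok() are hypothetical names, and the caller supplies the current time so the logic can be exercised without a real clock.

```c
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical limiter: allow at most @burst messages in each window of
 * @interval seconds. Not thread-safe; a real version would need a lock
 * or atomics around the counters.
 */
struct demo_ratelimit {
	time_t   window_start;	/* start of the current window */
	unsigned interval;	/* window length in seconds */
	unsigned burst;		/* messages allowed per window */
	unsigned printed;	/* messages allowed so far in this window */
	unsigned suppressed;	/* messages dropped so far in this window */
};

/* Return true if a message may be printed at time @now. */
static bool demo_ratelimit_ok(struct demo_ratelimit *rl, time_t now)
{
	if (now - rl->window_start >= (time_t)rl->interval) {
		/* The window has expired: start a new one. */
		rl->window_start = now;
		rl->printed = 0;
		rl->suppressed = 0;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	rl->suppressed++;
	return false;
}
```

A caller would check demo_ratelimit_ok(&rl, time(NULL)) before formatting a message, and could report the suppressed count when a new window begins.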

Here are samples (subject to further correction as needed): 

Macros used:

LLAPI_PRINTF((LLAPI_MSG_DEBUG | LX_PRINTF_OPT), lx_log_prefix, 0, fmt, ##args)

and

LLAPI_PRINTF((LLAPI_MSG_INFO | LX_PRINTF_OPT),  lx_log_prefix, 0, fmt, ##args)

 

 

LX_PRINTF_OPT = LLAPI_MSG_NO_ERRNO | LLAPI_MSG_PREFIX | LLAPI_MSG_SEVERITY | LLAPI_MSG_TIMEOFDAY | LLAPI_MSG_FILE | LLAPI_MSG_LINE

lx_log_prefix - custom prefix "testfs-MDT0000"

 

1685086614.273859 testfs-MDT0000:DEBUG:lamigo.c:3527:sync hot to fast [0x200000401:0x3ff1:0x0]: H: 0/1, P: 1/0, L 1, I 0
1685086594.896799 testfs-MDT0000:DEBUG:lamigo_alr.c:192:keepalive msg from host:'ost-centOS8'
1685086622.056213 testfs-MDT0000:INFO:lamigo.c:3355:received signal 15, exiting

The same as above, but without LLAPI_MSG_FILE | LLAPI_MSG_LINE:

 

1685127033.955474 testfs-MDT0000:DEBUG:sync hot to fast [0x200000401:0x5:0x0]: H: 0/1, P: 0/1, L 1, I 0

 

 

Backward-compatible calls (preceded by llapi_set_command_name(opt.o_mdtname)). The output should be the same as the original:

 

llapi_error(LLAPI_MSG_INFO, 0, fmt, ##args);

 

 

lamigo testfs-MDT0000: sync hot to fast [0x200000401:0x5:0x0]: H: 0/1, P: 0/1, L 1, I 0 : Success (0)

 

 

 

llapi_err_noerrno(LLAPI_MSG_INFO, fmt, ##args);

 

 

lamigo testfs-MDT0000: sync hot to fast [0x200000401:0x5:0x0]: H: 0/1, P: 0/1, L 1, I 0

 

 

 

llapi_printf(LLAPI_MSG_INFO, fmt, ##args);

 

 

sync hot to fast [0x200000401:0x5:0x0]: H: 0/1, P: 0/1, L 1, I 0

 

 

Comment by Alexandre Ioffe [ 16/Jul/23 ]

Review in 6.0

https://review.whamcloud.com/c/ex/lustre-release/+/50859/

 
