[LU-14475] Make too short log messages meanful Created: 27/Feb/21  Updated: 10/Oct/21  Resolved: 10/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: Lustre 2.15.0

Type: Improvement Priority: Minor
Reporter: Feng Lei Assignee: Feng Lei
Resolution: Fixed Votes: 0
Labels: easy

Issue Links:
Related
is related to LU-14431 error messages that do not stop with ... Resolved
Rank (Obsolete): 9223372036854775807

 Description   

These log messages are too short. Rewrite them in a more meanful way.

max keywords length is [3]:

lustre/osd-ldiskfs/osd_iam_lvar.c(327): %#x < %#x
lustre/obdclass/lu_object.c(2284): [%d]: %p %x (%p,%p,%p) %d %d "%s"@%p

max keywords length is [4]:

lnet/lnet/api-ni.c(4236): rc2 %d
libcfs/include/libcfs/libcfs_private.h(119): LBUG

max keywords length is [6]:

lnet/klnds/o2iblnd/o2iblnd.c(164): Short %s: %d(%d)
lnet/klnds/o2iblnd/o2iblnd.c(275): Short %s: %d(%d)
lustre/osd-ldiskfs/osd_iam_lvar.c(279): used: %d

max keywords length is [7]:

lustre/ptlrpc/gss/gss_rawobj.c(121): buflen %u < %u
lustre/ptlrpc/gss/gss_rawobj.c(139): buflen %u
lustre/ptlrpc/gss/gss_rawobj.c(155): buflen %u < %u
lustre/ptlrpc/gss/gss_rawobj.c(230): buflen %u < %u
lustre/ptlrpc/gss/gss_pipefs.c(793): buflen %u < %u
lustre/target/tgt_grant.c(571): %s: cli %s/%p dirty %ld pend %ld grant %ld
lustre/target/tgt_grant.c(867): %s: cli %s/%p dirty %ld pend %ld grant %ld
lustre/obdclass/obd_mount_server.c(1618): no obd %s
lustre/obdclass/class_obd.c(596): do_div(%#llx,256) %llu != %llu
lustre/obdecho/echo_client.c(1817): lookup %s: rc = %d

max keywords length is [8]:

lustre/ptlrpc/pack_generic.c(584): buffer %d length %d
lustre/include/lustre_fid.h(778): Bad %llu to set %s
lustre/osp/osp_sync.c(153): Bad %llu to set %s
lustre/osd-ldiskfs/osd_iam_lvar.c(155): %p %8.8x "%*.*s"
lustre/osd-ldiskfs/osd_io.c(1224): ino %lu, blk %llu create %d: rc %d
lustre/ofd/ofd_fs.c(187): Bad %llu to set %s
lustre/ofd/ofd_obd.c(1157): %s: Bad %llu to set %s : rc %d
lustre/obdclass/lu_ref.c(96): lu_ref: %p %d %d %s:%d
lustre/llite/symlink.c(77): %s: inode %s: rc = %d
lustre/osc/osc_object.c(380): Bad %llu to set %s : rc %d

max keywords length is [9]:

lnet/klnds/socklnd/socklnd_proto.c(613): Bad nips %d from ip %pI4h
lnet/klnds/socklnd/socklnd_proto.c(683): Bad nips %d from ip %pI4h
lustre/ptlrpc/nrs_tbf.c(995): jobids \{%s} illegal
lustre/ptlrpc/nrs_tbf.c(1199): nids \{%s} illegal
lustre/ptlrpc/nrs_tbf.c(2663): %ss \{%s} illegal
lustre/target/tgt_grant.c(967): %s: cli %s/%p grant %ld want %llu current %llu
lustre/obdclass/class_obd.c(591): do_div(%#llx,256) != %llu
lustre/obdclass/obd_config.c(559): setup %s failed (%d)
libcfs/include/libcfs/libcfs_private.h(75): ASSERTION

max keywords length is [10]:

lnet/selftest/rpc.c(433): LNet%s(%s, %d, %lld) failed: %d
lnet/selftest/rpc.c(1058): RPC %p: bulk %p, service %d
lnet/selftest/rpc.c(1461): rpcev %p, crpc %p, reqstev %p, replyev %p, bulkev %p
lnet/selftest/ping_test.c(147): Bad magic %u from %s, %u expected.
lnet/selftest/ping_test.c(156): Bad seq %u from %s, %u expected.
lnet/lnet/api-ni.c(3245): net %s not found
lnet/lnet/api-ni.c(3277): nid %s not found
lnet/lnet/lib-ptl.c(769): Active ME %p on exit
lnet/lnet/lib-move.c(4333): net %s not found
lnet/klnds/socklnd/socklnd_proto.c(635): Zero IP[%d] from ip %pI4h
lnet/klnds/socklnd/socklnd_proto.c(705): Zero IP[%d] from ip %pI4h
lnet/klnds/o2iblnd/o2iblnd_cb.c(614): Can't map %u bytes (%u/%u)s: %d
lustre/ptlrpc/niobuf.c(95): LNetPut(%s, %d, %lld) failed: %d
lustre/ptlrpc/gss/sec_gss.c(2018): sec size %d too small
lustre/ptlrpc/gss/gss_cli_upcall.c(270): obd %s not setup
lustre/ptlrpc/gss/gss_krb5_mech.c(567): bad filler
lustre/mgs/mgs_llog.c(205): Insane cfg
lustre/mgs/mgs_llog.c(629): max index %d exceeded.
lustre/mgs/mgs_llog.c(872): Insane cfg
lustre/mgs/mgs_llog.c(2109): Insane cfg
lustre/mgs/mgs_llog.c(3700): Insane cfg
lustre/include/obd_class.h(428): Device %d not setup
lustre/include/obd_class.h(478): %s: no %s operation
lustre/include/obd_class.h(501): %s: no %s operation
lustre/include/obd_class.h(564): %s: no %s operation
lustre/include/obd_class.h(670): %s: no %s operation
lustre/include/obd_class.h(693): %s: no %s operation
lustre/include/obd_class.h(713): %s: no %s operation
lustre/include/obd_class.h(733): %s: no %s operation
lustre/include/obd_class.h(754): %s: no %s operation
lustre/include/obd_class.h(772): %s: no %s operation
lustre/include/obd_class.h(788): %s: no %s operation
lustre/include/obd_class.h(827): %s: no %s operation
lustre/include/obd_class.h(872): %s: no %s operation
lustre/include/obd_class.h(917): %s: no %s operation
lustre/include/obd_class.h(931): %s: no %s operation
lustre/include/obd_class.h(944): %s: no %s operation
lustre/include/obd_class.h(959): %s: no %s operation
lustre/include/obd_class.h(974): %s: no %s operation
lustre/include/obd_class.h(1064): %s: no %s operation
lustre/include/obd_class.h(1129): %s: no %s operation
lustre/include/obd_class.h(1154): %s: no %s operation
lustre/include/obd_class.h(1176): %s: no %s operation
lustre/include/obd_class.h(1260): %s: no %s operation
lustre/include/lustre_fid.h(864): bad IGIF, %s
lustre/mdc/mdc_request.c(2317): Bad magic %x!=%x
lustre/mdc/mdc_locks.c(91): it disp: %X, status: %d
lustre/osd-ldiskfs/osd_compat.c(99): found %s (%lu/%u) is mode %o
lustre/fld/fld_request.c(108): exp: 0x%p (%s), srv: 0x%p (%s), idx: %llu
lustre/obdclass/genops.c(491): Cleanup %s returned %d
lustre/obdclass/llog_test.c(846): 5e: found %d records
lustre/obdclass/llog_test.c(859): 5f: found %d records
lustre/obdclass/obd_config.c(610): Device %d not setup
lustre/llite/file.c(4241): %s: migrate %s, but FID %s is insane
lustre/mdd/mdd_device.c(1879): Bad magic %x != %x

max keywords length is [11]:

lnet/lnet/lib-move.c(2217): no path to %s from net %s
lnet/klnds/o2iblnd/o2iblnd_cb.c(3272): %s: REJECTED %d
lnet/klnds/o2iblnd/o2iblnd.c(222): Bad magic: %08x
lustre/ptlrpc/gss/gss_pipefs.c(971): failed msg %p (seq %u, uid %u, svc %u, nid %#llx, obd %.*s): errno %d
lustre/ptlrpc/gss/gss_krb5_mech.c(1125): odd offset %d in page %d
lustre/mgs/mgs_llog.c(4301): err %d on param '%s'
lustre/lod/lod_lov.c(239): can't find %s device
lustre/lod/lod_lov.c(548): can't find %s device
lustre/lod/lod_lov.c(1095): %s: bad idx: %d of %d
lustre/include/obd_class.h(413): NULL device
lustre/include/obd_class.h(1190): NULL device
lustre/lov/lov_obd.c(134): Target %s not set up
lustre/lov/lov_dev.c(344): Target %s not set up
lustre/ldlm/ldlm_lib.c(240): connection %s not found
lustre/target/out_handler.c(915): %s: undo for %s:%d: rc = %d
lustre/target/update_recovery.c(1415): %s: undo for %s:%d: rc = %d
lustre/fld/fld_cache.c(347): NEW range =%s curr = %s
lustre/obdclass/lu_ref.c(98): link: %s %p
lustre/obdclass/obd_config.c(667): Precleanup %s returned %d
lustre/mdd/mdd_orphans.c(455): %s: bad FID %s cleaning '%s'

max keywords length is [12]:

lnet/lnet/lib-move.c(2015): no route to %s from %s
lnet/klnds/gnilnd/gnilnd.c(1570): bad command %d
lnet/klnds/o2iblnd/o2iblnd.c(250): Bad checksum
lustre/ptlrpc/gss/gss_pipefs.c(931): refresh ctx %p(uid %d) failed: %d/0x%08x: %s
lustre/ptlrpc/gss/sec_gss.c(2375): gss version %u, expect %u
lustre/ptlrpc/gss/sec_gss.c(2737): proc %d not support
lustre/ptlrpc/gss/gss_svc_upcall.c(1019): handle size %u too large
lustre/ptlrpc/gss/gss_cli_upcall.c(265): no such obd %s
lustre/ptlrpc/gss/gss_cli_upcall.c(276): obd %s has stopped
lustre/ptlrpc/gss/gss_krb5_mech.c(555): bad token id
lustre/ptlrpc/sec_bulk.c(902): Invalid svc %u
lustre/mdt/mdt_io.c(577): %s: wrong cmd %d received!
lustre/mgs/mgs_llog.c(409): fsname %s is too long
lustre/fid/fid_store.c(192): %s: Read only %d bytes of %d
lustre/osp/osp_precreate.c(230): %s: can't sync
lustre/osp/osp_md_object.c(511): %s: lookup %s %s wrong size %d
lustre/osp/lwp_dev.c(632): Unknown key %s
lustre/osp/osp_sync.c(1671): %s: can't sync
lustre/lov/lov_obd.c(1317): ost %d is inactive
lustre/lov/lov_dev.c(357): add failed (%d), deleting %s
lustre/lov/lov_ea.c(188): LOV EA %s too small: %zu, need %u
lustre/lov/lov_ea.c(257): %s: OST index %d missing
lustre/target/tgt_grant.c(157): %s: cli %s/%p ted_dirty(%ld) > maxsize(%llu)
lustre/target/tgt_grant.c(263): %s: tot_dirty %llu > maxsize %llu
lustre/ofd/ofd_io.c(949): %s: wrong cmd %d received!
lustre/obdecho/echo.c(364): cleaning up %u pages (%d obdos)
lustre/obdecho/echo.c(440): cleaning up %d pages (%d obdos)
lustre/obdecho/echo_client.c(2373): invalid oid %s
lustre/obdecho/echo_client.c(2421): invalid oid %s
lustre/mdd/mdd_object.c(2511): %s: restore %s HSM error: %d/%d
lustre/mgc/mgc_request.c(337): logname %s is too long
libcfs/include/libcfs/libcfs_fail.h(165): cfs_race id %x sleeping
libcfs/include/libcfs/libcfs_fail.h(192): cfs_race id %x sleeping

max keywords length is [13]:

lnet/selftest/conrpc.c(1217): Out of memory
lnet/selftest/conrpc.c(1263): Out of memory
lnet/selftest/brw_test.c(344): BRW RPC to %s failed with %d
lnet/selftest/console.c(414): Out of memory
lnet/selftest/console.c(483): Out of memory
lnet/selftest/console.c(1552): Out of memory
lnet/selftest/console.c(1948): Out of memory
lnet/selftest/console.c(1971): Out of memory
lnet/lnet/config.c(278): Out of Memory
lnet/lnet/lib-move.c(4216): %s, src %s: bad %s payload %d (0 expected)
lnet/klnds/o2iblnd/o2iblnd_cb.c(2718): Can't accept %s: %d
lnet/klnds/o2iblnd/o2iblnd_cb.c(3316): %s DISCONNECTED
lnet/klnds/o2iblnd/o2iblnd.c(229): Bad version: %x
lnet/klnds/o2iblnd/o2iblnd.c(269): Bad src nid: %s
lnet/klnds/o2iblnd/o2iblnd.c(2556): Out of memory
lustre/ptlrpc/sec.c(411): import %p (%s) with no sec
lustre/ptlrpc/gss/sec_gss.c(2414): unknown proc %u
lustre/ptlrpc/gss/gss_krb5_mech.c(572): bad EC or RRC
lustre/ptlrpc/gss/gss_krb5_mech.c(853): page %d: odd offset %u len %u, blocksize %d
lustre/ptlrpc/gss/gss_krb5_mech.c(1301): odd bodysize %d
lustre/ptlrpc/gss/gss_krb5_mech.c(1369): error decrypt
lustre/ptlrpc/nrs.c(292): NRS: arg '%s' is too long
lustre/ptlrpc/pack_generic.c(575): buffer %d length %d is not valid
lustre/ptlrpc/service.c(2917): cannot start %s thread #%d_%d: rc %d
lustre/ptlrpc/sec_config.c(475): can't merge: %d
lustre/ptlrpc/sec_bulk.c(893): Invalid type %u
lustre/mdt/mdt_identity.c(92): no upcall set
lustre/mdt/mdt_handler.c(3580): Invalid fid: %s
lustre/osp/osp_md_object.c(521): %s: lookup %s %s invalid fid %s
lustre/osp/osp_sync.c(1219): can't get llh
lustre/osd-zfs/osd_handler.c(918): %s: can't open %s
lustre/osd-zfs/osd_index.c(1059): %s: OIC for %s isn't filled
lustre/lmv/lmv_obd.c(299): target %s not attached
lustre/osd-ldiskfs/osd_handler.c(6133): %s: OIC for %s isn't filled
lustre/osd-ldiskfs/osd_scrub.c(1758): Fail to find %.*s in %.*s (%lu/%u): rc = %d
lustre/osd-ldiskfs/osd_quota.c(587): Too many(%d) trans qids!
lustre/osd-ldiskfs/osd_oi.c(323): %s: can't open %s: rc = %d
lustre/osd-ldiskfs/osd_oi.c(463): %s: can't open %s: rc = %d
lustre/osd-ldiskfs/osd_oi.c(604): %s: lookup FLD %s: rc = %d
lustre/osd-ldiskfs/osd_io.c(1872): %s: can't read %u@%llu on ino %lu: rc = %ld
lustre/ldlm/ldlm_resource.c(940): Unknown type %d for ns %s
lustre/target/tgt_lastrcvd.c(1314): %s: client idx %d has offset %lld
lustre/target/tgt_handler.c(218): %s: short ioobj
lustre/obdclass/obd_mount_server.c(303): can't set_fs %d
lustre/obdclass/llog_test.c(690): 4: close log %s failed: %d
lustre/obdclass/class_obd.c(481): Device %d not attached
lustre/obdclass/obd_mount.c(202): %s setup error %d
lustre/obdclass/obd_config.c(455): Device %d not attached
lustre/obdclass/obd_config.c(572): OBD device %d still set up
lustre/obdclass/obd_config.c(579): OBD device %d not attached
lustre/obdclass/obd_config.c(2090): empty cleanup
lustre/mdd/mdd_device.c(374): %s: cancel idx %u of catalog %s: rc = %d
lustre/mdd/mdd_dir.c(1208): link_ea add '%.*s' failed %d %s
lustre/mdd/mdd_dir.c(1212): link_ea del '%.*s' failed %d %s
lustre/osc/osc_request.c(658): dirty %lu - %lu > dirty_max %lu
lustre/osc/osc_request.c(674): dirty %lu - dirty_max %lu too big???
lustre/osc/osc_cache.c(671): max_pages: %#x chunkbits: %u chunk_mask: %#lx

max keywords length is [14]:

lnet/selftest/rpc.c(1060): Event: status %d, type %d, lnet %d
lnet/selftest/rpc.c(1536): Dropping RPC (%s) from %s: status %d mlength %d type %u magic %u.
lnet/selftest/brw_test.c(389): Bulk transfer %s %s has failed: %d
lnet/lnet/api-ni.c(4208): %s: Short reply %d(%d min)
lnet/lnet/api-ni.c(4218): %s: Short reply %d(%d expected)
lnet/lnet/router.c(486): Routes to %s via %s and %s not supported
lnet/lnet/config.c(366): Duplicate net %s. Ignore
lnet/lnet/lib-move.c(1966): can't send to %s. no NI on %s
lnet/lnet/lib-move.c(2971): Error sending %s to %s: %d
lnet/lnet/lib-move.c(4231): %s, src %s: bad %s payload %d (%d max expected)
lnet/lnet/lib-move.c(4690): Dropping PUT (%llu:%d:%s): MD (%d) invalid
lnet/lnet/lib-move.c(4910): Dropping GET (%llu:%d:%s): MD (%d) invalid
lnet/lnet/lib-socket.c(298): Error %d getting sock %s IP/port
lnet/klnds/socklnd/socklnd.c(1239): Route %s %pI4h connected to %pI4h
lnet/klnds/socklnd/socklnd_cb.c(1212): [%p] Error %d on read from %s ip %pI4h:%d
lnet/klnds/o2iblnd/o2iblnd_cb.c(526): Stale rx from %s
lnet/klnds/o2iblnd/o2iblnd_cb.c(3359): Timed out tx: %s, %lld seconds
lustre/ptlrpc/gss/gss_rawobj.c(165): fail to alloc %u bytes
lustre/ptlrpc/gss/gss_keyring.c(1336): invalid: data %p, len %lu
lustre/ptlrpc/gss/sec_gss.c(1088): gss backend 0x%x not found
lustre/ptlrpc/gss/gss_crypto.c(76): failed to set %s key, len %d, rc = %d
lustre/ptlrpc/gss/gss_crypto.c(456): encrypt error %d
lustre/ptlrpc/nrs.c(726): Policy %s is busy with %d references
lustre/ptlrpc/pack_generic.c(2373): Invalid magic %#x
lustre/mdt/mdt_handler.c(2306): Invalid hash: %#llx != %#llx
lustre/mdt/mdt_handler.c(5600): param is empty
lustre/mgs/mgs_llog.c(967): %s: modify %s/%s failed: rc = %d
lustre/mgs/mgs_llog.c(2028): %s: writing log %s: rc = %d
lustre/mgs/mgs_llog.c(3732): param is empty
lustre/fid/fid_store.c(230): %s: Can't find "%s" obj %d
lustre/lod/lod_lov.c(120): %s: qos_del_tgt(%s) failed:rc = %d
lustre/lod/lod_lov.c(565): %s: device %d is not set up
lustre/lod/lod_object.c(3605): Invalid magic %x
lustre/lod/lod_dev.c(132): %s: invalid FID %s
lustre/lod/lod_dev.c(2034): %s: disconnect #%d
lustre/include/obd_class.h(1284): cleaned up obd
lustre/include/lustre_fid.h(871): Too large OID %#llx to set IDIF %s
lustre/include/lustre_fid.h(880): Too large OID %#llx to set REG %s
lustre/include/obd_support.h(818): vmalloc of '%s' (%d bytes) failed
lustre/osd-zfs/osd_xattr.c(1131): %s: free xattr %s failed: rc = %d
lustre/osd-zfs/osd_xattr.c(1140): %s: free xattr %llu failed: rc = %d
lustre/osd-zfs/osd_handler.c(958): %s: lookup for %s failed: rc = %d
lustre/osd-zfs/osd_index.c(409): wrong parent: %s != %s
lustre/osd-zfs/osd_index.c(1209): %s: zap_remove %s failed: rc = %d
lustre/osd-zfs/osd_object.c(610): %s: lookup %s/%#llx failed: rc = %d
lustre/mdc/mdc_request.c(531): validate acl: %d
lustre/mdc/mdc_request.c(1503): %s: page error: %s at %llu: rc %d
lustre/lmv/lmv_fld.c(61): %s: invalid FID %s
lustre/lmv/lmv_obd.c(172): %sactivation of %s failed: %d
lustre/lmv/lmv_obd.c(308): target %s is not set up
lustre/lmv/lmv_obd.c(699): %s: invalid FID %s: rc = %d
lustre/osd-ldiskfs/osd_handler.c(2460): ** setting %s read-only **
lustre/osd-ldiskfs/osd_handler.c(7696): %s: can't mount %s: %d
lustre/osd-ldiskfs/osd_io.c(207): bi_next: %p, bi_flags: %lx, %sbi_rw: %lu,%sbi_opf: %u,%sbi_vcnt: %d, bi_idx: %d, bi->size: %d, bi_end_io: %p,bi_cnt: %d, bi_private: %p
lustre/lov/lov_obd.c(469): %s: event %d failed: rc = %d
lustre/ldlm/ldlm_resource.c(124): Failed to set %s, rc = %d.
lustre/target/tgt_lastrcvd.c(166): %s: slot %d not allocated
lustre/target/out_handler.c(142): %s: invalid fid %s: rc = %d
lustre/target/out_handler.c(549): %s: invalid FID %s: rc = %d
lustre/target/out_handler.c(611): %s: invalid FID %s: rc = %d
lustre/target/out_handler.c(1073): %s: invalid FID %s: rc = %d
lustre/target/tgt_grant.c(260): %s: tot_granted %llu > maxsize %llu
lustre/fld/fld_index.c(375): %s: Can't find "%s" obj %d
lustre/ofd/ofd_dev.c(373): param is empty
lustre/obdclass/llog.c(1406): %s: backup llog %s to itself (%s), objects %p/%p
lustre/obdclass/llog_obd.c(209): %s: ctxt %d lop_setup=%p failed: rc = %d
lustre/obdclass/llog_test.c(133): 1b: close log %s failed: %d
lustre/obdclass/llog_test.c(867): 5g: close log %s failed: %d
lustre/obdclass/llog_test.c(1244): 8a: close log %s failed: %d
lustre/obdclass/llog_test.c(1275): 8b: close log %s failed: %d
lustre/obdclass/llog_test.c(1316): 8d: close log %s failed: %d
lustre/obdclass/llog_test.c(2090): 10: close log %s failed: %d
lustre/obdclass/obd_mount.c(197): %s attach error %d
lustre/obdclass/obd_mount.c(553): del MDC UUID %s failed: rc = %d
lustre/obdclass/obd_config.c(2119): detach failed %d: %s
lustre/obdecho/echo.c(92): can't connect %d
lustre/obdecho/echo.c(512): %s: invalid seq %#llx
lustre/obdecho/echo_client.c(2199): lookup %s under parent %s: rc = %d
lustre/llite/vvp_io.c(869): Wrong IO type %u
lustre/llite/statahead.c(1320): %s: reading dir %s at %lluopendir_pid = %u : rc = %d
lustre/llite/dir.c(1417): %s: lookup %.*s failed: rc = %d
lustre/llite/namei.c(93): Invalid inode %s type
lustre/osc/osc_request.c(1797): Unexpected rc %d (%d requested)
lustre/osc/osc_request.c(1803): Unexpected rc %d (%d transferred)
lustre/mgc/mgc_request.c(65): missing name: %s

max keywords length is [15]:

lnet/selftest/brw_test.c(262): Bulk page %p (%d/%d) is corrupted!
lnet/selftest/brw_test.c(370): Bulk data from %s is corrupted!
lnet/selftest/brw_test.c(427): Bulk data from %s is corrupted!
lnet/selftest/ping_test.c(132): Unable to ping %s (%d): %d
lnet/selftest/ping_test.c(190): Unexpect magic %08x from %s
lnet/lnet/api-ni.c(2275): Can't load LND %s, module %s, rc=%d
lnet/lnet/api-ni.c(4113): Can't bind MD: %d
lnet/lnet/router.c(998): Can't bind MD: %d
lnet/lnet/lib-ptl.c(575): Invalid portal %d not in [0-%d]
lnet/lnet/lib-move.c(3524): Can't bind MD: %d
lnet/klnds/socklnd/socklnd_proto.c(295): ksnc_type bad: %u
lnet/klnds/socklnd/socklnd_proto.c(329): ksnc_type bad: %u
lnet/klnds/o2iblnd/o2iblnd_cb.c(767): Can't find page
lnet/klnds/o2iblnd/o2iblnd_cb.c(1752): Can't allocate %s txd for %s
lnet/klnds/o2iblnd/o2iblnd_cb.c(2572): Can't accept %s: message size %d too big (%d max)
lnet/klnds/o2iblnd/o2iblnd.c(213): Short message: %d
lnet/klnds/o2iblnd/o2iblnd.c(234): Short message: %d
lnet/klnds/o2iblnd/o2iblnd.c(862): cmid HCA(%s), kib_dev(%s) need failover
lnet/klnds/o2iblnd/o2iblnd.c(2342): Can't allocate %d tx descriptors
lnet/klnds/o2iblnd/o2iblnd.c(2715): Failed to bind %s:%pI4h to device(%p): %d
lnet/klnds/o2iblnd/o2iblnd.c(2784): Failed to bind %s:%pI4h to device(%p): %d
lustre/ptlrpc/gss/gss_pipefs.c(864): fail to get seq
lustre/ptlrpc/gss/sec_gss.c(2360): buf count only %d
lustre/ptlrpc/gss/gss_krb5_mech.c(638): bad krb5 header
lustre/ptlrpc/gss/gss_krb5_mech.c(1280): bad krb5 header
lustre/ptlrpc/gss/gss_krb5_mech.c(1448): bad krb5 header
lustre/ptlrpc/pack_generic.c(566): message length %d too small for %d buflens
lustre/ptlrpc/pack_generic.c(582): len: %d, required_len %d, bufcount: %d
lustre/ptlrpc/sec_config.c(84): unknown target %p(%s)
lustre/ptlrpc/sec_config.c(663): can't find conf
lustre/mdt/mdt_open.c(230): %s: replay trans %llu NID %s: rc = %d
lustre/mgs/mgs_llog.c(3539): Invalid param: %s
lustre/mgs/mgs_llog.c(3726): svname is empty
lustre/mgs/mgs_llog.c(5488): %s is not defined
lustre/lod/lod_qos.c(213): %s: statfs: rc = %d
lustre/lod/lod_object.c(6125): %s: can't locate %s:rc = %d
lustre/lod/lod_dev.c(1013): %s: can not find %s: rc = %d
lustre/lod/lod_dev.c(1030): %s: can not find %s: rc = %d
lustre/include/lustre_debug.h(53): bad page index %lu > %llu
lustre/osp/osp_object.c(744): %s: update error %s: rc = %d
lustre/osp/osp_sync.c(1414): %s: the catid %s for init llog %d is bad
lustre/osd-zfs/osd_xattr.c(1126): %s: lookup xattr %s failed: rc = %d
lustre/osd-zfs/osd_handler.c(1229): %s: lost %d loaned abuf(s)
lustre/osd-zfs/osd_handler.c(1232): %s: lost %d pinned dbuf(s)
lustre/osd-zfs/osd_object.c(887): %s: zap_remove(%s) failed: rc = %d
lustre/lmv/lmv_obd.c(315): target %s connect error %d
lustre/osd-ldiskfs/osd_handler.c(1727): transaction @0x%p commit error: %d
lustre/osd-ldiskfs/osd_io.c(1140): initial space: %lu:%u
lustre/lov/lov_obd.c(174): Target %s connect error %d
lustre/ldlm/ldlm_lockd.c(2190): unknown opcode %u
lustre/ldlm/ldlm_lockd.c(2368): invalid opcode %d
lustre/target/tgt_lastrcvd.c(1208): %s: can't update %s file: rc = %d
lustre/target/tgt_grant.c(257): %s: tot_pending %llu > tot_granted %llu
lustre/target/tgt_grant.c(1444): %s: cli %s/%p ted_pending(%lu) < grant_used(%lu)
lustre/target/tgt_grant.c(1453): %s: cli %s/%p tot_granted(%llu) < grant_used(%lu)
lustre/target/tgt_grant.c(1462): %s: cli %s/%p tot_pending(%llu) < grant_used(%lu)
lustre/target/tgt_handler.c(95): %s: invalid FID: %s
lustre/target/tgt_handler.c(929): %s: RPC source %s doesn't match %s
lustre/target/tgt_handler.c(1880): %s: sync returns %d
lustre/fld/fld_index.c(108): %s: lookup range %s error: rc = %d
lustre/fld/fld_index.c(202): %s: insert range %s failed: rc = %d
lustre/obdclass/llog.c(1065): %s: loghandle %p with no header
lustre/obdclass/genops.c(330): type %s has refcount (%d)
lustre/obdclass/llog_osd.c(263): %s: bad log %s %s header magic: %#x (expected %#x)
lustre/obdclass/obd_mount_server.c(134): %s not registered
lustre/obdclass/class_obd.c(390): No UUID passed!
lustre/obdclass/obd_mount.c(469): connect failed %d
lustre/obdclass/obd_config.c(376): No type passed!
lustre/obdclass/obd_config.c(382): No name passed!
lustre/obdclass/obd_config.c(387): No UUID passed!
lustre/obdclass/obd_config.c(1264): no device for: %s
lustre/obdclass/obd_config.c(2111): cleanup failed %d: %s
lustre/llite/dir.c(744): unknown magic: %lX
lustre/mdd/mdd_device.c(1874): Bad ioctl size %d
lustre/osc/osc_request.c(1625): %s: sync returns %d
lustre/mgc/mgc_request.c(61): name too long: %s
lustre/mgc/mgc_request.c(1488): MNE too large (%u)


 Comments   
Comment by Andreas Dilger [ 27/Feb/21 ]

Hi Feng Lei,
Thanks for filing this ticket.

It may be that some of your found messages are false positives if the strings are split across lines. Don't go too crazy making very verbose error messages, especially those that are printed frequently, since they may be printed thousands of times on thousands of nodes.

The messages need to be unique and explain enough of the state to understand what I going wrong, but do not need to be complete sentences or use proper grammar. It is better to have a good explanation of the error in Jira (or in the manual for those "normal" errors that happen under regular usage, and best if the cause of the error is fixed and never hit in the first place, or the error condition fixes itself and does not depend on the user to fix it for them.

When you are cleaning up these message strings, please also fix them to match the modern coding style:

  • console error messages should start with the device or subsystem name (eg. "testfs-MDT0000:") and end with the error code ": rc = %d\n"
  • keep error message strings on a single line, even if over 80 chars, to make it easier to search for them in the code
  • replace spaces with tabs to indent lines, including +/-6 lines before/after modified lines (or more lines if there are only a few nearby lines with spaces)
  • other checkpatch warnings on the modified lines

If the patch gets too large (say, over 200 lines) then consider to split it into a few smaller patches and submit each one separately. That reduces the number of conflicts with other patches, and allows some changes to land even if one or two of the patches have conflicts, or they need to be updated a few times to make the messages more clear.

If the patch is only changing white space and code style, then it should use the "Test-Parameters: trivial" label to minimize the testing. Note however that there are some error strings that are checked for while running tests. These are mostly annotated in the code already.

Comment by Li Xi [ 05/Mar/21 ]

I think there need to be some balance on the length of the error messages. For the messages with keyword < 8, the length is obviously too short which means lack of unique and explain.

Don't go too crazy making very verbose error messages, especially those that are printed frequently, since they may be printed thousands of times on thousands of nodes.

I think the most of the error messages are not print frequently. There are about 2800+ error messaeges, I think only 20 of them could be seen frequently in the cases of disconnection or recovery.

Agreed too many long messages is annoying. But I think the major effort should be spent on reducing the number of the error messages rather than the length of it. Ideally, the system prints zero error messages and fixes the problems automatically. But if a error message is printed, it should provide enough information both for the administrators and the developers.

The current error messages might be good enough for developers/debuggers who are reading and modifying the sources everyday, but I think it is less useful for new users/adminitrators who only need to know where to modify/configure and who to get help from. The administrators might be able to search in Google/Stackoverflow/Jira/Maillist, but won't be able to read the codes based on the function name and line number.

Thus, I think the "keyword" need to be long enough to be searchable. For the keyword < 14, I think we are very safe from adding too long string in each individual messages.

The longest error messages in Lustre are:

max keywords length is [115]:
        lustre/osd-ldiskfs/osd_handler.c(7610): %s: device %s extents feature is not guaranteed to work on big-endian systems. Use "bigendian_extents" mount option to override.
max keywords length is [132]:
        lustre/obdecho/echo_client.c(944): Local operations are NOT supported on client side. Only remote operations are supported. Metadata client must be run on server side.
        lustre/obdecho/echo_client.c(2973): Local operations are NOT supported on client side. Only remote operations are supported. Metadata client must be run on server side.
max keywords length is [136]:
        lustre/osd-ldiskfs/osd_handler.c(7706): %s: device %s LDISKFS does not support filesystems greater than 512TB and can cause data corruption. Use "force_over_512tb" mount option to override.
max keywords length is [156]:
        lustre/ptlrpc/nrs.c(1199): NRS: failing to register policy %s. Please check policy flags; external policies cannot act as fallback policies, or be started immediately upon registration without interaction with lprocfs
max keywords length is [415]:
        lustre/osd-ldiskfs/osd_io.c(185): ***** bio->bi_private is NULL!  This should never happen.  Normally, I would crash here, but instead I will dump the bio contents to the console.  Please report this to <https://jira.whamcloud.com/> , along with any interesting messages leading up to this point (like SCSI errors, perhaps).  Because bi_private is NULL, I can't wake up the thread that initiated this IO - you will probably have to reboot this node.

The last one with length of 415 is a little bit too long. The other ones with 100+ lengths still look fine. Reading the message could be finished in just a glance. So if a error message is shorter than 80, I don't think we need to worry at all. And I guess nobody would have time to write message with length of 80 for all 2800 error messages. If he/she does want to, that helps a lot on improving the quality of the codes, rather than making the error message too verbose.

And also, disks are getting bigger and bigger. We don't need worry that the increasing length of messages would flood our disks, do we? We can always archive the logs into a tarball or even use the 1PB Lustre to save them.

Comment by Andreas Dilger [ 05/Mar/21 ]

I'm not only considering the disk space (which can be quite a lot), but also the fact that longer error messages consume more RAM to store and more CPU to process them. That adds overhead for relatively little benefit. I agree that proper error messages that contain the device name, important data values, FID, error code are needed for debugging.

I'm not a fan error messages that are trying to explain and solve the problem, like "NRS: failing to register policy %s. Please check policy flags; external policies cannot act as fallback policies, or be started immediately upon registration without interaction with lprocfs". Having an error message like "testfs-OST0000: failed to register external NRS policy '%s': rc = -22" is enough if there is a good (hopefully much better) explanation in the manual about how to use/configure NRS policies.

The length of the bi_private error message is almost a joke, IMHO. Not only could a shorter message convey the same information, but if it referenced a specific LU or LUDOC ticket in Jira (rather than just a URL to file a bug), like "foofs-OST1234: bi_private is NULL. Unable to complete IO, OST may be unable to unmount. See LU-12345", the user could go directly to the ticket to read a much better/more detailed explanation of the problem.

Comment by Gerrit Updater [ 05/Mar/21 ]

Feng, Lei (flei@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41892
Subject: LU-14475 log: Rewrite some log messages
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 15be2325e7d9924fc251877e847dcb1accc9bf31

Comment by Gerrit Updater [ 10/Oct/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/41892/
Subject: LU-14475 log: Rewrite some log messages
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: da1d93513fdff0a70257b13aa5649e478d4f70b6

Comment by Peter Jones [ 10/Oct/21 ]

Landed for 2.15

Generated at Sat Feb 10 03:10:05 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.