[LU-14475] Make too short log messages meanful Created: 27/Feb/21 Updated: 10/Oct/21 Resolved: 10/Oct/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.0 |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Feng Lei | Assignee: | Feng Lei |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | easy | ||
| Issue Links: |
|
||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
These log messages are too short. Rewrite them in a more meanful way. max keywords length is [3]: lustre/osd-ldiskfs/osd_iam_lvar.c(327): %#x < %#x lustre/obdclass/lu_object.c(2284): [%d]: %p %x (%p,%p,%p) %d %d "%s"@%p max keywords length is [4]: lnet/lnet/api-ni.c(4236): rc2 %d libcfs/include/libcfs/libcfs_private.h(119): LBUG max keywords length is [6]: lnet/klnds/o2iblnd/o2iblnd.c(164): Short %s: %d(%d) lnet/klnds/o2iblnd/o2iblnd.c(275): Short %s: %d(%d) lustre/osd-ldiskfs/osd_iam_lvar.c(279): used: %d max keywords length is [7]: lustre/ptlrpc/gss/gss_rawobj.c(121): buflen %u < %u lustre/ptlrpc/gss/gss_rawobj.c(139): buflen %u lustre/ptlrpc/gss/gss_rawobj.c(155): buflen %u < %u lustre/ptlrpc/gss/gss_rawobj.c(230): buflen %u < %u lustre/ptlrpc/gss/gss_pipefs.c(793): buflen %u < %u lustre/target/tgt_grant.c(571): %s: cli %s/%p dirty %ld pend %ld grant %ld lustre/target/tgt_grant.c(867): %s: cli %s/%p dirty %ld pend %ld grant %ld lustre/obdclass/obd_mount_server.c(1618): no obd %s lustre/obdclass/class_obd.c(596): do_div(%#llx,256) %llu != %llu lustre/obdecho/echo_client.c(1817): lookup %s: rc = %d max keywords length is [8]: lustre/ptlrpc/pack_generic.c(584): buffer %d length %d lustre/include/lustre_fid.h(778): Bad %llu to set %s lustre/osp/osp_sync.c(153): Bad %llu to set %s lustre/osd-ldiskfs/osd_iam_lvar.c(155): %p %8.8x "%*.*s" lustre/osd-ldiskfs/osd_io.c(1224): ino %lu, blk %llu create %d: rc %d lustre/ofd/ofd_fs.c(187): Bad %llu to set %s lustre/ofd/ofd_obd.c(1157): %s: Bad %llu to set %s : rc %d lustre/obdclass/lu_ref.c(96): lu_ref: %p %d %d %s:%d lustre/llite/symlink.c(77): %s: inode %s: rc = %d lustre/osc/osc_object.c(380): Bad %llu to set %s : rc %d max keywords length is [9]: lnet/klnds/socklnd/socklnd_proto.c(613): Bad nips %d from ip %pI4h
lnet/klnds/socklnd/socklnd_proto.c(683): Bad nips %d from ip %pI4h
lustre/ptlrpc/nrs_tbf.c(995): jobids \{%s} illegal
lustre/ptlrpc/nrs_tbf.c(1199): nids \{%s} illegal
lustre/ptlrpc/nrs_tbf.c(2663): %ss \{%s} illegal
lustre/target/tgt_grant.c(967): %s: cli %s/%p grant %ld want %llu current %llu
lustre/obdclass/class_obd.c(591): do_div(%#llx,256) != %llu
lustre/obdclass/obd_config.c(559): setup %s failed (%d)
libcfs/include/libcfs/libcfs_private.h(75): ASSERTION
max keywords length is [10]: lnet/selftest/rpc.c(433): LNet%s(%s, %d, %lld) failed: %d lnet/selftest/rpc.c(1058): RPC %p: bulk %p, service %d lnet/selftest/rpc.c(1461): rpcev %p, crpc %p, reqstev %p, replyev %p, bulkev %p lnet/selftest/ping_test.c(147): Bad magic %u from %s, %u expected. lnet/selftest/ping_test.c(156): Bad seq %u from %s, %u expected. lnet/lnet/api-ni.c(3245): net %s not found lnet/lnet/api-ni.c(3277): nid %s not found lnet/lnet/lib-ptl.c(769): Active ME %p on exit lnet/lnet/lib-move.c(4333): net %s not found lnet/klnds/socklnd/socklnd_proto.c(635): Zero IP[%d] from ip %pI4h lnet/klnds/socklnd/socklnd_proto.c(705): Zero IP[%d] from ip %pI4h lnet/klnds/o2iblnd/o2iblnd_cb.c(614): Can't map %u bytes (%u/%u)s: %d lustre/ptlrpc/niobuf.c(95): LNetPut(%s, %d, %lld) failed: %d lustre/ptlrpc/gss/sec_gss.c(2018): sec size %d too small lustre/ptlrpc/gss/gss_cli_upcall.c(270): obd %s not setup lustre/ptlrpc/gss/gss_krb5_mech.c(567): bad filler lustre/mgs/mgs_llog.c(205): Insane cfg lustre/mgs/mgs_llog.c(629): max index %d exceeded. lustre/mgs/mgs_llog.c(872): Insane cfg lustre/mgs/mgs_llog.c(2109): Insane cfg lustre/mgs/mgs_llog.c(3700): Insane cfg lustre/include/obd_class.h(428): Device %d not setup lustre/include/obd_class.h(478): %s: no %s operation lustre/include/obd_class.h(501): %s: no %s operation lustre/include/obd_class.h(564): %s: no %s operation lustre/include/obd_class.h(670): %s: no %s operation lustre/include/obd_class.h(693): %s: no %s operation lustre/include/obd_class.h(713): %s: no %s operation lustre/include/obd_class.h(733): %s: no %s operation lustre/include/obd_class.h(754): %s: no %s operation lustre/include/obd_class.h(772): %s: no %s operation lustre/include/obd_class.h(788): %s: no %s operation lustre/include/obd_class.h(827): %s: no %s operation lustre/include/obd_class.h(872): %s: no %s operation lustre/include/obd_class.h(917): %s: no %s operation lustre/include/obd_class.h(931): %s: no %s operation lustre/include/obd_class.h(944): %s: no %s operation lustre/include/obd_class.h(959): %s: no %s operation lustre/include/obd_class.h(974): %s: no %s operation lustre/include/obd_class.h(1064): %s: no %s operation lustre/include/obd_class.h(1129): %s: no %s operation lustre/include/obd_class.h(1154): %s: no %s operation lustre/include/obd_class.h(1176): %s: no %s operation lustre/include/obd_class.h(1260): %s: no %s operation lustre/include/lustre_fid.h(864): bad IGIF, %s lustre/mdc/mdc_request.c(2317): Bad magic %x!=%x lustre/mdc/mdc_locks.c(91): it disp: %X, status: %d lustre/osd-ldiskfs/osd_compat.c(99): found %s (%lu/%u) is mode %o lustre/fld/fld_request.c(108): exp: 0x%p (%s), srv: 0x%p (%s), idx: %llu lustre/obdclass/genops.c(491): Cleanup %s returned %d lustre/obdclass/llog_test.c(846): 5e: found %d records lustre/obdclass/llog_test.c(859): 5f: found %d records lustre/obdclass/obd_config.c(610): Device %d not setup lustre/llite/file.c(4241): %s: migrate %s, but FID %s is insane lustre/mdd/mdd_device.c(1879): Bad magic %x != %x max keywords length is [11]: lnet/lnet/lib-move.c(2217): no path to %s from net %s lnet/klnds/o2iblnd/o2iblnd_cb.c(3272): %s: REJECTED %d lnet/klnds/o2iblnd/o2iblnd.c(222): Bad magic: %08x lustre/ptlrpc/gss/gss_pipefs.c(971): failed msg %p (seq %u, uid %u, svc %u, nid %#llx, obd %.*s): errno %d lustre/ptlrpc/gss/gss_krb5_mech.c(1125): odd offset %d in page %d lustre/mgs/mgs_llog.c(4301): err %d on param '%s' lustre/lod/lod_lov.c(239): can't find %s device lustre/lod/lod_lov.c(548): can't find %s device lustre/lod/lod_lov.c(1095): %s: bad idx: %d of %d lustre/include/obd_class.h(413): NULL device lustre/include/obd_class.h(1190): NULL device lustre/lov/lov_obd.c(134): Target %s not set up lustre/lov/lov_dev.c(344): Target %s not set up lustre/ldlm/ldlm_lib.c(240): connection %s not found lustre/target/out_handler.c(915): %s: undo for %s:%d: rc = %d lustre/target/update_recovery.c(1415): %s: undo for %s:%d: rc = %d lustre/fld/fld_cache.c(347): NEW range =%s curr = %s lustre/obdclass/lu_ref.c(98): link: %s %p lustre/obdclass/obd_config.c(667): Precleanup %s returned %d lustre/mdd/mdd_orphans.c(455): %s: bad FID %s cleaning '%s' max keywords length is [12]: lnet/lnet/lib-move.c(2015): no route to %s from %s lnet/klnds/gnilnd/gnilnd.c(1570): bad command %d lnet/klnds/o2iblnd/o2iblnd.c(250): Bad checksum lustre/ptlrpc/gss/gss_pipefs.c(931): refresh ctx %p(uid %d) failed: %d/0x%08x: %s lustre/ptlrpc/gss/sec_gss.c(2375): gss version %u, expect %u lustre/ptlrpc/gss/sec_gss.c(2737): proc %d not support lustre/ptlrpc/gss/gss_svc_upcall.c(1019): handle size %u too large lustre/ptlrpc/gss/gss_cli_upcall.c(265): no such obd %s lustre/ptlrpc/gss/gss_cli_upcall.c(276): obd %s has stopped lustre/ptlrpc/gss/gss_krb5_mech.c(555): bad token id lustre/ptlrpc/sec_bulk.c(902): Invalid svc %u lustre/mdt/mdt_io.c(577): %s: wrong cmd %d received! lustre/mgs/mgs_llog.c(409): fsname %s is too long lustre/fid/fid_store.c(192): %s: Read only %d bytes of %d lustre/osp/osp_precreate.c(230): %s: can't sync lustre/osp/osp_md_object.c(511): %s: lookup %s %s wrong size %d lustre/osp/lwp_dev.c(632): Unknown key %s lustre/osp/osp_sync.c(1671): %s: can't sync lustre/lov/lov_obd.c(1317): ost %d is inactive lustre/lov/lov_dev.c(357): add failed (%d), deleting %s lustre/lov/lov_ea.c(188): LOV EA %s too small: %zu, need %u lustre/lov/lov_ea.c(257): %s: OST index %d missing lustre/target/tgt_grant.c(157): %s: cli %s/%p ted_dirty(%ld) > maxsize(%llu) lustre/target/tgt_grant.c(263): %s: tot_dirty %llu > maxsize %llu lustre/ofd/ofd_io.c(949): %s: wrong cmd %d received! lustre/obdecho/echo.c(364): cleaning up %u pages (%d obdos) lustre/obdecho/echo.c(440): cleaning up %d pages (%d obdos) lustre/obdecho/echo_client.c(2373): invalid oid %s lustre/obdecho/echo_client.c(2421): invalid oid %s lustre/mdd/mdd_object.c(2511): %s: restore %s HSM error: %d/%d lustre/mgc/mgc_request.c(337): logname %s is too long libcfs/include/libcfs/libcfs_fail.h(165): cfs_race id %x sleeping libcfs/include/libcfs/libcfs_fail.h(192): cfs_race id %x sleeping max keywords length is [13]: lnet/selftest/conrpc.c(1217): Out of memory lnet/selftest/conrpc.c(1263): Out of memory lnet/selftest/brw_test.c(344): BRW RPC to %s failed with %d lnet/selftest/console.c(414): Out of memory lnet/selftest/console.c(483): Out of memory lnet/selftest/console.c(1552): Out of memory lnet/selftest/console.c(1948): Out of memory lnet/selftest/console.c(1971): Out of memory lnet/lnet/config.c(278): Out of Memory lnet/lnet/lib-move.c(4216): %s, src %s: bad %s payload %d (0 expected) lnet/klnds/o2iblnd/o2iblnd_cb.c(2718): Can't accept %s: %d lnet/klnds/o2iblnd/o2iblnd_cb.c(3316): %s DISCONNECTED lnet/klnds/o2iblnd/o2iblnd.c(229): Bad version: %x lnet/klnds/o2iblnd/o2iblnd.c(269): Bad src nid: %s lnet/klnds/o2iblnd/o2iblnd.c(2556): Out of memory lustre/ptlrpc/sec.c(411): import %p (%s) with no sec lustre/ptlrpc/gss/sec_gss.c(2414): unknown proc %u lustre/ptlrpc/gss/gss_krb5_mech.c(572): bad EC or RRC lustre/ptlrpc/gss/gss_krb5_mech.c(853): page %d: odd offset %u len %u, blocksize %d lustre/ptlrpc/gss/gss_krb5_mech.c(1301): odd bodysize %d lustre/ptlrpc/gss/gss_krb5_mech.c(1369): error decrypt lustre/ptlrpc/nrs.c(292): NRS: arg '%s' is too long lustre/ptlrpc/pack_generic.c(575): buffer %d length %d is not valid lustre/ptlrpc/service.c(2917): cannot start %s thread #%d_%d: rc %d lustre/ptlrpc/sec_config.c(475): can't merge: %d lustre/ptlrpc/sec_bulk.c(893): Invalid type %u lustre/mdt/mdt_identity.c(92): no upcall set lustre/mdt/mdt_handler.c(3580): Invalid fid: %s lustre/osp/osp_md_object.c(521): %s: lookup %s %s invalid fid %s lustre/osp/osp_sync.c(1219): can't get llh lustre/osd-zfs/osd_handler.c(918): %s: can't open %s lustre/osd-zfs/osd_index.c(1059): %s: OIC for %s isn't filled lustre/lmv/lmv_obd.c(299): target %s not attached lustre/osd-ldiskfs/osd_handler.c(6133): %s: OIC for %s isn't filled lustre/osd-ldiskfs/osd_scrub.c(1758): Fail to find %.*s in %.*s (%lu/%u): rc = %d lustre/osd-ldiskfs/osd_quota.c(587): Too many(%d) trans qids! lustre/osd-ldiskfs/osd_oi.c(323): %s: can't open %s: rc = %d lustre/osd-ldiskfs/osd_oi.c(463): %s: can't open %s: rc = %d lustre/osd-ldiskfs/osd_oi.c(604): %s: lookup FLD %s: rc = %d lustre/osd-ldiskfs/osd_io.c(1872): %s: can't read %u@%llu on ino %lu: rc = %ld lustre/ldlm/ldlm_resource.c(940): Unknown type %d for ns %s lustre/target/tgt_lastrcvd.c(1314): %s: client idx %d has offset %lld lustre/target/tgt_handler.c(218): %s: short ioobj lustre/obdclass/obd_mount_server.c(303): can't set_fs %d lustre/obdclass/llog_test.c(690): 4: close log %s failed: %d lustre/obdclass/class_obd.c(481): Device %d not attached lustre/obdclass/obd_mount.c(202): %s setup error %d lustre/obdclass/obd_config.c(455): Device %d not attached lustre/obdclass/obd_config.c(572): OBD device %d still set up lustre/obdclass/obd_config.c(579): OBD device %d not attached lustre/obdclass/obd_config.c(2090): empty cleanup lustre/mdd/mdd_device.c(374): %s: cancel idx %u of catalog %s: rc = %d lustre/mdd/mdd_dir.c(1208): link_ea add '%.*s' failed %d %s lustre/mdd/mdd_dir.c(1212): link_ea del '%.*s' failed %d %s lustre/osc/osc_request.c(658): dirty %lu - %lu > dirty_max %lu lustre/osc/osc_request.c(674): dirty %lu - dirty_max %lu too big??? lustre/osc/osc_cache.c(671): max_pages: %#x chunkbits: %u chunk_mask: %#lx max keywords length is [14]: lnet/selftest/rpc.c(1060): Event: status %d, type %d, lnet %d lnet/selftest/rpc.c(1536): Dropping RPC (%s) from %s: status %d mlength %d type %u magic %u. lnet/selftest/brw_test.c(389): Bulk transfer %s %s has failed: %d lnet/lnet/api-ni.c(4208): %s: Short reply %d(%d min) lnet/lnet/api-ni.c(4218): %s: Short reply %d(%d expected) lnet/lnet/router.c(486): Routes to %s via %s and %s not supported lnet/lnet/config.c(366): Duplicate net %s. Ignore lnet/lnet/lib-move.c(1966): can't send to %s. no NI on %s lnet/lnet/lib-move.c(2971): Error sending %s to %s: %d lnet/lnet/lib-move.c(4231): %s, src %s: bad %s payload %d (%d max expected) lnet/lnet/lib-move.c(4690): Dropping PUT (%llu:%d:%s): MD (%d) invalid lnet/lnet/lib-move.c(4910): Dropping GET (%llu:%d:%s): MD (%d) invalid lnet/lnet/lib-socket.c(298): Error %d getting sock %s IP/port lnet/klnds/socklnd/socklnd.c(1239): Route %s %pI4h connected to %pI4h lnet/klnds/socklnd/socklnd_cb.c(1212): [%p] Error %d on read from %s ip %pI4h:%d lnet/klnds/o2iblnd/o2iblnd_cb.c(526): Stale rx from %s lnet/klnds/o2iblnd/o2iblnd_cb.c(3359): Timed out tx: %s, %lld seconds lustre/ptlrpc/gss/gss_rawobj.c(165): fail to alloc %u bytes lustre/ptlrpc/gss/gss_keyring.c(1336): invalid: data %p, len %lu lustre/ptlrpc/gss/sec_gss.c(1088): gss backend 0x%x not found lustre/ptlrpc/gss/gss_crypto.c(76): failed to set %s key, len %d, rc = %d lustre/ptlrpc/gss/gss_crypto.c(456): encrypt error %d lustre/ptlrpc/nrs.c(726): Policy %s is busy with %d references lustre/ptlrpc/pack_generic.c(2373): Invalid magic %#x lustre/mdt/mdt_handler.c(2306): Invalid hash: %#llx != %#llx lustre/mdt/mdt_handler.c(5600): param is empty lustre/mgs/mgs_llog.c(967): %s: modify %s/%s failed: rc = %d lustre/mgs/mgs_llog.c(2028): %s: writing log %s: rc = %d lustre/mgs/mgs_llog.c(3732): param is empty lustre/fid/fid_store.c(230): %s: Can't find "%s" obj %d lustre/lod/lod_lov.c(120): %s: qos_del_tgt(%s) failed:rc = %d lustre/lod/lod_lov.c(565): %s: device %d is not set up lustre/lod/lod_object.c(3605): Invalid magic %x lustre/lod/lod_dev.c(132): %s: invalid FID %s lustre/lod/lod_dev.c(2034): %s: disconnect #%d lustre/include/obd_class.h(1284): cleaned up obd lustre/include/lustre_fid.h(871): Too large OID %#llx to set IDIF %s lustre/include/lustre_fid.h(880): Too large OID %#llx to set REG %s lustre/include/obd_support.h(818): vmalloc of '%s' (%d bytes) failed lustre/osd-zfs/osd_xattr.c(1131): %s: free xattr %s failed: rc = %d lustre/osd-zfs/osd_xattr.c(1140): %s: free xattr %llu failed: rc = %d lustre/osd-zfs/osd_handler.c(958): %s: lookup for %s failed: rc = %d lustre/osd-zfs/osd_index.c(409): wrong parent: %s != %s lustre/osd-zfs/osd_index.c(1209): %s: zap_remove %s failed: rc = %d lustre/osd-zfs/osd_object.c(610): %s: lookup %s/%#llx failed: rc = %d lustre/mdc/mdc_request.c(531): validate acl: %d lustre/mdc/mdc_request.c(1503): %s: page error: %s at %llu: rc %d lustre/lmv/lmv_fld.c(61): %s: invalid FID %s lustre/lmv/lmv_obd.c(172): %sactivation of %s failed: %d lustre/lmv/lmv_obd.c(308): target %s is not set up lustre/lmv/lmv_obd.c(699): %s: invalid FID %s: rc = %d lustre/osd-ldiskfs/osd_handler.c(2460): ** setting %s read-only ** lustre/osd-ldiskfs/osd_handler.c(7696): %s: can't mount %s: %d lustre/osd-ldiskfs/osd_io.c(207): bi_next: %p, bi_flags: %lx, %sbi_rw: %lu,%sbi_opf: %u,%sbi_vcnt: %d, bi_idx: %d, bi->size: %d, bi_end_io: %p,bi_cnt: %d, bi_private: %p lustre/lov/lov_obd.c(469): %s: event %d failed: rc = %d lustre/ldlm/ldlm_resource.c(124): Failed to set %s, rc = %d. lustre/target/tgt_lastrcvd.c(166): %s: slot %d not allocated lustre/target/out_handler.c(142): %s: invalid fid %s: rc = %d lustre/target/out_handler.c(549): %s: invalid FID %s: rc = %d lustre/target/out_handler.c(611): %s: invalid FID %s: rc = %d lustre/target/out_handler.c(1073): %s: invalid FID %s: rc = %d lustre/target/tgt_grant.c(260): %s: tot_granted %llu > maxsize %llu lustre/fld/fld_index.c(375): %s: Can't find "%s" obj %d lustre/ofd/ofd_dev.c(373): param is empty lustre/obdclass/llog.c(1406): %s: backup llog %s to itself (%s), objects %p/%p lustre/obdclass/llog_obd.c(209): %s: ctxt %d lop_setup=%p failed: rc = %d lustre/obdclass/llog_test.c(133): 1b: close log %s failed: %d lustre/obdclass/llog_test.c(867): 5g: close log %s failed: %d lustre/obdclass/llog_test.c(1244): 8a: close log %s failed: %d lustre/obdclass/llog_test.c(1275): 8b: close log %s failed: %d lustre/obdclass/llog_test.c(1316): 8d: close log %s failed: %d lustre/obdclass/llog_test.c(2090): 10: close log %s failed: %d lustre/obdclass/obd_mount.c(197): %s attach error %d lustre/obdclass/obd_mount.c(553): del MDC UUID %s failed: rc = %d lustre/obdclass/obd_config.c(2119): detach failed %d: %s lustre/obdecho/echo.c(92): can't connect %d lustre/obdecho/echo.c(512): %s: invalid seq %#llx lustre/obdecho/echo_client.c(2199): lookup %s under parent %s: rc = %d lustre/llite/vvp_io.c(869): Wrong IO type %u lustre/llite/statahead.c(1320): %s: reading dir %s at %lluopendir_pid = %u : rc = %d lustre/llite/dir.c(1417): %s: lookup %.*s failed: rc = %d lustre/llite/namei.c(93): Invalid inode %s type lustre/osc/osc_request.c(1797): Unexpected rc %d (%d requested) lustre/osc/osc_request.c(1803): Unexpected rc %d (%d transferred) lustre/mgc/mgc_request.c(65): missing name: %s max keywords length is [15]: lnet/selftest/brw_test.c(262): Bulk page %p (%d/%d) is corrupted! lnet/selftest/brw_test.c(370): Bulk data from %s is corrupted! lnet/selftest/brw_test.c(427): Bulk data from %s is corrupted! lnet/selftest/ping_test.c(132): Unable to ping %s (%d): %d lnet/selftest/ping_test.c(190): Unexpect magic %08x from %s lnet/lnet/api-ni.c(2275): Can't load LND %s, module %s, rc=%d lnet/lnet/api-ni.c(4113): Can't bind MD: %d lnet/lnet/router.c(998): Can't bind MD: %d lnet/lnet/lib-ptl.c(575): Invalid portal %d not in [0-%d] lnet/lnet/lib-move.c(3524): Can't bind MD: %d lnet/klnds/socklnd/socklnd_proto.c(295): ksnc_type bad: %u lnet/klnds/socklnd/socklnd_proto.c(329): ksnc_type bad: %u lnet/klnds/o2iblnd/o2iblnd_cb.c(767): Can't find page lnet/klnds/o2iblnd/o2iblnd_cb.c(1752): Can't allocate %s txd for %s lnet/klnds/o2iblnd/o2iblnd_cb.c(2572): Can't accept %s: message size %d too big (%d max) lnet/klnds/o2iblnd/o2iblnd.c(213): Short message: %d lnet/klnds/o2iblnd/o2iblnd.c(234): Short message: %d lnet/klnds/o2iblnd/o2iblnd.c(862): cmid HCA(%s), kib_dev(%s) need failover lnet/klnds/o2iblnd/o2iblnd.c(2342): Can't allocate %d tx descriptors lnet/klnds/o2iblnd/o2iblnd.c(2715): Failed to bind %s:%pI4h to device(%p): %d lnet/klnds/o2iblnd/o2iblnd.c(2784): Failed to bind %s:%pI4h to device(%p): %d lustre/ptlrpc/gss/gss_pipefs.c(864): fail to get seq lustre/ptlrpc/gss/sec_gss.c(2360): buf count only %d lustre/ptlrpc/gss/gss_krb5_mech.c(638): bad krb5 header lustre/ptlrpc/gss/gss_krb5_mech.c(1280): bad krb5 header lustre/ptlrpc/gss/gss_krb5_mech.c(1448): bad krb5 header lustre/ptlrpc/pack_generic.c(566): message length %d too small for %d buflens lustre/ptlrpc/pack_generic.c(582): len: %d, required_len %d, bufcount: %d lustre/ptlrpc/sec_config.c(84): unknown target %p(%s) lustre/ptlrpc/sec_config.c(663): can't find conf lustre/mdt/mdt_open.c(230): %s: replay trans %llu NID %s: rc = %d lustre/mgs/mgs_llog.c(3539): Invalid param: %s lustre/mgs/mgs_llog.c(3726): svname is empty lustre/mgs/mgs_llog.c(5488): %s is not defined lustre/lod/lod_qos.c(213): %s: statfs: rc = %d lustre/lod/lod_object.c(6125): %s: can't locate %s:rc = %d lustre/lod/lod_dev.c(1013): %s: can not find %s: rc = %d lustre/lod/lod_dev.c(1030): %s: can not find %s: rc = %d lustre/include/lustre_debug.h(53): bad page index %lu > %llu lustre/osp/osp_object.c(744): %s: update error %s: rc = %d lustre/osp/osp_sync.c(1414): %s: the catid %s for init llog %d is bad lustre/osd-zfs/osd_xattr.c(1126): %s: lookup xattr %s failed: rc = %d lustre/osd-zfs/osd_handler.c(1229): %s: lost %d loaned abuf(s) lustre/osd-zfs/osd_handler.c(1232): %s: lost %d pinned dbuf(s) lustre/osd-zfs/osd_object.c(887): %s: zap_remove(%s) failed: rc = %d lustre/lmv/lmv_obd.c(315): target %s connect error %d lustre/osd-ldiskfs/osd_handler.c(1727): transaction @0x%p commit error: %d lustre/osd-ldiskfs/osd_io.c(1140): initial space: %lu:%u lustre/lov/lov_obd.c(174): Target %s connect error %d lustre/ldlm/ldlm_lockd.c(2190): unknown opcode %u lustre/ldlm/ldlm_lockd.c(2368): invalid opcode %d lustre/target/tgt_lastrcvd.c(1208): %s: can't update %s file: rc = %d lustre/target/tgt_grant.c(257): %s: tot_pending %llu > tot_granted %llu lustre/target/tgt_grant.c(1444): %s: cli %s/%p ted_pending(%lu) < grant_used(%lu) lustre/target/tgt_grant.c(1453): %s: cli %s/%p tot_granted(%llu) < grant_used(%lu) lustre/target/tgt_grant.c(1462): %s: cli %s/%p tot_pending(%llu) < grant_used(%lu) lustre/target/tgt_handler.c(95): %s: invalid FID: %s lustre/target/tgt_handler.c(929): %s: RPC source %s doesn't match %s lustre/target/tgt_handler.c(1880): %s: sync returns %d lustre/fld/fld_index.c(108): %s: lookup range %s error: rc = %d lustre/fld/fld_index.c(202): %s: insert range %s failed: rc = %d lustre/obdclass/llog.c(1065): %s: loghandle %p with no header lustre/obdclass/genops.c(330): type %s has refcount (%d) lustre/obdclass/llog_osd.c(263): %s: bad log %s %s header magic: %#x (expected %#x) lustre/obdclass/obd_mount_server.c(134): %s not registered lustre/obdclass/class_obd.c(390): No UUID passed! lustre/obdclass/obd_mount.c(469): connect failed %d lustre/obdclass/obd_config.c(376): No type passed! lustre/obdclass/obd_config.c(382): No name passed! lustre/obdclass/obd_config.c(387): No UUID passed! lustre/obdclass/obd_config.c(1264): no device for: %s lustre/obdclass/obd_config.c(2111): cleanup failed %d: %s lustre/llite/dir.c(744): unknown magic: %lX lustre/mdd/mdd_device.c(1874): Bad ioctl size %d lustre/osc/osc_request.c(1625): %s: sync returns %d lustre/mgc/mgc_request.c(61): name too long: %s lustre/mgc/mgc_request.c(1488): MNE too large (%u) |
| Comments |
| Comment by Andreas Dilger [ 27/Feb/21 ] |
|
Hi Feng Lei, It may be that some of your found messages are false positives if the strings are split across lines. Don't go too crazy making very verbose error messages, especially those that are printed frequently, since they may be printed thousands of times on thousands of nodes. The messages need to be unique and explain enough of the state to understand what I going wrong, but do not need to be complete sentences or use proper grammar. It is better to have a good explanation of the error in Jira (or in the manual for those "normal" errors that happen under regular usage, and best if the cause of the error is fixed and never hit in the first place, or the error condition fixes itself and does not depend on the user to fix it for them. When you are cleaning up these message strings, please also fix them to match the modern coding style:
If the patch gets too large (say, over 200 lines) then consider to split it into a few smaller patches and submit each one separately. That reduces the number of conflicts with other patches, and allows some changes to land even if one or two of the patches have conflicts, or they need to be updated a few times to make the messages more clear. If the patch is only changing white space and code style, then it should use the "Test-Parameters: trivial" label to minimize the testing. Note however that there are some error strings that are checked for while running tests. These are mostly annotated in the code already. |
| Comment by Li Xi [ 05/Mar/21 ] |
|
I think there need to be some balance on the length of the error messages. For the messages with keyword < 8, the length is obviously too short which means lack of unique and explain.
I think the most of the error messages are not print frequently. There are about 2800+ error messaeges, I think only 20 of them could be seen frequently in the cases of disconnection or recovery. Agreed too many long messages is annoying. But I think the major effort should be spent on reducing the number of the error messages rather than the length of it. Ideally, the system prints zero error messages and fixes the problems automatically. But if a error message is printed, it should provide enough information both for the administrators and the developers. The current error messages might be good enough for developers/debuggers who are reading and modifying the sources everyday, but I think it is less useful for new users/adminitrators who only need to know where to modify/configure and who to get help from. The administrators might be able to search in Google/Stackoverflow/Jira/Maillist, but won't be able to read the codes based on the function name and line number. Thus, I think the "keyword" need to be long enough to be searchable. For the keyword < 14, I think we are very safe from adding too long string in each individual messages. The longest error messages in Lustre are: max keywords length is [115]:
lustre/osd-ldiskfs/osd_handler.c(7610): %s: device %s extents feature is not guaranteed to work on big-endian systems. Use "bigendian_extents" mount option to override.
max keywords length is [132]:
lustre/obdecho/echo_client.c(944): Local operations are NOT supported on client side. Only remote operations are supported. Metadata client must be run on server side.
lustre/obdecho/echo_client.c(2973): Local operations are NOT supported on client side. Only remote operations are supported. Metadata client must be run on server side.
max keywords length is [136]:
lustre/osd-ldiskfs/osd_handler.c(7706): %s: device %s LDISKFS does not support filesystems greater than 512TB and can cause data corruption. Use "force_over_512tb" mount option to override.
max keywords length is [156]:
lustre/ptlrpc/nrs.c(1199): NRS: failing to register policy %s. Please check policy flags; external policies cannot act as fallback policies, or be started immediately upon registration without interaction with lprocfs
max keywords length is [415]:
lustre/osd-ldiskfs/osd_io.c(185): ***** bio->bi_private is NULL! This should never happen. Normally, I would crash here, but instead I will dump the bio contents to the console. Please report this to <https://jira.whamcloud.com/> , along with any interesting messages leading up to this point (like SCSI errors, perhaps). Because bi_private is NULL, I can't wake up the thread that initiated this IO - you will probably have to reboot this node.
The last one with length of 415 is a little bit too long. The other ones with 100+ lengths still look fine. Reading the message could be finished in just a glance. So if a error message is shorter than 80, I don't think we need to worry at all. And I guess nobody would have time to write message with length of 80 for all 2800 error messages. If he/she does want to, that helps a lot on improving the quality of the codes, rather than making the error message too verbose. And also, disks are getting bigger and bigger. We don't need worry that the increasing length of messages would flood our disks, do we? We can always archive the logs into a tarball or even use the 1PB Lustre to save them. |
| Comment by Andreas Dilger [ 05/Mar/21 ] |
|
I'm not only considering the disk space (which can be quite a lot), but also the fact that longer error messages consume more RAM to store and more CPU to process them. That adds overhead for relatively little benefit. I agree that proper error messages that contain the device name, important data values, FID, error code are needed for debugging. I'm not a fan error messages that are trying to explain and solve the problem, like "NRS: failing to register policy %s. Please check policy flags; external policies cannot act as fallback policies, or be started immediately upon registration without interaction with lprocfs". Having an error message like "testfs-OST0000: failed to register external NRS policy '%s': rc = -22" is enough if there is a good (hopefully much better) explanation in the manual about how to use/configure NRS policies. The length of the bi_private error message is almost a joke, IMHO. Not only could a shorter message convey the same information, but if it referenced a specific LU or LUDOC ticket in Jira (rather than just a URL to file a bug), like "foofs-OST1234: bi_private is NULL. Unable to complete IO, OST may be unable to unmount. See |
| Comment by Gerrit Updater [ 05/Mar/21 ] |
|
Feng, Lei (flei@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41892 |
| Comment by Gerrit Updater [ 10/Oct/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/41892/ |
| Comment by Peter Jones [ 10/Oct/21 ] |
|
Landed for 2.15 |