Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.15.0
    • Lustre 2.15.0
    • 9223372036854775807

    Description

      These log messages are too short. Rewrite them in a more meanful way.

      max keywords length is [3]:

      lustre/osd-ldiskfs/osd_iam_lvar.c(327): %#x < %#x
      lustre/obdclass/lu_object.c(2284): [%d]: %p %x (%p,%p,%p) %d %d "%s"@%p
      

      max keywords length is [4]:

      lnet/lnet/api-ni.c(4236): rc2 %d
      libcfs/include/libcfs/libcfs_private.h(119): LBUG
      

      max keywords length is [6]:

      lnet/klnds/o2iblnd/o2iblnd.c(164): Short %s: %d(%d)
      lnet/klnds/o2iblnd/o2iblnd.c(275): Short %s: %d(%d)
      lustre/osd-ldiskfs/osd_iam_lvar.c(279): used: %d
      

      max keywords length is [7]:

      lustre/ptlrpc/gss/gss_rawobj.c(121): buflen %u < %u
      lustre/ptlrpc/gss/gss_rawobj.c(139): buflen %u
      lustre/ptlrpc/gss/gss_rawobj.c(155): buflen %u < %u
      lustre/ptlrpc/gss/gss_rawobj.c(230): buflen %u < %u
      lustre/ptlrpc/gss/gss_pipefs.c(793): buflen %u < %u
      lustre/target/tgt_grant.c(571): %s: cli %s/%p dirty %ld pend %ld grant %ld
      lustre/target/tgt_grant.c(867): %s: cli %s/%p dirty %ld pend %ld grant %ld
      lustre/obdclass/obd_mount_server.c(1618): no obd %s
      lustre/obdclass/class_obd.c(596): do_div(%#llx,256) %llu != %llu
      lustre/obdecho/echo_client.c(1817): lookup %s: rc = %d
      

      max keywords length is [8]:

      lustre/ptlrpc/pack_generic.c(584): buffer %d length %d
      lustre/include/lustre_fid.h(778): Bad %llu to set %s
      lustre/osp/osp_sync.c(153): Bad %llu to set %s
      lustre/osd-ldiskfs/osd_iam_lvar.c(155): %p %8.8x "%*.*s"
      lustre/osd-ldiskfs/osd_io.c(1224): ino %lu, blk %llu create %d: rc %d
      lustre/ofd/ofd_fs.c(187): Bad %llu to set %s
      lustre/ofd/ofd_obd.c(1157): %s: Bad %llu to set %s : rc %d
      lustre/obdclass/lu_ref.c(96): lu_ref: %p %d %d %s:%d
      lustre/llite/symlink.c(77): %s: inode %s: rc = %d
      lustre/osc/osc_object.c(380): Bad %llu to set %s : rc %d
      

      max keywords length is [9]:

      lnet/klnds/socklnd/socklnd_proto.c(613): Bad nips %d from ip %pI4h
      lnet/klnds/socklnd/socklnd_proto.c(683): Bad nips %d from ip %pI4h
      lustre/ptlrpc/nrs_tbf.c(995): jobids \{%s} illegal
      lustre/ptlrpc/nrs_tbf.c(1199): nids \{%s} illegal
      lustre/ptlrpc/nrs_tbf.c(2663): %ss \{%s} illegal
      lustre/target/tgt_grant.c(967): %s: cli %s/%p grant %ld want %llu current %llu
      lustre/obdclass/class_obd.c(591): do_div(%#llx,256) != %llu
      lustre/obdclass/obd_config.c(559): setup %s failed (%d)
      libcfs/include/libcfs/libcfs_private.h(75): ASSERTION
      

      max keywords length is [10]:

      lnet/selftest/rpc.c(433): LNet%s(%s, %d, %lld) failed: %d
      lnet/selftest/rpc.c(1058): RPC %p: bulk %p, service %d
      lnet/selftest/rpc.c(1461): rpcev %p, crpc %p, reqstev %p, replyev %p, bulkev %p
      lnet/selftest/ping_test.c(147): Bad magic %u from %s, %u expected.
      lnet/selftest/ping_test.c(156): Bad seq %u from %s, %u expected.
      lnet/lnet/api-ni.c(3245): net %s not found
      lnet/lnet/api-ni.c(3277): nid %s not found
      lnet/lnet/lib-ptl.c(769): Active ME %p on exit
      lnet/lnet/lib-move.c(4333): net %s not found
      lnet/klnds/socklnd/socklnd_proto.c(635): Zero IP[%d] from ip %pI4h
      lnet/klnds/socklnd/socklnd_proto.c(705): Zero IP[%d] from ip %pI4h
      lnet/klnds/o2iblnd/o2iblnd_cb.c(614): Can't map %u bytes (%u/%u)s: %d
      lustre/ptlrpc/niobuf.c(95): LNetPut(%s, %d, %lld) failed: %d
      lustre/ptlrpc/gss/sec_gss.c(2018): sec size %d too small
      lustre/ptlrpc/gss/gss_cli_upcall.c(270): obd %s not setup
      lustre/ptlrpc/gss/gss_krb5_mech.c(567): bad filler
      lustre/mgs/mgs_llog.c(205): Insane cfg
      lustre/mgs/mgs_llog.c(629): max index %d exceeded.
      lustre/mgs/mgs_llog.c(872): Insane cfg
      lustre/mgs/mgs_llog.c(2109): Insane cfg
      lustre/mgs/mgs_llog.c(3700): Insane cfg
      lustre/include/obd_class.h(428): Device %d not setup
      lustre/include/obd_class.h(478): %s: no %s operation
      lustre/include/obd_class.h(501): %s: no %s operation
      lustre/include/obd_class.h(564): %s: no %s operation
      lustre/include/obd_class.h(670): %s: no %s operation
      lustre/include/obd_class.h(693): %s: no %s operation
      lustre/include/obd_class.h(713): %s: no %s operation
      lustre/include/obd_class.h(733): %s: no %s operation
      lustre/include/obd_class.h(754): %s: no %s operation
      lustre/include/obd_class.h(772): %s: no %s operation
      lustre/include/obd_class.h(788): %s: no %s operation
      lustre/include/obd_class.h(827): %s: no %s operation
      lustre/include/obd_class.h(872): %s: no %s operation
      lustre/include/obd_class.h(917): %s: no %s operation
      lustre/include/obd_class.h(931): %s: no %s operation
      lustre/include/obd_class.h(944): %s: no %s operation
      lustre/include/obd_class.h(959): %s: no %s operation
      lustre/include/obd_class.h(974): %s: no %s operation
      lustre/include/obd_class.h(1064): %s: no %s operation
      lustre/include/obd_class.h(1129): %s: no %s operation
      lustre/include/obd_class.h(1154): %s: no %s operation
      lustre/include/obd_class.h(1176): %s: no %s operation
      lustre/include/obd_class.h(1260): %s: no %s operation
      lustre/include/lustre_fid.h(864): bad IGIF, %s
      lustre/mdc/mdc_request.c(2317): Bad magic %x!=%x
      lustre/mdc/mdc_locks.c(91): it disp: %X, status: %d
      lustre/osd-ldiskfs/osd_compat.c(99): found %s (%lu/%u) is mode %o
      lustre/fld/fld_request.c(108): exp: 0x%p (%s), srv: 0x%p (%s), idx: %llu
      lustre/obdclass/genops.c(491): Cleanup %s returned %d
      lustre/obdclass/llog_test.c(846): 5e: found %d records
      lustre/obdclass/llog_test.c(859): 5f: found %d records
      lustre/obdclass/obd_config.c(610): Device %d not setup
      lustre/llite/file.c(4241): %s: migrate %s, but FID %s is insane
      lustre/mdd/mdd_device.c(1879): Bad magic %x != %x
      

      max keywords length is [11]:

      lnet/lnet/lib-move.c(2217): no path to %s from net %s
      lnet/klnds/o2iblnd/o2iblnd_cb.c(3272): %s: REJECTED %d
      lnet/klnds/o2iblnd/o2iblnd.c(222): Bad magic: %08x
      lustre/ptlrpc/gss/gss_pipefs.c(971): failed msg %p (seq %u, uid %u, svc %u, nid %#llx, obd %.*s): errno %d
      lustre/ptlrpc/gss/gss_krb5_mech.c(1125): odd offset %d in page %d
      lustre/mgs/mgs_llog.c(4301): err %d on param '%s'
      lustre/lod/lod_lov.c(239): can't find %s device
      lustre/lod/lod_lov.c(548): can't find %s device
      lustre/lod/lod_lov.c(1095): %s: bad idx: %d of %d
      lustre/include/obd_class.h(413): NULL device
      lustre/include/obd_class.h(1190): NULL device
      lustre/lov/lov_obd.c(134): Target %s not set up
      lustre/lov/lov_dev.c(344): Target %s not set up
      lustre/ldlm/ldlm_lib.c(240): connection %s not found
      lustre/target/out_handler.c(915): %s: undo for %s:%d: rc = %d
      lustre/target/update_recovery.c(1415): %s: undo for %s:%d: rc = %d
      lustre/fld/fld_cache.c(347): NEW range =%s curr = %s
      lustre/obdclass/lu_ref.c(98): link: %s %p
      lustre/obdclass/obd_config.c(667): Precleanup %s returned %d
      lustre/mdd/mdd_orphans.c(455): %s: bad FID %s cleaning '%s'
      

      max keywords length is [12]:

      lnet/lnet/lib-move.c(2015): no route to %s from %s
      lnet/klnds/gnilnd/gnilnd.c(1570): bad command %d
      lnet/klnds/o2iblnd/o2iblnd.c(250): Bad checksum
      lustre/ptlrpc/gss/gss_pipefs.c(931): refresh ctx %p(uid %d) failed: %d/0x%08x: %s
      lustre/ptlrpc/gss/sec_gss.c(2375): gss version %u, expect %u
      lustre/ptlrpc/gss/sec_gss.c(2737): proc %d not support
      lustre/ptlrpc/gss/gss_svc_upcall.c(1019): handle size %u too large
      lustre/ptlrpc/gss/gss_cli_upcall.c(265): no such obd %s
      lustre/ptlrpc/gss/gss_cli_upcall.c(276): obd %s has stopped
      lustre/ptlrpc/gss/gss_krb5_mech.c(555): bad token id
      lustre/ptlrpc/sec_bulk.c(902): Invalid svc %u
      lustre/mdt/mdt_io.c(577): %s: wrong cmd %d received!
      lustre/mgs/mgs_llog.c(409): fsname %s is too long
      lustre/fid/fid_store.c(192): %s: Read only %d bytes of %d
      lustre/osp/osp_precreate.c(230): %s: can't sync
      lustre/osp/osp_md_object.c(511): %s: lookup %s %s wrong size %d
      lustre/osp/lwp_dev.c(632): Unknown key %s
      lustre/osp/osp_sync.c(1671): %s: can't sync
      lustre/lov/lov_obd.c(1317): ost %d is inactive
      lustre/lov/lov_dev.c(357): add failed (%d), deleting %s
      lustre/lov/lov_ea.c(188): LOV EA %s too small: %zu, need %u
      lustre/lov/lov_ea.c(257): %s: OST index %d missing
      lustre/target/tgt_grant.c(157): %s: cli %s/%p ted_dirty(%ld) > maxsize(%llu)
      lustre/target/tgt_grant.c(263): %s: tot_dirty %llu > maxsize %llu
      lustre/ofd/ofd_io.c(949): %s: wrong cmd %d received!
      lustre/obdecho/echo.c(364): cleaning up %u pages (%d obdos)
      lustre/obdecho/echo.c(440): cleaning up %d pages (%d obdos)
      lustre/obdecho/echo_client.c(2373): invalid oid %s
      lustre/obdecho/echo_client.c(2421): invalid oid %s
      lustre/mdd/mdd_object.c(2511): %s: restore %s HSM error: %d/%d
      lustre/mgc/mgc_request.c(337): logname %s is too long
      libcfs/include/libcfs/libcfs_fail.h(165): cfs_race id %x sleeping
      libcfs/include/libcfs/libcfs_fail.h(192): cfs_race id %x sleeping
      

      max keywords length is [13]:

      lnet/selftest/conrpc.c(1217): Out of memory
      lnet/selftest/conrpc.c(1263): Out of memory
      lnet/selftest/brw_test.c(344): BRW RPC to %s failed with %d
      lnet/selftest/console.c(414): Out of memory
      lnet/selftest/console.c(483): Out of memory
      lnet/selftest/console.c(1552): Out of memory
      lnet/selftest/console.c(1948): Out of memory
      lnet/selftest/console.c(1971): Out of memory
      lnet/lnet/config.c(278): Out of Memory
      lnet/lnet/lib-move.c(4216): %s, src %s: bad %s payload %d (0 expected)
      lnet/klnds/o2iblnd/o2iblnd_cb.c(2718): Can't accept %s: %d
      lnet/klnds/o2iblnd/o2iblnd_cb.c(3316): %s DISCONNECTED
      lnet/klnds/o2iblnd/o2iblnd.c(229): Bad version: %x
      lnet/klnds/o2iblnd/o2iblnd.c(269): Bad src nid: %s
      lnet/klnds/o2iblnd/o2iblnd.c(2556): Out of memory
      lustre/ptlrpc/sec.c(411): import %p (%s) with no sec
      lustre/ptlrpc/gss/sec_gss.c(2414): unknown proc %u
      lustre/ptlrpc/gss/gss_krb5_mech.c(572): bad EC or RRC
      lustre/ptlrpc/gss/gss_krb5_mech.c(853): page %d: odd offset %u len %u, blocksize %d
      lustre/ptlrpc/gss/gss_krb5_mech.c(1301): odd bodysize %d
      lustre/ptlrpc/gss/gss_krb5_mech.c(1369): error decrypt
      lustre/ptlrpc/nrs.c(292): NRS: arg '%s' is too long
      lustre/ptlrpc/pack_generic.c(575): buffer %d length %d is not valid
      lustre/ptlrpc/service.c(2917): cannot start %s thread #%d_%d: rc %d
      lustre/ptlrpc/sec_config.c(475): can't merge: %d
      lustre/ptlrpc/sec_bulk.c(893): Invalid type %u
      lustre/mdt/mdt_identity.c(92): no upcall set
      lustre/mdt/mdt_handler.c(3580): Invalid fid: %s
      lustre/osp/osp_md_object.c(521): %s: lookup %s %s invalid fid %s
      lustre/osp/osp_sync.c(1219): can't get llh
      lustre/osd-zfs/osd_handler.c(918): %s: can't open %s
      lustre/osd-zfs/osd_index.c(1059): %s: OIC for %s isn't filled
      lustre/lmv/lmv_obd.c(299): target %s not attached
      lustre/osd-ldiskfs/osd_handler.c(6133): %s: OIC for %s isn't filled
      lustre/osd-ldiskfs/osd_scrub.c(1758): Fail to find %.*s in %.*s (%lu/%u): rc = %d
      lustre/osd-ldiskfs/osd_quota.c(587): Too many(%d) trans qids!
      lustre/osd-ldiskfs/osd_oi.c(323): %s: can't open %s: rc = %d
      lustre/osd-ldiskfs/osd_oi.c(463): %s: can't open %s: rc = %d
      lustre/osd-ldiskfs/osd_oi.c(604): %s: lookup FLD %s: rc = %d
      lustre/osd-ldiskfs/osd_io.c(1872): %s: can't read %u@%llu on ino %lu: rc = %ld
      lustre/ldlm/ldlm_resource.c(940): Unknown type %d for ns %s
      lustre/target/tgt_lastrcvd.c(1314): %s: client idx %d has offset %lld
      lustre/target/tgt_handler.c(218): %s: short ioobj
      lustre/obdclass/obd_mount_server.c(303): can't set_fs %d
      lustre/obdclass/llog_test.c(690): 4: close log %s failed: %d
      lustre/obdclass/class_obd.c(481): Device %d not attached
      lustre/obdclass/obd_mount.c(202): %s setup error %d
      lustre/obdclass/obd_config.c(455): Device %d not attached
      lustre/obdclass/obd_config.c(572): OBD device %d still set up
      lustre/obdclass/obd_config.c(579): OBD device %d not attached
      lustre/obdclass/obd_config.c(2090): empty cleanup
      lustre/mdd/mdd_device.c(374): %s: cancel idx %u of catalog %s: rc = %d
      lustre/mdd/mdd_dir.c(1208): link_ea add '%.*s' failed %d %s
      lustre/mdd/mdd_dir.c(1212): link_ea del '%.*s' failed %d %s
      lustre/osc/osc_request.c(658): dirty %lu - %lu > dirty_max %lu
      lustre/osc/osc_request.c(674): dirty %lu - dirty_max %lu too big???
      lustre/osc/osc_cache.c(671): max_pages: %#x chunkbits: %u chunk_mask: %#lx
      

      max keywords length is [14]:

      lnet/selftest/rpc.c(1060): Event: status %d, type %d, lnet %d
      lnet/selftest/rpc.c(1536): Dropping RPC (%s) from %s: status %d mlength %d type %u magic %u.
      lnet/selftest/brw_test.c(389): Bulk transfer %s %s has failed: %d
      lnet/lnet/api-ni.c(4208): %s: Short reply %d(%d min)
      lnet/lnet/api-ni.c(4218): %s: Short reply %d(%d expected)
      lnet/lnet/router.c(486): Routes to %s via %s and %s not supported
      lnet/lnet/config.c(366): Duplicate net %s. Ignore
      lnet/lnet/lib-move.c(1966): can't send to %s. no NI on %s
      lnet/lnet/lib-move.c(2971): Error sending %s to %s: %d
      lnet/lnet/lib-move.c(4231): %s, src %s: bad %s payload %d (%d max expected)
      lnet/lnet/lib-move.c(4690): Dropping PUT (%llu:%d:%s): MD (%d) invalid
      lnet/lnet/lib-move.c(4910): Dropping GET (%llu:%d:%s): MD (%d) invalid
      lnet/lnet/lib-socket.c(298): Error %d getting sock %s IP/port
      lnet/klnds/socklnd/socklnd.c(1239): Route %s %pI4h connected to %pI4h
      lnet/klnds/socklnd/socklnd_cb.c(1212): [%p] Error %d on read from %s ip %pI4h:%d
      lnet/klnds/o2iblnd/o2iblnd_cb.c(526): Stale rx from %s
      lnet/klnds/o2iblnd/o2iblnd_cb.c(3359): Timed out tx: %s, %lld seconds
      lustre/ptlrpc/gss/gss_rawobj.c(165): fail to alloc %u bytes
      lustre/ptlrpc/gss/gss_keyring.c(1336): invalid: data %p, len %lu
      lustre/ptlrpc/gss/sec_gss.c(1088): gss backend 0x%x not found
      lustre/ptlrpc/gss/gss_crypto.c(76): failed to set %s key, len %d, rc = %d
      lustre/ptlrpc/gss/gss_crypto.c(456): encrypt error %d
      lustre/ptlrpc/nrs.c(726): Policy %s is busy with %d references
      lustre/ptlrpc/pack_generic.c(2373): Invalid magic %#x
      lustre/mdt/mdt_handler.c(2306): Invalid hash: %#llx != %#llx
      lustre/mdt/mdt_handler.c(5600): param is empty
      lustre/mgs/mgs_llog.c(967): %s: modify %s/%s failed: rc = %d
      lustre/mgs/mgs_llog.c(2028): %s: writing log %s: rc = %d
      lustre/mgs/mgs_llog.c(3732): param is empty
      lustre/fid/fid_store.c(230): %s: Can't find "%s" obj %d
      lustre/lod/lod_lov.c(120): %s: qos_del_tgt(%s) failed:rc = %d
      lustre/lod/lod_lov.c(565): %s: device %d is not set up
      lustre/lod/lod_object.c(3605): Invalid magic %x
      lustre/lod/lod_dev.c(132): %s: invalid FID %s
      lustre/lod/lod_dev.c(2034): %s: disconnect #%d
      lustre/include/obd_class.h(1284): cleaned up obd
      lustre/include/lustre_fid.h(871): Too large OID %#llx to set IDIF %s
      lustre/include/lustre_fid.h(880): Too large OID %#llx to set REG %s
      lustre/include/obd_support.h(818): vmalloc of '%s' (%d bytes) failed
      lustre/osd-zfs/osd_xattr.c(1131): %s: free xattr %s failed: rc = %d
      lustre/osd-zfs/osd_xattr.c(1140): %s: free xattr %llu failed: rc = %d
      lustre/osd-zfs/osd_handler.c(958): %s: lookup for %s failed: rc = %d
      lustre/osd-zfs/osd_index.c(409): wrong parent: %s != %s
      lustre/osd-zfs/osd_index.c(1209): %s: zap_remove %s failed: rc = %d
      lustre/osd-zfs/osd_object.c(610): %s: lookup %s/%#llx failed: rc = %d
      lustre/mdc/mdc_request.c(531): validate acl: %d
      lustre/mdc/mdc_request.c(1503): %s: page error: %s at %llu: rc %d
      lustre/lmv/lmv_fld.c(61): %s: invalid FID %s
      lustre/lmv/lmv_obd.c(172): %sactivation of %s failed: %d
      lustre/lmv/lmv_obd.c(308): target %s is not set up
      lustre/lmv/lmv_obd.c(699): %s: invalid FID %s: rc = %d
      lustre/osd-ldiskfs/osd_handler.c(2460): ** setting %s read-only **
      lustre/osd-ldiskfs/osd_handler.c(7696): %s: can't mount %s: %d
      lustre/osd-ldiskfs/osd_io.c(207): bi_next: %p, bi_flags: %lx, %sbi_rw: %lu,%sbi_opf: %u,%sbi_vcnt: %d, bi_idx: %d, bi->size: %d, bi_end_io: %p,bi_cnt: %d, bi_private: %p
      lustre/lov/lov_obd.c(469): %s: event %d failed: rc = %d
      lustre/ldlm/ldlm_resource.c(124): Failed to set %s, rc = %d.
      lustre/target/tgt_lastrcvd.c(166): %s: slot %d not allocated
      lustre/target/out_handler.c(142): %s: invalid fid %s: rc = %d
      lustre/target/out_handler.c(549): %s: invalid FID %s: rc = %d
      lustre/target/out_handler.c(611): %s: invalid FID %s: rc = %d
      lustre/target/out_handler.c(1073): %s: invalid FID %s: rc = %d
      lustre/target/tgt_grant.c(260): %s: tot_granted %llu > maxsize %llu
      lustre/fld/fld_index.c(375): %s: Can't find "%s" obj %d
      lustre/ofd/ofd_dev.c(373): param is empty
      lustre/obdclass/llog.c(1406): %s: backup llog %s to itself (%s), objects %p/%p
      lustre/obdclass/llog_obd.c(209): %s: ctxt %d lop_setup=%p failed: rc = %d
      lustre/obdclass/llog_test.c(133): 1b: close log %s failed: %d
      lustre/obdclass/llog_test.c(867): 5g: close log %s failed: %d
      lustre/obdclass/llog_test.c(1244): 8a: close log %s failed: %d
      lustre/obdclass/llog_test.c(1275): 8b: close log %s failed: %d
      lustre/obdclass/llog_test.c(1316): 8d: close log %s failed: %d
      lustre/obdclass/llog_test.c(2090): 10: close log %s failed: %d
      lustre/obdclass/obd_mount.c(197): %s attach error %d
      lustre/obdclass/obd_mount.c(553): del MDC UUID %s failed: rc = %d
      lustre/obdclass/obd_config.c(2119): detach failed %d: %s
      lustre/obdecho/echo.c(92): can't connect %d
      lustre/obdecho/echo.c(512): %s: invalid seq %#llx
      lustre/obdecho/echo_client.c(2199): lookup %s under parent %s: rc = %d
      lustre/llite/vvp_io.c(869): Wrong IO type %u
      lustre/llite/statahead.c(1320): %s: reading dir %s at %lluopendir_pid = %u : rc = %d
      lustre/llite/dir.c(1417): %s: lookup %.*s failed: rc = %d
      lustre/llite/namei.c(93): Invalid inode %s type
      lustre/osc/osc_request.c(1797): Unexpected rc %d (%d requested)
      lustre/osc/osc_request.c(1803): Unexpected rc %d (%d transferred)
      lustre/mgc/mgc_request.c(65): missing name: %s
      

      max keywords length is [15]:

      lnet/selftest/brw_test.c(262): Bulk page %p (%d/%d) is corrupted!
      lnet/selftest/brw_test.c(370): Bulk data from %s is corrupted!
      lnet/selftest/brw_test.c(427): Bulk data from %s is corrupted!
      lnet/selftest/ping_test.c(132): Unable to ping %s (%d): %d
      lnet/selftest/ping_test.c(190): Unexpect magic %08x from %s
      lnet/lnet/api-ni.c(2275): Can't load LND %s, module %s, rc=%d
      lnet/lnet/api-ni.c(4113): Can't bind MD: %d
      lnet/lnet/router.c(998): Can't bind MD: %d
      lnet/lnet/lib-ptl.c(575): Invalid portal %d not in [0-%d]
      lnet/lnet/lib-move.c(3524): Can't bind MD: %d
      lnet/klnds/socklnd/socklnd_proto.c(295): ksnc_type bad: %u
      lnet/klnds/socklnd/socklnd_proto.c(329): ksnc_type bad: %u
      lnet/klnds/o2iblnd/o2iblnd_cb.c(767): Can't find page
      lnet/klnds/o2iblnd/o2iblnd_cb.c(1752): Can't allocate %s txd for %s
      lnet/klnds/o2iblnd/o2iblnd_cb.c(2572): Can't accept %s: message size %d too big (%d max)
      lnet/klnds/o2iblnd/o2iblnd.c(213): Short message: %d
      lnet/klnds/o2iblnd/o2iblnd.c(234): Short message: %d
      lnet/klnds/o2iblnd/o2iblnd.c(862): cmid HCA(%s), kib_dev(%s) need failover
      lnet/klnds/o2iblnd/o2iblnd.c(2342): Can't allocate %d tx descriptors
      lnet/klnds/o2iblnd/o2iblnd.c(2715): Failed to bind %s:%pI4h to device(%p): %d
      lnet/klnds/o2iblnd/o2iblnd.c(2784): Failed to bind %s:%pI4h to device(%p): %d
      lustre/ptlrpc/gss/gss_pipefs.c(864): fail to get seq
      lustre/ptlrpc/gss/sec_gss.c(2360): buf count only %d
      lustre/ptlrpc/gss/gss_krb5_mech.c(638): bad krb5 header
      lustre/ptlrpc/gss/gss_krb5_mech.c(1280): bad krb5 header
      lustre/ptlrpc/gss/gss_krb5_mech.c(1448): bad krb5 header
      lustre/ptlrpc/pack_generic.c(566): message length %d too small for %d buflens
      lustre/ptlrpc/pack_generic.c(582): len: %d, required_len %d, bufcount: %d
      lustre/ptlrpc/sec_config.c(84): unknown target %p(%s)
      lustre/ptlrpc/sec_config.c(663): can't find conf
      lustre/mdt/mdt_open.c(230): %s: replay trans %llu NID %s: rc = %d
      lustre/mgs/mgs_llog.c(3539): Invalid param: %s
      lustre/mgs/mgs_llog.c(3726): svname is empty
      lustre/mgs/mgs_llog.c(5488): %s is not defined
      lustre/lod/lod_qos.c(213): %s: statfs: rc = %d
      lustre/lod/lod_object.c(6125): %s: can't locate %s:rc = %d
      lustre/lod/lod_dev.c(1013): %s: can not find %s: rc = %d
      lustre/lod/lod_dev.c(1030): %s: can not find %s: rc = %d
      lustre/include/lustre_debug.h(53): bad page index %lu > %llu
      lustre/osp/osp_object.c(744): %s: update error %s: rc = %d
      lustre/osp/osp_sync.c(1414): %s: the catid %s for init llog %d is bad
      lustre/osd-zfs/osd_xattr.c(1126): %s: lookup xattr %s failed: rc = %d
      lustre/osd-zfs/osd_handler.c(1229): %s: lost %d loaned abuf(s)
      lustre/osd-zfs/osd_handler.c(1232): %s: lost %d pinned dbuf(s)
      lustre/osd-zfs/osd_object.c(887): %s: zap_remove(%s) failed: rc = %d
      lustre/lmv/lmv_obd.c(315): target %s connect error %d
      lustre/osd-ldiskfs/osd_handler.c(1727): transaction @0x%p commit error: %d
      lustre/osd-ldiskfs/osd_io.c(1140): initial space: %lu:%u
      lustre/lov/lov_obd.c(174): Target %s connect error %d
      lustre/ldlm/ldlm_lockd.c(2190): unknown opcode %u
      lustre/ldlm/ldlm_lockd.c(2368): invalid opcode %d
      lustre/target/tgt_lastrcvd.c(1208): %s: can't update %s file: rc = %d
      lustre/target/tgt_grant.c(257): %s: tot_pending %llu > tot_granted %llu
      lustre/target/tgt_grant.c(1444): %s: cli %s/%p ted_pending(%lu) < grant_used(%lu)
      lustre/target/tgt_grant.c(1453): %s: cli %s/%p tot_granted(%llu) < grant_used(%lu)
      lustre/target/tgt_grant.c(1462): %s: cli %s/%p tot_pending(%llu) < grant_used(%lu)
      lustre/target/tgt_handler.c(95): %s: invalid FID: %s
      lustre/target/tgt_handler.c(929): %s: RPC source %s doesn't match %s
      lustre/target/tgt_handler.c(1880): %s: sync returns %d
      lustre/fld/fld_index.c(108): %s: lookup range %s error: rc = %d
      lustre/fld/fld_index.c(202): %s: insert range %s failed: rc = %d
      lustre/obdclass/llog.c(1065): %s: loghandle %p with no header
      lustre/obdclass/genops.c(330): type %s has refcount (%d)
      lustre/obdclass/llog_osd.c(263): %s: bad log %s %s header magic: %#x (expected %#x)
      lustre/obdclass/obd_mount_server.c(134): %s not registered
      lustre/obdclass/class_obd.c(390): No UUID passed!
      lustre/obdclass/obd_mount.c(469): connect failed %d
      lustre/obdclass/obd_config.c(376): No type passed!
      lustre/obdclass/obd_config.c(382): No name passed!
      lustre/obdclass/obd_config.c(387): No UUID passed!
      lustre/obdclass/obd_config.c(1264): no device for: %s
      lustre/obdclass/obd_config.c(2111): cleanup failed %d: %s
      lustre/llite/dir.c(744): unknown magic: %lX
      lustre/mdd/mdd_device.c(1874): Bad ioctl size %d
      lustre/osc/osc_request.c(1625): %s: sync returns %d
      lustre/mgc/mgc_request.c(61): name too long: %s
      lustre/mgc/mgc_request.c(1488): MNE too large (%u)
      

      Attachments

        Issue Links

          Activity

            [LU-14475] Make too short log messages meanful
            pjones Peter Jones added a comment -

            Landed for 2.15

            pjones Peter Jones added a comment - Landed for 2.15

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/41892/
            Subject: LU-14475 log: Rewrite some log messages
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: da1d93513fdff0a70257b13aa5649e478d4f70b6

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/41892/ Subject: LU-14475 log: Rewrite some log messages Project: fs/lustre-release Branch: master Current Patch Set: Commit: da1d93513fdff0a70257b13aa5649e478d4f70b6

            Feng, Lei (flei@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41892
            Subject: LU-14475 log: Rewrite some log messages
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 15be2325e7d9924fc251877e847dcb1accc9bf31

            gerrit Gerrit Updater added a comment - Feng, Lei (flei@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41892 Subject: LU-14475 log: Rewrite some log messages Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 15be2325e7d9924fc251877e847dcb1accc9bf31

            I'm not only considering the disk space (which can be quite a lot), but also the fact that longer error messages consume more RAM to store and more CPU to process them. That adds overhead for relatively little benefit. I agree that proper error messages that contain the device name, important data values, FID, error code are needed for debugging.

            I'm not a fan error messages that are trying to explain and solve the problem, like "NRS: failing to register policy %s. Please check policy flags; external policies cannot act as fallback policies, or be started immediately upon registration without interaction with lprocfs". Having an error message like "testfs-OST0000: failed to register external NRS policy '%s': rc = -22" is enough if there is a good (hopefully much better) explanation in the manual about how to use/configure NRS policies.

            The length of the bi_private error message is almost a joke, IMHO. Not only could a shorter message convey the same information, but if it referenced a specific LU or LUDOC ticket in Jira (rather than just a URL to file a bug), like "foofs-OST1234: bi_private is NULL. Unable to complete IO, OST may be unable to unmount. See LU-12345", the user could go directly to the ticket to read a much better/more detailed explanation of the problem.

            adilger Andreas Dilger added a comment - I'm not only considering the disk space (which can be quite a lot), but also the fact that longer error messages consume more RAM to store and more CPU to process them. That adds overhead for relatively little benefit. I agree that proper error messages that contain the device name, important data values, FID, error code are needed for debugging. I'm not a fan error messages that are trying to explain and solve the problem, like " NRS: failing to register policy %s. Please check policy flags; external policies cannot act as fallback policies, or be started immediately upon registration without interaction with lprocfs ". Having an error message like " testfs-OST0000: failed to register external NRS policy '%s': rc = -22 " is enough if there is a good (hopefully much better) explanation in the manual about how to use/configure NRS policies. The length of the bi_private error message is almost a joke, IMHO. Not only could a shorter message convey the same information, but if it referenced a specific LU or LUDOC ticket in Jira (rather than just a URL to file a bug), like " foofs-OST1234: bi_private is NULL. Unable to complete IO, OST may be unable to unmount. See LU-12345 ", the user could go directly to the ticket to read a much better/more detailed explanation of the problem.
            lixi_wc Li Xi added a comment - - edited

            I think there need to be some balance on the length of the error messages. For the messages with keyword < 8, the length is obviously too short which means lack of unique and explain.

            Don't go too crazy making very verbose error messages, especially those that are printed frequently, since they may be printed thousands of times on thousands of nodes.

            I think the most of the error messages are not print frequently. There are about 2800+ error messaeges, I think only 20 of them could be seen frequently in the cases of disconnection or recovery.

            Agreed too many long messages is annoying. But I think the major effort should be spent on reducing the number of the error messages rather than the length of it. Ideally, the system prints zero error messages and fixes the problems automatically. But if a error message is printed, it should provide enough information both for the administrators and the developers.

            The current error messages might be good enough for developers/debuggers who are reading and modifying the sources everyday, but I think it is less useful for new users/adminitrators who only need to know where to modify/configure and who to get help from. The administrators might be able to search in Google/Stackoverflow/Jira/Maillist, but won't be able to read the codes based on the function name and line number.

            Thus, I think the "keyword" need to be long enough to be searchable. For the keyword < 14, I think we are very safe from adding too long string in each individual messages.

            The longest error messages in Lustre are:

            max keywords length is [115]:
                    lustre/osd-ldiskfs/osd_handler.c(7610): %s: device %s extents feature is not guaranteed to work on big-endian systems. Use "bigendian_extents" mount option to override.
            max keywords length is [132]:
                    lustre/obdecho/echo_client.c(944): Local operations are NOT supported on client side. Only remote operations are supported. Metadata client must be run on server side.
                    lustre/obdecho/echo_client.c(2973): Local operations are NOT supported on client side. Only remote operations are supported. Metadata client must be run on server side.
            max keywords length is [136]:
                    lustre/osd-ldiskfs/osd_handler.c(7706): %s: device %s LDISKFS does not support filesystems greater than 512TB and can cause data corruption. Use "force_over_512tb" mount option to override.
            max keywords length is [156]:
                    lustre/ptlrpc/nrs.c(1199): NRS: failing to register policy %s. Please check policy flags; external policies cannot act as fallback policies, or be started immediately upon registration without interaction with lprocfs
            max keywords length is [415]:
                    lustre/osd-ldiskfs/osd_io.c(185): ***** bio->bi_private is NULL!  This should never happen.  Normally, I would crash here, but instead I will dump the bio contents to the console.  Please report this to <https://jira.whamcloud.com/> , along with any interesting messages leading up to this point (like SCSI errors, perhaps).  Because bi_private is NULL, I can't wake up the thread that initiated this IO - you will probably have to reboot this node.
            

            The last one with length of 415 is a little bit too long. The other ones with 100+ lengths still look fine. Reading the message could be finished in just a glance. So if a error message is shorter than 80, I don't think we need to worry at all. And I guess nobody would have time to write message with length of 80 for all 2800 error messages. If he/she does want to, that helps a lot on improving the quality of the codes, rather than making the error message too verbose.

            And also, disks are getting bigger and bigger. We don't need worry that the increasing length of messages would flood our disks, do we? We can always archive the logs into a tarball or even use the 1PB Lustre to save them.

            lixi_wc Li Xi added a comment - - edited I think there need to be some balance on the length of the error messages. For the messages with keyword < 8, the length is obviously too short which means lack of unique and explain. Don't go too crazy making very verbose error messages, especially those that are printed frequently, since they may be printed thousands of times on thousands of nodes. I think the most of the error messages are not print frequently. There are about 2800+ error messaeges, I think only 20 of them could be seen frequently in the cases of disconnection or recovery. Agreed too many long messages is annoying. But I think the major effort should be spent on reducing the number of the error messages rather than the length of it. Ideally, the system prints zero error messages and fixes the problems automatically. But if a error message is printed, it should provide enough information both for the administrators and the developers. The current error messages might be good enough for developers/debuggers who are reading and modifying the sources everyday, but I think it is less useful for new users/adminitrators who only need to know where to modify/configure and who to get help from. The administrators might be able to search in Google/Stackoverflow/Jira/Maillist, but won't be able to read the codes based on the function name and line number. Thus, I think the "keyword" need to be long enough to be searchable. For the keyword < 14, I think we are very safe from adding too long string in each individual messages. The longest error messages in Lustre are: max keywords length is [115]: lustre/osd-ldiskfs/osd_handler.c(7610): %s: device %s extents feature is not guaranteed to work on big-endian systems. Use "bigendian_extents" mount option to override. max keywords length is [132]: lustre/obdecho/echo_client.c(944): Local operations are NOT supported on client side. Only remote operations are supported. Metadata client must be run on server side. lustre/obdecho/echo_client.c(2973): Local operations are NOT supported on client side. Only remote operations are supported. Metadata client must be run on server side. max keywords length is [136]: lustre/osd-ldiskfs/osd_handler.c(7706): %s: device %s LDISKFS does not support filesystems greater than 512TB and can cause data corruption. Use "force_over_512tb" mount option to override. max keywords length is [156]: lustre/ptlrpc/nrs.c(1199): NRS: failing to register policy %s. Please check policy flags; external policies cannot act as fallback policies, or be started immediately upon registration without interaction with lprocfs max keywords length is [415]: lustre/osd-ldiskfs/osd_io.c(185): ***** bio->bi_private is NULL! This should never happen. Normally, I would crash here, but instead I will dump the bio contents to the console. Please report this to <https://jira.whamcloud.com/> , along with any interesting messages leading up to this point (like SCSI errors, perhaps). Because bi_private is NULL, I can't wake up the thread that initiated this IO - you will probably have to reboot this node. The last one with length of 415 is a little bit too long. The other ones with 100+ lengths still look fine. Reading the message could be finished in just a glance. So if a error message is shorter than 80, I don't think we need to worry at all. And I guess nobody would have time to write message with length of 80 for all 2800 error messages. If he/she does want to, that helps a lot on improving the quality of the codes, rather than making the error message too verbose. And also, disks are getting bigger and bigger. We don't need worry that the increasing length of messages would flood our disks, do we? We can always archive the logs into a tarball or even use the 1PB Lustre to save them.
            adilger Andreas Dilger added a comment - - edited

            Hi Feng Lei,
            Thanks for filing this ticket.

            It may be that some of your found messages are false positives if the strings are split across lines. Don't go too crazy making very verbose error messages, especially those that are printed frequently, since they may be printed thousands of times on thousands of nodes.

            The messages need to be unique and explain enough of the state to understand what I going wrong, but do not need to be complete sentences or use proper grammar. It is better to have a good explanation of the error in Jira (or in the manual for those "normal" errors that happen under regular usage, and best if the cause of the error is fixed and never hit in the first place, or the error condition fixes itself and does not depend on the user to fix it for them.

            When you are cleaning up these message strings, please also fix them to match the modern coding style:

            • console error messages should start with the device or subsystem name (eg. "testfs-MDT0000:") and end with the error code ": rc = %d\n"
            • keep error message strings on a single line, even if over 80 chars, to make it easier to search for them in the code
            • replace spaces with tabs to indent lines, including +/-6 lines before/after modified lines (or more lines if there are only a few nearby lines with spaces)
            • other checkpatch warnings on the modified lines

            If the patch gets too large (say, over 200 lines) then consider to split it into a few smaller patches and submit each one separately. That reduces the number of conflicts with other patches, and allows some changes to land even if one or two of the patches have conflicts, or they need to be updated a few times to make the messages more clear.

            If the patch is only changing white space and code style, then it should use the "Test-Parameters: trivial" label to minimize the testing. Note however that there are some error strings that are checked for while running tests. These are mostly annotated in the code already.

            adilger Andreas Dilger added a comment - - edited Hi Feng Lei, Thanks for filing this ticket. It may be that some of your found messages are false positives if the strings are split across lines. Don't go too crazy making very verbose error messages, especially those that are printed frequently, since they may be printed thousands of times on thousands of nodes. The messages need to be unique and explain enough of the state to understand what I going wrong, but do not need to be complete sentences or use proper grammar. It is better to have a good explanation of the error in Jira (or in the manual for those "normal" errors that happen under regular usage, and best if the cause of the error is fixed and never hit in the first place, or the error condition fixes itself and does not depend on the user to fix it for them. When you are cleaning up these message strings, please also fix them to match the modern coding style: console error messages should start with the device or subsystem name (eg. " testfs-MDT0000: ") and end with the error code " : rc = %d\n " keep error message strings on a single line, even if over 80 chars, to make it easier to search for them in the code replace spaces with tabs to indent lines, including +/-6 lines before/after modified lines (or more lines if there are only a few nearby lines with spaces) other checkpatch warnings on the modified lines If the patch gets too large (say, over 200 lines) then consider to split it into a few smaller patches and submit each one separately. That reduces the number of conflicts with other patches, and allows some changes to land even if one or two of the patches have conflicts, or they need to be updated a few times to make the messages more clear. If the patch is only changing white space and code style, then it should use the " Test-Parameters: trivial " label to minimize the testing. Note however that there are some error strings that are checked for while running tests. These are mostly annotated in the code already.

            People

              flei Feng Lei
              flei Feng Lei
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: