Details
Bug
Resolution: Fixed
Minor
Lustre 2.12.8, Lustre 2.15.2
None
3
Description
Regular error messages on the console:
kernel: LustreError: 8537:0:(lprocfs_jobstats.c:283:lprocfs_job_stats_log()) Invalid jobid size (37), expect(32)
While this message suggests that the JobID is too large, lprocfs_job_stats_log() is only called on the MDS and OSS, using the jobid extracted from the RPC in ptlrpc_body->pb_jobid[32], so it should be impossible for the jobid to be larger than 32 bytes.
/* 31 usable bytes string + null terminator. */
#define LUSTRE_JOBID_SIZE 32

int lprocfs_job_stats_log(struct obd_device *obd, char *jobid,
                          int event, long amount)
{
        if (strlen(jobid) >= LUSTRE_JOBID_SIZE) {
                CERROR("Invalid jobid size (%lu), expect(%d)\n",
                       (unsigned long)strlen(jobid) + 1, LUSTRE_JOBID_SIZE);
                RETURN(-EINVAL);
        }
}

void mdt_counter_incr(struct ptlrpc_request *req, int opcode, long amount)
{
        if (exp->exp_obd && obd2obt(exp->exp_obd)->obt_jobstats.ojs_hash &&
            (exp_connect_flags(exp) & OBD_CONNECT_JOBSTATS))
                lprocfs_job_stats_log(exp->exp_obd,
                                      lustre_msg_get_jobid(req->rq_reqmsg),
                                      opcode, amount);
}

static inline void ofd_counter_incr(struct obd_export *exp, int opcode,
                                    char *jobid, long amount)
{
        if (exp->exp_obd && obd2obt(exp->exp_obd)->obt_jobstats.ojs_hash &&
            (exp_connect_flags(exp) & OBD_CONNECT_JOBSTATS))
                lprocfs_job_stats_log(exp->exp_obd, jobid, opcode, amount);
}
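One likely possibility is that pb_jobid is not NUL terminated, so strlen() runs past the end of the 32-byte field into the next bytes in the RPC. The reported size of 37 is strlen() + 1, meaning 36 bytes were scanned before a NUL was found.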
Rather than repeatedly complaining about this and doing nothing, it would be better to just NUL terminate the string (as it should be) and then proceed with the truncated JobID.
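As a rough illustration of the proposed behaviour, here is a minimal userspace sketch. It is not the actual Lustre patch: the helper name jobid_copy_terminated() is hypothetical, and it assumes the source field is a fixed LUSTRE_JOBID_SIZE-byte buffer like pb_jobid[32]. It bounds the scan to the field and always terminates the copy, so an unterminated JobID is truncated to 31 bytes rather than rejected.

```c
#include <stddef.h>
#include <string.h>

/* 31 usable bytes string + null terminator (as in the excerpt above). */
#define LUSTRE_JOBID_SIZE 32

/*
 * Hypothetical helper, for illustration only.
 * Copies a jobid out of a fixed-size wire field that may lack a NUL
 * terminator. At most LUSTRE_JOBID_SIZE - 1 bytes are examined, so the
 * scan never runs into whatever follows the field. Returns the length
 * of the (possibly truncated) jobid stored in dst.
 */
static size_t jobid_copy_terminated(char dst[LUSTRE_JOBID_SIZE],
                                    const char *src)
{
        /* Look for a NUL only within the usable part of the field. */
        const char *nul = memchr(src, '\0', LUSTRE_JOBID_SIZE - 1);
        size_t len = nul ? (size_t)(nul - src) : LUSTRE_JOBID_SIZE - 1;

        memcpy(dst, src, len);
        dst[len] = '\0';        /* always terminated, truncating if needed */
        return len;
}
```

With a helper along these lines, the caller would pass the terminated copy on to the stats code instead of returning -EINVAL, and the console message would no longer be triggered by a missing terminator.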