[LU-16376] lprocfs_job_stats_log() Invalid jobid size (37), expect(32) Created: 08/Dec/22  Updated: 07/Feb/24  Resolved: 20/Dec/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.8, Lustre 2.15.2
Fix Version/s: Lustre 2.16.0, Lustre 2.15.3

Type: Bug Priority: Minor
Reporter: Andreas Dilger Assignee: Andreas Dilger
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-17512 add conditional operator for 'jobid_n... Open
is related to LU-16599 clearing jobstats should match output... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Regular error messages on the console:

kernel: LustreError: 8537:0:(lprocfs_jobstats.c:283:lprocfs_job_stats_log()) Invalid jobid size (37), expect(32)

While this message suggests that the JobID is too long, lprocfs_job_stats_log() is only called on the MDS and OSS, using the jobid extracted from the RPC in ptlrpc_body->pb_jobid[32]. A properly terminated string in that field cannot be longer than 31 characters plus the NUL terminator.

/* 31 usable bytes string + null terminator. */ 
#define LUSTRE_JOBID_SIZE       32

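int lprocfs_job_stats_log(struct obd_device *obd, char *jobid,
                          int event, long amount)
{
        /* ... excerpt: earlier part of the function omitted ... */
        if (strlen(jobid) >= LUSTRE_JOBID_SIZE) {
                /* reported size is strlen() + 1, i.e. including the NUL */
                CERROR("Invalid jobid size (%lu), expect(%d)\n",
                       (unsigned long)strlen(jobid) + 1, LUSTRE_JOBID_SIZE);
                RETURN(-EINVAL);
        }
        /* ... excerpt: rest of the function omitted ... */
}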

void mdt_counter_incr(struct ptlrpc_request *req, int opcode, long amount)
{
        struct obd_export *exp = req->rq_export;

        /* ... excerpt: other counter updates omitted ... */
        if (exp->exp_obd && obd2obt(exp->exp_obd)->obt_jobstats.ojs_hash &&
            (exp_connect_flags(exp) & OBD_CONNECT_JOBSTATS))
                lprocfs_job_stats_log(exp->exp_obd,
                                      lustre_msg_get_jobid(req->rq_reqmsg),
                                      opcode, amount);
}

static inline void ofd_counter_incr(struct obd_export *exp, int opcode,
                                    char *jobid, long amount)
{
        if (exp->exp_obd && obd2obt(exp->exp_obd)->obt_jobstats.ojs_hash &&
            (exp_connect_flags(exp) & OBD_CONNECT_JOBSTATS))
                lprocfs_job_stats_log(exp->exp_obd, jobid, opcode, amount);
}

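One likely possibility is that pb_jobid is not NUL terminated, so strlen() runs past the end of the 32-byte field into the following bytes of the RPC. That would be consistent with the reported size of 37, which is strlen() + 1 and so corresponds to a 36-character string.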
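
As a rough illustration of the suspected failure mode, the sketch below measures a jobid without ever reading beyond the fixed-size field. The helper name jobid_bounded_len() is hypothetical and does not exist in Lustre; only LUSTRE_JOBID_SIZE is taken from the excerpt above.

```c
#include <stddef.h>
#include <string.h>

/* 31 usable bytes string + null terminator (value from the excerpt above). */
#define LUSTRE_JOBID_SIZE       32

/*
 * Hypothetical helper, not part of Lustre: return the length of the jobid
 * stored in a LUSTRE_JOBID_SIZE-byte field. Unlike strlen(), the search
 * stops at the end of the field, so an unterminated field yields
 * LUSTRE_JOBID_SIZE instead of a length that depends on whatever bytes
 * follow the field in memory.
 */
static size_t jobid_bounded_len(const char *field)
{
        const char *nul = memchr(field, '\0', LUSTRE_JOBID_SIZE);

        return nul ? (size_t)(nul - field) : LUSTRE_JOBID_SIZE;
}
```

A return value of LUSTRE_JOBID_SIZE would indicate that the sender filled the whole field without leaving room for a terminator.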

Rather than repeatedly complaining about this and then doing nothing, it would be better to just NUL terminate the string (as it should be) and then proceed with the truncated JobID.
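
A minimal sketch of that idea, assuming the jobid arrives in a LUSTRE_JOBID_SIZE-byte field: copy it into a local buffer and force the last byte to NUL before using it. This is not the landed patch; jobid_copy_terminated() is a hypothetical name used only for illustration.

```c
#include <string.h>

/* 31 usable bytes string + null terminator (value from the excerpt above). */
#define LUSTRE_JOBID_SIZE       32

/*
 * Hypothetical helper, not the actual fix: copy a jobid from a fixed-size
 * field into 'out' and guarantee NUL termination. 'field' must point to at
 * least LUSTRE_JOBID_SIZE - 1 readable bytes. An unterminated input is
 * truncated to 31 characters rather than rejected.
 */
static void jobid_copy_terminated(char out[LUSTRE_JOBID_SIZE],
                                  const char *field)
{
        memcpy(out, field, LUSTRE_JOBID_SIZE - 1);
        out[LUSTRE_JOBID_SIZE - 1] = '\0';
}
```

With a copy like this, a later strlen() on the local buffer can never return more than 31, so the "Invalid jobid size" check would no longer trigger for an unterminated field.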



 Comments   
Comment by Gerrit Updater [ 08/Dec/22 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49351
Subject: LU-16376 obdclass: NUL terminate log jobid strings
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 754e05e41da401627652bb88571ab101baba590e

Comment by Gerrit Updater [ 20/Dec/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49351/
Subject: LU-16376 obdclass: NUL terminate long jobid strings
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 9eba5d57297f807fddf046356c846478bbf232f4

Comment by Peter Jones [ 20/Dec/22 ]

Landed for 2.16

Comment by Gerrit Updater [ 22/Dec/22 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49490
Subject: LU-16376 obdclass: NUL terminate long jobid strings
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: 3ce08283b9a5ea0b50c68874a6acd38b84c23dc6

Comment by Gerrit Updater [ 11/Apr/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49490/
Subject: LU-16376 obdclass: NUL terminate long jobid strings
Project: fs/lustre-release
Branch: b2_15
Current Patch Set:
Commit: 727638a1d0e72a043c2798f081f166a0e3a39268
