  1. Lustre
  2. LU-16376

lprocfs_job_stats_log() Invalid jobid size (37), expect(32)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0, Lustre 2.15.3
    • Affects Version/s: Lustre 2.12.8, Lustre 2.15.2
    • Labels: None
    • Severity: 3

    Description

      Regular error messages on the console:

      kernel: LustreError: 8537:0:(lprocfs_jobstats.c:283:lprocfs_job_stats_log()) Invalid jobid size (37), expect(32)
      

      Although the message suggests that the JobID is too large, lprocfs_job_stats_log() is only called on the MDS and OSS, using the jobid extracted from the RPC in ptlrpc_body->pb_jobid[32], so the field itself cannot hold more than 32 bytes.

      /* 31 usable bytes string + null terminator. */ 
      #define LUSTRE_JOBID_SIZE       32
      
      int lprocfs_job_stats_log(struct obd_device *obd, char *jobid,
                                int event, long amount)
      {
              if (strlen(jobid) >= LUSTRE_JOBID_SIZE) {
                      CERROR("Invalid jobid size (%lu), expect(%d)\n",
                             (unsigned long)strlen(jobid) + 1, LUSTRE_JOBID_SIZE);
                      RETURN(-EINVAL);
              }
      }
      
      void mdt_counter_incr(struct ptlrpc_request *req, int opcode, long amount)
      {
              if (exp->exp_obd && obd2obt(exp->exp_obd)->obt_jobstats.ojs_hash &&
                  (exp_connect_flags(exp) & OBD_CONNECT_JOBSTATS))
                      lprocfs_job_stats_log(exp->exp_obd,
                                            lustre_msg_get_jobid(req->rq_reqmsg),
                                            opcode, amount);
      }
      
      static inline void ofd_counter_incr(struct obd_export *exp, int opcode,
                                          char *jobid, long amount)
      {
              if (exp->exp_obd && obd2obt(exp->exp_obd)->obt_jobstats.ojs_hash &&
                  (exp_connect_flags(exp) & OBD_CONNECT_JOBSTATS))
                      lprocfs_job_stats_log(exp->exp_obd, jobid, opcode, amount);
      }
      

      One likely explanation is that pb_jobid is not NUL-terminated, so strlen() runs past the end of the field into the following bytes of the RPC.
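
      As an illustration of how a size of 37 could be reported for a 32-byte field, here is a minimal userspace sketch. The buffer layout and the helper names (reported_jobid_size, demo_unterminated_jobid) are made up for this example and are not Lustre code; the sketch only assumes that the 32 jobid bytes contain no NUL and happen to be followed by 4 non-zero bytes.

```c
#include <stddef.h>
#include <string.h>

#define LUSTRE_JOBID_SIZE 32

/*
 * Hypothetical helper: the size the error message would print for a
 * jobid starting at msg[off], i.e. strlen() + 1.
 */
static size_t reported_jobid_size(const char *msg, size_t off)
{
	return strlen(msg + off) + 1;
}

/*
 * Build a made-up message buffer: 32 jobid bytes with no NUL, then
 * 4 non-zero bytes of unrelated data, then zero padding.
 */
static size_t demo_unterminated_jobid(void)
{
	char msg[LUSTRE_JOBID_SIZE + 8];

	memset(msg, 0, sizeof(msg));
	memset(msg, 'j', LUSTRE_JOBID_SIZE);      /* jobid fills all 32 bytes */
	memset(msg + LUSTRE_JOBID_SIZE, 'x', 4);  /* 4 non-zero bytes follow */

	/* strlen() only stops at the first NUL, 36 bytes in. */
	return reported_jobid_size(msg, 0);
}
```

      With this layout strlen() returns 36, so the reported size is 37, matching the console message even though the field is only 32 bytes wide.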

      Rather than repeatedly logging this error and doing nothing else, it would be better to NUL-terminate the string (as it should be) and then proceed with the truncated JobID.
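
      One way the proposed behaviour could look is sketched below in plain userspace C. This is not the patch that landed for this ticket; jobid_copy_terminated is a hypothetical helper that assumes the source is a fixed 32-byte field and that truncating to 31 usable bytes is acceptable.

```c
#include <stddef.h>
#include <string.h>

#define LUSTRE_JOBID_SIZE 32

/*
 * Hypothetical helper: copy a jobid out of a fixed-size 32-byte field
 * that may lack a NUL terminator. At most 31 bytes are copied and the
 * result is always terminated. Returns the length of the copy.
 */
static size_t jobid_copy_terminated(char dst[LUSTRE_JOBID_SIZE],
				    const char *src_field)
{
	/* Look for a NUL only within the 31 usable bytes of the field. */
	const char *nul = memchr(src_field, '\0', LUSTRE_JOBID_SIZE - 1);
	size_t len = nul ? (size_t)(nul - src_field) : LUSTRE_JOBID_SIZE - 1;

	memcpy(dst, src_field, len);
	dst[len] = '\0';
	return len;
}
```

      The copy never reads beyond the 32-byte field, so a later strlen() on the result is bounded by 31 and the size check in lprocfs_job_stats_log() would no longer trip on an unterminated field.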

            People

              Assignee: Andreas Dilger (adilger)
              Reporter: Andreas Dilger (adilger)