Lustre / LU-17414

liblnetconfig printing bogus error numbers for users

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.10.0, Lustre 2.14.0, Lustre 2.16.0, Lustre 2.12.9, Lustre 2.15.0

    Description

      I was led to look at lustre_lnet_config_ni() and found that it returns somewhat "random" error codes to the user, which lnetctl then prints as, for example, "errno: -1". However, that "-1" is not the POSIX error code "-EPERM" as one might expect, but rather "LUSTRE_CFG_RC_BAD_PARAM", which is IMHO very confusing.

      lustre_lnet_config_ni() also returns POSIX error codes from the l_ioctl() call in the function, so it is impossible to know which error type is actually being returned.
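
The ambiguity can be illustrated with a small stand-alone sketch. Note that `demo_config_ni()` and `demo_is_ambiguous()` are hypothetical names invented for illustration and are not the real liblnetconfig code; the sketch only mimics the two return paths described above.

```c
#include <errno.h>

/* Legacy liblnetconfig value, as listed in this ticket. */
#define LUSTRE_CFG_RC_BAD_PARAM	-1

/*
 * Hypothetical stand-in for a function that mixes both conventions:
 * it returns a LUSTRE_CFG_RC_* code for its own checks and a negative
 * errno when the (simulated) ioctl fails.
 */
static int demo_config_ni(int param_ok, int ioctl_errno)
{
	if (!param_ok)
		return LUSTRE_CFG_RC_BAD_PARAM;	/* library-private code */
	if (ioctl_errno != 0)
		return -ioctl_errno;		/* POSIX code from the ioctl */
	return 0;
}

/* Returns 1 when the two failure causes cannot be told apart. */
static int demo_is_ambiguous(void)
{
	int bad_param = demo_config_ni(0, 0);
	int no_perm = demo_config_ni(1, EPERM);

	return bad_param == no_perm;
}
```

A caller that only sees the return value cannot tell a bad parameter from a permission failure, because both arrive as the same integer.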

      This code should be changed to use existing POSIX error numbers that have similar meanings, or could at least fill the same role (with some creative mapping):

          #define LUSTRE_CFG_RC_NO_ERR                     0  => fine
          #define LUSTRE_CFG_RC_BAD_PARAM                 -1  => -EINVAL
          #define LUSTRE_CFG_RC_MISSING_PARAM             -2  => -EFAULT
          #define LUSTRE_CFG_RC_OUT_OF_RANGE_PARAM        -3  => -ERANGE
          #define LUSTRE_CFG_RC_OUT_OF_MEM                -4  => -ENOMEM
          #define LUSTRE_CFG_RC_GENERIC_ERR               -5  => -ENODATA
          #define LUSTRE_CFG_RC_NO_MATCH                  -6  => -ENOMSG
          #define LUSTRE_CFG_RC_MATCH                     -7  => -EXFULL
          #define LUSTRE_CFG_RC_SKIP                      -8  => -EBADSLT
          #define LUSTRE_CFG_RC_LAST_ELEM                 -9  => -ECHRNG
          #define LUSTRE_CFG_RC_MARSHAL_FAIL              -10 => -ENOSTR
      

      I don't think "overloading" the POSIX error codes to mean something similar is worse than using random numbers to report errors.
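
As a rough sketch of the mapping proposed in the table above, a translation helper might look like the following. `cfg_rc_to_errno()` is a hypothetical name and not an existing liblnetconfig function, and the sketch assumes a Linux `<errno.h>`, since names such as EXFULL, EBADSLT, and ECHRNG are not available on every platform.

```c
#include <errno.h>

/* Legacy values, copied from the table in this ticket. */
#define LUSTRE_CFG_RC_NO_ERR			0
#define LUSTRE_CFG_RC_BAD_PARAM			-1
#define LUSTRE_CFG_RC_MISSING_PARAM		-2
#define LUSTRE_CFG_RC_OUT_OF_RANGE_PARAM	-3
#define LUSTRE_CFG_RC_OUT_OF_MEM		-4
#define LUSTRE_CFG_RC_GENERIC_ERR		-5
#define LUSTRE_CFG_RC_NO_MATCH			-6
#define LUSTRE_CFG_RC_MATCH			-7
#define LUSTRE_CFG_RC_SKIP			-8
#define LUSTRE_CFG_RC_LAST_ELEM			-9
#define LUSTRE_CFG_RC_MARSHAL_FAIL		-10

/*
 * Hypothetical helper: translate a legacy LUSTRE_CFG_RC_* value into the
 * negative errno suggested in the ticket. It must only be applied to values
 * known to come from a LUSTRE_CFG_RC_* path, because a legacy -1 and a
 * genuine -EPERM are the same integer.
 */
static int cfg_rc_to_errno(int rc)
{
	switch (rc) {
	case LUSTRE_CFG_RC_NO_ERR:		return 0;
	case LUSTRE_CFG_RC_BAD_PARAM:		return -EINVAL;
	case LUSTRE_CFG_RC_MISSING_PARAM:	return -EFAULT;
	case LUSTRE_CFG_RC_OUT_OF_RANGE_PARAM:	return -ERANGE;
	case LUSTRE_CFG_RC_OUT_OF_MEM:		return -ENOMEM;
	case LUSTRE_CFG_RC_GENERIC_ERR:		return -ENODATA;
	case LUSTRE_CFG_RC_NO_MATCH:		return -ENOMSG;
	case LUSTRE_CFG_RC_MATCH:		return -EXFULL;
	case LUSTRE_CFG_RC_SKIP:		return -EBADSLT;
	case LUSTRE_CFG_RC_LAST_ELEM:		return -ECHRNG;
	case LUSTRE_CFG_RC_MARSHAL_FAIL:	return -ENOSTR;
	default:				return rc; /* unknown: pass through */
	}
}
```

The simpler alternative, which is what the ticket actually suggests, is to redefine the macros themselves to the errno values so that no translation step is needed.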

      It looks like these error definitions are only in userspace, but I'm not 100% sure. The vast majority of LUSTRE_CFG_RC_* errors are only used inside liblnetconfig, so changing them won't affect the library at all.

      The only other places these error values are used are lnetctl.c, which mostly only checks "!= LUSTRE_CFG_RC_NO_ERR" and so should be relatively safe to change, and lustre/tests/lutf/python/, which I think should also be OK to change (as long as the error codes from one node are not interpreted by lnetctl on another node).

      If binary compatibility is needed, it looks like none of the values currently overlap, so it would be possible to check both the old and new values for the return in the few places that are not checking LUSTRE_CFG_RC_NO_ERR, and then at some point in the future start actually returning them...
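
The transitional check described above could be sketched roughly as follows. The `LEGACY_*` macro and both helper names are hypothetical and chosen only for illustration; the approach relies on the ticket's observation that the old and new values do not overlap.

```c
#include <errno.h>
#include <stdbool.h>

/* Old private value of LUSTRE_CFG_RC_NO_MATCH, as listed in this ticket. */
#define LEGACY_CFG_RC_NO_MATCH	-6

/*
 * Hypothetical helper: true if rc equals either the old private value or
 * the proposed POSIX replacement. This is only safe while the two sets of
 * values do not overlap.
 */
static bool cfg_rc_is(int rc, int legacy, int posix)
{
	return rc == legacy || rc == posix;
}

/* Accept "no match" from both an old and a new library during transition. */
static bool cfg_rc_is_no_match(int rc)
{
	return cfg_rc_is(rc, LEGACY_CFG_RC_NO_MATCH, -ENOMSG);
}
```

Callers that only compare against LUSTRE_CFG_RC_NO_ERR would not need such a helper, since zero means success under both schemes.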

          People

            arshad512 Arshad Hussain
            adilger Andreas Dilger