  Lustre / LU-16765

Allow longer JobID names


Details

    • Improvement
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.16.0
    • None

    Description

      There have been requests to increase the JobID length beyond the current 32-character limit. However, since the JobID is embedded in a fixed-size field in every RPC, changing it would be an RPC protocol change between clients and servers, so it would only be possible in a major release (e.g. Lustre 3.0 or possibly Lustre 2.17; the last such RPC protocol change was in Lustre 2.0). This may require a considerable amount of code change in RPC handling, and old and new clients/servers might not be interoperable. It would also consume space in every RPC that is sent, even though most clients can fit their JobID into 32 characters.

      Just increasing the space by a few characters may fix some cases, but it may still not be enough for all of the cases where there are long hostnames, process names, user IDs, process IDs, etc., so the field might need to be much larger (potentially hundreds of bytes) for every possible JobID to fit.

      Since the JobID field is at the end of ptlrpc_body_v3, it may be possible to increase the size of this buffer without totally changing the RPC protocol, by signalling it with a new MSG_JOBID2 request flag instead of bumping the protocol version, but this would need extensive interop testing.

      The servers would need to handle the longer pb_jobid field, and old servers should not overflow or LASSERT on the size of the ptlrpc_body_v3 buffer or pb_jobid field. It would be OK if they truncated the longer pb_jobid to the current 32-character limit.

      There are a few options available in the short term to avoid some of the issues seen with long JobIDs:

      • use shorter primary hostnames for the clients if even the short hostnames are 16+ characters and are almost all the same except for the last digit(s). The longer names could still be used via /etc/hosts or DNS, but since the JobID is generated in the kernel, it only has access to the primary hostname.
      • specify a constant hostname/alias/ID in the jobid_name for each client instead of using "%H", so that it uses less space. This would need to be set explicitly for each client rather than using the same string for all clients.


            People

              wc-triage WC Triage
              adilger Andreas Dilger
              Votes: 0
              Watchers: 4
