add conditional operator for 'jobid_name'

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version: Lustre 2.16.0

    Description

      It would be useful to add a conditional operator like jobid_name=%j?%H:%e:%u in jobid_interpret_string() to allow using jobid_var if it is set, otherwise use the short hostname.

      This allows the hostname to be added for interactive jobs that do not have the jobid_var environment variable set (e.g. on login nodes, or for commands run on compute nodes outside of the configured job scheduler). When jobid_var is set, the hostname is omitted, so the job stats for a single 5000-node job are not split into 5000 separate JobIDs to track on the servers.

          Activity

            [LU-17512] add conditional operator for 'jobid_name'

            "Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/c/doc/manual/+/58598/
            Subject: LU-17512 jobid: describe '%j?' jobid_name conditional
            Project: doc/manual
            Branch: master
            Current Patch Set:
            Commit: 4762b758fb12a35fc9c80090941a7b7b2c6a1326


            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/doc/manual/+/58598
            Subject: LU-17512 jobid: describe '%j?' jobid_name conditional
            Project: doc/manual
            Branch: master
            Current Patch Set: 1
            Commit: 373f11918bfc13f552340e09e920f0fbffe89bcf

            pjones Peter Jones added a comment -

            Merged for 2.16


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55332/
            Subject: LU-17512 utils: new ? operator for jobid_name
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: ecdcaa398668291dcfad26c4e21745dbfee19bc4


            "Maximilian Dilger <mdilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55332
            Subject: LU-17512 utils: new ? operator for jobid_name
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 3ca46d97d655cc89d30b613753107b78ad4d7756


            adilger Andreas Dilger added a comment -

            The "jobid_name" parameter can contain any of the "%" formats the admin chooses. The goal of this ticket is to add a "conditional" operator such as "?" to the jobid_name string, meaning "use the first parameter if it is set, otherwise use the second parameter."

            The expected use case is a format string like "%j?%H", meaning "use the jobid if available, otherwise use the short hostname", though this is only one example of how it could be used. The jobid value is not always available (e.g. on nodes with interactive users rather than compute nodes running a job). A common setting is "jobid_name=%j:%e:%u" (jobid:executable:user), but when the jobid is unavailable this prints as ":ls:0", which is ugly and not very useful.

            Being able to conditionally use the short hostname on interactive nodes that are not running a job is the goal of this ticket.

            To implement this, jobid_interpret_string() should check and print the first parameter (e.g. jobid) as it does today. If that parameter is NULL or zero-length (l == 0) and the next characters are '?%', then the second parameter should be printed instead. Otherwise, 'c' should be advanced by 3 characters to skip the "?%x" characters and continue with the next format.
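The behaviour described in the comment above can be sketched in userspace C. This is only an illustration under stated assumptions, not the merged patch: the struct jobid_ctx type and the functions expand_code() and jobid_interpret_sketch() are hypothetical names invented here, only the %j, %H, %e and %u codes are handled, and the real jobid_interpret_string() in Lustre obtains its values differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the values behind each format code. */
struct jobid_ctx {
	const char *jobid;     /* %j: scheduler job ID, NULL or "" if unset */
	const char *hostname;  /* %H: short hostname */
	const char *exe;       /* %e: executable name */
	unsigned int uid;      /* %u: numeric user ID */
};

/* Expand one format code into buf; return the length written (0 if unset). */
static size_t expand_code(char code, const struct jobid_ctx *ctx,
			  char *buf, size_t buflen)
{
	int n = 0;

	switch (code) {
	case 'j': n = snprintf(buf, buflen, "%s", ctx->jobid ? ctx->jobid : ""); break;
	case 'H': n = snprintf(buf, buflen, "%s", ctx->hostname ? ctx->hostname : ""); break;
	case 'e': n = snprintf(buf, buflen, "%s", ctx->exe ? ctx->exe : ""); break;
	case 'u': n = snprintf(buf, buflen, "%u", ctx->uid); break;
	default:  break; /* unknown codes expand to nothing in this sketch */
	}
	if (n < 0)
		return 0;
	/* snprintf reports the untruncated length; clamp to what fits. */
	return (size_t)n >= buflen ? buflen - 1 : (size_t)n;
}

/* Expand fmt into out, honouring a "%a?%b" conditional pair. */
static void jobid_interpret_sketch(const char *fmt, const struct jobid_ctx *ctx,
				   char *out, size_t outlen)
{
	const char *c = fmt;
	size_t used = 0;

	assert(out != NULL && outlen > 0);
	out[0] = '\0';

	while (*c != '\0' && used < outlen - 1) {
		size_t l;

		if (c[0] != '%' || c[1] == '\0') {
			/* Literal character: copy it through unchanged. */
			out[used++] = *c++;
			out[used] = '\0';
			continue;
		}

		/* Print the first parameter as usual. */
		l = expand_code(c[1], ctx, out + used, outlen - used);
		c += 2;

		if (c[0] == '?' && c[1] == '%' && c[2] != '\0') {
			/* First parameter empty: print the second one instead. */
			if (l == 0)
				l = expand_code(c[2], ctx, out + used, outlen - used);
			/* Either way, skip the three "?%x" characters. */
			c += 3;
		}
		used += l;
	}
}
```

With a context of jobid "1234", hostname "login01", executable "ls" and uid 0, the format "%j?%H:%e:%u" expands to "1234:ls:0" in this sketch. With the jobid unset it expands to "login01:ls:0", whereas the plain "%j:%e:%u" format gives the ":ls:0" result mentioned in the comment.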
            mdilger Max Dilger added a comment -

            For clarification, do you want to use jobid_set_current to set the short hostname when a jobid is not currently set, or just to return the short hostname in the expanded string?

            adilger Andreas Dilger added a comment -

            This also has the benefit of shortening the JobID string so that more useful information can fit into the current size limits.

            People

              mdilger Max Dilger
              adilger Andreas Dilger