Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor

    Description

      Some new user interface options for jobstats have been added via LU-9221; documentation should be created for them.

          Activity

            [LUDOC-381] Improve documentation for jobstats

            Documentation should also be added for patch https://review.whamcloud.com/31691 "LU-10698 obdclass: allow specifying complex jobids". The commit message is pretty reasonable:

            Allow specifying a format string for the jobid_name variable to create
                a jobid for processes on the client.  The jobid_name is used when
                jobid_var=nodelocal, if jobid_name contains "%j", or as a fallback if
                getting the specified jobid_var from the environment fails.
                
                The jobid_name string allows the following escape sequences:
                
                    %e = executable name
                    %g = group ID
                    %h = hostname (system utsname)
                    %j = jobid from jobid_var environment variable
                    %p = process ID
                    %u = user ID
                
                Any unknown escape sequences are dropped. Other arbitrary characters
                pass through unmodified, up to the maximum jobid string size of 32,
                though whitespace within the jobid is not copied.
                
                This allows, for example, specifying an arbitrary prefix, such as the
                cluster name, in addition to the traditional "procname.uid" format,
                to distinguish between jobs running on clients in different clusters:
                
                    lctl set_param jobid_var=nodelocal jobid_name=cluster2.%e.%u
                or
                    lctl set_param jobid_var=SLURM_JOB_ID jobid_name=cluster2.%j.%e
                
                To use an environment-specified JobID, if available, but fall back to
                a static string for all processes that do not have a valid JobID:
                
                    lctl set_param jobid_var=SLURM_JOB_ID jobid_name=unknown
            
            adilger Andreas Dilger added a comment

            Purging the Cache
            The cache can be purged of a specific job by writing the JobID to the jobid_name proc file. Any items in the cache that are more than 300 seconds old will also be purged at this time.

            Lifecycle of a mapping
            A new mapping is created when a lookup is performed and no mapping exists in the cache; the JobID is determined at this time.
            Each time a mapping is accessed, it is checked to see whether it needs to be refreshed (every 30 seconds). Its timer is then reset to the current time. Each mapping has its own timer.
            During a purge, a mapping is removed if its JobID matches the JobID being purged, or if its timer is more than 300 seconds old.

            Determining JobID
            The JobID will be determined as follows:
            1) The jobid_var proc variable, which can be “procname_uid” or the name of a variable in the application’s environment (typically the environment variable containing the job name assigned by the scheduler).
            2) If (1) is not available, fall back to the “procname_uid” scheme.
            3) All Lustre threads are filtered out.
            4) If none of the above is available, the JobID stored in the inode is used.
            5) If there is no JobID stored in the inode, the JobID remains blank.

            This differs from the current method, which simply returns an empty JobID if nothing is available from the environment. The goal is to identify processes (and users) that are running on a node that is not scheduled, or that are taking up significant resources, and to provide proper read-ahead accounting.

            bevans Ben Evans (Inactive) added a comment

            People

              dkosach Dzmitry Kosach
              bevans Ben Evans (Inactive)
              Votes: 1
              Watchers: 6
