Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.3.0
    • Lustre 2.3.0
    • None
    • 4306

    Description

      This feature collects filesystem operation statistics for jobs running on Lustre.

      When a job scheduler (SLURM, for instance) is running on a Lustre client, the client packs the job ID into each request (open, unlink, write...), and the server collects this information and exposes it via procfs.
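
The flow described above (client tags each request with a job ID, server aggregates per job) can be sketched roughly as follows. This is a toy model for illustration only, not Lustre's implementation: the `Request`, `JobStats`, and `client_request` names are hypothetical, and reading the job ID from the `SLURM_JOB_ID` environment variable is an assumption about how a SLURM-managed client might obtain it.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """A client request tagged with a job ID (hypothetical type)."""
    job_id: str
    op: str  # e.g. "open", "unlink", "write"


class JobStats:
    """Toy server-side store of per-job operation counts (illustrative only)."""

    def __init__(self):
        # job_id -> Counter mapping operation name to number of requests seen
        self._counts = defaultdict(Counter)

    def record(self, req: Request) -> None:
        """Count one request against the job that issued it."""
        self._counts[req.job_id][req.op] += 1

    def report(self) -> dict:
        """Return a plain-dict snapshot, analogous to what a stats file would expose."""
        return {job: dict(ops) for job, ops in self._counts.items()}


def client_request(env: dict, op: str, jobid_var: str = "SLURM_JOB_ID") -> Request:
    """Build a request on the client, packing in the job ID found in `env`.

    Assumption: the scheduler exports the job ID as an environment variable;
    requests from processes without it are grouped under "unknown".
    """
    return Request(job_id=env.get(jobid_var, "unknown"), op=op)


if __name__ == "__main__":
    stats = JobStats()
    job_env = {"SLURM_JOB_ID": "1234"}
    for op in ("open", "write", "write"):
        stats.record(client_request(job_env, op))
    stats.record(client_request({}, "unlink"))
    print(stats.report())
```

Keeping the counters keyed by job ID first and operation second mirrors the description: the server does not need to know anything about the scheduler, only the opaque ID carried in each request.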

          Activity

            [LU-694] Job Stats

            adilger Andreas Dilger added a comment - Just updated the examples in this bug to be clearer, since it showed up in a Google search. I prefer not to use "lustre" as the fsname in examples, since it is very non-obvious that this needs to be replaced with the actual fsname and is not a fixed part of the parameter being specified (like the "sys.jobid_var" part is).

            rhenwood Richard Henwood (Inactive) added a comment - Thanks for the input. I agree that fs names are useful, for example:

            • In cases where there is only one MGS at a site, you need the fs name to distinguish the fs.
            • In cases where you have not mounted the fs, you can still identify the fs.

            So, how about supporting: <mount point|fsname>?

            If the mount point is not valid, then return an error.

            morrone Christopher Morrone (Inactive) added a comment - Nathan, it boggles my mind as well. But I know for a fact that folks out there have done it, because they complained about LMT not being able to handle two filesystems having exactly the same name. They seemed to think it was Livermore's responsibility to factor in additional information like IP addresses to uniquely identify filesystems with the same name. I of course declined.

            But one has to sympathize with the users. Configuring Lustre is so horribly bad that something like the filesystem name is completely non-obvious to most people. You set it once in some cryptic way, and it is none too clear from that point forward how it is used at all.

            Which I suppose is a long-winded way of agreeing that filesystem names really need to be unique, and bending over backwards to differentiate filesystems with the same name is a path to madness. But we also need to promote the name to a first-class object that is used in a sane way in the command-line tools and throughout Lustre. We also need to clearly document filesystem name usage.

            nrutman Nathan Rutman added a comment - The intent of the MGS was to provide config info for all the filesystems at a site, so the fs name is unique. If multiple MGSs are being used, on different nodes, the filesystem name could overlap, but you'd have to be masochistic to use the same filesystem name for two different filesystems at a single site. Masochistic to the point where I would say this should be disallowed by any and all configuration management systems, and if you do it by hand anyhow, you reap the unpleasant rewards.

            niu Niu Yawei (Inactive) added a comment - Quoting Richard: "I have been advised that a filesystem name may not uniquely identify a Lustre filesystem. I am not sure what a better choice for the command you have above is, but some thought as to an alternative to fs name would be valuable."

            Hi Richard, the fs name should be unique within a single MGS namespace, and most 'lctl conf_param' commands use the fsname to identify a filesystem. Do you suggest that we set jobstats parameters per target server rather than per fs? I'm not sure I followed your comment correctly.

            rhenwood Richard Henwood (Inactive) added a comment - I have been advised that a filesystem name may not uniquely identify a Lustre filesystem.

            I am not sure what a better choice for the command you have above is, but some thought as to an alternative to fs name would be valuable.

            niu Niu Yawei (Inactive) added a comment - Hi, Ihara

            The patch will not be landed for 2.2; which version it will land in has not been decided yet.

            ihara Shuichi Ihara (Inactive) added a comment - Hello, Niu

            Will the new patches for this feature be landed for 2.2?

            niu Niu Yawei (Inactive) added a comment - Follow-up patch that moves 'jobid_var' to global: http://review.whamcloud.com/1683
            niu Niu Yawei (Inactive) added a comment - http://review.whamcloud.com/1397

            People

              niu Niu Yawei (Inactive)
              niu Niu Yawei (Inactive)