[LU-13031] store JobID of program that created file in inodes at create time Created: 28/Nov/19  Updated: 07/Dec/23  Resolved: 07/Aug/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0, Lustre 2.15.4

Type: Improvement Priority: Minor
Reporter: Andreas Dilger Assignee: Thomas Bertschinger
Resolution: Fixed Votes: 0
Labels: LTS15, llnl, lug23dd, medium

Issue Links:
Duplicate
duplicates LU-11901 Store JobID of program that initially... Resolved
Gantt End to Start
has to be done before LU-16798 lfs find: new --jobid option Closed
Related
is related to LU-16007 NRS Jobid default RPC aggregation Open
is related to LUDOC-373 Add information about Lustre xattrs Open
is related to LU-15743 "lfs find" is missing "-xattr" support Open
is related to LU-17219 lfs find: add ability to print extend... Open
Rank (Obsolete): 9223372036854775807

 Description   

It would be useful to store the JobID that created a file in an xattr on the MDT inode. This would allow tracking the source of files after the fact, and can help users with the provenance of their file data (e.g. which run the files came from, which can then be used to determine runtime parameters used, whether the run was successful or not, etc.).

It isn't clear whether this would be needed on the OST inodes as well, but I think initially the MDT inode would be enough, as this would be the only xattr visible to users. The OST xattr would only be accessible for low-level scanning tools.



 Comments   
Comment by Olaf Faaland [ 18/Dec/19 ]

I agree this would be useful. If you're not in a hurry, we would like to try to implement it.

Comment by Andreas Dilger [ 18/Dec/19 ]

Olaf,
I don't think anyone is working on this right now, so I'd be happy for tour contribution in this area. The jobid is large enough (32 bytes) that it is reasonable to store it in its own xattr on the file. For ext4's sake, the xattr should be named "job" (3-byte xattr names are the most efficient to store), and probably stored in the "lustre" or "user" namespace so that it is visible to regular users, as it is probably not critical if users can change or remove this xattr for whatever reason.

Comment by Gerrit Updater [ 12/May/23 ]

"Thomas Bertschinger <bertschinger@lanl.gov>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50982
Subject: LU-13031 jobstats: store jobid in xattr when files are created
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 26c63b60e0dcd145a8d032a9f4780c921435e184

Comment by Thomas Bertschinger [ 12/May/23 ]

@Patrick Farrell
Here's a link to my question on the mailing list archive http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2023-May/018629.html If you want to respond with your comments either in the mailing list or in the ticket here to continue the discussion.

Comment by Patrick Farrell [ 12/May/23 ]

Thanks, Thomas.

Could you lay out the implications of using user vs lustre namespaces as you see them?  What do we gain, what we do lose, and/or what requirements can't we meet in certain settings?

Comment by Thomas Bertschinger [ 12/May/23 ]

Requirements (or at least, Wants):

  • Can preserve xattr with tools like `cp` and `rsync` (it's not useful if you copy your data and the jobid gets overwritten with stuff like "cp.1234")
  • Can preserve xattr when moving files to other filesystems, e.g., to an archive, since the jobid is likely useful to keep around long term

User namespace
Pros:

  • Enables things like `cp --preserve=xattr` and `rsync --xattrs`
  • Allows interoperability with other filesystems (xattr can be preserved if a file is copied to an FS that doesn't know about the "lustre" namespace)

Cons:

  • violates expectation that the kernel will not write user data, which could be problematic if this becomes enforced at some point

Lustre namespace:
Pros:

  • Allows flexibility in terms of being written from both user and kernel space; we can decide our own rules

Cons:

  • Lustre xattr is "virtual" and must be backed by an xattr in a traditional namespace (example, lustre.lov and trusted.lov) since the kernel (AFAICT) will reject storing a lustre.* xattr directly, which increases complexity.
  • Prevents interoperability-- standard tools like `rsync` won't be able to preserve this xattr because tools that copy files to other filesystems will have to be specially coded to translate lustre.job into user.job

Trusted namespace: I think this would only be considered as backing the lustre namespace and wouldn't be used on its own.

System namespace:
Pros:
-Arguably the most correct choice. According to man 7 xattr: "Extended system attributes are used by the kernel to store system objects...  Read and write access permissions to system attributes depend on the policy ... implemented by filesystems in the kernel." So this fits our need of being able to write it from both user and kernel processes.
Cons:
-Is not interoperable with other filesystems because other filesystems likely will not allow user processes to write a "system.job" xattr.

 

My opinion is that the user namespace is the best choice because every other choice fails to meet the goals above, but I will admit that not being very experienced with the kernel side of things, I can't say how severe the problem with the user namespace is.

Comment by Patrick Farrell [ 12/May/23 ]

Thank you for that; I'm persuaded we should do the user xattr, the issues around interop and user tools look decisive.  If we're forced to change tack in the future, we can deal with that then.  (I think I was the only one actively objecting, Andreas indicated OK at LUG, so you should be good to go.)

Comment by Andreas Dilger [ 12/May/23 ]

I'm inclined toward the user.job xattr, but it is never a good idea to unilaterally make policy decisions in the kernel that cannot be changed.

As such, it might make sense to have a tunable parameter like "mdt.*.job_xattr=user.job" (or mdd.*.job_xattr is possibly better, though e.g. "mdt.*.job_cleanup_interval" already exists), and then this could be changed in the future if there is some conflict (e.g. some site already uses the "user.job" xattr for some other purpose).

I don't think the job_xattr should allow totally arbitrary values (e.g. overwriting trusted.lov or trusted.lma or security.* would be bad). One option is to only allow a limited selection of valid xattr namespaces, and possibly names:

  • NONE to turn this feature off
  • user, or trusted or system (if admin wants to restrict the ability of regular users to change this value?), with implicit ".job" added
  • user.* (or trusted.* or system.*) to also allow specifying the xattr name

If we allow the xattr name portion to be specified (which I'm not sure about, but putting it out for completeness), it should have some reasonable limits:

  • <= 7 characters long to avoid wasting valuable xattr space in the inode
  • only ASCII alpha-numeric characters, plus "." for the name separator, to avoid spaces, control characters, symbols (=, ,, etc.) that might cause problems
  • should not conflict with other known xattrs, which is tricky if we allow the name to be arbitrary. Possibly if in trusted (and system?) it should only allow trusted.job to avoid future conflicts?
  • maybe restrict it to contain "job" (or maybe "pbs", "slurm", ...) to reduce the chance of namespace clashes in user or system? However, I'm reluctant to restrict names in user since this shouldn't have any fatal side effects (e.g. data corruption like in trusted or system), and the admin is supposed to know what they are doing...

Thoughts?

Comment by Andreas Dilger [ 12/May/23 ]

For the implementation, any such restrictions/checking would be done in the sysfs write handler when the xattr is first set, and for actual xattr creation this would just copy the job_xattr string during actual file creation with no need for additional sanity checking, since it should always be consistent.

For the "don't want user.job to be cp.12345" issue, I think that should be handled properly. Initially, the xattr would be set to cp.PID or SLURM_JOBID or whatever by the MDS internally at file creation time. However, if cp, rsync, tar, dcp, dsync ... is also copying xattrs, the initial value would be overwritten later in the copy process, which is what we would want I think.

Comment by Gerrit Updater [ 07/Aug/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50982/
Subject: LU-13031 jobstats: store jobid in xattr when files are created
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 23a2db28dcf1422a6a6da575e907fd257106d402

Comment by Peter Jones [ 07/Aug/23 ]

Landed for 2.16

Comment by James A Simmons [ 25/Aug/23 ]

The new maloo test fail in interop.

Comment by Thomas Bertschinger [ 25/Aug/23 ]

Does interop refer to some testing that mixes different client and server versions? Is the failure occurring when the server version is older than the client and the MDS doesn't have this new change?

If it just needs a version check like:

(( $MDS1_VERSION >= $(version_code 2.15.??) )) || skip ...

that should be simple enough, I can submit a fix for that.

The version I get when I check out this commit and run `lctl get_param version` is 2.15.57. Is that the correct code to put in the version check? Or do I need one higher than that (since I guess you could check out a commit newer than the 2.15.57 tag but older than this change and its version would pass...). If I need one higher than 2.15.57, do I use 2.15.58, 2.16.0, or something else?

Comment by Patrick Farrell [ 25/Aug/23 ]

Yes, you've got this exactly right.  That's what interop is and that check is what's needed.

Generally you'd just use 2.15.57, the tag that was live when the patch was merged.  We don't do interop testing between different development versions, just between dev versions and older full versions.  It's also fine if in some narrow scenario we skip a test that would pass.

So that MDS version check you suggested for 2.15.57 is what's needed.  Oops on the reviewers (ie, including me) for not noting that at the time.

Comment by Andreas Dilger [ 25/Aug/23 ]

The version_code helper supports 4-component version numbers, so since this is being done after the landing it is helpful to include the full version from "git describe" of the commit hash of the patch itself. If it were before landing it could use the 4-component version of the patch before landing, even though it is slightly inaccurate..

It is also helpful to include a brief description of why the test is being skipped. Please see some examples in patch https://review.whamcloud.com/52009 "LU-16097 tests: skip quota subtests in interop".

Comment by Gerrit Updater [ 25/Aug/23 ]

"Thomas Bertschinger <bertschinger@lanl.gov>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52095
Subject: LU-13031 tests: skip sanity/test_205h,205i in interop
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1532029b43cfdc41b6747ea2d4076f12d28e8be0

Comment by Gerrit Updater [ 06/Sep/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52095/
Subject: LU-13031 tests: skip sanity/test_205h,205i in interop
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a6df532162556028c2ab9b974989fc0cca68d4fe

Comment by Gerrit Updater [ 10/Sep/23 ]

"Xing Huang <hxing@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52332
Subject: LU-13031 tests: avoid new user xattr in interop
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: ed56abd1070e9812e55728b14c760580b03074e0

Comment by Andreas Dilger [ 27/Sep/23 ]

Thomas, I just realized that this is storing the xattr on the MDT inode, but it would also be useful if the same xattr was also stored on the OST objects for debugging/scanning/recovery purposes. Is this something that you would be interested to work on?

Comment by Thomas Bertschinger [ 27/Sep/23 ]

Hi Andreas,

I'd be willing to work on adding that in. I'm pretty busy at the moment so I likely won't be able to start working on it for 2-3 weeks, but if it can wait that long I'm happy to give it a shot.

Comment by Andreas Dilger [ 28/Sep/23 ]

No rush. For some reason I thought this was part of the original implementation, but upon looking at the patch again I saw it was only on the MDT inode.

I think it makes sense to put this in ofd_attr_handle_id() where it is initializing the object ownership the first time the OST object is written to (in the "if (mask != 0)" block if it cleared the SUID/SGID bits). That allows a "one shot" setting of the user.job xattr without having to check each time whether it exists or not. The parameter obdfilter.*.jobid_xattr should be used like mdt.*.jobid_xattr to control the xattr name.

Comment by Gerrit Updater [ 13/Nov/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52332/
Subject: LU-13031 tests: avoid new user xattr in interop
Project: fs/lustre-release
Branch: b2_15
Current Patch Set:
Commit: fac50dcc27f09384981d995e1234943f06a17969

Comment by Gerrit Updater [ 07/Dec/23 ]

"Thomas Bertschinger <bertschinger@lanl.gov>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53367
Subject: LU-13031 ofd: add jobid xattr to ost object
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a3bfb3d3739574bf298fb875257dcaab8bc77a61

Generated at Sat Feb 10 02:57:46 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.