[LU-16205] fid2path for encrypted files Created: 04/Oct/22  Updated: 25/Apr/23  Resolved: 13/Mar/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug
Priority: Minor
Reporter: Sebastien Buisson
Assignee: Sebastien Buisson
Resolution: Fixed
Votes: 0
Labels: encryption, security

Issue Links:
Related
is related to LU-16639 job_stat_exit() should not have any i... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807
Epic Link: Client side Encrypted backup/restore

 Description   

Two cases to support:

  • fid2path with the encryption key
  • fid2path without the encryption key


 Comments   
Comment by Gerrit Updater [ 21/Oct/22 ]

"Sebastien Buisson <sbuisson@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48930
Subject: LU-16205 sec: fid2path for encrypted files
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: dc41113a309d3eaafb7389cf9c8e31570f78ee73

Comment by Gerrit Updater [ 03/Nov/22 ]

"Sebastien Buisson <sbuisson@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49028
Subject: LU-16205 sec: reserve flag for fid2path for encrypted files
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d67702bd2f3fb55fa7099e6890ae507d06cbf0a9

Comment by Gerrit Updater [ 13/Dec/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49028/
Subject: LU-16205 sec: reserve flag for fid2path for encrypted files
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 6f74bb60ff6c58f4a2647556124c501100330f4c

Comment by Gerrit Updater [ 03/Feb/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48930/
Subject: LU-16205 sec: fid2path for encrypted files
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: fa9da556ad22b1485c53cf0337dc6872d89aedfa

Comment by Peter Jones [ 03/Feb/23 ]

Landed for 2.16

Comment by Andreas Dilger [ 10/Mar/23 ]

It looks like this patch has somehow resulted in a leak of job_stats on the server, which results in a message being printed at unmount time:

[ 1095.059674] Lustre: Failing over lustre-MDT0000
[ 1095.090215] LustreError: 31568:0:(lprocfs_jobstats.c:137:job_stat_exit()) should not have any items
[ 1095.093184] LustreError: 31568:0:(lprocfs_jobstats.c:137:job_stat_exit()) Skipped 20 previous similar messages
[ 1095.146954] Lustre: server umount lustre-MDT0000 complete

According to a Kibana log search, this message first appeared on 2023-01-13 for patch commit a747abbea881 (v4 of "LU-16205 sec: fid2path for encrypted files") and commit e53c0eaa56fe (v3 of "LU-16310 sec: Lustre/HSM on enc file with enc key") that was based on top of it. This repeated for each refresh of those patches, and then picked up significantly on 2023-02-03 after the patches landed.

I picked sanity-sec test_63 to speed up the search for large date ranges, but this message is present in many other subtests on master.
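For anyone repeating this kind of search on saved console logs rather than in Kibana, the following is a minimal sketch of how the job_stat_exit() messages could be tallied. It assumes the line format quoted in this comment, and it assumes that a "Skipped N previous similar messages" line stands for N rate-limited occurrences. The function and variable names are hypothetical and are not part of Lustre or its test framework.

```python
import re

# Matches both forms of the message quoted in this ticket:
#   "...job_stat_exit()) should not have any items"
#   "...job_stat_exit()) Skipped 20 previous similar messages"
JOBSTAT_RE = re.compile(
    r"job_stat_exit\(\)\) "
    r"(should not have any items|Skipped (\d+) previous similar messages)"
)


def count_job_stat_exit(log_text):
    """Count job_stat_exit() complaints, including rate-limited ones.

    Assumption: each "Skipped N" line represents N suppressed messages.
    """
    total = 0
    for line in log_text.splitlines():
        match = JOBSTAT_RE.search(line)
        if match is None:
            continue
        if match.group(2) is not None:
            total += int(match.group(2))  # suppressed occurrences
        else:
            total += 1  # one printed occurrence
    return total


# Sample taken from the console output quoted in this comment.
SAMPLE = """\
[ 1095.059674] Lustre: Failing over lustre-MDT0000
[ 1095.090215] LustreError: 31568:0:(lprocfs_jobstats.c:137:job_stat_exit()) should not have any items
[ 1095.093184] LustreError: 31568:0:(lprocfs_jobstats.c:137:job_stat_exit()) Skipped 20 previous similar messages
[ 1095.146954] Lustre: server umount lustre-MDT0000 complete
"""

print(count_job_stat_exit(SAMPLE))
```

Running the count against logs from a subtest that predates the patch under suspicion gives a baseline for comparison.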

Comment by Sebastien Buisson [ 13/Mar/23 ]

I will have a look. At first sight it looks odd that v4 of patch #48930 introduced this behavior: compared to v3, it only modifies sanity-sec.sh, and only cosmetically. But a rebase was also made when pushing v4, so it might be collateral damage from that.

Comment by Sebastien Buisson [ 13/Mar/23 ]

I pushed debug patch https://review.whamcloud.com/50275 to investigate this issue. It is a revert of:

LU-16494 fileset: check fileset for operations by fid
LU-16310 sec: Lustre/HSM on enc file with enc key
LU-16205 sec: fid2path for encrypted files

With that build, I still get the messages about job stats when stopping an MDT target:

[ 1329.906800] Lustre: Failing over lustre-MDT0000
[ 1329.977350] LustreError: 34261:0:(lprocfs_jobstats.c:137:job_stat_exit()) should not have any items
[ 1329.979183] LustreError: 34261:0:(lprocfs_jobstats.c:137:job_stat_exit()) Skipped 9 previous similar messages
[ 1330.065565] Lustre: server umount lustre-MDT0000 complete

To reproduce, I just configured job stats on a client with lctl set_param jobid_var=procname_uid, then did some I/O, then dumped job stats on the MDT with lctl get_param mdt.*.job_stats, and finally unmounted the MDT target.

So I doubt those messages are due to the fid2path patches. I think this ticket can be re-closed.
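The reproduction steps described in this comment could be scripted roughly as follows. This is only a sketch: the two lctl commands are the ones quoted above, while the dd invocation, the probe file name, and the mount points /mnt/lustre and /mnt/lustre-mds1 are assumptions standing in for "some I/O" and for whatever paths the local setup uses. It prints the commands by default. Actually running them would only make sense on a single-node test setup where the client and the MDT are mounted on the same host.

```python
import shlex
import subprocess


def jobstats_repro_steps(client_mount, mdt_mount):
    """Return (node, argv) pairs for the reproduction described above."""
    probe = f"{client_mount}/jobstats_probe"  # hypothetical file name
    return [
        # Enable job stats on the client (command quoted in the ticket).
        ("client", ["lctl", "set_param", "jobid_var=procname_uid"]),
        # "Some I/O": any write on the client will do; dd is an assumption.
        ("client", ["dd", "if=/dev/zero", f"of={probe}", "bs=1M", "count=4"]),
        # Dump job stats on the MDT (command quoted in the ticket).
        ("mds", ["lctl", "get_param", "mdt.*.job_stats"]),
        # Unmount the MDT target, then check the console for job_stat_exit().
        ("mds", ["umount", mdt_mount]),
    ]


def run_steps(steps, dry_run=True):
    """Print each step; execute it locally only when dry_run is False."""
    for node, argv in steps:
        print(f"[{node}] {shlex.join(argv)}")
        if not dry_run:
            subprocess.run(argv, check=True)


# Dry run with assumed mount points; nothing is executed.
run_steps(jobstats_repro_steps("/mnt/lustre", "/mnt/lustre-mds1"))
```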

Comment by Andreas Dilger [ 13/Mar/23 ]

Ah, it seems I was misled by my random selection of sanity-sec test_63 to find the first occurrence. Since that subtest was added in this patch, it is (in hindsight) obvious that there would be no cases of the error message before the patch. I was tricked by the fact that the errors were seen before the patch landed, and then became common after it landed. I will need to redo the searches with a different subtest that existed in the tree earlier.

Comment by Andreas Dilger [ 13/Mar/23 ]

Filed LU-16639 to track this jobid issue further.

Generated at Sat Feb 10 03:24:56 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.