[LU-16205] fid2path for encrypted files Created: 04/Oct/22 Updated: 25/Apr/23 Resolved: 13/Mar/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sebastien Buisson | Assignee: | Sebastien Buisson |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | encryption, security |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Epic Link: | Client side Encrypted backup/restore |
| Description |
|
Two cases to support:
|
| Comments |
| Comment by Gerrit Updater [ 21/Oct/22 ] |
|
"Sebastien Buisson <sbuisson@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48930 |
| Comment by Gerrit Updater [ 03/Nov/22 ] |
|
"Sebastien Buisson <sbuisson@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49028 |
| Comment by Gerrit Updater [ 13/Dec/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49028/ |
| Comment by Gerrit Updater [ 03/Feb/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48930/ |
| Comment by Peter Jones [ 03/Feb/23 ] |
|
Landed for 2.16 |
| Comment by Andreas Dilger [ 10/Mar/23 ] |
|
It looks like this patch has somehow resulted in a leak of job_stats on the server, which results in a message being printed at unmount time:

[ 1095.059674] Lustre: Failing over lustre-MDT0000
[ 1095.090215] LustreError: 31568:0:(lprocfs_jobstats.c:137:job_stat_exit()) should not have any items
[ 1095.093184] LustreError: 31568:0:(lprocfs_jobstats.c:137:job_stat_exit()) Skipped 20 previous similar messages
[ 1095.146954] Lustre: server umount lustre-MDT0000 complete

According to a Kibana log search, this message first appeared on 2023-01-13 for patch commit a747abbea881 (v4 of "LU-16205 sec: fid2path for encrypted files") and commit e53c0eaa56fe (v3 of "LU-16310 sec: Lustre/HSM on enc file with enc key") that was based on top of it. This repeated for each refresh of those patches, and then picked up significantly on 2023-02-03 after the patches landed.

I picked sanity-sec test_63 to speed up the search for large date ranges, but this message is present in many other subtests on master. |
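For anyone repeating this kind of search on saved console logs rather than in Kibana, the following is a minimal sketch of counting the job_stat_exit() messages quoted above. The function name `count_job_stat_leaks` and the `SAMPLE` string are hypothetical and exist only for illustration; the patterns are taken from the log lines in this ticket and have not been checked against other Lustre versions.

```python
import re

# Primary error line, as it appears in the console logs quoted in this ticket.
ERROR_RE = re.compile(r"job_stat_exit\(\)\) should not have any items")
# Follow-up line that reports how many similar messages were skipped.
SKIPPED_RE = re.compile(
    r"job_stat_exit\(\)\) Skipped (\d+) previous similar messages"
)


def count_job_stat_leaks(log_text):
    """Return (printed, skipped) counts of job_stat_exit() messages.

    Hypothetical helper: 'printed' counts explicit error lines, 'skipped'
    sums the numbers reported by "Skipped N previous similar messages".
    """
    printed = 0
    skipped = 0
    for line in log_text.splitlines():
        if ERROR_RE.search(line):
            printed += 1
        match = SKIPPED_RE.search(line)
        if match:
            skipped += int(match.group(1))
    return printed, skipped


# Sample input copied from the comment above (illustration only).
SAMPLE = """\
[ 1095.059674] Lustre: Failing over lustre-MDT0000
[ 1095.090215] LustreError: 31568:0:(lprocfs_jobstats.c:137:job_stat_exit()) should not have any items
[ 1095.093184] LustreError: 31568:0:(lprocfs_jobstats.c:137:job_stat_exit()) Skipped 20 previous similar messages
[ 1095.146954] Lustre: server umount lustre-MDT0000 complete
"""

if __name__ == "__main__":
    print(count_job_stat_leaks(SAMPLE))
```

Running the same function over logs from a subtest that predates the patch would help avoid attributing the message to the wrong change.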
| Comment by Sebastien Buisson [ 13/Mar/23 ] |
|
I will have a look. At first sight it looks odd that v4 of patch #48930 introduced this behavior: compared to v3 it only modifies sanity-sec.sh, and only for cosmetic changes. But a rebase was also made when pushing v4, so it might be collateral damage from that. |
| Comment by Sebastien Buisson [ 13/Mar/23 ] |
|
I pushed debug patch https://review.whamcloud.com/50275 to investigate this issue. It is a revert of:

LU-16494 fileset: check fileset for operations by fid
LU-16310 sec: Lustre/HSM on enc file with enc key
LU-16205 sec: fid2path for encrypted files

With that build, I still get the messages about job stats when stopping an MDT target:

[ 1329.906800] Lustre: Failing over lustre-MDT0000
[ 1329.977350] LustreError: 34261:0:(lprocfs_jobstats.c:137:job_stat_exit()) should not have any items
[ 1329.979183] LustreError: 34261:0:(lprocfs_jobstats.c:137:job_stat_exit()) Skipped 9 previous similar messages
[ 1330.065565] Lustre: server umount lustre-MDT0000 complete

To reproduce, I just configured job stats on a client with # lctl set_param jobid_var=procname_uid, then did some IOs, then dumped job stats on the MDT with lctl get_param mdt.*.job_stats, and finally unmounted the MDT target.

So I doubt those messages are due to the fid2path patches. I think this ticket can be re-closed. |
| Comment by Andreas Dilger [ 13/Mar/23 ] |
|
Ah, it seems I was misled by my random selection of sanity-sec test_63 to find the first occurrence. Since that subtest was added in this patch, it is (in hindsight) obvious that there would be no cases of the error message before the patch. I was tricked by the fact that the errors were seen before the patch landed, and then became common after it landed. I will need to redo the searches with a different subtest that existed in the tree earlier. |
| Comment by Andreas Dilger [ 13/Mar/23 ] |
|
Filed |