[LU-17080] hsm_archive request got stuck in action queue forever when the file has immutable bit set Created: 01/Sep/23  Updated: 01/Sep/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.9
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Huaqing Wang
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Epic/Theme: hsm
Severity: 2
Rank (Obsolete): 9223372036854775807

Description

Hi,

We've found an issue where a Lustre HSM archive request gets stuck in the HSM action queue when a user runs "chattr +i" or "chattr +a" on a new file before archiving it. I couldn't find an existing ticket for this, so I created this one; please let me know if it duplicates an existing ticket. The fact that the request stays in the HSM action queue forever makes me think this is not expected behavior and is likely a bug. Thanks in advance for any insights!

To reproduce:
On a Lustre client with a Lustre file system mounted:

[root@ip-10-0-130-193 ns1]# echo "test hsm_archive stuck" > test_hsm_archive.txt
[root@ip-10-0-130-193 ns1]# chattr +i test_hsm_archive.txt 
[root@ip-10-0-130-193 ns1]# lsattr test_hsm_archive.txt
----i----------- test_hsm_archive.txt
[root@ip-10-0-130-193 ns1]# lfs hsm_archive test_hsm_archive.txt 
[root@ip-10-0-130-193 ns1]# lfs hsm_state test_hsm_archive.txt
test_hsm_archive.txt: (0x00000000)
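
As a rough client-side aid, the flag shown by lsattr above can be checked before submitting an archive request. The sketch below is not part of Lustre; the function name flagged_files is made up for illustration, and it assumes the usual lsattr layout of a flag field followed by the file name.

```shell
# Hypothetical helper (not a Lustre tool): reads `lsattr` output on stdin and
# prints the names of files whose flag field contains lowercase "i"
# (immutable) or lowercase "a" (append-only). Uppercase "A" (no atime
# updates) is deliberately not matched.
flagged_files() {
    awk '$1 ~ /[ia]/ {
        # Drop the flag field and the spaces after it; the rest is the name.
        sub(/^[^ ]+ +/, "")
        print
    }'
}
```

For example, `lsattr test_hsm_archive.txt | flagged_files` would print the file name in the reproduction above, which suggests the archive request for it may end up in the state described in this ticket.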

On the MDS, the request remains in the HSM action queue indefinitely with status=WAITING:

[root@ip-198-19-37-27 ns1]# sudo lctl get_param -n 'mdt.*.hsm.actions'
lrh=[type=10680000 len=136 idx=1/430] fid=[0x200000402:0x5:0x0] dfid=[0x200000402:0x5:0x0] compound/cookie=0x0/0x64c7be8e action=ARCHIVE archive#=1 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[]
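
To spot such entries on a busy MDS, the records can be filtered by status. This is only a sketch: the function name waiting_hsm_actions is hypothetical, and it assumes each record is a single line of space-separated key=value fields as in the output above. A single snapshot only shows that a request is waiting; it has to be run repeatedly to tell whether the request is actually stuck.

```shell
# Hypothetical helper (not a Lustre tool): reads `hsm.actions` records on
# stdin and prints "<fid> <action>" for every record in WAITING status.
waiting_hsm_actions() {
    awk '/status=WAITING/ {
        fid = ""; action = ""
        for (i = 1; i <= NF; i++) {
            # The anchored match keeps "dfid=" from being taken for "fid=".
            if ($i ~ /^fid=/)    fid = substr($i, 5)
            if ($i ~ /^action=/) action = substr($i, 8)
        }
        print fid, action
    }'
}
```

Usage on the MDS: `lctl get_param -n 'mdt.*.hsm.actions' | waiting_hsm_actions`.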

Tracing through the code, the request appears to be stuck because EPERM is returned when setting the HSM exists flag (mdt_hsm_set_exists).
Stack trace:
https://github.com/lustre/lustre-release/blob/v2_12_90/lustre/mdd/mdd_object.c#L1307-L1308
https://github.com/lustre/lustre-release/blob/v2_12_90/lustre/mdd/mdd_object.c#L1878
https://github.com/lustre/lustre-release/blob/v2_12_90/lustre/mdt/mdt_hsm.c#L77
https://github.com/lustre/lustre-release/blob/v2_12_90/lustre/mdt/mdt_coordinator.c#L1180
https://github.com/lustre/lustre-release/blob/v2_12_90/lustre/mdt/mdt_coordinator.c#L1250-L1257

Huaqing Wang
AWS FSx for Lustre


Generated at Sat Feb 10 03:32:28 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.