Details
Type: Bug
Resolution: Unresolved
Priority: Minor
Affects Version/s: Lustre 2.12.9
Description
Hi,
We've found an issue where a Lustre HSM archive request gets stuck in the HSM action queue when a user runs "chattr +i" or "chattr +a" on a new file. I couldn't find an existing ticket for this, so I'm creating this one; please let me know if it duplicates an existing ticket. Because the request stays in the HSM action queue forever, I suspect this is not expected behavior and is likely a bug. Thanks in advance for any insights!
To reproduce this:
On a Lustre client with the Lustre file system mounted:
[root@ip-10-0-130-193 ns1]# echo "test hsm_archive stuck" > test_hsm_archive.txt
[root@ip-10-0-130-193 ns1]# chattr +i test_hsm_archive.txt
[root@ip-10-0-130-193 ns1]# lsattr test_hsm_archive.txt
----i----------- test_hsm_archive.txt
[root@ip-10-0-130-193 ns1]# lfs hsm_archive test_hsm_archive.txt
[root@ip-10-0-130-193 ns1]# lfs hsm_state test_hsm_archive.txt
test_hsm_archive.txt: (0x00000000)
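The steps above can be wrapped in a small helper so both the immutable (+i) and append-only (+a) cases are reproduced the same way. This is a minimal sketch, assuming it runs as root on a Lustre client whose file system has HSM enabled with a registered copytool; the function name repro_hsm_archive_stuck and the per-flag file names are hypothetical, not part of Lustre.

```shell
# Hypothetical helper: reproduce the stuck HSM archive for one chattr flag.
# Assumes root on a Lustre client; $1 is a directory on the Lustre mount,
# $2 is the attribute flag to set ("i" = immutable, "a" = append-only).
repro_hsm_archive_stuck() {
    local dir=$1 flag=$2
    local f="$dir/test_hsm_archive_$flag.txt"

    echo "test hsm_archive stuck" > "$f" || return 1
    chattr "+$flag" "$f" || return 1   # set immutable or append-only
    lsattr "$f"                        # confirm the flag is set
    lfs hsm_archive "$f" || return 1   # queue the ARCHIVE request
    # An HSM state of (0x00000000) means the archive never completed.
    lfs hsm_state "$f"
}

# Example (the mount path is illustrative):
#   repro_hsm_archive_stuck /mnt/lustre/ns1 i
#   repro_hsm_archive_stuck /mnt/lustre/ns1 a
```

Note that the test file cannot be removed until the flag is cleared again with "chattr -i" or "chattr -a".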
On the MDS, the request stays in the HSM action queue forever with status=WAITING:
[root@ip-198-19-37-27 ns1]# sudo lctl get_param -n 'mdt.*.hsm.actions'
lrh=[type=10680000 len=136 idx=1/430] fid=[0x200000402:0x5:0x0] dfid=[0x200000402:0x5:0x0] compound/cookie=0x0/0x64c7be8e action=ARCHIVE archive#=1 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[]
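To spot such requests without reading every line of hsm.actions, the output can be filtered for ARCHIVE entries still in WAITING state. This is a minimal sketch, assuming the record format shown above (a space-separated " fid=[...]" field plus "action=" and "status=" fields); the function name waiting_archive_fids is hypothetical.

```shell
# Hypothetical helper: print the FID of every ARCHIVE request still in
# WAITING state, reading `lctl get_param -n 'mdt.*.hsm.actions'` from stdin.
waiting_archive_fids() {
    awk '/action=ARCHIVE/ && /status=WAITING/ {
        # Match " fid=[...]" with a leading space so "dfid=" is skipped.
        if (match($0, / fid=\[0x[0-9a-fA-Fx:]*]/)) {
            # Drop the 6-char " fid=[" prefix and the trailing "]".
            print substr($0, RSTART + 6, RLENGTH - 7)
        }
    }'
}

# Example on the MDS:
#   lctl get_param -n 'mdt.*.hsm.actions' | waiting_archive_fids
```

Each printed FID can then be mapped back to a path on a client with "lfs fid2path".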
Tracing through the code, the request appears to be stuck because EPERM is returned when setting the HSM "exists" flag (mdt_hsm_set_exists).
Stack trace:
https://github.com/lustre/lustre-release/blob/v2_12_90/lustre/mdd/mdd_object.c#L1307-L1308
https://github.com/lustre/lustre-release/blob/v2_12_90/lustre/mdd/mdd_object.c#L1878
https://github.com/lustre/lustre-release/blob/v2_12_90/lustre/mdt/mdt_hsm.c#L77
https://github.com/lustre/lustre-release/blob/v2_12_90/lustre/mdt/mdt_coordinator.c#L1180
https://github.com/lustre/lustre-release/blob/v2_12_90/lustre/mdt/mdt_coordinator.c#L1250-L1257
Huaqing Wang
AWS FSx for Lustre