LU-17080 (Lustre)

hsm_archive request gets stuck in the action queue forever when the file has the immutable bit set


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Affects Version/s: Lustre 2.12.9

    Description

      Hi,

      We've found an issue where a Lustre HSM archive request gets stuck in the HSM action queue when a user runs "chattr +i" or "chattr +a" on a new file. I couldn't find an existing ticket for this, so I created this one; please let me know if it duplicates an existing ticket. Because the request stays in the HSM action queue forever, this does not seem to be expected behavior and looks like a bug. Thanks in advance for any insights!

      To reproduce this:
      On a Lustre client with the Lustre file system mounted:

      [root@ip-10-0-130-193 ns1]# echo "test hsm_archive stuck" > test_hsm_archive.txt
      [root@ip-10-0-130-193 ns1]# chattr +i test_hsm_archive.txt 
      [root@ip-10-0-130-193 ns1]# lsattr test_hsm_archive.txt
      ----i----------- test_hsm_archive.txt
      [root@ip-10-0-130-193 ns1]# lfs hsm_archive test_hsm_archive.txt 
      [root@ip-10-0-130-193 ns1]# lfs hsm_state test_hsm_archive.txt
      test_hsm_archive.txt: (0x00000000)

      On the MDS, the request stays in the HSM action queue forever:

      [root@ip-198-19-37-27 ns1]# sudo lctl get_param -n 'mdt.*.hsm.actions'
      lrh=[type=10680000 len=136 idx=1/430] fid=[0x200000402:0x5:0x0] dfid=[0x200000402:0x5:0x0] compound/cookie=0x0/0x64c7be8e action=ARCHIVE archive#=1 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[]

      Tracing through the code, the request seems to be stuck because EPERM is returned when setting the HSM "exists" state (mdt_hsm_set_exists).
      Stack trace:
      https://github.com/lustre/lustre-release/blob/v2_12_90/lustre/mdd/mdd_object.c#L1307-L1308
      https://github.com/lustre/lustre-release/blob/v2_12_90/lustre/mdd/mdd_object.c#L1878
      https://github.com/lustre/lustre-release/blob/v2_12_90/lustre/mdt/mdt_hsm.c#L77
      https://github.com/lustre/lustre-release/blob/v2_12_90/lustre/mdt/mdt_coordinator.c#L1180
      https://github.com/lustre/lustre-release/blob/v2_12_90/lustre/mdt/mdt_coordinator.c#L1250-L1257

      Huaqing Wang
      AWS FSx for Lustre


          People

            Assignee: WC Triage
            Reporter: Huaqing Wang
