  1. Lustre
  2. LU-15777

HSM changelog indicates success for a failed restore

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0

    Description

      The layout swap can fail at the end of an HSM restore operation, but the error code isn't set in the generated changelog record.

      00000004:00000001:4.0:1650499429.250580:0:29125:0:(mdd_object.c:2378:mdd_swap_layouts()) Process leaving via stop (rc=18446744073709551521 : -95 : 0xffffffffffffffa1)
      00000004:00000001:4.0:1650499429.250674:0:29125:0:(mdt_coordinator.c:1383:hsm_swap_layouts()) Process leaving (rc=18446744073709551521 : -95 : ffffffffffffffa1)
      00000004:00000001:4.0:1650499429.251150:0:29125:0:(mdt_coordinator.c:1589:hsm_cdt_request_completed()) Process leaving via out (rc=18446744073709551521 : -95 : 0xffffffffffffffa1)
      00000004:00000001:4.0:1650499429.251157:0:29125:0:(mdt_coordinator.c:1600:hsm_cdt_request_completed()) Process leaving (rc=18446744073709551521 : -95 : ffffffffffffffa1)
      00000004:00000001:4.0:1650499429.251344:0:29125:0:(mdt_coordinator.c:1720:mdt_hsm_update_request_state()) Process leaving via out (rc=18446744073709551521 : -95 : 0xffffffffffffffa1)
      00000004:00000001:4.0:1650499429.251362:0:29125:0:(mdt_hsm.c:144:mdt_hsm_progress()) Process leaving (rc=18446744073709551521 : -95 : ffffffffffffffa1)
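
The rc values in the trace above are printed three ways: as an unsigned 64-bit integer, as a signed integer, and in hex. A minimal sketch of that conversion, to make it easier to see that 18446744073709551521 is -95 (the helper name `to_signed64` is made up for illustration and is not part of Lustre):

```python
def to_signed64(value):
    """Reinterpret an unsigned 64-bit integer as a two's-complement signed one."""
    value &= (1 << 64) - 1          # keep only the low 64 bits
    if value >= (1 << 63):          # top bit set means the value is negative
        return value - (1 << 64)
    return value


raw_rc = 18446744073709551521       # rc as printed in the debug log
print(to_signed64(raw_rc), hex(raw_rc))   # -95 0xffffffffffffffa1
```

On Linux, errno 95 is EOPNOTSUPP, which matches the "Operation not supported (95)" message reported by the copytool.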

      The issue can be reproduced by setting fail_loc=OBD_FAIL_MDS_HSM_SWAP_LAYOUTS (0x152):

      # lfs hsm_state /mnt/lustre/testdir0/testfile2
      /mnt/lustre/testdir0/testfile2: (0x0000000d) released exists archived, archive_id:1
      # lctl set_param fail_loc=0x152
      fail_loc=0x152
      # lfs hsm_restore /mnt/lustre/testdir0/testfile2
      # lfs path2fid /mnt/lustre/testdir0/testfile2
      [0x200000bd1:0x6:0x0]

      The copytool fails to restore the file, as expected:

      lhsmtool_posix: 1652985060.492662 lhsmtool_posix[436615]: Action completed, notifying coordinator cookie=0x62868c6a, FID=[0x200000bd1:0x6:0x0], hp_flags=0 err=0
      lhsmtool_posix: 1652985060.507948 lhsmtool_posix[436615]: llapi_hsm_action_end() on '/mnt/lustre/.lustre/fid/0x200000bd1:0x6:0x0' failed: Operation not supported (95)

      # lfs changelog testfs-MDT0000
      86 16HSM 18:31:00.507625903 2022.05.19 0x0 t=[0x200000bd1:0x6:0x0] ef=0xf u=0:0 nid=0@lo
      87 16HSM 18:31:00.507859827 2022.05.19 0x180 t=[0x200000bd1:0x6:0x0] ef=0xf u=0:0 nid=0@lo
      88 16HSM 18:31:00.507874817 2022.05.19 0x80 t=[0x200000bd1:0x6:0x0] ef=0xf u=0:0 nid=0@lo

      The last changelog record has flags 0x80 == 0b10000000. According to the CLF_HSM_* macros, bits 0-6 hold the error code and bits 7-9 hold the HSM operation, so the error code is 0 and the HSM operation is 1, which is HE_RESTORE in enum hsm_event. The EOPNOTSUPP (Operation not supported) error that the copytool received is therefore not encoded in the changelog record flags, as it should be.
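
The flag decoding described above can be sketched in a few lines. This is only an illustration of the bit layout stated in this ticket (bits 0-6 error, bits 7-9 operation); the constant and function names are hypothetical, and the encoder assumes the error is stored as a positive errno value in the low bits, which should be checked against the CLF_HSM_* helpers in the Lustre headers:

```python
# Bit layout as described in this ticket; names are illustrative only.
HSM_ERR_MASK = 0x7F        # bits 0-6: error code
HSM_EVENT_SHIFT = 7        # bits 7-9: HSM operation
HSM_EVENT_MASK = 0x7

# Only the value confirmed in this ticket is named here.
EVENT_NAMES = {1: "HE_RESTORE"}


def flags_from_record(line):
    """Pull the flags field (5th whitespace-separated column) out of a record."""
    return int(line.split()[4], 16)


def decode_hsm_flags(flags):
    """Return (operation, error) extracted from changelog record flags."""
    event = (flags >> HSM_EVENT_SHIFT) & HSM_EVENT_MASK
    error = flags & HSM_ERR_MASK
    return event, error


def encode_hsm_flags(event, error):
    """Assumed inverse of decode_hsm_flags: positive errno in the low 7 bits."""
    return ((event & HSM_EVENT_MASK) << HSM_EVENT_SHIFT) | (error & HSM_ERR_MASK)


record = ("88 16HSM 18:31:00.507874817 2022.05.19 0x80 "
          "t=[0x200000bd1:0x6:0x0] ef=0xf u=0:0 nid=0@lo")
event, error = decode_hsm_flags(flags_from_record(record))
print(EVENT_NAMES.get(event, event), error)      # HE_RESTORE 0

# What the flags would look like if errno 95 had been recorded.
print(hex(encode_hsm_flags(1, 95)))              # 0xdf
```

Under these assumptions, a restore record that carried EOPNOTSUPP would show flags 0xdf rather than the 0x80 seen in record 88.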

          People

            nangelinas Nikitas Angelinas