Details
Bug
Resolution: Fixed
Minor
Description
The layout swap can fail at the end of an HSM restore operation, but the error code isn't set in the generated changelog record.
00000004:00000001:4.0:1650499429.250580:0:29125:0:(mdd_object.c:2378:mdd_swap_layouts()) Process leaving via stop (rc=18446744073709551521 : -95 : 0xffffffffffffffa1)
00000004:00000001:4.0:1650499429.250674:0:29125:0:(mdt_coordinator.c:1383:hsm_swap_layouts()) Process leaving (rc=18446744073709551521 : -95 : ffffffffffffffa1)
00000004:00000001:4.0:1650499429.251150:0:29125:0:(mdt_coordinator.c:1589:hsm_cdt_request_completed()) Process leaving via out (rc=18446744073709551521 : -95 : 0xffffffffffffffa1)
00000004:00000001:4.0:1650499429.251157:0:29125:0:(mdt_coordinator.c:1600:hsm_cdt_request_completed()) Process leaving (rc=18446744073709551521 : -95 : ffffffffffffffa1)
00000004:00000001:4.0:1650499429.251344:0:29125:0:(mdt_coordinator.c:1720:mdt_hsm_update_request_state()) Process leaving via out (rc=18446744073709551521 : -95 : 0xffffffffffffffa1)
00000004:00000001:4.0:1650499429.251362:0:29125:0:(mdt_hsm.c:144:mdt_hsm_progress()) Process leaving (rc=18446744073709551521 : -95 : ffffffffffffffa1)
The issue can be reproduced by setting fail_loc=OBD_FAIL_MDS_HSM_SWAP_LAYOUTS (0x152):
# lfs hsm_state /mnt/lustre/testdir0/testfile2
/mnt/lustre/testdir0/testfile2: (0x0000000d) released exists archived, archive_id:1
# lctl set_param fail_loc=0x152
fail_loc=0x152
# lfs hsm_restore /mnt/lustre/testdir0/testfile2
# lfs path2fid /mnt/lustre/testdir0/testfile2
[0x200000bd1:0x6:0x0]
The copytool fails to restore the file, as expected:
lhsmtool_posix: 1652985060.492662 lhsmtool_posix[436615]: Action completed, notifying coordinator cookie=0x62868c6a, FID=[0x200000bd1:0x6:0x0], hp_flags=0 err=0
lhsmtool_posix: 1652985060.507948 lhsmtool_posix[436615]: llapi_hsm_action_end() on '/mnt/lustre/.lustre/fid/0x200000bd1:0x6:0x0' failed: Operation not supported (95)
# lfs changelog testfs-MDT0000
86 16HSM 18:31:00.507625903 2022.05.19 0x0 t=[0x200000bd1:0x6:0x0] ef=0xf u=0:0 nid=0@lo
87 16HSM 18:31:00.507859827 2022.05.19 0x180 t=[0x200000bd1:0x6:0x0] ef=0xf u=0:0 nid=0@lo
88 16HSM 18:31:00.507874817 2022.05.19 0x80 t=[0x200000bd1:0x6:0x0] ef=0xf u=0:0 nid=0@lo
The last changelog record has flags 0x80 (0b10000000). According to the CLF_HSM_* macros, bits 0-6 hold the error code and bits 7-9 hold the HSM operation, so the error code is 0 and the operation is 1, which is HE_RESTORE in enum hsm_event. The EOPNOTSUPP (Operation not supported) error that the copytool received is therefore not encoded in the changelog record flags, as it should be.
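To make the flag arithmetic easier to check, here is a minimal C sketch of the bit layout described above (bits 0-6 for the error code, bits 7-9 for the HSM operation). The `sketch_*` and `SKETCH_*` names are hypothetical helpers written for this ticket, not the actual CLF_HSM_* macros or Lustre functions, and the sketch ignores any other bits the real flags may carry.

```c
#include <assert.h>
#include <stdint.h>

/* Layout as described in this ticket: bits 0-6 = error, bits 7-9 = event.
 * These names are local to this sketch, not the Lustre CLF_HSM_* macros. */
#define SKETCH_HSM_ERR_MASK    0x7fu
#define SKETCH_HSM_EVENT_SHIFT 7
#define SKETCH_HSM_EVENT_MASK  0x7u

/* Extract the error code (bits 0-6) from changelog HSM flags. */
static inline unsigned int sketch_hsm_get_error(uint32_t flags)
{
	return flags & SKETCH_HSM_ERR_MASK;
}

/* Extract the HSM operation (bits 7-9) from changelog HSM flags. */
static inline unsigned int sketch_hsm_get_event(uint32_t flags)
{
	return (flags >> SKETCH_HSM_EVENT_SHIFT) & SKETCH_HSM_EVENT_MASK;
}

/* Pack an operation and a positive errno value into flags. */
static inline uint32_t sketch_hsm_pack(unsigned int event, unsigned int err)
{
	assert(event <= SKETCH_HSM_EVENT_MASK);
	assert(err <= SKETCH_HSM_ERR_MASK);
	return ((uint32_t)event << SKETCH_HSM_EVENT_SHIFT) | err;
}
```

With this layout, `sketch_hsm_get_event(0x80)` is 1 and `sketch_hsm_get_error(0x80)` is 0, matching record 88. If the restore failure had been recorded with EOPNOTSUPP (95 on Linux), `sketch_hsm_pack(1, 95)` gives 0xdf, which is presumably what the last record's flags would look like once the error is propagated, assuming no other flag bits are set.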