[LU-15777] HSM changelog indicates success for a failed restore Created: 22/Apr/22  Updated: 24/Nov/23  Resolved: 04/Oct/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Nikitas Angelinas Assignee: Nikitas Angelinas
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Severity: 3

 Description   

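The layout swap can fail at the end of an HSM restore operation, but the error code isn't set in the generated changelog record. The debug log below shows the failure (rc = -95) propagating from mdd_swap_layouts() up to mdt_hsm_progress():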

00000004:00000001:4.0:1650499429.250580:0:29125:0:(mdd_object.c:2378:mdd_swap_layouts()) Process leaving via stop (rc=18446744073709551521 : -95 : 0xffffffffffffffa1)
00000004:00000001:4.0:1650499429.250674:0:29125:0:(mdt_coordinator.c:1383:hsm_swap_layouts()) Process leaving (rc=18446744073709551521 : -95 : ffffffffffffffa1)
00000004:00000001:4.0:1650499429.251150:0:29125:0:(mdt_coordinator.c:1589:hsm_cdt_request_completed()) Process leaving via out (rc=18446744073709551521 : -95 : 0xffffffffffffffa1)
00000004:00000001:4.0:1650499429.251157:0:29125:0:(mdt_coordinator.c:1600:hsm_cdt_request_completed()) Process leaving (rc=18446744073709551521 : -95 : ffffffffffffffa1)
00000004:00000001:4.0:1650499429.251344:0:29125:0:(mdt_coordinator.c:1720:mdt_hsm_update_request_state()) Process leaving via out (rc=18446744073709551521 : -95 : 0xffffffffffffffa1)
00000004:00000001:4.0:1650499429.251362:0:29125:0:(mdt_hsm.c:144:mdt_hsm_progress()) Process leaving (rc=18446744073709551521 : -95 : ffffffffffffffa1)
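
The rc values in the trace above are printed as unsigned 64-bit numbers next to their signed form. As a minimal sketch of that conversion (the helper name rc_to_signed is hypothetical and not part of Lustre), the two's-complement interpretation can be checked like this:

```python
# Hypothetical helper, not Lustre code: reinterpret an unsigned 64-bit rc
# from the debug log as a signed 64-bit integer (two's complement).
def rc_to_signed(raw: int) -> int:
    raw &= (1 << 64) - 1          # keep only the low 64 bits
    if raw >= (1 << 63):          # sign bit set: value is negative
        return raw - (1 << 64)
    return raw


if __name__ == "__main__":
    # Both spellings of the rc that appear in the trace.
    print(rc_to_signed(18446744073709551521))   # -95
    print(rc_to_signed(0xffffffffffffffa1))     # -95
```

The magnitude 95 matches the "Operation not supported (95)" error that the copytool reports further down.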

The issue can be reproduced by setting fail_loc=OBD_FAIL_MDS_HSM_SWAP_LAYOUTS (0x152):

# lfs hsm_state /mnt/lustre/testdir0/testfile2
/mnt/lustre/testdir0/testfile2: (0x0000000d) released exists archived, archive_id:1
# lctl set_param fail_loc=0x152
fail_loc=0x152
# lfs hsm_restore /mnt/lustre/testdir0/testfile2
# lfs path2fid /mnt/lustre/testdir0/testfile2
[0x200000bd1:0x6:0x0]

The copytool fails to restore the file, as expected:

lhsmtool_posix: 1652985060.492662 lhsmtool_posix[436615]: Action completed, notifying coordinator cookie=0x62868c6a, FID=[0x200000bd1:0x6:0x0], hp_flags=0 err=0
lhsmtool_posix: 1652985060.507948 lhsmtool_posix[436615]: llapi_hsm_action_end() on '/mnt/lustre/.lustre/fid/0x200000bd1:0x6:0x0' failed: Operation not supported (95)

# lfs changelog testfs-MDT0000
86 16HSM 18:31:00.507625903 2022.05.19 0x0 t=[0x200000bd1:0x6:0x0] ef=0xf u=0:0 nid=0@lo
87 16HSM 18:31:00.507859827 2022.05.19 0x180 t=[0x200000bd1:0x6:0x0] ef=0xf u=0:0 nid=0@lo
88 16HSM 18:31:00.507874817 2022.05.19 0x80 t=[0x200000bd1:0x6:0x0] ef=0xf u=0:0 nid=0@lo

The last changelog record has flags 0x80 == 0b10000000. According to the CLF_HSM_* macros, bits 0-6 hold the error code and bits 7-9 hold the HSM operation, so the error code is 0 and the HSM operation is 1, which is HE_RESTORE in enum hsm_event. The EOPNOTSUPP (Operation not supported) that the copytool received is therefore not encoded in the changelog record flags, as it should be.
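
The flag arithmetic above can be reproduced with a short sketch. It uses only the bit layout given in this description (error code in bits 0-6, HSM operation in bits 7-9). The constant and function names are hypothetical, and the "expected" value assumes the error would be stored as the positive errno 95 in the low bits, which this ticket does not confirm.

```python
# Bit layout as described in this ticket; names are illustrative only.
HSM_ERR_MASK = 0x7F      # bits 0-6: error code
HSM_EVENT_SHIFT = 7      # bits 7-9: HSM operation
HSM_EVENT_MASK = 0x7

HE_RESTORE = 1           # value stated in the description
EOPNOTSUPP = 95          # errno reported by the copytool on this system


def decode_hsm_flags(flags: int) -> tuple[int, int]:
    """Return (error_code, hsm_event) extracted from changelog flags."""
    err = flags & HSM_ERR_MASK
    event = (flags >> HSM_EVENT_SHIFT) & HSM_EVENT_MASK
    return err, event


def encode_hsm_flags(event: int, err: int) -> int:
    """Pack an event and an error code into the low 10 bits of the flags."""
    return ((event & HSM_EVENT_MASK) << HSM_EVENT_SHIFT) | (err & HSM_ERR_MASK)


if __name__ == "__main__":
    # Record 88 from the changelog output: error 0, event HE_RESTORE.
    print(decode_hsm_flags(0x80))                          # (0, 1)
    # Assumption: flags if the restore failure had been recorded.
    print(hex(encode_hsm_flags(HE_RESTORE, EOPNOTSUPP)))   # 0xdf
```

Under that assumption, a restore record carrying the failure would show a non-zero value in the low seven bits of its flags instead of 0x80.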



 Comments   
Comment by Gerrit Updater [ 22/Apr/22 ]

"Nikitas Angelinas <nikitas.angelinas@hpe.com>" uploaded a new patch: https://review.whamcloud.com/47121
Subject: LU-15777 hsm: set changelog error for restore layout swap failure
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4879c784fa4120627b0b5adf14bf3eb2aa135551

Comment by Gerrit Updater [ 04/Oct/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/47121/
Subject: LU-15777 hsm: set changelog error for restore layout swap failure
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 09fe64719b888cd212b6cffe923545b7650f230f

Comment by Peter Jones [ 04/Oct/22 ]

Landed for 2.16

Comment by Gerrit Updater [ 03/Jul/23 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51542
Subject: LU-15777 hsm: set changelog error for restore layout swap failure
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: a8b0be596cdb8f2e4973d7ec9359cced2242a409
