[LU-17430] interop sanity-hsm test_114: request on <fid> is not SUCCEED on mds1
Created: 16/Jan/24
Updated: 04/Feb/24
Resolved: 04/Feb/24

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: Etienne Aujames
Resolution: Fixed
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for eaujames <eaujames@ddn.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/bb8dce7a-a62a-4093-9cfb-8324978304a1

test_114 failed with the following error:

request on 0x200000402:0x253:0x0 is not SUCCEED on mds1

Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/101373 - 4.18.0-477.27.1.el8_8.x86_64
servers: https://build.whamcloud.com/job/lustre-b2_15/81 - 4.18.0-513.9.1.el8_lustre.x86_64

The test fails with:

== sanity-hsm test 114: Incompatible request does not set other requests as STARTED ========================================================== 15:42:52 (1705333372)
0+0 records in
0+0 records out
0 bytes copied, 0.000421867 s, 0.0 kB/s
0+0 records in
0+0 records out
0 bytes copied, 0.000465835 s, 0.0 kB/s
CMD: trevis-121vm9 mkdir -p /tmp/arc1/sanity-hsm.test_114/
Starting copytool 'agt1' on 'trevis-121vm9' with cmdline 'lhsmtool_posix --archive-format=v2 --hsm-root=/tmp/arc1/sanity-hsm.test_114/ --daemon --pid-file=/var/run/lhsmtool_posix.pid  "/mnt/lustre2"'
CMD: trevis-121vm9 lhsmtool_posix --archive-format=v2 --hsm-root=/tmp/arc1/sanity-hsm.test_114/ --daemon --pid-file=/var/run/lhsmtool_posix.pid  "/mnt/lustre2" < /dev/null > "/autotest/autotest-2/2024-01-15/lustre-reviews_custom_101373_1003_c53f4cf7-0b84-488f-b7c2-5d058bec8e18//sanity-hsm.test_114.copytool_log.trevis-121vm9.log" 2>&1
CMD: trevis-102vm7 /usr/sbin/lctl set_param mdt.lustre-MDT0000.hsm_control='disabled'
mdt.lustre-MDT0000.hsm_control=disabled
CMD: trevis-102vm7 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
CMD: trevis-102vm7 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | awk '/'0x200000402:0x253:0x0'.*action='ARCHIVE'/ {print \$13}' | cut -f2 -d=
CMD: trevis-102vm7 /usr/sbin/lctl set_param mdt.lustre-MDT0000.hsm_control='enabled'
mdt.lustre-MDT0000.hsm_control=enabled
CMD: trevis-102vm7 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
CMD: trevis-102vm7 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | awk '/'0x200000402:0x253:0x0'.*action='ARCHIVE'/ {print \$13}' | cut -f2 -d=
Waiting 200s for 'SUCCEED'
...
Waiting 10s for 'SUCCEED'
...
CMD: trevis-102vm7 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | awk '/'0x200000402:0x253:0x0'.*action='ARCHIVE'/ {print \$13}' | cut -f2 -d=
Waiting 0s for 'SUCCEED'
Update not seen after 200s: want 'SUCCEED' got 'STARTED'
 sanity-hsm test_114: @@@@@@ FAIL: request on 0x200000402:0x253:0x0 is not SUCCEED on mds1 
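
The "Waiting Ns for 'SUCCEED'" lines show the test repeatedly reading the request status from hsm.actions until it changes or a timeout expires. A minimal sketch of that poll-until-match pattern follows. poll_until is a hypothetical helper written for illustration, not the Lustre test-framework implementation, and a stub state file stands in for the hsm.actions status field.

```shell
#!/bin/bash
# Sketch of a poll-until-match loop like the one behind the
# "Waiting Ns for 'SUCCEED'" lines. poll_until is a hypothetical
# helper, not the Lustre test-framework implementation.

# poll_until TIMEOUT EXPECTED CMD [ARGS...]
#   Runs CMD about once per second until its output equals EXPECTED.
#   Returns 0 on a match; after roughly TIMEOUT seconds without one,
#   prints a diagnostic to stderr and returns 1.
poll_until() {
	local timeout=$1 expected=$2
	shift 2
	local waited=0 result

	while true; do
		result=$("$@")
		if [[ "$result" == "$expected" ]]; then
			return 0
		fi
		if (( waited >= timeout )); then
			break
		fi
		sleep 1
		waited=$((waited + 1))
	done

	echo "Update not seen after ${timeout}s: want '$expected' got '$result'" >&2
	return 1
}

# Usage: a stub "request" whose state flips from STARTED to SUCCEED
# after two seconds (hypothetical stand-in for the hsm.actions status).
state_file=$(mktemp)
echo STARTED > "$state_file"
( sleep 2; echo SUCCEED > "$state_file" ) &
poll_until 10 SUCCEED cat "$state_file" && echo "request reached SUCCEED"
wait
rm -f "$state_file"
```

In test_114 the status never left STARTED during the 200s window, so a loop of this shape times out and reports the last value it saw, matching the "got 'STARTED'" message in the log.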

The copytool never receives the request; its log shows only startup and the wait for a kernel message before it is terminated:

lhsmtool_posix: 1705333373.574471 lhsmtool_posix[144023]: action=0 src=(null) dst=(null) mount_point=/mnt/lustre2
lhsmtool_posix: 1705333373.578036 lhsmtool_posix[144024]: waiting for message from kernel
exiting: Terminated

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-hsm test_114 - request on 0x200000402:0x253:0x0 is not SUCCEED on mds1



 Comments   
Comment by Etienne Aujames [ 16/Jan/24 ]

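Test 114 is missing a server version compatibility check. The test was added along with the server patch https://review.whamcloud.com/48658 "LU-16188 mdt: fix incompatible HSM request handling", so it fails when run against servers without that patch, such as the b2_15 servers used in this interop run.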
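
The missing check amounts to skipping test 114 when the MDS predates the LU-16188 change. A minimal sketch of such a guard follows, under stated assumptions: version_code and require_mds_version are hypothetical stand-ins written so the example is self-contained (Lustre's test-framework.sh provides its own version_code helper and an MDS1_VERSION variable, which a real fix would more likely use), and the version strings are placeholders, not the threshold chosen in patch 53694.

```shell
#!/bin/bash
# Sketch of a server-version guard for an interop test.
# version_code and require_mds_version are hypothetical stand-ins; the
# version strings used below are placeholders, not the real threshold.

# version_code X.Y.Z[.W][-suffix]: pack up to four numeric components
# into one integer so versions compare with ordinary arithmetic.
version_code() {
	local IFS=.
	local -a v
	read -r -a v <<< "${1%%-*}"    # drop any "-suffix", split on dots
	echo "$(( (${v[0]:-0} << 24) | (${v[1]:-0} << 16) | (${v[2]:-0} << 8) | ${v[3]:-0} ))"
}

# require_mds_version HAVE MIN: return 0 if HAVE >= MIN; otherwise print
# a skip message and return 1 so the caller can skip the test.
require_mds_version() {
	local have=$1 min=$2
	if (( $(version_code "$have") < $(version_code "$min") )); then
		echo "SKIP: need MDS >= $min for the LU-16188 HSM fix (have $have)"
		return 1
	fi
	return 0
}

# Usage with placeholder versions: an older server is skipped.
require_mds_version 2.15.0 2.15.60 || echo "test_114 skipped"
```

Packing the components into one integer keeps the comparison a single arithmetic test, so a guard like this reads as one line at the top of the test body.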

Comment by Gerrit Updater [ 17/Jan/24 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53694
Subject: LU-17430 tests: fix interop sanity-hsm testing with b2_15
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4afd057cb09998dcbea1cdacaa7e561d6827c98b

Comment by Gerrit Updater [ 04/Feb/24 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53694/
Subject: LU-17430 tests: fix interop sanity-hsm testing with b2_15
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2f590444ea1cb9aaa351fce4d89e9dfc81130ff0

Comment by Peter Jones [ 04/Feb/24 ]

Merged for 2.16

Generated at Sat Feb 10 03:35:23 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.