[LU-11833] wait_request_state returns failure when hsm_archive and hsm_remove are called multiple times Created: 02/Jan/19  Updated: 02/Jan/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Qian Yingjin Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: HSM

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The following test script causes wait_request_state to fail:

 

test_8() {
	local file=$DIR/$tdir/$tfile
	local hsm_root=$(hsm_root)
	local mdtidx=${4:-0}
	local fid

	copytool setup -m "$MOUNT" -a "$HSM_ARCHIVE_NUMBER"
	mkdir -p $DIR/$tdir
	do_facet $SINGLEAGT "echo -n new_data > $file"
	fid=$(path2fid $file)

	# First round: archive, then remove. Both waits succeed.
	$LFS hsm_state $file
	$LFS hsm_archive --archive $HSM_ARCHIVE_NUMBER $file ||
		error "Archive $file failed"
	$LCTL get_param -n ${MDT_PREFIX}${mdtidx}.hsm.actions
	wait_request_state $fid ARCHIVE SUCCEED
	$LFS hsm_remove $file
	wait_request_state $fid REMOVE SUCCEED

	# Second round on the same file: the ARCHIVE wait times out.
	$LFS hsm_state $file
	$LFS hsm_archive --archive $HSM_ARCHIVE_NUMBER $file ||
		error "Archive $file failed"
	$LCTL get_param -n ${MDT_PREFIX}${mdtidx}.hsm.actions
	wait_request_state $fid ARCHIVE SUCCEED
	$LFS hsm_remove $file
	wait_request_state $fid REMOVE SUCCEED
}
run_test 8 "HSM problem..."

The failure report is as follows:

== sanity-pcc test 8: Problem Finding... ============================================================= 15:58:50 (1546415930)
Starting copytool agt1 on qian
/mnt/lustre/d8.sanity-pcc/f8.sanity-pcc: (0x00000000)
lrh=[type=10680000 len=136 idx=1/1] fid=[0x200000401:0x4:0x0] dfid=[0x200000401:0x4:0x0] compound/cookie=0x0/0x5c2c6f2c action=ARCHIVE archive#=2 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[]
Waiting 200 secs for update
Changed after 1s: from 'WAITING' to 'STARTED'
Updated after 2s: wanted 'SUCCEED' got 'SUCCEED'
/mnt/lustre/d8.sanity-pcc/f8.sanity-pcc: (0x00000000), archive_id:2
lrh=[type=10680000 len=136 idx=1/1] fid=[0x200000401:0x4:0x0] dfid=[0x200000401:0x4:0x0] compound/cookie=0x0/0x5c2c6f2c action=ARCHIVE archive#=2 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=SUCCEED data=[]
lrh=[type=10680000 len=136 idx=1/2] fid=[0x200000401:0x4:0x0] dfid=[0x200000401:0x4:0x0] compound/cookie=0x0/0x5c2c6f2d action=REMOVE archive#=2 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=SUCCEED data=[]
lrh=[type=10680000 len=136 idx=1/3] fid=[0x200000401:0x4:0x0] dfid=[0x200000401:0x4:0x0] compound/cookie=0x0/0x5c2c6f2e action=ARCHIVE archive#=2 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[]
Waiting 200 secs for update
Changed after 1s: from 'SUCCEED
WAITING' to 'SUCCEED
SUCCEED'
Waiting 190 secs for update
Changed after 14s: from 'SUCCEED
SUCCEED' to ''
Waiting 180 secs for update
Waiting 170 secs for update
Waiting 160 secs for update
Waiting 150 secs for update
Waiting 140 secs for update
Waiting 130 secs for update
Waiting 120 secs for update
Waiting 110 secs for update
Waiting 100 secs for update
Waiting 90 secs for update
Waiting 80 secs for update
Waiting 70 secs for update
Waiting 60 secs for update
Waiting 50 secs for update
Waiting 40 secs for update
Waiting 30 secs for update
Waiting 20 secs for update
Waiting 10 secs for update
Update not seen after 200s: wanted 'SUCCEED' got ''
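
One reading of the log above: by the second round, hsm.actions holds two ARCHIVE records for the same FID (idx=1/1 and idx=1/3). A lookup that matches on FID and action would then return two status lines ('SUCCEED' and 'WAITING') instead of one, and a comparison against the single word 'SUCCEED' can never match. The sketch below reproduces that effect on abbreviated copies of the three records from the log. The get_status function and its awk filter are hypothetical stand-ins written for this illustration, not the actual wait_request_state implementation, and keeping only the last matching line is one possible workaround, not a confirmed fix.

```shell
#!/bin/sh
# Abbreviated copies of the three hsm.actions records from the log.
actions='idx=1/1 fid=[0x200000401:0x4:0x0] action=ARCHIVE status=SUCCEED
idx=1/2 fid=[0x200000401:0x4:0x0] action=REMOVE status=SUCCEED
idx=1/3 fid=[0x200000401:0x4:0x0] action=ARCHIVE status=WAITING'

fid='0x200000401:0x4:0x0'

# Hypothetical stand-in for the status lookup (an assumption, not the
# real helper): print the status of every record whose line contains
# both the FID and the requested action.
get_status() {
	printf '%s\n' "$actions" |
		awk -v fid="$fid" -v act="action=$1" '
			index($0, fid) && index($0, act) {
				for (i = 1; i <= NF; i++)
					if ($i ~ /^status=/) {
						sub(/^status=/, "", $i)
						print $i
					}
			}'
}

# All matching records: the old and the new ARCHIVE request.
all=$(get_status ARCHIVE)

# Only the most recent matching record (assumed workaround).
latest=$(get_status ARCHIVE | tail -n 1)

if [ "$all" != "SUCCEED" ]; then
	echo "multi-line result cannot equal SUCCEED:"
	echo "$all"
fi
echo "latest ARCHIVE status: $latest"
```

With the sample records, the unfiltered lookup yields the same two-line value that appears in the "Changed after 1s" message, while the last-line variant tracks only the newest request.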

 

