[LU-7229] Interop 2.7.60<->2.7.0 - sanity-hsm test_60: Expected progress update after at least 5 seconds Created: 29/Sep/15  Updated: 09/Sep/16  Resolved: 02/Feb/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: None
Environment:

Client : master branch/ build 3194, RHEL 7
Server: b2_7 branch


Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/ea40cd80-62dd-11e5-b25a-5254006e85c2.

The sub-test test_60 failed with the following error:

Expected progress update after at least 5 seconds

Appears to be similar to LU-4643.
Encountered similar issue for interop testing for 2.7.60 tag
Client : master branch/ build 3194, RHEL 7
Server: b2_7 branch

5242880 bytes copied in 3 seconds.
sanity-hsm test_60: @@@@@@ FAIL: Expected progress update after at least 5 seconds


 Comments   
Comment by Bob Glossman (Inactive) [ 23/Dec/15 ]

another on master:
https://testing.hpdd.intel.com/test_sets/02180664-a9b8-11e5-b0df-5254006e85c2

Comment by Li Xi (Inactive) [ 30/Dec/15 ]

I hit the problem on b2_5 too.

From the log, the start time (1450902565.939239) and end time (1450902570.065967) is really on edge. That means, if time() function in copy tool rounds the second in different way with “date +%s” in the test script, this could cause some error. I'd suggest to relax the check in test script.

1450902555.749244 lhsmtool_posix[27518]: action=0 src=(null) dst=(null) mount_point=/mnt/lustre2
1450902555.753241 lhsmtool_posix[27519]: waiting for message from kernel
1450902565.907960 lhsmtool_posix[27519]: copytool fs=lustre archive#=2 item_count=1
1450902565.908222 lhsmtool_posix[27519]: waiting for message from kernel
1450902565.908308 lhsmtool_posix[27522]: '[0x200000402:0x1cc:0x0]' action ARCHIVE reclen 72, cookie=0x567b0468
1450902565.910695 lhsmtool_posix[27522]: processing file 'd60.sanity-hsm/f60.sanity-hsm'
1450902565.934773 lhsmtool_posix[27522]: archiving '/mnt/lustre2/.lustre/fid/0x200000402:0x1cc:0x0' to '/tmp/arc1/01cc/0000/0402/0000/0002/0000/0x200000402:0x1cc:0x0_tmp'
1450902565.936299 lhsmtool_posix[27522]: saving stripe info of '/mnt/lustre2/.lustre/fid/0x200000402:0x1cc:0x0' in /tmp/arc1/01cc/0000/0402/0000/0002/0000/0x200000402:0x1cc:0x0_tmp.lov
1450902565.939239 lhsmtool_posix[27522]: start copy of 39000000 bytes from '/mnt/lustre2/.lustre
^^^^^^^^^^^^^^^^^^^^^^^^
/fid/0x200000402:0x1cc:0x0' to '/tmp/arc1/01cc/0000/0402/0000/0002/0000/0x200000402:0x1cc:0x0_tmp'
1450902570.065967 lhsmtool_posix[27522]: %13
^^^^^^^^^^^^^^^^^^^^^^^^
1450902570.072808 lhsmtool_posix[27522]: bandwith control: 1048576B/s excess=1048576 sleep for 1.000000000s
1450902575.274958 lhsmtool_posix[27522]: %26
1450902575.281565 lhsmtool_posix[27522]: bandwith control: 1048576B/s excess=1048576 sleep for 1.000000000s

exiting: Terminated

Comment by Gerrit Updater [ 30/Dec/15 ]

Li Xi (lixi@ddn.com) uploaded a new patch: http://review.whamcloud.com/17742
Subject: LU-7229 hsm: relax time check of sanity-hsm test_60
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f6a19bc0a83dad63b7e2a5b00ddc29ff1bd83cec

Comment by Li Xi (Inactive) [ 30/Jan/16 ]

I hit similar issue on test 16:

== sanity-hsm test 16: Test CT bandwith control option == 14:26:05 (1454131565)
Purging archive on atest-vm40
Starting copytool agt1 on atest-vm40
20+0 records in
20+0 records out
20971520 bytes (21 MB) copied, 0.0172246 s, 1.2 GB/s
Changed after 0s: from '' to 'STARTED'
Waiting 100 secs for update
Waiting 90 secs for update
Updated after 18s: wanted 'SUCCEED' got 'SUCCEED'
sanity-hsm test_16: @@@@@@ FAIL: Transfer is too fast 19 < 20
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:4421:error_noexit()
= /usr/lib64/lustre/tests/test-framework.sh:4452:error()
= /usr/lib64/lustre/tests/sanity-hsm.sh:1522:test_16()
= /usr/lib64/lustre/tests/test-framework.sh:4691:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:4728:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:4594:run_test()
= /usr/lib64/lustre/tests/sanity-hsm.sh:1526:main()
Dumping lctl log to /tmp/test_logs/autotest_235/sanity-hsm/sanity-hsm.test_16.*.1454131585.log
FAIL 16 (23s)

Comment by Gerrit Updater [ 02/Feb/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17742/
Subject: LU-7229 hsm: relax time check of sanity-hsm test_60
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 792742b78f0afe16c7a23d765601b5a4a0231a9e

Comment by Joseph Gmitter (Inactive) [ 02/Feb/16 ]

Landed for 2.8

Comment by Elena Gryaznova [ 07/Jul/16 ]

> I hit similar issue on test 16:
> sanity-hsm test_16: @@@@@@ FAIL: Transfer is too fast 19 < 20

patch http://review.whamcloud.com/17742/ contains test_60 fix only.
Is other ticket failed for test_16 failure?

Thanks.

Comment by Elena Gryaznova [ 07/Jul/16 ]

sorry for typo: s/ticket failed/ticket filled/

Generated at Sat Feb 10 02:07:06 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.