[LU-9721] replay-dual test_20: recovery time 350 >= 1.5x original time 169 Created: 29/Jun/17  Updated: 21/Mar/22  Resolved: 11/Sep/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/ca1b1b48-5ccb-11e7-a74a-5254006e85c2.

The sub-test test_20 failed with the following error:

recovery time 350 >= 1.5x original time 169
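
For context, the failing assertion compares the elapsed failover recovery time against 1.5x the time recorded for the original recovery. A minimal sketch of that check (illustrative shell only, not the actual replay-dual test_20 code; the variable names are assumptions):

original_time=169        # seconds measured for the baseline recovery
recovery_time=350        # seconds measured for the recovery under test

# 1.5x computed with integer arithmetic: (original_time * 3) / 2 = 253
max_allowed=$(( original_time * 3 / 2 ))
if (( recovery_time >= max_allowed )); then
    echo "FAIL: recovery time $recovery_time >= 1.5x original time $original_time"
fi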

server and client: b2_10 #2 tag-rc1 DNE ZFS

test log

Started lustre-MDT0000
CMD: trevis-19vm1.trevis.hpdd.intel.com,trevis-19vm2 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid 
trevis-19vm2: h2tcp: deprecated, use h2nettype instead
trevis-19vm1: h2tcp: deprecated, use h2nettype instead
trevis-19vm2: trevis-19vm2.trevis.hpdd.intel.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
trevis-19vm1: trevis-19vm1.trevis.hpdd.intel.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
trevis-19vm2: CMD: trevis-19vm2.trevis.hpdd.intel.com lctl get_param -n at_max
trevis-19vm1: CMD: trevis-19vm1.trevis.hpdd.intel.com lctl get_param -n at_max
trevis-19vm2: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 139 sec
trevis-19vm1: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 139 sec
Starting client: trevis-19vm1.trevis.hpdd.intel.com:  -o user_xattr,flock trevis-19vm4@tcp:/lustre /mnt/lustre2
CMD: trevis-19vm1.trevis.hpdd.intel.com mkdir -p /mnt/lustre2
CMD: trevis-19vm1.trevis.hpdd.intel.com mount -t lustre -o user_xattr,flock trevis-19vm4@tcp:/lustre /mnt/lustre2
 replay-dual test_20: @@@@@@ FAIL: recovery time 350 >= 1.5x original time 169 
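
The wait_import_state_mount calls in the log above poll the client's MDC import until it reports the FULL state (139 seconds here), using lctl get_param as shown. A simplified sketch of that polling loop (not the real test-framework implementation; the output parsing and the use of at_max as the timeout are assumptions):

param="mdc.lustre-MDT0000-mdc-*.mds_server_uuid"
max_wait=$(lctl get_param -n at_max)    # the framework reads at_max; using it directly as the timeout is an assumption
for (( waited = 0; waited < max_wait; waited++ )); do
    # Assumes the proc file reports the import state alongside the server UUID.
    if lctl get_param -n "$param" 2>/dev/null | grep -qw FULL; then
        echo "$param in FULL state after $waited sec"
        break
    fi
    sleep 1
done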


 Comments   
Comment by Sarah Liu [ 29/Jun/17 ]

I searched Maloo for similar failures and found that most of them were seen only in 2016 during 2.9 testing with ldiskfs. For ZFS, this appears to be the first occurrence.

Comment by Andreas Dilger [ 21/Mar/22 ]

+1 on master https://testing.whamcloud.com/test_sets/5e64d041-dec6-412a-aacd-a6d71502ae23
