[LU-5902] replay-dual test_20: FAIL: recovery time is growing 215 > 107 Created: 11/Nov/14  Updated: 01/Dec/14  Resolved: 01/Dec/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0, Lustre 2.5.4
Fix Version/s: None

Type: Bug
Priority: Blocker
Reporter: Jian Yu
Assignee: WC Triage
Resolution: Duplicate
Votes: 0
Labels: None
Environment:

Lustre Build: https://build.hpdd.intel.com/job/lustre-b2_5/100/
Distro/Arch: RHEL6.5/x86_64
MDSCOUNT=2


Issue Links:
Related
is related to LU-5079: conf-sanity test_47 timeout (Resolved)
Severity: 3
Rank (Obsolete): 16488

 Description   

replay-dual test_20 failed as follows:

Starting client: onyx-31vm6.onyx.hpdd.intel.com: -o user_xattr,flock onyx-31vm3@tcp:/lustre /mnt/lustre2
CMD: onyx-31vm6.onyx.hpdd.intel.com mkdir -p /mnt/lustre2
CMD: onyx-31vm6.onyx.hpdd.intel.com mount -t lustre -o user_xattr,flock onyx-31vm3@tcp:/lustre /mnt/lustre2
 replay-dual test_20: @@@@@@ FAIL: recovery time is growing 215 > 107 

Maloo report: https://testing.hpdd.intel.com/test_sets/c448fd34-68a4-11e4-a63a-5254006e85c2



 Comments   
Comment by Jian Yu [ 11/Nov/14 ]

This is a regression failure introduced by Lustre b2_5 build #100.

Here is a for-test-only patch that tries to reproduce the failure on Lustre b2_5 build #100: http://review.whamcloud.com/12669

Comment by Jian Yu [ 12/Nov/14 ]

The same regression failure also occurred on master branch:
https://testing.hpdd.intel.com/test_sets/70704a84-6721-11e4-987b-5254006e85c2

Comment by Jian Yu [ 12/Nov/14 ]

The regressions were caused by the LU-5079 patches http://review.whamcloud.com/11213 (master) and http://review.whamcloud.com/12365 (b2_5).

Comment by Jian Yu [ 14/Nov/14 ]

Hi Tappro,

I saw that replay-dual test 20 was added by you:

commit e94350fb29ff57e72de8b03aebcabafb56b2b722
Author: tappro <tappro>
Date:   Mon Nov 3 13:34:52 2008 +0000

    - fix recovery time growing
      b:16389
      i:rread,nathan

The comparison code in the test is as follows:

    [ $TIER2 -ge $((TIER1 * 2)) ] && \
        error "recovery time is growing $TIER2 > $TIER1"

The current error messages are:

recovery time is growing 208 > 102
recovery time is growing 216 > 106
recovery time is growing 181 > 90
recovery time is growing 207 > 103
recovery time is growing 247 > 122
recovery time is growing 217 > 106
recovery time is growing 208 > 102
recovery time is growing 177 > 88
recovery time is growing 218 > 109
recovery time is growing 214 > 103
recovery time is growing 215 > 107
recovery time is growing 217 > 108
recovery time is growing 180 > 90

Was there a specific rule behind the $((TIER1 * 2)) threshold? Can we increase this value?
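
For reference, here is a minimal sketch that re-applies the quoted comparison to the (TIER2, TIER1) pairs from the error messages above, to show how far each run is over the limit. check_growth is a hypothetical helper written only for this illustration; it is not a function from replay-dual.sh or test-framework.sh.

```shell
#!/bin/bash
# Re-apply the test's check, [ $TIER2 -ge $((TIER1 * 2)) ], to the
# values quoted in the error messages.
# check_growth is a hypothetical helper, not part of the Lustre test suite.
check_growth() {
    local tier2=$1 tier1=$2
    local limit=$((tier1 * 2))
    if [ "$tier2" -ge "$limit" ]; then
        echo "FAIL: $tier2 >= $limit (margin $((tier2 - limit)))"
    else
        echo "PASS: $tier2 < $limit"
    fi
}

# TIER2:TIER1 pairs copied from the messages listed above.
pairs="208:102 216:106 181:90 207:103 247:122 217:106 208:102
       177:88 218:109 214:103 215:107 217:108 180:90"

for p in $pairs; do
    # ${p%%:*} is the part before the colon, ${p##*:} the part after it.
    check_growth "${p%%:*}" "${p##*:}"
done
```

With these inputs all thirteen pairs fail, and in every case TIER2 exceeds twice TIER1 by only 0 to 8, so the second recovery time sits at almost exactly double the first.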

Comment by Peter Jones [ 01/Dec/14 ]

As per Yu Jian, this can be closed as a duplicate of LU-5079.
