[LU-1745] Test failure on test suite recovery-small, subtest test_105 Created: 14/Aug/12  Updated: 17/Sep/12  Resolved: 17/Aug/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.2.0, Lustre 2.3.0, Lustre 2.1.2
Fix Version/s: Lustre 2.3.0

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Bob Glossman (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

Server: lustre-master-tag-2.2.92-RHEL6
Client: 2.6.38-fc15
[root@client-1 tests]# uname -a
Linux client-1.lab.whamcloud.com 2.6.38.6-rc1 #1 SMP Fri Aug 10 17:29:30 PDT 2012 x86_64 x86_64 x86_64 GNU/Linux


Issue Links:
Duplicate
is duplicated by LU-1967 2.2<->2.3 Test failure on test suite ... Resolved
Related
is related to LU-1095 Console message cleanup Reopened
Severity: 3
Rank (Obsolete): 4489

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/822950b2-e650-11e1-afac-52540035b04c.

The sub-test test_105 failed with the following error:

IR state must be OFF at client-2

== recovery-small test 105: IR: NON IR clients support == 22:47:10 (1344923230)
mgs.MGS.ir_timeout
Stopping client client-2 /mnt/lustre (opts:)
Starting client: client-2: -o flock,user_xattr,acl,noir fat-amd-1@tcp:/lustre /mnt/lustre
 recovery-small test_105: @@@@@@ FAIL: IR state must be OFF at client-2 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:3614:error_noexit()
  = /usr/lib64/lustre/tests/test-framework.sh:3636:error()
  = /usr/lib64/lustre/tests/recovery-small.sh:1446:test_105()
  = /usr/lib64/lustre/tests/test-framework.sh:3869:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:3898:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:3772:run_test()
  = /usr/lib64/lustre/tests/recovery-small.sh:1477:main()

I checked the IR state on client-2, and it is enabled:
[root@client-2 ~]# cat /proc/fs/lustre/mgc/MGC10.10.4.132@tcp/ir_state
imperative_recovery: ENABLED
client_state:
    - { client: lustre-client, nidtbl_version: 16 }


 Comments   
Comment by Andreas Dilger [ 15/Aug/12 ]

It isn't clear from the comments what Lustre version the client-2 node is running. If one node is running master but the other is running a version without LU-1095 (http://review.whamcloud.com/2853) applied, then I think it would cause this problem.

What is suspicious is that it reports that the IR state should be "OFF" instead of "DISABLED" as was introduced with the new patch. What is needed here is for the LU-1095 patch to be landed on b2_2 as well so that interop tests can pass.
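
For illustration, the OFF versus DISABLED mismatch can be sketched as a small parser over the ir_state output. This is not the code in recovery-small.sh or in either patch: the helper names `ir_state_value` and `ir_is_off` are hypothetical, and accepting both spellings is only an assumption about what an interop-tolerant check might look like.

```shell
# Hypothetical helper: read ir_state output on stdin and print the value
# of the "imperative_recovery:" line.
ir_state_value() {
    awk '$1 == "imperative_recovery:" { print $2; exit }'
}

# Hypothetical helper: succeed if the state means "not enabled" under
# either the older spelling (OFF) or the newer one (DISABLED).
ir_is_off() {
    case "$1" in
        OFF|DISABLED) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage with sample text shaped like the ir_state output in this ticket.
state=$(printf 'imperative_recovery: DISABLED\nclient_state:\n' | ir_state_value)
if ir_is_off "$state"; then
    echo "IR is off ($state)"
else
    echo "IR is on ($state)"
fi
```

A check written this way would not depend on which side of the LU-1095 change produced the string.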

Comment by Sarah Liu [ 15/Aug/12 ]

Both client-1 and client-2 are running master, which contains this commit.

Comment by James A Simmons [ 15/Aug/12 ]

If that is the case, then how did it pass maloo before? Something is strange here.

Comment by James A Simmons [ 15/Aug/12 ]

The log above shows that recovery-small.sh was not updated. With current master the error output will always be ENABLED/DISABLED. I bet a diff between the installed recovery-small.sh and the one in the git repo will show them out of sync. Andreas is right about needing a patch for b2_2. I will whip up a patch for you.
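
A minimal sketch of that comparison, assuming a source checkout is available on the node. The helper name `check_test_sync` is hypothetical, and both paths are supplied by the caller.

```shell
# Hypothetical helper: report whether an installed test script matches
# the copy in a source checkout. Returns 0 when identical, 1 otherwise.
check_test_sync() {
    installed=$1
    repo_copy=$2
    if cmp -s "$installed" "$repo_copy"; then
        echo "in sync"
    else
        echo "out of sync"
        # Show what differs; diff exits non-zero when the files differ,
        # so its status is ignored here.
        diff -u "$installed" "$repo_copy" || true
        return 1
    fi
}
```

It could be called as `check_test_sync /usr/lib64/lustre/tests/recovery-small.sh "$LUSTRE_SRC/lustre/tests/recovery-small.sh"`, where the first path is taken from the trace dump above and `LUSTRE_SRC` is an assumed variable pointing at the checkout.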

Comment by James A Simmons [ 15/Aug/12 ]

Doh! I see where code was not updated in master. Patch is at http://review.whamcloud.com/#change,3667

Comment by James A Simmons [ 16/Aug/12 ]

Patch for b2_2 to pass the inter-op test: http://review.whamcloud.com/#change,3698

Comment by Peter Jones [ 16/Aug/12 ]

Bob will take care of this one

Comment by Bob Glossman (Inactive) [ 16/Aug/12 ]

James, Thanks for the patch to b2_2. That branch is closed for update right now, so we won't be landing that right away.

This is not an issue for inter-op between 2.1 and current, as the changes are all in subtests that didn't exist in the 2.1 version of recovery-small.sh.
No need for a b2_1 patch.

Comment by Peter Jones [ 17/Aug/12 ]

Landed for 2.3

Generated at Sat Feb 10 01:19:19 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.