[LU-1745] Test failure on test suite recovery-small, subtest test_105

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.3.0
    • Affects Version/s: Lustre 2.2.0, Lustre 2.3.0, Lustre 2.1.2
    • Labels: None
    • Severity: 3
    • Rank: 4489

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/822950b2-e650-11e1-afac-52540035b04c.

      The sub-test test_105 failed with the following error:

      IR state must be OFF at client-2

      == recovery-small test 105: IR: NON IR clients support == 22:47:10 (1344923230)
      mgs.MGS.ir_timeout
      Stopping client client-2 /mnt/lustre (opts:)
      Starting client: client-2: -o flock,user_xattr,acl,noir fat-amd-1@tcp:/lustre /mnt/lustre
       recovery-small test_105: @@@@@@ FAIL: IR state must be OFF at client-2 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:3614:error_noexit()
        = /usr/lib64/lustre/tests/test-framework.sh:3636:error()
        = /usr/lib64/lustre/tests/recovery-small.sh:1446:test_105()
        = /usr/lib64/lustre/tests/test-framework.sh:3869:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:3898:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:3772:run_test()
        = /usr/lib64/lustre/tests/recovery-small.sh:1477:main()
      

      I checked the IR state on client-2 and it is enabled:
      [root@client-2 ~]# cat /proc/fs/lustre/mgc/MGC10.10.4.132@tcp/ir_state
      imperative_recovery: ENABLED
      client_state:
          - { client: lustre-client, nidtbl_version: 16 }
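
      The manual check above can be scripted. The following is only a minimal sketch: ir_state_from_text is a hypothetical helper, not a function from test-framework.sh or recovery-small.sh, and it assumes the ir_state output keeps the "imperative_recovery: <value>" layout shown in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of test-framework.sh): read ir_state text
# on stdin and print the value of the imperative_recovery field.
ir_state_from_text() {
    awk -F': *' '$1 ~ /^[[:space:]]*imperative_recovery$/ { print $2; exit }'
}

# Example input: the output captured on client-2 in this report.
sample='imperative_recovery: ENABLED
client_state:
    - { client: lustre-client, nidtbl_version: 16 }'

# Prints the state field only (ENABLED for the sample above).
printf '%s\n' "$sample" | ir_state_from_text
```

      On a live client the same function could be fed from the proc file quoted above, for example with cat /proc/fs/lustre/mgc/MGC10.10.4.132@tcp/ir_state piped into it.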

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.3


            bogl Bob Glossman (Inactive) added a comment -

            James, thanks for the patch to b2_2. That branch is closed for updates right now, so we won't be landing it right away.

            This is not an issue for inter-op between 2.1 and current, because the changes are all in subtests that didn't exist in the 2.1 version of recovery-small.sh. There is no need for a b2_1 patch.

            pjones Peter Jones added a comment -

            Bob will take care of this one


            simmonsja James A Simmons added a comment -

            Patch for b2_2 to pass inter-op test: http://review.whamcloud.com/#change,3698

            simmonsja James A Simmons added a comment -

            Doh! I see where the code was not updated in master. The patch is at http://review.whamcloud.com/#change,3667

            simmonsja James A Simmons added a comment -

            The above log shows that recovery-small.sh was not updated. With current master the error output will always be ENABLED/DISABLED. I bet a diff between the recovery-small.sh used in the test and the one in the git repo will show them out of sync. Andreas is right about needing a patch for b2_2. I will whip up a patch for you.

            simmonsja James A Simmons added a comment -

            If that is the case, then how did it pass maloo before? Something is strange here.

            sarah Sarah Liu added a comment -

            Both client-1 and client-2 are running master which contain this commit


            adilger Andreas Dilger added a comment -

            It isn't clear from the comments what Lustre version the client-2 node is running. If one node is running master, but the other is running a version without LU-1095 (http://review.whamcloud.com/2853) applied, then I think it would cause this problem.

            What is suspicious is that it reports that the IR state should be "OFF" instead of "DISABLED", as was introduced with the new patch. What is needed here is for the LU-1095 patch to be landed on b2_2 as well so that interop tests can pass.
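
            The "OFF" versus "DISABLED" mismatch discussed in this thread can be illustrated with a minimal sketch of a state check that tolerates both spellings. This is not the fix that landed for this issue (that was done by updating the test scripts); normalize_ir_state is a hypothetical helper, and "ON" is assumed here as the older counterpart of "OFF", which the thread does not confirm.

```shell
#!/bin/sh
# Hypothetical helper: map either spelling of the IR state to "on" or "off".
# ENABLED/DISABLED are the values reported by current master; OFF is the value
# the older test expected. ON is an assumption (counterpart of OFF).
normalize_ir_state() {
    case "$1" in
        ENABLED|ON)   echo on ;;
        DISABLED|OFF) echo off ;;
        *)            echo unknown; return 1 ;;
    esac
}

# A check written this way accepts both spellings for a client mounted
# with the noir option.
state=DISABLED
if [ "$(normalize_ir_state "$state")" = off ]; then
    echo "IR is off"
fi
```

            The design choice is to compare a normalized value rather than the raw string, so that a test and a node from different branches do not disagree purely on wording.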

            People

              Assignee: bogl Bob Glossman (Inactive)
              Reporter: maloo Maloo