Lustre / LU-10443

sanity - test_255c: Ladvise test 13, bad lock count, returned 100, actual 0

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Affects Version/s: Lustre 2.11.0
    • Fix Version/s: Lustre 2.11.0
    • Labels: None
    • Environment: Rolling Upgrade/Downgrade
      Servers/Clients = 2.10.56_62 lustre version
      ldiskfs
    • Severity: 3

    Description

      This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

      This issue relates to the following test suite run:

      https://testing.hpdd.intel.com/test_sets/515b6450-ebfb-11e7-8c23-52540065bddc

      This issue occurred while performing rolling upgrade/downgrade testing for tag 2.10.56_62.
      Both the servers and the clients are running Lustre version 2.10.56_62, so both should support lockahead.

      test logs:

      == sanity test 255c: suite of ladvise lockahead tests ================================================ 07:51:57 (1514361117)
      Starting test test10 at 1514361118
      Finishing test test10 at 1514361118
      Starting test test11 at 1514361118
      Finishing test test11 at 1514361118
      Starting test test12 at 1514361118
      Finishing test test12 at 1514361119
      Starting test test13 at 1514361119
      Finishing test test13 at 1514361119
       sanity test_255c: @@@@@@ FAIL: Ladvise test 13, bad lock count, returned  100, actual 0 
      

      Might be related to LU-10136 and LU-10104

    Activity
            pjones Peter Jones added a comment -

            Landed for 2.11


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31254/
            Subject: LU-10443 test: Handle file lifecycle correctly
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e528677e1630093362394ae36d725c321d0da4f2
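            For reference, one way to inspect the landed fix locally is sketched below. The clone URL is the usual whamcloud git mirror and is an assumption, not taken from this ticket; the commit hash is the one quoted in the comment above.

                git clone git://git.whamcloud.com/fs/lustre-release.git
                cd lustre-release
                # Find the landed change by its ticket reference in the subject line
                git log --oneline --grep 'LU-10443'
                # Show the commit recorded in the merge comment above
                git show e528677e1630093362394ae36d725c321d0da4f2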
            pjones Peter Jones added a comment -

            Thanks Patrick! Good to know that this is not something that would affect those using the feature for real.


            paf Patrick Farrell (Inactive) added a comment -

            This bug is not specific to interop. Just a question of timing. Patch should resolve.

            gerrit Gerrit Updater added a comment -

            Patrick Farrell (paf@cray.com) uploaded a new patch: https://review.whamcloud.com/31254
            Subject: LU-10443 test: Handle file lifecycle correctly
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 286e0f7839267b8bf4119ce084427ffeb20455f6

            paf Patrick Farrell (Inactive) added a comment -

            ... woah. Well, this test should never pass and neither should any of the others, really. We unlink the file and check the lock count after that. I guess we're just consistently winning the race.

            I'll get a patch generated. This is a bug in the test, if that affects urgency.
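            For illustration only, a minimal bash sketch of the ordering problem described above. The file path, the hard-coded expected count, and the ldlm parameter name are assumptions for the sketch, not the exact sanity.sh test_255c code.

                #!/bin/bash
                # Sketch of the race: sampling the lock count after the unlink.
                FILE=/mnt/lustre/d255c.sanity/lockahead_file     # assumed test file path
                NS="ldlm.namespaces.*-osc-*"                     # assumed client OSC namespace glob

                # ... the lockahead test utility runs here and requests lockahead locks
                # on $FILE; for case 13 it reports that 100 locks should now exist ...
                expected=100

                # Racy ordering (roughly what the failing run did):
                #   rm -f "$FILE"
                #   actual=$(lctl get_param -n $NS.lock_unused_count | head -n 1)
                # Once the file is unlinked its locks can be cancelled at any moment,
                # so the count can already read 0 ("returned 100, actual 0").

                # Safe ordering: sample the count while the file still exists, then clean up.
                actual=$(lctl get_param -n $NS.lock_unused_count | head -n 1)
                rm -f "$FILE"
                [ "$actual" -eq "$expected" ] ||
                        echo "bad lock count: expected $expected, actual $actual"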

            paf Patrick Farrell (Inactive) added a comment - edited

            Tomorrow or early next week - sorry, I didn't realize this was urgent (it was passed on to me from the LWG today). I can try to reproduce the procedure described.

            So, to be clear: sanity was run with everything at 2.10.2, then the OSSes were unmounted and upgraded - I assume the file system remains up for this, i.e. it's a failover-type scenario? Then sanity was run again. Then the same for the MDS, followed by sanity.

            And then finally, the clients were all unmounted, upgraded, and remounted, followed by sanity, which had this failure?

            I can replicate most of this, but I strongly suspect I won't hit this issue. Lockahead creates no on-disk state, only LDLM state, which should be destroyed A) when the file is deleted, and B) when clients are unmounted and remounted. I think it's far more likely we hit some other rare issue with the test than that this is upgrade related.

            First I'll dig through the logs and see if I can find anything.
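            To illustrate the "no on-disk state" point above: lock state lives only in the client-side LDLM namespaces and is torn down on unmount, so nothing from lockahead can survive a remount or upgrade cycle. A rough sketch, with assumed parameter paths, MGS NID, fsname, and mount point:

                # Locks currently held in this client's OSC namespaces (assumed param path)
                lctl get_param ldlm.namespaces.*-osc-*.lock_count

                # Unmounting tears the namespaces down; remounting starts them empty,
                # so any lockahead locks are gone along with everything else.
                umount /mnt/lustre
                mount -t lustre mgsnode@tcp:/lustrefs /mnt/lustre   # assumed MGS NID and fsname
                lctl get_param ldlm.namespaces.*-osc-*.lock_count   # counts start over near 0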
            pjones Peter Jones added a comment -

            paf when do you expect to have a chance to get to this?


            standan Saurabh Tandan (Inactive) added a comment -

            Steps followed for rolling upgrade testing:
            1. Set up Lustre with clients and servers both on the 2.10.2 GA version.
            2. Upgraded the OSS to 2.10.56_62, ran sanity.sh.
            3. Upgraded the MDS to 2.10.56_62, ran sanity.sh.
            4. Upgraded the clients to 2.10.56_62, ran sanity.sh, and hit this issue.
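            A rough sketch of what one of the upgrade steps above looks like on a node; the package names, device path, mount points, and the ONLY= invocation are assumptions, and the actual test harness may differ.

                # Example for the OSS step; the MDS and client steps follow the same pattern.
                umount /mnt/lustre-ost0                                  # take the OST offline (or fail it over)
                yum -y upgrade lustre kmod-lustre lustre-osd-ldiskfs     # assumed package set for 2.10.56_62
                mount -t lustre /dev/sdb /mnt/lustre-ost0                # bring the OST back

                # Then rerun the suite from a client against the live filesystem,
                # e.g. just the failing subtest:
                ONLY=255c sh sanity.sh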

            paf Patrick Farrell (Inactive) added a comment -

            Sure, will take a look.

    People

      Assignee: paf Patrick Farrell (Inactive)
      Reporter: maloo Maloo
      Votes: 0
      Watchers: 5
