sanity test_120e: 1 blocking RPC occured

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.10.0

    Description

      This issue was created by maloo for wangshilong <wshilong@ddn.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/6ce8d47a-db1b-11e5-877a-5254006e85c2.

      The sub-test test_120e failed with the following error:

      1 blocking RPC occured.
      

      Please provide additional information about the failure here.

      Info required for matching: sanity 120e

          Activity

            [LU-7812] sanity test_120e: 1 blocking RPC occured
            pjones Peter Jones added a comment -

            Landed for 2.10

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24811/
            Subject: LU-7812 tests: address race condition for sanity:120{e,f}
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e240fb5099af8e62c532d314317095800ebb6864

            jay Jinshan Xiong (Inactive) added a comment -

            Yes, I think so. We can merge these two tickets.

            bogl Bob Glossman (Inactive) added a comment -

            There is a very similar-looking failure in test_120f; see LU-7889. Does it need a similar test fix?

            gerrit Gerrit Updater added a comment -

            Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: https://review.whamcloud.com/24811
            Subject: LU-7812 tests: add a race condition for sanity:120e
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f4a1c85cb6353e057ea562a1094781b1924a9050

            jay Jinshan Xiong (Inactive) added a comment -

            We need to find out why cancel_lru_locks didn't actually drop this lock. ...

            cancel_lru_locks osc cancels only unused locks, that is, locks with zero l_readers and l_writers. There is a race window in which stat $DIR/$tdir $DIR/$tdir/f1 > /dev/null has completed but the ldlm callback thread still holds the glimpse locks when cancel_lru_locks osc is called. The glimpse locks are therefore not considered 'unused' and escape cancellation.

            This problem can be fixed by adding some delay between stat $DIR/$tdir $DIR/$tdir/f1 > /dev/null and cancel_lru_locks osc.

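
The delay described in the comment above can be sketched in shell roughly as follows. This is an illustration under stated assumptions, not the landed patch: the function name stat_then_cancel_osc and the SETTLE_SECS variable are hypothetical, the 2-second default is a guess (the ticket only says "some delay"), and cancel_lru_locks is replaced by an echo stub when the Lustre test framework is not loaded so that the snippet can be dry-run on any machine.

```shell
# Sketch only: stat the test directory and file, wait, then cancel OSC locks.
# cancel_lru_locks normally comes from the Lustre test framework; the stub
# below is NOT the real helper and only prints what would happen.
if ! command -v cancel_lru_locks > /dev/null 2>&1; then
    cancel_lru_locks() { echo "would cancel unused $1 locks"; }
fi

# Hypothetical knob; the ticket does not state the actual delay length.
SETTLE_SECS=${SETTLE_SECS:-2}

# Hypothetical helper. $1 = directory, $2 = file inside it.
stat_then_cancel_osc() {
    stat "$1" "$2" > /dev/null || return 1
    # Give the ldlm callback thread time to release the glimpse locks, so
    # they count as unused (zero readers and writers) when the cancel runs.
    sleep "$SETTLE_SECS"
    cancel_lru_locks osc
}
```

Inside the test this would be called as stat_then_cancel_osc $DIR/$tdir $DIR/$tdir/f1. A fixed sleep narrows the race window but does not close it on a heavily loaded node, so the delay length is a trade-off between test run time and reliability.
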
            bogl Bob Glossman (Inactive) added a comment - edited

            more on master:
            https://testing.hpdd.intel.com/test_sets/114e1984-cece-11e6-af6a-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/edf384e2-d0b2-11e6-bbdd-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/93fba206-d163-11e6-bbdd-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/9c3a97e2-d69f-11e6-b630-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/187943b6-d6bf-11e6-bb30-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/8a1d30ea-d773-11e6-923b-5254006e85c2

            bogl Bob Glossman (Inactive) added a comment - edited

            more on master:
            https://testing.hpdd.intel.com/test_sets/59612658-c668-11e6-8cb7-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/7457d9e4-c897-11e6-8a5b-5254006e85c2

            bogl Bob Glossman (Inactive) added a comment - edited

            more on master, sles12sp1:
            https://testing.hpdd.intel.com/test_sets/ad4ab7c4-b6d8-11e6-a559-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/c864a7d0-b729-11e6-a559-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/a0975c86-b759-11e6-a559-5254006e85c2

            This may be a 100% fail on sles12 too.

            bogl Bob Glossman (Inactive) added a comment - edited

            more on master, sles11sp4 client and server:
            https://testing.hpdd.intel.com/test_sets/fe524e90-b5af-11e6-a223-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/48ea6166-b5c1-11e6-a223-5254006e85c2

            I think this may be a 100% fail on sles11sp4.
            Raising this ticket to Blocker.

            People

              green Oleg Drokin
              maloo Maloo