replay-dual: test_10: @@@@@@ FAIL: test_10 failed with 2

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.15.0

    Description

      This issue was created by maloo for Vladimir Saveliev <vlaidimir.saveliev@hpe.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/2f1edc30-7ddf-4a6c-bc7a-fffbf58435b8

      This is a relatively regular failure:

      Error: 'test_10 failed with 2' 
      Failure Rate: 6.00% of most recent 100 runs, 0 skipped (all branches)
      

        Activity

          [LU-15086] replay-dual: test_10: @@@@@@ FAIL: test_10 failed with 2
          pjones Peter Jones added a comment -

          Landed for 2.15

          gerrit Gerrit Updater added a comment -

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45308/
          Subject: LU-15086 ptlrpc: fix timeout after spurious wakeup
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: b8383035406a4b7bee2e6d8674eaef480b3e3b35

          bzzz Alex Zhuravlev added a comment -

          With the patch above I can't reproduce the problem. I would like to hear Neil's opinion.

          pjones Peter Jones added a comment -

          Would we be better off reverting LU-12362 from master?

          gerrit Gerrit Updater added a comment -

          "Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45308
          Subject: LU-15086 ptlrpc: fix timeout after spurious wakeup
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: a495c00874105caa299fdf4dfaa482df9b24ad2e

          bzzz Alex Zhuravlev added a comment - edited

          I added a simple check to ptlrpcd to catch overly long waits:

          LustreError: 5019:0:(ptlrpcd.c:515:ptlrpcd()) ASSERTION( timeout == 0 || end - start < timeout + 4 ) failed: timeout 10, end 97, start 83, diff 14, count 3

          That is, a 10-second wait was requested, but the thread actually spent 14 seconds waiting, and wait_woken() was called 3 times.

          Another interesting example, from a failed replay-dual run:

          LustreError: 5009:0:(ptlrpcd.c:513:ptlrpcd()) timeout 19, end 711, start 620, diff 91, count 9
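
The overshoot in the two log lines above is consistent with a wait loop that asks for the full timeout again after every spurious wakeup. The following is a minimal sketch of that pattern as a plain Python simulation, not kernel code and not taken from the patch. The helper names (`sim_wait`, `wait_restarting`, `wait_carrying_remainder`) and the wakeup gaps `[2, 2, 100]` are hypothetical, chosen only so the numbers line up with the first assertion message (timeout 10, diff 14, count 3).

```python
def sim_wait(now, budget, gap):
    """Model one timed wait on a simulated clock.

    The waiter sleeps until the next wakeup arrives (`gap` seconds away)
    or until `budget` seconds have passed, whichever comes first.
    Returns (new_now, remaining_budget); remaining 0 means it timed out.
    """
    slept = min(gap, budget)
    return now + slept, budget - slept


def wait_restarting(timeout, gaps):
    """Assumed buggy pattern: every retry requests the full timeout again."""
    now, calls = 0, 0
    for gap in gaps:
        calls += 1
        now, remaining = sim_wait(now, timeout, gap)
        if remaining == 0:  # this particular wait ran to completion
            break
    return now, calls


def wait_carrying_remainder(timeout, gaps):
    """Alternative: each retry waits only for what is left of the budget."""
    now, calls, remaining = 0, 0, timeout
    for gap in gaps:
        calls += 1
        now, remaining = sim_wait(now, remaining, gap)
        if remaining == 0:  # the overall budget is used up
            break
    return now, calls


if __name__ == "__main__":
    gaps = [2, 2, 100]  # hypothetical: two early spurious wakeups, then quiet
    print("restarting:", wait_restarting(10, gaps))
    print("carrying remainder:", wait_carrying_remainder(10, gaps))
```

With these made-up gaps, the restarting loop makes three wait calls and only gives up 14 simulated seconds after it started, while the loop that carries the remaining budget forward gives up after 10. Whether ptlrpcd behaved exactly this way before the fix is an assumption here; the patch subject ("fix timeout after spurious wakeup") only suggests it.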

          People

            bzzz Alex Zhuravlev
            maloo Maloo