
1.8<->2.1.54 Test failure on test suite replay-single 62

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • None
    • Lustre 2.2.0
    • None
    • 3
    • 6488

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/c460154a-3e73-11e1-b417-5254004bbbd3.

      Attachments

        Issue Links

          Activity

            [LU-994] 1.8<->2.1.54 Test failure on test suite replay-single 62
            pjones Peter Jones added a comment -

            As per Oleg, OK to close. This failure appears because the version of 1.8.x being tested does not contain this fix.

            pjones Peter Jones added a comment - Yes. http://git.whamcloud.com/?p=fs/lustre-release.git;a=commit;h=8d4b77e5961c06847f9603ebc607118742ea1a51
            bobijam Zhenyu Xu added a comment -

            It looks like a recovery timeout extension issue, and LU-1036 involves the same issue. LU-889 reworks the recovery timeout extension logic.

            Does the RC1 build that hit this again contain the LU-889 patch?

            pjones Peter Jones added a comment -

            Are you sure that this is a duplicate of LU-1036? It seems to relate to a different test number (62 rather than 52) and seems to still be occurring in RC1...

            bobijam Zhenyu Xu added a comment -

            dup of LU-1036

            sarah Sarah Liu added a comment - - edited

            I use the default values for both obd_timeout and ldlm_timeout, so I guess they match? Yes, AT is enabled on the client side. TIMEOUT in my config file is also set to 20.

            bobijam Zhenyu Xu added a comment -

            I cannot reproduce it on my VMs.

            Sarah, would you mind checking which timeout values are used on the server and the client? Do they match? Is AT (adaptive timeout) enabled on the client side?

            bobijam Zhenyu Xu added a comment -

            Just for the record.

            MDS recovery started at 1326480370.284875, with the recovery window set to 60 seconds.

            The client replayed req x1390912056905666/t304942678018 (LDLM_ENQUEUE) at 1326480370.283516, with a timeout of 62 seconds.

            At 1326480430.284992 the MDS closed its recovery window, while the replay req timed out at 1326480432.535461, so the client was evicted and the test failed.

            The failure occurred because the replay req's timeout extends beyond the MDS recovery window.
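
            The arithmetic behind this can be checked with a small sketch. It only re-computes the two deadlines from the timestamps quoted above; the class and method names are hypothetical and are not part of Lustre. Under those assumptions, the replay request's deadline falls roughly 2 seconds after the end of the recovery window.

```python
from dataclasses import dataclass


@dataclass
class ReplayTiming:
    """Illustrative container only; not a Lustre data structure."""
    recovery_start: float   # MDS recovery start (epoch seconds, from the log)
    recovery_window: float  # recovery window length in seconds
    replay_sent: float      # time the client sent the replay request
    replay_timeout: float   # request timeout in seconds

    def recovery_deadline(self) -> float:
        # Nominal time at which the MDS closes its recovery window.
        return self.recovery_start + self.recovery_window

    def replay_deadline(self) -> float:
        # Nominal time at which the client's replay request times out.
        return self.replay_sent + self.replay_timeout

    def overrun(self) -> float:
        # Positive when the request deadline lies past the recovery window.
        return self.replay_deadline() - self.recovery_deadline()

    def outlives_recovery_window(self) -> bool:
        return self.overrun() > 0


# Values taken from the comment above.
timing = ReplayTiming(
    recovery_start=1326480370.284875,
    recovery_window=60.0,
    replay_sent=1326480370.283516,
    replay_timeout=62.0,
)

print(f"overrun: {timing.overrun():.3f} s")
print("request outlives recovery window:", timing.outlives_recovery_window())
```

            The nominal deadlines differ slightly from the logged close and timeout times (1326480430.284992 and 1326480432.535461), which is expected since the log records when each event was actually processed.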

            pjones Peter Jones added a comment -

            Bobi

            Could you look into this one please?

            Thanks

            Peter


            People

              bobijam Zhenyu Xu
              maloo Maloo