[LU-12706] sanity-quota test_4a: FAIL: Passed grace time 20, 1566910527, 1566910563


    Description

      This issue was created by maloo for jianyu <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/619381e2-c8e8-11e9-90ad-52540065bddc

      test_4a failed with the following error:

      Passed grace time 20, 1566910527, 1566910563
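      The failure message can be decoded with a little arithmetic. The sketch below assumes the three numbers are the grace period in seconds, the start timestamp, and the timestamp at the time of the check (both in Unix epoch seconds); the message itself does not label them. parse_grace_failure is a hypothetical helper, not part of the Lustre test framework.

```python
import re


def parse_grace_failure(message: str) -> dict:
    """Extract grace period and elapsed time from a 'Passed grace time' error.

    Assumed field order (not stated in the message): grace, start, current.
    """
    match = re.search(r"Passed grace time (\d+), (\d+), (\d+)", message)
    if match is None:
        raise ValueError("not a 'Passed grace time' failure message")
    grace, start, current = (int(value) for value in match.groups())
    elapsed = current - start
    return {"grace": grace, "elapsed": elapsed, "overshoot": elapsed - grace}


result = parse_grace_failure(
    "sanity-quota test_4a: FAIL: Passed grace time 20, 1566910527, 1566910563"
)
print(result)  # {'grace': 20, 'elapsed': 36, 'overshoot': 16}
```

      Under that reading, the check ran 36 seconds after the start, 16 seconds beyond the 20-second grace period.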
      


      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity-quota test_4a - Passed grace time 20, 1566910527, 1566910563

          Activity

            yujian Jian Yu added a comment - Lustre 2.15.6 RC1: https://testing.whamcloud.com/test_sets/e16070a3-5dcb-4939-95ee-f459ee1ed5d3

            gerrit Gerrit Updater added a comment -
            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/56910/
            Subject: LU-12706 tests: sanity-quota 4a sync timeout fix
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set:
            Commit: 9b8ca98c0cb9b61266fe0bd864dc264cbb08a3fa

            gerrit Gerrit Updater added a comment -
            "Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56910
            Subject: LU-12706 tests: sanity-quota 4a sync timeout fix
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: 7b5ff36b3ddf8e1e0fbdf6940a493cc7bdb736b5
            pjones Peter Jones added a comment - Merged for 2.16

            gerrit Gerrit Updater added a comment -
            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55216/
            Subject: LU-12706 tests: sanity-quota 4a sync timeout fix
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 9e7b239bbd26b601127073bb0c6789cb9def7073

            scherementsev Sergey Cheremencev added a comment -
            "Since it is a timeout-related issue, it is entirely possible that it is just a victim of running on a VM with a busy host or network. Possibly increasing the timeout slightly would avoid this issue."

            In several failures I saw that the elapsed time was more than 40 seconds. I think that is too long, especially since sync_all_data is called 4 times per test_file_soft, and test_file_soft is called 3 times (for user, group and project), i.e. 12 sync_all_data calls in total. Instead of increasing the timeout, I would propose setting striping on the test directory so that objects are created on 1 OST, and calling sync only on that OST. The failed tests I looked at had 8 OSTs, so this might help.
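            The numbers in the proposal above can be put into a rough model. This is only a sketch under stated assumptions: it counts one sync operation per OST per sync_all_data call and ignores any MDT syncs, and the helper names (per_ost_syncs, setstripe_argv) and the directory path are hypothetical. The lfs setstripe options used (-c for stripe count, -i for starting OST index) are standard, but the merged patch may implement the idea differently.

```python
# Figures taken from the comment: 4 sync_all_data calls per test_file_soft,
# and test_file_soft runs once each for user, group and project quotas.
SYNC_CALLS_PER_SOFT_TEST = 4
QUOTA_TYPES = ("user", "group", "project")


def per_ost_syncs(ost_count: int, single_ost: bool = False) -> int:
    """Rough count of per-OST sync operations issued by the test.

    Assumption: each sync call touches every OST unless restricted to one.
    """
    calls = SYNC_CALLS_PER_SOFT_TEST * len(QUOTA_TYPES)
    targets = 1 if single_ost else ost_count
    return calls * targets


def setstripe_argv(directory: str, ost_index: int = 0) -> list:
    """Build (without running) an lfs command that pins new files to one OST."""
    return ["lfs", "setstripe", "-c", "1", "-i", str(ost_index), directory]


print(per_ost_syncs(8))                   # 96 with syncs on all 8 OSTs
print(per_ost_syncs(8, single_ost=True))  # 12 when only one OST is synced
# Hypothetical test directory, for illustration only.
print(" ".join(setstripe_argv("/mnt/lustre/d4a.sanity-quota")))
```

            With 8 OSTs, restricting the sync to a single OST cuts the modelled per-OST sync operations from 96 to 12, which is why the proposal may help more than raising the timeout.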

            gerrit Gerrit Updater added a comment -
            "Sergey Cheremencev <scherementsev@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55216
            Subject: LU-12706 tests: sanity-quota 4a sync timeout fix
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c403027907bfa55794deb47cdb84bb220aeaa419

            adilger Andreas Dilger added a comment -
            This has hit 17 times in the past 6 months across all branches, about 1/week in the last 3 months:
            https://testing.whamcloud.com/search?horizon=15552000&query_bugs=LU-12706&status%5B%5D=FAIL&test_set_script_id=61149410-4a46-11e0-a7f6-52540025f9af&sub_test_script_id=774b419c-51c7-11e0-bb3d-52540025f9af&source=sub_tests#redirect

            Since it is a timeout-related issue, it is entirely possible that it is just a victim of running on a VM with a busy host or network. Possibly increasing the timeout slightly would avoid this issue.
            eaujames Etienne Aujames added a comment - +1 on b2_15: https://testing.whamcloud.com/test_sets/b22ad252-c04f-4509-b051-ad0d1f2151c7
            adilger Andreas Dilger added a comment - +1 on master: https://testing.whamcloud.com/test_sets/7a39d9e5-6217-4747-a35f-a140cc8e9fbf
            tappro Mikhail Pershin added a comment - +1 on master https://testing.whamcloud.com/test_sessions/5cff7b51-2d77-4a4c-a772-fddc7ffb208a  

            People

              scherementsev Sergey Cheremencev
              maloo Maloo