sanity test 415 fails with 'rename took N > M sec'

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.3, Lustre 2.14.0, Lustre 2.12.6, Lustre 2.15.0
    • Severity: 3

    Description

      This issue was created by maloo for James Nunez <james.a.nunez@intel.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/98efeea0-7f0b-11e8-8fe6-52540065bddc

      sanity test 415 fails for DNE with ZFS with the error:

      total: 500 open/close in 0.87 seconds: 572.35 ops/second
      rename 500 files took 283 sec
       sanity test_415: @@@@@@ FAIL: rename took 283 sec 
      

      So far, this test only fails for ZFS.
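
      The failure message compares the elapsed rename time against a limit ('rename took N > M sec'). The check can be sketched roughly as below. This is a hypothetical illustration, not the actual sanity.sh test_415 code: the function name rename_timing_check, its arguments, and the f/g file naming are assumptions made for the example. The values 500 and 125 in the usage line are taken from the log lines in this ticket.

```shell
#!/bin/sh
# Hypothetical sketch of a timed-rename check; NOT the real sanity test_415.
# The body runs in a subshell, so "exit" only sets the function's status.
rename_timing_check() (
	dir=$1      # existing working directory (assumed)
	nr=$2       # number of files to create and rename
	max_sec=$3  # allowed wall-clock seconds for the rename loop

	# Create the files first so only the renames are timed.
	i=0
	while [ "$i" -lt "$nr" ]; do
		: > "$dir/f$i" || exit 2
		i=$((i + 1))
	done

	# Time the rename loop with one-second resolution.
	start=$(date +%s)
	i=0
	while [ "$i" -lt "$nr" ]; do
		mv "$dir/f$i" "$dir/g$i" || exit 2
		i=$((i + 1))
	done
	elapsed=$(( $(date +%s) - start ))

	echo "rename $nr files took $elapsed sec"
	if [ "$elapsed" -gt "$max_sec" ]; then
		echo "FAIL: rename took $elapsed > $max_sec sec"
		exit 1
	fi
	exit 0
)
```

      For example, rename_timing_check "$(mktemp -d)" 500 125 renames 500 files and returns non-zero if the loop takes longer than 125 seconds. A fixed limit like this is sensitive to backend commit latency, which is consistent with the failures being seen only on ZFS.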

      This test started failing on 2018-07-03 with logs at https://testing.whamcloud.com/test_sets/98efeea0-7f0b-11e8-8fe6-52540065bddc

      Other test failures at
      https://testing.whamcloud.com/test_sets/8de6a208-8945-11e8-9028-52540065bddc
      https://testing.whamcloud.com/test_sets/9bb5e252-8e9c-11e8-b0aa-52540065bddc
      https://testing.whamcloud.com/test_sets/029c73ea-8e9e-11e8-87f3-52540065bddc

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_415 - rename took 283 sec
      sanity test_415 - rename took 154 > 125 sec

          Activity

            [LU-11170] sanity test 415 fails with 'rename took N > M sec'
            ssmirnov Serguei Smirnov added a comment - +1 on master: https://testing.whamcloud.com/test_sets/d8e1c14f-849d-46a5-b422-c77955590561

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49724/
            Subject: LU-11170 tests: add debugging to sanity/415
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 6594babc73851fab335c514cd1fee018425e7bb3

            qian_wc Qian Yingjin added a comment - +1 on master: https://testing.whamcloud.com/test_sets/c37adf87-d38a-4a38-afa4-effadab56dd5
            scherementsev Sergey Cheremencev added a comment - +1 master https://testing.whamcloud.com/test_sets/55564f7c-ab54-4536-a2e5-1ec9777a2363

            gerrit Gerrit Updater added a comment -

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49724
            Subject: LU-11170 tests: add debugging to sanity/415
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e33e8906febd161a89bb190648e74dcc3b7414cb

            adilger Andreas Dilger added a comment -

            Still hitting this in about 1-3% of runs during normal testing. Failures usually complete within 140s, with one taking as long as 269s.

            The fact that ZFS is also very slow means that the failures may actually be caused by a regression in the COS (Commit on Share) code, because commits on ZFS can take a long time. I'm going to push a patch to improve the debuggability of the test, so that hopefully we can identify the source of the problem.

            aboyko Alexander Boyko added a comment - master https://testing.whamcloud.com/test_sets/8ff591d9-c8e4-43ab-84d3-357f79417e08
            sarah Sarah Liu added a comment - on 2.15 aarch64 client https://testing.whamcloud.com/test_sets/771da1f7-4616-4a2a-b3fc-d0028e98a485
            adilger Andreas Dilger added a comment - +1 on master: https://testing.whamcloud.com/test_sets/2e6c4a29-6de8-4e85-bc7c-a986208af412
            pjones Peter Jones added a comment - Tried to avoid again

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47341/
            Subject: LU-11170 test: increase time limit in sanity test_415
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: b974d7d8a9759601b6f851895ee32714ccf56f28

            People

              Assignee: Vitaliy Kuznetsov (vkuznetsov)
              Reporter: Maloo (maloo)
              Votes: 0
              Watchers: 18
