[LU-10734] sanity test_160g: User cl8 still found in changelog_users

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version: Lustre 2.12.0
    • Affects Version: Lustre 2.11.0
    • Severity: 3

Description

      sanity test_160g - User cl8 still found in changelog_users
      ^^^^^^^^^^^^^ DO NOT REMOVE LINE ABOVE ^^^^^^^^^^^^^

      This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

      This issue relates to the following test suite run:
      https://testing.hpdd.intel.com/test_sets/5a8495f4-1bfa-11e8-a6ad-52540065bddc
      https://testing.hpdd.intel.com/test_sets/34e243bc-1be3-11e8-a7cd-52540065bddc

      test_160g failed with the following error:

      User cl8 still found in changelog_users
      

      This may be a dup of LU-9624.
      I can't tell if it is, so I am raising a fresh ticket.
      Will let somebody else decide whether it's a dup or not.
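
      For context, the failing assertion has the following shape: after the test expects garbage collection to have deregistered the idle changelog user, that user id must no longer appear in the MDT's changelog_users list. Below is a minimal sketch of such a check, assuming the standard Lustre test-framework helpers (do_facet, $LCTL, error) and the usual mdd parameter path; it illustrates the check and is not the exact sanity.sh code:

          # Fail if changelog user cl8 is still registered on the MDS.
          # Facet and parameter names here are assumptions.
          if do_facet mds1 "$LCTL get_param -n mdd.*.changelog_users" |
                  grep -qw "cl8"; then
                  error "User cl8 still found in changelog_users"
          fi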

Activity
            pjones Peter Jones added a comment -

            Landed for 2.12


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/31604/
            Subject: LU-10734 tests: ensure current GC interval is over
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 821087e65882a9885964ed07d6f2a630dfb599d5

            bogl Bob Glossman (Inactive) added a comment -

            This failure is blocked for now: test 160g was added to ALWAYS_EXCEPT in a patch landed to master for LU-10680 (see the sketch below). We may need to look for similar failures if and when test 160g is taken back out of ALWAYS_EXCEPT.
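
            For reference, sanity.sh blocks a test by listing its number in the ALWAYS_EXCEPT variable near the top of the script. A minimal sketch of the form such an exclusion takes (the exact line in the LU-10680 patch may differ):

                # Exclude test_160g from sanity.sh runs until the GC
                # behavior is sorted out.
                ALWAYS_EXCEPT="$ALWAYS_EXCEPT 160g"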

            gerrit Gerrit Updater added a comment -

            Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: https://review.whamcloud.com/31604
            Subject: LU-10734 tests: ensure current GC interval is over
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 72032b016ea8ab62cc681e72b5565ba207a6c316
            bfaccini Bruno Faccini (Inactive) added a comment - edited

            After taking some time to think about it, I suspect the only regression/side effect of patch https://review.whamcloud.com/27535 ("a37134d LU-9624 tests: fix pre-DNE test exceptions/llog usage"), which we strongly suspect is the cause of these failures, is that it slightly reduced the elapsed time of the prologue at the beginning of sanity/test_160g. That prologue may now complete in less than the configured 2-second delay between two garbage-collection thread runs ("changelog_min_gc_interval=2"), and the GC thread has just run in sanity/test_160f when sanity.sh is executed in full during auto-tests. My reproducer testing seems to confirm this.

            So a simple "sleep 2" at the beginning of sanity/test_160g should fix this problem.
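
            A minimal sketch of that proposed fix, assuming the standard test-framework helpers; reading the interval back from the MDS rather than hard-coding 2 is an embellishment here, not necessarily what the landed patch does:

                # Wait out the minimum GC interval so the garbage-collection
                # thread (last triggered during test_160f) may run again.
                # Facet and parameter names are assumptions.
                local gc_interval=$(do_facet mds1 \
                        "$LCTL get_param -n mdd.*.changelog_min_gc_interval" |
                        head -n1)
                sleep $((gc_interval + 1))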
            pjones Peter Jones added a comment -

            > It didn't fail during normal testing, but I guess SLES is not part of regular testing.

            Well, it is tested regularly, but due to the round robin system used for pre-landing review test runs, it is not guaranteed to run before everything lands unless people proactively request this with test parameters.


            adilger Andreas Dilger added a comment -

            Note also that with patch https://review.whamcloud.com/31552 "LU-10680 mdd: disable changelog garbage collection by default", test_160f and test_160g need to be modified to set changelog_gc=1 at the start of each test, and removed from ALWAYS_EXCEPT, so that the tests will run properly.
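
            A minimal sketch of that per-test setup, using the test-framework stack_trap helper to restore the previous value on exit; facet and parameter names are assumptions:

                # Re-enable changelog GC for this test (disabled by default
                # since LU-10680) and restore the old setting afterwards.
                local save_gc=$(do_facet mds1 \
                        "$LCTL get_param -n mdd.*.changelog_gc" | head -n1)
                do_facet mds1 "$LCTL set_param mdd.*.changelog_gc=1"
                stack_trap "do_facet mds1 $LCTL set_param mdd.*.changelog_gc=$save_gc" EXIT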

            adilger Andreas Dilger added a comment -

            It looks like this failure relates to the landing of patch https://review.whamcloud.com/27535 "LU-9624 tests: fix pre-DNE test exceptions/llog usage". It didn't fail during normal testing, but I guess SLES is not part of regular testing.

            bfaccini Bruno Faccini (Inactive) added a comment -

            Having a better look at the recent changes that may have introduced this regression, I think "a37134d LU-9624 tests: fix pre-DNE test exceptions/llog usage" is the most likely cause.

            Hope to have more on this soon.

            tappro Mikhail Pershin added a comment -

            +1 on master, all with DNE:
            testing.hpdd.intel.com/test_sessions/7a5adbc7-2d4b-425a-9e71-a4674823a0df
            testing.hpdd.intel.com/test_sessions/a9ae8e29-d45d-49b6-a639-a6fba84f5dfc

People

  Assignee: bfaccini Bruno Faccini (Inactive)
  Reporter: maloo Maloo
  Votes: 0
  Watchers: 9
