[LU-6920] sanity test_205 failed with old jobstats not expired

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.8.0
    • Lustre 2.8.0
    • None
    • 3

    Description

      Test 205 was executed for 5 iterations

      Using JobID environment variable FAKE_JOBID=id.205.mkdir.24449
      sanity test_205: @@@@@@ FAIL: old jobstats not expired
      Trace dump:
      = /usr/lib64/lustre/tests/test-framework.sh:4732:error_noexit()
      = /usr/lib64/lustre/tests/test-framework.sh:4763:error()
      = /usr/lib64/lustre/tests/sanity.sh:11638:test_205()
      = /usr/lib64/lustre/tests/test-framework.sh:5010:run_one()
      = /usr/lib64/lustre/tests/test-framework.sh:5047:run_one_logged()
      = /usr/lib64/lustre/tests/test-framework.sh:4864:run_test()
      = /usr/lib64/lustre/tests/sanity.sh:11660:main()
      Dumping lctl log to /tmp/test_logs/1438010993/sanity.test_205.*.1438011013.log
      fre0106: Warning: Permanently added 'fre0107,192.168.101.7' (RSA) to the list of known hosts.

      fre0105: Warning: Permanently added 'fre0107,192.168.101.7' (RSA) to the list of known hosts.

      fre0108: Warning: Permanently added 'fre0107,192.168.101.7' (RSA) to the list of known hosts.

      mdt.lustre-MDT0000.job_cleanup_interval=600
      fre0105: warning: 'lctl conf_param' is deprecated, use 'lctl set_param -P' instead
      Waiting 90 secs for update
      Updated after 9s: wanted 'procname_uid' got 'procname_uid'
      lustre-MDT0000: Deregistered changelog user 'cl4'
      FAIL 205 (24s)
      sanity: FAIL: test_205 old jobstats not expired
      debug=super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck

          Activity

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16753/
            Subject: LU-6920 test: add some slack to jobstats expiry in test_205
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 13e34c1d0e5472759d1350b62fa0663bbcd59fa0

            gerrit Gerrit Updater added a comment - Bob Glossman (bob.glossman@intel.com) uploaded a new patch: http://review.whamcloud.com/16753
            Subject: LU-6920 test: try to reproduce failure on sanity test_205
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 376ea32167938839f209772e12b68a7acd96cc29

            adilger Andreas Dilger added a comment - It may be that increasing the timeout for jobid expiry by a second or two would be enough. The most efficient way might be to use wait_update() with a maximum of (left + 5) seconds so that it doesn't wait longer than needed.
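
            The bounded wait suggested in the comment above could look roughly like the sketch below. It is only an illustration under stated assumptions, not the patch that landed: poll_until is a hypothetical stand-in written for this example rather than the real wait_update() from test-framework.sh, count_old_jobs fakes the job_stats lookup with a marker file, and left=1 is an assumed value.

```bash
#!/bin/bash
# Hypothetical helper for illustration only; NOT the real wait_update().
# Runs CMD once per second until its output equals WANT or MAX seconds pass.
poll_until() {
    local cmd="$1" want="$2" max="$3"
    local waited=0 got=""

    while true; do
        got=$(eval "$cmd")
        if [ "$got" = "$want" ]; then
            echo "Updated after ${waited}s: wanted '$want' got '$got'"
            return 0
        fi
        [ "$waited" -ge "$max" ] && break
        sleep 1
        waited=$((waited + 1))
    done
    echo "Update not seen after ${waited}s: wanted '$want' got '$got'"
    return 1
}

# Stand-in for "count job_stats entries for the old JobID" (assumption):
# the entry counts as expired once the marker file has been removed.
marker=$(mktemp)
count_old_jobs() { [ -e "$marker" ] && echo 1 || echo 0; }

left=1    # seconds until expiry is expected (assumed value)
( sleep "$left"; rm -f "$marker" ) &

# Wait at most (left + 5) seconds, returning as soon as the entry is gone.
poll_until count_old_jobs 0 $((left + 5)) || echo "old jobstats not expired"
wait
```

            Polling with an upper bound keeps the test as short as a fixed sleep in the common case while tolerating a cleanup pass that runs a second or two late.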

            adilger Andreas Dilger added a comment - Bob, assigning this to you for further investigation. It seems to be failing regularly for SLES11 SP3/SP4 tests, and even if we could get those patches to pass once, this is failing for more test runs than it is passing and we would just be introducing a regression by landing those patches.

            adilger Andreas Dilger added a comment - It looks like this test is only failing on SLES11 SP3/SP4 and not others.
            bogl Bob Glossman (Inactive) added a comment - another sles11sp4 on master: https://testing.hpdd.intel.com/test_sets/5f9b33a8-6cdf-11e5-a8d6-5254006e85c2
            bogl Bob Glossman (Inactive) added a comment - another one seen with sles11sp4 on master: https://testing.hpdd.intel.com/test_sets/a1bab5ba-6a0f-11e5-b8d9-5254006e85c2

            app Ashish Purkar added a comment - It's easily reproducible on master:

            Lustre: DEBUG MARKER: Test: mkdir /mnt/lustre/d205.sanity.expire
            Lustre: DEBUG MARKER: Using JobID environment variable FAKE_JOBID=id.205.mkdir.18733
            Lustre: DEBUG MARKER: sanity test_205: @@@@@@ FAIL: old jobstats not expired
            Lustre: lustre-MDD0000: changelog off
            Lustre: DEBUG MARKER: == sanity test complete, duration 34 sec == 19:52:57 (1441191177)

            People

              bogl Bob Glossman (Inactive)
              aditya.pandit@seagate.com Aditya Pandit (Inactive)