Lustre / LU-9306

sanity-hsm test 24d is failing with 'request on 0x200000405:0x24:0x0 is not SUCCEED on mds1'

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.10.0
    • Lustre 2.10.0
    • None
    • 3

    Description

      sanity-hsm test_24d is failing. The test log shows the test polling the status of the ARCHIVE request for 200 seconds without seeing it change from STARTED to SUCCEED:

      CMD: onyx-39vm7 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | awk '/'0x200000405:0x24:0x0'.*action='ARCHIVE'/ {print \$13}' | cut -f2 -d=
      CMD: onyx-39vm7 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | awk '/'0x200000405:0x24:0x0'.*action='ARCHIVE'/ {print \$13}' | cut -f2 -d=
      Update not seen after 200s: wanted 'SUCCEED' got 'STARTED'
       sanity-hsm test_24d: @@@@@@ FAIL: request on 0x200000405:0x24:0x0 is not SUCCEED on mds1 
      
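      The CMD lines above pick the status out of hsm.actions by column position (field 13). As a minimal sketch, the same lookup can be done by field name, so it does not depend on the column count. The function name hsm_action_status and the sample line are hypothetical; the sample only assumes a key=value layout like the one the awk filter above implies and is not taken from the failing run.

```shell
#!/bin/bash
# Hypothetical helper: read hsm.actions-style text on stdin and print the
# value of the status= field for lines matching the given FID and action.
# The field is located by its name rather than by its position.
hsm_action_status() {
    local fid="$1" action="$2"
    awk -v fid="$fid" -v action="$action" '
        index($0, "fid=[" fid "]") && index($0, "action=" action " ") {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^status=/) { sub(/^status=/, "", $i); print $i }
        }'
}

# Made-up sample line (assumed layout, not real output from this failure).
sample='lrh=[type=10680000 len=136 idx=1/3] fid=[0x200000405:0x24:0x0] dfid=[0x200000405:0x24:0x0] compound/cookie=0x0/0x58de1a2b action=ARCHIVE archive#=2 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=STARTED data=[]'

printf '%s\n' "$sample" | hsm_action_status 0x200000405:0x24:0x0 ARCHIVE
```

      On a live system the input would presumably come from the same lctl get_param -n mdt.lustre-MDT0000.hsm.actions call shown in the log, piped into the function in place of the sample line.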

      There is nothing obviously wrong in the console logs for any of the nodes.

      The copytool_log for this test is nearly empty and doesn’t provide any information on what is causing this problem. The full copytool_log for this test is:

      1491012689.288932 lhsmtool_posix[24069]: action=0 src=(null) dst=(null) mount_point=/mnt/lustre3
      1491012689.334255 lhsmtool_posix[24070]: waiting for message from kernel
      exiting: Terminated
      

      This test failure could be leading to a cascade of failures. After test 24d fails, the following tests also fail: 24e, 24f, 25b, 26, 27b, 28, 29b, 29c, 30b, 30c, 31b, and many more. I don’t know if all the failures are related, but we should fix the first failing test before looking at the rest.

      So far, I’ve only seen this test fail for review-dne-part-2, so the issue may be DNE-related.

      This test started to fail on the master branch on 2017-03-25 and has failed about 19 times since then. The patch for LU-8911, https://review.whamcloud.com/#/c/24185/, is the most recent patch to modify this test and sanity-hsm.

      Here are links to some of the failed test logs:
      2017-04-06 - https://testing.hpdd.intel.com/test_sets/81096390-1ae7-11e7-9073-5254006e85c2
      2017-04-05 - https://testing.hpdd.intel.com/test_sets/ad0ce212-1a3f-11e7-9de9-5254006e85c2
      2017-04-05 - https://testing.hpdd.intel.com/test_sets/28ab074e-19ed-11e7-b742-5254006e85c2
      2017-04-05 - https://testing.hpdd.intel.com/test_sets/2bd0287a-19cd-11e7-8920-5254006e85c2
      2017-04-04 - https://testing.hpdd.intel.com/test_sets/550c4e1a-1952-11e7-9de9-5254006e85c2
      2017-04-03 - https://testing.hpdd.intel.com/test_sets/d986e31e-18c9-11e7-8920-5254006e85c2

          People

            jhammond John Hammond
            jamesanunez James Nunez (Inactive)