Lustre / LU-11709

sanity-hsm test 260c fails with 'request on 0x2000013a0:0x2ba:0x0 is not SUCCEED on mds1'

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.12.0
    • DNE
    • 3

    Description

      sanity-hsm test_260c fails: after the copytools are started, the archive request for the test file never completes. We see this failure in the review-dne-part-2 and review-dne-zfs-part-2 test groups.

      Looking at the logs for this failure at https://testing.whamcloud.com/test_sets/89b3068c-f258-11e8-b67f-52540065bddc , the last thing we see in the client test_log is the test waiting for the status of the ARCHIVE request to change from WAITING to SUCCEED:

      Starting copytool agt1 on onyx-38vm2
      CMD: onyx-38vm2 lhsmtool_posix  --daemon --hsm-root "/tmp/arc1/sanity-hsm.test_260c/" "/mnt/lustre2" < /dev/null > "/autotest/autotest/2018-11-27/lustre-reviews-el7_5-x86_64--review-dne-zfs-part-2--1_10_1__60277___2ab2a1f1-8a91-44cd-aca5-35ecd1cac72c/sanity-hsm.test_260c.copytool_log.onyx-38vm2.log" 2>&1
      CMD: onyx-38vm2 mkdir -p /tmp/arc1/sanity-hsm.test_260c/
      Starting copytool agt1 on onyx-38vm2
      CMD: onyx-38vm2 lhsmtool_posix  --daemon --hsm-root "/tmp/arc1/sanity-hsm.test_260c/" --archive 2 "/mnt/lustre2" < /dev/null > "/autotest/autotest/2018-11-27/lustre-reviews-el7_5-x86_64--review-dne-zfs-part-2--1_10_1__60277___2ab2a1f1-8a91-44cd-aca5-35ecd1cac72c/sanity-hsm.test_260c.copytool2_log.onyx-38vm2.log" 2>&1
      CMD: onyx-38vm4 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | awk '/'0x2000013a0:0x2ba:0x0'.*action='ARCHIVE'/ {print \$13}' | cut -f2 -d=
      CMD: onyx-38vm4 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | awk '/'0x2000013a0:0x2ba:0x0'.*action='ARCHIVE'/ {print \$13}' | cut -f2 -d=
      Waiting 200 secs for update
      …
      CMD: onyx-38vm4 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | awk '/'0x2000013a0:0x2ba:0x0'.*action='ARCHIVE'/ {print \$13}' | cut -f2 -d=
      CMD: onyx-38vm4 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | awk '/'0x2000013a0:0x2ba:0x0'.*action='ARCHIVE'/ {print \$13}' | cut -f2 -d=
      CMD: onyx-38vm4 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | awk '/'0x2000013a0:0x2ba:0x0'.*action='ARCHIVE'/ {print \$13}' | cut -f2 -d=
      CMD: onyx-38vm4 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | awk '/'0x2000013a0:0x2ba:0x0'.*action='ARCHIVE'/ {print \$13}' | cut -f2 -d=
      Update not seen after 200s: wanted 'SUCCEED' got 'WAITING'
       sanity-hsm test_260c: @@@@@@ FAIL: request on 0x2000013a0:0x2ba:0x0 is not SUCCEED on mds1 
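
The wait loop visible in the log above (repeated status queries, followed by "Update not seen after 200s") can be approximated with a small polling helper. This is a minimal sketch, not the test framework's actual implementation; `poll_status`, its argument order, and the "Updated after" success message are hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Lustre test framework):
# run a status command repeatedly until it prints the wanted value
# or the timeout (in seconds) expires.
#
# Usage: poll_status WANTED TIMEOUT INTERVAL COMMAND [ARGS...]
poll_status() {
    local wanted=$1 timeout=$2 interval=$3
    shift 3
    local elapsed=0 got=""

    while :; do
        # Capture whatever the status command prints; a failing
        # command simply yields an empty string.
        got=$("$@" 2>/dev/null)
        if [ "$got" = "$wanted" ]; then
            echo "Updated after ${elapsed}s: wanted '$wanted' got '$got'"
            return 0
        fi
        # Give up once the timeout has been reached.
        if [ "$elapsed" -ge "$timeout" ]; then
            break
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done

    echo "Update not seen after ${timeout}s: wanted '$wanted' got '$got'"
    return 1
}
```

To reproduce the wait by hand, the `lctl get_param ... | awk ... | cut -f2 -d=` pipeline from the log could be wrapped in a shell function and passed as the command, for example `poll_status SUCCEED 200 1 my_status_cmd`, where `my_status_cmd` is a placeholder name for that wrapper.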
      

      There are no obvious issues in the console or dmesg logs.

      sanity-hsm test_260c was recently updated by patch https://review.whamcloud.com/33478 , which landed on November 26, 2018. This failure was first seen on November 27, 2018.

      Logs for other failures are at:
      https://testing.whamcloud.com/test_sets/849ea658-f24c-11e8-86c0-52540065bddc
      https://testing.whamcloud.com/test_sets/5714df60-f254-11e8-815b-52540065bddc

            People

              wc-triage WC Triage
              jamesanunez James Nunez (Inactive)