LU-11256 (project: Lustre)

replay-vbr test 7f is failing with 'Restart of mds1 failed!'


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • None
    • Affects Version/s: Lustre 2.12.0, Lustre 2.10.4, Lustre 2.10.5
    • None
    • 3
    • 9223372036854775807

    Description

      replay-vbr test_7f fails when mounting an MDS. It is not clear when this test first started failing with this error, but it appears that the test did not fail on MDS mount between April 12 and July 14, 2018, when it started failing again. All failures of this type since March 2018 are listed below.

      Looking at the failure at https://testing.whamcloud.com/test_sets/5b253cd8-878f-11e8-9028-52540065bddc, the only sign of trouble in the test_log is when we try to mount the failover MDS:

      Failing mds1 on trevis-4vm8
      + pm -h powerman --off trevis-4vm8
      Command completed successfully
      reboot facets: mds1
      + pm -h powerman --on trevis-4vm8
      Command completed successfully
      Failover mds1 to trevis-4vm7
      12:30:33 (1531571433) waiting for trevis-4vm7 network 900 secs ...
      12:30:33 (1531571433) network interface is UP
      CMD: trevis-4vm7 hostname
      mount facets: mds1
      CMD: trevis-4vm7 test -b /dev/lvm-Role_MDS/P1
      CMD: trevis-4vm7 e2label /dev/lvm-Role_MDS/P1
      trevis-4vm7: e2label: No such file or directory while trying to open /dev/lvm-Role_MDS/P1
      trevis-4vm7: Couldn't find valid filesystem superblock.
      Starting mds1:   -o loop /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1
      CMD: trevis-4vm7 mkdir -p /mnt/lustre-mds1; mount -t lustre   -o loop 		                   /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1
      trevis-4vm7: mount: /dev/lvm-Role_MDS/P1: failed to setup loop device: No such file or directory
      Start of /dev/lvm-Role_MDS/P1 on mds1 failed 32
       replay-vbr test_7f: @@@@@@ FAIL: Restart of mds1 failed! 
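
      Reading the log, the mount step on the failover node appears to go like this: the framework runs `test -b` on /dev/lvm-Role_MDS/P1, that path does not exist on trevis-4vm7, so the device is treated as an image file and `-o loop` is added to the mount options; setting up the loop device then fails with "No such file or directory". The Python sketch below reconstructs only that decision as inferred from the log lines. `is_block_device` and `build_mount_command` are hypothetical names for illustration, not functions from the Lustre test framework.

```python
import os
import shlex
import stat


def is_block_device(path: str) -> bool:
    """Local stand-in for `test -b PATH` (which the log shows being run on the
    failover host): True only if PATH exists and is a block device."""
    try:
        return stat.S_ISBLK(os.stat(path).st_mode)
    except OSError:
        # Like `test -b`, any failure to stat the path (e.g. it is missing) means "no".
        return False


def build_mount_command(dev: str, mnt: str, is_block: bool) -> str:
    """Hypothetical reconstruction of the mount step seen in the log.

    When the target is not a block device, "-o loop" is added, so a device
    node that is simply missing on the failover host is treated as an image
    file and the mount then fails with "failed to setup loop device".
    """
    opts = [] if is_block else ["-o", "loop"]
    parts = ["mount", "-t", "lustre", *opts, dev, mnt]
    return f"mkdir -p {shlex.quote(mnt)}; " + " ".join(shlex.quote(p) for p in parts)


# On trevis-4vm7, /dev/lvm-Role_MDS/P1 did not exist, so `test -b` failed.
print(build_mount_command("/dev/lvm-Role_MDS/P1", "/mnt/lustre-mds1", is_block=False))
# prints: mkdir -p /mnt/lustre-mds1; mount -t lustre -o loop /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1
```

      If this reading of the log is right, the `-o loop` option is a symptom rather than the cause: the mount fails because the LVM volume is not visible on trevis-4vm7 when mds1 is failed over to it.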
      

      In each of the following cases, test 7g hangs after test 7f fails in this way:

      2018-08-15 2.10.5 RC2 - fails in “test_7f.5 last”
      https://testing.whamcloud.com/test_sets/a75d306e-a081-11e8-8ee3-52540065bddc
      2018-08-02 2.10.4.14 - fails in “test_7f.5 last”
      https://testing.whamcloud.com/test_sets/7405ad54-9645-11e8-a9f7-52540065bddc
      2018-07-14 2.10.4.8 - fails in “test_7f.1 last”
      https://testing.whamcloud.com/test_sets/5b253cd8-878f-11e8-9028-52540065bddc
      2018-04-12 2.11.50.51 - fails in “test_7f.4 last”
      https://testing.whamcloud.com/test_sets/37bad538-3e69-11e8-b45c-52540065bddc
      2018-03-03 2.10.3.35 - fails in “test_7f.4 last”
      https://testing.whamcloud.com/test_sets/f33a4326-1f0f-11e8-a6ca-52540065bddc

      In the following test sessions, replay-vbr test 7e fails in the way described above and test 7f hangs:
      2018-07-15 2.10.4.8 - fails in “test_7e.5 last”
      https://testing.whamcloud.com/test_sets/0ca6ca46-87fc-11e8-b376-52540065bddc
      2018-03-14 2.10.59 - fails in “test_7e.5 last”
      https://testing.whamcloud.com/test_sets/d26f60b0-2809-11e8-b6a0-52540065bddc


            People

              Assignee: WC Triage (wc-triage)
              Reporter: James Nunez (jamesanunez) (Inactive)
