Lustre / LU-11795

replay-vbr test 8b fails with 'Restart of mds1 failed!'


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.4
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      replay-vbr test_8b fails with 'Restart of mds1 failed!'. So far, this test has failed only once (https://testing.whamcloud.com/test_sets/4fca0808-fd1b-11e8-8512-52540065bddc).

      The client test_log shows that the MDS backing device /dev/lvm-Role_MDS/P1 could not be found when the test tried to restart mds1:

      mount facets: mds1
      CMD: trevis-16vm8 dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1
      CMD: trevis-16vm8 test -b /dev/lvm-Role_MDS/P1
      CMD: trevis-16vm8 loop_dev=\$(losetup -j /dev/lvm-Role_MDS/P1 | cut -d : -f 1);
      			 if [[ -z \$loop_dev ]]; then
      				loop_dev=\$(losetup -f);
      				losetup \$loop_dev /dev/lvm-Role_MDS/P1 || loop_dev=;
      			 fi;
      			 echo -n \$loop_dev
      trevis-16vm8: losetup: /dev/lvm-Role_MDS/P1: failed to set up loop device: No such file or directory
      CMD: trevis-16vm8 test -b /dev/lvm-Role_MDS/P1
      CMD: trevis-16vm8 e2label /dev/lvm-Role_MDS/P1
      trevis-16vm8: e2label: No such file or directory while trying to open /dev/lvm-Role_MDS/P1
      trevis-16vm8: Couldn't find valid filesystem superblock.
      Starting mds1:   -o loop /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1
      CMD: trevis-16vm8 mkdir -p /mnt/lustre-mds1; mount -t lustre   -o loop /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1
      trevis-16vm8: mount: /dev/lvm-Role_MDS/P1: failed to setup loop device: No such file or directory
      Start of /dev/lvm-Role_MDS/P1 on mds1 failed 32
       replay-vbr test_8b: @@@@@@ FAIL: Restart of mds1 failed! 
      

      The MDS1 (vm8) console log shows replay-vbr test_8a starting, MDS1 disconnecting, and some stack traces (possibly from the replay-vbr test_8c hang); the next Lustre test-script output after that is for replay-single test_0a. The console log for MDS2 (vm7) has similar content.

      The OSS (vm5) dmesg log shows the following errors during test_8b:

      [ 8057.763283] Lustre: DEBUG MARKER: == replay-vbr test 8b: create | unlink, create shouldn't fail ======================================== 17:11:00 (1544490660)
      [ 8058.291385] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-16vm3: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
      [ 8058.489932] Lustre: DEBUG MARKER: trevis-16vm3: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
      [ 8067.329126] LNetError: 6975:0:(socklnd.c:1679:ksocknal_destroy_conn()) Completing partial receive from 12345-10.9.4.191@tcp[1], ip 10.9.4.191:7988, with error, wanted: 152, left: 152, last alive is 5 secs ago
      [ 8067.330996] LustreError: 6975:0:(events.c:305:request_in_callback()) event type 2, status -5, service ost
      [ 8067.331917] LustreError: 26344:0:(pack_generic.c:590:__lustre_unpack_msg()) message length 0 too small for magic/version check
      [ 8067.332999] LustreError: 26344:0:(sec.c:2068:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.9.4.191@tcp x1619515714571600
      [ 8077.892741] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  replay-vbr test_8b: @@@@@@ FAIL: Restart of mds1 failed! 
      [ 8078.081333] Lustre: DEBUG MARKER: replay-vbr test_8b: @@@@@@ FAIL: Restart of mds1 failed!
      


People

    • Assignee: WC Triage (wc-triage)
    • Reporter: James Nunez (jamesanunez, Inactive)