Details
Bug
Resolution: Unresolved
Minor
None
Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.4
3
9223372036854775807
Description
replay-vbr test_8b fails with 'Restart of mds1 failed!'. So far, this test has failed only once: https://testing.whamcloud.com/test_sets/4fca0808-fd1b-11e8-8512-52540065bddc
Looking at the client test_log, we see that the MDS device /dev/lvm-Role_MDS/P1 cannot be found when mds1 is restarted:
mount facets: mds1
CMD: trevis-16vm8 dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1
CMD: trevis-16vm8 test -b /dev/lvm-Role_MDS/P1
CMD: trevis-16vm8 loop_dev=\$(losetup -j /dev/lvm-Role_MDS/P1 | cut -d : -f 1); if [[ -z \$loop_dev ]]; then loop_dev=\$(losetup -f); losetup \$loop_dev /dev/lvm-Role_MDS/P1 || loop_dev=; fi; echo -n \$loop_dev
trevis-16vm8: losetup: /dev/lvm-Role_MDS/P1: failed to set up loop device: No such file or directory
CMD: trevis-16vm8 test -b /dev/lvm-Role_MDS/P1
CMD: trevis-16vm8 e2label /dev/lvm-Role_MDS/P1
trevis-16vm8: e2label: No such file or directory while trying to open /dev/lvm-Role_MDS/P1
trevis-16vm8: Couldn't find valid filesystem superblock.
Starting mds1: -o loop /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1
CMD: trevis-16vm8 mkdir -p /mnt/lustre-mds1; mount -t lustre -o loop /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1
trevis-16vm8: mount: /dev/lvm-Role_MDS/P1: failed to setup loop device: No such file or directory
Start of /dev/lvm-Role_MDS/P1 on mds1 failed 32
 replay-vbr test_8b: @@@@@@ FAIL: Restart of mds1 failed!
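In the log above, `test -b`, `losetup`, `e2label` and `mount` all fail with "No such file or directory", which suggests the device node itself is absent rather than merely not being a block device. A minimal sketch of a check that separates those two cases before any mount is attempted is shown here. The function name `check_backing_device` and its return codes are hypothetical and are not part of the Lustre test framework.

```shell
#!/bin/bash
# Hypothetical helper (not part of the Lustre test framework): report why a
# backing device cannot be used before attempting to mount it.
#   return 0: path is a block device
#   return 1: path exists but is not a block device (a loop device would be needed)
#   return 2: path does not exist at all (the case seen in this ticket)
check_backing_device() {
    local dev=$1
    if [[ ! -e $dev ]]; then
        echo "missing: $dev does not exist"
        return 2
    fi
    if [[ -b $dev ]]; then
        echo "block: $dev is a block device"
        return 0
    fi
    echo "file: $dev is not a block device"
    return 1
}

# Example (path taken from the log above):
#   check_backing_device /dev/lvm-Role_MDS/P1
```

Run on the MDS node at the time of failure, a result of 2 would point at the LVM volume not being present or not activated, as opposed to a loop-device setup problem.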
In the MDS1 (vm8) console log, replay-vbr test 8a starts, MDS1 disconnects, and some stack traces follow (possibly from the replay-vbr test_8c hang); the next Lustre test script output is for replay-single test 0a. The MDS2 (vm7) console log has similar content.
In the dmesg log for the OSS (vm5), we see some errors during test 8b:
[ 8057.763283] Lustre: DEBUG MARKER: == replay-vbr test 8b: create | unlink, create shouldn't fail ======================================== 17:11:00 (1544490660)
[ 8058.291385] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-16vm3: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 8058.489932] Lustre: DEBUG MARKER: trevis-16vm3: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 8067.329126] LNetError: 6975:0:(socklnd.c:1679:ksocknal_destroy_conn()) Completing partial receive from 12345-10.9.4.191@tcp[1], ip 10.9.4.191:7988, with error, wanted: 152, left: 152, last alive is 5 secs ago
[ 8067.330996] LustreError: 6975:0:(events.c:305:request_in_callback()) event type 2, status -5, service ost
[ 8067.331917] LustreError: 26344:0:(pack_generic.c:590:__lustre_unpack_msg()) message length 0 too small for magic/version check
[ 8067.332999] LustreError: 26344:0:(sec.c:2068:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.9.4.191@tcp x1619515714571600
[ 8077.892741] Lustre: DEBUG MARKER: /usr/sbin/lctl mark replay-vbr test_8b: @@@@@@ FAIL: Restart of mds1 failed!
[ 8078.081333] Lustre: DEBUG MARKER: replay-vbr test_8b: @@@@@@ FAIL: Restart of mds1 failed!
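To pull only the error lines that belong to one subtest out of a longer dmesg capture, something like the following could be used. It is a sketch that assumes one kernel message per line and relies on the `DEBUG MARKER: == <suite> test <N>:` start lines seen in this log; the function name `errors_for_test` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read dmesg text on stdin and print the LNetError and
# LustreError lines logged between the start marker of the given test and the
# start marker of the next test.
#   $1: test label as written in the start marker, e.g. "replay-vbr test 8b"
errors_for_test() {
    awk -v label="== $1:" '
        # Start marker of the requested test: begin collecting.
        index($0, "DEBUG MARKER: " label) { in_test = 1; next }
        # Start marker of any other test: stop collecting.
        in_test && /DEBUG MARKER: ==/ { in_test = 0 }
        # Inside the requested test: keep only error messages.
        in_test && /(LNetError|LustreError):/ { print }
    '
}

# Example (dmesg.log is an assumed file name):
#   errors_for_test "replay-vbr test 8b" < dmesg.log
```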