Details
- Type: Bug
- Resolution: Unresolved
- Priority: Medium
- Affects Version/s: Lustre 2.17.0
- Fix Version/s: None
- Severity: 3
Description
This issue was created by maloo for jianyu <yujian@whamcloud.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/7318d056-303a-4312-af07-6e96e5ad1e37
test_25 failed with the following error:
== ost-pools test 25: Create new pool and restart MDS ==== 21:05:29 (1764709529)
CMD: onyx-80vm11 lctl pool_new lustre.testpool1
onyx-80vm11: Pool lustre.testpool1 created
CMD: onyx-80vm11 lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.testpool1 2>/dev/null || echo foo
CMD: onyx-80vm12 lctl get_param -n lod.lustre-MDT0001-mdtlov.pools.testpool1 2>/dev/null || echo foo
CMD: onyx-80vm11 lctl get_param -n lod.lustre-MDT0002-mdtlov.pools.testpool1 2>/dev/null || echo foo
CMD: onyx-80vm1.onyx.whamcloud.com lctl get_param -n lov.lustre-*.pools.testpool1 2>/dev/null || echo foo
CMD: onyx-80vm11 lctl pool_add lustre.testpool1 OST0000; sync
onyx-80vm11: OST lustre-OST0000_UUID added to pool lustre.testpool1
CMD: onyx-80vm1.onyx.whamcloud.com lctl get_param -n lov.lustre-*.pools.testpool1 | sort -u |
tr '\n' ' '
Failing mds1 on onyx-80vm11
CMD: onyx-80vm11 grep -c /mnt/lustre-mds1' ' /proc/mounts || true
Stopping /mnt/lustre-mds1 (opts:) on onyx-80vm11
CMD: onyx-80vm11 umount -d /mnt/lustre-mds1
CMD: onyx-80vm11 lsmod | grep lnet > /dev/null &&
lctl dl | grep ' ST ' || true
21:05:36 (1764709536) shut down
facet: mds1 facet_host: onyx-80vm11 facet_failover_host: onyx-80vm11
Failover mds1 to onyx-80vm11
mount facets: mds1
CMD: onyx-80vm11 dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1
CMD: onyx-80vm11 dmsetup status /dev/mapper/mds1_flakey 2>&1
CMD: onyx-80vm11 test -b /dev/mapper/mds1_flakey
CMD: onyx-80vm11 e2label /dev/mapper/mds1_flakey
Start mds1: mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1
CMD: onyx-80vm11 mkdir -p /mnt/lustre-mds1; mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1
CMD: onyx-80vm11 /usr/sbin/lctl get_param -n health_check
CMD: onyx-80vm11 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/opt/iozone/bin:/usr/lib64/openmpi/bin:/usr/share/Modules/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config TESTLOG_PREFIX=/autotest/autotest-2/2025-12-02/lustre-master_full-dne-part-1_4677_7_48951508-b72b-41b2-b97c-9a9735efc9b9//ost-pools TESTNAME=test_25 CONFIG=/usr/lib64/lustre/tests/cfg/autotest_config.sh bash rpc.sh set_default_debug \"vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck\" \"all\"
onyx-80vm11: CMD: onyx-80vm11 hostname -I
onyx-80vm11: onyx-80vm11.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
CMD: onyx-80vm11 e2label /dev/mapper/mds1_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
pdsh@onyx-80vm1: onyx-80vm11: ssh exited with exit code 1
CMD: onyx-80vm11 e2label /dev/mapper/mds1_flakey 2>/dev/null
Started lustre-MDT0000
21:05:51 (1764709551) targets are mounted
21:05:51 (1764709551) facet_failover done
CMD: onyx-80vm11 lctl get_param -n at_min
CMD: onyx-80vm11 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/opt/iozone/bin:/usr/lib64/openmpi/bin:/usr/share/Modules/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config TESTLOG_PREFIX=/autotest/autotest-2/2025-12-02/lustre-master_full-dne-part-1_4677_7_48951508-b72b-41b2-b97c-9a9735efc9b9//ost-pools TESTNAME=test_25 CONFIG=/usr/lib64/lustre/tests/cfg/autotest_config.sh bash rpc.sh wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0000.ost_server_uuid 50
onyx-80vm11: CMD: onyx-80vm11 hostname -I
onyx-80vm11: onyx-80vm11.onyx.whamcloud.com: executing wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0000.ost_server_uuid 50
onyx-80vm11: os[cp].lustre-OST0000-osc-MDT0000.ost_server_uuid in FULL state after 0 sec
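For reference, a rough manual equivalent of what this test does, pieced together from the commands in the log above (fsname, pool name, MDS device and mount point are the ones used in this run and may differ on other setups):

#!/bin/bash
# Approximate reproduction of ost-pools test_25: create a pool, add an OST,
# then restart the MDS and check that the pool definition survives.
# Assumptions: fsname "lustre", MDS device /dev/mapper/mds1_flakey and
# mountpoint /mnt/lustre-mds1 as in this run; pool/OST names from the log.
FS=lustre
POOL=testpool1
MDSDEV=/dev/mapper/mds1_flakey
MDSMNT=/mnt/lustre-mds1

# 1. Create the pool and add one OST (run on the MGS/MDS node).
lctl pool_new ${FS}.${POOL}
lctl pool_add ${FS}.${POOL} OST0000
lctl pool_list ${FS}.${POOL}

# 2. Restart the MDS, as the test's facet_failover step does.
umount ${MDSMNT}
mount -t lustre -o localrecov ${MDSDEV} ${MDSMNT}

# 3. After recovery the pool should still show the OST on the MDS ...
lctl get_param -n lod.${FS}-MDT0000-mdtlov.pools.${POOL}
# ... and, from a client mount, in the client-side LOV.
lctl get_param -n "lov.${FS}-*.pools.${POOL}"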
Test session details:
clients: https://build.whamcloud.com/job/lustre-master/4677 - 4.18.0-553.85.1.el8_10.x86_64
servers: https://build.whamcloud.com/job/lustre-master/4677 - 4.18.0-553.85.1.el8_lustre.x86_64
<<Please provide additional information about the failure here>>
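The log above ends with the OST import reaching FULL state and no explicit error, and the session was eventually killed by the suite-level timeout, so the hang most likely occurred somewhere after the failover completed. If this reproduces, a first triage pass on the nodes from this run might look like the following (node names are taken from the log; the exact commands are only a suggestion):

ssh onyx-80vm11 'dmesg -T | tail -n 100'                           # recent kernel messages on the MDS
ssh onyx-80vm11 'echo w > /proc/sysrq-trigger'                     # log any blocked (D-state) tasks to dmesg
ssh onyx-80vm11 'lctl dk /tmp/mds1-debug.log'                      # dump the Lustre debug buffer on mds1
ssh onyx-80vm1 'lctl get_param -n "lov.lustre-*.pools.testpool1"'  # is the client still waiting for the pool update?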
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
ost-pools test_25 - Timeout occurred after 839 minutes, last suite running was ost-pools