Details
- Type: Bug
- Resolution: Fixed
- Priority: Minor
- Fix Version: Lustre 2.15.0
- Labels: DNE
- Severity: 3
Description
replay-vbr test_7a started failing with the error message 'Test 7a.2 failed' on 03 AUG 2021 and has failed 100% of the time in full-dne-part-2 test sessions since then. For Lustre 2.14.53 build #4206, this test does NOT fail; for Lustre 2.14.53.7 build #4207, it fails 100% of the time in DNE test sessions.
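For local reproduction, it should be enough to run just this subtest against a DNE filesystem. A minimal sketch using the standard test-framework conventions (the paths and the MDSCOUNT/OSTCOUNT values are illustrative assumptions to get a DNE setup, not taken from the failed sessions):

  # assumes a lustre-tests installation; adjust paths and counts per site
  cd /usr/lib64/lustre/tests
  MDSCOUNT=4 OSTCOUNT=2 bash llmount.sh     # format and mount a DNE test filesystem
  ONLY=7a bash replay-vbr.sh                # run only replay-vbr test_7a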
Looking at a recent failure at https://testing.whamcloud.com/test_sets/9b87da49-9024-48ba-91b9-e5d006b73d65, we see the following in the suite_log:
test_7a.2 first: createmany -o /mnt/lustre/d7a.replay-vbr/f7a.replay-vbr- 1
CMD: trevis-68vm5.trevis.whamcloud.com createmany -o /mnt/lustre/d7a.replay-vbr/f7a.replay-vbr- 1
total: 1 open/close in 0.00 seconds: 284.40 ops/second
test_7a.2 lost: rm /mnt/lustre2/d7a.replay-vbr/f7a.replay-vbr-0
CMD: trevis-68vm6 rm /mnt/lustre2/d7a.replay-vbr/f7a.replay-vbr-0
test_7a.2 last: mkdir /mnt/lustre/d7a.replay-vbr/f7a.replay-vbr-0
CMD: trevis-68vm5.trevis.whamcloud.com mkdir /mnt/lustre/d7a.replay-vbr/f7a.replay-vbr-0
CMD: trevis-68vm6 grep -c /mnt/lustre2' ' /proc/mounts
Stopping client trevis-68vm6 /mnt/lustre2 (opts:)
CMD: trevis-68vm6 lsof -t /mnt/lustre2
pdsh@trevis-68vm5: trevis-68vm6: ssh exited with exit code 1
CMD: trevis-68vm6 umount /mnt/lustre2 2>&1
Failing mds1 on trevis-68vm8
CMD: trevis-68vm8 grep -c /mnt/lustre-mds1' ' /proc/mounts || true
Stopping /mnt/lustre-mds1 (opts:) on trevis-68vm8
CMD: trevis-68vm8 umount -d /mnt/lustre-mds1
CMD: trevis-68vm8 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST ' || true
reboot facets: mds1
Failover mds1 to trevis-68vm8
CMD: trevis-68vm8 hostname
mount facets: mds1
CMD: trevis-68vm8 dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1
CMD: trevis-68vm8 dmsetup status /dev/mapper/mds1_flakey 2>&1
CMD: trevis-68vm8 dmsetup table /dev/mapper/mds1_flakey
CMD: trevis-68vm8 dmsetup suspend --nolockfs --noflush /dev/mapper/mds1_flakey
CMD: trevis-68vm8 dmsetup load /dev/mapper/mds1_flakey --table "0 4194304 linear 252:0 0"
CMD: trevis-68vm8 dmsetup resume /dev/mapper/mds1_flakey
CMD: trevis-68vm8 test -b /dev/mapper/mds1_flakey
CMD: trevis-68vm8 e2label /dev/mapper/mds1_flakey
Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1
CMD: trevis-68vm8 mkdir -p /mnt/lustre-mds1; mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1
CMD: trevis-68vm8 /usr/sbin/lctl get_param -n health_check
CMD: trevis-68vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/share/Modules/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/bin:/usr/sbin: NAME=autotest_config bash rpc.sh set_default_debug "vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck" "all" 4
trevis-68vm8: CMD: trevis-68vm8 /usr/sbin/lctl get_param -n version 2>/dev/null
trevis-68vm8: CMD: trevis-68vm8 /usr/sbin/lctl get_param -n version 2>/dev/null
trevis-68vm8: CMD: trevis-68vm7 /usr/sbin/lctl get_param -n version 2>/dev/null
trevis-68vm8: CMD: trevis-68vm8.trevis.whamcloud.com /usr/sbin/lctl get_param -n version 2>/dev/null
trevis-68vm8: trevis-68vm8.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
CMD: trevis-68vm8 e2label /dev/mapper/mds1_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
pdsh@trevis-68vm5: trevis-68vm8: ssh exited with exit code 1
CMD: trevis-68vm8 e2label /dev/mapper/mds1_flakey 2>/dev/null
Started lustre-MDT0000
CMD: trevis-68vm5.trevis.whamcloud.com lctl get_param -n at_max
affected facets: mds1
CMD: trevis-68vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/share/Modules/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/bin:/usr/sbin: NAME=autotest_config bash rpc.sh _wait_recovery_complete *.lustre-MDT0000.recovery_status 1475
trevis-68vm8: CMD: trevis-68vm8 /usr/sbin/lctl get_param -n version 2>/dev/null
trevis-68vm8: CMD: trevis-68vm8 /usr/sbin/lctl get_param -n version 2>/dev/null
trevis-68vm8: CMD: trevis-68vm7 /usr/sbin/lctl get_param -n version 2>/dev/null
trevis-68vm8: CMD: trevis-68vm8.trevis.whamcloud.com /usr/sbin/lctl get_param -n version 2>/dev/null
trevis-68vm8: trevis-68vm8.trevis.whamcloud.com: executing _wait_recovery_complete *.lustre-MDT0000.recovery_status 1475
trevis-68vm8: *.lustre-MDT0000.recovery_status status: COMPLETE
Waiting for orphan cleanup...
CMD: trevis-68vm8 /usr/sbin/lctl list_param osp.*osc*.old_sync_processed 2> /dev/null
osp.lustre-OST0000-osc-MDT0000.old_sync_processed
osp.lustre-OST0000-osc-MDT0002.old_sync_processed
osp.lustre-OST0001-osc-MDT0000.old_sync_processed
osp.lustre-OST0001-osc-MDT0002.old_sync_processed
osp.lustre-OST0002-osc-MDT0000.old_sync_processed
osp.lustre-OST0002-osc-MDT0002.old_sync_processed
osp.lustre-OST0003-osc-MDT0000.old_sync_processed
osp.lustre-OST0003-osc-MDT0002.old_sync_processed
osp.lustre-OST0004-osc-MDT0000.old_sync_processed
osp.lustre-OST0004-osc-MDT0002.old_sync_processed
osp.lustre-OST0005-osc-MDT0000.old_sync_processed
osp.lustre-OST0005-osc-MDT0002.old_sync_processed
osp.lustre-OST0006-osc-MDT0000.old_sync_processed
osp.lustre-OST0006-osc-MDT0002.old_sync_processed
osp.lustre-OST0007-osc-MDT0000.old_sync_processed
osp.lustre-OST0007-osc-MDT0002.old_sync_processed
wait 40 secs maximumly for trevis-68vm8,trevis-68vm9 mds-ost sync done.
CMD: trevis-68vm8,trevis-68vm9 /usr/sbin/lctl get_param -n osp.*osc*.old_sync_processed
replay-vbr test_7a: @@@@@@ FAIL: Test 7a.2 failed
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:6237:error()
= /usr/lib64/lustre/tests/replay-vbr.sh:727:test_7a()
This must be a result of patch https://review.whamcloud.com/38553, commit 3e04b0fd6c3dd36372f33c54ea5f401c27485d60 ("LU-13417 mdd: set default LMV on ROOT"). We may need to use the routine mkdir_on_mdt0() as a temporary fix.
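A minimal sketch of that workaround, assuming the problem is the plain mkdir of the test directory (the exact call site in replay-vbr.sh is an assumption; mkdir_on_mdt0() is the test-framework helper that creates a directory with stripe index 0, pinning it to MDT0000):

  # hypothetical edit in replay-vbr.sh; before LU-13417 a plain mkdir under
  # the root always landed on MDT0000, but with a default LMV on ROOT it can
  # be placed on another MDT, so failing mds1 no longer covers the replayed ops
  mkdir_on_mdt0 $MOUNT/$tdir    # instead of: mkdir $MOUNT/$tdir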
Logs for more failures are at:
https://testing.whamcloud.com/test_sets/17efe0ba-7e4a-4e7f-b7f5-02383e1314c5
https://testing.whamcloud.com/test_sets/a00d3625-d4b0-48ef-88b1-e50707d75462
https://testing.whamcloud.com/test_sets/08c2b227-3285-438e-87b6-1d34e147a412
Issue Links
- is related to LU-15042 sanity test_133b: The counter for setattr on mds1 was not incremented (Resolved)