Details
Type: Bug
Resolution: Fixed
Priority: Blocker
Versions: Lustre 2.1.0, Lustre 2.3.0, Lustre 2.4.0, Lustre 2.1.4
3
5213
Description
The test is as follows:
test_0b() {
    replay_barrier $SINGLEMDS                                   # 1)
    touch $MOUNT2/$tfile                                        # 2)
    touch $MOUNT1/$tfile-2                                      # 3)
    umount $MOUNT2                                              # 4)
    facet_failover $SINGLEMDS                                   # 5)
    umount -f $MOUNT1                                           # 6)
    zconf_mount `hostname` $MOUNT1 || error "mount1 fais"       # 7)
    zconf_mount `hostname` $MOUNT2 || error "mount2 fais"       # 8)
    checkstat $MOUNT1/$tfile-2 && return 1                      # 9)
    checkstat $MOUNT2/$tfile && return 2                        # 10)
    return 0                                                    # 11)
}
run_test 0b "lost client during waiting for next transno"
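The test assumes that client1's replay stays blocked until client1 is force-unmounted in step 6). The replay would be blocked waiting for the next transno, which belongs to client2's create. Client2 was unmounted in step 4) and never replays that create. With VBR (version-based recovery) enabled, however, it is currently uncertain whether client1's requests are replayed before step 6). If they are replayed, "$MOUNT1/$tfile-2" exists and check 9) fails. Check 9) is therefore incorrect and unnecessary.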
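The following is a minimal POSIX shell model of the argument above. It is not Lustre code. A temporary directory stands in for the filesystem. The argument to the hypothetical `model_checks` function stands in for whether the MDS replayed client1's create of `$tfile-2` before step 6). The `tfile` value `f0b` is an arbitrary placeholder. The sketch only shows that check 9) returns 1 or 0 according to that replay outcome, which the test does not control under VBR.

```shell
# Hypothetical model, NOT Lustre code: a temp directory stands in for the
# filesystem seen through both mounts after recovery completes.
tfile=f0b   # placeholder name

# model_checks yes|no
# Prints what test_0b would return at checks 9) and 10), given whether
# client1's create of $tfile-2 was replayed before step 6).
model_checks() {
    replayed=$1
    dir=$(mktemp -d)
    # client2's create of $tfile is never replayed: client2 was
    # unmounted in step 4), so nothing re-sends it after failover.
    if [ "$replayed" = yes ]; then
        touch "$dir/$tfile-2"       # replay of client1's create went through
    fi
    rc=0
    [ -e "$dir/$tfile-2" ] && rc=1                  # check 9)
    [ "$rc" = 0 ] && [ -e "$dir/$tfile" ] && rc=2   # check 10), reached only if 9) passed
    rm -rf "$dir"
    echo "$rc"
}

model_checks no    # prints 0
model_checks yes   # prints 1
```

Running the model prints 0 and then 1. The same test sequence passes or fails depending only on the replay outcome, which is why check 9) cannot be a valid assertion.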
Failure log:
https://maloo.whamcloud.com/test_sets/1e5580d0-d0ba-11e0-8d02-52540025f9af
...
Starting client: client-22vm1.lab.whamcloud.com: -o user_xattr,acl client-22vm3@tcp:/lustre /mnt/lustre debug=-1 subsystem_debug=0xffb7e3ff debug_mb=32
Starting client: client-22vm1.lab.whamcloud.com: -o user_xattr,acl client-22vm3@tcp:/lustre /mnt/lustre2 debug=-1 subsystem_debug=0xffb7e3ff debug_mb=32
replay-dual test_0b: @@@@@@ FAIL: test_0b failed with 1
...
The MDS-side log shows:
https://maloo.whamcloud.com/test_logs/2675b6b8-d0ba-11e0-8d02-52540025f9af
...
Lustre: 23777:0:(ldlm_lib.c:2029:target_queue_recovery_request()) Next recovery transno: 8589934645, current: 8589934653, replaying
...
Client1's open_create request was replayed, so check 9) above failed.
Info required for matching: replay-dual 0b