[LU-10518] replay-single test 53g failed with 'close_pid should not exist' Created: 16/Jan/18 Updated: 13/Jan/23

| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0, Lustre 2.10.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
replay-single test_53g fails for failover test sessions. The last lines in the client test_log are:

Failover mds1 to onyx-42vm8
02:53:10 (1515725590) waiting for onyx-42vm8 network 900 secs ...
02:53:10 (1515725590) network interface is UP
CMD: onyx-42vm8 hostname
mount facets: mds1
CMD: onyx-42vm8 lsmod | grep zfs >&/dev/null || modprobe zfs;
zpool list -H lustre-mdt1 >/dev/null 2>&1 ||
zpool import -f -o cachefile=none -d /dev/lvm-Role_MDS lustre-mdt1
CMD: onyx-42vm8 zfs get -H -o value lustre:svname lustre-mdt1/mdt1
Starting mds1: lustre-mdt1/mdt1 /mnt/lustre-mds1
CMD: onyx-42vm8 mkdir -p /mnt/lustre-mds1; mount -t lustre lustre-mdt1/mdt1 /mnt/lustre-mds1
CMD: onyx-42vm8 /usr/sbin/lctl get_param -n health_check
CMD: onyx-42vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh set_default_debug \"vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck\" \"all\" 4
onyx-42vm8: onyx-42vm8.onyx.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
CMD: onyx-42vm8 zfs get -H -o value lustre:svname lustre-mdt1/mdt1 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
CMD: onyx-42vm8 zfs get -H -o value lustre:svname lustre-mdt1/mdt1 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
CMD: onyx-42vm8 zfs get -H -o value lustre:svname lustre-mdt1/mdt1 2>/dev/null
Started lustre-MDT0000
replay-single test_53g: @@@@@@ FAIL: close_pid should not exist
Test 53g looks like the following, up to the error:
test_53g() {
	cancel_lru_locks mdc # cleanup locks from former test cases

	mkdir $DIR/${tdir}-1 || error "mkdir $DIR/${tdir}-1 failed"
	mkdir $DIR/${tdir}-2 || error "mkdir $DIR/${tdir}-2 failed"
	multiop $DIR/${tdir}-1/f O_c &
	close_pid=$!

	#define OBD_FAIL_MDS_REINT_NET_REP 0x119
	do_facet $SINGLEMDS "lctl set_param fail_loc=0x119"
	mcreate $DIR/${tdir}-2/f &
	open_pid=$!
	sleep 1

	#define OBD_FAIL_MDS_CLOSE_NET 0x115
	do_facet $SINGLEMDS "lctl set_param fail_loc=0x80000115"
	kill -USR1 $close_pid
	cancel_lru_locks mdc # force the close
	do_facet $SINGLEMDS "lctl set_param fail_loc=0"

	#bz20647: make sure all pids exist before failover
	[ -d /proc/$close_pid ] || error "close_pid doesn't exist"
	[ -d /proc/$open_pid ] || error "open_pid doesn't exist"
	replay_barrier_nodf $SINGLEMDS
	fail_nodf $SINGLEMDS
	wait $open_pid || error "open_pid failed"
	sleep 2
	# close should be gone
	[ -d /proc/$close_pid ] && error "close_pid should not exist"
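For context, the sequence the test sets up is roughly as follows (this is a reading of the fail_loc values above, so treat it as a sketch rather than a definitive description): the mcreate in ${tdir}-2 is left pending because OBD_FAIL_MDS_REINT_NET_REP (0x119) makes the MDS drop its reply, and the close from the multiop in ${tdir}-1 is dropped once on the MDS side, since 0x80000115 is OBD_FAIL_MDS_CLOSE_NET combined with the 0x80000000 "fail once" bit. After replay_barrier_nodf/fail_nodf, the client is expected to replay/resend both requests during recovery, so the multiop should complete its close and exit, which is why /proc/$close_pid must be gone by the final check. The failure reported here means the multiop process was still alive two seconds after wait $open_pid returned.

When chasing this locally, a small debugging hunk in place of the final check can capture what the leftover process is doing before the test errors out. This is a hypothetical local aid, not part of the upstream test; it assumes it is pasted into test_53g in replay-single.sh, where the test-framework variables $LCTL and $TMP and the error() helper are available:

	# Hypothetical debug aid (local only): if multiop is still around,
	# record its state and the Lustre debug log before failing the test.
	if [ -d /proc/$close_pid ]; then
		ps -o pid,stat,wchan:32,cmd -p $close_pid
		cat /proc/$close_pid/stack 2>/dev/null  # needs root and CONFIG_STACKTRACE
		$LCTL dk > $TMP/lustre-debug.53g.log 2>&1
		error "close_pid should not exist"
	fi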
This test has failed with this error only a couple of times:
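For anyone trying to reproduce this outside autotest, a minimal sketch of running just this subtest in a failover configuration is below. The hostname is taken from the log above purely as a placeholder, and the variables (ONLY, FAILURE_MODE, mds1failover_HOST) follow the usual test-framework conventions as I understand them, so adjust them to the local cluster config:

	# Hypothetical reproduction sketch -- hostname is a placeholder and the
	# failover variable assumes the standard ${facet}failover_HOST convention.
	cd /usr/lib64/lustre/tests
	FAILURE_MODE=HARD \
	mds1failover_HOST=onyx-42vm8 \
	ONLY=53g \
	bash replay-single.sh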
| Comments |
| Comment by Jian Yu [ 06/Feb/18 ] |
More failure instances on the master branch under the failover test group: