Details
-
Bug
-
Resolution: Cannot Reproduce
-
Minor
-
None
-
Lustre 2.7.0, Lustre 2.10.0
-
None
-
OpenSFS cluster with two MDSs each with one MDT, three OSSs each with two OSTs and three clients. Running lustre-master tag 2.6.93.
-
3
-
17477
Description
replay-dual is failing with "FAIL: test_10 failed with 1". Results are at https://testing.hpdd.intel.com/test_sessions/fff27cc4-addd-11e4-a0b6-5254006e85c2.
From the client's test log:
== replay-dual test 10: resending a replayed unlink == 06:45:24 (1423061124)
Filesystem 1K-blocks Used Available Use% Mounted on
mds01@o2ib:/scratch 1181102496 2829748 1116810332 1% /lustre/scratch
c13: mcreate: cannot create `/lustre/scratch/fsa-c13' with mode 0100644: File exists
c12: mcreate: cannot create `/lustre/scratch/fsa-c12' with mode 0100644: File exists
c11: mcreate: cannot create `/lustre/scratch/fsa-c11' with mode 0100644: File exists
fail_loc=0x80000119
Failing mds1 on mds01
Stopping /lustre/scratch/mdt0 (opts:) on mds01
pdsh@c13: mds01: ssh exited with exit code 1
reboot facets: mds1
Failover mds1 to mds01
06:45:44 (1423061144) waiting for mds01 network 900 secs ...
06:45:44 (1423061144) network interface is UP
mount facets: mds1
Starting mds1: /dev/lvm-sdc/MDT0 /lustre/scratch/mdt0
Started scratch-MDT0000
c13: mdc.scratch-MDT0000-mdc-*.mds_server_uuid in FULL state after 245 sec
fail_loc=0
replay-dual test_10: @@@@@@ FAIL: test_10 failed with 1
What's actually failing is the mcreate in the replay_barrier() function:
replay_barrier() {
	local facet=$1
	do_facet $facet "sync; sync; sync"
	df $MOUNT
	# make sure there will be no seq change
	local clients=${CLIENTS:-$HOSTNAME}
	local f=fsa-\\\$\(hostname\)
	do_nodes $clients "mcreate $MOUNT/$f; rm $MOUNT/$f"
	do_nodes $clients "if [ -d $MOUNT2 ]; then mcreate $MOUNT2/$f; rm $MOUNT2/$f; fi"
In every test session I've checked (about 10), whenever test 10 fails with this error, it is preceded by a test 9 failure, 'post-failover df: 1' (LU-6057). Maybe when test 9 fails, it does not clean up, which would leave the fsa-* files behind and explain the "File exists" error.
Test 10 has failed intermittently with the "mcreate: cannot create * with mode 0100644: File exists" error since December 2014.
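If the hypothesis above holds (a failed test 9 leaves the fsa-* file behind), one possible way to make the create/remove step tolerant of leftovers is to remove any stale file before creating it. The following is only a local sketch of that idea, not the fix applied in Lustre: excl_create and barrier_touch are hypothetical names, excl_create merely imitates the "fail if the file exists" behaviour seen in the mcreate output, and `uname -n` stands in for the hostname lookup.

```shell
# Hypothetical stand-in for mcreate: create the file exclusively and
# return non-zero if it already exists (noclobber makes '>' refuse to
# overwrite an existing regular file).
excl_create() {
	( set -o noclobber; : > "$1" ) 2>/dev/null
}

# Hypothetical helper: remove any stale fsa-<hostname> file left by an
# earlier failed test, then do the create/remove pair as before.
barrier_touch() {
	local dir=$1
	local f
	f="fsa-$(uname -n)"
	rm -f "$dir/$f"
	excl_create "$dir/$f" && rm "$dir/$f"
}
```

With a leftover file in place, excl_create alone fails in the same way as the log above, while barrier_touch succeeds and leaves the directory clean. Whether silently removing the leftover is desirable is a separate question, since it would also hide the missing cleanup in test 9.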