Details
Bug
Resolution: Incomplete
Minor
None
Lustre 2.1.0
None
CHAOS 5, RHEL 6, Lustre 2.0.66-2chaos
3
6571
Description
We have seen client evictions while running mdtest on 55 nodes, with 8 tasks
per node (440 tasks total) and continuous OSS failover taking place.
The command being run is the following:
srun -v -O -l -D /p/lcrater2/surya1/tmp/ -N 55 -n $((55*8)) -- \
mdtest /p/lcrater2/surya1/tmp -I 10000 >MDTEST.stdout 2>MDTEST.stderr
Of the 3 runs made with OSS failover/failback happening in the background,
none completed successfully. Two of the runs suffered client evictions, and
the third suffered a node failure (it is unclear whether Lustre was the
cause).
Also, for what it's worth, these issues have not yet appeared when failover is
not running in the background. I have run the above command a couple of times
without simultaneously failing over the OSS nodes, and those runs completed
successfully.