LU-563 - Client evictions seen when running mdtest and repeated failover
========================================================================

Issue: http://jira.whamcloud.com/browse/LU-563
Environment: CHAOS 5, RHEL 6, Lustre 2.0.66-2chaos
Contact: Prakash Surya <surya1@llnl.gov>

Description
===========

We have seen client evictions while running mdtest on 55 nodes, with 8 tasks
per node and continuous OSS failover taking place.

The command being run is the following:
{noformat}
srun -v -O -l -D /p/lcrater2/surya1/tmp/ -N 55 -n $((55*8))  -- \
	mdtest /p/lcrater2/surya1/tmp -I 10000 >MDTEST.stdout 2>MDTEST.stderr
{noformat}

Out of the 3 times this has been run with OSS failover/failback happening in
the background, none have completed successfully. Two of the runs suffered from
client evictions, and the third suffered from a node failure (it is unclear
whether this was due to Lustre or not).

Also, for what it's worth, these issues have not yet appeared when failover is
not running in the background. I have run the above command a couple of times
without simultaneously failing the OSS nodes, and those runs completed
successfully.

Contents
========

This archive contains some of the results obtained by running the
above-mentioned tests. The results for each test are contained in their own
separate directory. Thus, the results for the first test run are contained in
directory run1, the second test in run2, etc.

Each directory should contain 4 files:

 * MDTEST.cli          - The command line which was executed to kick off the
                         test.

 * MDTEST.stdout       - The stdout output of the command executed.

 * MDTEST.stderr       - The stderr output of the command executed.

 * lustre.log-$DATE.gz - A collection of the Lustre console log messages for
                         the day the test was run.
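
As a quick sanity check of the layout described above, the following is a
minimal sketch (not part of the archive) that reports which of the four
expected files are missing from each run directory. The helper name
check_run_dir is hypothetical, and the script assumes it is run from the root
of the unpacked archive.

```shell
#!/bin/sh
# Hypothetical helper (not shipped in the archive): report which of the four
# expected files are missing from one run directory (run1, run2, ...).
check_run_dir() {
    dir=$1
    missing=0
    for name in MDTEST.cli MDTEST.stdout MDTEST.stderr; do
        if [ ! -f "$dir/$name" ]; then
            echo "$dir: missing $name"
            missing=1
        fi
    done
    # The console log name embeds a date, so match it with a glob.
    found=0
    for f in "$dir"/lustre.log-*.gz; do
        [ -f "$f" ] && found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "$dir: missing lustre.log-\$DATE.gz"
        missing=1
    fi
    return "$missing"
}

# Check every run directory under the current directory (assumed to be the
# root of the unpacked archive).
for d in run*/; do
    [ -d "$d" ] || continue
    check_run_dir "${d%/}" || true
done
```

A complete run directory produces no output; each missing file is reported on
its own line.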

Also, the root directory should contain the fail.sh script. This script was
executed concurrently with the aforementioned tests to stress the
failover/failback capabilities of the filesystem.
