Lustre / LU-563

Client evictions seen when running mdtest and repeated failover

Details

    • Bug
    • Resolution: Incomplete
    • Minor
    • None
    • Lustre 2.1.0
    • None
    • CHAOS 5, RHEL 6, Lustre 2.0.66-2chaos
    • 3
    • 6571

    Description

      We have seen client evictions while running mdtest on 55 nodes, with 8 tasks
      per node, while continuous OSS failover was taking place.

      The command being run is the following:

      srun -v -O -l -D /p/lcrater2/surya1/tmp/ -N 55 -n $((55*8))  -- \
              mdtest /p/lcrater2/surya1/tmp -I 10000 >MDTEST.stdout 2>MDTEST.stderr
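
For reference, the task count passed to `-n` works out to 440. The sketch below rebuilds the same command from its parts and only prints it (a dry run) rather than launching anything. The variable names are hypothetical and chosen for illustration; the values are the ones from the command above, with the output redirections left out.

```shell
#!/bin/sh
# Hypothetical variable names; values are taken from the srun command above.
NODES=55
TASKS_PER_NODE=8
NTASKS=$((NODES * TASKS_PER_NODE))   # total tasks passed to srun -n
TESTDIR=/p/lcrater2/surya1/tmp

# Build the command as a string and print it instead of running it (dry run).
CMD="srun -v -O -l -D $TESTDIR/ -N $NODES -n $NTASKS -- mdtest $TESTDIR -I 10000"
echo "$CMD"
```

Keeping the node and per-node task counts as separate variables makes it easier to rerun the same test at a smaller scale when trying to narrow down the evictions.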
      

      Out of the 3 times this has been run with OSS failover/failback happening in
      the background, none have completed successfully. Two of the runs suffered from
      client evictions, and the third suffered from a node failure (it is unclear
      whether this was due to Lustre or not).

      Also, for what it's worth, these issues have not yet appeared when failover is
      not running in the background. I have run the above command a couple of times
      without simultaneously failing the OSS nodes, and those runs were able to
      complete successfully.

      Attachments

        1. MDTEST7.tar.gz
          9.34 MB
        2. mdtest-results.tar.gz
          1.00 MB
        3. results.tar.gz
          7.98 MB
        4. results.tar.gz
          6.48 MB

          People

            bobijam Zhenyu Xu
            prakash Prakash Surya (Inactive)
            Votes: 0
            Watchers: 6
