LU-6773: DNE2 Failover and recovery soak testing



    • Type: Task
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed


      With async update, cross-MDT operations do not need to synchronize updates on each target. Instead, updates are recorded on each target, and recovery of the filesystem after a failure uses these update records. All operations across MDTs are enabled: for example, cross-MDT rename and link succeed and do not return -EXDEV, so a workload like dbench that performs renames should function correctly in a striped directory.
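      As a rough illustration of the -EXDEV behavior described above, the following Python sketch renames a file between two subdirectories and reports whether the kernel rejected the rename with EXDEV. The helper name `rename_succeeds` and the directory layout are hypothetical; on a real test system the root would be a striped directory on a Lustre client mount rather than a temporary directory.

```python
import errno
import os
import tempfile


def rename_succeeds(src, dst):
    """Rename src to dst.

    Returns True if the rename succeeded and False if it failed with EXDEV
    (the error a cross-MDT rename would return if it were not supported).
    Any other error is re-raised.
    """
    try:
        os.rename(src, dst)
        return True
    except OSError as exc:
        if exc.errno == errno.EXDEV:
            return False
        raise


if __name__ == "__main__":
    # Assumption: a temporary directory stands in for a striped directory.
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "a"))
        os.mkdir(os.path.join(root, "b"))
        src = os.path.join(root, "a", "file")
        dst = os.path.join(root, "b", "file")
        open(src, "w").close()
        print("rename ok" if rename_succeeds(src, dst) else "rename hit EXDEV")
```

      A check like this only shows that a single rename did not return EXDEV; it does not exercise recovery from the update records.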
      1. Set up Lustre with 4 MDSes (each MDS has one MDT), 4 OSTs, and at least 8 clients.
      2. Each client creates a striped directory (stripe count = 4). Under the striped directories:
         a. Half of the clients repeatedly run tar and untar in their striped directory.
         b. The other half run dbench in their striped directory.
      3. Randomly reboot one of the MDSes at least once every 30 minutes, and fail over to the backup MDS if the test configuration allows it.
      4. The test should run for at least 24 hours without reporting any application errors.
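      The plan above can be sketched as a small planning script. This is only an illustration under stated assumptions, not the harness actually used for this ticket: the MDS host names, the 10-minute minimum gap between reboots, and all function names are hypothetical. The directory command assumes `lfs mkdir -c <count> <dir>` is available on the clients to create a striped directory. The script only builds the plan; it does not run any commands or reboot any nodes.

```python
import random

# Hypothetical host names; substitute the real MDS nodes.
MDS_NODES = ["mds1", "mds2", "mds3", "mds4"]
MAX_INTERVAL_S = 30 * 60        # reboot at least once every 30 minutes
MIN_INTERVAL_S = 10 * 60        # assumption: leave some time for recovery
TEST_DURATION_S = 24 * 60 * 60  # run for at least 24 hours


def assign_workloads(clients):
    """Give the first half of the clients tar/untar and the rest dbench."""
    half = len(clients) // 2
    return {c: ("tar" if i < half else "dbench") for i, c in enumerate(clients)}


def mkdir_command(path, stripe_count=4):
    """Build the command that creates a striped directory on a client."""
    return ["lfs", "mkdir", "-c", str(stripe_count), path]


def failover_schedule(duration_s, min_interval_s, max_interval_s, nodes, rng):
    """Return (time_s, node) reboot events.

    Consecutive events are never more than max_interval_s apart, and the
    last event is no more than max_interval_s before the end of the run.
    """
    events = []
    t = 0
    while True:
        t += rng.randint(min_interval_s, max_interval_s)
        if t >= duration_s:
            break
        events.append((t, rng.choice(nodes)))
    return events


if __name__ == "__main__":
    clients = [f"client{i}" for i in range(1, 9)]
    for client, job in assign_workloads(clients).items():
        print(client, job, " ".join(mkdir_command(f"/mnt/lustre/{client}")))
    schedule = failover_schedule(
        TEST_DURATION_S, MIN_INTERVAL_S, MAX_INTERVAL_S, MDS_NODES, random.Random()
    )
    for when, node in schedule:
        print(f"t+{when}s: reboot {node}")
```

      Generating the schedule up front, rather than sleeping for a random time inside a loop, makes a run reproducible from a seed and lets the reboot times be compared against any application errors afterwards.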
      The goal of the failover and recovery soak testing is not necessarily to resolve every issue found during testing, especially non-DNE issues, but rather to have a good idea of the relative stability of DNE + Async Commits during recovery.





              jamesanunez James Nunez (Inactive)
              rhenwood Richard Henwood (Inactive)