[LU-6773] DNE2 Failover and recovery soak testing

Details

    • Type: Task
    • Resolution: Fixed
    • Priority: Blocker
    Description

      With async update, cross-MDT operations do not need to synchronize updates on each target. Instead, updates are recorded on each target, and recovery of the file system from failure takes place using these update records. All operations across MDTs are enabled; for example, cross-MDT rename and link succeed and do not return -EXDEV, so a workload like dbench that performs renames should function correctly in a striped directory.
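      For illustration, a minimal sketch of the kind of cross-MDT rename that async update enables (the mount point /mnt/lustre and directory names are assumptions, not part of the test plan):

          # Create two directories on different MDTs (MDT0000 and MDT0001).
          lfs mkdir -i 0 /mnt/lustre/dir_mdt0
          lfs mkdir -i 1 /mnt/lustre/dir_mdt1

          # Rename a file across the two MDTs; with async update this should
          # succeed in place rather than returning -EXDEV to the application.
          touch /mnt/lustre/dir_mdt0/file
          mv /mnt/lustre/dir_mdt0/file /mnt/lustre/dir_mdt1/file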
      1. Set up Lustre with 4 MDSes (each MDS has one MDT), 4 OSTs, and at least 8 clients.
      2. Each client creates a striped directory (with stripe count = 4). Under its striped directory:
         a. 1/2 of the clients repeatedly run tar and untar.
         b. 1/2 of the clients run dbench.
      3. Randomly reboot one of the MDSes at least once every 30 minutes and fail over to the backup MDS if the test configuration allows it.
      4. The test should keep running for at least 24 hours without reporting application errors. (A rough per-client workload sketch follows this description.)
      The goal of the failover and recovery soak testing is not necessarily to resolve every issue found during testing, especially non-DNE issues, but rather to have a good idea of the relative stability of DNE + Async Commits during recovery.
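      A rough sketch of the per-client workload in steps 2 and 4, assuming a client mount at /mnt/lustre and a source tarball at /tmp/src.tar (both hypothetical). The MDS reboot/failover in step 3 is driven separately (the logs below come from the test framework's failover_mds runs):

          #!/bin/bash
          # Per-client workload sketch: create a 4-way striped directory and
          # run either a tar/untar loop or dbench in it for 24 hours.
          MNT=/mnt/lustre                      # assumed client mount point
          DIR=$MNT/stripe_$(hostname)
          END=$((SECONDS + 24*3600))

          # Stripe the new directory across all 4 MDTs (DNE2 striped directory).
          lfs mkdir -c 4 "$DIR"

          if [ "$1" = "tar" ]; then
              # Half of the clients: repeatedly untar and re-tar a source tree.
              while [ "$SECONDS" -lt "$END" ]; do
                  tar -xf /tmp/src.tar -C "$DIR" || exit 1
                  tar -cf /dev/null -C "$DIR" . || exit 1
                  rm -rf "$DIR"/*
              done
          else
              # The other half: dbench with 10 processes, restarted until time is up.
              while [ "$SECONDS" -lt "$END" ]; do
                  dbench -D "$DIR" -t 600 10 || exit 1
              done
          fi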

          Activity

            di.wang Di Wang added a comment -

            The build https://build.hpdd.intel.com/job/lustre-reviews/33759/ is based on master with the following patches:

                • http://review.whamcloud.com/#/c/15812/ (LU-6928)
                • http://review.whamcloud.com/#/c/15793/ (LU-6924)
                • http://review.whamcloud.com/#/c/15730/ (LU-6846)
                • http://review.whamcloud.com/#/c/15725/ (LU-6905)
                • http://review.whamcloud.com/#/c/15721/ (LU-6896)
                • http://review.whamcloud.com/#/c/15691/ (LU-6875)
                • http://review.whamcloud.com/#/c/15690/ (LU-6881)
                • http://review.whamcloud.com/#/c/15682/ (LU-6882)
                • http://review.whamcloud.com/#/c/15595/ (LU-6846)
                • http://review.whamcloud.com/#/c/15594/ (LU-6819)
                • http://review.whamcloud.com/#/c/15576/ (LU-6840)
                • http://review.whamcloud.com/#/c/14497/ (LU-6475)
                • http://review.whamcloud.com/#/c/13224/ (LU-6852)
            di.wang Di Wang added a comment - edited

            OK, this test just passed with build https://build.hpdd.intel.com/job/lustre-reviews/33759/ . Here is the test log:

            ==== Checking the clients loads AFTER failover -- failure NOT OK
            mds4 has failed over 9 times, and counting...
            2015-08-01 16:59:58 Terminating clients loads ...
            Duration:               86400
            Server failover period: 1800 seconds
            Exited after:           84832 seconds
            Number of failovers before exit:
            mds1: 16 times
            mds2: 7 times
            mds3: 16 times
            mds4: 9 times
            ost1: 0 times
            ost2: 0 times
            ost3: 0 times
            ost4: 0 times
            Status: PASS: rc=0
            PASS failover_mds (84837s)
            
            di.wang Di Wang added a comment - edited

            The test fails after 41 failovers with build https://build.hpdd.intel.com/job/lustre-reviews/33612/

            Duration:               86400
            Server failover period: 1800 seconds
            Exited after:           72283 seconds
            Number of failovers before exit:
            mds1: 10 times
            mds2: 10 times
            mds3: 10 times
            mds4: 11 times
            ost1: 0 times
            ost2: 0 times
            ost3: 0 times
            ost4: 0 times
            Status: FAIL: rc=7
            
            di.wang Di Wang added a comment -

            With build https://build.hpdd.intel.com/job/lustre-reviews/33580/ (hard reboot, 10-minute reboot interval), the test fails after 35 failovers. The target is 48 failovers.

            Duration:               86400
            Server failover period: 600 seconds
            Exited after:           22249 seconds
            Number of failovers before exit:
            mds1: 6 times
            mds2: 12 times
            mds3: 9 times
            mds4: 8 times
            ost1: 0 times
            ost2: 0 times
            ost3: 0 times
            ost4: 0 times
            Status: FAIL: rc=7
            

            rhenwood Richard Henwood (Inactive) added a comment - edited

            During discussion, we have identified three stages of testing for this ticket (a rough sketch of the first two stages follows this list):

            1. Soft recovery: forced unmount of an active file system; remount after a period of time.
            2. Hard recovery: hard reboot of an MDS of an active file system.
            3. Hard recovery with failover: hard reboot of an MDS of an active file system; the file system remains available throughout.
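            As a rough illustration of the first two stages (the device path, mount point, and MDS hostname below are assumptions, not the actual cluster configuration):

                # Stage 1, soft recovery: forced unmount of an MDT on an active
                # file system, then remount after a period of time.
                ssh mds1 'umount -f /mnt/lustre-mds1'
                sleep 300
                ssh mds1 'mount -t lustre /dev/mapper/mdt0 /mnt/lustre-mds1'

                # Stage 2, hard recovery: hard reboot of the MDS while the file
                # system is active (sysrq 'b' reboots without a clean shutdown).
                ssh mds1 'echo b > /proc/sysrq-trigger'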

            rhenwood Richard Henwood (Inactive) added a comment -

            An update on this activity:

            James Nunez has been engaged with recovery testing on the OpenSFS cluster hosted by IU for the last two weeks. Over the past week, IU have been fully engaged in helping us with a stretch goal to enable a failover configuration. No mechanism to gracefully force the logical drives to all run on a single controller could be identified. It is expected that physically pulling a controller may force a failover, but this activity cannot be scheduled for the 24-hour duration required by the test.

            I'm investigating alternatives.


            rhenwood Richard Henwood (Inactive) added a comment -

            Also: please collect logs from this work during the run and attach them to this ticket.
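            One possible way to do this, sketched with assumed hostnames and paths ('lctl dk' dumps the Lustre kernel debug buffer):

                # After each failover event, dump the Lustre debug buffer and
                # dmesg on every server so the logs can be attached to this ticket.
                for node in mds1 mds2 mds3 mds4 ost1 ost2 ost3 ost4; do
                    ssh "$node" 'lctl dk /tmp/lustre-debug-$(hostname)-$(date +%s).log'
                    ssh "$node" 'dmesg > /tmp/dmesg-$(hostname)-$(date +%s).log'
                done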

            di.wang Di Wang added a comment -

            The RPMs from https://build.hpdd.intel.com/job/lustre-reviews/33136/ have been installed on all of the nodes.


            People

              jamesanunez James Nunez (Inactive)
              rhenwood Richard Henwood (Inactive)
              Votes: 0
              Watchers: 5
