Details
Type: Bug
Resolution: Cannot Reproduce
Priority: Major
Fix Version/s: None
Affects Version/s: Lustre 2.7.0, Lustre 2.5.4
Labels: None
Environment: Combined MGT/MDT, racing multiple mount commands.
Severity: 3
Description
The patch for LU-5573 (http://review.whamcloud.com/#/c/12353/), which closed LU-5299, does not cover some cases.
Specifically, the code that enables a combined MGT/MDT to start correctly also disables the race protection for it, so racing multiple mount commands on a combined MGT/MDT can still cause this problem.
I've taken a look, and I don't see any easy way to fix this in the current context. I can provide dumps if needed, and I'll attach a log now.
Note the attempts to start MDT0000. There are five, four of which start after the first one but before it has completed.
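To make the kind of race protection at issue concrete, here is a minimal sketch of a "first starter wins" guard. It is an illustration only, not Lustre code and not the LU-5573 patch: the helper names try_start and race_demo are hypothetical, and a directory created with mkdir stands in for whatever "start in progress" marker the real code uses. With a guard like this in place, only one of several racing starters proceeds; the log attached here shows the opposite, with several start attempts overlapping.

```shell
#!/bin/bash
# Illustration only: NOT Lustre code. A directory created with mkdir acts as
# an atomic "start in progress" marker, so only one of several racing
# starters proceeds and the others back off.

# try_start LOCKDIR NAME (hypothetical helper): report whether this caller
# won the race to create the marker.
try_start() {
    local lockdir=$1 name=$2
    if mkdir "$lockdir" 2>/dev/null; then
        echo "$name: starting target"
    else
        echo "$name: start already in progress, backing off"
    fi
}

# race_demo N (hypothetical helper): launch N racing starters, wait for all
# of them, then clean up the temporary marker directory.
race_demo() {
    local n=$1 lockdir i
    lockdir="$(mktemp -d)/start.lock"
    for ((i = 0; i < n; i++)); do
        try_start "$lockdir" "starter$i" &
    done
    wait
    rm -rf "$(dirname "$lockdir")"
}
```

Calling race_demo 5 should print exactly one "starting target" line and four "backing off" lines, in no particular order.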
Bruno,
Here is a simple reproducer:
1. Create and start a Lustre file system with a combined MGT/MDT.
2. Unmount the MGT and the MDT.
3. Run the following 'test_mount' script 5 times in parallel:
    cat test_mount
    #!/bin/bash
    mount -t lustre -o nosvc,abort_recov --verbose /dev/sdd /tmp/lustre/scratch/mgt
    mount -t lustre -o nomgs,abort_recov --verbose /dev/sdd /tmp/lustre/scratch/mdt
   The parallel runs are started with:
    for ((i=0;i<5;i++));do ./test_mount & done;
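The loop above backgrounds the five runs but does not wait for them or report their exit codes. A possible variant, sketched below, does both. The helper name run_parallel is hypothetical and not part of the original reproducer, and the sketch assumes bash.

```shell
#!/bin/bash
# run_parallel N CMD [ARGS...] (hypothetical helper): start N copies of CMD
# in the background, wait for each one, and print how many exited non-zero.
run_parallel() {
    local n=$1; shift
    local pids=() failed=0 i pid
    for ((i = 0; i < n; i++)); do
        # Send the command's stdout to stderr so that the only thing this
        # function writes to stdout is the final failure count.
        "$@" 1>&2 &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed"
}
```

On a test node, the reproducer would then be run as: run_parallel 5 ./test_mount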