Details
Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: Lustre 2.8.0
Environment:
TOSS 2 (RHEL 6.7 based)
kernel 2.6.32-573.22.1.1chaos.ch5.4.x86_64
Lustre 2.8.0+patches 2.8-llnl-preview1
zfs-0.6.5.4-1.ch5.4.x86_64
1 MGS - separate server
40 MDTs - each on separate server
10 OSTs - each on separate server
Severity: 3
Description
See LU-8044
With many MDTs, if MDT0000 cannot connect to one of the other MDTs (perhaps only on initial startup, I don't know), MDT0000 appears to ignore connection requests from clients.
It seems as if MDT0000 ought to be able to allow mounts, and the filesystem should simply function without the apparently broken MDT.
Actually, it does not require all MDTs to be connected, but it does require that the config log of one MDT is executed before it can accept connection requests. Sorry, I did not make that clear in my last comments.
Yes, this example does make sense. But if the user knows that one or more MDTs cannot come back, those MDTs need to be manually deactivated on the clients and on the other MDTs (which is probably the cause of the failure in this ticket);
then the recovery efforts for those MDTs will be stopped, the recovering MDTs will be able to accept connections from clients, and of course clients will only be able to access files on the restored MDTs. Sorry again, I might have given obscure information in the last comment.
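For illustration only, a rough sketch of that manual deactivation (the filesystem name "lustre" and index MDT0007 are made-up examples, and the exact parameter names should be checked against the manual for this Lustre version):

# On the MGS, permanently mark the lost MDT inactive so clients and the
# other MDTs stop trying to connect to it:
lctl conf_param lustre-MDT0007.mdc.active=0

# Alternatively, on a client (or another MDS), temporarily deactivate the
# corresponding MDC device:
lctl dl | grep MDT0007-mdc       # find the MDC device number
lctl --device <devno> deactivate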
There are even test cases for this in conf-sanity.sh (70c and 70d); please check. Thanks.
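For reference, one way to run just those subtests from a Lustre source tree (assuming the standard lustre/tests layout and an already configured test cluster) is roughly:

cd lustre/tests
ONLY="70c 70d" sh conf-sanity.sh
# or through the test wrapper:
./auster -v conf-sanity --only 70c,70d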