[LU-11471] IO Errors during failover with very few number of OSTs Created: 24/Dec/15  Updated: 16/Jan/22  Resolved: 16/Jan/22

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Rajeshwaran Ganesan Assignee: Mikhail Pershin
Resolution: Duplicate Votes: 0
Labels: None
Environment:

Lustre 2.5.X


Issue Links:
Related
is related to LU-10995 DoM2: allow MDT-only filesystems Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

One of our customers noticed I/O errors during failover on a filesystem with very few OSTs. We would like to add a note to the Best Practices section or the OST planning section of the manual:
========================
During failover, a filesystem with only a few OSTs may return I/O errors on the client side.



 Comments   
Comment by Rajeshwaran Ganesan [ 04/Jan/16 ]

We would like to add: if the MDS has no currently active OSTs, file create requests fail with an I/O error, and the clients will see these I/O errors.

Comment by Joseph Gmitter (Inactive) [ 21/Sep/16 ]

Hi Rajeshwaran,

Are you familiar with how to push such an update to the manual?

For details on how to submit changes to the manual, please see:

Comment by Andreas Dilger [ 11/Mar/17 ]

I wonder whether this should rather be considered a bug in the code, and the MDS should block file creations if all of the OSTs become unavailable after startup?

Comment by Rajeshwaran Ganesan [ 06/Sep/17 ]

Please close this case.

Comment by Andreas Dilger [ 04/Oct/18 ]

Reopening this issue. With the advent of Data-on-MDT we will at some point want to allow filesystems with only MDTs to be created. At that point, this check has to be removed.

As a starting point, we could add a tunable that lets the admin select the behavior: return an error if no OSTs are available, or make the client block and wait for an OST to become available. I think that in the case where OSTs were previously available but are temporarily offline due to failover, the client should block. If the file being created has a DoM component at the start, then it should not block.

Comment by Mikhail Pershin [ 16/Jan/22 ]

Main ticket for remaining work is LU-10995

Generated at Sat Feb 10 02:44:10 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.