[LU-3731] Improve understanding of LustreError for "not available for connect (no target)" to be better understood by users Created: 09/Aug/13 Updated: 16/Oct/13 Resolved: 17/Sep/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0, Lustre 2.4.0, Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.5.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Bobbie Lind (Inactive) | Assignee: | Bobbie Lind (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Rank (Obsolete): | 9629 |
| Description |
|
In an HA environment, servers and clients can report a LustreError saying that no target is available. The error is technically correct, but whether it matters depends on which server reports it. If it is reported on one node of a failover pair while the OST is mounted on the other node, this "error" requires no action. It does need to be addressed, however, if no server is able to mount the target. Making the message more user friendly will help users tell when it is an error that needs attention and when it can be treated as a warning. |
| Comments |
| Comment by Brian Murrell (Inactive) [ 09/Aug/13 ] |
|
I think this issue is as old as kerosene. But seriously, it is indeed a very misleading "error", especially when you are not a Lustre expert. Ideally, this situation would be a NOOP (no error message, etc.) if the node that is in standby for the target knew that the target was active somewhere else. I'm not sure that's an easy problem, though. I wonder if imperative recovery could help a standby node know this. Alternatively, given that the target it's complaining about should be shared with another node, MMP should be in place, and maybe that could help the server; but that would require the server to know which block device the target is on, which I would think it only ever knows when the target is started. Even just clarifying the message might help, though. Maybe the message should be in plainer English, something like "that target is not running on this node. perhaps it's running on $node", where I would guess it knows which other $node (ignoring for the moment that it could be on any number of other nodes) the target could be running on, because it knows which node(s) are failover nodes for the target. |
| Comment by Bobbie Lind (Inactive) [ 09/Aug/13 ] |
|
Brian, I was thinking of something along the lines of "If you are running an HA pair check that the target is mounted on the other server." That puts the interpretation of this error message back on the user rather than requiring any code changes. It would, however, clarify what may be going on. |
| Comment by Bobbie Lind (Inactive) [ 23/Aug/13 ] |
|
Patch submitted here http://review.whamcloud.com/7438 |
| Comment by Bobbie Lind (Inactive) [ 17/Sep/13 ] |
|
Patch landed to master as 1699de20882a56d6fda16f23897de0c0b4e2f610 |