Details
Bug
Resolution: Fixed
Minor
Lustre 2.4.0
3
8640
Description
I see this message at startup time on the MDS. If it's safe to ignore, it should be removed. If it's important, it should be refactored to be understandable by an admin (I don't even know what it means, and it's a console message).
2013-06-11 12:53:52 LustreError: 11-0: lc2-OST0007-osc-MDT0000: Communicating with 10.1.1.48@o2ib9, operation ost_connect failed with -19.
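For anyone decoding the message: the trailing -19 is a negated Linux errno value, and errno 19 is ENODEV ("No such device"). A quick way to look up such codes from a shell with Python is sketched below; the variable names are illustrative only, and the lookup says nothing about why the connect attempt got that code.

```python
import errno
import os

# Return code copied from the console message above.
rc = -19

# Kernel-style return codes are negated errno values, so flip the sign
# before looking up the symbolic name and the human-readable text.
name = errno.errorcode.get(-rc, "unknown")
text = os.strerror(-rc)

print(f"{rc}: {name} ({text})")
```

On Linux this reports ENODEV for the code in the message.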
Yes, I think I agree. Since we don't have any better infrastructure for reporting things like this, I'm more OK with the message if we just try to suppress the "noise".
I still don't think the console is the "right" place for it, but that's all we have at the moment. It would be really cool to be able to, instead, post some sort of event that another process (e.g. a userspace daemon) could consume and then decide what to do (e.g. ignore, ping monitoring software, send email, etc). But that's a whole 'nother can of worms.
I think having some sort of timer (or number of resends) to suppress the message would go a long way in this particular case.
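To make the timer idea concrete, here is a minimal sketch of time-based suppression in Python. It is not Lustre code: the `MessageSuppressor` class, its `should_log` method, the key layout, and the 600-second default are all hypothetical, chosen only to illustrate printing a message once per interval and counting the repeats that were dropped in between.

```python
import time


class MessageSuppressor:
    """Hypothetical helper: allow one message per key per interval."""

    def __init__(self, interval_s=600.0, clock=time.monotonic):
        self.interval_s = interval_s
        self.clock = clock  # injectable so the behaviour can be tested
        self._state = {}    # key -> (time of last print, repeats dropped since)

    def should_log(self, key):
        """Return (emit, dropped).

        emit is True when the message should be printed now; dropped is
        the number of repeats suppressed since the previous print.
        """
        now = self.clock()
        last = self._state.get(key)
        if last is None or now - last[0] >= self.interval_s:
            dropped = 0 if last is None else last[1]
            self._state[key] = (now, 0)
            return True, dropped
        # Still inside the quiet period: count the repeat, print nothing.
        self._state[key] = (last[0], last[1] + 1)
        return False, 0


if __name__ == "__main__":
    suppressor = MessageSuppressor(interval_s=600.0)
    # Key on (device, operation, return code) so distinct failures still print.
    key = ("lc2-OST0007-osc-MDT0000", "ost_connect", -19)
    emit, dropped = suppressor.should_log(key)
    if emit:
        print(f"ost_connect failed with -19 ({dropped} similar messages suppressed)")
```

A resend-count variant would follow the same pattern, replacing the elapsed-time check with a threshold on the number of repeats seen for the key.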