[LU-9492] MDT reports passing recovery deadline prematurely
Created: 11/May/17
Updated: 23/Feb/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Ned Bass
Assignee: Emoly Liu
Resolution: Unresolved
Votes: 1
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

During MDT recovery, multiple console messages appear containing the phrase "Recovery already passed deadline MM:SS". The MM:SS value is the minutes and seconds _remaining_ until the recovery deadline expires, so the deadline has not actually passed when the message is printed. This is confusing to system administrators. There are two issues to address here.

1. The wording of the message seems to be incorrect.
2. Even if the wording were correct, it is unclear why this message is emitted.

The clarity of log messages pertaining to recovery is critically important, as that is a time when system administrators tend to watch the logs closely and they need to understand what is happening.

May 10 09:17:07 zinc1 kernel: Lustre: lsh-MDT0000: Will be in recovery for at least 5:00, or until 2827 clients reconnect
May 10 09:18:37 zinc1 kernel: Lustre: lsh-MDT0000: Recovery already passed deadline 3:30. If you do not want to wait more, please abort the recovery by force.
May 10 09:18:37 zinc1 kernel: Lustre: lsh-MDT0000: Recovery already passed deadline 3:29. If you do not want to wait more, please abort the recovery by force.
May 10 09:18:38 zinc1 kernel: Lustre: lsh-MDT0000: Recovery already passed deadline 3:28. If you do not want to wait more, please abort the recovery by force.
May 10 09:18:40 zinc1 kernel: Lustre: lsh-MDT0000: Recovery already passed deadline 3:26. If you do not want to wait more, please abort the recovery by force.
May 10 09:18:45 zinc1 kernel: Lustre: lsh-MDT0000: Recovery already passed deadline 3:22. If you do not want to wait more, please abort the recovery by force.
May 10 09:18:53 zinc1 kernel: Lustre: lsh-MDT0000: Recovery already passed deadline 3:14. If you do not want to wait more, please abort the recovery by force.
May 10 09:19:09 zinc1 kernel: Lustre: lsh-MDT0000: Recovery already passed deadline 2:58. If you do not want to wait more, please abort the recovery by force.
May 10 09:19:41 zinc1 kernel: Lustre: lsh-MDT0000: Recovery already passed deadline 2:26. If you do not want to wait more, please abort the recovery by force.
May 10 09:20:45 zinc1 kernel: Lustre: lsh-MDT0000: Recovery already passed deadline 1:22. If you do not want to wait more, please abort the recovery by force.
May 10 09:22:07 zinc1 kernel: Lustre: lsh-MDT0000: Recovery over after 5:01, of 2827 clients 2651 recovered and 0 were evicted.
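
The arithmetic behind the "remaining, not elapsed" reading can be checked against the log above. The sketch below is only an illustration: it assumes the deadline is the timestamp of the "Will be in recovery for at least 5:00" message plus 5:00, which the log suggests but does not state, and all names in it are hypothetical helpers, not Lustre code. Under that assumption, every printed MM:SS value is within one second of the time still left before the deadline.

```python
from datetime import datetime, timedelta

# Values copied from the console log above.
RECOVERY_START = "09:17:07"              # "Will be in recovery for at least 5:00"
RECOVERY_WINDOW = timedelta(minutes=5)   # assumption: deadline = start + 5:00
MESSAGES = [                             # (log timestamp, printed MM:SS)
    ("09:18:37", "3:30"),
    ("09:18:37", "3:29"),
    ("09:18:38", "3:28"),
    ("09:18:40", "3:26"),
    ("09:18:45", "3:22"),
    ("09:18:53", "3:14"),
    ("09:19:09", "2:58"),
    ("09:19:41", "2:26"),
    ("09:20:45", "1:22"),
]


def parse_clock(text):
    """Parse an HH:MM:SS log timestamp (date is irrelevant here)."""
    return datetime.strptime(text, "%H:%M:%S")


def parse_mmss(text):
    """Parse the M:SS value printed in the console message."""
    minutes, seconds = text.split(":")
    return timedelta(minutes=int(minutes), seconds=int(seconds))


def remaining_at(stamp):
    """Time left until the assumed deadline at a given log timestamp."""
    deadline = parse_clock(RECOVERY_START) + RECOVERY_WINDOW
    return deadline - parse_clock(stamp)


def max_mismatch_seconds():
    """Largest gap between the printed value and the computed time remaining."""
    return max(
        abs((remaining_at(stamp) - parse_mmss(printed)).total_seconds())
        for stamp, printed in MESSAGES
    )


if __name__ == "__main__":
    for stamp, printed in MESSAGES:
        print(stamp, "printed", printed, "computed remaining", remaining_at(stamp))
```

Every computed value is positive, which is consistent with the message being printed while the deadline is still in the future.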



 Comments   
Comment by Peter Jones [ 11/May/17 ]

Emoly

Could you please assist with this one?

Thanks

Peter

Comment by Gerrit Updater [ 18/May/17 ]

Emoly Liu (emoly.liu@intel.com) uploaded a new patch: https://review.whamcloud.com/27178
Subject: LU-9492 ldlm: correct recovery console messages
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e4b8bd18f49de3c7588daaf5a850e552ac18260d
