[LU-11771] bad output in target_handle_reconnect: Recovery already passed deadline 71578:57 Created: 13/Dec/18 Updated: 08/Oct/19 Resolved: 25/May/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.3 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sergey Cheremencev | Assignee: | James A Simmons |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
In the functions target_handle_reconnect and target_handle_connect I found incorrect use of Linux kernel time types:

now = ktime_get_seconds();
deadline = jiffies_to_msecs(target->obd_recovery_timer.expires) / MSEC_PER_SEC;

Comparing jiffies converted to seconds against seconds from CLOCK_MONOTONIC is incorrect, because the two values do not share a time base. The result is bad output such as:

2018-07-31 18:51:46 [ 8201.235800] Lustre: fs1-OST0000: Recovery already passed deadline 71578:57. If you do not want to wait more, please abort the recovery by force.
...
2018-07-31 18:51:46 [ 8201.236177] Lustre: fs1-OST0000: Denying connection for new client 71f8ec29-a676-0a96-3d1d-97b43c72e168(at 172.18.1.101@o2ib), waiting for 13 known clients (1 recovered, 11 in progress, and 1 evicted) to recover in 71578:57 |
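The mismatch can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not the patch that landed: `SIM_HZ`, `sim_jiffies`, `mixed_domain_deadline` and `same_domain_remaining` are hypothetical names, the tick rate is assumed to be 1000, and the 32-bit truncation mimics the fact that `jiffies_to_msecs()` returns `unsigned int`. It does not attempt to reproduce the exact 71578:57 value from the log. It contrasts the mixed-clock deadline with a remaining time computed entirely in the jiffies domain.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tick rate for this simulation (hypothetical, not read from a kernel). */
#define SIM_HZ 1000ULL

/* Simulated 64-bit jiffies counter: like the kernel's, it starts at
 * (unsigned int)(-300 * HZ) instead of 0. */
static uint64_t sim_jiffies(uint64_t uptime_sec)
{
	/* Keep the simulated counter far away from 64-bit overflow. */
	assert(uptime_sec < (1ULL << 40));
	return (uint64_t)(uint32_t)(-300 * (int64_t)SIM_HZ) + uptime_sec * SIM_HZ;
}

/* Mixed-domain deadline: raw jiffies converted to milliseconds (truncated
 * to 32 bits, as an unsigned int result would be) and then to seconds.
 * The result is NOT on the same time base as monotonic seconds. */
static int64_t mixed_domain_deadline(uint64_t expires_jiffies)
{
	uint32_t msecs = (uint32_t)(expires_jiffies * 1000ULL / SIM_HZ);

	return (int64_t)(msecs / 1000U);
}

/* Same-domain alternative: subtract two jiffies values first, then convert
 * the signed difference to seconds. Negative means the timer has expired. */
static int64_t same_domain_remaining(uint64_t expires_jiffies,
				     uint64_t now_jiffies)
{
	return (int64_t)(expires_jiffies - now_jiffies) / (int64_t)SIM_HZ;
}
```

With an uptime of 8201 seconds and a timer due 150 seconds later, the mixed-domain value is 8051, which is already "in the past" relative to a monotonic `now` of 8201, while the same-domain difference correctly reports 150 seconds remaining.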
| Comments |
| Comment by Gerrit Updater [ 13/Dec/18 ] |
|
Sergey Cheremencev (c17829@cray.com) uploaded a new patch: https://review.whamcloud.com/33848 |
| Comment by James A Simmons [ 13/Dec/18 ] |
|
What you are suggesting is that the seconds since boot don't match the jiffies mapped to seconds since boot. Note that if you build Lustre on a system with CONFIG_HZ=1000 and install it on a system with CONFIG_HZ=100, this patch will break. |
| Comment by Andreas Dilger [ 13/Dec/18 ] |
|
It would also be useful to improve the error message "If you do not want to wait more, please abort the recovery by force." to be more specific, like "Please run 'lctl --device fs1-OST0000 abort_recovery' to force recovery to finish. This evicts clients and may cause application IO errors." |
| Comment by Andreas Dilger [ 13/Dec/18 ] |
|
This problem was introduced in patch https://review.whamcloud.com/29295 |
| Comment by James A Simmons [ 14/Dec/18 ] |
|
The reason for this is that jiffies is initialized to 5 minutes before the machine actually boots. So jiffies starts at -300 * HZ while ktime actually starts at 0. I will look carefully at the difference between the two some time tomorrow. |
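The offset described in this comment can be checked with a short user-space sketch. It assumes a 64-bit `unsigned long` and uses the same expression shape as the kernel's `INITIAL_JIFFIES` definition; `initial_jiffies` and `jiffies_as_seconds` are hypothetical helper names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Initial value of the jiffies counter for a given tick rate, assuming a
 * 64-bit unsigned long: (unsigned long)(unsigned int)(-300 * HZ). */
static uint64_t initial_jiffies(unsigned int hz)
{
	assert(hz > 0);
	return (uint64_t)(uint32_t)(-300 * (int64_t)hz);
}

/* Seconds obtained by naively dividing the raw jiffies counter by HZ,
 * for a machine that has been up for uptime_sec seconds. */
static uint64_t jiffies_as_seconds(uint64_t uptime_sec, unsigned int hz)
{
	return (initial_jiffies(hz) + uptime_sec * hz) / hz;
}
```

For a tick rate of 1000, `jiffies_as_seconds(8201, 1000)` is 4302868, while a monotonic clock that starts at 0 would read 8201 at the same moment. The size of the offset also depends on the tick rate, so a conversion that bakes in one HZ value does not carry over to a kernel built with another.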
| Comment by Gerrit Updater [ 14/Dec/18 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33857 |
| Comment by Gerrit Updater [ 17/Dec/18 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33883 |
| Comment by Gerrit Updater [ 04/Jan/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33857/ |
| Comment by Peter Jones [ 04/Jan/19 ] |
|
Andreas's patch just landed. Do we need Sergey's and/or James's too or can they now be abandoned? |
| Comment by James A Simmons [ 04/Jan/19 ] |
|
The patch that landed was a fix for something else. We need to land the other patch. |
| Comment by Gerrit Updater [ 08/Apr/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33883/ |
| Comment by James A Simmons [ 08/Apr/19 ] |
|
Patch landed that resolves this issue. |
| Comment by Gerrit Updater [ 09/Apr/19 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/34626 |
| Comment by Gerrit Updater [ 10/Apr/19 ] |
|
James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34629 |
| Comment by James A Simmons [ 10/Apr/19 ] |
|
So something changed that now makes this patch fail. Even the version backported to 2.12 doesn't have these kinds of failures. |
| Comment by Gerrit Updater [ 11/Apr/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34629/ |
| Comment by Gerrit Updater [ 18/Apr/19 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/34710 |
| Comment by Gerrit Updater [ 25/May/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34710/ |
| Comment by James A Simmons [ 25/May/19 ] |
|
Fix landed. We just need to let it soak. |
| Comment by Gerrit Updater [ 20/Jun/19 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/35276 |
| Comment by Gerrit Updater [ 11/Aug/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35276/ |