[LU-13114] Orphan changelog cleaning process at mount blocking for hours
Created: 07/Jan/20
Updated: 16/Jan/22
Resolved: 16/Jan/22

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.3
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Stephane Thiell
Assignee: Mikhail Pershin
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Environment:

CentOS 7.6, Lustre 2.12.3_4


Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Following a changelog-related crash reported in LU-13113, MDT0 took ~2h20 to mount:

Jan 04 18:56:01 fir-md1-s1 kernel: Lustre: fir-MDD0000: changelog on
Jan 04 18:56:01 fir-md1-s1 kernel: Lustre: 21788:0:(mdd_device.c:542:mdd_changelog_llog_init()) fir-MDD0000 : orphan changelog records found, starting from index 19457684034 to index 20588833107, being cleared now
Jan 04 21:16:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: in recovery but waiting for the first client to connect
Jan 04 21:16:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Will be in recovery for at least 5:00, or until 1286 clients reconnect
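
For scale, the figures in the log excerpt above can be turned into a rough record count and clearing rate. This is only a back-of-the-envelope sketch with three assumptions that the log does not confirm: the year is taken to be 2020 (from the ticket creation date), the reported index range is treated as inclusive, and the whole gap between the "being cleared now" message and the first recovery message is attributed to orphan record cleanup.

```python
from datetime import datetime

# Assumption: the syslog timestamps carry no year; 2020 is taken from the ticket date.
ASSUMED_YEAR = "2020"
FMT = "%Y %b %d %H:%M:%S"


def parse(ts: str) -> datetime:
    """Parse a syslog-style timestamp such as 'Jan 04 18:56:01'."""
    return datetime.strptime(f"{ASSUMED_YEAR} {ts}", FMT)


# Timestamps copied from the log lines above.
start = parse("Jan 04 18:56:01")   # orphan records found, being cleared
end = parse("Jan 04 21:16:30")     # MDT enters recovery
elapsed = end - start

# Index range copied from the mdd_changelog_llog_init() message.
first_index = 19457684034
last_index = 20588833107
# Assumption: both endpoints are included in the orphan range.
orphan_count = last_index - first_index + 1

# Assumption: the entire interval was spent clearing orphan records.
rate = orphan_count / elapsed.total_seconds()

print(f"elapsed: {elapsed}")
print(f"orphan records: {orphan_count:,}")
print(f"approx. clearing rate: {rate:,.0f} records/s")
```

Under those assumptions the range covers a little over 1.1 billion records, which is consistent with a mount delay on the order of hours.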

I guess this might be a consequence of leaking changelogs. Changelog readers hang quite frequently, and when that happens we see "fir-MDD0000: catalog [0x5:0xa:0x0] crosses index zero".



 Comments   
Comment by Peter Jones [ 07/Jan/20 ]

Mike

Could you please advise?

Thanks

Peter
