[LU-13114] Orphan changelog cleaning process at mount blocking for hours Created: 07/Jan/20 Updated: 16/Jan/22 Resolved: 16/Jan/22 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Stephane Thiell | Assignee: | Mikhail Pershin |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
CentOS 7.6, Lustre 2.12.3_4 |
||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
Following a changelog-related crash reported in LU-13113, MDT0 took ~2h20 to mount:

Jan 04 18:56:01 fir-md1-s1 kernel: Lustre: fir-MDD0000: changelog on
Jan 04 18:56:01 fir-md1-s1 kernel: Lustre: 21788:0:(mdd_device.c:542:mdd_changelog_llog_init()) fir-MDD0000 : orphan changelog records found, starting from index 19457684034 to index 20588833107, being cleared now
Jan 04 21:16:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: in recovery but waiting for the first client to connect
Jan 04 21:16:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Will be in recovery for at least 5:00, or until 1286 clients reconnect

I guess this might be a consequence of leaking changelogs. Readers do hang quite frequently and we see "fir-MDD0000: catalog [0x5:0xa:0x0] crosses index zero" when this happens. |
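For scale, the figures in the log excerpt above can be turned into a rough record count and clearing rate. This is only a back-of-the-envelope sketch: the year (2020) is assumed from the ticket creation date, the index range is treated as a simple difference, and the whole interval between the "being cleared now" message and the first recovery message is attributed to orphan cleanup, which makes the rate a lower bound at best.

```python
from datetime import datetime

# Index range copied from the mdd_changelog_llog_init() message.
first_orphan_index = 19457684034
last_orphan_index = 20588833107

# Timestamps copied from the log; the year is an assumption
# (the ticket was created in January 2020).
clearing_started = datetime(2020, 1, 4, 18, 56, 1)
recovery_started = datetime(2020, 1, 4, 21, 16, 30)

# Number of orphan records, taken as the plain index difference.
orphan_records = last_orphan_index - first_orphan_index

# Wall-clock time between the two messages, in seconds.
elapsed_seconds = int((recovery_started - clearing_started).total_seconds())

# Average clearing rate if the whole interval was spent on cleanup.
records_per_second = orphan_records / elapsed_seconds

print(f"orphan records: {orphan_records:,}")
print(f"elapsed:        {elapsed_seconds} s")
print(f"average rate:   {records_per_second:,.0f} records/s")
```

That is roughly 1.13 billion orphan records cleared over about 2h20, which is consistent with the mount time reported above.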
| Comments |
| Comment by Peter Jones [ 07/Jan/20 ] |
|
Mike, could you please advise? Thanks, Peter |