LU-13114 (project: Lustre)

Orphan changelog cleaning process at mount blocking for hours


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.3
    • Labels: None
    • Environment: CentOS 7.6, Lustre 2.12.3_4
    • Severity: 3

    Description

      Following the changelog-related crash reported in LU-13113, MDT0 took about 2 hours 20 minutes to mount:

      Jan 04 18:56:01 fir-md1-s1 kernel: Lustre: fir-MDD0000: changelog on
      Jan 04 18:56:01 fir-md1-s1 kernel: Lustre: 21788:0:(mdd_device.c:542:mdd_changelog_llog_init()) fir-MDD0000 : orphan changelog records found, starting from index 19457684034 to index 20588833107, being cleared now
      
      Jan 04 21:16:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: in recovery but waiting for the first client to connect
      Jan 04 21:16:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Will be in recovery for at least 5:00, or until 1286 clients reconnect
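To get a sense of scale, the index range in the log can be turned into a record count and an average clearing rate. This is a back-of-the-envelope sketch that assumes the reported indices are inclusive and contiguous, and that the whole gap between the two log timestamps was spent clearing orphan records (other mount steps probably account for part of it).

```python
from datetime import datetime

# Indices copied from the mdd_changelog_llog_init() message in the log above.
FIRST_ORPHAN = 19457684034
LAST_ORPHAN = 20588833107

# Timestamps of the "changelog on" and "in recovery" messages (same day).
start = datetime.strptime("18:56:01", "%H:%M:%S")
end = datetime.strptime("21:16:30", "%H:%M:%S")

# Assumption: the range is inclusive and has no gaps.
orphans = LAST_ORPHAN - FIRST_ORPHAN + 1
elapsed = (end - start).total_seconds()

# Assumption: the entire interval was spent clearing orphan records.
rate = orphans / elapsed

print(f"orphan records: {orphans:,}")
print(f"elapsed: {elapsed:.0f} s")
print(f"average rate: {rate:,.0f} records/s")
```

Under those assumptions the range covers roughly 1.13 billion records, and the average works out to about 134,000 records per second over the 8,429 s gap, so the mount time appears to be driven mainly by the sheer size of the orphan range.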
      

      I suspect this is a consequence of leaked changelog records. Changelog readers hang quite frequently, and when they do we see "fir-MDD0000: catalog [0x5:0xa:0x0] crosses index zero" in the logs.
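The "crosses index zero" warning refers to a catalog whose live entries have wrapped around the end of its index space. As a generic illustration only (a hypothetical circular index, not Lustre's llog code or on-disk format), the sketch below shows why such a range has to be walked in two passes: from the oldest entry to the end of the index space, then from zero up to the newest entry. The function name `live_slots` and its parameters are made up for this example.

```python
from __future__ import annotations


def live_slots(first: int, last: int, size: int) -> list[int]:
    """Return live slot indices of a hypothetical circular index, oldest first.

    first: slot of the oldest live entry; last: slot of the newest one.
    When last < first, the live range has wrapped ("crossed index zero")
    and is visited in two passes: first..size-1, then 0..last.
    """
    if not (0 <= first < size and 0 <= last < size):
        raise ValueError("indices must lie in [0, size)")
    if first <= last:
        # No wrap: a single contiguous pass.
        return list(range(first, last + 1))
    # Wrapped: tail of the index space, then restart from zero.
    return list(range(first, size)) + list(range(0, last + 1))


print(live_slots(2, 5, 8))  # no wrap
print(live_slots(6, 2, 8))  # wrapped past zero
```

With 8 slots, `live_slots(6, 2, 8)` yields `[6, 7, 0, 1, 2]`, which is the kind of wrapped range the warning reports.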


          People

            Assignee: Mikhail Pershin (tappro)
            Reporter: Stephane Thiell (sthiell)
            Votes: 0
            Watchers: 4
