LU-7011: Kernel part of llog subsystem can do self-repairing in some cases

Details

    • Type: Improvement
    • Resolution: Duplicate
    • Priority: Major
    • None
    • Lustre 2.8.0

    Description

      While working on the LU-6696 ticket, a tool to repair corrupted llog catalogs was introduced. The same job could be done in kernel code to repair llogs online where possible.

      Attachments

        Issue Links

          Activity



            adilger Andreas Dilger added a comment - Fixed via a number of other patches, probably https://review.whamcloud.com/48776 ("LU-16203 llog: skip bad records in llog") is the most important one.


            adilger Andreas Dilger added a comment - tappro, I think you fixed the kernel llog code a few years ago?


            tappro Mikhail Pershin added a comment - In fact, I think we should find the reason for these issues with the bad tail; it is not normal behavior and something is definitely wrong there. I mean it is not some sort of corruption due to disk issues, etc., but an issue in our code that causes it.

            di.wang Di Wang added a comment -

                Di, it is possible only for fixed size llog in fact; the catalog is the only real example we have now.

            Yes, it seems that except for the catalog, all of the important plain logs are not fixed size: the changelog, config log, update log, and unlink log are all variable size. Unfortunately, most of the corruption seems to happen in plain logs, at least in DNE testing. Actually, most of the cases are a header and tail that do not match each other (lrh_len != tail_len or lrh_idx != tail_index), and some of those checks are even LASSERTs; we should probably change them to CERRORs.
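
            For illustration, a minimal sketch of such a header/tail consistency check is shown below. The struct definitions are simplified stand-ins for the real Lustre ones, and llog_verify_record() is a hypothetical helper name; the point is only that a mismatch becomes an error the caller can report (e.g. via CERROR) and skip, instead of an LASSERT that takes down the server.

            #include <errno.h>
            #include <stdint.h>

            /*
             * Simplified stand-ins for the Lustre llog record header and tail;
             * the real definitions live in the Lustre headers.
             */
            struct llog_rec_hdr {
                uint32_t lrh_len;    /* total record length, including the tail */
                uint32_t lrh_index;  /* record index within the llog */
                uint32_t lrh_type;   /* record type */
                uint32_t lrh_id;
            };

            struct llog_rec_tail {
                uint32_t lrt_len;    /* expected to match lrh_len */
                uint32_t lrt_index;  /* expected to match lrh_index */
            };

            /*
             * Hypothetical helper: check that a record's header and tail agree.
             * Returning -EINVAL lets the caller log the corruption and skip the
             * record rather than asserting.
             */
            static int llog_verify_record(const struct llog_rec_hdr *hdr)
            {
                const struct llog_rec_tail *tail;

                if (hdr->lrh_len < sizeof(*hdr) + sizeof(*tail))
                    return -EINVAL;  /* too short to hold a header and a tail */

                tail = (const struct llog_rec_tail *)
                       ((const char *)hdr + hdr->lrh_len - sizeof(*tail));

                if (tail->lrt_len != hdr->lrh_len ||
                    tail->lrt_index != hdr->lrh_index)
                    return -EINVAL;  /* header and tail do not match */

                return 0;
            }

            A caller in the record-processing path could then report the mismatch and continue with the next record, which is essentially the behavior the comments above are asking for.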


            adilger Andreas Dilger added a comment - It seems possible to do at least some basic repair of variable-sized llog records. For example, if a corrupt llog record is found (i.e. hdr len != tail len), one option would be to scan the rest of the chunk for potential matching llog hdr/tail pairs that allow resyncing the stream. A second option (easier to implement, but recovers fewer logs) would be to jump to the start of the next llog chunk and clear the records between the corrupt chunk and the start of the new chunk.
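
            As a rough sketch of the first option (resyncing within the chunk), reusing the simplified structures and the llog_verify_record() helper from the sketch above (again hypothetical names, not the actual llog code), the scan could advance through the rest of the chunk until it finds an offset where a header and its tail agree, and otherwise fall back to the next chunk boundary:

            #include <stddef.h>
            #include <stdint.h>

            /*
             * After hitting a corrupt record at 'bad_off' within one chunk, scan
             * forward for the next offset at which a plausible header and a
             * matching tail are found, so processing can resynchronize with the
             * record stream.  If nothing matches, return 'chunk_len' so that the
             * caller continues at the next chunk boundary (the second, simpler
             * option) and clears the skipped range.  Real code would also need
             * to handle alignment and on-disk byte order.
             */
            static size_t llog_resync_offset(const char *chunk, size_t chunk_len,
                                             size_t bad_off)
            {
                const size_t min_rec = sizeof(struct llog_rec_hdr) +
                                       sizeof(struct llog_rec_tail);
                size_t off;

                for (off = bad_off + 1; off + min_rec <= chunk_len; off++) {
                    const struct llog_rec_hdr *hdr =
                        (const struct llog_rec_hdr *)(chunk + off);

                    /* a candidate record must fit entirely inside the chunk */
                    if (hdr->lrh_len < min_rec || off + hdr->lrh_len > chunk_len)
                        continue;

                    if (llog_verify_record(hdr) == 0)
                        return off;  /* header/tail pair found: resync here */
                }

                return chunk_len;  /* no match: skip to the next chunk */
            }

            Whether the resynchronized records can be trusted, or the whole skipped range should simply be cleared as in the second option, is a policy decision for the caller; the sketch only shows how the scan itself could work.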

            tappro Mikhail Pershin added a comment - Andreas, I agree in general; the difference is that we are more restricted inside the kernel, e.g. we can't just do the repair in the current context, but have to start a separate repair thread that accesses that llog exclusively. I mean that the tool is much simpler to implement than auto-repair: there are no problems with concurrent access, transactions, etc. Meanwhile, I agree that auto-repair is preferable and I am going to implement at least some basic checks/repairs.

            Di, it is possible only for fixed size llog in fact; the catalog is the only real example we have now.

            di.wang Di Wang added a comment - Mike, just curious, will you check/repair both catalog and plain log in this patch? Thanks


            adilger Andreas Dilger added a comment - It makes more sense to have the llog code repair, skip, and/or clear broken records rather than using an external tool. If the external tool can detect and fix these problems (after the user's MDS has crashed and they have waited all night to figure out the problem and run the tool), why not just add enough checks into the llog processing to clean it up immediately? That avoids the MDS downtime, and avoids the need for the user to even know an llog repair tool exists and that they need to run it.

            adilger Andreas Dilger added a comment - The tool is http://review.whamcloud.com/15245


            tappro Mikhail Pershin added a comment - I think we need both the tool and online repair. Start with the tool for now.

            People

              Assignee: Mikhail Pershin (tappro)
              Reporter: Mikhail Pershin (tappro)
              Votes: 0
              Watchers: 8
