  1. Lustre
  2. LU-7010

"Local llog found corrupted" during DNE2 recovery

Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • Lustre 2.8.0

    Description

      Recent recovery issues in Maloo show the following:

      00000040:00020000:1.0:1439669743.543046:0:5740:0:(llog.c:489:llog_process_thread()) Local llog found corrupted
      00000040:00100000:1.0:1439669743.545890:0:5740:0:(llog.c:167:llog_cancel_rec()) Canceling 1 in log 0x1:1024
      00000040:00100000:1.0:1439669743.546205:0:5740:0:(llog.c:167:llog_cancel_rec()) Canceling 64838 in log 0x1:1024
      00000040:00100000:1.0:1439669743.546229:0:5740:0:(llog.c:167:llog_cancel_rec()) Canceling 64864 in log 0x1:1024
      00000040:00100000:1.0:1439669743.546242:0:5740:0:(llog.c:167:llog_cancel_rec()) Canceling 64896 in log 0x1:1024
      00000040:00100000:1.0:1439669743.546254:0:5740:0:(llog.c:167:llog_cancel_rec()) Canceling 64897 in log 0x1:1024
      00000040:00100000:1.0:1439669743.546267:0:5740:0:(llog.c:167:llog_cancel_rec()) Canceling 64899 in log 0x1:1024
      

      As far as I can see, the DNE2 'update recovery' may return -EIO if some update was applied with an error. That causes the whole llog processing to stop and all remaining updates to be cancelled. After that, recovery stops with various errors.

      Here is an example, test_70b:
      https://testing.hpdd.intel.com/test_sets/1a8282a6-43d8-11e5-a4bc-5254006e85c2
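
      The failure mode described above can be illustrated with a toy model. This is only a sketch, not the Lustre code: struct toy_rec, toy_apply() and toy_process() are hypothetical names made up for illustration, and the loop merely assumes the behaviour reported here, namely that the first callback error aborts processing and every record not yet applied is cancelled.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one update record; NOT a Lustre structure. */
struct toy_rec {
    int index;      /* record index within the log */
    int apply_rc;   /* result the toy callback returns for this record */
    int applied;    /* set to 1 once the update was applied successfully */
    int cancelled;  /* set to 1 if the record was dropped unapplied */
};

/* Toy callback: "applies" one update and reports its stored result. */
static int toy_apply(struct toy_rec *rec)
{
    if (rec->apply_rc == 0)
        rec->applied = 1;
    return rec->apply_rc;
}

/*
 * Toy processing loop (assumed behaviour, for illustration only):
 * stop at the first callback error, then cancel the failing record
 * and every record after it.
 * Returns 0 on success or the first negative errno value.
 */
static int toy_process(struct toy_rec *recs, size_t n)
{
    int rc = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        rc = toy_apply(&recs[i]);
        if (rc < 0)
            break;  /* one failed update aborts the whole pass */
    }

    if (rc < 0) {
        /* Everything from the failing record onward is cancelled. */
        for (; i < n; i++)
            recs[i].cancelled = 1;
    }
    return rc;
}
```

      With four records where only the second one fails with -EIO, toy_process() returns -EIO, the first record is applied, and the remaining three end up cancelled even though two of them would have applied cleanly.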

            People

              di.wang Di Wang
              tappro Mikhail Pershin