LU-7010: "Local llog found corrupted" during DNE2 recovery


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Lustre 2.8.0

      Recent recovery issues in Maloo show the following:

      00000040:00020000:1.0:1439669743.543046:0:5740:0:(llog.c:489:llog_process_thread()) Local llog found corrupted
      00000040:00100000:1.0:1439669743.545890:0:5740:0:(llog.c:167:llog_cancel_rec()) Canceling 1 in log 0x1:1024
      00000040:00100000:1.0:1439669743.546205:0:5740:0:(llog.c:167:llog_cancel_rec()) Canceling 64838 in log 0x1:1024
      00000040:00100000:1.0:1439669743.546229:0:5740:0:(llog.c:167:llog_cancel_rec()) Canceling 64864 in log 0x1:1024
      00000040:00100000:1.0:1439669743.546242:0:5740:0:(llog.c:167:llog_cancel_rec()) Canceling 64896 in log 0x1:1024
      00000040:00100000:1.0:1439669743.546254:0:5740:0:(llog.c:167:llog_cancel_rec()) Canceling 64897 in log 0x1:1024
      00000040:00100000:1.0:1439669743.546267:0:5740:0:(llog.c:167:llog_cancel_rec()) Canceling 64899 in log 0x1:1024
      

      As far as I can see, DNE2 'update recovery' may return -EIO if some update was applied with an error. That causes the whole llog processing to stop and all other updates to be canceled. After that, recovery stops with various errors.
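
      The failure mode described above can be modelled with a minimal sketch in C. This is not the Lustre llog code: the names update_rec, apply_update and process_log_abort_on_error are hypothetical, and the model simply assumes that the first failed update ends the pass and that every record not yet replayed is then canceled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical record states; this is not Lustre's llog format. */
enum rec_state { REC_PENDING, REC_APPLIED, REC_CANCELED };

struct update_rec {
    int index;              /* record index within the log */
    int apply_rc;           /* simulated result of replaying the update */
    enum rec_state state;
};

/* Replay one update; returns 0 on success or a negative errno. */
static int apply_update(struct update_rec *rec)
{
    if (rec->apply_rc == 0)
        rec->state = REC_APPLIED;
    return rec->apply_rc;
}

/*
 * Assumed model of the reported behaviour: the first failed update
 * aborts the pass, and the failed record plus everything after it is
 * canceled without being replayed. Returns the first error, or 0.
 */
static int process_log_abort_on_error(struct update_rec *recs, size_t n)
{
    int rc = 0;
    size_t i;

    assert(recs != NULL || n == 0);

    for (i = 0; i < n; i++) {
        rc = apply_update(&recs[i]);
        if (rc < 0)
            break;          /* a single error stops the whole pass */
    }

    /* Cancel the failed record and all records that were never reached. */
    for (; i < n; i++)
        recs[i].state = REC_CANCELED;

    return rc;
}
```

      With a four-record log in which only the second record fails with -EIO, this model applies the first record, cancels the remaining three, and returns -EIO to the caller, which is the pattern the ticket describes: one bad update discards otherwise valid updates behind it.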

      Here is an example, test_70b:
      https://testing.hpdd.intel.com/test_sets/1a8282a6-43d8-11e5-a4bc-5254006e85c2

            di.wang Di Wang (Inactive)
            tappro Mikhail Pershin