[LU-8886] LFSCK failed to resume from the last checkpoint

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.10.0
    • None
    • None
    • 3

    Description

      LFSCK failed just after resuming from the last checkpoint. The log is as follows:

      00100000:10000000:8.0:1480550565.633225:0:8251:0:(lfsck_lib.c:2573:lfsck_post_generic()) soaked-MDT0000-osd: waiting for assistant to do lfsck_layout post, rc = -61
      00100000:10000000:8.0:1480550565.633247:0:8254:0:(lfsck_engine.c:1781:lfsck_assistant_engine()) soaked-MDT0000-osd: LFSCK assistant unknown status: rc = 0
      00100000:10000000:8.0:1480550565.633252:0:8254:0:(lfsck_engine.c:1805:lfsck_assistant_engine()) soaked-MDT0000-osd: LFSCK assistant sync before exit
      00100000:10000000:25.0:1480550565.633853:0:8253:0:(lfsck_engine.c:1805:lfsck_assistant_engine()) soaked-MDT0000-osd: LFSCK assistant sync before exit
      00100000:10000000:8.0:1480550565.636178:0:8254:0:(lfsck_engine.c:1811:lfsck_assistant_engine()) soaked-MDT0000-osd: LFSCK assistant synced before exit: rc = 0
      00100000:10000000:8.0:1480550565.636190:0:8254:0:(lfsck_engine.c:1838:lfsck_assistant_engine()) soaked-MDT0000-osd: lfsck_namespace LFSCK assistant thread exit: rc = 0
      00100000:10000000:25.0:1480550565.650715:0:8253:0:(lfsck_engine.c:1811:lfsck_assistant_engine()) soaked-MDT0000-osd: LFSCK assistant synced before exit: rc = 0
      00100000:10000000:25.0:1480550565.650725:0:8253:0:(lfsck_engine.c:1838:lfsck_assistant_engine()) soaked-MDT0000-osd: lfsck_layout LFSCK assistant thread exit: rc = -61
      00100000:10000000:24.0:1480550565.650761:0:8251:0:(lfsck_lib.c:2585:lfsck_post_generic()) soaked-MDT0000-osd: the assistant has done lfsck_layout post, rc = -61
      00100000:10000000:24.0:1480550565.650810:0:8251:0:(lfsck_layout.c:4680:lfsck_layout_master_post()) soaked-MDT0000-osd: layout LFSCK master post done: rc = 0
      00100000:10000000:24.0:1480550565.650813:0:8251:0:(lfsck_lib.c:2573:lfsck_post_generic()) soaked-MDT0000-osd: waiting for assistant to do lfsck_namespace post, rc = -61
      00100000:10000000:24.0:1480550565.650815:0:8251:0:(lfsck_lib.c:2585:lfsck_post_generic()) soaked-MDT0000-osd: the assistant has done lfsck_namespace post, rc = -61
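      As a reading aid for the log above: on Linux, rc = -61 corresponds to -ENODATA. The following is a minimal sketch, not part of Lustre, that extracts the trailing return code from a log line and names it. The helpers demo_parse_rc and demo_rc_name are hypothetical names introduced here for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: find the last "rc = N" in a log line.
 * Returns 1 and stores N in *rc on success, 0 if no "rc = " is present.
 */
static int demo_parse_rc(const char *line, int *rc)
{
    const char *p = line;
    const char *last = NULL;

    assert(line != NULL && rc != NULL);

    /* Walk forward so that the final occurrence wins. */
    while ((p = strstr(p, "rc = ")) != NULL) {
        last = p;
        p += 5;
    }
    if (last == NULL)
        return 0;

    *rc = (int)strtol(last + 5, NULL, 10);
    return 1;
}

/* Hypothetical helper: name the two return codes seen in this ticket. */
static const char *demo_rc_name(int rc)
{
    if (rc == 0)
        return "success";
    if (rc == -ENODATA)
        return "-ENODATA";
    return "other";
}
```

      Feeding it a line ending in "lfsck_layout post, rc = -61" stores -61, which demo_rc_name reports as -ENODATA on Linux.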
      

        Activity

          No, has that patch landed on master? If not, why not?

          Cliff,

          As James mentioned, that patch has already landed in master. But I am not sure whether the build you are testing on Lola contains it.

          yong.fan nasf (Inactive) added a comment

          LU-8647 landed on master in October under the ticket number LU-8569 at https://review.whamcloud.com/#/c/22723/
          (at least, that is what is stated in the ticket LU-8647)

          jamesanunez James Nunez (Inactive) added a comment

          No, has that patch landed on master? If not, why not?

          cliffw Cliff White (Inactive) added a comment

          The new failure looks like LU-8647. Have you applied the patch from https://jira.hpdd.intel.com/browse/LU-8647 in the test?

          yong.fan nasf (Inactive) added a comment

          Tried the patch; it does not appear to be working. The first time LFSCK started, the node crashed:

          Dec  2 09:56:17 lola-8 kernel: LustreError: 14454:0:(lfsck_namespace.c:4492:lfsck_namespace_double_scan()) ASSERTION( list_empty(&lad->lad_req_list) ) failed:
          Dec  2 09:56:17 lola-8 kernel: LustreError: 14454:0:(lfsck_namespace.c:4492:lfsck_namespace_double_scan()) LBUG
          Dec  2 09:56:17 lola-8 kernel: Pid: 14454, comm: lfsck
          Dec  2 09:56:17 lola-8 kernel:
          Dec  2 09:56:17 lola-8 kernel: Call Trace:
          Dec  2 09:56:17 lola-8 kernel: [<ffffffffa081e875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
          Dec  2 09:56:17 lola-8 kernel: [<ffffffffa081e9bf>] lbug_with_loc+0x3f/0x90 [libcfs]
          Dec  2 09:56:17 lola-8 kernel: [<ffffffffa11d1c2e>] lfsck_namespace_double_scan+0xee/0x120 [lfsck]
          Dec  2 09:56:17 lola-8 kernel: [<ffffffffa11cee40>] lfsck_master_engine+0x590/0x1460 [lfsck]
          Dec  2 09:56:17 lola-8 kernel: [<ffffffff81067650>] ? default_wake_function+0x0/0x20
          Dec  2 09:56:17 lola-8 kernel: [<ffffffffa11ce8b0>] ? lfsck_master_engine+0x0/0x1460 [lfsck]
          Dec  2 09:56:17 lola-8 kernel: [<ffffffff810a138e>] kthread+0x9e/0xc0
          Dec  2 09:56:17 lola-8 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
          Dec  2 09:56:17 lola-8 kernel: [<ffffffff810a12f0>] ? kthread+0x0/0xc0
          Dec  2 09:56:17 lola-8 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
          
          cliffw Cliff White (Inactive) added a comment
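          The assertion in the trace above fires because list_empty(&lad->lad_req_list) was false, meaning the request list still held entries when lfsck_namespace_double_scan() ran. As a minimal userspace sketch of what that check means for a circular doubly linked list (the demo_ names are hypothetical and this is not the Lustre or kernel source), a list head is empty exactly when it points back at itself:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel-style circular list head. */
struct demo_list_head {
    struct demo_list_head *next;
    struct demo_list_head *prev;
};

/* An initialized, empty head points at itself in both directions. */
static void demo_list_init(struct demo_list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* The emptiness check: no entries are linked if next is the head itself. */
static int demo_list_empty(const struct demo_list_head *h)
{
    return h->next == h;
}

/* Link entry n just before the head, i.e. at the tail of the list. */
static void demo_list_add_tail(struct demo_list_head *n,
                               struct demo_list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Unlink entry n and leave it self-linked. */
static void demo_list_del(struct demo_list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n;
    n->prev = n;
}
```

          In this sketch, a request that is queued but never removed leaves demo_list_empty() returning 0, which is the condition the assertion rejects.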

          Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/24056
          Subject: LU-8886 lfsck: handle -ENODATA for the end of iteration
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: f6a09af4d58f7d49e0988a9a8e595b1b152d972b

          gerrit Gerrit Updater added a comment
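          The patch subject suggests that -ENODATA returned at the end of an iteration should not be reported as a failure. The patch itself is not quoted in this ticket, so the following is only an illustrative sketch of that idea under that assumption, not the actual change. The names demo_iter, demo_iter_next, and demo_scan are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical iterator over a fixed array of items. */
struct demo_iter {
    const int *items;
    size_t count;
    size_t pos;
};

/* Returns 0 and stores the next item, or -ENODATA once the items run out. */
static int demo_iter_next(struct demo_iter *it, int *out)
{
    assert(it != NULL && out != NULL);

    if (it->pos >= it->count)
        return -ENODATA;

    *out = it->items[it->pos++];
    return 0;
}

/*
 * Sum all items. -ENODATA from the iterator only marks the end of the
 * iteration, so it is mapped to 0; any other error would be passed up.
 */
static int demo_scan(struct demo_iter *it, long *sum)
{
    int value;
    int rc;

    *sum = 0;
    while ((rc = demo_iter_next(it, &value)) == 0)
        *sum += value;

    if (rc == -ENODATA)
        rc = 0; /* end of iteration, not a failure */

    return rc;
}
```

          Without the final mapping, a caller of demo_scan would see -61 after a complete scan, which resembles the rc = -61 lines in the description.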

          People

            yong.fan nasf (Inactive)
            yong.fan nasf (Inactive)