
LFSCK fails to start, hangs systems.

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.12.0, Lustre 2.10.5
    • Affects Version/s: Lustre 2.11.0, Lustre 2.10.2, Lustre 2.10.3
    • Environment: Soak performance cluster - Lustre version=2.10.2_4_gb151f34
    • Severity: 3

    Description

      We perform an OSS failover, then trigger LFSCK:

      lctl lfsck_start -M soaked-MDT0000 -s 1000 -t all -A

      The lfsck_start command hangs and LFSCK is never started. The clients wedge in state 'comp', and then the entire system wedges. I have dumped the Lustre logs from all MDS nodes (attached). I have also crash-dumped all the MDT nodes; the dumps are available on Spirit. lfsck_layout is unkillable.
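
      For reference, a minimal sketch of how the start command above could be wrapped so that a hung start does not block the calling shell, followed by a status check and a stop request. The DRY_RUN switch, the 60-second limit, and the function names are hypothetical and not part of the original report. The sketch assumes timeout(1) from coreutils is available, and it only bounds the lctl process itself: it cannot terminate a kernel thread that is already stuck, such as the unkillable lfsck_layout described above.

```shell
#!/bin/sh
# Sketch only. DRY_RUN, the 60 s limit and the function names are hypothetical;
# the MDT name and lfsck_start options are copied from the report.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print each command instead of running it

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start LFSCK with the options from the report, bounded by timeout(1).
# -M: MDT device, -s: speed limit, -t: LFSCK type, -A: all devices
lfsck_start_bounded() {
    run timeout 60 lctl lfsck_start -M "$1" -s 1000 -t all -A
}

# Print the layout LFSCK state for the given MDT.
lfsck_layout_status() {
    run lctl get_param -n "mdd.$1.lfsck_layout"
}

# Ask LFSCK to stop on all devices.
lfsck_stop_all() {
    run lctl lfsck_stop -M "$1" -A
}

lfsck_start_bounded soaked-MDT0000
lfsck_layout_status soaked-MDT0000
lfsck_stop_all soaked-MDT0000
```

      With DRY_RUN left at its default the script only prints the three commands; setting DRY_RUN=0 on an MDS node would execute them.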

      Attachments

        1. soak-10.lustre.log.gz
          2.57 MB
        2. soak-11.lustre.log.gz
          2.22 MB
        3. soak-8.lustre.log.gz
          2.14 MB
        4. soak-9.lustre.log.gz
          2.33 MB


          Activity

            [LU-10419] LFSCK fails to start, hangs systems.

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31600/
            Subject: Revert "LU-10419 lfsck: skip dead target"
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 9ba637b8949b1b8a5f2506e654a9b62d5c0cc245

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: https://review.whamcloud.com/31600
            Subject: Revert "LU-10419 lfsck: skip dead target"
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 1387fa1c012dfdf5eb4f90efeb06edd45788064f

            pjones Peter Jones added a comment -

            Landed for 2.11


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31475/
            Subject: LU-10419 lfsck: skip dead target
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 012834c5e7c7be50ff117cee4ac473d7fee4294d

            cliffw Cliff White (Inactive) added a comment -

            With the current patch, lfsck does not stop. I am also currently seeing mount timeouts. I crash-dumped soak-8 while lfsck was hanging; the logs are available on spirit:
            /scratch/dumps/soak-8.spirit.hpdd.intel.com/10.10.1.108-2018-03-06-19:16:47

            gerrit Gerrit Updater added a comment -

            Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/31475
            Subject: LU-10419 lfsck: skip dead target
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: be9f2eedf5039fa6308460aca6a84daa6b8003b1

            cliffw Cliff White (Inactive) added a comment -

            Logs are on spirit /scratch/logs/syslogs and /scratch/logs/console. The crash dumps are in /scratch/dumps on spirit.

            yong.fan nasf (Inactive) added a comment -

            cliffw,

            Where can I get related logs?

            Thanks!

            cliffw Cliff White (Inactive) added a comment -

            Seeing this again on a DNE-enabled system, version=2.10.57_58_gf24340c.
            I can crash-dump systems if desired.

            pjones Peter Jones added a comment -

            Landed for 2.11


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30768/
            Subject: LU-10419 lfsck: no delay for notify RPC
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 39816213632cf9083530f1a8b644459d13e3c980

            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: cliffw Cliff White (Inactive)
              Votes: 0
              Watchers: 6
