Lustre / LU-14126

parallel e2fsck does not work well with MMP

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor

    Description

      Running parallel e2fsck 1.45.6.wc2 with "-fy -m 256" on a filesystem with a large number of (otherwise trivial) errors that needed to be fixed caused e2fsck to abort because of concurrent MMP block updates:

      [Thread 240] Inode 495454238 symlink missing NUL terminator.  [Thread 240] Fix? yes
       
      MMP check failed: UNEXPECTED INCONSISTENCY: the filesystem is being modified while fsck is running.
      MMP_block:
          mmp_magic: 0x4d4d50
          mmp_check_interval: 5
          mmp_sequence: e24d4d50
          mmp_update_date: Thu Nov  5 18:46:29 2020
          mmp_update_time: 1604630789
          mmp_node_name: mdt04
          mmp_device_name: /dev/vg_mdt0003/mdt0003
      MMP check failed: UNEXPECTED INCONSISTENCY: the filesystem is being modified while fsck is running.
      MMP_block:
          mmp_magic: 0x4d4d50
          mmp_check_interval: 5
          mmp_sequence: e24d4d50
          mmp_update_date: Thu Nov  5 18:46:29 2020
          mmp_update_time: 1604630789
          mmp_node_name: mdt04
          mmp_device_name: /dev/vg_mdt0003/mdt0003
      MMP check failed: UNEXPECTED INCONSISTENCY: the filesystem is being modified while fsck is running.
      MMP_block:
          mmp_magic: 0x4d4d50
          mmp_check_interval: 5
          mmp_sequence: e24d4d50
          mmp_update_date: Thu Nov  5 18:46:29 2020
          mmp_update_time: 1604630789
          mmp_node_name: mdt04
          mmp_device_name: /dev/vg_mdt0003/mdt0003
      [Thread 63] 
      scratch-MDT0003: ***** FILE SYSTEM WAS MODIFIED *****
      [Thread 96] 
      scratch-MDT0003: ***** FILE SYSTEM WAS MODIFIED *****
      [Thread 76] 
      scratch-MDT0003: ***** FILE SYSTEM WAS MODIFIED *****
      

      Running with "-fn -m 256" worked without problems for a long time before we gave up. That run reported hundreds of "symlink missing NUL terminator" errors, a problem that was fixed in LU-1540 (included in Lustre 2.1.3, 2.4.0, and e2fsprogs-1.42.3.wc3).

      It is likely that the large number of inodes to be fixed caused two threads to update the MMP block at the same time, or that one thread checked that the MMP block had not been modified and raced with another thread that was updating it. The patch https://review.whamcloud.com/39874 "LU-8465 e2fsck: update mmp block in one thread" should already be included in the 1.45.6.wc2 release.
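
The suspected race can be illustrated with a small model. This is only a sketch and not e2fsprogs code: `MmpBlock`, `PrivateUpdater`, and `SharedUpdater` are hypothetical names, and the real MMP block carries more state than a single sequence number. It assumes each updater compares the sequence it last wrote against the current block before writing a new one. With per-thread state, the second thread's check fails as soon as the first thread has written. With a single shared, lock-protected state, the check never fails.

```python
import threading


class MmpBlock:
    """Stand-in for the on-disk MMP block (hypothetical model)."""

    def __init__(self):
        self.sequence = 0


class PrivateUpdater:
    """Each thread tracks the last sequence it wrote on its own."""

    def __init__(self, block):
        self.block = block
        self.expected = block.sequence

    def update(self):
        # If someone else bumped the sequence, this looks like a foreign
        # writer and the check fails ("filesystem is being modified").
        if self.block.sequence != self.expected:
            return False
        self.block.sequence += 1
        self.expected = self.block.sequence
        return True


class SharedUpdater:
    """All threads share one expected sequence, guarded by a lock."""

    def __init__(self, block):
        self.block = block
        self.expected = block.sequence
        self.lock = threading.Lock()

    def update(self):
        # Check and write happen atomically with respect to other threads.
        with self.lock:
            if self.block.sequence != self.expected:
                return False
            self.block.sequence += 1
            self.expected = self.block.sequence
            return True


def demo_private():
    """Two updaters with private state: the second one's check fails."""
    block = MmpBlock()
    first, second = PrivateUpdater(block), PrivateUpdater(block)
    return first.update(), second.update()


def demo_shared(n_threads=8, updates_per_thread=100):
    """Many threads through one shared updater: no check failures."""
    block = MmpBlock()
    updater = SharedUpdater(block)
    failures = []

    def worker():
        for _ in range(updates_per_thread):
            if not updater.update():
                failures.append(1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return block.sequence, len(failures)


if __name__ == "__main__":
    print(demo_private())
    print(demo_shared())
```

The lock here is just one way to serialize the check-and-update step in the model. The patch title referenced above suggests the actual approach is to perform MMP updates from a single thread, which avoids the shared-state problem in a similar way.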

            People

              wshilong Wang Shilong (Inactive)
              adilger Andreas Dilger