[LU-14126] parallel e2fsck does not work well with MMP Created: 07/Nov/20  Updated: 23/Nov/20  Resolved: 23/Nov/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Andreas Dilger Assignee: Wang Shilong (Inactive)
Resolution: Fixed Votes: 0
Labels: e2fsck

Issue Links:
Related
is related to LU-8465 parallel e2fsck performance at scale Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Running parallel e2fsck 1.45.6.wc2 with -fy -m 256 on a filesystem with a lot of (otherwise trivial) errors that need to be fixed caused e2fsck to abort because of concurrent MMP block updates:

[Thread 240] Inode 495454238 symlink missing NUL terminator.  [Thread 240] Fix? yes
 
MMP check failed: UNEXPECTED INCONSISTENCY: the filesystem is being modified while fsck is running.
MMP_block:
    mmp_magic: 0x4d4d50
    mmp_check_interval: 5
    mmp_sequence: e24d4d50
    mmp_update_date: Thu Nov  5 18:46:29 2020
    mmp_update_time: 1604630789
    mmp_node_name: mdt04
    mmp_device_name: /dev/vg_mdt0003/mdt0003
MMP check failed: UNEXPECTED INCONSISTENCY: the filesystem is being modified while fsck is running.
MMP_block:
    mmp_magic: 0x4d4d50
    mmp_check_interval: 5
    mmp_sequence: e24d4d50
    mmp_update_date: Thu Nov  5 18:46:29 2020
    mmp_update_time: 1604630789
    mmp_node_name: mdt04
    mmp_device_name: /dev/vg_mdt0003/mdt0003
MMP check failed: UNEXPECTED INCONSISTENCY: the filesystem is being modified while fsck is running.
MMP_block:
    mmp_magic: 0x4d4d50
    mmp_check_interval: 5
    mmp_sequence: e24d4d50
    mmp_update_date: Thu Nov  5 18:46:29 2020
    mmp_update_time: 1604630789
    mmp_node_name: mdt04
    mmp_device_name: /dev/vg_mdt0003/mdt0003
[Thread 63] 
scratch-MDT0003: ***** FILE SYSTEM WAS MODIFIED *****
[Thread 96] 
scratch-MDT0003: ***** FILE SYSTEM WAS MODIFIED *****
[Thread 76] 
scratch-MDT0003: ***** FILE SYSTEM WAS MODIFIED *****

Running with "-fn -m 256" worked without problems for a long time before we gave up (it reported hundreds of the "symlink missing NUL terminator" errors, an issue that was fixed in LU-1540, included in Lustre 2.1.3, 2.4.0, and e2fsprogs-1.42.3.wc3).

It is likely that the large number of inodes to be fixed caused two threads to update the MMP block at the same time, or they checked that the MMP block had not been modified and raced with another thread that updated it. The patch https://review.whamcloud.com/39874 "LU-8465 e2fsck: update mmp block in one thread" should already be included in the 1.45.6.wc2 release.
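
To illustrate the suspected race, here is a minimal Python sketch, not the e2fsprogs code and not the content of either patch. It assumes a toy in-memory "MMP block" with a sequence number; the names MmpBlock, MmpUpdater and run_workers are hypothetical. The point is only that the "check that the block is unchanged" step and the "write the new sequence" step need to happen as one atomic unit (or in a single thread), otherwise one thread can see another thread's update and misread it as an external modification.

```python
import threading


class MmpBlock:
    """Toy in-memory stand-in for the on-disk MMP block (hypothetical)."""

    def __init__(self, sequence=0):
        self.sequence = sequence


class MmpUpdater:
    """Serializes check-and-update of the toy MMP block across threads."""

    def __init__(self, block):
        self.block = block
        self.expected = block.sequence  # last sequence written by this process
        self.lock = threading.Lock()

    def update(self):
        """Return True if the block was unchanged and has now been bumped.

        Return False if the sequence differs from what this process last
        wrote, i.e. something else appears to have modified the block.
        """
        with self.lock:
            # The check and the write happen under the same lock, so another
            # worker thread cannot slip in between them.
            if self.block.sequence != self.expected:
                return False
            self.block.sequence += 1
            self.expected = self.block.sequence
            return True


def run_workers(num_threads=8, updates_per_thread=1000):
    """Run several workers that all update the same toy MMP block."""
    block = MmpBlock()
    updater = MmpUpdater(block)
    failures = []

    def worker():
        for _ in range(updates_per_thread):
            if not updater.update():
                failures.append(1)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return block.sequence, len(failures)
```

With the lock in place, run_workers() should report zero spurious "modified" failures. Dropping the lock, or keeping a separate expected sequence per thread, would let one thread's write look like a foreign modification to another thread, which is the kind of false MMP failure described above.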



 Comments   
Comment by Andreas Dilger [ 07/Nov/20 ]

Shilong, could you please take a look.

Comment by Gerrit Updater [ 08/Nov/20 ]

Wang Shilong (wshilong@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40572
Subject: LU-14126 e2fsck: update mmp block race
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: cb4e43445539dc82e1df8e646226bcccce894e16

Comment by Wang Shilong (Inactive) [ 08/Nov/20 ]

It would be nice if the patch could be applied so that pfsck can be tried at the customer site again, if possible.

Comment by Gerrit Updater [ 23/Nov/20 ]

Wang Shilong (wshilong@whamcloud.com) merged in patch https://review.whamcloud.com/40572/
Subject: LU-14126 e2fsck: update mmp block race
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set:
Commit: 23547cbe57df859f70836fc18b7e449b319f54da
