Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.3.0, Lustre 2.1.3
-
None
-
3
-
4476
Description
Currently, the get_mmp_update_interval() in mmp.sh is implemented as follows:
get_mmp_update_interval() {
    local facet=$1
    local device=$2
    local interval

    interval=$(do_facet $facet "$DEBUGFS -c -R dump_mmp $device 2>/dev/null \
        | grep 'MMP Update Interval' | cut -d' ' -f4")
    [ -z "$interval" ] && interval=1
    echo $interval
}
The 'MMP Update Interval' string no longer matches anything, because debugfs now prints the field as 'update_interval':
# debugfs -c -R dump_mmp /dev/vda5
debugfs 1.42.3.wc3 (15-Aug-2012)
/dev/vda5: catastrophic mode - not reading inode or group bitmaps
block_number: 16416
update_interval: 5
check_interval: 5
sequence: ff4d4d50
time: 1345213198 -- Fri Aug 17 07:19:58 2012
node_name: client-16vm3
device_name: /dev/vda5
magic: 0x4d4d50
Also, since the patch for LU-264 landed, the default MMP update interval is 5 seconds instead of 1 second.
The 'MMP Check Interval' string in get_mmp_check_interval() also needs to be updated to 'check_interval'.
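The two string changes and the new default could be sketched as follows. This is only an illustration, not the landed patch: parse_mmp_field is a hypothetical helper introduced here, the pipe is moved outside the do_facet call so the helper can run locally, and the fallback of 5 assumes the post-LU-264 default mentioned above. do_facet and $DEBUGFS are expected to come from the test framework, as in the existing function.

```shell
# Hypothetical helper: read `debugfs -R dump_mmp` output on stdin and
# print the value of the named field (e.g. update_interval).
parse_mmp_field() {
    local field=$1
    # Split each line on ": " and print the value of the first matching key.
    awk -F': *' -v key="$field" '$1 == key { print $2; exit }'
}

get_mmp_update_interval() {
    local facet=$1
    local device=$2
    local interval

    interval=$(do_facet $facet "$DEBUGFS -c -R dump_mmp $device 2>/dev/null" |
        parse_mmp_field update_interval)
    # Assumption: fall back to 5 seconds, the default after LU-264.
    [ -z "$interval" ] && interval=5
    echo $interval
}
```

get_mmp_check_interval() could use the same helper with check_interval as the field name.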
In addition, after looking into e2fsck/unix.c and ldiskfs/kernel_patches/patches/ext4-mmp-rhel6.patch, I found a simpler and more reliable way to fix the issue in LU-1689 than using an extra expect script:
We can simply run "tune2fs -E mmp_update_interval=$interval $device" to lengthen the "sleep(2 * mmp_check_interval + 1)" in ext2fs_mmp_start() (which e2fsck calls via try_open_fs()->ext2fs_open2()). A new sequence number is written into the MMP block before that sleep. So, once e2fsck has entered ext2fs_mmp_start() and written the new sequence number, any mount attempt will fail until e2fsck reaches ext2fs_mmp_stop() and writes EXT4_MMP_SEQ_CLEAN into the MMP block.
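A rough sketch of how a test might use this is shown below. Both function names are hypothetical, and $TUNE2FS is assumed to be defined by the test framework in the same way as $DEBUGFS. The first function only evaluates the sleep formula quoted above, so a test can estimate how long e2fsck holds the device before it continues.

```shell
# Length in seconds of the wait in ext2fs_mmp_start(),
# using the formula quoted above: 2 * mmp_check_interval + 1.
mmp_start_wait() {
    local check_interval=$1
    echo $((2 * check_interval + 1))
}

# Hypothetical helper: raise the MMP update interval on a device so that
# the e2fsck wait window becomes long enough for a mount attempt to hit it.
set_mmp_update_interval() {
    local facet=$1
    local device=$2
    local interval=$3

    do_facet $facet "$TUNE2FS -E mmp_update_interval=$interval $device"
}
```

For example, with a check interval of 5 seconds, mmp_start_wait 5 gives a wait of 11 seconds.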