[LU-3485] add resize2fs testing for OST/MDT filesystems Created: 20/Jun/13  Updated: 13/Dec/23  Resolved: 04/Mar/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: Lustre 2.6.0

Type: Bug Priority: Critical
Reporter: Andreas Dilger Assignee: Jian Yu
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 8765

 Description   

We need to add resize2fs (ext4 resizer) testing for MDT and OST filesystems, so that we are sure this functionality works together with the Lustre features, and that it does not corrupt Lustre filesystems when used.

It is important when testing resize2fs that the target filesystems have a reasonably large number of files in them. It is also critical that e2fsck be run both before and after resize2fs is used, in order to verify that no corruption was introduced during this process.

Until LU-137 (ioctl passthrough) is finished, only offline resizing will be possible. That means the MDT and OST filesystems need to be unmounted before resize2fs is run, and remounted afterward.
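
For illustration, the offline sequence described above (unmount, check, resize, check again, remount) could be sketched as below. This is only a dry-run sketch that prints the commands rather than running them; the function name offline_resize_plan and the mount point /mnt/mds1 are hypothetical, and the choice of a forced read-only check (e2fsck -f -n) is an assumption rather than something specified in this ticket.

```shell
# Print (do not run) the offline resize sequence for one target.
# Arguments: device, mount point, new size in filesystem blocks.
# The function name and the mount point used below are hypothetical.
offline_resize_plan() {
    plan_dev=$1
    plan_mnt=$2
    plan_blocks=$3
    echo "umount $plan_mnt"                      # target must be offline
    echo "e2fsck -f -n $plan_dev"                # forced read-only check before
    echo "resize2fs $plan_dev $plan_blocks"      # the resize itself
    echo "e2fsck -f -n $plan_dev"                # forced read-only check after
    echo "mount -t lustre $plan_dev $plan_mnt"   # bring the target back
}

offline_resize_plan /dev/lvm-MDS/P1 /mnt/mds1 60000
```

Printing the plan first keeps the sketch safe to try on any machine; a real test would run each step and stop at the first e2fsck that reports a problem.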

It might make sense to add these checks as part of the sanity-scrub.sh tests, the large LUN testing, or maybe as an early subtest in lfsck.sh (though the old lfsck test will soon become obsolete, so I'm reluctant to use that).



 Comments   
Comment by James Nunez (Inactive) [ 24/Jul/13 ]

Proposed patch is at

http://review.whamcloud.com/#/c/7094

Comment by James Nunez (Inactive) [ 05/Sep/13 ]

The submitted patch was failing on my test system due to not being able to resize the file system back to its original size. I now see what the problem is and at least one reason the resize2fs man page warns “The minimum size of the filesystem as estimated by resize2fs may be incorrect”.

Here’s what was happening. I’m on the MDS/MGS with the file system unmounted. I reformat the MDS file system to be 200,000 KB (yes, this is small and possibly unrealistic):

[root@mds jnunez]# mkfs.lustre --fsname=lscratch --mgs --mdt --reformat --device-size=200000 --index=0 /dev/lvm-MDS/P1

   Permanent disk data:
Target:     lscratch:MDT0000
Index:      0
Lustre FS:  lscratch
Mount type: ldiskfs
Flags:      0x65
              (MDT MGS first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

device size = 3340MB
formatting backing filesystem ldiskfs on /dev/lvm-MDS/P1
	target name  lscratch:MDT0000
	4k blocks     50000
	options        -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lscratch:MDT0000  -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/lvm-MDS/P1 50000
Writing CONFIGS/mountdata
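
As a quick cross-check of the numbers in the output above, the --device-size value is given in KB while resize2fs and dumpe2fs report 4k blocks. A minimal sketch of the conversion, with kb_to_blocks as a hypothetical helper name:

```shell
# Convert a size in KB to a count of filesystem blocks.
# Arguments: size in KB, block size in bytes.
# kb_to_blocks is a hypothetical helper, not part of any Lustre tool.
kb_to_blocks() {
    echo $(( $1 / ($2 / 1024) ))
}

kb_to_blocks 200000 4096   # prints 50000
```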

Check the reported minimum block count with resize2fs and dumpe2fs:

[root@mds jnunez]# resize2fs -P /dev/lvm-MDS/P1
resize2fs 1.42.7.wc1 (12-Apr-2013)
Estimated minimum size of the filesystem: 50000
[root@mds jnunez]# dumpe2fs /dev/lvm-MDS/P1 | grep "Block count:"
dumpe2fs 1.42.7.wc1 (12-Apr-2013)
Block count:              50000

So, everything is fine to this point and it looks like the minimum size of the file system is 50000 blocks. Now, let’s increase the size of the file system and check:

[root@mds jnunez]# resize2fs -p /dev/lvm-MDS/P1 60000
resize2fs 1.42.7.wc1 (12-Apr-2013)
Resizing the filesystem on /dev/lvm-MDS/P1 to 60000 (4k) blocks.
The filesystem on /dev/lvm-MDS/P1 is now 60000 blocks long.

[root@mds jnunez]# dumpe2fs /dev/lvm-MDS/P1 | grep "Block count:"
dumpe2fs 1.42.7.wc1 (12-Apr-2013)
Block count:              60000
[root@mds jnunez]# resize2fs -P /dev/lvm-MDS/P1
resize2fs 1.42.7.wc1 (12-Apr-2013)
Estimated minimum size of the filesystem: 60000

That worked, but the minimum size of the file system is now 60000 blocks. Let’s go back to the original size of the file system:

[root@mds jnunez]# resize2fs -p /dev/lvm-MDS/P1 50000
resize2fs 1.42.7.wc1 (12-Apr-2013)
resize2fs: New size smaller than minimum (60000)
[root@mds jnunez]# dumpe2fs /dev/lvm-MDS/P1 | grep "Block count:"
dumpe2fs 1.42.7.wc1 (12-Apr-2013)
Block count:              60000

We can’t go back to the original size of the file system, even though we haven’t written anything to the file system.

Near the end of resize2fs.c when calculating the minimum blocks for the file system, we see the comment

 
	 * If at this point we've already added up more "needed" than
	 * the current size, just return current size as minimum.

There’s my problem: if the calculated number of blocks needed for the file system exceeds the current number of blocks, resize2fs just returns the current number of blocks as the minimum.
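
To make the effect concrete, here is a rough model of only the fallback that the quoted comment describes. It is not the resize2fs code; reported_minimum is a hypothetical name, and the "needed" value of 65000 is a made-up figure chosen purely for illustration.

```shell
# Model of the fallback: if the blocks "needed" exceed the current
# size, the current size is reported as the minimum.
# Arguments: needed blocks (made-up here), current block count.
reported_minimum() {
    if [ "$1" -gt "$2" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}

reported_minimum 65000 50000   # prints 50000 (freshly formatted size)
reported_minimum 65000 60000   # prints 60000 (after growing to 60000)
```

Under that assumption the reported minimum simply tracks the current size, which matches the 50000 and 60000 values seen in the transcripts above.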

Why do I care?

In the patch, I was saving the original number of blocks in the file system and, at the end of the test, trying to shrink the file system back to that size. The shrink will occasionally fail when the file system was originally “small”, that is, smaller than the minimum that resize2fs calculates. This failure to resize isn’t due to a problem in Lustre, so the test should not fail or error. Maybe I’ll just resize the file system to the larger of the two values, i.e. max(original file system size, resize2fs calculated minimum file system size).
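
The max() idea could look roughly like the sketch below. The helper names parse_min_blocks and shrink_target are hypothetical, and the sample input is the resize2fs -P output shown earlier in this comment, fed in with printf so the sketch runs without a real device.

```shell
# Extract the block count from "resize2fs -P" output read on stdin.
parse_min_blocks() {
    sed -n 's/^Estimated minimum size of the filesystem: *\([0-9][0-9]*\)$/\1/p'
}

# Pick the shrink target: max(original size, resize2fs minimum).
# Arguments: original block count, minimum reported by resize2fs.
shrink_target() {
    if [ "$1" -ge "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

min=$(printf '%s\n' \
    'resize2fs 1.42.7.wc1 (12-Apr-2013)' \
    'Estimated minimum size of the filesystem: 60000' | parse_min_blocks)
shrink_target 50000 "$min"   # prints 60000
```

In a test script the printf would be replaced by the real command, e.g. resize2fs -P on the target device piped into parse_min_blocks.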

Maybe others knew about this, but I had to trip over it myself.

Comment by Peter Jones [ 17/Sep/13 ]

Yu, Jian

Could you please advise on this one?

Thanks

Peter

Comment by James Nunez (Inactive) [ 17/Sep/13 ]

The issue right now is that the existing test 2 and test 3 in the large LUN script are timing out in most of the autotest runs, so the new tests never get a chance to run. Is this a problem with the existing tests, an issue with llverdev, or just what is to be expected?

Comment by Andreas Dilger [ 17/Sep/13 ]

James, I don't think there is a huge benefit in trying to reduce the filesystem size back to the original. There are a number of reasons why this may not work, so I think it would be better to just test the expansion for now, followed by a smaller shrink at the end. Expansion is the only thing the vast majority of users will be doing anyway.

I'm also fine to just test this on a smaller test filesystem via conf-sanity on a regular basis, and leave the large LUN testing for only occasional testing (e.g. only once a month or whatever).

Comment by Jian Yu [ 23/Sep/13 ]

Here is the patch for master branch to add a test case into conf-sanity.sh for running resize2fs on MDT and OST filesystems: http://review.whamcloud.com/7796

Comment by Jian Yu [ 28/Oct/13 ]

The above patch for conf-sanity.sh landed on master branch.
I'll update the patch for large-lun.sh in http://review.whamcloud.com/7094.

Comment by Jodi Levi (Inactive) [ 04/Mar/14 ]

Patch landed to Master. If more work is needed, please open a new ticket.
