[LU-1741] Test failure on test suite conf-sanity test_18 Created: 12/Aug/12 Updated: 18/Apr/13 Resolved: 19/Feb/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.3 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 2218 |
| Description |
|
This issue was created by maloo for Peter Jones <pjones@whamcloud.com>. This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/6379bf88-e46d-11e1-af05-52540035b04c
Error: 'expected journal size > 32M, found 32M' |
| Comments |
| Comment by Oleg Drokin [ 12/Aug/12 ] |
|
A recent change that reduced the MDT device size to only 1G means that mkfs.lustre no longer raises the default journal size to 400M (as that would be too big for the device). Thus every test that depends on a larger journal will fail. |
| Comment by Andreas Dilger [ 13/Aug/12 ] |
|
We used to have a lower limit on the journal size in mkfs.lustre or mke2fs, because the minimum journal size was something very small like 4MB. It would be fine to make this test pass for >= 32MB, since the default journal size in mke2fs has increased to 32MB. That said, there also seems to be something wrong with the test framework. The test checks that the MDSSIZE is large enough to run the test:
test_18()
{
    local MIN=2000000
    local OK=
    # check if current MDSSIZE is large enough
    [ $MDSSIZE -ge $MIN ] && OK=1 && myMDSSIZE=$MDSSIZE && \
        log "use MDSSIZE=$MDSSIZE"
    :
    :
    [ -z "$OK" ] && skip_env "$MDSDEV too small for ${MIN}kB MDS" && return
    echo "mount mds with large journal..."
    local OLD_MDSSIZE=$MDSSIZE
    MDSSIZE=$myMDSSIZE
    reformat_and_config
    echo "mount lustre system..."
    :
    :
}
And the test output seems to confirm that the size should be large enough, but the actual format command doesn't seem to take the larger MDSSIZE into account (note that 262144 at the end is the block count of a 1GB filesystem with 4kB blocks):
use MDSSIZE=1073741824
mount mds with large journal...
mke2fs -j -b 4096 -L lustre-MDT0000 -I 512 -i 2048 -q
-O dirdata,uninit_bg,^extents,dir_nlink,huge_file,flex_bg
-E lazy_itable_init,lazy_journal_init -F /dev/lvm-MDS/P1 262144
Writing CONFIGS/mountdata
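As a quick sanity check on the numbers above, here is plain bash arithmetic (the variable names are made up for illustration). It assumes MDSSIZE is in kB, as the test's `${MIN}kB` skip message implies, and compares both the formatted filesystem size and the logged MDSSIZE against the test's MIN=2000000:

```shell
#!/bin/bash
# Illustrative arithmetic only; all variable names are hypothetical.

# mke2fs was run with -b 4096 and a block count of 262144.
block_size=4096
block_count=262144
fs_bytes=$(( block_size * block_count ))
echo "filesystem: $(( fs_bytes / 1024 / 1024 )) MiB"

# MDSSIZE as logged by the test, assumed to be in kB.
mdssize_kb=1073741824
echo "MDSSIZE:    $(( mdssize_kb / 1024 / 1024 )) GiB"

# test_18 requires MIN=2000000 kB; compare both sizes against it.
min_kb=2000000
echo "filesystem >= MIN? $([ $(( fs_bytes / 1024 )) -ge "$min_kb" ] && echo yes || echo no)"
echo "MDSSIZE    >= MIN? $([ "$mdssize_kb" -ge "$min_kb" ] && echo yes || echo no)"
```

This prints 1024 MiB for the filesystem and 1024 GiB for MDSSIZE: the logged MDSSIZE passes the test's size check, while the filesystem that was actually formatted does not.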
Looking into the reformat_and_config() function, it appears that it doesn't use MDSSIZE at all. I suspect a bug was introduced with the landing of commit c04dddd71d140c61f654681c1993f5bb1ded4f48 (change http://review.whamcloud.com/2907) that causes this value to be ignored. It used to be used for the "--device-size" argument to mkfs.lustre, and while the code appears to be doing this:
var=${type}SIZE
if [ -n "${!var}" ]; then
    opts+=" --device-size=${!var}"
fi
I guess this isn't working for some reason. |
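For reference, `${!var}` is bash indirect expansion: it expands to the value of the variable whose name is stored in `var`. The minimal sketch below isolates the quoted fragment in a function (the name `device_size_opt` is hypothetical, not part of the test framework) and suggests the expansion itself works as long as `MDSSIZE` is set in the shell that runs it:

```shell
#!/bin/bash
# Hypothetical wrapper around the fragment quoted above.
# $1 is the device type prefix, e.g. "MDS" or "OST".
device_size_opt() {
    local type=$1
    local var=${type}SIZE   # e.g. "MDSSIZE"
    local opts=""
    # ${!var} reads the variable named by $var (indirect expansion).
    if [ -n "${!var}" ]; then
        opts+=" --device-size=${!var}"
    fi
    echo "$opts"
}

MDSSIZE=2000000
device_size_opt MDS   # prints " --device-size=2000000"
```

If `${type}SIZE` is unset or empty, no `--device-size` option is added, so mkfs.lustre falls back to using the whole device.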
| Comment by Chris Gearing (Inactive) [ 13/Aug/12 ] |
|
As a workaround, I've changed the MDS size to 2GB. |
| Comment by Andreas Dilger [ 13/Aug/12 ] |
|
I don't think that we could go much lower than 2GB anyway. In https://maloo.whamcloud.com/test_sessions/b55d305a-e4b8-11e1-9681-52540035b04c, the "full" run was running out of space on the MDT in all of the parallel-scale tests. |
| Comment by Peter Jones [ 14/Aug/12 ] |
|
Bobijam, I think that we are going to be able to live with the workaround for the 2.3 release itself, but could you please investigate this in more detail so that we can more fully understand why this was failing and whether there is an issue that would be important in production environments? Thanks, Peter |
| Comment by Zhenyu Xu [ 14/Aug/12 ] |
|
Found the culprit:
local MIN=2000000    ==============> we require at least a 2G MDT
# check if current MDSSIZE is large enough
[ $MDSSIZE -ge $MIN ] && OK=1 && myMDSSIZE=$MDSSIZE && \
    log "use MDSSIZE=$MDSSIZE"
use MDSSIZE=1073741824    ===============> MDSSIZE has been defined as 1T
# check if the block device is large enough
[ -z "$OK" ] && $(is_blkdev $SINGLEMDS $MDSDEV $MIN) && OK=1 &&
    myMDSSIZE=$MIN && log "use device $MDSDEV with MIN=$MIN"
Because OK was already set, we skipped the block device check, but the MDS device actually only has 1G (4k * 262144) of space, so mkfs had no choice but to format this 1G MDT, and the test failed.
We always need to check that the block device's size is at least $MIN. |
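A minimal sketch of that check, assuming the device path is local and that util-linux `blockdev --getsize64` (block devices) and GNU `stat -c %s` (loop-backing files) are available. The helpers `dev_size_kb` and `mds_size_ok` are hypothetical and are not the test framework's `is_blkdev`; the point is only that both the configured MDSSIZE and the real device size must reach MIN:

```shell
#!/bin/bash
# Hypothetical helpers, not part of the Lustre test framework.

# Print the size of a block device or regular (loop-backing) file in kB.
dev_size_kb() {
    local dev=$1 bytes
    if [ -b "$dev" ]; then
        bytes=$(blockdev --getsize64 "$dev") || return 1
    elif [ -f "$dev" ]; then
        bytes=$(stat -c %s "$dev") || return 1
    else
        return 1
    fi
    echo $(( bytes / 1024 ))
}

# Succeed only if both the configured size and the real device reach min_kb.
mds_size_ok() {
    local mdssize_kb=$1 dev=$2 min_kb=$3 real_kb
    [ "$mdssize_kb" -ge "$min_kb" ] || return 1
    real_kb=$(dev_size_kb "$dev") || return 1
    [ "$real_kb" -ge "$min_kb" ]
}
```

With a 1G device and MDSSIZE=1073741824, `mds_size_ok` fails, so the test would skip instead of formatting a 1G MDT. In the real framework the device lives on `$SINGLEMDS`, so the size query would have to run on that node, and a loop-backing file must already exist for the check to see it.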
| Comment by Zhenyu Xu [ 14/Aug/12 ] |
|
Patch tracking at http://review.whamcloud.com/3632
Patch description:
test: fix conf_sanity test_18 test case
* We need always check block device's size bigger than its minimum requirement.
* Correct code indentation for formatall(). |
| Comment by Peter Jones [ 16/Aug/12 ] |
|
Thanks for identifying this issue, Bobijam. Hopefully the patch will still land in time for the 2.3 release, but we do not need to track this as a blocker because the autotest workaround prevents it from causing test failures. |
| Comment by Li Wei (Inactive) [ 20/Aug/12 ] |
|
I'm not sure the root cause has been identified here. In the very first Maloo report, the MDSSIZE is clearly wrong:
== conf-sanity test 18: check mkfs creates large journals == 02:01:09 (1344762069)
use MDSSIZE=1073741824
Since the unit is KB, the value is around 1 TB. I think the test did the right thing, although with an incorrect MDSSIZE. Test Framework also passed the device size as expected:
CMD: client-30vm3 /usr/sbin/mkfs.lustre --mgs --fsname=lustre --mdt --index=0 --param=sys.timeout=20 --param=lov.stripesize=1048576 --param=lov.stripecount=0 --param=mdt.identity_upcall=/usr/sbin/l_getidentity --backfstype=ldiskfs --device-size=1073741824 --mkfsoptions=\"-E lazy_itable_init\" --mountfsoptions=errors=remount-ro,iopen_nopriv,user_xattr,acl --reformat /dev/lvm-MDS/P1
See the "--device-size=1073741824" option? mkfs.lustre did not specify a journal size, which seems to be what the code was written to do. And mke2fs, when I tried it, chooses 32 MB in this case. If MDSSIZE were 1000000, the test would have called skip(). Hence, I don't think the test has to be changed. |
| Comment by Zhenyu Xu [ 20/Aug/12 ] |
|
The problem here is that the MDT device size we specified is bigger than what the MDT device actually has, and the real MDT device size does not meet the test's minimum requirement. The patch will detect this situation: even when we specify a big enough MDT size, we still check whether the MDT device itself is big enough. |
| Comment by Li Wei (Inactive) [ 20/Aug/12 ] |
|
Ah, the problem is actually TT-820. I think the test does not need any fix; if all tests had to defend against test environment inconsistencies like this (i.e., MDSSIZE > block device size), we'd have a lot of work to do, right? The consistency checks should probably be done in Test Framework instead. For the moment, fixing TT-820 should fix this test failure, no? |
| Comment by Zhenyu Xu [ 20/Aug/12 ] |
|
I see it differently: this specific test needs a big enough MDT, so the test case itself needs to make sure of that. |