Details
-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
Upstream, Lustre 2.12.0
-
3
Description
run_e2fsck ... "-n" does not return a non-zero exit status when file system errors are found.
This makes the file system consistency checks done through run_e2fsck almost useless.
I see e2fsck checks in many tests in the sanity, sanity-lfsck and conf-sanity scripts:
[zam@vm1 lustre-wc-rel]$ grep -e "run_e2fsck.*-n" lustre/tests/*.sh
lustre/tests/conf-sanity.sh: run_e2fsck $(facet_active_host $SINGLEMDS) $mdsdev "-n"
lustre/tests/conf-sanity.sh: run_e2fsck $mds1host $mds1dev "-n"
lustre/tests/sanity-lfsck.sh: run_e2fsck $(facet_active_host $SINGLEMDS) $(mdsdevname 1) "-n" |
lustre/tests/sanity-lfsck.sh: run_e2fsck $(facet_active_host $SINGLEMDS) $(mdsdevname 1) "-n"
lustre/tests/sanity.sh: run_e2fsck $(facet_active_host mds${mds_index}) $devname -n
lustre/tests/sanity.sh: run_e2fsck $(facet_active_host mds$mdt_index) $devname -n ||
lustre/tests/sanity.sh: run_e2fsck $(facet_active_host mds$idx) $dev -n ||
[zam@vm1 lustre-wc-rel]$
for example test_804 in sanity.sh:
for idx in $(seq $MDSCOUNT); do
	dev=$(mdsdevname $idx)
	rc=0
	stop mds${idx}
	run_e2fsck $(facet_active_host mds$idx) $dev -n || rc=$?
	start mds${idx} $dev $MDS_MOUNT_OPTS ||
		error "mount mds$idx failed"
	df $MOUNT > /dev/null 2>&1
	# e2fsck should not return error
	[ $rc -eq 0 ] ||
		error "e2fsck detected error on MDT${idx}: rc=$rc"
done
The rc check in this code can never fail, because the e2fsck exit code is lost inside the run_e2fsck function.
Another example is the test for LU-2634, which is about:
Short symlinks on MDT filesystems formatted with the "extents" feature appear to be created with the EXT4_EXTENTS_FL in osd-ldiskfs, but that shouldn't be happening. e2fsck considers this a corruption and deletes the symlink.
The test runs e2fsck at the end:
#umount
umount_client $MOUNT || error "umount client failed"
stop_mds || error "stop mds failed"
stop_ost || error "stop ost failed"
#run e2fsck
run_e2fsck $(facet_active_host $SINGLEMDS) $mdsdev "-n"
}
The intention is to check the file system and fail the test if it is corrupted. There is no attempt to parse the fsck output; only the exit code is checked.
However, run_e2fsck converts all exit codes less than or equal to 4 (FSCK_MAX_ERR) to 0:
# Run e2fsck on MDT or OST device.
run_e2fsck() {
	local node=$1
	local target_dev=$2
	local extra_opts=$3
	local cmd="$E2FSCK -d -v -t -t -f $extra_opts $target_dev"
	local log=$TMP/e2fsck.log
	local rc=0
	echo $cmd
	do_node $node $cmd 2>&1 | tee $log
	rc=${PIPESTATUS[0]}
	if [ -n "$(grep "DNE mode isn't supported" $log)" ]; then
		rm -f $log
		if [ $MDSCOUNT -gt 1 ]; then
			skip "DNE mode isn't supported!"
			cleanupall
			exit_status
		else
			error "It's not DNE mode."
		fi
	fi
	rm -f $log
	[ $rc -le $FSCK_MAX_ERR ] ||
		error "$cmd returned $rc, should be <= $FSCK_MAX_ERR"
	return 0
}
It should end with return $rc instead of return 0.
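The effect of that last line can be seen in a small standalone sketch. Everything here is a stand-in for illustration: fake_fsck, run_check_lossy and run_check_fixed are hypothetical names, not part of the Lustre test framework, and the sketch returns where the real function would call error(). The two functions copy the tail of run_e2fsck and differ only in the final return.

```shell
#!/bin/bash
# Stand-ins for illustration only; none of these names exist in the Lustre tests.
FSCK_MAX_ERR=4

# Pretends to be e2fsck: prints one line and exits with the requested code.
fake_fsck() {
	echo "fake fsck exiting with $1"
	return "$1"
}

# Mirrors the current tail of run_e2fsck: codes <= FSCK_MAX_ERR become 0.
run_check_lossy() {
	local rc=0

	fake_fsck "$1" 2>&1 | cat > /dev/null
	rc=${PIPESTATUS[0]}
	# The real function calls error() here; this sketch just returns.
	[ "$rc" -le "$FSCK_MAX_ERR" ] || return "$rc"
	return 0
}

# Same logic, but the exit code is handed back to the caller.
run_check_fixed() {
	local rc=0

	fake_fsck "$1" 2>&1 | cat > /dev/null
	rc=${PIPESTATUS[0]}
	[ "$rc" -le "$FSCK_MAX_ERR" ] || return "$rc"
	return "$rc"
}

rc=0; run_check_lossy 4 || rc=$?
echo "lossy: rc=$rc"    # prints "lossy: rc=0"
rc=0; run_check_fixed 4 || rc=$?
echo "fixed: rc=$rc"    # prints "fixed: rc=4"
```

With the lossy variant, a caller written like test_804 (run ... || rc=$?) always sees rc=0 for any code up to FSCK_MAX_ERR, so its own check is dead code.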
FYI, e2fsck exit codes are:
The exit code returned by e2fsck is the sum of the following conditions:
0 - No errors
1 - File system errors corrected
2 - File system errors corrected, system should be rebooted
4 - File system errors left uncorrected
8 - Operational error
16 - Usage or syntax error
32 - E2fsck canceled by user request
128 - Shared library error
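Because the exit code is a sum of these conditions, it can be read as a bitmask. The helper below is a minimal sketch of that decoding; decode_e2fsck_rc is a hypothetical name, not an existing function in the test framework. It shows why a threshold of 4 hides exactly the "errors left uncorrected" case that a read-only "-n" run would be expected to report.

```shell
#!/bin/bash
# Hypothetical helper: print one line per condition bit set in an e2fsck exit code.
decode_e2fsck_rc() {
	local rc=$1

	if [ "$rc" -eq 0 ]; then
		echo "no errors"
		return 0
	fi
	(( rc & 1 ))   && echo "errors corrected"
	(( rc & 2 ))   && echo "errors corrected, reboot needed"
	(( rc & 4 ))   && echo "errors left uncorrected"
	(( rc & 8 ))   && echo "operational error"
	(( rc & 16 ))  && echo "usage or syntax error"
	(( rc & 32 ))  && echo "canceled by user request"
	(( rc & 128 )) && echo "shared library error"
	# Always succeed; the decoded text is the result, not the status.
	return 0
}

decode_e2fsck_rc 4     # prints "errors left uncorrected"
decode_e2fsck_rc 12    # prints "errors left uncorrected" and "operational error"
```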
A workaround is to set FSCK_MAX_ERR to 0 before calling run_e2fsck, but none of the tests do this. Or, if the variable is set globally in the Maloo setup, the default setting there should be changed.
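The workaround could look roughly like the sketch below. It is only an illustration under stated assumptions: run_e2fsck here is a stub whose third argument is a fake exit code so the example runs without a device, the stub returns 1 where the real function calls error(), and run_e2fsck_strict is a hypothetical wrapper name.

```shell
#!/bin/bash
# Stub standing in for the real run_e2fsck: the third argument is a fake
# exit code, used only so this sketch can run without a device.
FSCK_MAX_ERR=4
run_e2fsck() {
	local rc=$3

	[ "$rc" -le "$FSCK_MAX_ERR" ] || return 1   # real code calls error()
	return 0
}

# Hypothetical wrapper: tighten the threshold for one call, then restore it.
run_e2fsck_strict() {
	local saved=$FSCK_MAX_ERR
	local rc=0

	FSCK_MAX_ERR=0
	run_e2fsck "$@" || rc=$?
	FSCK_MAX_ERR=$saved
	return "$rc"
}

rc=0; run_e2fsck node dev 4 || rc=$?
echo "default threshold: rc=$rc"    # prints "default threshold: rc=0"
rc=0; run_e2fsck_strict node dev 4 || rc=$?
echo "strict threshold: rc=$rc"     # prints "strict threshold: rc=1"
```

Saving and restoring the variable keeps the stricter threshold from leaking into later tests that may rely on the default.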