Details
Bug
Resolution: Unresolved
Major
None
Upstream, Lustre 2.12.0
3
Description
run_e2fsck ... ... "-n" does not return a non-zero exit status when e2fsck finds file system errors.
This makes the file system consistency checks done through run_e2fsck almost useless.
e2fsck checks appear in many tests in the sanity, sanity-lfsck and conf-sanity scripts:
[zam@vm1 lustre-wc-rel]$ grep -e "run_e2fsck.*-n" lustre/tests/*.sh
lustre/tests/conf-sanity.sh:	run_e2fsck $(facet_active_host $SINGLEMDS) $mdsdev "-n"
lustre/tests/conf-sanity.sh:	run_e2fsck $mds1host $mds1dev "-n"
lustre/tests/sanity-lfsck.sh:	run_e2fsck $(facet_active_host $SINGLEMDS) $(mdsdevname 1) "-n" |
lustre/tests/sanity-lfsck.sh:	run_e2fsck $(facet_active_host $SINGLEMDS) $(mdsdevname 1) "-n"
lustre/tests/sanity.sh:	run_e2fsck $(facet_active_host mds${mds_index}) $devname -n
lustre/tests/sanity.sh:	run_e2fsck $(facet_active_host mds$mdt_index) $devname -n ||
lustre/tests/sanity.sh:	run_e2fsck $(facet_active_host mds$idx) $dev -n ||
[zam@vm1 lustre-wc-rel]$
For example, test_804 in sanity.sh:
for idx in $(seq $MDSCOUNT); do
	dev=$(mdsdevname $idx)
	rc=0

	stop mds${idx}
	run_e2fsck $(facet_active_host mds$idx) $dev -n || rc=$?
	start mds${idx} $dev $MDS_MOUNT_OPTS ||
		error "mount mds$idx failed"
	df $MOUNT > /dev/null 2>&1

	# e2fsck should not return error
	[ $rc -eq 0 ] ||
		error "e2fsck detected error on MDT${idx}: rc=$rc"
done
This check can never fail, because the e2fsck exit code is lost inside the run_e2fsck function.
Another example is the test for LU-2634, which is about:
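A minimal, self-contained bash reproduction of why "|| rc=$?" in test_804 never fires. fake_e2fsck and run_check are hypothetical stand-ins, not test-framework functions: fake_e2fsck exits with 4 the way e2fsck -n does when it finds errors, and run_check mimics the tail of run_e2fsck (capture the status, compare it with FSCK_MAX_ERR, then return 0 unconditionally).

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins; only the status handling mirrors run_e2fsck.

FSCK_MAX_ERR=4

# Stand-in for "e2fsck -n" finding errors it left uncorrected.
fake_e2fsck() {
	echo "fake e2fsck: errors found"
	return 4
}

# Mimics run_e2fsck: the status is captured, checked, then dropped.
run_check() {
	local rc=0

	fake_e2fsck 2>&1 | tee /dev/null
	rc=${PIPESTATUS[0]}          # rc is 4 here
	[ $rc -le $FSCK_MAX_ERR ] || echo "would call error: rc=$rc"
	return 0                     # the caller never sees the 4
}

rc=0
run_check || rc=$?
echo "caller sees rc=$rc"        # prints: caller sees rc=0
```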
Short symlinks on MDT filesystems formatted with the "extents" feature appear to be created with the EXT4_EXTENTS_FL in osd-ldiskfs, but that shouldn't be happening. e2fsck considers this a corruption and deletes the symlink.
The test runs e2fsck at the end:
	#umount
	umount_client $MOUNT || error "umount client failed"
	stop_mds || error "stop mds failed"
	stop_ost || error "stop ost failed"

	#run e2fsck
	run_e2fsck $(facet_active_host $SINGLEMDS) $mdsdev "-n"
}
The intention is to check the file system and fail the test if it is corrupted. There is no attempt to parse the e2fsck output; only the exit code is checked.
However, run_e2fsck turns every exit code less than or equal to FSCK_MAX_ERR (4) into a return value of 0:
# Run e2fsck on MDT or OST device.
run_e2fsck() {
	local node=$1
	local target_dev=$2
	local extra_opts=$3
	local cmd="$E2FSCK -d -v -t -t -f $extra_opts $target_dev"
	local log=$TMP/e2fsck.log
	local rc=0

	echo $cmd
	do_node $node $cmd 2>&1 | tee $log
	rc=${PIPESTATUS[0]}
	if [ -n "$(grep "DNE mode isn't supported" $log)" ]; then
		rm -f $log
		if [ $MDSCOUNT -gt 1 ]; then
			skip "DNE mode isn't supported!"
			cleanupall
			exit_status
		else
			error "It's not DNE mode."
		fi
	fi

	rm -f $log

	[ $rc -le $FSCK_MAX_ERR ] ||
		error "$cmd returned $rc, should be <= $FSCK_MAX_ERR"

	return 0
}
It should end with "return $rc" instead.
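A sketch of run_e2fsck with the proposed "return $rc" change, runnable outside the test framework. Assumptions: do_node and error below are simplified local stand-ins for the framework helpers of the same name (the real do_node runs the command on a remote node, and the real error aborts the test), and the DNE-mode handling is left out for brevity.

```shell
#!/usr/bin/env bash
# Simplified sketch; do_node/error are hypothetical local stand-ins.

E2FSCK=${E2FSCK:-e2fsck}
FSCK_MAX_ERR=${FSCK_MAX_ERR:-4}
TMP=${TMP:-/tmp}

# Stand-in: run the command locally instead of on $node.
do_node() {
	local node=$1
	shift
	"$@"
}

# Stand-in: report on stderr and return non-zero.
error() {
	echo "ERROR: $*" >&2
	return 1
}

run_e2fsck() {
	local node=$1
	local target_dev=$2
	local extra_opts=$3
	local cmd="$E2FSCK -d -v -t -t -f $extra_opts $target_dev"
	local log=$TMP/e2fsck.log
	local rc=0

	echo $cmd
	do_node $node $cmd 2>&1 | tee $log
	rc=${PIPESTATUS[0]}          # status of e2fsck, not of tee
	rm -f $log

	[ $rc -le $FSCK_MAX_ERR ] ||
		error "$cmd returned $rc, should be <= $FSCK_MAX_ERR"

	# Propagate the e2fsck status so "run_e2fsck ... || rc=$?" works.
	return $rc
}
```

With this change, a status of 4 from "e2fsck -n" reaches the caller, so the "[ $rc -eq 0 ]" check in test_804 can actually fail.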
FYI, e2fsck exit codes are:
The exit code returned by e2fsck is the sum of the following conditions:
	0 - No errors
	1 - File system errors corrected
	2 - File system errors corrected, system should be rebooted
	4 - File system errors left uncorrected
	8 - Operational error
	16 - Usage or syntax error
	32 - E2fsck canceled by user request
	128 - Shared library error
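Because the exit code is a sum of flags, a status can be decoded bit by bit. The bash sketch below uses a hypothetical helper, decode_e2fsck_rc, to do that. It also shows why "-n" runs slip through: e2fsck -n does not modify the file system, so errors it finds are reported as 4 ("left uncorrected"), which is exactly the value the default FSCK_MAX_ERR still accepts.

```shell
#!/usr/bin/env bash
# decode_e2fsck_rc is a hypothetical helper for illustration.
decode_e2fsck_rc() {
	local rc=$1
	local -a bits=(1 2 4 8 16 32 128)
	local -a names=(
		"File system errors corrected"
		"File system errors corrected, system should be rebooted"
		"File system errors left uncorrected"
		"Operational error"
		"Usage or syntax error"
		"E2fsck canceled by user request"
		"Shared library error"
	)
	local i

	if [ "$rc" -eq 0 ]; then
		echo "No errors"
		return
	fi
	# Print one line per flag that is set in the status.
	for i in "${!bits[@]}"; do
		if (( rc & bits[i] )); then
			echo "${names[i]}"
		fi
	done
}

decode_e2fsck_rc 4   # File system errors left uncorrected
decode_e2fsck_rc 5   # File system errors corrected
                     # File system errors left uncorrected
```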
A workaround is to set FSCK_MAX_ERR to 0 before calling run_e2fsck, but none of the tests do this. Perhaps the variable is set globally in the Maloo setup ... if so, that suggests the default setting should be changed.
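The workaround can be applied to a single call. The bash sketch below relies on bash's default (non-POSIX mode) behavior, where an assignment prefixed to a function call is visible only for the duration of that call. check_threshold and the error stub are hypothetical stand-ins that reproduce only the FSCK_MAX_ERR comparison from run_e2fsck.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the framework pieces involved.
FSCK_MAX_ERR=4

error() {
	echo "ERROR: $*" >&2
	return 1
}

# Only the threshold check from run_e2fsck; $1 plays the e2fsck status.
check_threshold() {
	local rc=$1

	[ $rc -le $FSCK_MAX_ERR ] ||
		error "e2fsck returned $rc, should be <= $FSCK_MAX_ERR"
	return 0
}

check_threshold 4                  # silent: 4 <= default FSCK_MAX_ERR
FSCK_MAX_ERR=0 check_threshold 4   # reports the error for this call only
echo "FSCK_MAX_ERR is back to $FSCK_MAX_ERR"
```

In the real framework error() fails the test, so this makes corruption visible, but run_e2fsck itself still returns 0.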