Details
Bug
Resolution: Fixed
Minor
Lustre 2.10.4
None
x86_64, zfs, 3 MDTs, all on 1 MDS, 2.10.4 + many patches.
3
Description
Hi,
I presume this is related to LU-11111 and LU-10888.
lctl lfsck_start -M dagg-MDT0000 -t namespace -A -n
completed ok
lctl lfsck_start -M dagg-MDT0000 -t namespace -A
completed on mdt1 and mdt2 but got stuck on mdt0.
this is the summary of repairs; mdt0 did not progress from here:
[warble2]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace | egrep 'status:|repaired|checked_' | grep -v ' 0$'
status: scanning-phase2
checked_phase1: 33226737
checked_phase2: 10901477
dangling_repaired: 28
striped_shards_repaired: 102
name_hash_repaired: 51
status: completed
checked_phase1: 32652269
checked_phase2: 12379442
dangling_repaired: 28
striped_shards_repaired: 125
status: completed
checked_phase1: 32662678
checked_phase2: 12378342
unmatched_pairs_repaired: 1
dangling_repaired: 11
striped_shards_repaired: 96
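Because `get_param -n` drops the parameter names, the three MDT reports above run together and are only separated by their `status:` lines. A minimal sketch for labelling each line with the block it belongs to follows. The function name `label_lfsck_blocks` is hypothetical, and mapping block 0 to MDT0000 assumes the glob expands in index order.

```shell
# Label each line of the filtered lfsck_namespace output with the ordinal of
# the "status:" block it belongs to (0 for the first target matched, and so on).
# Assumption: every per-target report begins with a "status:" line.
label_lfsck_blocks() {
    awk '/^status:/ { n++ } n { printf "target%d %s\n", n - 1, $0 }'
}

# Usage (on the MDS):
#   lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace \
#     | egrep 'status:|repaired|checked_' | grep -v ' 0$' | label_lfsck_blocks
```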
lfsck_namespace was using 100% of a cpu but the checked_phase2 counter wasn't going up.
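A stalled counter like this can be confirmed by sampling `checked_phase2` a few times and checking whether the value changes. The sketch below is one way to do that; the helper name `phase2_stalled` and the 60-second interval are assumptions for illustration, not part of Lustre.

```shell
# Read one checked_phase2 sample per line on stdin. Exit 0 if the last $1
# samples are identical (the counter did not advance), non-zero otherwise.
phase2_stalled() {
    awk -v need="$1" '
        { if ($1 == last) same++; else same = 1; last = $1 }
        END { exit !(NR > 0 && same >= need) }'
}

# Usage (on the MDS): take three samples a minute apart, then test them.
#   for i in 1 2 3; do
#       lctl get_param -n mdd.dagg-MDT0000.lfsck_namespace \
#         | awk '/^checked_phase2:/ { print $2 }'
#       sleep 60
#   done | phase2_stalled 3 && echo "checked_phase2 is not advancing"
```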
kill -9 on lfsck_namespace didn't work.
I didn't try lctl lfsck_stop this time.
mdt0 wouldn't umount, so I had to reset the MDS.
I did a sysrq 't' and 'w' before resetting the MDS; those start at
Sep 23 00:18:42
in the attached messages file.
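For anyone reproducing this, the same two dumps can be requested from a root shell by writing the keys to `/proc/sysrq-trigger` ('t' lists all task states, 'w' lists blocked tasks); the output goes to the kernel log. This is a small sketch, and the wrapper name `sysrq_dump` and the `SYSRQ_TRIGGER` override are hypothetical conveniences.

```shell
# Path of the sysrq trigger file; overridable so the function can be tried
# against an ordinary file (the override is an assumption of this sketch).
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

# Send each sysrq key given as an argument, one write per key.
sysrq_dump() {
    for key in "$@"; do
        printf '%s\n' "$key" > "$SYSRQ_TRIGGER" || return 1
    done
}

# Usage (as root on the MDS, before resetting it):
#   sysrq_dump t w
```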
hopefully that might help.
please let us know if there's something else we can help with.
cheers,
robin