[LU-14197] sanity-lfsck test_28: Timeout occurred after 251 mins, last suite running was sanity-lfsck
Created: 08/Dec/20  Updated: 08/Dec/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Li Xi
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

Description

== sanity-lfsck test 28: Skip the failed MDT(s) when handle orphan MDT-objects ======================= 04:12:55 (1607400775)
#####
The target name entry is lost. The LFSCK should insert the
orphan MDT-object under .lustre/lost+found/MDTxxxx. But if
the MDT (on which the orphan MDT-object resides) has ever
failed to respond some name entry verification during the
first stage-scanning, then the LFSCK should skip to handle
orphan MDT-object on this MDT. But other MDTs should not
be affected.
#####
Inject failure stub on MDT0 to simulate the case that
d1/a1's name entry will be removed, but the d1/a1's object
and its linkEA are kept in the system. And the case that
d2/a2's name entry will be removed, but the d2/a2's object
and its linkEA are kept in the system.
CMD: trevis-28vm4 /usr/sbin/lctl set_param fail_loc=0x1624
fail_loc=0x1624
CMD: trevis-28vm10 /usr/sbin/lctl set_param fail_loc=0x1624
fail_loc=0x1624
CMD: trevis-28vm4 /usr/sbin/lctl set_param fail_loc=0
fail_loc=0
CMD: trevis-28vm10 /usr/sbin/lctl set_param fail_loc=0
fail_loc=0
Inject failure, to simulate the MDT0 fail to handle
MDT1 LFSCK request during the first-stage scanning.
CMD: trevis-28vm10 /usr/sbin/lctl set_param fail_loc=0x161c fail_val=0
fail_loc=0x161c
fail_val=0
Trigger namespace LFSCK on all devices to find out orphan object
CMD: trevis-28vm4 /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace -r -A
Started LFSCK on the device lustre-MDT0000: scrub namespace
CMD: trevis-28vm4 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
CMD: trevis-28vm4 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print $2 }'
CMD: trevis-28vm10 /usr/sbin/lctl get_param -n mdd.lustre-MDT0001.lfsck_namespace |
awk '/^status/ { print \$2 }'
CMD: trevis-28vm10 /usr/sbin/lctl get_param -n mdd.lustre-MDT0001.lfsck_namespace |
awk '/^status/ { print $2 }'
CMD: trevis-28vm10 /usr/sbin/lctl set_param fail_loc=0 fail_val=0
fail_loc=0
fail_val=0
CMD: trevis-28vm4 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
CMD: trevis-28vm10 /usr/sbin/lctl get_param -n mdd.lustre-MDT0001.lfsck_namespace
Trigger namespace LFSCK on all devices again to cleanup
CMD: trevis-28vm4 /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace -r -A
Started LFSCK on the device lustre-MDT0000: scrub namespace
CMD: trevis-28vm4 /usr/sbin/lctl lfsck_query -t namespace -M lustre-MDT0000 -w |
awk '/^namespace_mdts_completed/ { print \$2 }'
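
The status checks in the log above pipe `lctl get_param -n mdd.<device>.lfsck_namespace` through awk to pull out the `status` field. A minimal sketch of that extraction, runnable without a Lustre cluster, might look like the following. The function name `lfsck_status` and the sample text are hypothetical stand-ins for illustration; they are not captured output from this run.

```shell
#!/bin/sh
# Hypothetical helper: read "key: value" lines on stdin and print the
# second field of any line starting with "status", using the same awk
# filter that appears in the log.
lfsck_status() {
	awk '/^status/ { print $2 }'
}

# Hypothetical sample input (assumption): real lfsck_namespace output
# contains many more fields than shown here.
sample='name: lfsck_namespace
status: completed
flags:'

printf '%s\n' "$sample" | lfsck_status
```

On a live MDS node the same filter would be fed from the command shown in the log, for example `lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace | lfsck_status`.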

