[LU-10036] lfsck stuck in init on multiple MDS Created: 26/Sep/17  Updated: 04/Jan/18  Resolved: 04/Jan/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.1
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Cliff White (Inactive) Assignee: nasf (Inactive)
Resolution: Duplicate Votes: 0
Labels: soak
Environment:

soak stress cluster - b2.10.1 + LU-9983 diag patch


Issue Links:
Duplicate
duplicates LU-10419 LFSCK fails to start, hangs systems. Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Soak starts LFSCK with the following command:

2017-09-26 16:54:20,710:fsmgmt.fsmgmt:INFO     executing cmd:
                lctl lfsck_start -M soaked-MDT0000 -s 1000 -t all
                if [[ $? != 0 ]]; then
                        lctl lfsck_start -M soaked-MDT0000 -s 1000 -t namespace,layout
                        if [[ $? != 0 ]]; then
                                lctl lfsck_start -M soaked-MDT0000 -s 1000 -t namespace
                        fi
                fi

LFSCK shows 'stopped' on soak-8 (MDT0000) and 'init' on all other MDS nodes (soak-9, soak-10, soak-11). There is no apparent progress.

----------------
soak-8
----------------
mdd.soaked-MDT0000.lfsck_namespace=
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: stopped
flags:
param:
last_completed_time: 1506405886
time_since_last_completed: 41575 seconds
latest_start_time: 1506410669
time_since_latest_start: 36792 seconds
last_checkpoint_time: 1506411504
time_since_last_checkpoint: 35957 seconds
latest_start_position: 77, N/A, N/A
last_checkpoint_position: 500695836, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 2156
.....
----------------
soak-9
----------------
mdd.soaked-MDT0001.lfsck_namespace=
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: init
flags:
param:
last_completed_time: N/A
time_since_last_completed: N/A
latest_start_time: N/A
time_since_latest_start: N/A
last_checkpoint_time: N/A
....
----------------
soak-10
----------------
mdd.soaked-MDT0002.lfsck_namespace=
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: init
flags:
param:
last_completed_time: N/A
time_since_last_completed: N/A
latest_start_time: N/A
time_since_latest_start: N/A
last_checkpoint_time: N/A
time_since_last_checkpoint: N/A
latest_start_position: N/A, N/A, N/A
last_checkpoint_position: N/A, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 0
....
----------------
soak-11
----------------
mdd.soaked-MDT0003.lfsck_namespace=
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: init
flags:
param:
last_completed_time: N/A
time_since_last_completed: N/A
latest_start_time: N/A
time_since_latest_start: N/A
last_checkpoint_time: N/A
time_since_last_checkpoint: N/A
latest_start_position: N/A, N/A, N/A
last_checkpoint_position: N/A, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 0
checked_phase2: 0
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 0
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 0
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 0
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
linkea_overflow_cleared: 0
success_count: 0
run_time_phase1: 0 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 0 items/sec
average_speed_phase2: 0 objs/sec
average_speed_total: 0 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A

Soak is currently wedged by this failure.
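
With several MDS nodes involved, it can help to reduce each node's lfsck_namespace dump to its status line. The sketch below assumes bash and the output format shown above (a "mdd.<fsname>-MDTxxxx.lfsck_namespace=" header followed by a "status:" line). The function name lfsck_status_summary is hypothetical and is not part of Lustre.

```shell
# Hypothetical helper: reads "lctl get_param mdd.*.lfsck_namespace" output
# on stdin and prints one "<device> <status>" line per MDT stanza.
lfsck_status_summary() {
    awk '
        # Stanza header, e.g. "mdd.soaked-MDT0000.lfsck_namespace="
        /^mdd\..*\.lfsck_namespace=$/ {
            dev = $0
            sub(/^mdd\./, "", dev)
            sub(/\.lfsck_namespace=$/, "", dev)
            next
        }
        # First "status:" line after a header belongs to that device
        /^status:/ && dev != "" {
            print dev, $2
            dev = ""
        }
    '
}
```

Usage, on each MDS: "lctl get_param mdd.*.lfsck_namespace | lfsck_status_summary". Given the soak-9 dump above, this would print "soaked-MDT0001 init".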



 Comments   
Comment by Joseph Gmitter (Inactive) [ 26/Sep/17 ]

Hi Fan Yong,

Can you please comment on this one?

Thanks.
Joe

Comment by nasf (Inactive) [ 29/Sep/17 ]

If you want to start the namespace LFSCK on all MDTs, please add the "-A" option:

lctl lfsck_start -M soaked-MDT0000 -s 1000 -t namespace -A

If you do NOT want to continue from the last run, please also add the reset option "-r":

lctl lfsck_start -M soaked-MDT0000 -s 1000 -t namespace -A -r

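
The two invocations suggested above differ only in the reset flag. A minimal wrapper sketch, assuming bash and reusing the "-s 1000" speed limit from the soak command. The name start_namespace_lfsck_all and its "reset" argument are hypothetical.

```shell
# Hypothetical wrapper around the lfsck_start commands suggested above.
# Usage: start_namespace_lfsck_all <mdt-device> [reset]
start_namespace_lfsck_all() {
    local dev="$1"
    # -t namespace -A: namespace LFSCK on all MDTs; -s 1000: soak's speed limit
    local -a args=(lfsck_start -M "$dev" -s 1000 -t namespace -A)
    # -r: do not continue from the last run; scan from the beginning
    if [ "${2:-}" = "reset" ]; then
        args+=(-r)
    fi
    lctl "${args[@]}"
}
```

For example, "start_namespace_lfsck_all soaked-MDT0000 reset" runs the second command shown above.
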
Comment by Cliff White (Inactive) [ 19/Dec/17 ]

We seem to have hit this problem again, on version=2.10.2_4_gb151f34.
LFSCK hangs during lfsck_start and cannot be stopped. Clearing the issue requires rebooting all MDS nodes, then running lfsck_start followed by lfsck_stop.
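
After every MDS has been rebooted, the start-then-stop workaround could be scripted roughly as follows. This is a sketch only: the reboot is not shown, the helper name is hypothetical, and the default "namespace" type is an assumption because the comment does not say which LFSCK types were started.

```shell
# Hypothetical helper for the post-reboot workaround: start, then stop.
# Usage: clear_lfsck_after_reboot <mdt-device> [types]
clear_lfsck_after_reboot() {
    local dev="$1" types="${2:-namespace}"
    # Start on all MDTs; a failure is reported but does not skip the stop
    if ! lctl lfsck_start -M "$dev" -t "$types" -A; then
        echo "warning: lfsck_start failed on $dev" >&2
    fi
    # Stop on all devices; the function returns lfsck_stop's exit status
    lctl lfsck_stop -M "$dev" -A
}
```
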

Comment by nasf (Inactive) [ 04/Jan/18 ]

This will be handled via LU-10419.

Generated at Sat Feb 10 02:31:28 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.