[LU-10036] lfsck stuck in init on multiple MDS Created: 26/Sep/17 Updated: 04/Jan/18 Resolved: 04/Jan/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Cliff White (Inactive) | Assignee: | nasf (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | soak | ||
| Environment: |
soak stress cluster - b2.10.1 + |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Description |
|
soak starts lfsck: 2017-09-26 16:54:20,710:fsmgmt.fsmgmt:INFO executing cmd:
lctl lfsck_start -M soaked-MDT0000 -s 1000 -t all
if [[ $? != 0 ]]; then
lctl lfsck_start -M soaked-MDT0000 -s 1000 -t namespace,layout
if [[ $? != 0 ]]; then
lctl lfsck_start -M soaked-MDT0000 -s 1000 -t namespace
fi
fi
lfsck shows 'stopped' on soak-8, 'init' on all other MDS nodes. No apparent progress.

---------------- soak-8 ----------------
mdd.soaked-MDT0000.lfsck_namespace=
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: stopped
flags:
param:
last_completed_time: 1506405886
time_since_last_completed: 41575 seconds
latest_start_time: 1506410669
time_since_latest_start: 36792 seconds
last_checkpoint_time: 1506411504
time_since_last_checkpoint: 35957 seconds
latest_start_position: 77, N/A, N/A
last_checkpoint_position: 500695836, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 2156
.....

---------------- soak-9 ----------------
mdd.soaked-MDT0001.lfsck_namespace=
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: init
flags:
param:
last_completed_time: N/A
time_since_last_completed: N/A
latest_start_time: N/A
time_since_latest_start: N/A
last_checkpoint_time: N/A
....

---------------- soak-10 ----------------
mdd.soaked-MDT0002.lfsck_namespace=
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: init
flags:
param:
last_completed_time: N/A
time_since_last_completed: N/A
latest_start_time: N/A
time_since_latest_start: N/A
last_checkpoint_time: N/A
time_since_last_checkpoint: N/A
latest_start_position: N/A, N/A, N/A
last_checkpoint_position: N/A, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 0
....

---------------- soak-11 ----------------
mdd.soaked-MDT0003.lfsck_namespace=
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: init
flags:
param:
last_completed_time: N/A
time_since_last_completed: N/A
latest_start_time: N/A
time_since_latest_start: N/A
last_checkpoint_time: N/A
time_since_last_checkpoint: N/A
latest_start_position: N/A, N/A, N/A
last_checkpoint_position: N/A, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 0
checked_phase2: 0
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 0
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 0
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 0
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
linkea_overflow_cleared: 0
success_count: 0
run_time_phase1: 0 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 0 items/sec
average_speed_phase2: 0 objs/sec
average_speed_total: 0 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A

soak is currently wedged with this failure. |
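When comparing LFSCK state across several MDTs, it can help to reduce the status dump to one line per target. The following is a minimal sketch, assuming the input has the same shape as the output quoted in the description (a `mdd.<target>.lfsck_namespace=` header followed by `key: value` lines). The function name `lfsck_status_summary` is hypothetical and not part of Lustre.

```shell
#!/usr/bin/env bash
# Sketch: print "<target> <status>" for each LFSCK parameter block on stdin.
# Assumes headers like "mdd.soaked-MDT0001.lfsck_namespace=" as in the ticket.
lfsck_status_summary() {
    awk '
        # Parameter header: remember which target the following lines describe.
        $1 ~ /^mdd[.]/ {
            split($1, parts, /[.]/)
            target = parts[2]
            next
        }
        # The first "status:" line after a header belongs to that target.
        $1 == "status:" && target != "" {
            print target, $2
            target = ""
        }
    '
}
```

A possible use on an MDS node would be `lctl get_param mdd.*.lfsck_namespace | lfsck_status_summary`. For the dump in the description this should yield `stopped` for soaked-MDT0000 and `init` for the other three targets.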
| Comments |
| Comment by Joseph Gmitter (Inactive) [ 26/Sep/17 ] |
|
Hi Fan Yong, Can you please comment on this one? Thanks. |
| Comment by nasf (Inactive) [ 29/Sep/17 ] |
|
To start the namespace LFSCK on all MDTs, add the "-A" option:
lctl lfsck_start -M soaked-MDT0000 -s 1000 -t namespace -A
If you do NOT want to resume from the last run, also add the reset option "-r":
lctl lfsck_start -M soaked-MDT0000 -s 1000 -t namespace -A -r |
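As an illustration only, the fallback sequence from the description could be combined with the `-A` and `-r` options suggested above roughly as follows. This is a sketch, not the soak framework's actual code: the function name `start_lfsck_all` and the `LCTL` override variable are hypothetical, and the type list (`all`, `namespace,layout`, `namespace`) and `-s 1000` are copied from the original command.

```shell
#!/usr/bin/env bash
# Sketch: try progressively narrower LFSCK types on all MDTs (-A).
# LCTL is a hypothetical override so the logic can be exercised without Lustre.
LCTL=${LCTL:-lctl}

start_lfsck_all() {
    local mdt=$1 reset=${2:-no}
    local -a extra=(-A)            # -A: start on all MDTs, per the comment above
    if [[ $reset == yes ]]; then
        extra+=(-r)                # -r: reset instead of resuming the last run
    fi
    local types
    for types in all namespace,layout namespace; do
        if "$LCTL" lfsck_start -M "$mdt" -s 1000 -t "$types" "${extra[@]}"; then
            echo "started: $types"
            return 0
        fi
    done
    echo "lfsck_start failed for every type" >&2
    return 1
}
```

Calling `start_lfsck_all soaked-MDT0000 yes` would attempt each type in turn with `-A -r` and stop at the first one that `lctl` accepts.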
| Comment by Cliff White (Inactive) [ 19/Dec/17 ] |
|
We seem to have hit this problem again, on version=2.10.2_4_gb151f34, |
| Comment by nasf (Inactive) [ 04/Jan/18 ] |
|
It will be handled via |