[LU-10988] LBUG in lfsck Created: 02/May/18  Updated: 20/Aug/18  Resolved: 12/May/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.3
Fix Version/s: Lustre 2.12.0, Lustre 2.10.5

Type: Bug Priority: Minor
Reporter: SC Admin (Inactive) Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-10887 2 MDTs stuck in WAITING Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Hi,

We tested the warble1 hardware as much as we could for about a week and found no hardware issues. We also replaced the SAS cards and cables just to be safe.

warble1 is now running kernel 3.10.0-693.21.1.el7.x86_64 with ZFS 0.7.8 and has these patches applied:

usr/src/lustre-2.10.3/lu10212-estale.patch
usr/src/lustre-2.10.3/lu10707-ksocklnd-revert-jiffies.patch
usr/src/lustre-2.10.3/lu10707-lnet-route-jiffies.patch
usr/src/lustre-2.10.3/lu10887-lfsck.patch
usr/src/lustre-2.10.3/lu8990-put-root.patch

When the dagg MDTs were mounted on warble1 they COMPLETED OK, and then about 5 seconds later lfsck hit an LBUG.

...
2018-05-02 22:06:06 [ 2919.828067] Lustre: dagg-MDT0000: Client 22c84389-af1f-9970-0e9b-70c3a4861afd (at 10.8.49.155@tcp201) reconnecting
2018-05-02 22:06:06 [ 2919.828113] Lustre: dagg-MDT0002: Recovery already passed deadline 0:31. If you do not want to wait more, please abort the recovery by force.
2018-05-02 22:06:38 [ 2951.686211] Lustre: dagg-MDT0002: recovery is timed out, evict stale exports
2018-05-02 22:06:38 [ 2951.694197] Lustre: dagg-MDT0002: disconnecting 1 stale clients
2018-05-02 22:06:38 [ 2951.736799] Lustre: 24680:0:(ldlm_lib.c:2544:target_recovery_thread()) too long recovery - read logs
2018-05-02 22:06:38 [ 2951.746774] Lustre: dagg-MDT0002: Recovery over after 6:24, of 125 clients 124 recovered and 1 was evicted.
2018-05-02 22:06:38 [ 2951.746775] LustreError: dumping log to /tmp/lustre-log.1525262798.24680
2018-05-02 22:06:44 [ 2957.910031] LustreError: 33236:0:(dt_object.c:213:dt_mode_to_dft()) LBUG
2018-05-02 22:06:44 [ 2957.917615] Pid: 33236, comm: lfsck_namespace
2018-05-02 22:06:44 [ 2957.922760]
2018-05-02 22:06:44 [ 2957.922760] Call Trace:
2018-05-02 22:06:44 [ 2957.928142]  [<ffffffffc06457ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
2018-05-02 22:06:44 [ 2957.935374]  [<ffffffffc064583c>] lbug_with_loc+0x4c/0xb0 [libcfs]
2018-05-02 22:06:44 [ 2957.942270]  [<ffffffffc0d82573>] dt_mode_to_dft+0x73/0x80 [obdclass]
2018-05-02 22:06:44 [ 2957.949398]  [<ffffffffc115ac81>] lfsck_namespace_repair_dangling+0x621/0xf40 [lfsck]
2018-05-02 22:06:44 [ 2957.957911]  [<ffffffffc0d7ea22>] ? htable_lookup+0x102/0x180 [obdclass]
2018-05-02 22:06:44 [ 2957.965289]  [<ffffffffc1186f4a>] lfsck_namespace_striped_dir_rescan+0x86a/0x1220 [lfsck]
2018-05-02 22:06:44 [ 2957.974129]  [<ffffffffc115ce71>] lfsck_namespace_assistant_handler_p1+0x18d1/0x1f40 [lfsck]
2018-05-02 22:06:44 [ 2957.983217]  [<ffffffff8102954d>] ? __switch_to+0xcd/0x500
2018-05-02 22:06:44 [ 2957.989375]  [<ffffffffc114098e>] lfsck_assistant_engine+0x3ce/0x20b0 [lfsck]
2018-05-02 22:06:44 [ 2957.997154]  [<ffffffff810cb0b5>] ? sched_clock_cpu+0x85/0xc0
2018-05-02 22:06:44 [ 2958.003538]  [<ffffffff8102954d>] ? __switch_to+0xcd/0x500
2018-05-02 22:06:44 [ 2958.009648]  [<ffffffff810c7c70>] ? default_wake_function+0x0/0x20
2018-05-02 22:06:44 [ 2958.016449]  [<ffffffffc11405c0>] ? lfsck_assistant_engine+0x0/0x20b0 [lfsck]
2018-05-02 22:06:44 [ 2958.024186]  [<ffffffff810b4031>] kthread+0xd1/0xe0
2018-05-02 22:06:44 [ 2958.029662]  [<ffffffff810b3f60>] ? kthread+0x0/0xe0
2018-05-02 22:06:44 [ 2958.035220]  [<ffffffff816c055d>] ret_from_fork+0x5d/0xb0
2018-05-02 22:06:44 [ 2958.041197]  [<ffffffff810b3f60>] ? kthread+0x0/0xe0
2018-05-02 22:06:44 [ 2958.046723]
2018-05-02 22:06:44 [ 2958.048771] Kernel panic - not syncing: LBUG
2018-05-02 22:06:44 [ 2958.053576] CPU: 2 PID: 33236 Comm: lfsck_namespace Tainted: P           OE  ------------   3.10.0-693.21.1.el7.x86_64 #1
2018-05-02 22:06:44 [ 2958.065051] Hardware name: Dell Inc. PowerEdge R740/0JM3W2, BIOS 1.3.7 02/08/2018
2018-05-02 22:06:44 [ 2958.073066] Call Trace:
2018-05-02 22:06:44 [ 2958.076060]  [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
2018-05-02 22:06:44 [ 2958.081738]  [<ffffffff816a8634>] panic+0xe8/0x21f
2018-05-02 22:06:44 [ 2958.087058]  [<ffffffffc0645854>] lbug_with_loc+0x64/0xb0 [libcfs]
2018-05-02 22:06:44 [ 2958.093781]  [<ffffffffc0d82573>] dt_mode_to_dft+0x73/0x80 [obdclass]
2018-05-02 22:06:44 [ 2958.100741]  [<ffffffffc115ac81>] lfsck_namespace_repair_dangling+0x621/0xf40 [lfsck]
2018-05-02 22:06:45 [ 2958.109091]  [<ffffffffc0d7ea22>] ? htable_lookup+0x102/0x180 [obdclass]
2018-05-02 22:06:45 [ 2958.116294]  [<ffffffffc1186f4a>] lfsck_namespace_striped_dir_rescan+0x86a/0x1220 [lfsck]
2018-05-02 22:06:45 [ 2958.124963]  [<ffffffffc115ce71>] lfsck_namespace_assistant_handler_p1+0x18d1/0x1f40 [lfsck]
2018-05-02 22:06:45 [ 2958.133889]  [<ffffffff8102954d>] ? __switch_to+0xcd/0x500
2018-05-02 22:06:45 [ 2958.139866]  [<ffffffffc114098e>] lfsck_assistant_engine+0x3ce/0x20b0 [lfsck]
2018-05-02 22:06:45 [ 2958.147492]  [<ffffffff810cb0b5>] ? sched_clock_cpu+0x85/0xc0
2018-05-02 22:06:45 [ 2958.153724]  [<ffffffff8102954d>] ? __switch_to+0xcd/0x500
2018-05-02 22:06:45 [ 2958.159689]  [<ffffffff810c7c70>] ? wake_up_state+0x20/0x20
2018-05-02 22:06:45 [ 2958.165742]  [<ffffffffc11405c0>] ? lfsck_master_engine+0x1310/0x1310 [lfsck]
2018-05-02 22:06:45 [ 2958.173343]  [<ffffffff810b4031>] kthread+0xd1/0xe0
2018-05-02 22:06:45 [ 2958.178685]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
2018-05-02 22:06:45 [ 2958.185227]  [<ffffffff816c055d>] ret_from_fork+0x5d/0xb0
2018-05-02 22:06:45 [ 2958.191065]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
2018-05-02 22:06:45 [ 2958.197613] Kernel Offset: disabled

I've failed the MDTs back to warble2 and mounted them by hand with -o skip_lfsck.

cheers,
robin



 Comments   
Comment by Peter Jones [ 02/May/18 ]

usr/src/lustre-2.10.3/lu10887-lfsck.patch

So you have applied the patch https://review.whamcloud.com/31915/ on the MDT, right? If so, there must be other corner cases in which lfsck_namespace_repair_dangling() misses setting the "mode". I will investigate further. In any case, the existing patches are still valid.
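
For reference, dt_mode_to_dft() converts the S_IFMT bits of a file mode into a dt object format and LBUGs when the mode matches no known file type, so an uninitialized mode of 0 lands in the default branch. A rough sketch of the mapping, simplified from lustre/obdclass/dt_object.c (details may vary between versions):

enum dt_format_type dt_mode_to_dft(__u32 mode)
{
	switch (mode & S_IFMT) {
	case S_IFDIR:
		return DFT_DIR;		/* directory */
	case S_IFREG:
		return DFT_REGULAR;	/* regular file */
	case S_IFLNK:
		return DFT_SYM;		/* symlink */
	case S_IFCHR:
	case S_IFBLK:
	case S_IFIFO:
	case S_IFSOCK:
		return DFT_NODE;	/* special device node */
	default:
		LBUG();			/* unknown type, e.g. mode == 0;
					 * this is dt_object.c:213 above */
	}
}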

Comment by SC Admin (Inactive) [ 02/May/18 ]

Yes, that patch is what I've called lu10887-lfsck.patch, and it is applied.

cheers,
robin

Comment by Gerrit Updater [ 02/May/18 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/32245
Subject: LU-10988 obdclass: show invalid mode for dt_mode_to_dft
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b4b453e5a8d4ff2cb51b4b261b54457d3258b8c6

Comment by nasf (Inactive) [ 02/May/18 ]

scadmin, would you please apply the patch https://review.whamcloud.com/32245? That will help us understand what the bad file "mode" is when the trouble is hit. Thanks!
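
Judging from the patch subject ("show invalid mode for dt_mode_to_dft") and the assertion text in the log below, patch set 1 presumably replaces the bare LBUG() in the default branch with an assertion that prints the offending mode, along these lines (a sketch of the likely change, not the verbatim diff):

	default:
		/* report the bad mode before crashing, so the console shows
		 * e.g. "ASSERTION( 0 ) failed: invalid mode 0" */
		LASSERTF(0, "invalid mode %o\n", mode);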

Comment by SC Admin (Inactive) [ 02/May/18 ]

no probs.

2018-05-03 01:02:53 [10209.454437] LustreError: 35645:0:(dt_object.c:213:dt_mode_to_dft()) ASSERTION( 0 ) failed: invalid mode 0
2018-05-03 01:02:53 [10209.464798] LustreError: 35645:0:(dt_object.c:213:dt_mode_to_dft()) LBUG
2018-05-03 01:02:53 [10209.472260] Pid: 35645, comm: lfsck_namespace
2018-05-03 01:02:53 [10209.477336] 
2018-05-03 01:02:53 [10209.477336] Call Trace:
2018-05-03 01:02:53 [10209.482611]  [<ffffffffc05f67ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
2018-05-03 01:02:53 [10209.489784]  [<ffffffffc05f683c>] lbug_with_loc+0x4c/0xb0 [libcfs]
2018-05-03 01:02:53 [10209.496628]  [<ffffffffc0b275a1>] dt_mode_to_dft+0xa1/0xb0 [obdclass]
2018-05-03 01:02:53 [10209.503721]  [<ffffffffc14b5c81>] lfsck_namespace_repair_dangling+0x621/0xf40 [lfsck]
2018-05-03 01:02:53 [10209.512193]  [<ffffffffc0b23a22>] ? htable_lookup+0x102/0x180 [obdclass]
2018-05-03 01:02:53 [10209.519533]  [<ffffffffc14e1f4a>] lfsck_namespace_striped_dir_rescan+0x86a/0x1220 [lfsck]
2018-05-03 01:02:53 [10209.528336]  [<ffffffffc14b7e71>] lfsck_namespace_assistant_handler_p1+0x18d1/0x1f40 [lfsck]
2018-05-03 01:02:53 [10209.537370]  [<ffffffff8102954d>] ? __switch_to+0xcd/0x500
2018-05-03 01:02:53 [10209.543458]  [<ffffffffc149b98e>] lfsck_assistant_engine+0x3ce/0x20b0 [lfsck]
2018-05-03 01:02:53 [10209.551182]  [<ffffffff810cb0b5>] ? sched_clock_cpu+0x85/0xc0
2018-05-03 01:02:53 [10209.557512]  [<ffffffff8102954d>] ? __switch_to+0xcd/0x500
2018-05-03 01:02:53 [10209.563565]  [<ffffffff810c7c70>] ? default_wake_function+0x0/0x20
2018-05-03 01:02:53 [10209.570305]  [<ffffffffc149b5c0>] ? lfsck_assistant_engine+0x0/0x20b0 [lfsck]
2018-05-03 01:02:53 [10209.578004]  [<ffffffff810b4031>] kthread+0xd1/0xe0
2018-05-03 01:02:53 [10209.583438]  [<ffffffff810b3f60>] ? kthread+0x0/0xe0
2018-05-03 01:02:53 [10209.588968]  [<ffffffff816c055d>] ret_from_fork+0x5d/0xb0
2018-05-03 01:02:53 [10209.594914]  [<ffffffff810b3f60>] ? kthread+0x0/0xe0
2018-05-03 01:02:53 [10209.600428] 
2018-05-03 01:02:53 [10209.602473] Kernel panic - not syncing: LBUG

cheers,
robin

Comment by nasf (Inactive) [ 02/May/18 ]

scadmin, thanks for the new test results. I have updated the patch https://review.whamcloud.com/32245. That should fix your trouble. Would you please try again (patch set 2 or newer)? Thanks!
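
The eventual patch subject is "LU-10988 lfsck: load object attr when prepare LFSCK request", which suggests the fix fetches the target object's attributes before the repair path uses its mode, rather than passing an uninitialized mode of 0 down to dt_mode_to_dft(). A hypothetical sketch of that idea (env, obj, and the lti_la scratch buffer are illustrative context, not the actual patch):

	struct lu_attr *la = &info->lti_la;	/* per-thread scratch attr */
	int rc;

	/* Load the object's attributes so la->la_mode is valid before it
	 * is used to prepare the LFSCK repair request; otherwise a zero
	 * mode reaches dt_mode_to_dft() and triggers the LBUG above. */
	rc = dt_attr_get(env, obj, la);
	if (rc != 0)
		return rc;

	type = dt_mode_to_dft(la->la_mode);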

Comment by SC Admin (Inactive) [ 03/May/18 ]

Yup, that worked. Thanks!

lfsck on MDT1 and MDT2, where it had 'crashed', has now completed:

[warble1]root:   lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: stopped
flags:
param: all_targets,create_mdtobj
last_completed_time: 1523258154
time_since_last_completed: 2084076 seconds
latest_start_time: 1523283745
time_since_latest_start: 2058485 seconds
last_checkpoint_time: 1523284186
time_since_last_checkpoint: 2058044 seconds
latest_start_position: 3643308, [0x20001072c:0x16502:0x0], 0x75092e6bed520000
last_checkpoint_position: 3643308, [0x20001072c:0x16502:0x0], 0x750934ffbf07ffff
first_failure_position: 3643307, N/A, N/A
checked_phase1: 2426567
checked_phase2: 0
updated_phase1: 0
updated_phase2: 0
failed_phase1: 1
failed_phase2: 0
directories: 202176
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 14706
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 1
striped_shards_scanned: 142443
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
linkea_overflow_cleared: 0
success_count: 5
run_time_phase1: 180 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 13480 items/sec
average_speed_phase2: 0 objs/sec
average_speed_total: 13480 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags:
param: all_targets,create_mdtobj
last_completed_time: 1525341840
time_since_last_completed: 390 seconds
latest_start_time: 1525339672
time_since_latest_start: 2558 seconds
last_checkpoint_time: 1525341840
time_since_last_checkpoint: 390 seconds
latest_start_position: 2934356, [0x28001c941:0x9c24:0x0], 0x0
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 7946529
checked_phase2: 3312630
updated_phase1: 1
updated_phase2: 1
failed_phase1: 0
failed_phase2: 0
directories: 1414222
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 52655
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 3
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 1
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 1060672
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
linkea_overflow_cleared: 0
success_count: 6
run_time_phase1: 1488 seconds
run_time_phase2: 931 seconds
average_speed_phase1: 5340 items/sec
average_speed_phase2: 3558 objs/sec
average_speed_total: 4654 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags:
param: all_targets,create_mdtobj
last_completed_time: 1525341827
time_since_last_completed: 403 seconds
latest_start_time: 1525339070
time_since_latest_start: 3160 seconds
last_checkpoint_time: 1525341827
time_since_last_checkpoint: 403 seconds
latest_start_position: 2965156, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 7917065
checked_phase2: 3318839
updated_phase1: 0
updated_phase2: 1
failed_phase1: 0
failed_phase2: 0
directories: 1425028
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 52856
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 1
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 1068469
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
linkea_overflow_cleared: 0
success_count: 6
run_time_phase1: 2016 seconds
run_time_phase2: 919 seconds
average_speed_phase1: 3927 items/sec
average_speed_phase2: 3611 objs/sec
average_speed_total: 3828 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A

MDT0 still needs to be re-scanned, I think, because it was 'stopped' as part of the crashes in LU-10887.

I'll start a whole new scan now:

lctl lfsck_start -M dagg-MDT0000 -t namespace -A -r -C

cheers,
robin

Comment by nasf (Inactive) [ 03/May/18 ]

"stopped" is not the expected status. Please re-run the LFSCK with LFSCK debug enabled (lctl set_param debug+=lfsck), then if it failed again, we can analysis the debug logs.

Comment by SC Admin (Inactive) [ 03/May/18 ]

Yes, I know 'stopped' is not correct. These 3 MDTs have been 'stopped', 'crashed', and 'crashed' respectively for a week now, and mounted with -o skip_lfsck (for obvious reasons).

Now that lfsck is fixed, the 'crashed' ones continued automatically as soon as they were mounted and then finished. The 'stopped' one (MDT0) remained stopped, as that has been its state for a week or more.

I have now re-run the whole namespace lfsck, and the output is now this:

[warble1]root:   lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags:
param: all_targets,create_mdtobj
last_completed_time: 1525344435
time_since_last_completed: 963 seconds
latest_start_time: 1525342746
time_since_latest_start: 2652 seconds
last_checkpoint_time: 1525344435
time_since_last_checkpoint: 963 seconds
latest_start_position: 266, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 8060227
checked_phase2: 3437356
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 1393898
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 53149
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 1036368
striped_shards_repaired: 1
striped_shards_failed: 0
striped_shards_skipped: 2
name_hash_repaired: 0
linkea_overflow_cleared: 0
success_count: 6
run_time_phase1: 1039 seconds
run_time_phase2: 650 seconds
average_speed_phase1: 7757 items/sec
average_speed_phase2: 5288 objs/sec
average_speed_total: 6807 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags:
param: all_targets,create_mdtobj
last_completed_time: 1525344401
time_since_last_completed: 997 seconds
latest_start_time: 1525342746
time_since_latest_start: 2652 seconds
last_checkpoint_time: 1525344401
time_since_last_checkpoint: 997 seconds
latest_start_position: 266, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 7743558
checked_phase2: 3250035
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 1381748
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 52656
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 1036341
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 2
name_hash_repaired: 0
linkea_overflow_cleared: 0
success_count: 7
run_time_phase1: 638 seconds
run_time_phase2: 616 seconds
average_speed_phase1: 12137 items/sec
average_speed_phase2: 5276 objs/sec
average_speed_total: 8766 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags:
param: all_targets,create_mdtobj
last_completed_time: 1525344398
time_since_last_completed: 1000 seconds
latest_start_time: 1525342746
time_since_latest_start: 2652 seconds
last_checkpoint_time: 1525344398
time_since_last_checkpoint: 1000 seconds
latest_start_position: 266, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 7746926
checked_phase2: 3246941
updated_phase1: 2
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 1382139
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 52862
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 1036341
striped_shards_repaired: 1
striped_shards_failed: 0
striped_shards_skipped: 2
name_hash_repaired: 2
linkea_overflow_cleared: 0
success_count: 7
run_time_phase1: 641 seconds
run_time_phase2: 613 seconds
average_speed_phase1: 12085 items/sec
average_speed_phase2: 5296 objs/sec
average_speed_total: 8767 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A

Does that look OK to you?

Is there anything else we should be doing to repair the damage from LU-10887 or LU-10677?

cheers,
robin

Comment by nasf (Inactive) [ 03/May/18 ]

striped_shards_repaired: 1

That means some corrupted striped directory has been fixed. And the status became "completed", which is expected.

striped_shards_skipped: 2

That means the namespace LFSCK could not confirm whether the related shards are inconsistent, because there was not enough evidence.

Generally, the system is in a healthy state. As for those uncertain items, if they are corrupted, we can consider fixing them if we really hit trouble in the future.

Comment by Gerrit Updater [ 12/May/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/32245/
Subject: LU-10988 lfsck: load object attr when prepare LFSCK request
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f7c354096a810df0a9333dace2f538d6dfbe486f

Comment by Peter Jones [ 12/May/18 ]

Landed for 2.12

Comment by Gerrit Updater [ 23/May/18 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/32522
Subject: LU-10988 lfsck: load object attr when prepare LFSCK request
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 6b3b1a51fe759f769f86f8ff968290754a467ff1

Comment by Gerrit Updater [ 11/Jun/18 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/32522/
Subject: LU-10988 lfsck: load object attr when prepare LFSCK request
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 21d33c11abdb2908fa5fa43bb942a97615c6f7a2
