[LU-11111] crash doing LFSCK: orph_index_insert()) ASSERTION( !(obj->mod_flags & ORPHAN_OBJ) Created: 02/Jul/18  Updated: 08/Apr/19  Resolved: 28/Mar/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.4
Fix Version/s: Lustre 2.13.0, Lustre 2.12.1

Type: Bug Priority: Minor
Reporter: SC Admin (Inactive) Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None
Environment:

x86_64, zfs 0.7.9, centos7.4, 2.10.4 plus the "lfsck: load object attr when prepare LFSCK request" patch from LU-10988


Issue Links:
Duplicate
is duplicated by LU-12127 Improve LFSCK orphan handling Resolved
Related
is related to LU-10888 'lctl abort_recovery' allow aborting ... Resolved
is related to LU-11239 sanity-lfsck test 36a fails with 'Fai... Resolved
is related to LU-11419 lfsck does not complete phase2 Resolved
is related to LU-11516 ASSERTION( ((o)->lo_header->loh_attr ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Hi,

I hit this just now when running an lfsck for LU-10677.

lctl lfsck_start -M dagg-MDT0000 -t namespace -A -r -C

2018-07-02 19:00:24 [699080.215320] Lustre: debug daemon will attempt to start writing to /mnt/root/root/lfsck.debug_daemon.dagg-MDTs.20180702.1 (20480000kB max)
2018-07-02 19:51:27 [702136.419050] LustreError: 199045:0:(mdd_orphans.c:205:orph_index_insert()) ASSERTION( !(obj->mod_flags & ORPHAN_OBJ) ) failed: 
2018-07-02 19:51:27 [702136.432255] LustreError: 199045:0:(mdd_orphans.c:205:orph_index_insert()) LBUG
2018-07-02 19:51:27 [702136.440349] Pid: 199045, comm: mdt01_013
2018-07-02 19:51:27 [702136.445132] 
2018-07-02 19:51:27 [702136.445132] Call Trace:
2018-07-02 19:51:27 [702136.450734]  [<ffffffffc05267ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
2018-07-02 19:51:27 [702136.458083]  [<ffffffffc052683c>] lbug_with_loc+0x4c/0xb0 [libcfs]
2018-07-02 19:51:27 [702136.465078]  [<ffffffffc11e3827>] __mdd_orphan_add+0x8e7/0xa20 [mdd]
2018-07-02 19:51:27 [702136.472242]  [<ffffffffc09f720c>] ? osd_write_locked+0x3c/0x60 [osd_zfs]
2018-07-02 19:51:27 [702136.479747]  [<ffffffffc11d33ce>] mdd_finish_unlink+0x17e/0x410 [mdd]
2018-07-02 19:51:27 [702136.486988]  [<ffffffffc11d56e4>] mdd_unlink+0xae4/0xbf0 [mdd]
2018-07-02 19:51:27 [702136.493616]  [<ffffffffc10bc728>] mdt_reint_unlink+0xc28/0x11d0 [mdt]
2018-07-02 19:51:27 [702136.500846]  [<ffffffffc10bfb33>] mdt_reint_rec+0x83/0x210 [mdt]
2018-07-02 19:51:27 [702136.507628]  [<ffffffffc10a137b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
2018-07-02 19:51:27 [702136.514928]  [<ffffffffc10acf07>] mdt_reint+0x67/0x140 [mdt]
2018-07-02 19:51:28 [702136.521402]  [<ffffffffc0e752ba>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
2018-07-02 19:51:28 [702136.529074]  [<ffffffffc0e1de2b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
2018-07-02 19:51:28 [702136.537493]  [<ffffffffc0e1a458>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
2018-07-02 19:51:28 [702136.545015]  [<ffffffff810c7c82>] ? default_wake_function+0x12/0x20
2018-07-02 19:51:28 [702136.552008]  [<ffffffff810bdc4b>] ? __wake_up_common+0x5b/0x90
2018-07-02 19:51:28 [702136.558588]  [<ffffffffc0e21572>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
2018-07-02 19:51:28 [702136.565605]  [<ffffffffc0e20ae0>] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
2018-07-02 19:51:28 [702136.572580]  [<ffffffff810b4031>] kthread+0xd1/0xe0
2018-07-02 19:51:28 [702136.578163]  [<ffffffff810b3f60>] ? kthread+0x0/0xe0
2018-07-02 19:51:28 [702136.583819]  [<ffffffff816c055d>] ret_from_fork+0x5d/0xb0
2018-07-02 19:51:28 [702136.589900]  [<ffffffff810b3f60>] ? kthread+0x0/0xe0
2018-07-02 19:51:28 [702136.595542] 
2018-07-02 19:51:28 [702136.597694] Kernel panic - not syncing: LBUG
2018-07-02 19:51:28 [702136.602600] CPU: 9 PID: 199045 Comm: mdt01_013 Tainted: P           OE  ------------   3.10.0-693.21.1.el7.x86_64 #1
2018-07-02 19:51:28 [702136.613744] Hardware name: Dell Inc. PowerEdge R740/0JM3W2, BIOS 1.3.7 02/08/2018
2018-07-02 19:51:28 [702136.621857] Call Trace:
2018-07-02 19:51:28 [702136.624946]  [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
2018-07-02 19:51:28 [702136.630704]  [<ffffffff816a8634>] panic+0xe8/0x21f
2018-07-02 19:51:28 [702136.636117]  [<ffffffffc0526854>] lbug_with_loc+0x64/0xb0 [libcfs]
2018-07-02 19:51:28 [702136.642927]  [<ffffffffc11e3827>] __mdd_orphan_add+0x8e7/0xa20 [mdd]
2018-07-02 19:51:28 [702136.649895]  [<ffffffffc09f720c>] ? osd_write_locked+0x3c/0x60 [osd_zfs]
2018-07-02 19:51:28 [702136.657201]  [<ffffffffc11d33ce>] mdd_finish_unlink+0x17e/0x410 [mdd]
2018-07-02 19:51:28 [702136.664232]  [<ffffffffc11d56e4>] mdd_unlink+0xae4/0xbf0 [mdd]
2018-07-02 19:51:28 [702136.670661]  [<ffffffffc10bc728>] mdt_reint_unlink+0xc28/0x11d0 [mdt]
2018-07-02 19:51:28 [702136.677681]  [<ffffffffc10bfb33>] mdt_reint_rec+0x83/0x210 [mdt]
2018-07-02 19:51:28 [702136.684247]  [<ffffffffc10a137b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
2018-07-02 19:51:28 [702136.691331]  [<ffffffffc10acf07>] mdt_reint+0x67/0x140 [mdt]
2018-07-02 19:51:28 [702136.697583]  [<ffffffffc0e752ba>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
2018-07-02 19:51:28 [702136.705025]  [<ffffffffc0e1de2b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
2018-07-02 19:51:28 [702136.713235]  [<ffffffffc0e1a458>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
2018-07-02 19:51:28 [702136.720538]  [<ffffffff810c7c82>] ? default_wake_function+0x12/0x20
2018-07-02 19:51:28 [702136.727314]  [<ffffffff810bdc4b>] ? __wake_up_common+0x5b/0x90
2018-07-02 19:51:28 [702136.733666]  [<ffffffffc0e21572>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
2018-07-02 19:51:28 [702136.740438]  [<ffffffffc0e20ae0>] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
2018-07-02 19:51:28 [702136.748310]  [<ffffffff810b4031>] kthread+0xd1/0xe0
2018-07-02 19:51:28 [702136.753661]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
2018-07-02 19:51:28 [702136.760227]  [<ffffffff816c055d>] ret_from_fork+0x5d/0xb0
2018-07-02 19:51:28 [702136.766108]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
2018-07-02 19:51:28 [702136.772696] Kernel Offset: disabled

the debug_daemon was running at the time, but the file is 6.1G and was probably truncated by the crash. let me know if you want it anyway.

I've mounted the MDTs back up again with -o skip_lfsck.

I think the crash happened somewhere in phase2 of the namespace lfsck. the lfsck state after the reboot is below.

[warble2]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: crashed
flags: scanned-once,inconsistent
param: all_targets,create_mdtobj
last_completed_time: 1528222041
time_since_last_completed: 2305149 seconds
latest_start_time: 1530521735
time_since_latest_start: 5455 seconds
last_checkpoint_time: 1530525083
time_since_last_checkpoint: 2107 seconds
latest_start_position: 266, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 14945199
checked_phase2: 4683016
updated_phase1: 0
updated_phase2: 1
failed_phase1: 0
failed_phase2: 0
directories: 2205524
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 324817
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 1
dangling_repaired: 92
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 1639566
striped_shards_repaired: 1
striped_shards_failed: 0
striped_shards_skipped: 3
name_hash_repaired: 0
linkea_overflow_cleared: 0
success_count: 9
run_time_phase1: 2357 seconds
run_time_phase2: 960 seconds
average_speed_phase1: 6340 items/sec
average_speed_phase2: 4878 objs/sec
average_speed_total: 5917 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: crashed
flags: scanned-once,inconsistent
param: all_targets,create_mdtobj
last_completed_time: 1528222004
time_since_last_completed: 2305186 seconds
latest_start_time: 1530521735
time_since_latest_start: 5455 seconds
last_checkpoint_time: 1530525083
time_since_last_checkpoint: 2107 seconds
latest_start_position: 266, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 14356291
checked_phase2: 4705808
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 2185653
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 323999
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 11
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 1639415
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 4
name_hash_repaired: 0
linkea_overflow_cleared: 0
success_count: 10
run_time_phase1: 2374 seconds
run_time_phase2: 960 seconds
average_speed_phase1: 6047 items/sec
average_speed_phase2: 4901 objs/sec
average_speed_total: 5717 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: crashed
flags: scanned-once,inconsistent
param: all_targets,create_mdtobj
last_completed_time: 1528221995
time_since_last_completed: 2305195 seconds
latest_start_time: 1530521735
time_since_latest_start: 5455 seconds
last_checkpoint_time: 1530525083
time_since_last_checkpoint: 2107 seconds
latest_start_position: 266, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 14368285
checked_phase2: 4739156
updated_phase1: 2
updated_phase2: 1
failed_phase1: 0
failed_phase2: 0
directories: 2187046
dirent_repaired: 0
linkea_repaired: 1
nlinks_repaired: 0
multiple_linked_checked: 324797
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 88
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 1
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 1639666
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 4
name_hash_repaired: 0
linkea_overflow_cleared: 0
success_count: 10
run_time_phase1: 2387 seconds
run_time_phase2: 960 seconds
average_speed_phase1: 6019 items/sec
average_speed_phase2: 4936 objs/sec
average_speed_total: 5708 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A

cheers,
robin



 Comments   
Comment by SC Admin (Inactive) [ 02/Jul/18 ]

FWIW the last little bit of the debug_file is

...
00100000:10000000:3.0:1530524123.223490:0:375744:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa471:0x0] in the trace file, flags 1, old 0, new 1: rc = 0
00100000:10000000:9.0:1530524123.224700:0:377274:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa471:0x0] in the trace file, flags 2, old 1, new 3: rc = 0
00100000:10000000:3.0:1530524123.224783:0:375744:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa45c:0x0] in the trace file, flags 1, old 0, new 1: rc = 0
00100000:10000000:9.0:1530524123.225897:0:377274:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa45c:0x0] in the trace file, flags 2, old 1, new 3: rc = 0
00100000:10000000:3.0:1530524123.225978:0:375744:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa46a:0x0] in the trace file, flags 1, old 0, new 1: rc = 0
00100000:10000000:9.0:1530524123.227163:0:377274:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa46a:0x0] in the trace file, flags 2, old 1, new 3: rc = 0
00100000:10000000:3.0:1530524123.227248:0:375744:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa476:0x0] in the trace file, flags 1, old 0, new 1: rc = 0
00100000:10000000:9.0:1530524123.228443:0:377274:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa476:0x0] in the trace file, flags 2, old 1, new 3: rc = 0
00100000:10000000:9.0:1530524123.231812:0:377274:0:(lfsck_striped_dir.c:505:lfsck_record_lmv()) dagg-MDT0002-osd: record slave LMV EA for the striped directory [0x68002fa24:0x172d1:0x0]: shard = [0x68002f9d1:0xa47c:0x0], index = 0, flags = 0, flags2 = 0, depth = 0
00100000:10000000:9.0:1530524123.231905:0:377274:0:(lfsck_striped_dir.c:505:lfsck_record_lmv()) dagg-MDT0002-osd: record slave LMV EA for the striped directory [0x68002fa24:0x172d1:0x0]: shard = [0x2800301a1:0xa489:0x0], index = 2, flags = 0, flags2 = 0, depth = 0
00100000:10000000:3.0:1530524123.231921:0:375744:0:(lfsck_engine.c:1092:lfsck_master_engine()) LFSCK exit: oit_flags = 0x60003, dir_flags = 0x8006, oit_cookie = 35184372088832, dir_cookie = 0x0, parent = [0x0:0x0:0x0], pid = 375744, rc = 1
00100000:10000000:3.0:1530524123.231926:0:375744:0:(lfsck_lib.c:2577:lfsck_post_generic()) dagg-MDT0002-osd: waiting for assistant to do lfsck_namespace post, rc = 1
00100000:10000000:9.0:1530524123.231951:0:377274:0:(lfsck_striped_dir.c:505:lfsck_record_lmv()) dagg-MDT0002-osd: record slave LMV EA for the striped directory [0x68002fa24:0x172d1:0x0]: shard = [0x20001f032:0xa4fa:0x0], index = 1, flags = 0, flags2 = 0, depth = 0
00100000:10000000:9.0:1530524123.231956:0:377274:0:(lfsck_striped_dir.c:2189:lfsck_namespace_striped_dir_rescan()) dagg-MDT0002-osd: namespace LFSCK repair the shard 0 [0x68002f9d1:0xa47c:0x0] of the striped directory [0x68002f9d1:0xa47c:0x0] with dangling no/no, rename no/no, llinkea no/no, repair_lmvea no/no: rc = 0
00100000:10000000:9.0:1530524123.231985:0:377274:0:(lfsck_striped_dir.c:2189:lfsck_namespace_striped_dir_rescan()) dagg-MDT0002-osd: namespace LFSCK repair the shard 1 [0x20001f032:0xa4fa:0x0] of the striped directory [0x20001f032:0xa4fa:0x0] with dangling no/no, rename no/no, llinkea no/no, repair_lmvea no/no: rc = 0
00100000:10000000:9.0:1530524123.232007:0:377274:0:(lfsck_striped_dir.c:2189:lfsck_namespace_striped_dir_rescan()) dagg-MDT0002-osd: namespace LFSCK repair the shard 2 [0x2800301a1:0xa489:0x0] of the striped directory [0x2800301a1:0xa489:0x0] with dangling no/no, rename no/no, llinkea no/no, repair_lmvea no/no: rc = 0
00100000:10000000:9.0:1530524123.232034:0:377274:0:(lfsck_engine.c:1650:lfsck_assistant_engine()) dagg-MDT0002-osd: lfsck_namespace LFSCK assistant thread post
00100000:10000000:3.0:1530524123.232035:0:375744:0:(lfsck_lib.c:2589:lfsck_post_generic()) dagg-MDT0002-osd: the assistant has done lfsck_namespace post, rc = 1
00100000:10000000:5.0F:1530524123.232052:0:17117:0:(lfsck_namespace.c:4721:lfsck_namespace_in_notify()) dagg-MDT0001-osd: namespace LFSCK handles notify 3 from MDT 2, status 1, flags 3
00100000:10000000:13.0F:1530524123.232053:0:24114:0:(lfsck_namespace.c:4721:lfsck_namespace_in_notify()) dagg-MDT0000-osd: namespace LFSCK handles notify 3 from MDT 2, status 1, flags 2
00100000:10000000:3.0:1530524123.232053:0:375744:0:(lfsck_namespace.c:4397:lfsck_namespace_post()) dagg-MDT0002-osd: namespace LFSCK post done: rc = 0
00100000:10000000:3.0:1530524123.232054:0:375744:0:(lfsck_lib.c:2607:lfsck_double_scan_generic()) dagg-MDT0002-osd: waiting for assistant to do lfsck_namespace double_scan, status 2
00100000:10000000:1.0F:1530524123.232061:0:375552:0:(lfsck_namespace.c:6229:lfsck_namespace_assistant_handler_p2()) dagg-MDT0000-osd: namespace LFSCK phase2 scan start
00100000:10000000:9.0:1530524123.232068:0:377274:0:(lfsck_engine.c:1669:lfsck_assistant_engine()) dagg-MDT0002-osd: LFSCK assistant notified others for lfsck_namespace post: rc = 0
00100000:10000000:9.0:1530524123.232070:0:377274:0:(lfsck_engine.c:1687:lfsck_assistant_engine()) dagg-MDT0002-osd: LFSCK assistant sync before the second-stage scaning
00100000:10000000:3.0:1530524123.232072:0:375744:0:(lfsck_lib.c:2617:lfsck_double_scan_generic()) dagg-MDT0002-osd: the assistant has done lfsck_namespace double_scan, status 0
00100000:10000000:11.0F:1530524123.232106:0:377219:0:(lfsck_namespace.c:6229:lfsck_namespace_assistant_handler_p2()) dagg-MDT0001-osd: namespace LFSCK phase2 scan start
00100000:10000000:1.0:1530524123.232204:0:375552:0:(lfsck_namespace.c:5944:lfsck_namespace_scan_local_lpf()) dagg-MDT0000-osd: stop to scan backend /lost+found: rc = 0
00100000:10000000:11.0:1530524123.232339:0:377219:0:(lfsck_namespace.c:5944:lfsck_namespace_scan_local_lpf()) dagg-MDT0001-osd: stop to scan backend /lost+found: rc = 0
00100000:10000000:11.0:1530524123.463692:0:377274:0:(lfsck_engine.c:1694:lfsck_assistant_engine()) dagg-MDT0002-osd: LFSCK assistant phase2 scan start, synced: rc = 0
00100000:10000000:11.0:1530524123.463694:0:377274:0:(lfsck_namespace.c:6229:lfsck_namespace_assistant_handler_p2()) dagg-MDT0002-osd: namespace LFSCK phase2 scan start
00100000:10000000:11.0:1530524123.463722:0:377274:0:(lfsck_namespace.c:5944:lfsck_namespace_scan_local_lpf()) dagg-MDT0002-osd: stop to scan backend /lost+found: rc = 0
00100000:10000000:3.0F:1530524329.684216:0:375552:0:(lfsck_namespace.c:2253:lfsck_namespace_repair_unmatched_pairs()) dagg-MDT0000-osd: namespace LFSCK rebuild dotdot name entry for the object [0x20001f034:0xb6c2:0x0], new parent [0x68002f9d2:0x866d:0x0]: rc = 1
00100000:10000000:2.0F:1530524869.309940:0:377274:0:(lfsck_namespace.c:1217:lfsck_namespace_insert_normal()) dagg-MDT0002-osd: namespace LFSCK insert object [0x68002c327:0xc8c9:0x0] with the name 00000001000002CB00000083 and type 100000 to the parent [0x2000182d2:0x2c6d:0x0]: rc = 1

cheers,
robin

Comment by Peter Jones [ 02/Jul/18 ]

Lai

Could you please assist?

Thanks

Peter

Comment by Lai Siyao [ 05/Jul/18 ]

Can you post the stack trace of all processes in the crash dump?

Comment by SC Admin (Inactive) [ 05/Jul/18 ]

sorry, the console log above is all we have. we don't have kernel crash dumps set up.

cheers,
robin

Comment by Lai Siyao [ 09/Jul/18 ]

This may be because LFSCK added an orphan back to its original parent directory; when the file was then unlinked again, the second orphan insert triggered this assertion. I'll commit a fix soon.
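
To make that failure sequence concrete, here is a minimal self-contained userspace C model of it. This is not the actual Lustre source: ORPHAN_OBJ, mdd_object and the function names are only borrowed from the report, and the "tolerant" variant merely sketches one possible direction; see the patch and later discussion below for the actual fixes.

#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define ORPHAN_OBJ 0x01	/* models "object is linked into PENDING" */

struct mdd_object {
	unsigned int mod_flags;
};

/* Model of the current behaviour: inserting an object that is already
 * an orphan trips the assertion, as in orph_index_insert(). */
static int orph_index_insert_strict(struct mdd_object *obj)
{
	assert(!(obj->mod_flags & ORPHAN_OBJ));	/* the LBUG in the report */
	obj->mod_flags |= ORPHAN_OBJ;
	return 0;
}

/* One possible tolerant variant (hypothetical): report the repeated
 * insert instead of crashing the MDS. */
static int orph_index_insert_tolerant(struct mdd_object *obj)
{
	if (obj->mod_flags & ORPHAN_OBJ)
		return -EEXIST;
	obj->mod_flags |= ORPHAN_OBJ;
	return 0;
}

int main(void)
{
	struct mdd_object obj = { .mod_flags = 0 };
	int rc;

	orph_index_insert_strict(&obj);	/* first unlink of the open file */
	/* LFSCK re-links the orphan into the namespace, the user unlinks
	 * it again, and the MDD tries to orphan it a second time; the
	 * strict version would LBUG here, the tolerant one reports it: */
	rc = orph_index_insert_tolerant(&obj);
	printf("second insert: rc = %d\n", rc);	/* -EEXIST, not an LBUG */
	return 0;
}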

Comment by Gerrit Updater [ 09/Jul/18 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/32796
Subject: LU-11111 mdd: don't assert on ORPHAN flag
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 693fe452ec4349bb6ea12b8901b9ec1665d68e95

Comment by Lai Siyao [ 09/Jul/18 ]

I just uploaded a fix; since it only touches MDT code, you can apply it and update the MDS alone.

Comment by SC Admin (Inactive) [ 25/Jul/18 ]

thanks.
I applied it last night and let the original lfsck continue.
it finished ok (output below).

because it's been a long time, and we've had many more crashes from LU-11082, I started another namespace lfsck.

this one hung most of the way through phase2 on mdt0 and didn't complete; it completed on mdt1 and mdt2.
I waited about 15 minutes after all the counters stopped increasing on mdt0 and then tried an lfsck_stop, but it just blocked. lfsck_namespace was using 100% of a core and looked hung. I had to power cycle the MDS.
the MDTs are mounted with -o skip_lfsck again now.

do you want me to open a new ticket for this hang or stick with this ticket?

a lot of problems were repaired in both of the outputs below, presumably because of LU-10887 and LU-11082, and maybe LU-10677.

output from continued lfsck from ages ago (that crashed in phase2) ->

[warble1]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags:
param: all_targets,create_mdtobj
last_completed_time: 1532449023
time_since_last_completed: 22 seconds
latest_start_time: 1532447977
time_since_latest_start: 1068 seconds
last_checkpoint_time: 1532449023
time_since_last_checkpoint: 22 seconds
latest_start_position: 266, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 14945199
checked_phase2: 6638062
updated_phase1: 0
updated_phase2: 1
failed_phase1: 0
failed_phase2: 0
directories: 2205524
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 324817
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 1
dangling_repaired: 92
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 1639566
striped_shards_repaired: 1
striped_shards_failed: 0
striped_shards_skipped: 3
name_hash_repaired: 0
linkea_overflow_cleared: 0
success_count: 10
run_time_phase1: 2357 seconds
run_time_phase2: 1972 seconds
average_speed_phase1: 6340 items/sec
average_speed_phase2: 3366 objs/sec
average_speed_total: 4985 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags:
param: all_targets,create_mdtobj
last_completed_time: 1532448851
time_since_last_completed: 194 seconds
latest_start_time: 1532447985
time_since_latest_start: 1060 seconds
last_checkpoint_time: 1532448851
time_since_last_checkpoint: 194 seconds
latest_start_position: 266, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 14356291
checked_phase2: 6221317
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 2185653
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 323999
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 11
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 1639415
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 5
name_hash_repaired: 0
linkea_overflow_cleared: 0
success_count: 11
run_time_phase1: 2374 seconds
run_time_phase2: 1800 seconds
average_speed_phase1: 6047 items/sec
average_speed_phase2: 3456 objs/sec
average_speed_total: 4929 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags:
param: all_targets,create_mdtobj
last_completed_time: 1532448851
time_since_last_completed: 194 seconds
latest_start_time: 1532448011
time_since_latest_start: 1034 seconds
last_checkpoint_time: 1532448851
time_since_last_checkpoint: 194 seconds
latest_start_position: 266, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 14368285
checked_phase2: 6254745
updated_phase1: 2
updated_phase2: 1
failed_phase1: 0
failed_phase2: 0
directories: 2187046
dirent_repaired: 0
linkea_repaired: 1
nlinks_repaired: 0
multiple_linked_checked: 324797
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 88
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 1
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 1639666
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 4
name_hash_repaired: 0
linkea_overflow_cleared: 0
success_count: 11
run_time_phase1: 2387 seconds
run_time_phase2: 1799 seconds
average_speed_phase1: 6019 items/sec
average_speed_phase2: 3476 objs/sec
average_speed_total: 4926 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A

output from next namespace lfsck that stopped progressing on mdt0 (before power-off of MDS) ->

[warble1]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: scanning-phase2
flags: scanned-once,inconsistent
param: all_targets,create_mdtobj
last_completed_time: 1532449023
time_since_last_completed: 5691 seconds
latest_start_time: 1532449114
time_since_latest_start: 5600 seconds
last_checkpoint_time: 1532453018
time_since_last_checkpoint: 1696 seconds
latest_start_position: 266, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 18305111
checked_phase2: 5342520
updated_phase1: 44
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 2368594
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 381414
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 18
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 1761665
striped_shards_repaired: 20
striped_shards_failed: 0
striped_shards_skipped: 84
name_hash_repaired: 44
linkea_overflow_cleared: 0
success_count: 10
run_time_phase1: 2823 seconds
run_time_phase2: 2776 seconds
average_speed_phase1: 6484 items/sec
average_speed_phase2: 1924 objs/sec
average_speed_total: 4223 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: 35 objs/sec
current_position: [0x20001f8cf:0xcddd:0x0]
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags:
param: all_targets,create_mdtobj
last_completed_time: 1532453205
time_since_last_completed: 1509 seconds
latest_start_time: 1532449114
time_since_latest_start: 5600 seconds
last_checkpoint_time: 1532453205
time_since_last_checkpoint: 1509 seconds
latest_start_position: 266, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 17716791
checked_phase2: 6134353
updated_phase1: 155
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 2348707
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 381110
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 1761553
striped_shards_repaired: 78
striped_shards_failed: 0
striped_shards_skipped: 102
name_hash_repaired: 155
linkea_overflow_cleared: 0
success_count: 12
run_time_phase1: 2707 seconds
run_time_phase2: 1267 seconds
average_speed_phase1: 6544 items/sec
average_speed_phase2: 4841 objs/sec
average_speed_total: 6001 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags:
param: all_targets,create_mdtobj
last_completed_time: 1532453199
time_since_last_completed: 1515 seconds
latest_start_time: 1532449114
time_since_latest_start: 5600 seconds
last_checkpoint_time: 1532453199
time_since_last_checkpoint: 1515 seconds
latest_start_position: 266, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 17724868
checked_phase2: 6127858
updated_phase1: 1
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 2349705
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 380857
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 36
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 1761725
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 102
name_hash_repaired: 0
linkea_overflow_cleared: 0
success_count: 12
run_time_phase1: 2698 seconds
run_time_phase2: 1261 seconds
average_speed_phase1: 6569 items/sec
average_speed_phase2: 4859 objs/sec
average_speed_total: 6024 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A

cheers,
robin

Comment by Lai Siyao [ 26/Jul/18 ]

As for the hang, I have a ticket for it, LU-10888, but it's a fairly big issue; I need some time to think it over and implement a fix.

Comment by Gerrit Updater [ 03/Aug/18 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/32935
Subject: LU-11111 lfsck: print correct parent FID
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1c38dbb03e64992de7483ca57e724d608d5ebfb0

Comment by Andreas Dilger [ 03/Aug/18 ]

I think that suppressing the LASSERT() when the ORPHAN_OBJ flag is seen again is potentially the wrong fix for this problem.

It seems to me that the root cause here is that the file was unlinked by the user while it was still open/in use (which commonly happens for temporary files). This results in ORPHAN_OBJ being set at the MDD layer, and LUSTRE_ORPHAN_FL should also be set at the OSD layer. In this case, LFSCK should not be reconnecting the file/directory back into the namespace, which is what later triggers the LASSERT(), since the user wants the file to be deleted. Instead, I think the more correct fix is to keep the LASSERT() as-is and modify LFSCK to not reconnect objects that are marked with LUSTRE_ORPHAN_FL at the OSD layer.

Unlike the ORPHAN_OBJ flag at the MDD layer, LFSCK can access the LUSTRE_ORPHAN_FL from the OSD before it tries to reconnect the object.

There is a secondary question of why the orphan object is not itself linked into the PENDING directory when it is being unlinked. This is where open-unlinked files should be located, so that they are not lost (unlinked at mount time by the normal ldiskfs orphan object handling) if the MDS crashes while the file/directory is open and in use. The MDS needs to be able to re-open the file after the crash, which is why such files are linked into PENDING rather than just being referenced in memory.
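
A minimal sketch of the skip being proposed here, assuming a hypothetical helper name and an illustrative flag value; this is a userspace model, not the real LFSCK/OSD interface:

#include <stdbool.h>
#include <stdio.h>

#define LUSTRE_ORPHAN_FL 0x1000	/* flag value is illustrative only */

struct osd_object {
	unsigned int flags;	/* on-disk flags visible to LFSCK */
};

/* Hypothetical helper: before re-linking a "lost" object, ask the OSD
 * whether it was deliberately orphaned (open-unlinked). */
static bool lfsck_should_reconnect(const struct osd_object *obj)
{
	return !(obj->flags & LUSTRE_ORPHAN_FL);
}

int main(void)
{
	struct osd_object open_unlinked = { .flags = LUSTRE_ORPHAN_FL };
	struct osd_object truly_lost = { .flags = 0 };

	printf("open-unlinked: reconnect = %d\n",
	       lfsck_should_reconnect(&open_unlinked));	/* 0: skip it */
	printf("truly lost:    reconnect = %d\n",
	       lfsck_should_reconnect(&truly_lost));	/* 1: repair it */
	return 0;
}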

Comment by Alexey Lyashkov [ 06/Aug/18 ]

I think we have hit this or a similar issue with LFSCK running in the background against sanity test 230h (lfs migrate) or a directory unlink.
In this case the assertion was not hit, but the filesystem was corrupted because LFSCK inserted a directory entry based on stale linkEA data:

[35018.890469] Lustre: 30550:0:(lfsck_namespace.c:1220:lfsck_namespace_insert_normal()) fs1-MDT0000-osd: namespace LFSCK insert object [0x200030975:0x3:0x0] with the name d230h.sanity and type 40000 to the parent [0x200000007:0x1:0x0]: rc = 1

PENDING was flushed after the MDT remount, which removed this entry from PENDING but not from "/" (aka [0x200000007:0x1:0x0]), leaving a directory entry that points to a stale inode:

00080000:00000002:12.0:1532436667.487291:0:20602:0:(osd_handler.c:559:osd_iget_check()) no inode for FID: [0x200030975:0x3:0x0], ino = 422576130, rc = -116

Comment by Alexey Lyashkov [ 07/Aug/18 ]

I have replicated it with a very simple test case:
console 1> mkdir /mnt/lustre/testdir && ./multiop /mnt/lustre/testdir D_c
console 2> rmdir /mnt/lustre/testdir && lctl lfsck_start -o -r -C on -c on -A -M lustre-MDT0000

Because multiop's D_c holds the directory open, the rmdir leaves an open-unlinked orphan, and lfsck then moved the unlinked directory back into the normal namespace.
A patch will be ready soon.

Comment by Fan Yong [ 08/Aug/18 ]

The current namespace LFSCK logic does not check the ORPHAN flag, so it may add the orphan object back to its original parent directory based on a stale, misguided linkEA. That should be fixed. The fix should be done inside the LFSCK engine instead of the OSD (see the sketch below), because:

  1. LFSCK is not only for namespace consistency but also for layout consistency, and may cover more in the future. We can skip orphan objects for the namespace LFSCK, but we still need to verify layout consistency for orphan objects, to avoid accessing or destroying the wrong OST object(s).
  2. The trouble may happen for both the ldiskfs and ZFS backends.
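
A toy sketch of that engine-level policy; the enum and function names are invented for illustration and are not the actual LFSCK interface:

#include <stdbool.h>
#include <stdio.h>

enum lfsck_type { LFSCK_NAMESPACE, LFSCK_LAYOUT };

/* Engine-level policy: the namespace engine skips deliberate orphans
 * entirely, while the layout engine still verifies them so that a
 * corrupt layout cannot reference or destroy the wrong OST objects. */
static bool lfsck_engine_wants_object(enum lfsck_type type, bool is_orphan)
{
	if (!is_orphan)
		return true;
	return type == LFSCK_LAYOUT;
}

int main(void)
{
	printf("namespace engine, orphan: %d\n",
	       lfsck_engine_wants_object(LFSCK_NAMESPACE, true));	/* 0 */
	printf("layout engine, orphan:    %d\n",
	       lfsck_engine_wants_object(LFSCK_LAYOUT, true));		/* 1 */
	return 0;
}
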
Comment by SC Admin (Inactive) [ 10/Sep/18 ]

we would like to lfsck our filesystem. we have a dozen or so new unremovable directories from the MDS hangs. unfortunately a bunch of users have noticed these.

last time I ran an lfsck it hung and I had to reset the MDS, which risks causing more corruption.

is there an ETA for a fix so that lfsck won't hang?

alternatively, suggestions for ways that we can remove DNE from the filesystem would be appreciated.
thanks.

cheers,
robin

Comment by Lai Siyao [ 12/Sep/18 ]

Hi Robin, it'll take a couple of weeks to finish the patch for LU-10888.

What do you mean by removing DNE? Do you mean migrating all directories and files to MDT0 and removing the other MDTs from the system? Note that 2.10 doesn't support striped directory migration yet; the patches to support it will land on master later (see LU-4684).

Comment by Lai Siyao [ 13/Sep/18 ]

Robin, so there are four issues:
1. MDT hung during recovery, which will be fixed by LU-10888.
2. lfsck hung; I'd suggest you create a new ticket and post all process backtraces.
3. unremovable dirs/files after lfsck, LU-10677.
4. MDT hung and won't umount, LU-10301.

Which do you think has the top priority?

BTW, I'm curious how you use DNE. Normally the operations that involve multiple MDTs, such as create/unlink/setattr on a striped directory, should be rare, because you only need to create striped directories for the few directories that may contain lots of files; normal directories are just plain. I saw your comment in LU-11301 mentioning inherited directory striping across all MDTs; do you mean a default directory stripe is set, so all directories are created as striped directories? And do you mkdir/rmdir frequently?

Comment by Alexey Lyashkov [ 13/Sep/18 ]

Lai,

you are wrong. This bug causes unremovable files (ESTALE errors) after an MDT remount: the MDT kills the PENDING object, but the normal namespace still has the name and FID of the deleted file. That was my starting point.
The MDT hang during recovery is unfixed for now, and causes a situation where the OSP starts to fetch attributes from an MDT/OSP target that is not accessible. dt_sync is fixed, but osp_getattr is not (I have hit it many times).

Comment by SC Admin (Inactive) [ 13/Sep/18 ]

Hi Lai,

LU-11301 has by far the highest priority for us.

it hangs the entire cluster on average about once a week, I guess, i.e. the cluster is down. it's also taken to evicting most clients recently, which means a total reboot of the clients plus an unknown number of lost jobs.
these incidents tend to happen in batches, so we can sometimes go 2 or 3 weeks without incident and then have 3 crashes in a row.
LU-11082 and LU-11301 have both had much the same symptoms.

we have inherited striping on all dirs across all 3 MDTs. this is the only DNE config that makes sense to us: how do we know where users are going to put 100s of millions of files across 10s of thousands of dirs? we cannot possibly know this. if we set inherited dir striping then the MDTs work like OSTs, and on average we get good balance across them. multiple MDTs gave some file stat() performance benefits, and we assume performance for the rest of the ops will improve over time as DNE matures. AFAIK nobody does more than the usual amount of mkdir/rmdir, which, yes, is ultra-slow.

so if this isn't a supported config, or is something that's never going to work well, then a) please make that clear in the docs, and b) there is a window of time while the filesystem isn't over half full yet: we can take a few weeks or a month to rsync all files (probably <2 PiB) onto an unstriped top-level dir on MDT0 (which we left there for this purpose), take a couple of days of downtime to tidy up, and then destroy MDT1 and MDT2 and make the MDT0 pool bigger. i.e. remove DNE entirely. that's the only other config that makes sense to us. will this approach work?

on your points ->

1. cool. thanks.
2. we applied the patch from LU-11062 so stack traces are back. will re-try at some stage.
3. also, being able to corrupt the filesystem with just some mkdirs
4. LU-11301

cheers,
robin

Comment by Lai Siyao [ 13/Sep/18 ]

See lustre manual: https://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#dbdoclet.deactivating_mdt_ost

After you copy all dirs/files to MDT0, remove the old dirs/files (or you can keep them, but they will not be accessible after deactivation and will consume space on MDT0 and the OSTs), and permanently deactivate MDT1-2; then you can stop MDT1-2.

I'm working on LU-11301, will update you when I make progress.

Comment by Gerrit Updater [ 05/Oct/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32935/
Subject: LU-11111 lfsck: print correct parent FID
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3583e1fd998c56977adde9abe9c9a9dc9bd88d58

Comment by Gerrit Updater [ 04/Feb/19 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34174
Subject: LU-11111 lfsck: skip orphan processing
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5a7be2a5cd89b67b55daa42725efee62b8e3fb57

Comment by Gerrit Updater [ 11/Feb/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32959/
Subject: LU-11111 lfsck: skip orphan processing
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 077570483e75e0610fd45149b926097547c434b8

Comment by Cory Spitz [ 08/Mar/19 ]

Since https://review.whamcloud.com/#/c/32959/ was landed after Andreas recommended, "to land this patch as-is, with the knowledge that there is a rare corner case that is missed," should we keep this ticket open or just track the remaining work somewhere else? Either way it seems that https://review.whamcloud.com/#/c/34174/ should be abandoned now.

Comment by Andreas Dilger [ 08/Mar/19 ]

Cory, that patch resolved the aforementioned race condition (LFSCK on an orphan object that also had a corrupt layout), but it was having problems in testing and I thought it prudent to fix the issue with the simpler (if slightly imperfect) version of the patch from Shadow, which we had eventually gotten working.

I don't want to abandon Fan Yong's patch, since it shows how to fix the issue completely, but I don't think it is a priority to land at this point. I agree that it would probably make sense to open a different ticket, link the patch there, and close this one.

Comment by Gerrit Updater [ 28/Mar/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34528
Subject: LU-11111 lfsck: skip orphan processing
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 9847af4cbd7b2040e65decc589ecee2808eee521

Comment by Andreas Dilger [ 28/Mar/19 ]

Tracking remaining patch in LU-12127.

Comment by Gerrit Updater [ 08/Apr/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34528/
Subject: LU-11111 lfsck: skip orphan processing
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 92912099c0ee982d11019b2c586e255654eb9f45
