[LU-15886] remove unreasonable assertions in LFSCK code Created: 25/May/22 Updated: 20/Jul/22 Resolved: 20/Jul/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Lai Siyao | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
LFSCK should not assert on on-disk data, because any kind of corruption is possible there. Removing such assertions avoids needless crashes in LFSCK. |
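As an illustration of the approach described above, here is a minimal sketch in plain C. It is not taken from the patch: the types and functions (demo_dirent, demo_object, demo_lookup_child, demo_scan) are hypothetical stand-ins. It assumes a scanner that reads entries from disk, treats a missing or out-of-range child as an error to be counted and skipped, and keeps assert() only for the caller contract.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk directory entry: names a child object by index. */
struct demo_dirent {
	uint32_t de_child;	/* index into the object table */
};

/* Hypothetical object table slot; exists == 0 models a missing object. */
struct demo_object {
	int exists;
};

/*
 * Resolve the child named by an entry that was read from disk.
 * The entry content is untrusted, so problems are returned as errors.
 */
static int demo_lookup_child(const struct demo_object *tbl, size_t nr,
			     const struct demo_dirent *de,
			     const struct demo_object **child)
{
	assert(child != NULL);	/* caller contract, not disk content */

	if (de->de_child >= nr)
		return -EINVAL;	/* corrupt index on disk */
	if (!tbl[de->de_child].exists)
		return -ENOENT;	/* dangling entry, not a code bug */

	*child = &tbl[de->de_child];
	return 0;
}

/* Scan every entry; count the bad ones and continue instead of crashing. */
static int demo_scan(const struct demo_object *tbl, size_t nr,
		     const struct demo_dirent *ents, size_t nr_ents,
		     unsigned int *bad)
{
	size_t i;

	*bad = 0;
	for (i = 0; i < nr_ents; i++) {
		const struct demo_object *child;

		if (demo_lookup_child(tbl, nr, &ents[i], &child) < 0)
			(*bad)++;
	}
	return 0;
}
```

With a two-slot table where only slot 0 exists, scanning entries that point at slots 0, 1 and 7 would report two bad entries and still complete the scan.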
| Comments |
| Comment by Lai Siyao [ 25/May/22 ] |
|
Crashes are seen like this:

[708827.866619] LustreError: 4172:0:(lfsck_lib.c:1639:lfsck_instance_cleanup()) ASSERTION( lfsck->li_obj_dir == ((void *)0) ) failed:
[708827.870508] LustreError: 4172:0:(lfsck_lib.c:1639:lfsck_instance_cleanup()) LBUG
[708827.872606] Pid: 4172, comm: umount 3.10.0-1160.49.1.el7_lustre.ddn16.x86_64 #1 SMP Mon Dec 20 11:42:01 PST 2021
[708827.872607] Call Trace:
[708827.872623] [<0>] libcfs_call_trace+0x90/0xf0 [libcfs]
[708827.872628] [<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
[708827.872637] [<0>] lfsck_instance_cleanup+0x658/0x730 [lfsck]
[708827.872643] [<0>] lfsck_degister+0x43/0x50 [lfsck]
[708827.872651] [<0>] mdd_process_config+0x16a/0x5f0 [mdd]
[708827.872666] [<0>] mdt_stack_fini+0x2c2/0xca0 [mdt]
[708827.872673] [<0>] mdt_device_fini+0x34b/0x930 [mdt]
[708827.872698] [<0>] class_cleanup+0x9b8/0xc50 [obdclass]
[708827.872713] [<0>] class_process_config+0x65c/0x2830 [obdclass]
[708827.872728] [<0>] class_manual_cleanup+0x1c6/0x710 [obdclass]
[708827.872745] [<0>] server_put_super+0xa35/0x1150 [obdclass]
[708827.872748] [<0>] generic_shutdown_super+0x6d/0x100
[708827.872750] [<0>] kill_anon_super+0x12/0x20
[708827.872764] [<0>] lustre_kill_super+0x32/0x50 [obdclass]
[708827.872765] [<0>] deactivate_locked_super+0x4e/0x70
[708827.872766] [<0>] deactivate_super+0x46/0x60
[708827.872768] [<0>] cleanup_mnt+0x3f/0x80
[708827.872770] [<0>] __cleanup_mnt+0x12/0x20
[708827.872774] [<0>] task_work_run+0xbb/0xe0
[708827.872776] [<0>] do_notify_resume+0xa5/0xc0
[708827.872778] [<0>] int_signal+0x12/0x17
[708827.872795] [<0>] 0xfffffffffffffffe
[708827.872797] Kernel panic - not syncing: LBUG

and

[10089.987070] Lustre: vriprod1-OST000f: deleting orphan objects from 0x0:10180632 to 0x0:10182049
[10090.183768] LustreError: 29027:0:(lfsck_namespace.c:5896:lfsck_namespace_scan_local_lpf_one()) ASSERTION( dt_object_exists(child) ) failed:
[10090.185375] LustreError: 29027:0:(lfsck_namespace.c:5896:lfsck_namespace_scan_local_lpf_one()) LBUG
[10090.186495] Pid: 29027, comm: lfsck_namespace 3.10.0-1160.49.1.el7_lustre.ddn16.x86_64 #1 SMP Mon Dec 20 11:42:01 PST 2021
[10090.186497] Call Trace:
[10090.186526] [<0>] libcfs_call_trace+0x90/0xf0 [libcfs]
[10090.186532] [<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
[10090.186571] [<0>] lfsck_namespace_scan_local_lpf_one+0xa39/0xdf0 [lfsck]
[10090.186580] [<0>] lfsck_namespace_scan_local_lpf+0x59c/0x970 [lfsck]
[10090.186592] [<0>] lfsck_namespace_assistant_handler_p2+0x682/0xa80 [lfsck]
[10090.186600] [<0>] lfsck_assistant_engine+0xfb1/0x20a0 [lfsck]
[10090.186604] [<0>] kthread+0xd1/0xe0
[10090.186607] [<0>] ret_from_fork_nospec_begin+0x7/0x21
[10090.186632] [<0>] 0xfffffffffffffffe
[10090.186633] Kernel panic - not syncing: LBUG |
| Comment by Gerrit Updater [ 25/May/22 ] |
|
"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47447 |
| Comment by Lai Siyao [ 29/May/22 ] |
|
Another crash hit:

[ 420.060499] Lustre: vriprod1-OST007f: deleting orphan objects from 0x1300000404:364617 to 0x1300000404:364961
[ 420.061574] Lustre: vriprod1-OST0019: deleting orphan objects from 0x118000040e:3594584 to 0x118000040e:3595009
[ 426.029391] LustreError: 1129:0:(lfsck_namespace.c:3340:lfsck_namespace_linkea_clear_overflow()) ASSERTION( ldata->ld_leh->leh_reccount > 0 ) failed:
[ 426.034333] LustreError: 1129:0:(lfsck_namespace.c:3340:lfsck_namespace_linkea_clear_overflow()) LBUG
[ 426.037651] Pid: 1129, comm: lfsck_namespace 3.10.0-1160.49.1.el7_lustre.ddn16.x86_64 #1 SMP Mon Dec 20 11:42:01 PST 2021
[ 426.037653] Call Trace:
[ 426.037697] [<0>] libcfs_call_trace+0x90/0xf0 [libcfs]
[ 426.037703] [<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 426.037753] [<0>] lfsck_namespace_linkea_clear_overflow.isra.66+0x390/0x4d3 [lfsck]
[ 426.037771] [<0>] lfsck_namespace_double_scan_one+0x1b2/0x15a0 [lfsck]
[ 426.037787] [<0>] lfsck_namespace_double_scan_one_trace_file+0x3ba/0x7d0 [lfsck]
[ 426.037800] [<0>] lfsck_namespace_assistant_handler_p2+0x6e0/0xa80 [lfsck]
[ 426.037814] [<0>] lfsck_assistant_engine+0xfb1/0x20a0 [lfsck]
[ 426.037818] [<0>] kthread+0xd1/0xe0
[ 426.037822] [<0>] ret_from_fork_nospec_begin+0x7/0x21
[ 426.037898] [<0>] 0xfffffffffffffffe
[ 426.037900] Kernel panic - not syncing: LBUG
[ 426.040053] CPU: 8 PID: 1129 Comm: lfsck_namespace Kdump: loaded Tainted: G OE ------------ T 3.10.0-1160.49.1.el7_lustre.ddn16.x86_64 #1
[ 426.044558] Hardware name: DDN SFA400NVXE, BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[ 426.047323] Call Trace:
[ 426.049339] [<ffffffff88584539>] dump_stack+0x19/0x1b
[ 426.051794] [<ffffffff8857e241>] panic+0xe8/0x21f
[ 426.053797] [<ffffffffc0c8d8fb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[ 426.056032] [<ffffffffc16b12fd>] lfsck_namespace_linkea_clear_overflow.isra.66+0x390/0x4d3 [lfsck]
[ 426.059110] [<ffffffffc167eb72>] lfsck_namespace_double_scan_one+0x1b2/0x15a0 [lfsck]
[ 426.061930] [<ffffffffc168031a>] lfsck_namespace_double_scan_one_trace_file+0x3ba/0x7d0 [lfsck]
[ 426.064345] [<ffffffffc16840d0>] lfsck_namespace_assistant_handler_p2+0x6e0/0xa80 [lfsck]
[ 426.066723] [<ffffffffc10e6087>] ? ptlrpc_set_destroy+0x1f7/0x460 [ptlrpc]
[ 426.069097] [<ffffffff88026ae6>] ? kfree+0x106/0x140
[ 426.071256] [<ffffffffc10e6087>] ? ptlrpc_set_destroy+0x1f7/0x460 [ptlrpc]
[ 426.073723] [<ffffffffc1666a81>] lfsck_assistant_engine+0xfb1/0x20a0 [lfsck]
[ 426.076042] [<ffffffff88589df0>] ? __schedule+0x320/0x680
[ 426.078232] [<ffffffff87edadf0>] ? wake_up_state+0x20/0x20
[ 426.080655] [<ffffffffc1665ad0>] ? lfsck_master_engine+0x1360/0x1360 [lfsck]
[ 426.082720] [<ffffffff87ec5e61>] kthread+0xd1/0xe0
[ 426.084783] [<ffffffff87ec5d90>] ? insert_kthread_work+0x40/0x40
[ 426.086902] [<ffffffff88596ddd>] ret_from_fork_nospec_begin+0x7/0x21
[ 426.088888] [<ffffffff87ec5d90>] ? insert_kthread_work+0x40/0x40 |
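The assertion in this trace fires on a record count read from a link EA header, i.e. on data that came from disk. A hedged sketch of the alternative follows. It is not the code from the patch: the struct layout, the magic value, and the function name (demo_link_hdr, DEMO_LINK_MAGIC, demo_link_hdr_check) are placeholders invented for illustration. The idea is to validate the header and return an error the caller can act on (skip, repair, or log) when the count is zero.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary placeholder value; not the real link EA magic. */
#define DEMO_LINK_MAGIC 0x4C454844u

/* Hypothetical stand-in for an on-disk header; not the real layout. */
struct demo_link_hdr {
	uint32_t magic;
	uint32_t reccount;
	uint64_t len;
};

/*
 * Validate a header read from disk before using it.
 * Returns 0 if usable, -EINVAL for a bad magic or impossible length,
 * and -ENOENT if the header claims to hold no records.
 */
static int demo_link_hdr_check(const struct demo_link_hdr *hdr)
{
	assert(hdr != NULL);	/* caller contract, not disk content */

	if (hdr->magic != DEMO_LINK_MAGIC)
		return -EINVAL;
	if (hdr->len < sizeof(*hdr))
		return -EINVAL;
	if (hdr->reccount == 0)
		return -ENOENT;	/* empty on disk: report, do not LBUG */
	return 0;
}
```

A caller would check the return value before walking the records and treat a negative result as a corrupt or empty EA rather than as a programming error.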
| Comment by Gerrit Updater [ 18/Jul/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47447/ |
| Comment by Peter Jones [ 20/Jul/22 ] |
|
Landed for 2.16 |