[LU-15078] sanity-sec test_26: nodemap_config_set_active()) ASSERTION( config->nmc_default_nodemap ) failed Created: 11/Oct/21 Updated: 30/Nov/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.8, Lustre 2.15.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Elena Gryaznova | Assignee: | Sebastien Buisson |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
[ 394.285494] Lustre: DEBUG MARKER: == sanity-sec test 26: test transferring very large nodemap ========================================================== 17:02:01 (1633539721) [ 1774.903911] LustreError: 11647:0:(nodemap_handler.c:1695:nodemap_config_set_active()) ASSERTION( config->nmc_default_nodemap ) failed: [ 1774.908810] LustreError: 11647:0:(nodemap_handler.c:1695:nodemap_config_set_active()) LBUG [ 1774.911038] Pid: 11647, comm: ll_cfg_requeue 4.18.0-193.19.1.el8_2.x86_64 #1 SMP Mon Sep 14 14:37:00 UTC 2020 [ 1774.913568] Call Trace TBD: [ 1774.915018] [<0>] libcfs_call_trace+0x6f/0x90 [libcfs] [ 1774.916666] [<0>] lbug_with_loc+0x43/0x80 [libcfs] [ 1774.918650] [<0>] nodemap_config_set_active+0x2a6/0x2b0 [ptlrpc] [ 1774.920492] [<0>] nodemap_config_set_active_mgc+0x3a/0x210 [ptlrpc] [ 1774.922355] [<0>] mgc_process_recover_nodemap_log+0x10d1/0x1200 [mgc] [ 1774.924222] [<0>] mgc_process_log+0x76e/0x800 [mgc] [ 1774.925804] [<0>] mgc_requeue_thread+0x29e/0x740 [mgc] [ 1774.927470] [<0>] kthread+0x112/0x130 [ 1774.928857] [<0>] ret_from_fork+0x35/0x40 [ 1774.930299] Kernel panic - not syncing: LBUG [ 1774.931747] CPU: 0 PID: 11647 Comm: ll_cfg_requeue Tainted: G OE ---------r-t - 4.18.0-193.19.1.el8_2.x86_64 #1 [ 1774.934471] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1774.936179] Call Trace: [ 1774.937345] dump_stack+0x5c/0x80 [ 1774.938612] panic+0xe7/0x2a9 [ 1774.939788] ? ret_from_fork+0x35/0x40 [ 1774.941091] lbug_with_loc.cold.10+0x18/0x18 [libcfs] [ 1774.942677] ? nodemap_config_set_active+0x29a/0x2b0 [ptlrpc] [ 1774.944384] nodemap_config_set_active+0x2a6/0x2b0 [ptlrpc] [ 1774.946044] nodemap_config_set_active_mgc+0x3a/0x210 [ptlrpc] [ 1774.947719] ? kmem_cache_free+0x18c/0x1b0 [ 1774.949043] mgc_process_recover_nodemap_log+0x10d1/0x1200 [mgc] [ 1774.950710] mgc_process_log+0x76e/0x800 [mgc] [ 1774.952075] mgc_requeue_thread+0x29e/0x740 [mgc] [ 1774.953524] ? finish_wait+0x80/0x80 [ 1774.954730] ? mgc_process_config+0xe60/0xe60 [mgc] [ 1774.956172] kthread+0x112/0x130 [ 1774.957304] ? kthread_flush_work_fn+0x10/0x10 [ 1774.958636] ret_from_fork+0x35/0x40 [ 1774.960354] Kernel Offset: 0x2cc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 1774.962825] ---[ end Kernel panic - not syncing: LBUG ]--- |
| Comments |
| Comment by Alena Nikitenko [ 23/Nov/21 ] |
|
I found similar crash in 2.12.8 tests https://testing.whamcloud.com/test_sets/a6e88cc1-a192-40de-8a3a-1afe126db426 [22430.721269] LustreError: 1203:0:(nodemap_handler.c:1663:nodemap_config_set_active()) ASSERTION( config->nmc_default_nodemap ) failed: [22430.723320] LustreError: 1203:0:(nodemap_handler.c:1663:nodemap_config_set_active()) LBUG [22430.724640] Pid: 1203, comm: ll_cfg_requeue 3.10.0-1160.45.1.el7_lustre.x86_64 #1 SMP Thu Nov 18 19:16:15 UTC 2021 [22430.726275] Call Trace: [22430.726741] [<ffffffffc09a87cc>] libcfs_call_trace+0x8c/0xc0 [libcfs] [22430.727910] [<ffffffffc09a887c>] lbug_with_loc+0x4c/0xa0 [libcfs] [22430.728940] [<ffffffffc0f1a497>] nodemap_config_set_active+0x2a7/0x2e0 [ptlrpc] [22430.730566] [<ffffffffc0f24728>] nodemap_config_set_active_mgc+0x38/0x1e0 [ptlrpc] [22430.731873] [<ffffffffc114a796>] mgc_process_recover_nodemap_log+0x316/0xe70 [mgc] [22430.733151] [<ffffffffc114d8de>] mgc_process_log+0x7be/0x8f0 [mgc] [22430.734184] [<ffffffffc114f988>] mgc_requeue_thread+0x2c8/0x870 [mgc] [22430.735270] [<ffffffffabec5e61>] kthread+0xd1/0xe0 [22430.736131] [<ffffffffac595df7>] ret_from_fork_nospec_end+0x0/0x39 [22430.737192] [<ffffffffffffffff>] 0xffffffffffffffff [22430.738233] Kernel panic - not syncing: LBUG [22430.738932] CPU: 1 PID: 1203 Comm: ll_cfg_requeue Kdump: loaded Tainted: G OE ------------ 3.10.0-1160.45.1.el7_lustre.x86_64 #1 [22430.740918] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [22430.741827] Call Trace: [22430.742248] [<ffffffffac583539>] dump_stack+0x19/0x1b [22430.743084] [<ffffffffac57d241>] panic+0xe8/0x21f [22430.743859] [<ffffffffc09a88cb>] lbug_with_loc+0x9b/0xa0 [libcfs] [22430.744865] [<ffffffffc0f1a497>] nodemap_config_set_active+0x2a7/0x2e0 [ptlrpc] [22430.746064] [<ffffffffc0f24728>] nodemap_config_set_active_mgc+0x38/0x1e0 [ptlrpc] [22430.747310] [<ffffffffc0e77d80>] ? ptlrpc_request_cache_free+0x90/0x1d0 [ptlrpc] [22430.748525] [<ffffffffc0e79045>] ? __ptlrpc_req_finished+0x515/0x790 [ptlrpc] [22430.749681] [<ffffffffc114a796>] mgc_process_recover_nodemap_log+0x316/0xe70 [mgc] [22430.750888] [<ffffffffc114d8de>] mgc_process_log+0x7be/0x8f0 [mgc] [22430.751878] [<ffffffffac586d90>] ? schedule_timeout+0x170/0x2d0 [22430.752845] [<ffffffffc09af177>] ? libcfs_debug_msg+0x57/0x80 [libcfs] [22430.753912] [<ffffffffc114f988>] mgc_requeue_thread+0x2c8/0x870 [mgc] [22430.754965] [<ffffffffabedadf0>] ? wake_up_state+0x20/0x20 [22430.755857] [<ffffffffc114f6c0>] ? mgc_process_config+0x13f0/0x13f0 [mgc] [22430.756935] [<ffffffffabec5e61>] kthread+0xd1/0xe0 [22430.757712] [<ffffffffabec5d90>] ? insert_kthread_work+0x40/0x40 [22430.758675] [<ffffffffac595df7>] ret_from_fork_nospec_begin+0x21/0x21 [22430.759703] [<ffffffffabec5d90>] ? insert_kthread_work+0x40/0x40 |
| Comment by Peter Jones [ 26/Nov/21 ] |
|
Sebastien Should this issue be a concern? Peter |
| Comment by Sebastien Buisson [ 29/Nov/21 ] |
|
I do not think this issue should be considered a blocker for 2.15. On master, it occurred only once in the past 4 weeks, and only when running full-dne-part-2 test session. All other sessions that run sanity-sec, like review-dne-part-2, review-dne-zfs-part-2 and review-dne-selinux-ssk-part-2, did not hit this bug. |
| Comment by Xinliang Liu [ 19/Jun/23 ] |
|
I found that I can hit this issue easily on arm AlmaLinux 8.8 with 4k page size kernel , master v2.15.56.9. ``` [28814.645272] Lustre: DEBUG MARKER: == sanity-sec test 26: test transferring very large nodemap ========================================================== 18:02:24 (1687024944) ``` Note that, the test start running time is [28814.645272]s and crash happen time is [35582.906885]s , it seems stuck for quite a long time 112.8 Mins before crash. |