Details
-
Bug
-
Resolution: Unresolved
-
Major
-
Lustre 2.17.0
-
None
-
3
Description
This is an intermittent assertion failure we hit in Maloo, first seen in Apr 2025. While most occurrences were in sanity-quota, we just hit one in sanity-lfsck too.
[32438.240695] Lustre: DEBUG MARKER: == sanity-lfsck test 16: LFSCK can repair inconsistent MDT-object/OST-object owner ========================================================== 09:07:29 (1761728849)
[32438.289458] Lustre: 1352109:0:(osd_internal.h:1470:osd_trans_exec_op()) lustre-MDT0000: opcode 2: before 251 < left 278, rollback = 2
[32438.289653] Lustre: 1352109:0:(osd_internal.h:1470:osd_trans_exec_op()) Skipped 1799 previous similar messages
[32438.289750] Lustre: 1352109:0:(osd_handler.c:2076:osd_trans_dump_creds()) create: 1/4/4, destroy: 0/0/0
[32438.289844] Lustre: 1352109:0:(osd_handler.c:2076:osd_trans_dump_creds()) Skipped 1799 previous similar messages
[32438.289959] Lustre: 1352109:0:(osd_handler.c:2083:osd_trans_dump_creds()) attr_set: 1/1/0, xattr_set: 4/278/0
[32438.290054] Lustre: 1352109:0:(osd_handler.c:2083:osd_trans_dump_creds()) Skipped 1799 previous similar messages
[32438.290152] Lustre: 1352109:0:(osd_handler.c:2090:osd_trans_dump_creds()) write: 1/11/0, punch: 0/0/0, quota 1/3/2
[32438.290259] Lustre: 1352109:0:(osd_handler.c:2090:osd_trans_dump_creds()) Skipped 1799 previous similar messages
[32438.290357] Lustre: 1352109:0:(osd_handler.c:2100:osd_trans_dump_creds()) insert: 4/65/3, delete: 0/0/0
[32438.290453] Lustre: 1352109:0:(osd_handler.c:2100:osd_trans_dump_creds()) Skipped 1799 previous similar messages
[32438.290554] Lustre: 1352109:0:(osd_handler.c:2107:osd_trans_dump_creds()) ref_add: 2/2/0, ref_del: 0/0/0
[32438.290653] Lustre: 1352109:0:(osd_handler.c:2107:osd_trans_dump_creds()) Skipped 1799 previous similar messages
[32438.351208] LustreError: 1356644:0:(qmt_entry.c:1165:qmt_map_lge_idx()) qmt: cannot map ostidx 3, num_used 3: rc = -22
[32438.351402] LustreError: 1356644:0:(qmt_entry.c:1230:qmt_seed_glbe_all()) ASSERTION( idx >= 0 ) failed: idx -22 lqe_is_global 1 lqe ff4feaf57122ccb8
[32438.351496] LustreError: 1356644:0:(qmt_entry.c:1230:qmt_seed_glbe_all()) LBUG
[32438.351590] CPU: 0 PID: 1356644 Comm: mdt_rdpg00_002 Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.40.1_lustre.el9.x86_64 #1
[32438.351684] Hardware name: Red Hat KVM, BIOS 1.16.3-2.el9_5.1 04/01/2014
[32438.351780] Call Trace:
[32438.351899] <TASK>
[32438.352005] dump_stack_lvl+0x34/0x48
[32438.352101] lbug_with_loc.cold+0x5/0x43 [lnet]
[32438.352227] qmt_seed_glbe_all+0x3d1/0x7c0 [lquota]
[32438.352375] qmt_setup_lqe_gd+0x14b/0x1b0 [lquota]
[32438.352542] qmt_lvbo_init+0x349/0x820 [lquota]
[32438.352659] ldlm_lvbo_init+0x62/0x1d0 [ptlrpc]
[32438.352860] ldlm_handle_enqueue+0x5a6/0x16d0 [ptlrpc]
[32438.353068] tgt_enqueue+0x60/0x240 [ptlrpc]
[32438.353263] tgt_handle_request0+0x147/0x770 [ptlrpc]
[32438.353465] tgt_request_handle+0x3fd/0xd00 [ptlrpc]
[32438.353646] ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc]
[32438.353834] ? srso_alias_return_thunk+0x5/0xfbef5
[32438.353949] ptlrpc_main+0x9bf/0xea0 [ptlrpc]
[32438.354132] ? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc]
[32438.354318] kthread+0xdd/0x100
[32438.354411] ? __pfx_kthread+0x10/0x10
[32438.354504] ret_from_fork+0x29/0x50
[32438.354611] </TASK>
[32438.354717] Kernel panic - not syncing: LBUG
First hit: https://testing.whamcloud.com/test_sets/de9713b3-da39-4f96-9ed2-b63c8560a4ce
Just hit in master next: https://testing.whamcloud.com/test_sets/7b58893c-f3ba-4ca0-8899-6374c5aead5d
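From the looks of it, the error handling is just not there? qmt_map_lge_idx() returns -22 (-EINVAL) and qmt_seed_glbe_all() asserts idx >= 0 on the result instead of handling the failure. This is relatively serious, as the assertion takes down an MDS.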
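To illustrate the kind of handling that seems to be missing, here is a minimal standalone sketch, not the actual lquota code. Every name in it (demo_pool, demo_map_idx, demo_seed) is hypothetical and the data structure is a simplified stand-in; it only shows a failed index lookup being returned to the caller instead of being asserted on. Whether propagating the error is the right fix for qmt_seed_glbe_all(), or whether the failed lookup itself points at a race that needs fixing, is not something this ticket establishes.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a pool's index table; NOT the real lquota structure. */
struct demo_pool {
	const int *used_idx;	/* OST indices currently known to the pool */
	size_t	   num_used;	/* number of entries in used_idx */
};

/* Map an OST index to its slot number, or return -EINVAL if it is unknown. */
static int demo_map_idx(const struct demo_pool *pool, int ostidx)
{
	for (size_t i = 0; i < pool->num_used; i++)
		if (pool->used_idx[i] == ostidx)
			return (int)i;
	return -EINVAL;
}

/*
 * Mark the slot for one OST index as seeded. A failed lookup is handed back
 * to the caller as a negative errno rather than tripping an assertion.
 */
static int demo_seed(const struct demo_pool *pool, int ostidx, int *slots)
{
	int idx = demo_map_idx(pool, ostidx);

	if (idx < 0)
		return idx;	/* propagate instead of asserting idx >= 0 */

	slots[idx] = 1;
	return 0;
}
```

With a pool holding indices 0, 1 and 2 (num_used 3), asking for ostidx 3 mirrors the "cannot map ostidx 3, num_used 3" case in the log above: the sketch returns -EINVAL and leaves the slots untouched, so the caller can fail the request rather than panic the server.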