Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
None
-
None
-
3
-
9223372036854775807
Description
There's something landed recently (directio related I am sure) that has landed recently and causes regular though not very frequent failures in maloo.
[16534.089981] Lustre: DEBUG MARKER: == sanity test 119h: basic tests of memory unaligned dio ========================================================== 11:35:07 (1719315307) [16535.909936] LustreError: 443178:0:(rw26.c:426:ll_direct_rw_pages()) ASSERTION( i == pv->ldp_count ) failed: [16535.911077] LustreError: 443178:0:(rw26.c:426:ll_direct_rw_pages()) LBUG [16535.911822] CPU: 0 PID: 443178 Comm: multiop Kdump: loaded Tainted: G OE -------- --- 5.14.0-284.30.1.el9_2.x86_64 #1 [16535.912992] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [16535.913597] Call Trace: [16535.913905] <TASK> [16535.914173] dump_stack_lvl+0x34/0x48 [16535.914628] lbug_with_loc.cold+0x5/0x43 [libcfs] [16535.915184] ll_direct_rw_pages+0x46e/0x470 [lustre] [16535.915886] ll_direct_IO_impl+0x462/0xc90 [lustre] [16535.916432] ? bpf_lsm_inode_removexattr+0x10/0x10 [16535.916970] generic_file_direct_write+0x9c/0x1e0 [16535.917519] __generic_file_write_iter+0x9e/0x1a0 [16535.918029] vvp_io_write_start+0xb25/0xcb0 [lustre] [16535.918595] cl_io_start+0x61/0x130 [obdclass] [16535.919327] cl_io_loop+0x99/0x220 [obdclass] [16535.919846] ll_file_io_generic+0x53d/0xef0 [lustre] [16535.920415] do_file_write_iter+0x4e3/0x770 [lustre] [16535.920991] ? __alloc_pages+0xe6/0x230 [16535.921430] new_sync_write+0xff/0x190 [16535.921875] vfs_write+0x1ef/0x280 [16535.922270] ksys_write+0x5f/0xe0 [16535.922657] do_syscall_64+0x5c/0x90 [16535.923119] ? handle_mm_fault+0xc5/0x2a0 [16535.923582] ? do_user_addr_fault+0x1dd/0x6b0 [16535.924073] ? exc_page_fault+0x62/0x150 [16535.924504] entry_SYSCALL_64_after_hwframe+0x63/0xcd [16535.925059] RIP: 0033:0x7f415893eb97 [16535.925477] Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 [16535.927263] RSP: 002b:00007ffc09d90ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [16535.928037] RAX: ffffffffffffffda RBX: 0000000000406a48 RCX: 00007f415893eb97 [16535.928773] RDX: 0000000000000fff RSI: 00007f41585cb010 RDI: 0000000000000003 [16535.929501] RBP: 0000000000000fff R08: 0000000000000003 R09: 0000000000000000 [16535.930232] R10: 00007f41588070e0 R11: 0000000000000246 R12: 0000000000000fff [16535.930969] R13: 0000000000000003 R14: 00000000004044fc R15: 0000000000000122 [16535.931695] </TASK> [16535.932007] Kernel panic - not syncing: LBUG [16535.932480] CPU: 0 PID: 443178 Comm: multiop Kdump: loaded Tainted: G OE -------- --- 5.14.0-284.30.1.el9_2.x86_64 #1 [16535.933665] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [16535.934264] Call Trace: [16535.934571] <TASK> [16535.934840] dump_stack_lvl+0x34/0x48 [16535.935257] panic+0xf4/0x2c6 [16535.935637] lbug_with_loc.cold+0x1a/0x43 [libcfs] [16535.936168] ll_direct_rw_pages+0x46e/0x470 [lustre] [16535.936750] ll_direct_IO_impl+0x462/0xc90 [lustre] [16535.937312] ? bpf_lsm_inode_removexattr+0x10/0x10 [16535.937842] generic_file_direct_write+0x9c/0x1e0 [16535.938395] __generic_file_write_iter+0x9e/0x1a0 [16535.938918] vvp_io_write_start+0xb25/0xcb0 [lustre] [16535.939479] cl_io_start+0x61/0x130 [obdclass] [16535.940016] cl_io_loop+0x99/0x220 [obdclass] [16535.940542] ll_file_io_generic+0x53d/0xef0 [lustre] [16535.941098] do_file_write_iter+0x4e3/0x770 [lustre] [16535.941657] ? __alloc_pages+0xe6/0x230 [16535.942081] new_sync_write+0xff/0x190 [16535.942501] vfs_write+0x1ef/0x280 [16535.942892] ksys_write+0x5f/0xe0 [16535.943279] do_syscall_64+0x5c/0x90 [16535.943693] ? handle_mm_fault+0xc5/0x2a0 [16535.944132] ? do_user_addr_fault+0x1dd/0x6b0 [16535.944617] ? exc_page_fault+0x62/0x150 [16535.945050] entry_SYSCALL_64_after_hwframe+0x63/0xcd [16535.945589] RIP: 0033:0x7f415893eb97 [16535.945992] Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 [16535.947776] RSP: 002b:00007ffc09d90ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [16535.948557] RAX: ffffffffffffffda RBX: 0000000000406a48 RCX: 00007f415893eb97 [16535.949289] RDX: 0000000000000fff RSI: 00007f41585cb010 RDI: 0000000000000003 [16535.950032] RBP: 0000000000000fff R08: 0000000000000003 R09: 0000000000000000 [16535.950765] R10: 00007f41588070e0 R11: 0000000000000246 R12: 0000000000000fff [16535.951491] R13: 0000000000000003 R14: 00000000004044fc R15: 0000000000000122 [16535.952228] </TASK>
First time it hit in a seemingly unrelated review on June 11th: https://testing.whamcloud.com/test_sets/69350cd5-31d1-4654-bb37-1986fda71422
and then shortly after hit in master proper on June 25th: https://testing.whamcloud.com/test_sets/00f3ebed-3676-4495-9fd6-308afa7df77b
another failure from that session: https://testing.whamcloud.com/test_sets/79ab5267-752e-41f6-a27a-940ee3406679
Always hits in sanity test 119h