[LU-15487] crash after mdd_dir_page_build() error Created: 27/Jan/22 Updated: 04/Feb/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Andreas Dilger | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
Seeing crashes in random locations with apparent memory corruption shortly after mdd_dir_page_build() reports an error.

https://testing.whamcloud.com/test_sets/293f9d80-1e10-4042-86b5-7816504cc1ae

LNetError: 11244:0:(o2iblnd_cb.c:3371:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
LNetError: 11244:0:(o2iblnd_cb.c:3446:kiblnd_check_conns()) Timed out RDMA with 172.168.202.16@o2ib (105): c: 7, oc: 0, rc: 8
Lustre: 11433:0:(mdd_object.c:3460:mdd_dir_page_build()) build page failed: -22!
LustreError: 11251:0:(events.c:496:ptlrpc_master_callback()) ASSERTION( callback == request_out_callback || callback == reply_in_callback || callback == client_bulk_callback || callback == request_in_callback || callback == reply_out_callback || callback == server_bulk_callback ) failed:
LustreError: 11251:0:(events.c:496:ptlrpc_master_callback()) LBUG
Pid: 11251, comm: kiblnd_sd_01_02 3.10.0-1062.1.1.el7_lustre.x86_64 #1 SMP Wed Mar 25 16:04:09 PDT 2020
Call Trace:
 libcfs_call_trace+0x8c/0xc0 [libcfs]
 lbug_with_loc+0x4c/0xa0 [libcfs]
 ptlrpc_master_callback+0xbd/0xc0 [ptlrpc]
 lnet_eq_enqueue_event+0x2e/0x140 [lnet]
 lnet_finalize+0x24c/0xd40 [lnet]
 kiblnd_recv+0x1cd/0x7c0 [ko2iblnd]
 lnet_ni_recv+0xc8/0x330 [lnet]
 lnet_recv_put+0x85/0xb0 [lnet]
 lnet_parse_local+0x5ae/0xd40 [lnet]
 lnet_parse+0x99a/0x11e0 [lnet]
 kiblnd_handle_rx+0x213/0x6b0 [ko2iblnd]
 kiblnd_scheduler+0xf42/0x1190 [ko2iblnd]
 kthread+0xd1/0xe0

Not yet sure of cause/effect, but filing ticket to track and submit a debug patch.

LDISKFS-fs warning (device dm-18): ldiskfs_dx_add_entry:2629: Large directory feature is not enabled on this filesystem
Lustre: 52007:0:(mdd_object.c:3460:mdd_dir_page_build()) build page failed: -22!
WARNING: CPU: 80 PID: 74750 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
list_del corruption. prev->next should be ffffa1d80d4a7000, but was (null)
LustreError: 56942:0:(events.c:496:ptlrpc_master_callback()) ASSERTION( callback == request_out_callback || callback == reply_in_callback || callback == client_bulk_callback || callback == request_in_callback || callback == reply_out_callback || callback == server_bulk_callback ) failed:
CPU: 80 PID: 74750 Comm: mdt_rdpg01_086 Kdump: loaded 3.10.0-1062.1.1.el7_lustre.x86_64 #1
 __list_del_entry+0xa1/0xd0
 list_del+0xd/0x30
 ptlrpc_server_drop_request+0xe5/0x6d0 [ptlrpc]
 ptlrpc_server_finish_active_request+0x92/0x140 [ptlrpc]
 ptlrpc_server_handle_request+0x401/0xab0 [ptlrpc]
 ptlrpc_main+0xb34/0x1470 [ptlrpc]
 kthread+0xd1/0xe0 |
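For readers unfamiliar with the "list_del corruption. prev->next should be ..., but was (null)" warning in the second log, the following is a rough user-space model in Python of the kind of consistency check a debug list deletion performs before unlinking an entry. It is only an illustration: `Node`, `list_add`, `checked_list_del`, and `demo` are hypothetical names, not kernel or Lustre code, and the simulated stray write is an assumption standing in for whatever actually corrupted memory in this ticket.

```python
class Node:
    """Hypothetical stand-in for a doubly linked list entry."""

    def __init__(self, name):
        self.name = name
        # An empty circular list points at itself in both directions.
        self.prev = self
        self.next = self


def list_add(new, head):
    """Insert `new` immediately after `head`."""
    nxt = head.next
    new.prev, new.next = head, nxt
    nxt.prev = new
    head.next = new


def _name(node):
    """Render a neighbour pointer the way the log does for NULL."""
    return "(null)" if node is None else node.name


def checked_list_del(entry):
    """Unlink `entry` only if both neighbours still point back at it.

    Returns None on success, or a message describing the inconsistency.
    """
    prev, nxt = entry.prev, entry.next
    if prev.next is not entry:
        return ("list_del corruption. prev->next should be "
                f"{entry.name}, but was {_name(prev.next)}")
    if nxt.prev is not entry:
        return ("list_del corruption. next->prev should be "
                f"{entry.name}, but was {_name(nxt.prev)}")
    prev.next = nxt
    nxt.prev = prev
    # Simplification: reset the entry to an empty list instead of poisoning it.
    entry.prev = entry.next = entry
    return None


def demo():
    """Build head -> b -> a, delete `a` cleanly, then corrupt and delete `b`."""
    head, a, b = Node("head"), Node("a"), Node("b")
    list_add(a, head)
    list_add(b, head)
    assert checked_list_del(a) is None  # consistent list: deletion succeeds
    head.next = None  # assumed stray write that zeroes a neighbour's pointer
    return checked_list_del(b)


if __name__ == "__main__":
    print(demo())
```

In this model the check only reports that a neighbour's pointer no longer matches; it cannot say which code overwrote it, which is consistent with the ticket's note that cause and effect are not yet established.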
| Comments |
| Comment by Gerrit Updater [ 28/Jan/22 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46368 |
| Comment by Gerrit Updater [ 04/Mar/22 ] |
|
"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46697 |
| Comment by Gerrit Updater [ 30/May/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46368/ |