Details
-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
Lustre 2.15.5
-
None
-
3
Description
A CX recently encountered this issue after running many of their OSTs to full capacity; 2 of ~300 OSSs hit the crash below.
[442601.088522] LustreError: 1038502:0:(tgt_grant.c:472:tgt_grant_space_left()) lustrefs-OST00a4: cli 55d74450-4b3f-415c-a830-9d3f4126f1d5/000000002ee65e21 left=529690624 < tot_grant=6756016175 unstable=0 pending=0 dirty=8417280
[442601.088528] LustreError: 1038502:0:(tgt_grant.c:472:tgt_grant_space_left()) Skipped 20975 previous similar messages
[442710.230093] LustreError: 287529:0:(ofd_dev.c:1740:ofd_create_hdl()) lustrefs-OST00a4: unable to precreate: rc = -28
[442710.230098] LustreError: 287529:0:(ofd_dev.c:1740:ofd_create_hdl()) Skipped 3 previous similar messages
[442973.953697] LustreError: 367776:0:(ofd_dev.c:1740:ofd_create_hdl()) lustrefs-OST00a4: unable to precreate: rc = -28
[442973.953702] LustreError: 367776:0:(ofd_dev.c:1740:ofd_create_hdl()) Skipped 5 previous similar messages
[442990.802602] Lustre: 308494:0:(osd_handler.c:586:osd_ldiskfs_add_entry()) lustrefs-OST00a4: directory (inode: 157548816, FID: [0x9640110:0xabb62bd4:0x0]) has reached max size limit
[442991.305415] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
[442991.306436] PGD 0
[442991.306687] Oops: 0000 [#1] SMP NOPTI
[442991.307118] CPU: 161 PID: 2199701 Comm: ll_ost09_049 Kdump: loaded Tainted: G OE X -------- - - 4.18.0-553.5.1.el8_10_lustre_oci_250709.x86_64 #1
[442991.308869] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.6.4 02/27/2023
[442991.309812] RIP: 0010:osd_attr_get+0xa1/0x700 [osd_ldiskfs]
[442991.310495] Code: 00 01 0f 85 42 03 00 00 a8 02 0f 85 50 03 00 00 4d 8d b5 a8 00 00 00 4c 89 f7 e8 2a 6d b8 d1 49 8b 6d 40 48 81 0b ff 9f 08 00 <48> 8b 45 58 48 89 43 18 48 8b 45 68 48 89 43 10 48 8b 45 78 48 89
[442991.312658] RSP: 0018:ff78785742fffbf0 EFLAGS: 00010206
[442991.313284] RAX: 0000000000000000 RBX: ff4eb835f2f86c30 RCX: ff4eb831602eb530
[442991.314121] RDX: 0000000000000001 RSI: ff4eb8320c953c00 RDI: ff4eb8320c953ca8
[442991.314956] RBP: 0000000000000000 R08: ff7878574f615ab9 R09: 0000000000000000
[442991.315796] R10: ff4eb837ed2ac600 R11: 0000000000000100 R12: ff4eb8382172bc80
[442991.316646] R13: ff4eb8320c953c00 R14: ff4eb8320c953ca8 R15: ff4eb835f2f86c00
[442991.317479] FS: 0000000000000000(0000) GS:ff4eb932a6440000(0000) knlGS:0000000000000000
[442991.318433] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[442991.319095] CR2: 0000000000000058 CR3: 00000148de810003 CR4: 0000000000771ee0
[442991.319946] PKRU: 55555554
[442991.320275] Call Trace:
[442991.320570] ? __die_body+0x1a/0x60
[442991.320987] ? no_context+0x1ba/0x3f0
[442991.321421] ? __bad_area_nosemaphore+0x16c/0x1c0
[442991.321982] ? do_page_fault+0x37/0x12d
[442991.322442] ? page_fault+0x1e/0x30
[442991.322863] ? osd_attr_get+0xa1/0x700 [osd_ldiskfs]
[442991.323481] ofd_attr_get+0x93/0x2a0 [ofd]
[442991.323965] ofd_lvbo_init+0x595/0x9e0 [ofd]
[442991.324484] ? srso_alias_return_thunk+0x5/0xfcdfd
[442991.325060] ? srso_alias_return_thunk+0x5/0xfcdfd
[442991.325629] ? ldlm_handle_enqueue0+0xe27/0x1500 [ptlrpc]
[442991.326336] ? srso_alias_return_thunk+0x5/0xfcdfd
[442991.326904] ldlm_handle_enqueue0+0xe27/0x1500 [ptlrpc]
[442991.327562] tgt_enqueue+0xa4/0x220 [ptlrpc]
[442991.328122] tgt_request_handle+0xccd/0x1a20 [ptlrpc]
[442991.328763] ? ptlrpc_nrs_req_get_nolock0+0xff/0x1f0 [ptlrpc]
[442991.329510] ? srso_alias_return_thunk+0x5/0xfcdfd
[442991.330099] ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc]
[442991.330847] ? srso_alias_return_thunk+0x5/0xfcdfd
[442991.331430] ? srso_alias_return_thunk+0x5/0xfcdfd
[442991.331983] ptlrpc_main+0xbec/0x1530 [ptlrpc]
[442991.332566] ? srso_alias_return_thunk+0x5/0xfcdfd
[442991.333160] ? ptlrpc_wait_event+0x590/0x590 [ptlrpc]
[442991.333799] kthread+0x134/0x150
[442991.334214] ? set_kthread_struct+0x50/0x50
[442991.334716] ret_from_fork+0x1f/0x40
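
For triage, two things in the log can be cross-checked mechanically. First, the tgt_grant_space_left() message reports left < tot_grant, so the size of the grant overcommit can be computed. Second, the faulting bytes marked in the Code: line, <48> 8b 45 58, decode on x86-64 to mov 0x58(%rbp),%rax, and RBP is 0, which is consistent with CR2 = 0x58. The preceding bytes 49 8b 6d 40 decode to mov 0x40(%r13),%rbp, so the NULL value appears to have been loaded from offset 0x40 of whatever R13 points at; which structure field that is has not been established here. The sketch below is only an illustration of those two checks: the helper names grant_shortfall and fault_matches_base_plus_disp are hypothetical, and the sample strings are copied from the log above.

```python
import re

# Sample lines copied from the log in this ticket.
GRANT_LINE = (
    "LustreError: 1038502:0:(tgt_grant.c:472:tgt_grant_space_left()) "
    "lustrefs-OST00a4: cli 55d74450-4b3f-415c-a830-9d3f4126f1d5/000000002ee65e21 "
    "left=529690624 < tot_grant=6756016175 unstable=0 pending=0 dirty=8417280"
)
OOPS_REGS = (
    "RBP: 0000000000000000 R08: ff7878574f615ab9 R09: 0000000000000000\n"
    "CR2: 0000000000000058 CR3: 00000148de810003 CR4: 0000000000771ee0\n"
)


def grant_shortfall(line: str) -> int:
    """Hypothetical helper: bytes by which tot_grant exceeds left."""
    left = int(re.search(r"\bleft=(\d+)", line).group(1))
    tot_grant = int(re.search(r"\btot_grant=(\d+)", line).group(1))
    return tot_grant - left


def fault_matches_base_plus_disp(oops: str, base_reg: str, disp: int) -> bool:
    """Hypothetical helper: True if CR2 equals <base_reg> + disp.

    disp is the displacement read from the decoded faulting instruction
    (0x58 for `mov 0x58(%rbp),%rax`); it is supplied by the caller, this
    function does not disassemble anything.
    """
    base = int(re.search(rf"\b{base_reg}: ([0-9a-f]{{16}})", oops).group(1), 16)
    cr2 = int(re.search(r"\bCR2: ([0-9a-f]{16})", oops).group(1), 16)
    return base + disp == cr2


if __name__ == "__main__":
    print("grant overcommit (bytes):", grant_shortfall(GRANT_LINE))
    print("CR2 == RBP + 0x58:", fault_matches_base_plus_disp(OOPS_REGS, "RBP", 0x58))
```

A True result for the second check only confirms that the fault is a NULL base pointer plus a 0x58 field offset; it does not by itself say why the pointer was NULL.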