Details
-
Bug
-
Resolution: Duplicate
-
Critical
-
None
-
Lustre 2.15.4
-
None
-
2.15.4 clients + servers, ldiskfs, CentOS 7.9 kernel 3.10.0-1160.108.1.el7_lustre.pl1.x86_64
-
3
-
9223372036854775807
Description
Hello, we hit our first LBUG with 2.15.4 in production with the following trace:
[1928909.418709] Lustre: fir-OST0099: Bulk IO read error with 13b41e84-d040-4a81-8804-171135a5ca68 (at 10.51.14.12@o2ib3), client will retry: rc -110
[1928909.431824] Lustre: Skipped 46 previous similar messages
[1928922.657005] LustreError: 44052:0:(ldlm_lockd.c:261:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 10.51.10.38@o2ib3 ns: filter-fir-OST009f_UUID lock: ffff9e292b11e540/0xa96a836f66e96b33 lrc: 3/0,0 mode: PW/PW res: [0x2ac0000400:0x1e13a4b:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 4194304->4259839) gid 0 flags: 0x60000400020020 nid: 10.51.10.38@o2ib3 remote: 0xc0aaf00bc9c529b4 expref: 44 pid: 47188 timeout: 1929024 lvb_type: 0
[1928922.699614] LustreError: 33995:0:(client.c:1256:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff9e215aab2400 x1789761432555520/t0(0) o105->fir-OST009f@10.51.10.38@o2ib3:15/16 lens 392/224 e 0 to 0 dl 0 ref 1 fl Rpc:QU/0/ffffffff rc 0/-1 job:''
[1928923.056984] LustreError: 46818:0:(ldlm_lib.c:3538:target_bulk_io()) @@@ Eviction on bulk READ req@ffff9e205b151f80 x1790434282424320/t0(0) o3->de1c7312-0728-4b26-8a91-192b15c87ece@10.51.10.38@o2ib3:9/0 lens 488/440 e 0 to 0 dl 1708737149 ref 1 fl Interpret:/0/0 rc 0/0 job:'41750242'
[1928923.082396] LustreError: 46818:0:(ofd_io.c:1027:ofd_commitrw_read()) ASSERTION( ofd_object_exists(fo) ) failed:
[1928923.092895] LustreError: 46818:0:(ofd_io.c:1027:ofd_commitrw_read()) LBUG
[1928923.099873] Pid: 46818, comm: ll_ost_io01_031 3.10.0-1160.108.1.el7_lustre.pl1.x86_64 #1 SMP Fri Jan 26 11:26:38 PST 2024
[1928923.111132] Call Trace:
[1928923.113781] [<0>] libcfs_call_trace+0x90/0xf0 [libcfs]
[1928923.119107] [<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
[1928923.124079] [<0>] ofd_commitrw+0xd2a/0xd80 [ofd]
[1928923.128901] [<0>] tgt_brw_read+0xa88/0x2030 [ptlrpc]
[1928923.134058] [<0>] tgt_request_handle+0x93f/0x19d0 [ptlrpc]
[1928923.139729] [<0>] ptlrpc_server_handle_request+0x253/0xc30 [ptlrpc]
[1928923.146185] [<0>] ptlrpc_main+0xbf4/0x15e0 [ptlrpc]
[1928923.151246] [<0>] kthread+0xd1/0xe0
[1928923.154921] [<0>] ret_from_fork_nospec_begin+0x7/0x21
[1928923.160178] [<0>] 0xfffffffffffffffe
[1928923.163930] Kernel panic - not syncing: LBUG
[1928923.168372] CPU: 26 PID: 46818 Comm: ll_ost_io01_031 Kdump: loaded Tainted: G OE ------------ 3.10.0-1160.108.1.el7_lustre.pl1.x86_64 #1
[1928923.181917] Hardware name: Dell Inc. PowerEdge R6525/07Y51T, BIOS 2.13.3 09/12/2023
[1928923.189741] Call Trace:
[1928923.192369] [<ffffffff90db1bec>] dump_stack+0x19/0x1f
[1928923.197681] [<ffffffff90dab708>] panic+0xe8/0x21f
[1928923.202836] [<ffffffffc055a5eb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[1928923.209310] [<ffffffffc185730a>] ofd_commitrw+0xd2a/0xd80 [ofd]
[1928923.215493] [<ffffffff906cc790>] ? wake_up_atomic_t+0x40/0x40
[1928923.221520] [<ffffffffc13c9818>] tgt_brw_read+0xa88/0x2030 [ptlrpc]
[1928923.228053] [<ffffffffc1358586>] ? ptl_send_buf+0x136/0x540 [ptlrpc]
[1928923.234674] [<ffffffffc13625a7>] ? lustre_msg_add_version+0x27/0xb0 [ptlrpc]
[1928923.241987] [<ffffffffc13628e2>] ? lustre_pack_reply_v2+0x142/0x2c0 [ptlrpc]
[1928923.249302] [<ffffffffc1362ad2>] ? lustre_pack_reply_flags+0x72/0x1f0 [ptlrpc]
[1928923.256787] [<ffffffffc1362c61>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
[1928923.263672] [<ffffffffc13c711f>] tgt_request_handle+0x93f/0x19d0 [ptlrpc]
[1928923.270728] [<ffffffffc13a87a5>] ? ptlrpc_nrs_req_get_nolock0+0xd5/0x170 [ptlrpc]
[1928923.278463] [<ffffffffc055703e>] ? ktime_get_real_seconds+0xe/0x20 [libcfs]
[1928923.285694] [<ffffffffc1372dc3>] ptlrpc_server_handle_request+0x253/0xc30 [ptlrpc]
[1928923.293513] [<ffffffff906d8990>] ? task_rq_unlock+0x20/0x20
[1928923.299346] [<ffffffff906d8f73>] ? __wake_up+0x13/0x20
[1928923.304758] [<ffffffffc1374a54>] ptlrpc_main+0xbf4/0x15e0 [ptlrpc]
[1928923.311351] [<ffffffffc1373e60>] ? ptlrpc_wait_event+0x5d0/0x5d0 [ptlrpc]
[1928923.318550] [<ffffffff906cb621>] kthread+0xd1/0xe0
[1928923.323602] [<ffffffff906cb550>] ? insert_kthread_work+0x40/0x40
[1928923.329868] [<ffffffff90dc51dd>] ret_from_fork_nospec_begin+0x7/0x21
[1928923.336482] [<ffffffff906cb550>] ? insert_kthread_work+0x40/0x40
I am attaching the vmcore-dmesg output as fir-io8-s2_2024-02-23_17-11-22_vmcore-dmesg.txt and the output of "foreach bt" as fir-io8-s2_2024-02-23_17-11-22_foreach_bt.txt.
The vmcore can be made available upon request.
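For triage, the failed assertion and the thread that hit it can be pulled out of a saved dmesg file with a few lines of Python. This is only a rough sketch: the function name `find_assertions` and both regular expressions are illustrative, not part of any Lustre tooling, and it assumes the file holds one log entry per line in the `[timestamp] LustreError: pid:...:(file.c:line:func()) ASSERTION( expr ) failed` form seen in the trace above.

```python
import re

# Kernel log entry: "[seconds.micros] message".
ENTRY_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")

# Lustre assertion message, as it appears in the trace above (assumed format).
ASSERT_RE = re.compile(
    r"LustreError: (\d+):\d+:\((\w+\.c):(\d+):(\w+)\(\)\) "
    r"ASSERTION\( (.*?) \) failed"
)


def find_assertions(dmesg_text):
    """Return one dict per 'ASSERTION( ... ) failed' entry found in the text."""
    results = []
    for line in dmesg_text.splitlines():
        entry = ENTRY_RE.match(line.strip())
        if not entry:
            continue  # not a timestamped log entry
        hit = ASSERT_RE.search(entry.group(2))
        if hit:
            pid, src, lineno, func, expr = hit.groups()
            results.append({
                "time": float(entry.group(1)),
                "pid": int(pid),
                "file": src,
                "line": int(lineno),
                "function": func,
                "expression": expr,
            })
    return results


if __name__ == "__main__":
    # File name taken from the attachment mentioned in this ticket.
    path = "fir-io8-s2_2024-02-23_17-11-22_vmcore-dmesg.txt"
    with open(path, encoding="utf-8", errors="replace") as fh:
        for item in find_assertions(fh.read()):
            print(item)
```

The reported PID can then be matched against the per-task backtraces in the "foreach bt" output to locate the thread that triggered the LBUG.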
Any ideas? Thanks!
Attachments
Issue Links
- duplicates LU-16345 ofd_commitrw_read() can be passed non-existing object (Resolved)