Details
-
Bug
-
Resolution: Duplicate
-
Critical
-
None
-
Lustre 2.12.6
-
None
-
2
Description
LNet does not work on EL8.5 with MOFED 5.2 and Lustre 2.12.6.
The first error I see is:
[Wed May 4 23:28:46 2022] alg: No test for adler32 (adler32-zlib)
[Wed May 4 23:28:46 2022] alg: hash: digest failed on test 1 for crc32-table: ret=126
Then LNet sends time out and the rdma_cm worker hangs:
[Wed May 4 23:37:02 2022] LNetError: 7708:0:(lib-move.c:2955:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.141.16.185@o2ib417: -125
[Wed May 4 23:37:02 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 924 seconds
[Wed May 4 23:37:59 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 981 seconds
[Wed May 4 23:38:49 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 1031 seconds
[Wed May 4 23:38:49 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Skipped 1 previous similar message
[Wed May 4 23:40:04 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 1106 seconds
[Wed May 4 23:40:04 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Skipped 1 previous similar message
[Wed May 4 23:40:04 2022] INFO: task kworker/u256:1:7922 blocked for more than 120 seconds.
[Wed May 4 23:40:04 2022] Tainted: G OE --------- - - 4.18.0-240.15.1.1nas.el8.t4.x86_64 #1
[Wed May 4 23:40:04 2022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed May 4 23:40:04 2022] kworker/u256:1 D 0 7922 2 0x80004080
[Wed May 4 23:40:04 2022] Workqueue: rdma_cm cma_work_handler [rdma_cm]
[Wed May 4 23:40:04 2022] Call Trace:
[Wed May 4 23:40:04 2022] __schedule+0x2a9/0x710
[Wed May 4 23:40:04 2022] schedule+0x4d/0xc0
[Wed May 4 23:40:04 2022] schedule_preempt_disabled+0x11/0x20
[Wed May 4 23:40:04 2022] __mutex_lock.isra.5+0x343/0x550
[Wed May 4 23:40:04 2022] ? kiblnd_post_rx+0x1ff/0x520 [ko2iblnd]
[Wed May 4 23:40:04 2022] rdma_connect+0x1e/0x40 [rdma_cm]
[Wed May 4 23:40:04 2022] kiblnd_cm_callback+0x1476/0x2220 [ko2iblnd]
[Wed May 4 23:40:04 2022] ? __switch_to_asm+0x41/0x70
[Wed May 4 23:40:04 2022] cma_cm_event_handler+0x25/0xf0 [rdma_cm]
[Wed May 4 23:40:04 2022] cma_work_handler+0x5a/0xb0 [rdma_cm]
[Wed May 4 23:40:04 2022] process_one_work+0x1ae/0x3a0
[Wed May 4 23:40:04 2022] worker_thread+0x3c/0x3c0
[Wed May 4 23:40:04 2022] ? create_worker+0x1a0/0x1a0
[Wed May 4 23:40:04 2022] kthread+0x11d/0x140
[Wed May 4 23:40:04 2022] ? kthread_flush_work_fn+0x10/0x10
[Wed May 4 23:40:04 2022] ret_from_fork+0x22/0x40
[Wed May 4 23:40:54 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 1156 seconds
[Wed May 4 23:40:54 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Skipped 1 previous similar message
[Wed May 4 23:42:07 2022] INFO: task kworker/u256:1:7922 blocked for more than 120 seconds.
[Wed May 4 23:42:07 2022] Tainted: G OE --------- - - 4.18.0-240.15.1.1nas.el8.t4.x86_64 #1
[Wed May 4 23:42:07 2022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed May 4 23:42:07 2022] kworker/u256:1 D 0 7922 2 0x80004080
[Wed May 4 23:42:07 2022] Workqueue: rdma_cm cma_work_handler [rdma_cm]
[Wed May 4 23:42:07 2022] Call Trace:
[Wed May 4 23:42:07 2022] __schedule+0x2a9/0x710
[Wed May 4 23:42:07 2022] schedule+0x4d/0xc0
[Wed May 4 23:42:07 2022] schedule_preempt_disabled+0x11/0x20
[Wed May 4 23:42:07 2022] __mutex_lock.isra.5+0x343/0x550
[Wed May 4 23:42:07 2022] ? kiblnd_post_rx+0x1ff/0x520 [ko2iblnd]
[Wed May 4 23:42:07 2022] rdma_connect+0x1e/0x40 [rdma_cm]
[Wed May 4 23:42:07 2022] kiblnd_cm_callback+0x1476/0x2220 [ko2iblnd]
[Wed May 4 23:42:07 2022] ? __switch_to_asm+0x41/0x70
[Wed May 4 23:42:07 2022] cma_cm_event_handler+0x25/0xf0 [rdma_cm]
[Wed May 4 23:42:07 2022] cma_work_handler+0x5a/0xb0 [rdma_cm]
[Wed May 4 23:42:07 2022] process_one_work+0x1ae/0x3a0
[Wed May 4 23:42:07 2022] worker_thread+0x3c/0x3c0
[Wed May 4 23:42:07 2022] ? create_worker+0x1a0/0x1a0
[Wed May 4 23:42:07 2022] kthread+0x11d/0x140
[Wed May 4 23:42:07 2022] ? kthread_flush_work_fn+0x10/0x10
[Wed May 4 23:42:07 2022] ret_from_fork+0x22/0x40
[Wed May 4 23:42:09 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 1231 seconds
[Wed May 4 23:42:09 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Skipped 1 previous similar message
[Wed May 4 23:42:59 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 1281 seconds
[Wed May 4 23:42:59 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Skipped 1 previous similar message
See attached debug logs.