[LU-15824] lnet not working with EL8.5 MOFED5.2 Created: 05/May/22 Updated: 13/Jul/22 Resolved: 13/Jul/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Mahmoud Hanafi | Assignee: | Peter Jones |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Severity: | 2 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
LNet is not working with EL8.5 and MOFED5.2 with Lustre 2.12.6. I first see this error:

[Wed May 4 23:28:46 2022] alg: No test for adler32 (adler32-zlib)
[Wed May 4 23:28:46 2022] alg: hash: digest failed on test 1 for crc32-table: ret=126

And this:

[Wed May 4 23:37:02 2022] LNetError: 7708:0:(lib-move.c:2955:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.141.16.185@o2ib417: -125
[Wed May 4 23:37:02 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 924 seconds
[Wed May 4 23:37:59 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 981 seconds
[Wed May 4 23:38:49 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 1031 seconds
[Wed May 4 23:38:49 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Skipped 1 previous similar message
[Wed May 4 23:40:04 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 1106 seconds
[Wed May 4 23:40:04 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Skipped 1 previous similar message
[Wed May 4 23:40:04 2022] INFO: task kworker/u256:1:7922 blocked for more than 120 seconds.
[Wed May 4 23:40:04 2022] Tainted: G OE --------- - - 4.18.0-240.15.1.1nas.el8.t4.x86_64 #1
[Wed May 4 23:40:04 2022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed May 4 23:40:04 2022] kworker/u256:1 D 0 7922 2 0x80004080
[Wed May 4 23:40:04 2022] Workqueue: rdma_cm cma_work_handler [rdma_cm]
[Wed May 4 23:40:04 2022] Call Trace:
[Wed May 4 23:40:04 2022] __schedule+0x2a9/0x710
[Wed May 4 23:40:04 2022] schedule+0x4d/0xc0
[Wed May 4 23:40:04 2022] schedule_preempt_disabled+0x11/0x20
[Wed May 4 23:40:04 2022] __mutex_lock.isra.5+0x343/0x550
[Wed May 4 23:40:04 2022] ? kiblnd_post_rx+0x1ff/0x520 [ko2iblnd]
[Wed May 4 23:40:04 2022] rdma_connect+0x1e/0x40 [rdma_cm]
[Wed May 4 23:40:04 2022] kiblnd_cm_callback+0x1476/0x2220 [ko2iblnd]
[Wed May 4 23:40:04 2022] ? __switch_to_asm+0x41/0x70
[Wed May 4 23:40:04 2022] cma_cm_event_handler+0x25/0xf0 [rdma_cm]
[Wed May 4 23:40:04 2022] cma_work_handler+0x5a/0xb0 [rdma_cm]
[Wed May 4 23:40:04 2022] process_one_work+0x1ae/0x3a0
[Wed May 4 23:40:04 2022] worker_thread+0x3c/0x3c0
[Wed May 4 23:40:04 2022] ? create_worker+0x1a0/0x1a0
[Wed May 4 23:40:04 2022] kthread+0x11d/0x140
[Wed May 4 23:40:04 2022] ? kthread_flush_work_fn+0x10/0x10
[Wed May 4 23:40:04 2022] ret_from_fork+0x22/0x40
[Wed May 4 23:40:54 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 1156 seconds
[Wed May 4 23:40:54 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Skipped 1 previous similar message
[Wed May 4 23:42:07 2022] INFO: task kworker/u256:1:7922 blocked for more than 120 seconds.
[Wed May 4 23:42:07 2022] Tainted: G OE --------- - - 4.18.0-240.15.1.1nas.el8.t4.x86_64 #1
[Wed May 4 23:42:07 2022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed May 4 23:42:07 2022] kworker/u256:1 D 0 7922 2 0x80004080
[Wed May 4 23:42:07 2022] Workqueue: rdma_cm cma_work_handler [rdma_cm]
[Wed May 4 23:42:07 2022] Call Trace:
[Wed May 4 23:42:07 2022] __schedule+0x2a9/0x710
[Wed May 4 23:42:07 2022] schedule+0x4d/0xc0
[Wed May 4 23:42:07 2022] schedule_preempt_disabled+0x11/0x20
[Wed May 4 23:42:07 2022] __mutex_lock.isra.5+0x343/0x550
[Wed May 4 23:42:07 2022] ? kiblnd_post_rx+0x1ff/0x520 [ko2iblnd]
[Wed May 4 23:42:07 2022] rdma_connect+0x1e/0x40 [rdma_cm]
[Wed May 4 23:42:07 2022] kiblnd_cm_callback+0x1476/0x2220 [ko2iblnd]
[Wed May 4 23:42:07 2022] ? __switch_to_asm+0x41/0x70
[Wed May 4 23:42:07 2022] cma_cm_event_handler+0x25/0xf0 [rdma_cm]
[Wed May 4 23:42:07 2022] cma_work_handler+0x5a/0xb0 [rdma_cm]
[Wed May 4 23:42:07 2022] process_one_work+0x1ae/0x3a0
[Wed May 4 23:42:07 2022] worker_thread+0x3c/0x3c0
[Wed May 4 23:42:07 2022] ? create_worker+0x1a0/0x1a0
[Wed May 4 23:42:07 2022] kthread+0x11d/0x140
[Wed May 4 23:42:07 2022] ? kthread_flush_work_fn+0x10/0x10
[Wed May 4 23:42:07 2022] ret_from_fork+0x22/0x40
[Wed May 4 23:42:09 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 1231 seconds
[Wed May 4 23:42:09 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Skipped 1 previous similar message
[Wed May 4 23:42:59 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 1281 seconds
[Wed May 4 23:42:59 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Skipped 1 previous similar message

See attached debug logs. |
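For triage, the `Timed out tx` messages in the log above can be tabulated to show how long the oldest pending transmit to each peer has been stuck. The following is only a minimal sketch: the function name `tx_timeouts`, the regular expression, and the two-line `SAMPLE` (copied from the log above) are illustrative assumptions, not part of Lustre or any of its tools.

```python
import re

# Illustrative pattern (an assumption, not a Lustre-provided format spec).
# It matches kernel messages such as:
# [Wed May 4 23:37:02 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 924 seconds
TIMEOUT_RE = re.compile(
    r"\[(?P<stamp>[^\]]+)\] LNet: \S+ "
    r"Timed out tx for (?P<nid>\S+): (?P<secs>\d+) seconds"
)


def tx_timeouts(log_text):
    """Return (timestamp, peer NID, seconds) for each 'Timed out tx' message."""
    return [
        (m.group("stamp"), m.group("nid"), int(m.group("secs")))
        for m in TIMEOUT_RE.finditer(log_text)
    ]


# Sample lines taken from the log in this ticket.
SAMPLE = (
    "[Wed May 4 23:37:02 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) "
    "Timed out tx for 10.141.16.185@o2ib417: 924 seconds\n"
    "[Wed May 4 23:37:59 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) "
    "Timed out tx for 10.141.16.185@o2ib417: 981 seconds\n"
    "[Wed May 4 23:38:49 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) "
    "Skipped 1 previous similar message\n"
)

if __name__ == "__main__":
    for stamp, nid, secs in tx_timeouts(SAMPLE):
        print(stamp, nid, secs)
```

In this log the reported duration keeps growing for the same NID (924 to 1281 seconds), which suggests the same transmit never completes rather than new ones timing out.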
| Comments |
| Comment by Peter Jones [ 05/May/22 ] |
|
Mahmoud Do you have the patch Peter |
| Comment by Mahmoud Hanafi [ 06/May/22 ] |
|
I don't think we have that. I will get a build with that patch. Thanks, |
| Comment by Mahmoud Hanafi [ 13/Jul/22 ] |
|
please close this |