Lustre / LU-15824

LNet not working with EL8.5 and MOFED 5.2

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.6
    • Labels: None
    • Severity: 2
    • Rank (Obsolete): 9223372036854775807

    Description

      LNet is not working on EL8.5 with MOFED 5.2 and Lustre 2.12.6.

      I first see this error:

      [Wed May  4 23:28:46 2022] alg: No test for adler32 (adler32-zlib)
      [Wed May  4 23:28:46 2022] alg: hash: digest failed on test 1 for crc32-table: ret=126

      Then I see these LNet tx timeouts, along with a hung rdma_cm worker:

      [Wed May  4 23:37:02 2022] LNetError: 7708:0:(lib-move.c:2955:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.141.16.185@o2ib417: -125
      [Wed May  4 23:37:02 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 924 seconds
      [Wed May  4 23:37:59 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 981 seconds
      [Wed May  4 23:38:49 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 1031 seconds
      [Wed May  4 23:38:49 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Skipped 1 previous similar message
      [Wed May  4 23:40:04 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 1106 seconds
      [Wed May  4 23:40:04 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Skipped 1 previous similar message
      [Wed May  4 23:40:04 2022] INFO: task kworker/u256:1:7922 blocked for more than 120 seconds.
      [Wed May  4 23:40:04 2022]       Tainted: G           OE    --------- -  - 4.18.0-240.15.1.1nas.el8.t4.x86_64 #1
      [Wed May  4 23:40:04 2022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [Wed May  4 23:40:04 2022] kworker/u256:1  D    0  7922      2 0x80004080
      [Wed May  4 23:40:04 2022] Workqueue: rdma_cm cma_work_handler [rdma_cm]
      [Wed May  4 23:40:04 2022] Call Trace:
      [Wed May  4 23:40:04 2022]  __schedule+0x2a9/0x710
      [Wed May  4 23:40:04 2022]  schedule+0x4d/0xc0
      [Wed May  4 23:40:04 2022]  schedule_preempt_disabled+0x11/0x20
      [Wed May  4 23:40:04 2022]  __mutex_lock.isra.5+0x343/0x550
      [Wed May  4 23:40:04 2022]  ? kiblnd_post_rx+0x1ff/0x520 [ko2iblnd]
      [Wed May  4 23:40:04 2022]  rdma_connect+0x1e/0x40 [rdma_cm]
      [Wed May  4 23:40:04 2022]  kiblnd_cm_callback+0x1476/0x2220 [ko2iblnd]
      [Wed May  4 23:40:04 2022]  ? __switch_to_asm+0x41/0x70
      [Wed May  4 23:40:04 2022]  cma_cm_event_handler+0x25/0xf0 [rdma_cm]
      [Wed May  4 23:40:04 2022]  cma_work_handler+0x5a/0xb0 [rdma_cm]
      [Wed May  4 23:40:04 2022]  process_one_work+0x1ae/0x3a0
      [Wed May  4 23:40:04 2022]  worker_thread+0x3c/0x3c0
      [Wed May  4 23:40:04 2022]  ? create_worker+0x1a0/0x1a0
      [Wed May  4 23:40:04 2022]  kthread+0x11d/0x140
      [Wed May  4 23:40:04 2022]  ? kthread_flush_work_fn+0x10/0x10
      [Wed May  4 23:40:04 2022]  ret_from_fork+0x22/0x40
      [Wed May  4 23:40:54 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 1156 seconds
      [Wed May  4 23:40:54 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Skipped 1 previous similar message
      [Wed May  4 23:42:07 2022] INFO: task kworker/u256:1:7922 blocked for more than 120 seconds.
      [Wed May  4 23:42:07 2022]       Tainted: G           OE    --------- -  - 4.18.0-240.15.1.1nas.el8.t4.x86_64 #1
      [Wed May  4 23:42:07 2022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [Wed May  4 23:42:07 2022] kworker/u256:1  D    0  7922      2 0x80004080
      [Wed May  4 23:42:07 2022] Workqueue: rdma_cm cma_work_handler [rdma_cm]
      [Wed May  4 23:42:07 2022] Call Trace:
      [Wed May  4 23:42:07 2022]  __schedule+0x2a9/0x710
      [Wed May  4 23:42:07 2022]  schedule+0x4d/0xc0
      [Wed May  4 23:42:07 2022]  schedule_preempt_disabled+0x11/0x20
      [Wed May  4 23:42:07 2022]  __mutex_lock.isra.5+0x343/0x550
      [Wed May  4 23:42:07 2022]  ? kiblnd_post_rx+0x1ff/0x520 [ko2iblnd]
      [Wed May  4 23:42:07 2022]  rdma_connect+0x1e/0x40 [rdma_cm]
      [Wed May  4 23:42:07 2022]  kiblnd_cm_callback+0x1476/0x2220 [ko2iblnd]
      [Wed May  4 23:42:07 2022]  ? __switch_to_asm+0x41/0x70
      [Wed May  4 23:42:07 2022]  cma_cm_event_handler+0x25/0xf0 [rdma_cm]
      [Wed May  4 23:42:07 2022]  cma_work_handler+0x5a/0xb0 [rdma_cm]
      [Wed May  4 23:42:07 2022]  process_one_work+0x1ae/0x3a0
      [Wed May  4 23:42:07 2022]  worker_thread+0x3c/0x3c0
      [Wed May  4 23:42:07 2022]  ? create_worker+0x1a0/0x1a0
      [Wed May  4 23:42:07 2022]  kthread+0x11d/0x140
      [Wed May  4 23:42:07 2022]  ? kthread_flush_work_fn+0x10/0x10
      [Wed May  4 23:42:07 2022]  ret_from_fork+0x22/0x40
      [Wed May  4 23:42:09 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 1231 seconds
      [Wed May  4 23:42:09 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Skipped 1 previous similar message
      [Wed May  4 23:42:59 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 1281 seconds
      [Wed May  4 23:42:59 2022] LNet: 7675:0:(o2iblnd_cb.c:3421:kiblnd_check_conns()) Skipped 1 previous similar message 

      See attached debug logs.
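
For anyone triaging similar output, the growing "Timed out tx" values above can be summarized per peer with a small script. This is only a sketch: it assumes the message format shown in this report, and the function name `summarize_tx_timeouts` is made up for illustration, not part of Lustre or any of its tools.

```python
import re
from collections import defaultdict

# Matches the kiblnd_check_conns() timeout lines quoted in this report, e.g.
#   ... kiblnd_check_conns()) Timed out tx for 10.141.16.185@o2ib417: 924 seconds
# Assumption: other Lustre versions may word this message differently.
TIMEOUT_RE = re.compile(
    r"kiblnd_check_conns\(\)\) Timed out tx for (\S+): (\d+) seconds"
)


def summarize_tx_timeouts(lines):
    """Return {peer_nid: (message_count, max_seconds)} for timed-out tx lines."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match is None:
            continue  # not a tx timeout message
        nid, seconds = match.group(1), int(match.group(2))
        stats[nid][0] += 1
        stats[nid][1] = max(stats[nid][1], seconds)
    return {nid: (count, worst) for nid, (count, worst) in stats.items()}
```

The function accepts any iterable of log lines, for example an open file holding saved `dmesg -T` output. A peer whose maximum keeps climbing, as 10.141.16.185@o2ib417 does here, has a tx that never completes rather than a transient delay.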

      Attachments

        1. out1.dk.gz
          6 kB
        2. out2.dk.gz
          6 kB

          People

            Assignee: Peter Jones (pjones)
            Reporter: Mahmoud Hanafi (mhanafi)