  1. Lustre
  2. LU-16297

ptl_send_rpc() ASSERTION ( (at_max == 0) || imp->imp_state != LUSTRE_IMP_FULL || (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) || !(imp->imp_connect_data.ocd_connect_flags & 0x1000000ULL) )

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.15.1
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      May 1 05:09:25 scratchn011 kernel: LustreError: 25042:0:(niobuf.c:772:ptl_send_rpc()) ASSERTION( (at_max == 0) || imp->imp_state != LUSTRE_IMP_FULL || (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) || !(imp->imp_connect_data.ocd_connect_flags & 0x1000000ULL) ) failed:
      May 1 05:09:25 scratchn011 kernel: LustreError: 25042:0:(niobuf.c:772:ptl_send_rpc()) LBUG
      May 1 05:09:25 scratchn011 kernel: IEC: 026000003: LASSERT: { "pid": "25042", "ext_pid": "0", "filename": "niobuf.c", "line": "772", "func_name": "ptl_send_rpc", "assert_info": "( (at_max == 0) || imp->imp_state != LUSTRE_IMP_FULL || (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) || !(imp->imp_connect_data.ocd_connect_flags & 0x1000000ULL) ) failed: " }
      May 1 05:09:25 scratchn011 kernel: IEC: 026000004: LBUG: { "pid": "25042", "ext_pid": "0", "filename": "niobuf.c", "line": "772", "func_name": "ptl_send_rpc" }
      May 1 05:09:25 scratchn011 kernel: Pid: 25042, comm: ptlrpcd_06_02 3.10.0-957.1.3957.1.3.x4.4.25.x86_64 #1 SMP Mon Sep 20 16:59:46 PDT 2021
      May 1 05:09:25 scratchn011 kernel: Call Trace:
      May 1 05:09:25 scratchn011 kernel: [<0>] libcfs_call_trace+0x8e/0xf0 [libcfs]
      May 1 05:09:25 scratchn011 kernel: [<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
      May 1 05:09:25 scratchn011 kernel: [<0>] ptl_send_rpc+0xcfd/0xf10 [ptlrpc]
      May 1 05:09:25 scratchn011 kernel: [<0>] ptlrpc_check_set.part.25+0x18ec/0x1e50 [ptlrpc]
      May 1 05:09:25 scratchn011 kernel: [<0>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
      May 1 05:09:25 scratchn011 kernel: [<0>] ptlrpcd_check+0x4ab/0x590 [ptlrpc]
      May 1 05:09:25 scratchn011 kernel: [<0>] ptlrpcd+0x4b8/0x560 [ptlrpc]
      May 1 05:09:25 scratchn011 kernel: [<0>] kthread+0xd1/0xe0
      
      crash> obd_import.imp_state,imp_msghdr_flags,imp_connect_data ffff94044a276000
        imp_state = LUSTRE_IMP_CONNECTING
        imp_msghdr_flags = (unknown: 0)
        imp_connect_data = {
          ocd_connect_flags = 2323857477600284832,
        }
      crash> p/x 2323857477600284832&0x1000000ULL
      $3 = 0x1000000
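
      The `p/x` check above can also be reproduced outside of crash. The following is a minimal sketch in C, assuming the bit tested by the assertion (`0x1000000ULL`, which the source code refers to as `OBD_CONNECT_AT`) is the only thing of interest. The macro and helper names are hypothetical and not part of Lustre.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit value copied from the assertion text (ocd_connect_flags & 0x1000000ULL).
 * The macro name is a stand-in for illustration, not the Lustre definition. */
#define CONNECT_AT_BIT_SKETCH 0x1000000ULL

/* Hypothetical helper: true when the adaptive-timeout connect flag is set. */
static bool connect_flags_have_at(uint64_t ocd_connect_flags)
{
	return (ocd_connect_flags & CONNECT_AT_BIT_SKETCH) != 0;
}
```

      Calling `connect_flags_have_at(2323857477600284832ULL)` with the value from the dump returns true, which matches the `0x1000000` result printed by crash.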
      

      This is a race between the connect and re-send threads.

      769         LASSERT(AT_OFF || imp->imp_state != LUSTRE_IMP_FULL ||
      770                 (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) ||
      771                 !(imp->imp_connect_data.ocd_connect_flags &
      772                 OBD_CONNECT_AT));
      

      The assertion checks four conditions, and they are not evaluated atomically. If a connection happens while the assertion is being evaluated, the checks see a mix of old and new import state, the second part of the assertion fails, and the node hits a false LBUG. Making these checks valid would require evaluating them atomically under a spin lock, but this is a hot path and a spin lock would hurt performance. So I prefer changing the assertion to a warning.
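
      To illustrate the proposal, here is a minimal user-space sketch in C that evaluates the same four conditions and reports a warning instead of halting. It is not the actual patch: the struct, the enum, the function names and the `MSGHDR_AT_SUPPORT_SKETCH` bit value are assumptions made for the example, and only the `0x1000000ULL` flag value comes from the assertion text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in definitions for illustration; the real ones live in Lustre headers. */
enum imp_state_sketch { IMP_CONNECTING_SKETCH, IMP_FULL_SKETCH };
#define MSGHDR_AT_SUPPORT_SKETCH 0x1U         /* assumed bit value */
#define OBD_CONNECT_AT_SKETCH    0x1000000ULL /* from the assertion text */

/* Hypothetical snapshot of the three import fields the assertion reads. */
struct import_sketch {
	enum imp_state_sketch state;
	uint32_t msghdr_flags;
	uint64_t connect_flags;
};

/* The four conditions; the invariant holds if any one of them is true. */
static bool at_invariant_holds(unsigned int at_max,
			       const struct import_sketch *imp)
{
	return at_max == 0 ||
	       imp->state != IMP_FULL_SKETCH ||
	       (imp->msghdr_flags & MSGHDR_AT_SUPPORT_SKETCH) ||
	       !(imp->connect_flags & OBD_CONNECT_AT_SKETCH);
}

/* Warn and carry on instead of crashing when the invariant looks violated. */
static bool check_at_invariant_warn(unsigned int at_max,
				    const struct import_sketch *imp)
{
	if (at_invariant_holds(at_max, imp))
		return true;
	fprintf(stderr, "warning: AT invariant violated: state=%d "
		"msghdr_flags=%#x connect_flags=%#llx\n",
		(int)imp->state, (unsigned int)imp->msghdr_flags,
		(unsigned long long)imp->connect_flags);
	return false;
}
```

      With a non-zero `at_max`, a snapshot in the full state with `msghdr_flags == 0` and the dumped `connect_flags` value makes `check_at_invariant_warn()` return false and print a warning, while the same fields in the connecting state pass. Because the fields are read without a lock, a warning from such a check can still be a false positive; the point of the change is only that it no longer brings the node down.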

            People

              aboyko Alexander Boyko