
ptl_send_rpc() ASSERTION ( (at_max == 0) || imp->imp_state != LUSTRE_IMP_FULL || (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) || !(imp->imp_connect_data.ocd_connect_flags & 0x1000000ULL) )


    Description

      May 1 05:09:25 scratchn011 kernel: LustreError: 25042:0:(niobuf.c:772:ptl_send_rpc()) ASSERTION( (at_max == 0) || imp->imp_state != LUSTRE_IMP_FULL || (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) || !(imp->imp_connect_data.ocd_connect_flags & 0x1000000ULL) ) failed:
      May 1 05:09:25 scratchn011 kernel: LustreError: 25042:0:(niobuf.c:772:ptl_send_rpc()) LBUG
      May 1 05:09:25 scratchn011 kernel: IEC: 026000003: LASSERT: { "pid": "25042", "ext_pid": "0", "filename": "niobuf.c", "line": "772", "func_name": "ptl_send_rpc", "assert_info": "( (at_max == 0) || imp->imp_state != LUSTRE_IMP_FULL || (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) || !(imp->imp_connect_data.ocd_connect_flags & 0x1000000ULL) ) failed: " }
      May 1 05:09:25 scratchn011 kernel: IEC: 026000004: LBUG: { "pid": "25042", "ext_pid": "0", "filename": "niobuf.c", "line": "772", "func_name": "ptl_send_rpc" }
      May 1 05:09:25 scratchn011 kernel: Pid: 25042, comm: ptlrpcd_06_02 3.10.0-957.1.3957.1.3.x4.4.25.x86_64 #1 SMP Mon Sep 20 16:59:46 PDT 2021
      May 1 05:09:25 scratchn011 kernel: Call Trace:
      May 1 05:09:25 scratchn011 kernel: [<0>] libcfs_call_trace+0x8e/0xf0 [libcfs]
      May 1 05:09:25 scratchn011 kernel: [<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
      May 1 05:09:25 scratchn011 kernel: [<0>] ptl_send_rpc+0xcfd/0xf10 [ptlrpc]
      May 1 05:09:25 scratchn011 kernel: [<0>] ptlrpc_check_set.part.25+0x18ec/0x1e50 [ptlrpc]
      May 1 05:09:25 scratchn011 kernel: [<0>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
      May 1 05:09:25 scratchn011 kernel: [<0>] ptlrpcd_check+0x4ab/0x590 [ptlrpc]
      May 1 05:09:25 scratchn011 kernel: [<0>] ptlrpcd+0x4b8/0x560 [ptlrpc]
      May 1 05:09:25 scratchn011 kernel: [<0>] kthread+0xd1/0xe0
      
      crash> obd_import.imp_state,imp_msghdr_flags,imp_connect_data ffff94044a276000
        imp_state = LUSTRE_IMP_CONNECTING
        imp_msghdr_flags = (unknown: 0)
        imp_connect_data = {
          ocd_connect_flags = 2323857477600284832,
        }
      crash> p/x 2323857477600284832&0x1000000ULL
      $3 = 0x1000000
      

      This is a race between the connect and re-send threads.

      769         LASSERT(AT_OFF || imp->imp_state != LUSTRE_IMP_FULL ||
      770                 (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) ||
      771                 !(imp->imp_connect_data.ocd_connect_flags &
      772                 OBD_CONNECT_AT));
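
The following is a minimal sketch, not Lustre code, of why the dumped values point to a race. All names ending in `_sketch` or starting with `SKETCH_` are hypothetical stand-ins, and the enum and `MSGHDR` values are illustrative; only the `0x1000000` bit and the dumped field values come from the log above. It models one interleaving that is consistent with the dump: the state is read while the import is still full, and the flag fields are read after a reconnect has started.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the numeric values are illustrative only. */
enum imp_state_sketch { SKETCH_IMP_CONNECTING = 0, SKETCH_IMP_FULL = 1 };
#define SKETCH_MSGHDR_AT_SUPPORT 0x1u
#define SKETCH_OBD_CONNECT_AT    0x1000000ULL /* bit tested in the log */

struct import_sketch {
	enum imp_state_sketch state;
	uint32_t msghdr_flags;
	uint64_t connect_flags;
};

/* The four conditions evaluated against one consistent snapshot. */
static bool at_check_consistent(const struct import_sketch *imp, bool at_off)
{
	return at_off ||
	       imp->state != SKETCH_IMP_FULL ||
	       (imp->msghdr_flags & SKETCH_MSGHDR_AT_SUPPORT) ||
	       !(imp->connect_flags & SKETCH_OBD_CONNECT_AT);
}

/*
 * The same conditions, but the state comes from `before` and the flag
 * fields from `after`, modelling a reconnect between the two reads.
 */
static bool at_check_torn(const struct import_sketch *before,
			  const struct import_sketch *after, bool at_off)
{
	return at_off ||
	       before->state != SKETCH_IMP_FULL ||
	       (after->msghdr_flags & SKETCH_MSGHDR_AT_SUPPORT) ||
	       !(after->connect_flags & SKETCH_OBD_CONNECT_AT);
}

/* Field values as printed by crash for the import in the dump. */
static struct import_sketch dumped_import(void)
{
	struct import_sketch imp = {
		.state = SKETCH_IMP_CONNECTING,
		.msghdr_flags = 0,
		.connect_flags = 2323857477600284832ULL,
	};
	return imp;
}
```

Evaluated against the dumped values alone, the check passes, because the state is no longer full. It only fails when the first read saw the old state and the later reads saw the new flags, which is what the torn variant models.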
      

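      The assertion checks four conditions, and they are read one after another without a lock. If a reconnect changes the import between those reads, the later checks see values that no longer match the state read earlier, and the assertion fails spuriously. The dump above is consistent with this: imp_state is already LUSTRE_IMP_CONNECTING and imp_msghdr_flags is 0, while OBD_CONNECT_AT is still set in ocd_connect_flags. Making the check valid would require evaluating it atomically under a spin lock, but this is a hot path and a spin lock would hurt performance. So I prefer changing the assertion to a warning.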
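
As a rough illustration of the "warning instead of assertion" idea, the sketch below reports a failed check and lets the caller continue rather than aborting. It is not the merged patch, whose contents are not shown in this issue; the struct, the `SKETCH_` constants, and `at_check_or_warn()` are hypothetical names, and `fprintf` stands in for whatever kernel logging facility the real code would use.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins; the numeric values are illustrative only. */
enum imp_state_sketch { SKETCH_IMP_CONNECTING = 0, SKETCH_IMP_FULL = 1 };
#define SKETCH_MSGHDR_AT_SUPPORT 0x1u
#define SKETCH_OBD_CONNECT_AT    0x1000000ULL

struct import_sketch {
	enum imp_state_sketch state;
	uint32_t msghdr_flags;
	uint64_t connect_flags;
};

/*
 * Returns true when the four conditions hold. On failure it logs a
 * warning and returns false instead of aborting, so a transient
 * mismatch during reconnection does not bring the node down.
 */
static bool at_check_or_warn(const struct import_sketch *imp, bool at_off)
{
	bool ok = at_off ||
		  imp->state != SKETCH_IMP_FULL ||
		  (imp->msghdr_flags & SKETCH_MSGHDR_AT_SUPPORT) ||
		  !(imp->connect_flags & SKETCH_OBD_CONNECT_AT);

	if (!ok)
		fprintf(stderr,
			"warning: AT check failed: state=%d msghdr_flags=%#x connect_flags=%#llx\n",
			(int)imp->state, (unsigned int)imp->msghdr_flags,
			(unsigned long long)imp->connect_flags);
	return ok;
}
```

The trade-off in this sketch is that a genuine inconsistency is only logged, not stopped, in exchange for keeping the hot path free of a lock.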

          Activity

            [LU-16297] ptl_send_rpc() ASSERTION ( (at_max == 0) || imp->imp_state != LUSTRE_IMP_FULL || (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) || !(imp->imp_connect_data.ocd_connect_flags & 0x1000000ULL) )

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55040/
            Subject: LU-16297 ptlrpc: don't panic during reconnection
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set:
            Commit: dda0bd1207d0cb4864c3ec2a10cd881021ef2ea9


            "Xing Huang <hxing@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55040
            Subject: LU-16297 ptlrpc: don't panic during reconnection
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: f66d7f24e3ed7f8d299f83a87c1da14cb2d3f8b5

            pjones Peter Jones added a comment -

            Landed for 2.16


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49029/
            Subject: LU-16297 ptlrpc: don't panic during reconnection
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: df31c4c0b39b8845911344e6fadc008bcba40bb1


            "Alexander <alexander.boyko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49029
            Subject: LU-16297 ptlrpc: don't panic during reconnection
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 10bb3582bceb8107ed552d5554faf49e4586858d


            People

              aboyko Alexander Boyko
              aboyko Alexander Boyko