Lustre / LU-7558

niobuf.c:721:ptl_send_rpc() LASSERT(AT_OFF || imp_state != LUSTRE_IMP_FULL || imp_msghdr_flags & MSGHDR_AT_SUPPORT ...)


    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Lustre 2.12.0
    • BG/Q I/O nodes
      lustre-client-ion-2.5.4-4chaos_2.6.32_504.8.2.bgq.3blueos.V1R2M3.bl2.2_1.ppc64.ppc64
    • 3

      /bgsys/logs/BGQ.sn/R04-ID-J00.log (among many others)

      LustreError: 28558:0: (niobuf.c:721:ptl_send_rpc()) ASSERTION( (at_max == 0) || request->rq_import->imp_state != LUSTRE_IMP_FULL || (request->rq_import->imp_msghdr_flags & 0x1) || ! (request->rq_import->imp_connect_data.ocd_connect_flags & 0x1000000ULL) ) failed:
      LustreError: 28558:0: (niobuf.c:721:ptl_send_rpc()) LBUG
      Call Trace:
      show_stack
      libcfs_debug_dumpstack
      lbug_with_loc
      ptl_send_rpc
      ptlrpc_send_new_req
      ptlrpc_set_wait
      ll_statfs_internal
      ll_statfs
      statfs_by_dentry
      vfs_statfs
      user_statfs
      SyS_statfs
      syscall_exit

      The LBUG occurred on many tens of I/O nodes, then on many tens more within the next 24 hours. It is still occurring.

      We have not seen this issue before. The patch that introduced this assert was in the patch stack for our tag 2.5.4-1chaos, rolled out in April. We do not know what triggered this now.

      c389652 LU-5528 ptlrpc: fix race between connect vs resend

      There are no crash dumps for these nodes, nor much in the console logs.

      Because several conditions were ASSERTed in a single statement, we cannot tell from the log which of them led to the failure.
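The check quoted in the log is a logical OR of four terms, so it fails only when every term is false at the same time, and the message does not record the values involved. Below is a minimal sketch, not Lustre code, of evaluating the terms one by one so each can be printed with the data behind it. The struct `import_snapshot`, the helper names, the macro names `MSGHDR_MASK` and `CONNECT_MASK`, and `IMP_FULL_PLACEHOLDER` are all hypothetical. Only the two mask values and the field names come from the log text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Mask values copied from the assertion text in the log. */
#define MSGHDR_MASK   0x1u          /* shown as MSGHDR_AT_SUPPORT in the issue title */
#define CONNECT_MASK  0x1000000ULL  /* symbolic name is not shown in the log */

/* Hypothetical placeholder: the numeric value of LUSTRE_IMP_FULL is not
 * given in this report, so any fixed value serves for the sketch. */
enum { IMP_FULL_PLACEHOLDER = 1 };

/* Hypothetical snapshot of the fields the assertion reads. */
struct import_snapshot {
    unsigned int at_max;
    int          imp_state;
    uint32_t     imp_msghdr_flags;
    uint64_t     ocd_connect_flags;
};

#define NUM_TERMS 4

/* Evaluate each OR term on its own; terms[i] is true if term i holds. */
static void eval_terms(const struct import_snapshot *s, bool terms[NUM_TERMS])
{
    terms[0] = (s->at_max == 0);
    terms[1] = (s->imp_state != IMP_FULL_PLACEHOLDER);
    terms[2] = (s->imp_msghdr_flags & MSGHDR_MASK) != 0;
    terms[3] = !(s->ocd_connect_flags & CONNECT_MASK);
}

/* The combined check, with the same shape as the single ASSERTION in the log. */
static bool combined_check(const struct import_snapshot *s)
{
    bool t[NUM_TERMS];

    eval_terms(s, t);
    return t[0] || t[1] || t[2] || t[3];
}

/* Print every term and the raw values; return how many terms hold. */
static int report_terms(const struct import_snapshot *s, FILE *out)
{
    static const char *const names[NUM_TERMS] = {
        "at_max == 0",
        "imp_state != LUSTRE_IMP_FULL",
        "imp_msghdr_flags & 0x1",
        "!(ocd_connect_flags & 0x1000000ULL)",
    };
    bool t[NUM_TERMS];
    int held = 0;

    assert(s != NULL && out != NULL);
    eval_terms(s, t);
    for (int i = 0; i < NUM_TERMS; i++) {
        fprintf(out, "%-38s %s\n", names[i], t[i] ? "true" : "false");
        held += t[i] ? 1 : 0;
    }
    fprintf(out, "at_max=%u imp_state=%d msghdr_flags=%#x connect_flags=%#llx\n",
            s->at_max, s->imp_state,
            (unsigned int)s->imp_msghdr_flags,
            (unsigned long long)s->ocd_connect_flags);
    return held;
}
```

Calling `report_terms(&snap, stderr)` just before the combined check would leave the individual values in the output even when no crash dump is available. Whether such logging is acceptable on this code path is a separate question that this sketch does not address.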

            tappro Mikhail Pershin
            ofaaland Olaf Faaland