Lustre / LU-7558

niobuf.c:721:ptl_send_rpc() LASSERT(AT_OFF || imp_state != LUSTRE_IMP_FULL || imp_msghdr_flags & MSGHDR_AT_SUPPORT ...)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.12.0
    • Environment: BG/Q I/O nodes
      lustre-client-ion-2.5.4-4chaos_2.6.32_504.8.2.bgq.3blueos.V1R2M3.bl2.2_1.ppc64.ppc64
    • Severity: 3

    Description

      /bgsys/logs/BGQ.sn/R04-ID-J00.log (among many others)

      LustreError: 28558:0: (niobuf.c:721:ptl_send_rpc()) ASSERTION( (at_max == 0) || request->rq_import->imp_state != LUSTRE_IMP_FULL || (request->rq_import->imp_msghdr_flags & 0x1) || ! (request->rq_import->imp_connect_data.ocd_connect_flags & 0x1000000ULL) ) failed:
      LustreError: 28558:0: (niobuf.c:721:ptl_send_rpc()) LBUG
      Call Trace:
      show_stack
      libcfs_debug_dumpstack
      lbug_with_loc
      ptl_send_rpc
      ptlrpc_send_new_req
      ptlrpc_set_wait
      ll_statfs_internal
      ll_statfs
      statfs_by_dentry
      vfs_statfs
      user_statfs
      SyS_statfs
      syscall_exit

      The assertion fired on many tens of I/O nodes, then on many tens more within the next 24 hours. It is continuing to occur.

      We have not seen this issue before. The patch that introduced this assert was in the patch stack for our tag 2.5.4-1chaos, rolled out in April. We do not know what triggered this now.

      c389652 LU-5528 ptlrpc: fix race between connect vs resend

      There are no crash dumps for these nodes, nor much in the console logs.

      Because several conditions were asserted in a single statement, the log does not report the state of each individual condition.
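
      One observation follows from the expression itself: the four conditions are joined with ||, so the assertion can fail only when every one of them is false at the same time. The C sketch below models that logic so each term can be inspected separately. It is an illustration under stated assumptions, not Lustre code: the sketch_* names, the simplified struct, and the numeric state value are hypothetical, and only the 0x1 and 0x1000000ULL masks are taken from the log line above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the fields named in the assertion text.
 * The real import structure has many more members, and the numeric
 * state values here are illustrative only. */
enum { SKETCH_IMP_OTHER = 0, SKETCH_IMP_FULL = 1 };

#define SKETCH_MSGHDR_MASK   0x1u          /* mask tested on imp_msghdr_flags */
#define SKETCH_CONNECT_MASK  0x1000000ULL  /* mask tested on ocd_connect_flags */

struct sketch_import {
    int      imp_state;
    uint32_t imp_msghdr_flags;
    uint64_t ocd_connect_flags;
};

/* One boolean per disjunct of the asserted expression. */
struct sketch_terms {
    bool at_off;              /* at_max == 0 */
    bool not_full;            /* imp_state != LUSTRE_IMP_FULL */
    bool hdr_flag_set;        /* imp_msghdr_flags & 0x1 */
    bool connect_flag_clear;  /* !(ocd_connect_flags & 0x1000000ULL) */
};

/* Evaluate each disjunct on its own instead of in one expression. */
struct sketch_terms sketch_eval(unsigned int at_max,
                                const struct sketch_import *imp)
{
    struct sketch_terms t;

    t.at_off = (at_max == 0);
    t.not_full = (imp->imp_state != SKETCH_IMP_FULL);
    t.hdr_flag_set = (imp->imp_msghdr_flags & SKETCH_MSGHDR_MASK) != 0;
    t.connect_flag_clear = (imp->ocd_connect_flags & SKETCH_CONNECT_MASK) == 0;
    return t;
}

/* The assertion holds if at least one disjunct is true. */
bool sketch_assertion_holds(unsigned int at_max,
                            const struct sketch_import *imp)
{
    struct sketch_terms t = sketch_eval(at_max, imp);

    return t.at_off || t.not_full || t.hdr_flag_set || t.connect_flag_clear;
}

/* Print every term, which is what a split-up assertion would reveal. */
void sketch_report(unsigned int at_max, const struct sketch_import *imp)
{
    struct sketch_terms t = sketch_eval(at_max, imp);

    printf("at_off=%d not_full=%d hdr_flag_set=%d connect_flag_clear=%d\n",
           t.at_off, t.not_full, t.hdr_flag_set, t.connect_flag_clear);
}
```

      Under this model, the only failing combination is a nonzero at_max, an import in the FULL state, the 0x1 bit clear in imp_msghdr_flags, and the 0x1000000 bit set in the connect flags. Calling sketch_report() on that combination prints all four terms as 0; flipping any single input makes sketch_assertion_holds() return true.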

            People

              Assignee: Mikhail Pershin (tappro)
              Reporter: Olaf Faaland (ofaaland)