  1. Lustre
  2. LU-20326

LNetError: 4856:0:(o2iblnd_cb.c:704:kiblnd_setup_rd_kiov()) ASSERTION( nob > 0 ) failed:


    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Lustre 2.18.0
    • Lustre 2.18.0
    • None
    • 3

      The conversion of the LNet layer to struct iov_iter changed the
      LNET_MSG_GET case of kiblnd_send() to describe the RDMA GET sink
      from the outbound payload (payload_niov/payload_kiov/payload_nob)
      instead of the attached memory descriptor (msg_md). For a GET the
      initiator sends no payload: lnet_prep_send() is called as
      lnet_prep_send(msg, LNET_MSG_GET, target, 0, 0), so msg_len, and
      therefore payload_nob, is always 0. The sink length lives in
      msg_md->md_length.
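
      The split between the two lengths can be illustrated with a small
      stand-alone model. This is only a sketch: every toy_* name below is
      hypothetical and the structs carry just the fields mentioned above,
      not the real LNet definitions. It shows that a GET prepared with a
      zero-length payload still has a non-zero sink size in its memory
      descriptor.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the real LNet structures; only the fields named
 * in the description are modelled. All toy_* names are hypothetical. */
enum toy_msg_type { TOY_MSG_PUT, TOY_MSG_GET };

struct toy_md {
	size_t md_length;	/* size of the attached buffer (the GET sink) */
};

struct toy_msg {
	enum toy_msg_type msg_type;
	size_t msg_len;		/* outbound payload length */
	struct toy_md *msg_md;	/* attached memory descriptor */
};

/* Simplified analogue of lnet_prep_send(): records type and payload length. */
static void toy_prep_send(struct toy_msg *msg, enum toy_msg_type type,
			  size_t len)
{
	msg->msg_type = type;
	msg->msg_len = len;
}

/* Build a GET as the description states: the initiator sends no payload. */
static struct toy_msg toy_make_get(struct toy_md *md)
{
	struct toy_msg msg = { .msg_md = md };

	toy_prep_send(&msg, TOY_MSG_GET, 0);
	return msg;
}

/* Sink length taken from the outbound payload (the regressed behaviour). */
static size_t toy_sink_nob_from_payload(const struct toy_msg *msg)
{
	return msg->msg_len;
}

/* Sink length taken from the memory descriptor (the intended behaviour). */
static size_t toy_sink_nob_from_md(const struct toy_msg *msg)
{
	assert(msg->msg_md != NULL);
	return msg->msg_md->md_length;
}
```

      For a GET whose descriptor covers a 1 MiB buffer, the payload-based
      helper returns 0 while the descriptor-based helper returns 1048576.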

      As a result kiblnd_setup_rd_kiov() is now invoked with nob == 0 and
      trips its first assertion:

            LNetError: (o2iblnd_cb.c:704:kiblnd_setup_rd_kiov()) ASSERTION( nob > 0 ) failed
      

      The IMMEDIATE-vs-RDMA size check directly above still correctly
      uses msg_md->md_length, so small GET replies take the IMMEDIATE path
      and never reach the assertion. Only GET sinks large enough to require
      RDMA reach kiblnd_setup_rd_kiov() and hit it. This covers every
      server-side BULK_GET_SINK transfer over o2ib – client writes and
      target_bulk_io() bulk pulls (e.g. the MDS_BATCH buffer pulled by
      mdt_batch() during batched statahead) – so a large write or an
      mdtest stat phase reliably LBUGs the target.

      Restore the GET sink description to come from msg_md, matching the
      adjacent size check and the pre-conversion behaviour.
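
      The effect of the regression and of the fix can be sketched with a
      simplified decision function. This is an illustrative model, not the
      o2iblnd code: the toy_* names and the TOY_IMMEDIATE_MAX threshold are
      assumptions (the real check also accounts for the message header), and
      the assertion is represented by a return value rather than an LBUG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cut-off between IMMEDIATE and RDMA replies. */
#define TOY_IMMEDIATE_MAX 4096

enum toy_get_path {
	TOY_PATH_IMMEDIATE,	/* reply small enough to be sent inline */
	TOY_PATH_RDMA,		/* sink described successfully for RDMA */
	TOY_PATH_LBUG,		/* stand-in for ASSERTION( nob > 0 ) failing */
};

struct toy_get {
	size_t payload_nob;	/* outbound payload length, 0 for a GET */
	size_t md_length;	/* models msg_md->md_length, the sink size */
};

/* sink_from_md == false models the regressed code,
 * sink_from_md == true models the restored behaviour. */
static enum toy_get_path toy_send_get(const struct toy_get *get,
				      bool sink_from_md)
{
	size_t nob;

	assert(get != NULL);

	/* The size check uses md_length in both variants. */
	if (get->md_length <= TOY_IMMEDIATE_MAX)
		return TOY_PATH_IMMEDIATE;

	/* Only the source of the sink length differs between the variants. */
	nob = sink_from_md ? get->md_length : get->payload_nob;

	/* Stand-in for the first assertion in kiblnd_setup_rd_kiov(). */
	if (nob == 0)
		return TOY_PATH_LBUG;

	return TOY_PATH_RDMA;
}
```

      In this model a 512-byte sink takes the IMMEDIATE path in both
      variants, while a 1 MiB sink yields TOY_PATH_LBUG with
      sink_from_md == false and TOY_PATH_RDMA with sink_from_md == true.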

            hornc Chris Horn
            hornc Chris Horn
