Details


    Description

      lnd_cb.c:558:kgnilnd_setup_phys_buffer()) failed to allocate tx_phys
      [2012-04-07 02:08:24][c5-0c0s5n2]LNet: 29099:0:(gnilnd_cb.c:1068:kgnilnd_tx_done()) $$ error -12 on tx 0xffff88000fe06b40-><?> id 0/0 state GNILND_TX_ALLOCD age 17481575s  msg@0xffff88000fe06bc0 m/v/ty/ck/pck/pl b00fbabe/8/3/0/78db/0 x0:GNILND_MSG_PUT_REQ
      [2012-04-07 02:08:24][c5-0c0s5n2]LustreError: 29099:0:(events.c:198:client_bulk_callback()) event type 0, status -5, desc ffff880627c24000
      

      The error is detected on both the client and the server. The server expects the client to retry, but the client does not. In the meantime, the OSS issues a lock callback to the client, but the client does not respond because it is still waiting for the I/O to complete. Eventually the OSS evicts the client. Lustre does not retry the bulk op when it detects the error.
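
      The missing behaviour described above can be illustrated with a minimal sketch of a bounded retry loop. This is not the patch under review and not Lustre code: bulk_xfer_fn, bulk_error_is_transient(), bulk_xfer_with_retry() and the flaky_link test double are all hypothetical names introduced here. The only details taken from the log are the two error codes, -12 (ENOMEM) and -5 (EIO); treating them as retryable is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback: performs one bulk transfer attempt and
 * returns 0 on success or a negative errno on failure. */
typedef int (*bulk_xfer_fn)(void *ctx);

/* Assumption: -ENOMEM (-12) and -EIO (-5), the codes seen in the log,
 * are treated as transient and therefore worth retrying. */
static int bulk_error_is_transient(int rc)
{
    return rc == -ENOMEM || rc == -EIO;
}

/* Retry the transfer up to max_attempts times. Stops early on success
 * or on an error that is not considered transient. The number of
 * attempts actually made is stored in *attempts_out if it is non-NULL. */
static int bulk_xfer_with_retry(bulk_xfer_fn xfer, void *ctx,
                                int max_attempts, int *attempts_out)
{
    int rc = -EINVAL;
    int attempt;

    assert(xfer != NULL && max_attempts > 0);

    for (attempt = 1; attempt <= max_attempts; attempt++) {
        rc = xfer(ctx);
        if (rc == 0 || !bulk_error_is_transient(rc))
            break;
    }

    if (attempts_out != NULL)
        *attempts_out = attempt > max_attempts ? max_attempts : attempt;
    return rc;
}

/* Test double: fails with -ENOMEM a fixed number of times, then succeeds. */
struct flaky_link {
    int failures_left;
};

static int flaky_xfer(void *ctx)
{
    struct flaky_link *link = ctx;

    if (link->failures_left > 0) {
        link->failures_left--;
        return -ENOMEM;
    }
    return 0;
}
```

      With a flaky_link that fails twice, bulk_xfer_with_retry(flaky_xfer, &link, 5, &n) returns 0 on the third attempt. If the attempt budget runs out first, the last error is returned to the caller, which can then fail the I/O instead of waiting until the OSS evicts the client.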

          Activity

            [LU-1517] no retry for the bulk operation
            spitzcor Cory Spitz added a comment -

            Johann, for master, have you seen LU-901 and change #4092? Or does it only partially address the fault?

            johann Johann Lombardi (Inactive) added a comment -

            I'm fine with landing the patch on b1_8. That said, it seems that master suffers from the same issue, right?
            If so, it would be great to push a patch against master.
            Thanks in advance!

            spitzcor Cory Spitz added a comment -

            Understood. Triggering this bug results in client eviction though. Just FYI.

            keith Keith Mannthey (Inactive) added a comment -

            I just wanted to say that 1.8 is in more of a maintenance mode at this point. The patch looks fine but very few things are landing in 1.8 right now.

            keith Keith Mannthey (Inactive) added a comment -

            The patch has been reviewed and is awaiting final merge.

            spitzcor Cory Spitz added a comment -

            Cray has been using this patch and it is effective.

            cfaber#1 Colin Faber [X] (Inactive) added a comment -

            Hi,

            Have WC/Intel had time to review our proposed patch yet?

            -cf

            aboyko Alexander Boyko added a comment - Review request http://review.whamcloud.com/3102

            People

              keith Keith Mannthey (Inactive)
              aboyko Alexander Boyko