    Description

      lnd_cb.c:558:kgnilnd_setup_phys_buffer()) failed to allocate tx_phys
      [2012-04-07 02:08:24][c5-0c0s5n2]LNet: 29099:0:(gnilnd_cb.c:1068:kgnilnd_tx_done()) $$ error -12 on tx 0xffff88000fe06b40-><?> id 0/0 state GNILND_TX_ALLOCD age 17481575s  msg@0xffff88000fe06bc0 m/v/ty/ck/pck/pl b00fbabe/8/3/0/78db/0 x0:GNILND_MSG_PUT_REQ
      [2012-04-07 02:08:24][c5-0c0s5n2]LustreError: 29099:0:(events.c:198:client_bulk_callback()) event type 0, status -5, desc ffff880627c24000
      

      The error is detected on both client and server; the server expects the client to retry, but the client does not. In the meantime, the OSS issues a lock callback to the client, but the client does not respond because it is waiting for the I/O to complete. Eventually the OSS evicts the client. Lustre does not retry the bulk op when it detects the error.
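The missing behavior described above, retrying a bulk transfer after a transient failure, can be illustrated with a minimal sketch. This is not Lustre code and not the patch discussed in this ticket: `BulkError`, `RETRYABLE`, `bulk_with_retry` and `make_flaky_sender` are hypothetical names, and treating -12 (ENOMEM) and -5 (EIO) as retryable is an assumption based only on the statuses shown in the log above.

```python
import errno


class BulkError(Exception):
    """Hypothetical error carrying a negative errno-style status."""

    def __init__(self, status):
        super().__init__(f"bulk transfer failed with status {status}")
        self.status = status


# Statuses treated as transient in this sketch (an assumption, not Lustre's list).
RETRYABLE = {-errno.EIO, -errno.ENOMEM}


def bulk_with_retry(send_bulk, max_attempts=3):
    """Call send_bulk() until it succeeds or the attempts run out.

    Returns the number of attempts used. Re-raises BulkError on a
    non-retryable status or when the final attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send_bulk()
            return attempt
        except BulkError as err:
            if err.status not in RETRYABLE or attempt == max_attempts:
                raise


def make_flaky_sender(failures):
    """Build a sender that fails once per listed status, then succeeds."""
    pending = list(failures)

    def send_bulk():
        if pending:
            raise BulkError(pending.pop(0))

    return send_bulk
```

For example, `bulk_with_retry(make_flaky_sender([-errno.ENOMEM, -errno.EIO]))` succeeds on the third attempt instead of leaving the I/O outstanding until the client is evicted. A bounded attempt count is used here so that a persistent failure still surfaces to the caller.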

          Activity

            [LU-1517] no retry for the bulk operation

            keith Keith Mannthey (Inactive) added a comment -

            Ok, it seems the patches for 1.8, 2.3 and Master have been merged. 2.2 and 2.3 are dead branches at this point.

            I think this issue is safe to close. Please reopen if you disagree.

            spitzcor Cory Spitz added a comment -

            I think that we should at least land the master patch.


            keith Keith Mannthey (Inactive) added a comment -

            The patch ported easily as expected so I sent it to the various branches.

            http://review.whamcloud.com/4296 <- b2_1
            http://review.whamcloud.com/4297 <- b2_2
            http://review.whamcloud.com/4298 <- b2_3
            http://review.whamcloud.com/4299 <- Master

            johann Johann Lombardi (Inactive) added a comment -

            Sure, let's push review 3102 to all branches first (b1_8, b2_1, b2_2, b2_3 and master). Then more intrusive changes (like 4092) can be considered on master.

            spitzcor Cory Spitz added a comment -

            Yes, it is curious that we landed this fix to b1_8 before master. FYI, Cray has been using this fix on our 2.2 for some time and testing has gone well. We should push it to master now while we wait for LU-901.

            spitzcor Cory Spitz added a comment -

            Johann, for master, have you seen LU-901 and change #4092? Or does it only partially address the fault?


            johann Johann Lombardi (Inactive) added a comment -

            I'm fine with landing the patch on b1_8. That said, it seems that master suffers from the same issue, right?
            If so, it would be great to push a patch against master.
            Thanks in advance!

            spitzcor Cory Spitz added a comment -

            Understood. Triggering this bug results in client eviction though. Just FYI.

            keith Keith Mannthey (Inactive) added a comment -

            I just wanted to say that 1.8 is in more of a maintenance mode at this point. The patch looks fine but very few things are landing in 1.8 right now.

            keith Keith Mannthey (Inactive) added a comment -

            The patch has been reviewed and is awaiting final merge.

            People

              keith Keith Mannthey (Inactive)
              aboyko Alexander Boyko