Details

    • Technical task
    • Resolution: Fixed
    • Blocker
    • Lustre 2.10.0, Upstream
    • Lustre 2.10.0, Upstream
    • 9223372036854775807

    Description

         As part of recent (last September) RDMA changes, the routine ib_get_dma_mr() was deleted and an extra "flags" parameter was added to ib_alloc_pd().  This was all done to stop upper layers from accessing the R-key for unregistered DMA memory regions.  It seems that this opens the door to accessing any physical memory you want on a remote system thereby being a security issue.

      For the applicable changes, see these two kernel commits:

       
      from 4.9-rc1: change of arguments number for ib_alloc_pd
      commit ed082d36a7b2c27d1cda55fdfb28af18040c4a89
      Author: Christoph Hellwig <hch@lst.de>
      Date:   Mon Sep 5 12:56:17 2016 +0200
      from 4.9-rc1: removal of ib_get_dma_mr
      commit 5ef990f06bd7e3cf521b5705d898d8e43d04ea90
      Author: Christoph Hellwig <hch@lst.de>
      Date:   Mon Sep 5 12:56:21 2016 +0200
       

      LNet uses unregistered memory regions for short messages (lower latency).  A "method" of still allowing us to work as we did before was added in the form of a PD variable called "unsafe_global_rkey" which is populated if you pass the flag "IB_PD_UNSAFE_GLOBAL_RKEY" as the new flags parameter to ib_alloc_pd().  As the name implies, this is considered unsafe and should not be done unless you are very confident in your security.  Using it will trigger a warning to the kernel console.

      This change has caused a kernel maintainer to mark ko2iblnd as "BROKEN" in the 4.10 kernel build.  To get rid of this broken tag, we need to fix this quickly so I am going to make use of the unsafe flag so I do not have to re-architect ko2iblnd for short messages.  We need to test how much of a performance hit we get on short messages if we have to register all memory regions for better security.  There is not enough time to do that now.

      So, this ticket was created to track the following changes to the staging version of LNet:

      1. Remove the BROKEN tag from lnet/Kconfig.
      2. Pass the IB_PD_UNSAFE_GLOBAL_RKEY flag when calling ib_alloc_pd().
      3. Everywhere we are using the device DMA mr to derive the R-key for non-registered memory regions, use the pd->unsafe_global_rkey value instead (this will get rid of our call to ib_get_dma_mr()).

      Once this is landed to staging, use this ticket to also make the change to community master with the necessary compile hooks to ensure the change is only done when it makes sense.

      Attachments

        Issue Links

          Activity

            [LU-9026] Adapt to the removal of ib_get_dma_mr()
            pjones Peter Jones added a comment -

            Landed to master which will now make testing easier.

            pjones Peter Jones added a comment - Landed to master which will now make testing easier.

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/25277/
            Subject: LU-9026 o2iblnd: Adapt to the removal of ib_get_dma_mr()
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e4297ef38561f1e788ba73ca0c8078a09dc8c303

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/25277/ Subject: LU-9026 o2iblnd: Adapt to the removal of ib_get_dma_mr() Project: fs/lustre-release Branch: master Current Patch Set: Commit: e4297ef38561f1e788ba73ca0c8078a09dc8c303

            mlx5 cards supports a fast memory registration mode, so should lack a performance problems,
            but mlx4 + QDR cards (Connect-X2 possible) should have an issues with MD rpc traffic which need constantly send a reply with different kernel address, so memory mapping cache isn't work in this case.

            shadow Alexey Lyashkov added a comment - mlx5 cards supports a fast memory registration mode, so should lack a performance problems, but mlx4 + QDR cards (Connect-X2 possible) should have an issues with MD rpc traffic which need constantly send a reply with different kernel address, so memory mapping cache isn't work in this case.

            I tested mlx4 card (Connect-3) which is FDR. I need to try with the mlx5 driver once my Power8 node image is updated.

            simmonsja James A Simmons added a comment - I tested mlx4 card (Connect-3) which is FDR. I need to try with the mlx5 driver once my Power8 node image is updated.

            James,

            did you test a mlx5 or mlx4 cards a specially QDR ?

            shadow Alexey Lyashkov added a comment - James, did you test a mlx5 or mlx4 cards a specially QDR ?

            Dough you can test out the patch with MOFED 4. Its requires this patch to work. I have been running master +M MOFED 4 to test this patch. So far I haven't see the performance degradation you are seeing. Give MOFED 4 a try.

            simmonsja James A Simmons added a comment - Dough you can test out the patch with MOFED 4. Its requires this patch to work. I have been running master +M MOFED 4 to test this patch. So far I haven't see the performance degradation you are seeing. Give MOFED 4 a try.

            Dmitriy,

            your's patch isn't complete. Based on upstream commit, you need define an
            struct xattr_handlers to do same work. But you's patch just disable an xattr for lustre client. I don't think it good one.

            shadow Alexey Lyashkov added a comment - Dmitriy, your's patch isn't complete. Based on upstream commit, you need define an struct xattr_handlers to do same work. But you's patch just disable an xattr for lustre client. I don't think it good one.

            I just tried to build master against 4.9 (client only).  It fails in the libcfs debug code.  Will you be addressing that?  I tried turning it off via a configure option, and that triggers a warning that you can't turn off debugging.  Sigh.

            doug Doug Oucharek (Inactive) added a comment - I just tried to build master against 4.9 (client only).  It fails in the libcfs debug code.  Will you be addressing that?  I tried turning it off via a configure option, and that triggers a warning that you can't turn off debugging.  Sigh.

            Doug,
            I'm porting master to Linux 4.9 above your patch. I have one more patch to complete. You can look at series in top for now https://review.whamcloud.com/25840.

            dmiter Dmitry Eremin (Inactive) added a comment - Doug, I'm porting master to Linux 4.9 above your patch. I have one more patch to complete. You can look at series in top for now https://review.whamcloud.com/25840 .
            doug Doug Oucharek (Inactive) added a comment - - edited

            Did some more testing and am starting to believe we have a general performance problem in the upstream LNet and is not caused by the patch in this ticket.  I have opened two tickets corresponding to the two items above:

            1. Performance with IB/OPA: LU-9179
            2. map_on_demand <256 a problem: LU-9180

            I have not been able to validate the master version of this fix for performance issues yet because, well, I don't yet know how to build the master branch against Linux 4.9.  Help appreciated.

            doug Doug Oucharek (Inactive) added a comment - - edited Did some more testing and am starting to believe we have a general performance problem in the upstream LNet and is not caused by the patch in this ticket.  I have opened two tickets corresponding to the two items above: Performance with IB/OPA: LU-9179 map_on_demand <256 a problem: LU-9180 I have not been able to validate the master version of this fix for performance issues yet because, well, I don't yet know how to build the master branch against Linux 4.9.  Help appreciated.

            People

              doug Doug Oucharek (Inactive)
              doug Doug Oucharek (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              20 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: