Details
-
Technical task
-
Resolution: Fixed
-
Blocker
-
Lustre 2.10.0, Upstream
-
9223372036854775807
Description
As part of recent (last September) RDMA changes, the routine ib_get_dma_mr() was deleted and an extra "flags" parameter was added to ib_alloc_pd(). This was all done to stop upper layers from accessing the R-key for unregistered DMA memory regions. It seems that this opens the door to accessing any physical memory you want on a remote system thereby being a security issue.
For the applicable changes, see these two kernel commits:
from 4.9-rc1: change of arguments number for ib_alloc_pd
commit ed082d36a7b2c27d1cda55fdfb28af18040c4a89
Author: Christoph Hellwig <hch@lst.de>
Date: Mon Sep 5 12:56:17 2016 +0200
from 4.9-rc1: removal of ib_get_dma_mr
commit 5ef990f06bd7e3cf521b5705d898d8e43d04ea90
Author: Christoph Hellwig <hch@lst.de>
Date: Mon Sep 5 12:56:21 2016 +0200
LNet uses unregistered memory regions for short messages (lower latency). A "method" of still allowing us to work as we did before was added in the form of a PD variable called "unsafe_global_rkey" which is populated if you pass the flag "IB_PD_UNSAFE_GLOBAL_RKEY" as the new flags parameter to ib_alloc_pd(). As the name implies, this is considered unsafe and should not be done unless you are very confident in your security. Using it will trigger a warning to the kernel console.
This change has caused a kernel maintainer to mark ko2iblnd as "BROKEN" in the 4.10 kernel build. To get rid of this broken tag, we need to fix this quickly so I am going to make use of the unsafe flag so I do not have to re-architect ko2iblnd for short messages. We need to test how much of a performance hit we get on short messages if we have to register all memory regions for better security. There is not enough time to do that now.
So, this ticket was created to track the following changes to the staging version of LNet:
- Remove the BROKEN tag from lnet/Kconfig.
- Pass the IB_PD_UNSAFE_GLOBAL_RKEY flag when calling ib_alloc_pd().
- Everywhere we are using the device DMA mr to derive the R-key for non-registered memory regions, use the pd->unsafe_global_rkey value instead (this will get rid of our call to ib_get_dma_mr()).
Once this is landed to staging, use this ticket to also make the change to community master with the necessary compile hooks to ensure the change is only done when it makes sense.
I have been focused on doing some performance testing and am having some strange problems. This patch makes the default map_on_demand 256. With Lustre master, I am seeing a 5% reduction in performance with MLX QDR. Overall, not too bad given we are addressing a security hold with this change (i.e. we are being forced to do this).
I downloaded the build from upstream with this change. Things are not good there. Ran into two bad problems:
[ 1125.547100] LNetError: 22552:0:(o2iblnd_cb.c:1068:kiblnd_init_rdma()) RDMA is too large for peer 192.168.1.24@o2ib (131072), src size: 1048576 dst size: 1048576
Some map_on_demand fixes/changes must be missing upstream. Whether that missing code is also causing the huge drop in performance, I don't yet know. I need to find an upstream build without this patch to see if performance has always been bad upstream and no one noticed.
In any case, I will be opening a ticket to address number 2 as we need to find the missing code and get it upstream. Depending on how my additional testing for number 1 goes we will either be ok (and a 5% performance hit will accompany this change) or this patch has introduced a nasty problem which needs to be debugged before we can land it to master.