[LU-13343] recovery-small test_140a: unable to mount /mnt/lustre2 on MDS Created: 09/Mar/20 Updated: 01/Feb/24 Resolved: 09/Jul/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Sebastien Buisson |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/1b932583-bcd8-4933-9b6a-2d70a0b7aecc

test_140a failed with the following error on the client console log:

[ 245.535452] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock,skpath=/tmp/test-framework-keys trevis-3vm9@tcp:/lustre /mnt/lustre2
[ 245.868341] LustreError: 15982:0:(gss_keyring.c:862:gss_sec_lookup_ctx_kr()) failed request key: -126
[ 245.869838] LustreError: 15982:0:(sec.c:449:sptlrpc_req_get_ctx()) req ffff9fb5ed3c3600: fail to get context
[ 245.871460] LustreError: 15982:0:(lmv_obd.c:306:lmv_connect_mdc()) target lustre-MDT0000_UUID connect error -111
[ 245.872973] LustreError: 15982:0:(llite_lib.c:320:client_common_fill_super()) cannot connect to lustre-clilmv-ffff9fb5f8a2d000: rc = -111
[ 245.882513] LustreError: 15982:0:(lov_obd.c:824:lov_cleanup()) lustre-clilov-ffff9fb5f8a2d000: lov tgt 0 not cleaned! deathrow=0, lovrc=1
[ 245.891188] Lustre: Unmounted lustre-client
[ 245.892122] LustreError: 15982:0:(obd_mount.c:1681:lustre_fill_super()) Unable to mount (-111)
[ 246.144872] Lustre: DEBUG MARKER: /usr/sbin/lctl mark recovery-small test_140a: @@@@@@ FAIL: unable to mount \/mnt\/lustre2 on MDS
[ 246.335604] Lustre: DEBUG MARKER: recovery-small test_140a: @@@@@@ FAIL: unable to mount /mnt/lustre2 on MDS

It looks like the mount2 setup is missing the security setup.

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV |
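As a reading aid for the console log above: the return codes -126 and -111 are negated Linux errno values. The sketch below maps them to their symbolic names. The table and the helper `rc_to_name()` are hypothetical and not part of Lustre; the numeric values are hard-coded from the generic Linux errno headers and may differ on some architectures.

```c
#include <stddef.h>

/* Hypothetical lookup entry: a negated errno as printed in the log,
 * its symbolic name, and the standard message text. */
struct rc_name {
    int rc;
    const char *name;
    const char *meaning;
};

/* Values assumed from the generic Linux errno headers. */
const struct rc_name rc_table[] = {
    { -111, "ECONNREFUSED", "Connection refused" },
    { -126, "ENOKEY",       "Required key not available" },
};

/* Return the symbolic name for a logged return code, or "unknown". */
const char *rc_to_name(int rc)
{
    size_t n = sizeof(rc_table) / sizeof(rc_table[0]);

    for (size_t i = 0; i < n; i++)
        if (rc_table[i].rc == rc)
            return rc_table[i].name;
    return "unknown";
}
```

Read this way, the key request fails with ENOKEY first, and the later connect and mount errors report ECONNREFUSED.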
| Comments |
| Comment by Andreas Dilger [ 09/Mar/20 ] |
|
It looks like this test has been failing since 2020-01-23, about 160 times, as many as 8 times a day. |
| Comment by Andreas Dilger [ 09/Mar/20 ] |
|
Per
|
| Comment by Gerrit Updater [ 09/Mar/20 ] |
|
Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/37832 |
| Comment by Gerrit Updater [ 10/Mar/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37832/ |
| Comment by Andreas Dilger [ 11/Mar/20 ] |
|
I was going to resolve this issue as "fixed" rather than marking it "always_except" and keeping it open, because my understanding is that local mounts would/should never use SSK/Kerberos as there is no transport security needed for memcpy() within the node. That said, should there be an internal check for connections from the local NID to skip the gss_sec_lookup_ctx_kr() code/check so that this doesn't generate an error at all? Otherwise, it seems like if SSK is configured on a server it would prevent local mounts from working at all? Am I misunderstanding the issue here? |
| Comment by Sebastien Buisson [ 12/Mar/20 ] |
|
The problem is just about the test framework. There is no restriction on using SSK with local clients (i.e. clients mounted on server nodes); it just requires installing the client key on these nodes as well. This is why I pushed this patch to skip the few tests that use local clients when SSK is enabled. If the number of cases increases, we would want to have SSK keys installed properly wherever they are needed. |
| Comment by Andreas Dilger [ 12/Mar/20 ] |
|
What I'm asking about is the reverse - if SSK is enabled on the servers, it seems to me that it should be possible to mount a local client without the need to configure SSK for that client. I can't see any benefit to SSK/KRB for a local client mount, since the data doesn't go over the network, and the server itself can verify that the NID of the client is local. |
| Comment by Sebastien Buisson [ 12/Mar/20 ] |
|
On the one hand, you are raising an interesting point. There is no real added value in checking SSK/KRB credentials for a local node. On the other hand, one of the purposes of strong authentication is to define roles for nodes. When you install credentials on nodes (MDS, OSS, client), you explicitly assign them a role, meaning you want to prevent nodes from being re-purposed. So if we implement what you suggest, we would weaken this principle, with the initial intention of making local mounts easier. I am not against what you suggest, but we have to be aware of the implications. |
| Comment by Andreas Dilger [ 12/Mar/20 ] |
|
For clients mounting a local filesystem on the server for data movement or protocol re-export there is a need for the local client mount. I agree that the admin could configure the "client" for this local mount, but even if that was not inconvenient for the admin, it would still hurt performance due to encryption overhead for both the "client" and the "server" running on the same node for no real benefit, so we would likely recommend against using KRB/SSK for local connections. |
| Comment by Sebastien Buisson [ 12/Mar/20 ] |
|
Understood. So I will work on a patch to add an internal check for connections from the local NID to skip the gss_sec_lookup_ctx_kr() code/check. Please leave this ticket open so that we keep in mind we already have recovery-small test_140a and test_140b that are making use of local clients. |
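For illustration only, the kind of check discussed above could look roughly like the sketch below. It is not the code of any patch attached to this ticket: the type `sketch_nid_t` and the functions `sketch_nid_is_local()` and `sketch_need_gss_ctx()` are hypothetical names, and a NID is reduced to a plain integer for simplicity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a Lustre network identifier (NID). */
typedef uint64_t sketch_nid_t;

/* Return true if the peer NID matches one of this node's own NIDs. */
bool sketch_nid_is_local(sketch_nid_t peer,
                         const sketch_nid_t *local_nids, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (local_nids[i] == peer)
            return true;
    return false;
}

/* Decide whether a GSS context lookup is needed for a connection.
 * Assumption: the lookup is skipped when no GSS flavor is configured,
 * or when the peer is the local node itself. */
bool sketch_need_gss_ctx(bool gss_flavor_configured, sketch_nid_t peer,
                         const sketch_nid_t *local_nids, size_t count)
{
    if (!gss_flavor_configured)
        return false;
    return !sketch_nid_is_local(peer, local_nids, count);
}
```

The trade-off raised earlier in the thread still applies: skipping the lookup for local peers makes local mounts easier, but the installed credentials then no longer determine the role of a node.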
| Comment by Gerrit Updater [ 04/Mar/22 ] |
|
"Sebastien Buisson <sbuisson@ddn.com>" uploaded a new patch: https://review.whamcloud.com/46704 |
| Comment by Gerrit Updater [ 08/Mar/22 ] |
|
"Sebastien Buisson <sbuisson@ddn.com>" uploaded a new patch: https://review.whamcloud.com/46736 |
| Comment by Gerrit Updater [ 11/May/23 ] |
|
"Sebastien Buisson <sbuisson@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50941 |
| Comment by Gerrit Updater [ 08/Jul/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/46704/ |
| Comment by Peter Jones [ 09/Jul/23 ] |
|
Landed for 2.16 |
| Comment by Gerrit Updater [ 10/Jul/23 ] |
|
"Xing Huang <hxing@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51616 |