[LU-7854] sanity-gss test_1 fails with 'chmod /lustre/scratch failed' Created: 03/Mar/16 Updated: 15/Mar/18 Resolved: 15/Mar/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0 |
| Fix Version/s: | Lustre 2.11.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | Sebastien Buisson (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Combined MGS/MDS with one MDT, two OSSs with two OSTs, a single client. Kerberos is set up |
||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
As part of the Kerberos test plan, I ran the test-group/regression suite of tests including sanity-gss. I set up a Kerberos environment and ran the regression test group. sanity-gss tests 1 and 2 fail in the same way, with a connection refused error:

== sanity-gss test 1: create file == 10:46:36 (1457001996)
chmod: cannot access `/lustre/scratch': Connection refused
sanity-gss test_1: @@@@@@ FAIL: chmod /lustre/scratch failed

From the dmesg from the MDS for test 2, we see:

Lustre: DEBUG MARKER: Setting sptlrpc rule: scratch.srpc.flavor.default=gssnull
Lustre: 9164:0:(gss_mech_switch.c:72:lgss_mech_register()) Register gssnull mechanism
LustreError: 7732:0:(gss_keyring.c:791:gss_sec_lookup_ctx_kr()) failed request key: -126
LustreError: 7736:0:(sec.c:440:sptlrpc_req_get_ctx()) req ffff880036c03c80: fail to get context
LustreError: 7732:0:(gss_keyring.c:791:gss_sec_lookup_ctx_kr()) Skipped 1423 previous similar messages
LustreError: 29799:0:(pinger.c:105:ptlrpc_ping()) OOM trying to ping scratch-MDT0000-lwp-MDT0000_UUID->scratch-MDT0000_UUID
LustreError: 7734:0:(gss_keyring.c:791:gss_sec_lookup_ctx_kr()) failed request key: -126
LustreError: 7734:0:(gss_keyring.c:791:gss_sec_lookup_ctx_kr()) Skipped 203419 previous similar messages
LustreError: 7736:0:(sec.c:440:sptlrpc_req_get_ctx()) req ffff880036c03c80: fail to get context
LustreError: 7736:0:(sec.c:440:sptlrpc_req_get_ctx()) Skipped 209084 previous similar messages
LustreError: 7732:0:(gss_keyring.c:791:gss_sec_lookup_ctx_kr()) failed request key: -126
LustreError: 7732:0:(gss_keyring.c:791:gss_sec_lookup_ctx_kr()) Skipped 461885 previous similar messages
LustreError: 7730:0:(sec.c:440:sptlrpc_req_get_ctx()) req ffff880036c03980: fail to get context
LustreError: 7730:0:(sec.c:440:sptlrpc_req_get_ctx()) Skipped 467259 previous similar messages
Lustre: DEBUG MARKER: == sanity-gss test 1: create file == 10:46:36 (1457001996)
LustreError: 7732:0:(gss_keyring.c:791:gss_sec_lookup_ctx_kr()) failed request key: -126
LustreError: 7732:0:(gss_keyring.c:791:gss_sec_lookup_ctx_kr()) Skipped 950299 previous similar messages
LustreError: 7736:0:(sec.c:440:sptlrpc_req_get_ctx()) req ffff880036c03c80: fail to get context
LustreError: 7736:0:(sec.c:440:sptlrpc_req_get_ctx()) Skipped 958900 previous similar messages
Lustre: DEBUG MARKER: sanity-gss test_1: @@@@@@ FAIL: chmod /lustre/scratch failed
LustreError: 29799:0:(pinger.c:105:ptlrpc_ping()) OOM trying to ping scratch-MDT0000-mdtlov_UUID->scratch-OST0000_UUID
Lustre: DEBUG MARKER: == sanity-gss test 2: lfs flushctx == 10:46:40 (1457002000)
LustreError: 7732:0:(gss_keyring.c:791:gss_sec_lookup_ctx_kr()) failed request key: -126
LustreError: 7732:0:(gss_keyring.c:791:gss_sec_lookup_ctx_kr()) Skipped 1953286 previous similar messages
LustreError: 7736:0:(sec.c:440:sptlrpc_req_get_ctx()) req ffff880036c03c80: fail to get context
LustreError: 7736:0:(sec.c:440:sptlrpc_req_get_ctx()) Skipped 1963518 previous similar messages
Lustre: DEBUG MARKER: sanity-gss test_2: @@@@@@ FAIL: chmod /lustre/scratch failed
LustreError: 29799:0:(pinger.c:105:ptlrpc_ping()) OOM trying to ping scratch-MDT0000-mdtlov_UUID->scratch-OST0000_UUID
LustreError: 29799:0:(pinger.c:105:ptlrpc_ping()) Skipped 4 previous similar messages

Only three of 13 tests passed on this and previous runs of sanity-gss.

The logs for the sanity-gss test suite are at https://testing.hpdd.intel.com/test_sets/09108032-e16b-11e5-8edf-5254006e85c2 |
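Because the dmesg output above is dominated by a few repeating call sites, it can help to collapse it per source location. The following is only a minimal sketch, assuming the console format shown in the description (`LustreError: pid:n:(file:line:func()) message`). The names `summarize`, `LINE_RE`, `SKIPPED_RE` and `SAMPLE` are hypothetical helpers for illustration and are not part of Lustre or its test framework.

```python
import re
from collections import defaultdict

# Location prefix as it appears in the dmesg excerpt, e.g.
# "LustreError: 7732:0:(gss_keyring.c:791:gss_sec_lookup_ctx_kr()) ..."
LINE_RE = re.compile(
    r"LustreError: \d+:\d+:\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)"
)
SKIPPED_RE = re.compile(r"Skipped (\d+) previous similar messages")

# A few lines copied from the description, used as sample input.
SAMPLE = [
    "Lustre: DEBUG MARKER: Setting sptlrpc rule: scratch.srpc.flavor.default=gssnull",
    "LustreError: 7732:0:(gss_keyring.c:791:gss_sec_lookup_ctx_kr()) failed request key: -126",
    "LustreError: 7736:0:(sec.c:440:sptlrpc_req_get_ctx()) req ffff880036c03c80: fail to get context",
    "LustreError: 7732:0:(gss_keyring.c:791:gss_sec_lookup_ctx_kr()) Skipped 1423 previous similar messages",
    "LustreError: 29799:0:(pinger.c:105:ptlrpc_ping()) OOM trying to ping scratch-MDT0000-lwp-MDT0000_UUID->scratch-MDT0000_UUID",
]


def summarize(lines):
    """Group LustreError lines by (file, line, function).

    For each call site, count the messages actually printed ("seen") and
    record the largest "Skipped N previous similar messages" value.
    """
    summary = defaultdict(lambda: {"seen": 0, "max_skipped": 0})
    for raw in lines:
        m = LINE_RE.search(raw)
        if not m:
            continue  # DEBUG MARKER and other non-error lines are ignored
        key = (m["file"], int(m["line"]), m["func"])
        skipped = SKIPPED_RE.search(m["msg"])
        if skipped:
            entry = summary[key]
            entry["max_skipped"] = max(entry["max_skipped"], int(skipped.group(1)))
        else:
            summary[key]["seen"] += 1
    return dict(summary)


if __name__ == "__main__":
    for (src, lineno, func), stats in summarize(SAMPLE).items():
        print(f"{src}:{lineno} {func}() seen={stats['seen']} max_skipped={stats['max_skipped']}")
```

Run against the full excerpt, this kind of summary makes it easier to see that the failures concentrate in `gss_sec_lookup_ctx_kr()` and `sptlrpc_req_get_ctx()`.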
| Comments |
| Comment by Andreas Dilger [ 04/Mar/16 ] |
|
Is this a regression, or has this test not been run/passed recently? |
| Comment by James Nunez (Inactive) [ 04/Mar/16 ] |
|
This is not a regression and it looks like this test suite is rarely executed. Looking at Maloo results for sanity-gss for the past year and a half, it looks like this test suite has not passed testing in that time frame. |
| Comment by Andreas Dilger [ 07/Mar/16 ] |
|
Jeremy, Sebastien, have you ever run the sanity-gss.sh script successfully? Are there any patches needed to get this working in our test environment? Sebastien, this is something that should be run as part of your patch http://review.whamcloud.com/18781 |
| Comment by Jeremy Filizetti [ 07/Mar/16 ] |
|
I believe I've had it run through once, but so far I haven't added patches for SK testing. Looking at sanity-gss it's using gssnull which requires the lsvcgssd to be running but I don't see any start_gss_daemons called there. The -126 is -ENOKEY which would make sense given that gssnull requires lsvcgssd to be running with the -z flag (from my last patch set for shared key). |
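To illustrate the errno reading in the comment above, here is a small sketch that maps the negative return codes seen in this ticket to their Linux names. The table is hard-coded from the generic Linux errno numbering so that it runs on any platform; `LINUX_ERRNO` and `decode_rc` are hypothetical names used only for this example.

```python
# Generic Linux errno values relevant to this ticket (assumed from the
# asm-generic numbering; other operating systems may number these differently).
LINUX_ERRNO = {
    111: ("ECONNREFUSED", "Connection refused"),
    126: ("ENOKEY", "Required key not available"),
    127: ("EKEYEXPIRED", "Key has expired"),
    128: ("EKEYREVOKED", "Key has been revoked"),
    129: ("EKEYREJECTED", "Key was rejected by service"),
}


def decode_rc(rc):
    """Translate a kernel-style negative return code into a readable string."""
    name, text = LINUX_ERRNO.get(abs(rc), ("UNKNOWN", "not in this table"))
    return f"{rc} = -{name} ({text})"


if __name__ == "__main__":
    # The value reported by gss_sec_lookup_ctx_kr() in the MDS dmesg.
    print(decode_rc(-126))
```

The -126 from `gss_sec_lookup_ctx_kr()` therefore reads as "required key not available", which is consistent with the key request not being serviced, while the client-side `chmod` failure surfaces as "Connection refused".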
| Comment by James Nunez (Inactive) [ 07/Mar/16 ] |
|
Jeremy - Thanks for the comment. I have lsvcgssd running on all servers, but I didn't give it any flags. |
| Comment by Sebastien Buisson (Inactive) [ 08/Mar/16 ] |
|
Thanks Andreas. |
| Comment by Peter Jones [ 18/Apr/17 ] |
|
Is this still a live issue or was it fixed under |
| Comment by Sebastien Buisson (Inactive) [ 26/Apr/17 ] |
|
Hi, This issue is not directly related to I think the best way to tackle this is to push a new patch to modify sanity-gss according to Jeremy’s suggestion (i.e. starting lsvcgssd) and add sanity-gss to the Test-Parameters of the patch. Sebastien. |
| Comment by Gerrit Updater [ 01/Jun/17 ] |
|
Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/27383 |
| Comment by Sebastien Buisson (Inactive) [ 01/Jun/17 ] |
|
Hi, Following Jeremy's suggestion, I uploaded patch https://review.whamcloud.com/27383 in order to start the lsvcgssd daemons with the '-z' flag for the gssnull flavor. Sebastien. |
| Comment by Sebastien Buisson (Inactive) [ 07/Jun/17 ] |
|
The tests cannot pass because of the problem described in |
| Comment by Peter Jones [ 15/Jan/18 ] |
|
sbuisson just to confirm - you are able to move forward with this fix now, right? |
| Comment by Sebastien Buisson (Inactive) [ 16/Jan/18 ] |
|
Sure, I have just refreshed the patch at https://review.whamcloud.com/27383. |
| Comment by Sebastien Buisson (Inactive) [ 18/Jan/18 ] |
|
FYI, work on GSS (or Shared Key or Kerberos) is currently blocked because of issue described in |
| Comment by Peter Jones [ 06/Feb/18 ] |
|
sbuisson is work now able to proceed since the patches for |
| Comment by Gerrit Updater [ 15/Feb/18 ] |
|
Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/31317 |
| Comment by Gerrit Updater [ 15/Mar/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31317/ |
| Comment by Gerrit Updater [ 15/Mar/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27383/ |
| Comment by Peter Jones [ 15/Mar/18 ] |
|
Landed for 2.11 |