[LU-9795] SSK test failures in many suites when SHARED_KEY is enabled Created: 24/Jul/17  Updated: 20/Oct/20

Status: Reopened
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0, Lustre 2.13.0
Fix Version/s: Lustre 2.12.0

Type: Bug
Priority: Minor
Reporter: Chris Hanna
Assignee: Chris Hanna
Resolution: Unresolved
Votes: 0
Labels: always_except

Issue Links:
Related
is related to LU-13498 sanity test 56w fails with '/usr/bin/... Resolved
is related to LU-10531 GSS, Shared Key and Kerberos support ... Resolved
is related to LU-9145 When Shared Key feature is active, No... Resolved
is related to LU-8602 Support GSS crypto code with linux 4.... Resolved
Severity: 4
Project: Test Infrastructure
Rank (Obsolete): 9223372036854775807

 Description   

The shared key (SSK) feature, new in Lustre, does not currently pass all tests in some suites of the Lustre test environment. When SHARED_KEY is set to TRUE, the suites and tests listed below produce errors, some of which disrupt subsequent tests. This ticket tracks these issues and links to the appropriate fixes where they exist. Test support for SSK was added as part of LU-8275.

Some tests may fail because of defects in the SSK feature itself, but it is also possible that they fail because the current testing framework does not properly account for SSK. For example, there may be timing issues related to long SSK spin-up that prevent test steps from completing in their proper order.
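
As an illustration of how a test could tolerate slow spin-up, here is a minimal sketch of a polling helper. The name wait_for and its arguments are hypothetical and are not taken from the Lustre test framework; it only shows the general idea of retrying a readiness check until a timeout instead of assuming a step has already completed.

```shell
# Hypothetical helper, not part of the Lustre test framework.
# Usage: wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND about once per second until it succeeds,
# or gives up once TIMEOUT_SECONDS retries have been made.
wait_for() {
	local timeout=$1
	shift
	local waited=0

	until "$@"; do
		if [ "$waited" -ge "$timeout" ]; then
			echo "timed out after ${timeout}s: $*" >&2
			return 1
		fi
		sleep 1
		waited=$((waited + 1))
	done
}
```

A test step would then call something like `wait_for 60 some_readiness_check`, where some_readiness_check stands for whatever command indicates that the SSK-dependent step has finished.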

Current tests which are known to cause issues include:
sanity: 101g 102b 300f
sanity-hsm: 13
sanity-gss: 8 90
dne_sanity: 6g 27u 27D 27E 33f 83 101g 102b 102c 102i 102k 102m 102n 102r 103b 105a 105b 105c 106 110
replay-single: 71a 110f
replay-dual: 0a 0b
conf-sanity: 76a 76b 76c 76d 103
sanity-lfsck: 18e 33

If the defects cannot be resolved, the affected tests will be added as exceptions in their test suites, but only when SSK is enabled.



 Comments   
Comment by Gerrit Updater [ 08/Oct/18 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/33316
Subject: LU-9795 mdt: only set groups if GID is not squashed
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c2f87b3018cddcf738a977f9a3136704a8d10d96

Comment by Gerrit Updater [ 12/Oct/18 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/33357
Subject: LU-9795 gss: properly handle mgssec
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b3218e9a6fd901ee8913bff52755d55ca30b08d5

Comment by Gerrit Updater [ 22/Oct/18 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/33415
Subject: LU-9795 gss: fix gss-based integrity check for multi-rail
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 090164f647c555c8cbc6613fa403081d41ecade5

Comment by Gerrit Updater [ 29/Oct/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/28662/
Subject: LU-9795 tests: exclude several tests which conflict with SSK
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3c8a48644f04a6cdb81c602a1d199320e9f68aa7

Comment by Gerrit Updater [ 13/Nov/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33316/
Subject: LU-9795 mdt: only set groups if GID is not squashed
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e17178808f7430a17b4cfe8f407b7c2a825d285a

Comment by Gerrit Updater [ 13/Nov/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33415/
Subject: LU-9795 gss: fix gss-based integrity check for multi-rail
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e2bd32ca75d60870a70cd3a00e8aac8efb751762

Comment by Peter Jones [ 13/Nov/18 ]

Landed for 2.12

Comment by Gerrit Updater [ 17/Nov/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33357/
Subject: LU-9795 gss: properly handle mgssec
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 87383c55e74a219e72bcf861a2d2e81d978a927f

Comment by Andreas Dilger [ 26/Aug/19 ]

There were a number of tests added to the ALWAYS_EXCEPT list when SSK was landed (tracked under this ticket), in order to facilitate the landing process:

if $SHARED_KEY; then
        # bug number:    LU-9795 LU-9795 LU-9795 LU-9795
        ALWAYS_EXCEPT+=" 17n     60a     133g    300f "
fi

However, those issues were not fixed and this ticket should not have been closed with the always_except label on it.

Either we need to determine the root cause of these failures and fix it, or we need to decide that those tests are not suitable to run when SSK is enabled. In the latter case, the tests should call skip "some good reason" inside the test itself rather than being listed in ALWAYS_EXCEPT (which implies that there is some lingering defect to be fixed).
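
A minimal sketch of the second option follows. The skip function here is a simplified stand-in defined only so the example runs on its own, and test_demo with its skip message is a hypothetical subtest; only $SHARED_KEY and the idea of calling skip inside the test come from this ticket.

```shell
# Stand-ins so the sketch is self-contained (assumptions, not framework code).
SHARED_KEY=${SHARED_KEY:-true}	# holds the command name "true" or "false"

skip() {
	# Simplified stand-in: just report why the test was skipped.
	echo "SKIP: $*"
}

# Hypothetical subtest; the real test body is replaced by a placeholder.
test_demo() {
	if $SHARED_KEY; then
		skip "not applicable when SSK is enabled"
		return 0
	fi
	echo "RUN: test_demo"
}

test_demo
```

With this pattern the reason for skipping lives next to the test it applies to, and the test no longer appears in ALWAYS_EXCEPT as if a defect were still outstanding.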

Comment by Andreas Dilger [ 20/Oct/20 ]

After patch https://review.whamcloud.com/40161 "LU-13498 tests: remove tests from ALWAYS_EXCEPT with SSK" lands, the only subtests that are still in ALWAYS_EXCEPT because of this ticket will be:

  • conf-sanity test_84, test_86, test_103
  • replay-single test_121
  • sanity-hsm test_402b