[LU-12992] sanity test_240: failed to mount client Created: 21/Nov/19  Updated: 22/Oct/20  Resolved: 17/Mar/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Sebastien Buisson
Resolution: Fixed Votes: 0
Labels: SSK

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for liuying <emoly.liu@intel.com>

Failure Rate: 3.08% of most recent 65 runs, 35 skipped (all branches), especially in group review-dne-ssk testing

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/3f7f577a-0b7d-11ea-9487-52540065bddc

test_240 failed with the following error:

Starting client: onyx-30vm1.onyx.whamcloud.com:  -o user_xattr,flock onyx-30vm4@tcp:/lustre /mnt/lustre
CMD: onyx-30vm1.onyx.whamcloud.com mkdir -p /mnt/lustre
CMD: onyx-30vm1.onyx.whamcloud.com mount -t lustre -o user_xattr,flock onyx-30vm4@tcp:/lustre /mnt/lustre
mount.lustre: mount onyx-30vm4@tcp:/lustre at /mnt/lustre failed: Operation not permitted
 sanity test_240: @@@@@@ FAIL: failed to mount client 

<<Please provide additional information about the failure here>>

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_240 - failed to mount client



 Comments   
Comment by James Nunez (Inactive) [ 12/Dec/19 ]

I’m seeing a similar issue for various sanity-sec tests For example, at https://testing.whamcloud.com/test_sets/203d27d0-1cb1-11ea-b1e8-52540065bddc, we see sanity-sec test 25 fail with, from the suite_log,

trevis-40vm2: mount.lustre: mount trevis-40vm4@tcp:/lustre at /mnt/lustre failed: Operation not permitted
 sanity-sec test_25: @@@@@@ FAIL: unable to mount clients trevis-40vm1.trevis.whamcloud.com,trevis-40vm2 

Since the client (vm2) isn’t able to mount, looking at the console log for that node, we see a couple of Lustre errors

[23381.792880] LustreError: 18418:0:(gss_keyring.c:1411:gss_kt_update()) negotiation: rpc err 0, gss err 0
[23381.794690] LustreError: 18418:0:(gss_keyring.c:1411:gss_kt_update()) Skipped 1 previous similar message
[23381.796257] Lustre: 18418:0:(sec_gss.c:315:cli_ctx_expire()) ctx ffff9dcc80055e00(0->lustre-MDT0000_UUID) get expired: 1576134447(+40s)
[23381.799069] Lustre: 18418:0:(sec_gss.c:315:cli_ctx_expire()) Skipped 3 previous similar messages
[23381.829815] Lustre: 18421:0:(sec_gss.c:377:gss_cli_ctx_uptodate()) client refreshed ctx ffff9dcce6b5d300 idx 0xf9e70ab0fc7179a7 (0->lustre-MDT0001_UUID), expiry 1576739197(+604790s)
[23381.832536] Lustre: 18421:0:(sec_gss.c:377:gss_cli_ctx_uptodate()) Skipped 1 previous similar message
[23381.834400] Lustre: 18421:0:(gss_svc_upcall.c:804:gss_svc_upcall_install_rvs_ctx()) create reverse svc ctx ffff9dcce7e1ce40 to lustre-MDT0001_UUID: idx 0xa48733bdf9b2ed6f
[23381.836877] Lustre: 18421:0:(gss_svc_upcall.c:804:gss_svc_upcall_install_rvs_ctx()) Skipped 1 previous similar message
[23386.829605] Lustre: 18430:0:(sec_gss.c:377:gss_cli_ctx_uptodate()) client refreshed ctx ffff9dccfaadf100 idx 0x6e9f4cddf09a985c (0->lustre-MDT0000_UUID), expiry 1576739202(+604790s)
[23386.832351] Lustre: 18430:0:(sec_gss.c:377:gss_cli_ctx_uptodate()) Skipped 2 previous similar messages
[23386.833903] Lustre: 18430:0:(gss_svc_upcall.c:804:gss_svc_upcall_install_rvs_ctx()) create reverse svc ctx ffff9dcc80169240 to lustre-MDT0000_UUID: idx 0xa48733bdf9b2ed72
[23386.836387] Lustre: 18430:0:(gss_svc_upcall.c:804:gss_svc_upcall_install_rvs_ctx()) Skipped 2 previous similar messages
[23386.845207] LustreError: 18400:0:(lmv_obd.c:1261:lmv_statfs()) lustre-MDT0000-mdc-ffff9dcce7c96800: can't stat MDS #0: rc = -1
[23386.846471] Lustre: 14703:0:(sec_gss.c:1228:gss_cli_ctx_fini_common()) gss.keyring@ffff9dccf9e40c00: destroy ctx ffff9dcc80055e00(0->lustre-MDT0000_UUID)
[23386.846473] Lustre: 14703:0:(sec_gss.c:1228:gss_cli_ctx_fini_common()) Skipped 11 previous similar messages
[23386.861940] Lustre: 17601:0:(gss_cli_upcall.c:398:gss_do_ctx_fini_rpc()) client finishing forward ctx ffff9dccfaadf100 idx 0x6e9f4cddf09a985c (0->lustre-MDT0000_UUID)
[23386.864376] Lustre: 17601:0:(gss_cli_upcall.c:398:gss_do_ctx_fini_rpc()) Skipped 11 previous similar messages
[23386.893798] LustreError: 18400:0:(lov_obd.c:827:lov_cleanup()) lustre-clilov-ffff9dcce7c96800: lov tgt 0 not cleaned! deathrow=0, lovrc=1
[23386.902000] Lustre: Unmounted lustre-client
[23386.903744] LustreError: 18400:0:(obd_mount.c:1672:lustre_fill_super()) Unable to mount  (-1)
[23387.123495] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-sec test_25: @@@@@@ FAIL: unable to mount clients trevis-40vm1.trevis.whamcloud.com,trevis-40vm2 

In the console logs for the MDS1/3 (vm4), we see

[ 5283.481582] Lustre: MGS: Connection restored to decbd92d-dc4c-4 (at 10.9.5.228@tcp)
[ 5283.482930] Lustre: Skipped 1 previous similar message
[ 5283.604715] Lustre: 6471:0:(sec_gss.c:2066:gss_svc_handle_init()) create svc ctx ffff9117dbd56640: user from 10.9.5.228@tcp authenticated as root
[ 5283.606945] Lustre: 6471:0:(sec_gss.c:2066:gss_svc_handle_init()) Skipped 1 previous similar message
[ 5283.638705] LustreError: 2455:0:(gss_svc_upcall.c:969:gss_svc_upcall_handle_init()) authentication failed
[ 5283.640512] LustreError: 2455:0:(gss_svc_upcall.c:969:gss_svc_upcall_handle_init()) Skipped 3 previous similar messages
[ 5283.642785] Lustre: 2455:0:(sec_gss.c:370:gss_cli_ctx_uptodate()) server installed reverse ctx ffff9117dc872100 idx 0xf84af856cc294b8, expiry 1576739209(+604800s)
[ 5283.645641] Lustre: 2455:0:(sec_gss.c:370:gss_cli_ctx_uptodate()) Skipped 1 previous similar message
[ 5288.660041] Lustre: 2454:0:(sec_gss.c:2066:gss_svc_handle_init()) create svc ctx ffff9117b645ae40: user from 10.9.5.229@tcp authenticated as root
[ 5288.662394] Lustre: 2454:0:(sec_gss.c:2066:gss_svc_handle_init()) Skipped 2 previous similar messages
[ 5288.688635] Lustre: 2454:0:(sec_gss.c:370:gss_cli_ctx_uptodate()) server installed reverse ctx ffff9117dca36600 idx 0xa48733bdf9b2ed72, expiry 1576739214(+604800s)
[ 5288.691011] Lustre: 2454:0:(sec_gss.c:370:gss_cli_ctx_uptodate()) Skipped 2 previous similar messages
[ 5288.692523] Lustre: lustre-MDT0000: Connection restored to 15155991-9df2-4 (at 10.9.5.229@tcp)
[ 5288.693925] Lustre: Skipped 4 previous similar messages
[ 5288.716302] Lustre: 2454:0:(sec_gss.c:2326:gss_svc_handle_destroy()) destroy svc ctx ffff9117b645ae40 idx 0x6e9f4cddf09a985c (0->10.9.5.229@tcp)
[ 5288.718414] Lustre: 2454:0:(sec_gss.c:2326:gss_svc_handle_destroy()) Skipped 3 previous similar messages
[ 5288.980369] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-sec test_25: @@@@@@ FAIL: unable to mount clients trevis-40vm1.trevis.whamcloud.com,trevis-40vm2 

Similar failure for sanity-sec test 27a at https://testing.whamcloud.com/test_sets/74c35394-1bab-11ea-adca-52540065bddc . In the client (vm6) console log we see that gss_kt_update() rpc error is -85 and not 0 as above

[23759.672542] LustreError: 3900:0:(gss_keyring.c:1411:gss_kt_update()) negotiation: rpc err -85, gss err 0
Comment by Gerrit Updater [ 17/Dec/19 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/37048
Subject: LU-12992 tests: tentative reproducer
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 931e36b050e63b7c5c2c350d9fa8a4520a7df2a9

Comment by Gerrit Updater [ 19/Dec/19 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/37064
Subject: LU-12992 gss: retry in case of short computed shared key
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8609331583142d48241eaffd74ccf73aaad96d1d

Comment by Gerrit Updater [ 19/Dec/19 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/37065
Subject: LU-12992 tests: another tentative reproducer
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 600a6e54b9f6963b56d56ed8365e036fda732956

Comment by Gerrit Updater [ 17/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37064/
Subject: LU-12992 gss: retry in case of short computed shared key
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 6724002732c1bedb4ae4217871d432972672747e

Comment by Peter Jones [ 17/Mar/20 ]

Landed for 2.14

Comment by Gerrit Updater [ 22/Oct/20 ]

Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40364
Subject: LU-12992 gss: retry in case of short computed shared key
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 380fe0c16adcbe2b4e8040f129dc27755f9bab86

Generated at Sat Feb 10 02:57:26 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.