[LU-17228] sanity test_36i: expect 200 got 1698166494 Created: 25/Oct/23  Updated: 09/Jan/24  Resolved: 09/Jan/24

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Related
is related to LU-17173 Cleanup lfs flushctx Resolved
is related to LU-17286 recovery-small test_66 timeout Resolved
is related to LU-17317 sanity-sec test_16: test all_off:6000... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for S Buisson <sbuisson@ddn.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/092044e2-44a4-4fac-8108-9b95801c2180

test_36i failed with the following error:

expect 200 got 1698166494

Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/99900 - 4.18.0-477.21.1.el8_8.x86_64
servers: https://build.whamcloud.com/job/lustre-reviews/99900 - 4.18.0-477.21.1.el8_lustre.x86_64

Possibly because the file could not be stated:

stat: cannot statx '/mnt/lustre/d36i.sanity/striped_dir': No such file or directory

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_36i - expect 200 got 1698166494



 Comments   
Comment by Etienne Aujames [ 27/Nov/23 ]

+1 on master: https://testing.whamcloud.com/test_sets/54903a1b-c187-4eab-a26f-7d010e36f1c6

Comment by Etienne Aujames [ 29/Nov/23 ]

This test continues to fail on https://review.whamcloud.com/53202.
It seems to be related to GSS:

client dmesg:

[ 2861.312457] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity test 36i: change mtime on striped directory ==== 10:22:05 \(1701080525\)
[ 2861.534950] Lustre: DEBUG MARKER: == sanity test 36i: change mtime on striped directory ==== 10:22:05 (1701080525)
[ 2861.606338] Lustre: 18765:0:(gss_svc_upcall.c:1620:gss_svc_upcall_get_ctx()) Invalid gss ctx idx 0x64bb119e1d581425 from 10.240.23.70@tcp
[ 2861.936148] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity test_36i: @@@@@@ FAIL: expect 200 got 1701080525 
[ 2862.151355] Lustre: DEBUG MARKER: sanity test_36i: @@@@@@ FAIL: expect 200 got 1701080525
[ 2862.470230] Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /autotest/autotest-2/2023-11-27/lustre-reviews_review-dne-selinux-ssk-part-1_100496_24_a3948d5e-bc2b-4bec-96bc-b5afaffeb1fb//sanity.test_36i.debug_log.$(hostname -s).1701080526.log;
               		dmesg > /autotest/autotest-2/2023-11-27/lustre-revi

MDS (2) logs:

[ 2940.267050] Lustre: DEBUG MARKER: == sanity test 36i: change mtime on striped directory ==== 10:22:05 (1701080525)
[ 2940.323819] Lustre: 11588:0:(sec_gss.c:686:gss_cli_ctx_handle_err_notify()) req x1783709610067456/t0, ctx 000000007b39aa0b idx 0x64bb119e1d581425(0->c): reverseserver respond (00080000/00000000)
[ 2940.326903] LustreError: 11588:0:(client.c:1491:after_reply()) @@@ unwrap reply failed: rc = -22  req@0000000027c4ce16 x1783709610067456/t0(0) o104->lustre-MDT0001@10.240.23.66@tcp:15/16 lens 328/224 e 0 to 0 dl 1701080541 ref 1 fl Rpc:RQU/0/ffffffff rc 0/-1 job:'' uid:4294967295 gid:4294967295
[ 2940.654427] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity test_36i: @@@@@@ FAIL: expect 200 got 1701080525 
[ 2940.910097] Lustre: DEBUG MARKER: sanity test_36i: @@@@@@ FAIL: expect 200 got 1701080525
[ 2941.195988] Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /autotest/autotest-2/2023-11-27/lustre-reviews_review-dne-selinux-ssk-part-1_100496_24_a3948d5e-bc2b-4bec-96bc-b5afaffeb1fb//sanity.test_36i.debug_log.$(hostname -s).1701080526.log;
               		dmesg > /autotest/autotest-2/2023-11-27/lustre-revi

From what I understood, the error returned by server is: major: 00080000 = GSS_S_BAD_NAMETYPE

I found the same traces on LU-17286.

Comment by Sebastien Buisson [ 04/Dec/23 ]

Thanks Etienne. Looks similar to LU-17173 that I am currently investigating.

Comment by Andreas Dilger [ 06/Jan/24 ]

+1 on master: https://testing.whamcloud.com/test_sets/1e766618-c631-4e9c-b822-fcb1f538e578

Comment by Andreas Dilger [ 09/Jan/24 ]

+1 on master: https://testing.whamcloud.com/test_sets/6e5fa464-b884-4959-98fa-b086a97fdce8

Comment by Sebastien Buisson [ 09/Jan/24 ]

This issue is fixed with landing of patch "LU-17317 gss: do not continue using expired reverse context" https://review.whamcloud.com/c/fs/lustre-release/+/53375.
The two failures reported above are for patches not sitting on top of master, hence lacking the fix.

Comment by Andreas Dilger [ 09/Jan/24 ]

Fixed with landing of LU-17317.

Generated at Sat Feb 10 03:33:42 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.