[LU-7182] LBUG during key reestablishment with GSS clients Created: 18/Sep/15 Updated: 01/Jun/16 Resolved: 24/Nov/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Jeremy Filizetti | Assignee: | John Hammond |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | SSK, kerberos | ||
| Issue Links: |
|
| Epic/Theme: | gss |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
An error occurs during reconnect and during key expiration, when the client attempts to instantiate a new key with the server. The server hits an LBUG:
<3>LustreError: 10260:0:(tgt_handler.c:679:tgt_request_handle()) @@@ Unexpected xid 0 vs. last_xid 55e7a7650006b
See the following dmesg output:
<4>Lustre: 9770:0:(sec_gss.c:2086:gss_svc_handle_init()) create svc ctx ffff8800a3f2d240: user from 192.168.1.108@tcp authenticated as root
<4>Lustre: 9770:0:(sec_gss.c:394:gss_cli_ctx_uptodate()) server installed reverse ctx ffff8800a8d74b40 idx 0xbf1831de90d710e6, expiry 3588728678(+2147483647s)
<4>Lustre: 9770:0:(tgt_handler.c:850:tgt_init_sec_level()) client 192.168.1.108@tcp -> target lustre-MDT0000 uses old version, run under security level 0.
<4>Lustre: 10260:0:(sec_gss.c:2346:gss_svc_handle_destroy()) destroy svc ctx ffff8800a3f2d240 idx 0xbd752af10f590f8d (0->192.168.1.108@tcp)
<3>LustreError: 10260:0:(tgt_handler.c:679:tgt_request_handle()) @@@ Unexpected xid 0 vs. last_xid 55e7a7650006b
<3> req@ffff8800b3b85cc0 x0/t0(0) o803->2fc4a29f-142e-0b7c-f3c3-9b04e56ce460@192.168.1.108@tcp:0/0 lens 224/0 e 0 to 0 dl 1441245045 ref 1 fl Interpret:/0/ffffffff rc 0/-1
<4>Lustre: 10280:0:(gss_svc_upcall.c:1076:gss_svc_upcall_get_ctx()) Invalid gss ctx idx 0xbd752af10f590f8d from 192.168.1.108@tcp
<0>LustreError: 10260:0:(tgt_handler.c:681:tgt_request_handle()) LBUG
<4>Pid: 10260, comm: mdt00_004
<4>
<4>Call Trace:
<4> [<ffffffffa100d875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa100de77>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa069db98>] tgt_request_handle+0xc68/0x1230 [ptlrpc]
<4> [<ffffffffa06455d1>] ptlrpc_main+0xe41/0x1920 [ptlrpc]
<4> [<ffffffffa0644790>] ? ptlrpc_main+0x0/0x1920 [ptlrpc]
<4> [<ffffffff8109abf6>] kthread+0x96/0xa0
<4> [<ffffffff8100c20a>] child_rip+0xa/0x20
<4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>
<0>Kernel panic - not syncing: LBUG
<4>Pid: 10260, comm: mdt00_004 Not tainted 2.6.32-431.23.3.el6_lustre.x86_64 #1
<4>Call Trace:
<4> [<ffffffff8152896c>] ? panic+0xa7/0x16f
<4> [<ffffffffa100decb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
<4> [<ffffffffa069db98>] ? tgt_request_handle+0xc68/0x1230 [ptlrpc]
<4> [<ffffffffa06455d1>] ? ptlrpc_main+0xe41/0x1920 [ptlrpc]
<4> [<ffffffffa0644790>] ? ptlrpc_main+0x0/0x1920 [ptlrpc]
<4> [<ffffffff8109abf6>] ? kthread+0x96/0xa0
<4> [<ffffffff8100c20a>] ? child_rip+0xa/0x20
<4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20 |
| Comments |
| Comment by Jeremy Filizetti [ 18/Sep/15 ] |
|
Linking to shared key, but I believe this affects Kerberos as well. |
| Comment by Joseph Gmitter (Inactive) [ 18/Sep/15 ] |
|
Hi John, |
| Comment by Sebastien Buisson (Inactive) [ 21/Sep/15 ] |
|
Hi, I think the following patch from Niu should land before this issue is investigated: Indeed, the Kerberos revival and multi-slot last-rcvd features were developed at the same time, and because both change portions of code in CLIO, their interaction can lead to issues like this one. Sebastien. |
| Comment by Joseph Gmitter (Inactive) [ 02/Oct/15 ] |
|
Update: http://review.whamcloud.com/#/c/15473/ landed on master on 10/2. |
| Comment by Peter Jones [ 26/Oct/15 ] |
|
Jeremy, do you still see this issue with the tip of master? Peter |
| Comment by Andreas Dilger [ 24/Nov/15 ] |
|
The patch http://review.whamcloud.com/15473 " Since we haven't hit this issue ourselves, I'm closing this as a duplicate of that bug. Jeremy, please reopen if that patch did not resolve your problem. |
| Comment by Jeremy Filizetti [ 01/Dec/15 ] |
|
I'm not sure about 15473; I ended up using http://review.whamcloud.com/#/c/16759/. |
| Comment by Jeremy Filizetti [ 01/Dec/15 ] |
|
I either don't have access to reopen this or don't know how. Either way, it needs to be reopened until 16759 lands. Checking my test system, I can see that without 16759 I was still hitting the LBUG. |
| Comment by Peter Jones [ 01/Dec/15 ] |
|
Jeremy, I have now given you permission to reopen tickets, but in this case, wouldn't it be better to just track Peter |
| Comment by Jeremy Filizetti [ 01/Dec/15 ] |
|
Thanks. That is fine, we can keep it in |