[LU-7508] LBUG sending reply to GSS enabled client Created: 01/Dec/15 Updated: 01/Jun/16 Resolved: 09/Dec/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Jeremy Filizetti | Assignee: | Jeremy Filizetti |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | SSK, kerberos, patch |
| Issue Links: |
| Epic/Theme: | gss |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Lustre will LBUG when sending a reply to an RPC that has a bad context or a bad signature (for example, because the server's target was remounted). With GSS enabled, rq_reqmsg is NULL in this case, so lustre_msg_get_opc() must not be called on it.

<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/devices/system/cpu/possible
<4>CPU 2
<4>Modules linked in: lustre(U) ofd(U) osp(U) lod(U) ost(U) mdt(U) mdd(U) mgs(U) osd_ldiskfs(U) ldiskfs(U) exportfs lquota(U) lfsck(U) jbd obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc_gss(U) sunrpc ptlrpc(U) obdclass(U) ksocklnd(U) lnet(U) sha512_generic libcfs(U) autofs4 ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 microcode sg virtio_balloon snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache sr_mod cdrom virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
<4>
<4>Pid: 4134, comm: mdt01_002 Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1 Red Hat KVM
<4>RIP: 0010:[<ffffffffa0665c6e>] [<ffffffffa0665c6e>] lustre_msg_get_opc+0xe/0x100 [ptlrpc]
<4>RSP: 0018:ffff8800cb37fca0 EFLAGS: 00010286
<4>RAX: 0000000000000000 RBX: ffff8800bcfd5c80 RCX: 0000000000000000
<4>RDX: 0000000000000122 RSI: 0000000000000000 RDI: 0000000000000000
<4>RBP: ffff8800cb37fcb0 R08: 0000000000000003 R09: 0000000000000140
<4>R10: 0000000000000240 R11: 0000000000000400 R12: 0000000000000000
<4>R13: ffff8800cb345ec0 R14: ffff8800cc32cc00 R15: 0000000000000122
<4>FS: 0000000000000000(0000) GS:ffff88002c300000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 0000000000000008 CR3: 0000000116943000 CR4: 00000000000006e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process mdt01_002 (pid: 4134, threadinfo ffff8800cb37e000, task ffff8800cb37d540)
<4>Stack:
<4> 0000000000000028 ffff8800bcfd5c80 ffff8800cb37fce0 ffffffffa06279c8
<4><d> ffff8800cb37fcd0 ffff8800b4bb6000 ffff8800bcfd5c80 ffff8800cb345ec0
<4><d> ffff8800cb37fd50 ffffffffa0627f2e ffffffffa0921760 ffff8800cb345ec0
<4>Call Trace:
<4> [<ffffffffa06279c8>] target_send_reply_msg+0x68/0x1f0 [ptlrpc]
<4> [<ffffffffa0627f2e>] target_send_reply+0x3de/0x710 [ptlrpc]
<4> [<ffffffffa06723bf>] ptlrpc_server_handle_req_in+0x25f/0xd10 [ptlrpc]
<4> [<ffffffffa0678a86>] ptlrpc_main+0x9d6/0x1910 [ptlrpc]
<4> [<ffffffffa06780b0>] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
<4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c20a>] child_rip+0xa/0x20
<4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>Code: 24 50 48 83 c4 78 4c 89 e0 5b 41 5c 41 5d 41 5e 41 5f c9 c3 45 31 e4 e9 13 ff ff ff 90 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 <81> 7f 08 d3 0b d0 0b 48 89 fb 74 66 c7 05 ac 21 12 00 00 01 00
<1>RIP [<ffffffffa0665c6e>] lustre_msg_get_opc+0xe/0x100 [ptlrpc]
<4> RSP <ffff8800cb37fca0>
<4>CR2: 0000000000000008

I will post a patch shortly for this. |
| Comments |
| Comment by Gerrit Updater [ 01/Dec/15 ] |
|
Jeremy Filizetti (jeremy.filizetti@gmail.com) uploaded a new patch: http://review.whamcloud.com/17414 |
| Comment by John Hammond [ 02/Dec/15 ] |
|
Hi Jeremy, are you testing this with Kerberos, shared key, or something else? It would be nice to add regression tests for cases like this, if possible. |
| Comment by Jeremy Filizetti [ 02/Dec/15 ] |
|
All of my testing right now is on shared key. However, this condition should be common to all GSS mechanisms. The SECSVC_COMPLETE result in ptlrpc_server_handle_req_in() can come from gss_svc_upcall_handle_init() when no init channel or context is instantiated, or from gss_svc_handle_data() when it gets a GSS-API major error of "no context" or "bad signature". In both cases, rq_reqmsg is never populated with the lustre_msg_buf that is set at the bottom of gss_svc_handle_init() and gss_svc_verify_request(), respectively. I'll take a look at the test suite to see if I can put together a test for this to include in the commit. |
| Comment by John Hammond [ 02/Dec/15 ] |
|
Is there a description of how to set up a shared-key Lustre instance? Is |
| Comment by Jeremy Filizetti [ 02/Dec/15 ] |
|
|
| Comment by Gerrit Updater [ 09/Dec/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17414/ |
| Comment by Peter Jones [ 09/Dec/15 ] |
|
Landed for 2.8 |