
LBUG sending reply to GSS enabled client


    Description

      Lustre hits an LBUG when handling a reply to an RPC that has a bad context or bad signature (due to the server's target being remounted). When GSS is enabled, rq_reqmsg is NULL in this case, so lustre_msg_get_opc() should not be called.

      <4>Oops: 0000 [#1] SMP
      <4>last sysfs file: /sys/devices/system/cpu/possible
      <4>CPU 2
      <4>Modules linked in: lustre(U) ofd(U) osp(U) lod(U) ost(U) mdt(U) mdd(U) mgs(U) osd_ldiskfs(U) ldiskfs(U) exportfs lquota(U) lfsck(U) jbd obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc_gss(U) sunrpc ptlrpc(U) obdclass(U) ksocklnd(U) lnet(U) sha512_generic libcfs(U) autofs4 ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 microcode sg virtio_balloon snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache sr_mod cdrom virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
      <4>
      <4>Pid: 4134, comm: mdt01_002 Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1 Red Hat KVM
      <4>RIP: 0010:[<ffffffffa0665c6e>]  [<ffffffffa0665c6e>] lustre_msg_get_opc+0xe/0x100 [ptlrpc]
      <4>RSP: 0018:ffff8800cb37fca0  EFLAGS: 00010286
      <4>RAX: 0000000000000000 RBX: ffff8800bcfd5c80 RCX: 0000000000000000
      <4>RDX: 0000000000000122 RSI: 0000000000000000 RDI: 0000000000000000
      <4>RBP: ffff8800cb37fcb0 R08: 0000000000000003 R09: 0000000000000140
      <4>R10: 0000000000000240 R11: 0000000000000400 R12: 0000000000000000
      <4>R13: ffff8800cb345ec0 R14: ffff8800cc32cc00 R15: 0000000000000122
      <4>FS:  0000000000000000(0000) GS:ffff88002c300000(0000) knlGS:0000000000000000
      <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      <4>CR2: 0000000000000008 CR3: 0000000116943000 CR4: 00000000000006e0
      <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>Process mdt01_002 (pid: 4134, threadinfo ffff8800cb37e000, task ffff8800cb37d540)
      <4>Stack:
      <4> 0000000000000028 ffff8800bcfd5c80 ffff8800cb37fce0 ffffffffa06279c8
      <4><d> ffff8800cb37fcd0 ffff8800b4bb6000 ffff8800bcfd5c80 ffff8800cb345ec0
      <4><d> ffff8800cb37fd50 ffffffffa0627f2e ffffffffa0921760 ffff8800cb345ec0
      <4>Call Trace:
      <4> [<ffffffffa06279c8>] target_send_reply_msg+0x68/0x1f0 [ptlrpc]
      <4> [<ffffffffa0627f2e>] target_send_reply+0x3de/0x710 [ptlrpc]
      <4> [<ffffffffa06723bf>] ptlrpc_server_handle_req_in+0x25f/0xd10 [ptlrpc]
      <4> [<ffffffffa0678a86>] ptlrpc_main+0x9d6/0x1910 [ptlrpc]
      <4> [<ffffffffa06780b0>] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
      <4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
      <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
      <4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
      <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
      <4>Code: 24 50 48 83 c4 78 4c 89 e0 5b 41 5c 41 5d 41 5e 41 5f c9 c3 45 31 e4 e9 13 ff ff ff 90 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 <81> 7f 08 d3 0b d0 0b 48 89 fb 74 66 c7 05 ac 21 12 00 00 01 00
      <1>RIP  [<ffffffffa0665c6e>] lustre_msg_get_opc+0xe/0x100 [ptlrpc]
      <4> RSP <ffff8800cb37fca0>
      <4>CR2: 0000000000000008
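
The faulting address can be read straight off the trace: RDI is 0, CR2 is 0x8, and the faulting instruction bytes (81 7f 08 d3 0b d0 0b) decode to a 32-bit compare of the word at offset 8 from RDI against 0x0bd00bd3. That is consistent with lustre_msg_get_opc() reading a magic field through a NULL message pointer. The sketch below reproduces the arithmetic with a simplified header; the struct, macro, and function names are illustrative assumptions, not the real Lustre definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the leading fields of a message header.
 * Assumption: three 32-bit fields with the magic in third place,
 * which matches the offset seen in the trace. This is not the
 * definition from the Lustre headers. */
struct model_lustre_msg {
    uint32_t lm_bufcount;
    uint32_t lm_secflvr;
    uint32_t lm_magic;
};

/* Immediate operand of the faulting compare, taken from the Code: line. */
#define MODEL_MSG_MAGIC 0x0BD00BD3u

/* Address that a read of msg->lm_magic touches for a given base pointer. */
static uintptr_t model_magic_field_address(uintptr_t base)
{
    return base + offsetof(struct model_lustre_msg, lm_magic);
}
```

Under these assumptions, model_magic_field_address(0) evaluates to 8, the value reported in CR2.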
      

      I will post a patch shortly for this.

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.8


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17414/
            Subject: LU-7508 ldlm: Don't check opcode with NULL rq_reqmsg
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 3f4572caef5f25f4a9b5347b2ccf933fdad9db9c

            jfilizetti Jeremy Filizetti added a comment -

            LUDOC-197 should be the most accurate reference. The code has not been pushed yet. Part of why we pushed the doc early was to make sure there were no usability issues with how something would work so it could be fixed before pushing the code. Right now the last thing I need to do before pushing the code is to simplify handling for the MGS keys. I expect to have that completed at least by the weekend.
            jhammond John Hammond added a comment -

            Is there a description of how to set up a shared key Lustre instance? Is LUDOC-197 up to date? It references some scripts/utilities that I can't find anywhere.

            jfilizetti Jeremy Filizetti added a comment -

            All of my testing right now is on shared key. However, this condition should be common to all GSS mechanisms. The SECSVC_COMPLETE in ptlrpc_server_handle_req_in() can come from gss_svc_upcall_handle_init() when no init channel or context is instantiated, or when gss_svc_handle_data() returns a GSS-API major error of no context or bad signature. In both of those cases, rq_reqmsg was never populated with the lustre_msg_buf at the bottom of gss_svc_handle_init() and gss_svc_verify_request(), respectively.

            I'll try to take a look at the test suite to see if I can hack together a test for this to include in the commit.
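
For illustration, the kind of guard the patch subject describes ("Don't check opcode with NULL rq_reqmsg") can be sketched as below. This is a minimal model, not the merged change: the structs, the opcode constant, and both function names are hypothetical stand-ins, and only the rq_reqmsg field name is taken from the ticket.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real Lustre message and request types. */
struct model_msg {
    uint32_t opc;
};

struct model_request {
    struct model_msg *rq_reqmsg; /* NULL when the request was never unpacked */
};

/* Arbitrary opcode value used only by this sketch. */
enum { MODEL_OPC_EXAMPLE = 1 };

/* Unguarded accessor: dereferences msg, so it must not be given NULL. */
static uint32_t model_msg_get_opc(const struct model_msg *msg)
{
    return msg->opc;
}

/* Returns 1 if the request carries the given opcode, and 0 otherwise,
 * including when rq_reqmsg is NULL. The NULL test comes first, so the
 * short-circuit && keeps the accessor from being called on NULL. */
static int model_req_has_opc(const struct model_request *req, uint32_t opc)
{
    return req->rq_reqmsg != NULL &&
           model_msg_get_opc(req->rq_reqmsg) == opc;
}
```

With this shape, a reply path that only wants to special-case one opcode treats an unpopulated request as "not that opcode" instead of faulting.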
            jhammond John Hammond added a comment -

            Hi Jeremy,

            Are you testing this with Kerberos, shared key, or other? It would be nice to add regression tests for things like this if possible.


            gerrit Gerrit Updater added a comment -

            Jeremy Filizetti (jeremy.filizetti@gmail.com) uploaded a new patch: http://review.whamcloud.com/17414
            Subject: LU-7508 ldlm: Don't check opcode with NULL rq_reqmsg
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 4fd60696fb3c3f57abed7b49655e3708c761e332

            People

              jfilizetti Jeremy Filizetti