Lustre / LU-7508

LBUG sending reply to GSS enabled client

    Description

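      Lustre crashes (LBUG/oops) on the server when sending a reply to an RPC that has a bad context or a bad signature, for example because the server's target was remounted. When GSS is enabled, rq_reqmsg is NULL in this case, so lustre_msg_get_opc should not be called on it. The oops below shows the resulting NULL pointer dereference in lustre_msg_get_opc, called from target_send_reply_msg (RDI is 0 and the faulting address in CR2 is 0x8).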

      <4>Oops: 0000 [#1] SMP
      <4>last sysfs file: /sys/devices/system/cpu/possible
      <4>CPU 2
      <4>Modules linked in: lustre(U) ofd(U) osp(U) lod(U) ost(U) mdt(U) mdd(U) mgs(U) osd_ldiskfs(U) ldiskfs(U) exportfs lquota(U) lfsck(U) jbd obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc_gss(U) sunrpc ptlrpc(U) obdclass(U) ksocklnd(U) lnet(U) sha512_generic libcfs(U) autofs4 ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 microcode sg virtio_balloon snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache sr_mod cdrom virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
      <4>
      <4>Pid: 4134, comm: mdt01_002 Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1 Red Hat KVM
      <4>RIP: 0010:[<ffffffffa0665c6e>]  [<ffffffffa0665c6e>] lustre_msg_get_opc+0xe/0x100 [ptlrpc]
      <4>RSP: 0018:ffff8800cb37fca0  EFLAGS: 00010286
      <4>RAX: 0000000000000000 RBX: ffff8800bcfd5c80 RCX: 0000000000000000
      <4>RDX: 0000000000000122 RSI: 0000000000000000 RDI: 0000000000000000
      <4>RBP: ffff8800cb37fcb0 R08: 0000000000000003 R09: 0000000000000140
      <4>R10: 0000000000000240 R11: 0000000000000400 R12: 0000000000000000
      <4>R13: ffff8800cb345ec0 R14: ffff8800cc32cc00 R15: 0000000000000122
      <4>FS:  0000000000000000(0000) GS:ffff88002c300000(0000) knlGS:0000000000000000
      <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      <4>CR2: 0000000000000008 CR3: 0000000116943000 CR4: 00000000000006e0
      <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>Process mdt01_002 (pid: 4134, threadinfo ffff8800cb37e000, task ffff8800cb37d540)
      <4>Stack:
      <4> 0000000000000028 ffff8800bcfd5c80 ffff8800cb37fce0 ffffffffa06279c8
      <4><d> ffff8800cb37fcd0 ffff8800b4bb6000 ffff8800bcfd5c80 ffff8800cb345ec0
      <4><d> ffff8800cb37fd50 ffffffffa0627f2e ffffffffa0921760 ffff8800cb345ec0
      <4>Call Trace:
      <4> [<ffffffffa06279c8>] target_send_reply_msg+0x68/0x1f0 [ptlrpc]
      <4> [<ffffffffa0627f2e>] target_send_reply+0x3de/0x710 [ptlrpc]
      <4> [<ffffffffa06723bf>] ptlrpc_server_handle_req_in+0x25f/0xd10 [ptlrpc]
      <4> [<ffffffffa0678a86>] ptlrpc_main+0x9d6/0x1910 [ptlrpc]
      <4> [<ffffffffa06780b0>] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
      <4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
      <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
      <4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
      <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
      <4>Code: 24 50 48 83 c4 78 4c 89 e0 5b 41 5c 41 5d 41 5e 41 5f c9 c3 45 31 e4 e9 13 ff ff ff 90 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 <81> 7f 08 d3 0b d0 0b 48 89 fb 74 66 c7 05 ac 21 12 00 00 01 00
      <1>RIP  [<ffffffffa0665c6e>] lustre_msg_get_opc+0xe/0x100 [ptlrpc]
      <4> RSP <ffff8800cb37fca0>
      <4>CR2: 0000000000000008
      

      I will post a patch shortly for this.
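
      Until the patch is posted, the kind of guard described above can be illustrated with a small userspace sketch. This is not the actual fix and does not use the real ptlrpc structures: `mock_msg`, `mock_request`, `mock_msg_get_opc`, `mock_req_opc_or_unknown` and `MOCK_OPC_UNKNOWN` are all hypothetical names made up for illustration. The only details taken from the log are that the faulting access is at offset 8 from a NULL pointer and that the compared constant in the code bytes is 0x0bd00bd3; the struct layout is an assumption chosen to match that.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a message header. The layout is an assumption:
 * the magic sits at offset 8, matching the faulting address (CR2 = 0x8)
 * seen in the oops when the message pointer is NULL. */
struct mock_msg {
    uint32_t bufcount;
    uint32_t secflvr;
    uint32_t magic;
    uint32_t opc;   /* simplification: the opcode lives in the header here */
};

/* Constant taken from the code bytes in the oops (d3 0b d0 0b). */
#define MOCK_MSG_MAGIC   0x0BD00BD3u
/* Hypothetical sentinel for "opcode not available". */
#define MOCK_OPC_UNKNOWN 0u

/* Hypothetical stand-in for a request; reqmsg may be NULL when
 * unwrapping failed (bad context or bad signature). */
struct mock_request {
    struct mock_msg *reqmsg;
};

/* Mirrors the unguarded accessor: dereferences msg without checking it,
 * so it must never be called with a NULL pointer. */
static uint32_t mock_msg_get_opc(const struct mock_msg *msg)
{
    if (msg->magic != MOCK_MSG_MAGIC)
        return MOCK_OPC_UNKNOWN;
    return msg->opc;
}

/* Guarded wrapper: only consult the message if it actually exists. */
static uint32_t mock_req_opc_or_unknown(const struct mock_request *req)
{
    if (req == NULL || req->reqmsg == NULL)
        return MOCK_OPC_UNKNOWN;
    return mock_msg_get_opc(req->reqmsg);
}
```

      A caller on the reply path would use the guarded wrapper (or simply skip the opcode lookup) whenever the request message may be missing, instead of calling the accessor directly.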

            People

              jfilizetti Jeremy Filizetti