  1. Lustre
  2. LU-5398

Oops in ldlm_handle_enqueue0()

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.7.0
    • Lustre 2.7.0
    • 3
    • 15024

    Description

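      In ldlm_handle_enqueue0(), an early GOTO(out, ...) may leave lock NULL. The cleanup code at the out label only checks !IS_ERR(lock), which is true for a NULL pointer, so when rc != 0 it calls lock_res_and_lock(NULL) and oopses: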

       out:
               ...
              if (!IS_ERR(lock)) {
                      LDLM_DEBUG(lock, "server-side enqueue handler, sending reply"
                                 "(err=%d, rc=%d)", err, rc);
      
                      if (rc == 0) {
                              if (req_capsule_has_field(&req->rq_pill, &RMF_DLM_LVB,
                                                        RCL_SERVER) &&
                                  ldlm_lvbo_size(lock) > 0) {
                                      void *buf;
                                      int buflen;
      
                                      buf = req_capsule_server_get(&req->rq_pill,
                                                                   &RMF_DLM_LVB);
                                      LASSERTF(buf != NULL, "req %p, lock %p\n",
                                               req, lock);
                                      buflen = req_capsule_get_size(&req->rq_pill,
                                                      &RMF_DLM_LVB, RCL_SERVER);
                                      if (buflen > 0) {
                                              buflen = ldlm_lvbo_fill(lock, buf,
                                                                      buflen);
                                              if (buflen >= 0)
                                                      req_capsule_shrink(
                                                              &req->rq_pill,
                                                              &RMF_DLM_LVB,
                                                              buflen, RCL_SERVER);
                                              else
                                                      rc = buflen;
                                      } else {
                                              rc = buflen;
                                      }
                              }
                      } else {
                              lock_res_and_lock(lock);
                              ldlm_resource_unlink_lock(lock);
                              ldlm_lock_destroy_nolock(lock);
                              unlock_res_and_lock(lock);
                      }
      
      [  139.919778] BUG: unable to handle kernel NULL pointer dereference at 0000000000000186
      [  139.919782] IP: [<ffffffffa063d05e>] lock_res_and_lock+0xe/0x60 [ptlrpc]
      [  139.919837] PGD 1f061a067 PUD 1e9293067 PMD 0
      [  139.919842] Oops: 0000 [#1] SMP
      [  139.919844] last sysfs file: /sys/devices/system/cpu/possible
      [  139.919904] CPU 1
      [  139.919906] Modules linked in: lustre(U) ofd(U) osp(U) lod(U) ost(U) mdt(U) mdd(U) mgs(U)\
       nodemap(U) osd_ldiskfs(U) ldiskfs(U) exportfs lquota(U) lfsck(U) jbd obdecho(U) mgc(U) lov(\
      U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) ksocklnd(U) lnet(U) sha512_gener\
      ic sha256_generic libcfs(U) autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 microc\
      ode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache virtio_blk virtio_pci vir\
      tio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last \
      unloaded: speedstep_lib]
      [  139.919956]
      [  139.919959] Pid: 30623, comm: ll_ost00_011 Not tainted 2.6.32-431.5.1.el6.lustre.x86_64 #\
      1 Bochs Bochs
      [  139.919964] RIP: 0010:[<ffffffffa063d05e>]  [<ffffffffa063d05e>] lock_res_and_lock+0xe/0x\
      60 [ptlrpc]
      [  139.920005] RSP: 0018:ffff8801ef06bc70  EFLAGS: 00010282
      [  139.920008] RAX: 0000000000000001 RBX: ffff8801fc914eb0 RCX: 0000000000000000
      [  139.920010] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000000
      [  139.920013] RBP: ffff8801ef06bc80 R08: 00000000fffffff9 R09: 00000000fffffffc
      [  139.920016] R10: 0000000000000001 R11: 0000000000000000 R12: ffffc9000d7cb0e0
      [  139.920019] R13: 00000000fffffff2 R14: 0000000000000000 R15: 00000000fffffff2
      [  139.920022] FS:  0000000000000000(0000) GS:ffff88002fa00000(0000) knlGS:0000000000000000
      [  139.920025] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      [  139.920028] CR2: 0000000000000186 CR3: 00000001eaac9000 CR4: 00000000000006e0
      [  139.920038] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  139.920045] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  139.920048] Process ll_ost00_011 (pid: 30623, threadinfo ffff8801ef06a000, task ffff8801e\
      c412100)
      [  139.920051] Stack:
      [  139.920053]  0000000000000004 ffff8801fc914eb0 ffff8801ef06bcf0 ffffffffa066b609
      [  139.920058] <d> ffff880219ec4db0 ffff880219ec4d98 0000000000000001 ffff880219ec4d98
      [  139.920063] <d> 0000000000000000 0000000040000000 ffffc9000d7cb000 ffff8801fc914eb0
      [  139.920074] Call Trace:
      [  139.920086]  [<ffffffffa066b609>] ldlm_handle_enqueue0+0x699/0x11e0 [ptlrpc]
      [  139.920086]  [<ffffffffa06ea4b2>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
      [  139.920086]  [<ffffffffa06e9745>] tgt_request_handle+0x245/0xad0 [ptlrpc]
      [  139.920086]  [<ffffffffa069c9f1>] ptlrpc_main+0xcf1/0x1870 [ptlrpc]
      [  139.920086]  [<ffffffffa069bd00>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
      [  139.920086]  [<ffffffff8109eab6>] kthread+0x96/0xa0
      [  139.920086]  [<ffffffff8100c30a>] child_rip+0xa/0x20
      [  139.920086]  [<ffffffff81554710>] ? _spin_unlock_irq+0x30/0x40
      [  139.920086]  [<ffffffff8100bb10>] ? restore_args+0x0/0x30
      [  139.920086]  [<ffffffff8109ea20>] ? kthread+0x0/0xa0
      [  139.920086]  [<ffffffff8100c300>] ? child_rip+0x0/0x20
      [  139.920086] Code: 83 86 01 00 00 40 75 0c 48 8d bb 88 00 00 00 e8 59 77 f1 e0 48 83 c4 08\
       5b c9 c3 66 90 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 <f6> 87 86 01 00 00 40 48 89 fb 75\
       0c 48 8d bf 88 00 00 00 e8 ea
      [  139.920086] RIP  [<ffffffffa063d05e>] lock_res_and_lock+0xe/0x60 [ptlrpc]
      [  139.920086]  RSP <ffff8801ef06bc70>
      [  139.920086] CR2: 0000000000000186
      

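      This was found via RPC packet corruption. In the trace, RDI (the first argument to lock_res_and_lock()) is 0 and the faulting address in CR2 is 0x186, which is consistent with reading a field at a small offset from a NULL lock pointer.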
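
      The check that lets NULL through can be reproduced in userspace. The sketch below is an illustration only, not the patch that resolved this ticket: ERR_PTR(), IS_ERR() and IS_ERR_OR_NULL() are re-implemented here as stand-ins modeled on the kernel's linux/err.h, and struct fake_lock, cleanup_unguarded() and cleanup_guarded() are hypothetical names that model only the shape of the cleanup path, assuming the error branch is the one that touches the lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins modeled on the kernel's linux/err.h helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline int IS_ERR(const void *ptr)
{
	/* Only the top MAX_ERRNO addresses count as errors; NULL does not. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return ptr == NULL || IS_ERR(ptr);
}

/* Hypothetical stand-in for struct ldlm_lock. */
struct fake_lock {
	int destroyed;
};

/*
 * Models the reported cleanup: returns 1 when the error branch would
 * go on to call lock_res_and_lock(lock). Nothing is dereferenced here,
 * so the NULL case can be observed safely.
 */
static int cleanup_unguarded(struct fake_lock *lock, int rc)
{
	if (!IS_ERR(lock)) {
		if (rc != 0)
			return 1;
	}
	return 0;
}

/* One possible guard (an assumption): skip cleanup for NULL as well. */
static int cleanup_guarded(struct fake_lock *lock, int rc)
{
	if (IS_ERR_OR_NULL(lock))
		return 0;

	assert(lock != NULL);	/* invariant established by the guard */
	if (rc != 0) {
		lock->destroyed = 1;
		return 1;
	}
	return 0;
}
```

      With these definitions, cleanup_unguarded(NULL, -1) reports that the lock would be touched, while cleanup_guarded(NULL, -1) skips it. A plain lock != NULL test would work equally well if lock can never hold an error pointer at that point in the function.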

            People

              Assignee: John Hammond (jhammond)
              Reporter: John Hammond (jhammond)