Lustre / LU-5337

bad free from lod_striped_it_fini()

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.6.0, Lustre 2.7.0
    • 3
    • 14892

    Description

      This was found using memory allocation fault injection.

      export MDSCOUNT=4
      llmount.sh
      
      lfs mkdir -c4 /mnt/lustre/d0
      echo /root/lustre-release/lustre/osp/osp_object.c:1149 > /proc/fs/lustre/alloc_fail # fail to alloc it in osp_it_init()
      rmdir /mnt/lustre/d0
      
      [  152.160897] ------------[ cut here ]------------
      [  152.161814] kernel BUG at mm/slab.c:2965!
      [  152.161814] invalid opcode: 0000 [#1] SMP 
      [  152.161814] last sysfs file: /sys/devices/system/cpu/possible
      [  152.161814] CPU 1 
      [  152.161814] Modules linked in: lustre(U) ofd(U) osp(U) lod(U) ost(U) mdt(U) mdd(U) mgs(U) nodemap(U) osd_ldiskfs(U) ldiskfs(U) exportfs lquota(U) lfsck(U) jbd obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) ksocklnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
      [  152.177629] 
      [  152.177629] Pid: 6163, comm: mdt00_002 Not tainted 2.6.32-431.5.1.el6.lustre.x86_64 #1 Bochs Bochs
      [  152.177629] RIP: 0010:[<ffffffff81184707>]  [<ffffffff81184707>] cache_free_debugcheck+0x1c7/0x260
      [  152.177629] RSP: 0018:ffff8801f8169960  EFLAGS: 00010002
      [  152.177629] RAX: 0000000080042800 RBX: ffff88021fd304c0 RCX: ffff8801f816c000
      [  152.177629] RDX: 0000000000000000 RSI: 0000000000080000 RDI: ffff8801f816c570
      [  152.177629] RBP: ffff8801f81699a0 R08: 00000000fffffffe R09: 0000000000000000
      [  152.177629] R10: 000000000000000f R11: 000000000000000f R12: ffff8801f816c570
      [  152.177629] R13: ffff8801f94301f0 R14: ffffffffa0d9df54 R15: ffff8801f94302a0
      [  152.177629] FS:  0000000000000000(0000) GS:ffff88002fa00000(0000) knlGS:0000000000000000
      [  152.177629] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      [  152.177629] CR2: 00000000006d4460 CR3: 0000000217dfe000 CR4: 00000000000006e0
      [  152.177629] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  152.177629] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  152.177629] Process mdt00_002 (pid: 6163, threadinfo ffff8801f8168000, task ffff8801f8166600)
      [  152.177629] Stack:
      [  152.177629]  ffff8801f8145418 ffff8801f816c570 0000000000000000 ffff88021fd304c0
      [  152.177629] <d> 0000000000000286 ffff8801f816c570 ffff88021ccdf308 ffff8801f94302a0
      [  152.177629] <d> ffff8801f81699f0 ffffffff81187a99 ffff8801f81699c0 ffff8801de8aecc0
      [  152.177629] Call Trace:
      [  152.177629]  [<ffffffff81187a99>] kfree+0xe9/0x300
      [  152.177629]  [<ffffffffa0d9df54>] osp_it_fini+0x194/0x1e0 [osp]
      [  152.177629]  [<ffffffffa0d5a044>] lod_striped_it_fini+0xb4/0x200 [lod]
      [  152.177629]  [<ffffffffa0805143>] mdd_may_delete+0x543/0x990 [mdd]
      [  152.177629]  [<ffffffffa08055d5>] mdd_unlink_sanity_check+0x45/0x100 [mdd]
      [  152.177629]  [<ffffffffa080c563>] mdd_unlink+0x233/0xcc0 [mdd]
      [  152.177629]  [<ffffffffa0cb985a>] ? mdt_reint_unlink+0x9ca/0x10b0 [mdt]
      [  152.177629]  [<ffffffffa0cb0a08>] mdo_unlink+0x18/0x50 [mdt]
      [  152.177629]  [<ffffffffa0cb9894>] mdt_reint_unlink+0xa04/0x10b0 [mdt]
      [  152.177629]  [<ffffffffa0cb07a1>] mdt_reint_rec+0x41/0xe0 [mdt]
      [  152.177629]  [<ffffffffa0c9baf3>] mdt_reint_internal+0x4c3/0x7c0 [mdt]
      [  152.177629]  [<ffffffffa0c9c37b>] mdt_reint+0x6b/0x120 [mdt]
      [  152.177629]  [<ffffffffa06ebc35>] tgt_request_handle+0x245/0xad0 [ptlrpc]
      [  152.177629]  [<ffffffffa069ed91>] ptlrpc_main+0xcf1/0x1880 [ptlrpc]
      [  152.177629]  [<ffffffff81554710>] ? _spin_unlock_irq+0x30/0x40
      [  152.177629]  [<ffffffff8105e57d>] ? finish_task_switch+0x7d/0x110
      [  152.177629]  [<ffffffff8105e548>] ? finish_task_switch+0x48/0x110
      [  152.177629]  [<ffffffff815514a5>] ? thread_return+0x4e/0x7d9
      [  152.177629]  [<ffffffffa069e0a0>] ? ptlrpc_main+0x0/0x1880 [ptlrpc]
      [  152.177629]  [<ffffffff8109eab6>] kthread+0x96/0xa0
      [  152.177629]  [<ffffffff8100c30a>] child_rip+0xa/0x20
      [  152.177629]  [<ffffffff81554710>] ? _spin_unlock_irq+0x30/0x40
      [  152.177629]  [<ffffffff8100bb10>] ? restore_args+0x0/0x30
      [  152.177629]  [<ffffffff8109ea20>] ? kthread+0x0/0xa0
      [  152.177629]  [<ffffffff8100c300>] ? child_rip+0x0/0x20
      [  152.177629] Code: ff a5 48 83 c4 18 4c 89 e0 5b 41 5c 41 5d 41 5e 41 5f c9 c3 8b 83 0c 80 00 00 4d 89 74 04 f8 8b 83 14 80 00 00 eb 80 0f 0b eb fe <0f> 0b eb fe 0f 0b 0f 1f 00 eb fb 48 8b 40 10 48 8b 10 e9 d0 fe 
      [  152.177629] RIP  [<ffffffff81184707>] cache_free_debugcheck+0x1c7/0x260
      [  152.177629]  RSP <ffff8801f8169960>
      

      The problem is at the bottom of lod_striped_it_next(). The old lower-layer iterator is released with dio_it.put() and dio_it.fini(), but if do_index_try() or dio_it.init() then fails, it->lit_it still points to that old (now invalid) iterator. After next() fails, mdd_dir_is_empty() calls lod_striped_it_fini(), which in turn calls osp_it_fini() on the already invalid osd_it_ea iterator.

              next->do_index_ops->dio_it.put(env, it->lit_it);
              next->do_index_ops->dio_it.fini(env, it->lit_it);
      
              rc = next->do_ops->do_index_try(env, next, &dt_directory_features);
              if (rc != 0)
                      RETURN(rc);
      
              next = lo->ldo_stripe[it->lit_stripe_index];
              LASSERT(next != NULL);
              LASSERT(next->do_index_ops != NULL);
      
              it_next = next->do_index_ops->dio_it.init(env, next, it->lit_attr,
                                                        BYPASS_CAPA);
              if (!IS_ERR(it_next)) {
                      it->lit_it = it_next;
                      goto again;
              } else {
                      rc = PTR_ERR(it_next);
              }
      
              RETURN(rc);
      }
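
      One possible way to avoid the stale pointer is sketched below as a standalone userspace model, not as a Lustre patch (this ticket was resolved as a duplicate, so the actual fix is not shown here). The idea is to clear the pointer as soon as the old iterator is released and to have fini skip a NULL iterator. All names in the sketch (struct striped_it, struct lower_it, lower_it_init(), fail_init_at, fini_calls, and so on) are hypothetical stand-ins for the lod/osp/osd structures, and fail_init_at only imitates the allocation fault injection used in the reproducer.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a lower-layer iterator (osd_it_ea, osp_it, ...). */
struct lower_it {
	int stripe;
};

/* Hypothetical, heavily simplified stand-in for the striped (lod) iterator. */
struct striped_it {
	struct lower_it *lit_it;  /* current lower-layer iterator, or NULL */
	int lit_stripe_index;     /* stripe currently being iterated */
	int stripe_count;         /* number of stripes in the directory */
	int fail_init_at;         /* stripe whose init fails (-1: never) */
	int fini_calls;           /* how many lower-layer iterators were freed */
};

/* Allocate a lower-layer iterator; NULL models an allocation failure. */
static struct lower_it *lower_it_init(struct striped_it *it, int stripe)
{
	struct lower_it *li;

	if (stripe == it->fail_init_at)
		return NULL;
	li = malloc(sizeof(*li));
	if (li != NULL)
		li->stripe = stripe;
	return li;
}

static void lower_it_fini(struct striped_it *it, struct lower_it *li)
{
	it->fini_calls++;
	free(li);
}

/* Set up iteration on stripe 0. Returns 0 or -ENOMEM. */
static int striped_it_init(struct striped_it *it, int stripe_count,
			   int fail_init_at)
{
	it->lit_stripe_index = 0;
	it->stripe_count = stripe_count;
	it->fail_init_at = fail_init_at;
	it->fini_calls = 0;
	it->lit_it = lower_it_init(it, 0);
	return it->lit_it != NULL ? 0 : -ENOMEM;
}

/* Advance to the next stripe: 0 on success, 1 at the end, -ENOMEM on error. */
static int striped_it_next_stripe(struct striped_it *it)
{
	struct lower_it *it_next;

	if (it->lit_stripe_index + 1 >= it->stripe_count)
		return 1;

	assert(it->lit_it != NULL);

	/* Release the old iterator and forget it immediately, so that an
	 * error return below cannot leave a dangling pointer behind. */
	lower_it_fini(it, it->lit_it);
	it->lit_it = NULL;
	it->lit_stripe_index++;

	it_next = lower_it_init(it, it->lit_stripe_index);
	if (it_next == NULL)
		return -ENOMEM;

	it->lit_it = it_next;
	return 0;
}

/* Safe to call after a failed striped_it_next_stripe(). */
static void striped_it_fini(struct striped_it *it)
{
	if (it->lit_it != NULL) {
		lower_it_fini(it, it->lit_it);
		it->lit_it = NULL;
	}
}
```

      In this model a caller that sees striped_it_next_stripe() fail can still call striped_it_fini() unconditionally, which is the pattern the stack trace above shows going wrong. Whether the real code should clear it->lit_it, tolerate NULL in fini, or both is a design choice the sketch does not settle.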
      


            People

              wc-triage WC Triage
              jhammond John Hammond