Lustre / LU-11015

Do not free lov kobject twice


Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.12.0
    • None
    • None
    • 3

    Description

      Looks like commit 3c900918 / https://review.whamcloud.com/30960 introduced a problem where we can put lov_tgts_kobj twice while actually holding only one reference.

      The problem is in lov_putref:

              mutex_lock(&lov->lov_lock);
              /* ok to dec to 0 more than once -- ltd_exp's will be null */
              if (atomic_dec_and_test(&lov->lov_refcount) && lov->lov_death_row) {
                      struct list_head kill = LIST_HEAD_INIT(kill);
      ...
                      if (lov->lov_tgts_kobj)
                              kobject_put(lov->lov_tgts_kobj);
      
              } else {
      

      So when the dec-to-0 condition above is hit more than once, we do the put twice (since we don't zero out the lov_tgts_kobj pointer).

      It seems we just need to set lov->lov_tgts_kobj to NULL right after the kobject_put() for things to work correctly.
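      To illustrate the sequence, here is a minimal user-space sketch of the double put and of the proposed NULL-ing fix. It is a model, not Lustre code: all model_* names are hypothetical, the lov_lock mutex is omitted (single-threaded), and model_kobj_put() counts over-puts instead of freeing memory. A getref/putref pair run twice drives the refcount to 0 twice, so the cleanup branch runs twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kobject: just a reference count. */
struct model_kobj {
	int refs;
};

/*
 * Hypothetical stand-in for kobject_put(). Returns false for an over-put
 * (count already 0), the case that trips the kobject_put() WARNING and the
 * later use of freed memory seen in the trace.
 */
static bool model_kobj_put(struct model_kobj *k)
{
	if (k->refs <= 0)
		return false;
	k->refs--;
	return true;
}

/* Hypothetical, heavily reduced model of the lov obd fields involved. */
struct model_lov {
	int lov_refcount;
	int lov_death_row;
	struct model_kobj *lov_tgts_kobj;
	int bad_puts;			/* over-puts observed */
};

static void model_lov_getref(struct model_lov *lov)
{
	lov->lov_refcount++;
}

/*
 * Same shape as the lov_putref() excerpt: the cleanup branch runs every time
 * the refcount drops to 0 while death_row is set, possibly more than once.
 * clear_ptr selects the proposed fix (NULL the pointer after the put).
 */
static void model_lov_putref(struct model_lov *lov, bool clear_ptr)
{
	assert(lov->lov_refcount > 0);	/* putref needs a matching getref */
	if (--lov->lov_refcount == 0 && lov->lov_death_row) {
		if (lov->lov_tgts_kobj) {
			if (!model_kobj_put(lov->lov_tgts_kobj))
				lov->bad_puts++;
			if (clear_ptr)
				lov->lov_tgts_kobj = NULL;
		}
	}
}
```

      Calling model_lov_getref()/model_lov_putref(lov, false) twice on a lov holding one kobj reference records one over-put; with clear_ptr set to true, the second pass sees a NULL pointer and skips the put.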

      The crash looks something like this:

      [50173.869510] Lustre: DEBUG MARKER: == conf-sanity test 50g: deactivated OST should not cause panic ====================================== 11:18:14 (1525879094)
      [50199.609971] Lustre: DEBUG MARKER: centos-24.localnet: executing wait_import_state FULL osc.lustre-OST0001-osc-MDT0000.ost_server_uuid 40
      [50206.043193] Lustre: DEBUG MARKER: osc.lustre-OST0001-osc-MDT0000.ost_server_uuid in FULL state after 6 sec
      [50206.755766] Lustre: DEBUG MARKER: centos-24.localnet: executing wait_import_state FULL osc.lustre-OST0001-osc-ffff8802ca6c0800.ost_server_uuid 40
      [50206.930583] Lustre: DEBUG MARKER: osc.lustre-OST0001-osc-ffff8802ca6c0800.ost_server_uuid in FULL state after 0 sec
      [50207.193737] Lustre: lfs: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x200000bd0:0x1:0x0], use llapi_layout_get_by_path()
      [50207.245506] Lustre: Permanently deactivating lustre-OST0001
      [50209.744659] LustreError: 21378:0:(sec.c:407:import_sec_validate_get()) import ffff8802a452c000 (NEW) with no sec
      [50210.979738] LustreError: 21396:0:(lov_obd.c:826:lov_cleanup()) lustre-clilov-ffff8802a694c800: lov tgt 1 not cleaned! deathrow=0, lovrc=1
      [50211.000820] ------------[ cut here ]------------
      [50211.001549] WARNING: at lib/kobject.c:600 kobject_put+0x50/0x60()
      [50211.002347] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
      [50211.003054] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate jbd2 syscopyarea sysfillrect sysimgblt ttm ata_generic drm_kms_helper pata_acpi drm ata_piix i2c_piix4 virtio_console pcspkr serio_raw virtio_balloon virtio_blk libata i2c_core floppy nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
      [50211.009320] CPU: 3 PID: 21396 Comm: umount Tainted: P        W  OE  ------------   3.10.0-debug #2
      [50211.010550] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [50211.011246] task: ffff8802994c0a00 ti: ffff8802d93cc000 task.ti: ffff8802d93cc000
      [50211.012482] RIP: 0010:[<ffffffff8137faed>]  [<ffffffff8137faed>] strnlen+0xd/0x40
      [50211.013738] RSP: 0018:ffff8802d93cf830  EFLAGS: 00010086
      [50211.014402] RAX: ffffffff81a6dea9 RBX: ffffffff820d58ec RCX: fffffffffffffffe
      [50211.015091] RDX: 3a65727473756c2b RSI: ffffffffffffffff RDI: 3a65727473756c2b
      [50211.015798] RBP: ffff8802d93cf830 R08: 000000000000ffff R09: 000000000000ffff
      [50211.016593] R10: 00000000000000a0 R11: 0000000000000050 R12: 3a65727473756c2b
      [50211.017446] R13: ffffffff820d5cc0 R14: 00000000ffffffff R15: 0000000000000000
      [50211.018203] FS:  00007f593f158880(0000) GS:ffff88033e460000(0000) knlGS:0000000000000000
      [50211.019452] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [50211.020105] CR2: 000000000218f000 CR3: 000000029ed76000 CR4: 00000000000006e0
      [50211.020817] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [50211.023442] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [50211.024073] Stack:
      [50211.024501]  ffff8802d93cf868 ffffffff81381fbb ffffffff820d58ec ffffffff820d5cc0
      [50211.025405]  ffff8802d93cf9c0 ffffffff81aa0bbe ffffffff81aa0bbe ffff8802d93cf8d8
      [50211.026296]  ffffffff81383751 0000000000000246 ffffffffffffff10 000000000000000c
      [50211.027199] Call Trace:
      [50211.027628]  [<ffffffff81381fbb>] string.isra.7+0x3b/0xf0
      [50211.028088]  [<ffffffff81383751>] vsnprintf+0x201/0x6a0
      [50211.028551]  [<ffffffff81383bfd>] vscnprintf+0xd/0x30
      [50211.029017]  [<ffffffff81078766>] vprintk_emit+0x116/0x5a0
      [50211.029496]  [<ffffffff81378f20>] ? kobject_put+0x50/0x60
      [50211.029954]  [<ffffffff81378f20>] ? kobject_put+0x50/0x60
      [50211.030426]  [<ffffffff81078c0f>] vprintk+0x1f/0x30
      [50211.030877]  [<ffffffff8107623c>] warn_slowpath_common+0x5c/0xb0
      [50211.031356]  [<ffffffff810762ec>] warn_slowpath_fmt+0x5c/0x80
      [50211.031837]  [<ffffffff81378f20>] kobject_put+0x50/0x60
      [50211.032367]  [<ffffffffa08b93fe>] lov_putref+0x92e/0xac0 [lov]
      [50211.032855]  [<ffffffffa08c08ff>] lov_cleanup+0x27f/0x610 [lov]
      [50211.033416]  [<ffffffffa036d04a>] class_free_dev+0x1ea/0x650 [obdclass]
      [50211.033913]  [<ffffffffa036d690>] class_export_put+0x1e0/0x2b0 [obdclass]
      [50211.034426]  [<ffffffffa036f1e5>] class_unlink_export+0x135/0x170 [obdclass]
      [50211.034931]  [<ffffffffa0384600>] class_decref+0x80/0x160 [obdclass]
      [50211.035566]  [<ffffffffa0384a51>] class_detach+0x1b1/0x2e0 [obdclass]
      [50211.036429]  [<ffffffffa038b47d>] class_process_config+0x196d/0x27e0 [obdclass]
      [50211.037950]  [<ffffffffa01e36a7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [50211.038785]  [<ffffffffa038c4d0>] class_manual_cleanup+0x1e0/0x6d0 [obdclass]
      [50211.039780]  [<ffffffffa14bdef8>] ll_put_super+0x128/0x890 [lustre]
      [50211.040571]  [<ffffffff8112b4cd>] ? call_rcu_sched+0x1d/0x20
      [50211.041270]  [<ffffffffa14e69ec>] ? ll_destroy_inode+0x1c/0x20 [lustre]
      [50211.041971]  [<ffffffff8120a3a8>] ? destroy_inode+0x38/0x60
      [50211.042650]  [<ffffffff8120a4d6>] ? evict+0x106/0x170
      [50211.043348]  [<ffffffff8120a57e>] ? dispose_list+0x3e/0x50
      [50211.044088]  [<ffffffff8120b224>] ? evict_inodes+0x114/0x140
      [50211.044712]  [<ffffffff811efa46>] generic_shutdown_super+0x56/0xe0
      [50211.045205]  [<ffffffff811efe22>] kill_anon_super+0x12/0x20
      [50211.045691]  [<ffffffffa038ee25>] lustre_kill_super+0x45/0x50 [obdclass]
      [50211.046397]  [<ffffffff811f0329>] deactivate_locked_super+0x49/0x60
      [50211.047227]  [<ffffffff811f0926>] deactivate_super+0x46/0x60
      [50211.048048]  [<ffffffff8120f115>] mntput_no_expire+0xc5/0x120
      [50211.048834]  [<ffffffff8121029f>] SyS_umount+0x9f/0x3c0
      [50211.049764]  [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
      

      And we can clearly see lov_putref entering that path twice in the debug log.


          People

            green Oleg Drokin
