Lustre / LU-9791

kobject_put() crashes the kernel when unmounting the client


Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.11.0
    • Lustre 2.11.0
    • 3

    Description

      I was testing the latest master branch (b2c8846, LU-6210 utils: Use C99 struct initializer for long_opt_start). All the Lustre servers and the client run on the same host. When I unmounted the client, kobject_put() crashed the kernel.

       

      [  118.118013] ------------[ cut here ]------------
      [  118.118025] WARNING: at lib/kobject.c:612 kobject_put+0x50/0x60()
      [  118.118028] kobject: '(null)' (ffff88001240eec0): is not initialized, yet kobject_put() is being called.
      [  118.118030] Modules linked in: osc(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dm_mod ppdev sg pcspkr virtio_balloon i2c_piix4 i2c_core parport_pc parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi virtio_scsi virtio_net virtio_blk ata_piix serio_raw virtio_pci virtio_ring virtio libata floppy
      [  118.118140] CPU: 1 PID: 9487 Comm: umount Tainted: G           OE  ------------   3.10.0-514.26.2.el7_lustre.2.10.50_69_g8793c5b.x86_64 #1
      [  118.118144] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [  118.118151]  ffff88001203f9e0 00000000ac9255be ffff88001203f998 ffffffff81687383
      [  118.118156]  ffff88001203f9d0 ffffffff81085cb0 ffff88001240eec0 ffff8800123a1000
      [  118.118160]  ffff88001240e410 ffff880012219300 ffff88001212d800 ffff88001203fa38
      [  118.118165] Call Trace:
      [  118.118174]  [<ffffffff81687383>] dump_stack+0x19/0x1b
      [  118.118180]  [<ffffffff81085cb0>] warn_slowpath_common+0x70/0xb0
      [  118.118184]  [<ffffffff81085d4c>] warn_slowpath_fmt+0x5c/0x80
      [  118.118188]  [<ffffffff8131aee0>] kobject_put+0x50/0x60
      [  118.118239]  [<ffffffffa04d1596>] lprocfs_obd_cleanup+0x56/0x70 [obdclass]
      [  118.118252]  [<ffffffffa0f9dcc7>] osc_precleanup+0xe7/0x2c0 [osc]
      [  118.118295]  [<ffffffffa04e4f91>] class_cleanup+0x2a1/0xcf0 [obdclass]
      [  118.118334]  [<ffffffffa04e79e2>] class_process_config+0x1992/0x23f0 [obdclass]
      [  118.118352]  [<ffffffffa0dfe9c5>] ? lov_putref+0x2f5/0xa80 [lov]
      [  118.118370]  [<ffffffffa03a2b97>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [  118.118408]  [<ffffffffa04e8606>] class_manual_cleanup+0x1c6/0x710 [obdclass]
      [  118.118421]  [<ffffffffa0dfe9d2>] lov_putref+0x302/0xa80 [lov]
      [  118.118434]  [<ffffffffa0e05d92>] lov_disconnect+0x172/0x420 [lov]
      [  118.118461]  [<ffffffffa0ecc853>] obd_disconnect+0xb3/0x330 [lustre]
      [  118.118483]  [<ffffffffa0ecfc90>] ll_put_super+0x610/0xaa0 [lustre]
      [  118.118490]  [<ffffffff81138fcd>] ? call_rcu_sched+0x1d/0x20
      [  118.118531]  [<ffffffffa0efa30c>] ? ll_destroy_inode+0x1c/0x20 [lustre]
      [  118.118538]  [<ffffffff8121a8f8>] ? destroy_inode+0x38/0x60
      [  118.118542]  [<ffffffff8121aa26>] ? evict+0x106/0x170
      [  118.118546]  [<ffffffff8121aace>] ? dispose_list+0x3e/0x50
      [  118.118550]  [<ffffffff8121b724>] ? evict_inodes+0x114/0x140
      [  118.118557]  [<ffffffff81200f72>] generic_shutdown_super+0x72/0xf0
      [  118.118562]  [<ffffffff81201342>] kill_anon_super+0x12/0x20
      [  118.118602]  [<ffffffffa04eaf15>] lustre_kill_super+0x45/0x50 [obdclass]
      [  118.118607]  [<ffffffff812016f9>] deactivate_locked_super+0x49/0x60
      [  118.118611]  [<ffffffff81201cf6>] deactivate_super+0x46/0x60
      [  118.118616]  [<ffffffff8121f145>] mntput_no_expire+0xc5/0x120
      [  118.118622]  [<ffffffff81220280>] SyS_umount+0xa0/0x3b0
      [  118.118627]  [<ffffffff81697a49>] system_call_fastpath+0x16/0x1b
      [  118.118630] ---[ end trace 8308964b9c22e228 ]---
      [  118.118641] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [  118.128816] IP: [<ffffffff81333c5b>] __list_add+0x1b/0xc0
      [  118.135215] PGD 0
      [  118.137535] Oops: 0000 [#1] SMP
      [  118.141290] Modules linked in: osc(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dm_mod ppdev sg pcspkr virtio_balloon i2c_piix4 i2c_core parport_pc parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi virtio_scsi virtio_net virtio_blk ata_piix serio_raw virtio_pci virtio_ring virtio libata floppy
      [  118.191875] CPU: 1 PID: 9487 Comm: umount Tainted: G        W  OE  ------------   3.10.0-514.26.2.el7_lustre.2.10.50_69_g8793c5b.x86_64 #1
      [  118.201778] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [  118.206430] task: ffff88003a2bbec0 ti: ffff88001203c000 task.ti: ffff88001203c000
      [  118.212428] RIP: 0010:[<ffffffff81333c5b>]  [<ffffffff81333c5b>] __list_add+0x1b/0xc0
      [  118.218697] RSP: 0018:ffff88001203f9d8  EFLAGS: 00010046
      [  118.222891] RAX: ffff88001203fa00 RBX: ffff88001203fa18 RCX: ffff88001203ffd8
      [  118.228673] RDX: ffff88001240ef10 RSI: 0000000000000000 RDI: ffff88001203fa18
      [  118.234425] RBP: ffff88001203f9f0 R08: 0000000000000000 R09: 0000000000000259
      [  118.240045] R10: 0000000000000000 R11: ffff88001203f696 R12: ffff88001240ef10
      [  118.245753] R13: 0000000000000000 R14: ffff88003a2bbec0 R15: ffff88001212d800
      [  118.251453] FS:  00007f592e03f880(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
      [  118.257952] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  118.262534] CR2: 0000000000000000 CR3: 0000000039b90000 CR4: 00000000000006e0
      [  118.268281] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  118.274125] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  118.279871] Stack:
      [  118.281497]  ffff88001240ef00 ffff88001240ef08 7fffffffffffffff ffff88001203fa50
      [  118.287490]  ffffffff8168cdfb 0000000000000001 ffff88003a2bbec0 ffffffff810c54e0
      [  118.293586]  0000000000000000 0000000000000000 00000000ac9255be ffff88001240df40
      [  118.299707] Call Trace:
      [  118.301684]  [<ffffffff8168cdfb>] wait_for_completion+0xeb/0x170
      [  118.306465]  [<ffffffff810c54e0>] ? wake_up_state+0x20/0x20
      [  118.311048]  [<ffffffffa04d15a2>] lprocfs_obd_cleanup+0x62/0x70 [obdclass]
      [  118.316568]  [<ffffffffa0f9dcc7>] osc_precleanup+0xe7/0x2c0 [osc]
      [  118.321477]  [<ffffffffa04e4f91>] class_cleanup+0x2a1/0xcf0 [obdclass]
      [  118.326709]  [<ffffffffa04e79e2>] class_process_config+0x1992/0x23f0 [obdclass]
      [  118.332633]  [<ffffffffa0dfe9c5>] ? lov_putref+0x2f5/0xa80 [lov]
      [  118.337354]  [<ffffffffa03a2b97>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [  118.342771]  [<ffffffffa04e8606>] class_manual_cleanup+0x1c6/0x710 [obdclass]
      [  118.348450]  [<ffffffffa0dfe9d2>] lov_putref+0x302/0xa80 [lov]
      [  118.353088]  [<ffffffffa0e05d92>] lov_disconnect+0x172/0x420 [lov]
      [  118.357984]  [<ffffffffa0ecc853>] obd_disconnect+0xb3/0x330 [lustre]
      [  118.363140]  [<ffffffffa0ecfc90>] ll_put_super+0x610/0xaa0 [lustre]
      [  118.368130]  [<ffffffff81138fcd>] ? call_rcu_sched+0x1d/0x20
      [  118.372727]  [<ffffffffa0efa30c>] ? ll_destroy_inode+0x1c/0x20 [lustre]
      [  118.377968]  [<ffffffff8121a8f8>] ? destroy_inode+0x38/0x60
      [  118.382433]  [<ffffffff8121aa26>] ? evict+0x106/0x170
      [  118.386409]  [<ffffffff8121aace>] ? dispose_list+0x3e/0x50
      [  118.390857]  [<ffffffff8121b724>] ? evict_inodes+0x114/0x140
      [  118.395288]  [<ffffffff81200f72>] generic_shutdown_super+0x72/0xf0
      [  118.400239]  [<ffffffff81201342>] kill_anon_super+0x12/0x20
      [  118.404689]  [<ffffffffa04eaf15>] lustre_kill_super+0x45/0x50 [obdclass]
      [  118.409950]  [<ffffffff812016f9>] deactivate_locked_super+0x49/0x60
      [  118.414997]  [<ffffffff81201cf6>] deactivate_super+0x46/0x60
      [  118.419472]  [<ffffffff8121f145>] mntput_no_expire+0xc5/0x120
      [  118.424018]  [<ffffffff81220280>] SyS_umount+0xa0/0x3b0
      [  118.428227]  [<ffffffff81697a49>] system_call_fastpath+0x16/0x1b
      [  118.433054] Code: ff e9 3b ff ff ff b8 f4 ff ff ff e9 31 ff ff ff 55 48 89 e5 41 55 49 89 f5 41 54 49 89 d4 53 4c 8b 42 08 48 89 fb 49 39 f0 75 2a <4d> 8b 45 00 4d 39 c4 75 68 4c 39 e3 74 3e 4c 39 eb 74 39 49 89
      [  118.450312] RIP  [<ffffffff81333c5b>] __list_add+0x1b/0xc0
      [  118.454795]  RSP <ffff88001203f9d8>
      [  118.457571] CR2: 0000000000000000

      If the client runs on a separate host, everything works fine.
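
Reading the two traces together: lprocfs_obd_cleanup() first reaches kobject_put() on a kobject that was never initialized (the WARNING), and then calls wait_for_completion(), which faults in __list_add with a NULL list pointer. That would be consistent with the completion never having been initialized either. The user-space sketch below only illustrates the general kind of guard that avoids this pattern. It is not the Lustre fix and not kernel code: `demo_obj`, `demo_obj_init()`, and `demo_obj_cleanup()` are hypothetical names, and only the `state_initialized` flag mirrors a real `struct kobject` field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an object that embeds a kobject and a
 * completion, as the trace suggests the cleaned-up device does. */
struct demo_obj {
	bool state_initialized;	/* mirrors kobject.state_initialized */
	int  refcount;		/* simplified reference count */
	bool released;		/* set when the last reference is dropped */
};

/* Setup path. In the failing scenario this step apparently never ran. */
static void demo_obj_init(struct demo_obj *o)
{
	assert(o != NULL);
	o->state_initialized = true;
	o->refcount = 1;
	o->released = false;
}

/* Cleanup path: only drop the reference (and, in a kernel, wait for the
 * release callback) when the object was actually set up.
 * Returns true if a put was performed, false if cleanup was skipped. */
static bool demo_obj_cleanup(struct demo_obj *o)
{
	assert(o != NULL);
	if (!o->state_initialized)
		return false;	/* nothing to put, nothing to wait for */
	if (--o->refcount == 0)
		o->released = true;
	return true;
}
```

With this guard, a zero-filled `struct demo_obj` passed to `demo_obj_cleanup()` is skipped, while one that went through `demo_obj_init()` is released normally.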

            People

              jhammond John Hammond
              lixi Li Xi (Inactive)