  1. Lustre
  2. LU-2092

cpu lockup on lustre umount

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.4.0
    • Lustre 2.4.0
    • None
    • 3
    • 4370

    Description

      After the recent landings of quota code, the first unmount by sanity.sh on my test node (8 cores, 10G RAM) locks up:

      [23052.136007] BUG: soft lockup - CPU#5 stuck for 67s! [umount:19443]
      [23052.136988] Modules linked in: lustre obdfilter osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mds mgs lquota obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs ext2 exportfs jbd sha512_generic sha256_generic sunrpc ipv6 microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
      [23052.137256] CPU 5
      [23052.137256] Modules linked in: lustre obdfilter osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mds mgs lquota obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs ext2 exportfs jbd sha512_generic sha256_generic sunrpc ipv6 microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
      [23052.137256]
      [23052.137256] Pid: 19443, comm: umount Tainted: G        W  ---------------    2.6.32-debug #5 Bochs Bochs
      [23052.137256] RIP: 0010:[<ffffffffa078c163>]  [<ffffffffa078c163>] lprocfs_remove_nolock+0x23/0x140 [obdclass]
      [23052.137256] RSP: 0018:ffff880158fe18c8  EFLAGS: 00010286
      [23052.137256] RAX: ffffffffa0814020 RBX: ffff880158fe18f8 RCX: 0000000000000000
      [23052.137256] RDX: 0000000000000000 RSI: 0000000000000030 RDI: ffff8802187f9988
      [23052.137256] RBP: ffffffff8100bc0e R08: ffff880158fe1908 R09: 0000000000000059
      [23052.137256] R10: 0000000000000001 R11: 0000000087654321 R12: ffff880158fe19d8
      [23052.137256] R13: ffff8802187f98a8 R14: ffff88027a589df0 R15: ffff88027a589ee8
      [23052.137256] FS:  00007fca35fa7740(0000) GS:ffff880028340000(0000) knlGS:0000000000000000
      [23052.137256] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [23052.137256] CR2: ffff8802187faf78 CR3: 0000000158247000 CR4: 00000000000006e0
      [23052.137256] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [23052.137256] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [23052.137256] Process umount (pid: 19443, threadinfo ffff880158fe0000, task ffff8801b4d4c2c0)
      [23052.137256] Stack:
      [23052.137256]  ffff880100000010 0000000000000246 ffff8802187f9988 ffff880158fe19d8
      [23052.137256] <d> ffff8802187f98a8 ffff88027a589df0 ffff880158fe1918 ffffffffa078c3d5
      [23052.137256] <d> ffff880158fe1908 ffff8802187f9930 ffff880158fe1948 ffffffffa046dede
      [23052.137256] Call Trace:
      [23052.137256]  [<ffffffffa078c3d5>] ? lprocfs_remove+0x25/0x40 [obdclass]
      [23052.137256]  [<ffffffffa046dede>] ? qmt_pool_free+0x3e/0x240 [lquota]
      [23052.137256]  [<ffffffffa046e2fc>] ? qmt_pool_fini+0xbc/0x240 [lquota]
      [23052.137256]  [<ffffffffa0468b00>] ? qmt_device_fini+0x140/0x500 [lquota]
      [23052.137256]  [<ffffffffa07ad3c7>] ? class_cleanup+0x577/0xdc0 [obdclass]
      [23052.137256]  [<ffffffffa07825dc>] ? class_name2dev+0x7c/0xf0 [obdclass]
      [23052.137256]  [<ffffffffa07aecb5>] ? class_process_config+0x10a5/0x1ca0 [obdclass]
      [23052.137256]  [<ffffffffa04bc118>] ? libcfs_log_return+0x28/0x40 [libcfs]
      [23052.137256]  [<ffffffffa07a8671>] ? lustre_cfg_new+0x391/0x7e0 [obdclass]
      [23052.137256]  [<ffffffffa07afa29>] ? class_manual_cleanup+0x179/0x6e0 [obdclass]
      [23052.137256]  [<ffffffffa04bc118>] ? libcfs_log_return+0x28/0x40 [libcfs]
      [23052.137256]  [<ffffffffa046833e>] ? qmt_device_obd_disconnect+0xee/0x130 [lquota]
      [23052.137256]  [<ffffffffa0a3f4fe>] ? mdt_quota_fini+0xee/0x410 [mdt]
      [23052.137256]  [<ffffffffa0a40c54>] ? mdt_device_fini+0x74/0x500 [mdt]
      [23052.137256]  [<ffffffffa07ad3c7>] ? class_cleanup+0x577/0xdc0 [obdclass]
      [23052.137256]  [<ffffffffa07825dc>] ? class_name2dev+0x7c/0xf0 [obdclass]
      [23052.137256]  [<ffffffffa07aecb5>] ? class_process_config+0x10a5/0x1ca0 [obdclass]
      [23052.137256]  [<ffffffffa04bc118>] ? libcfs_log_return+0x28/0x40 [libcfs]
      [23052.137256]  [<ffffffffa07a8671>] ? lustre_cfg_new+0x391/0x7e0 [obdclass]
      [23052.137256]  [<ffffffffa07afa29>] ? class_manual_cleanup+0x179/0x6e0 [obdclass]
      [23052.137256]  [<ffffffffa07825dc>] ? class_name2dev+0x7c/0xf0 [obdclass]
      [23052.137256]  [<ffffffffa07bbf9c>] ? server_put_super+0x5ec/0x1230 [obdclass]
      [23052.137256]  [<ffffffff8117fabb>] ? generic_shutdown_super+0x5b/0xe0
      [23052.137256]  [<ffffffff8117fba6>] ? kill_anon_super+0x16/0x60
      [23052.137256]  [<ffffffffa07b1846>] ? lustre_kill_super+0x36/0x60 [obdclass]
      [23052.137256]  [<ffffffff81180c35>] ? deactivate_super+0x85/0xa0
      [23052.137256]  [<ffffffff8119ccaf>] ? mntput_no_expire+0xbf/0x110
      [23052.137256]  [<ffffffff8119d75b>] ? sys_umount+0x7b/0x3a0
      [23052.137256]  [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
      [23052.137256] Code: 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 56 41 55 41 54 53 48 83 ec 10 0f 1f 44 00 00 48 8b 1f 48 85 db 74 60 48 c7 07 00 00 00 00 <4c> 8b 73 48 4d 85 f6 75 0f e9 c4 00 00 00 0f 1f 80 00 00 00 00
      [23052.137256] Call Trace:
      [23052.137256]  [<ffffffffa04c16d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      [23052.137256]  [<ffffffffa078c3d5>] ? lprocfs_remove+0x25/0x40 [obdclass]
      [23052.137256]  [<ffffffffa046dede>] ? qmt_pool_free+0x3e/0x240 [lquota]
      [23052.137256]  [<ffffffffa046e2fc>] ? qmt_pool_fini+0xbc/0x240 [lquota]
      [23052.137256]  [<ffffffffa0468b00>] ? qmt_device_fini+0x140/0x500 [lquota]
      [23052.137256]  [<ffffffffa07ad3c7>] ? class_cleanup+0x577/0xdc0 [obdclass]
      [23052.137256]  [<ffffffffa07825dc>] ? class_name2dev+0x7c/0xf0 [obdclass]
      [23052.137256]  [<ffffffffa07aecb5>] ? class_process_config+0x10a5/0x1ca0 [obdclass]
      [23052.137256]  [<ffffffffa04bc118>] ? libcfs_log_return+0x28/0x40 [libcfs]
      [23052.137256]  [<ffffffffa07a8671>] ? lustre_cfg_new+0x391/0x7e0 [obdclass]
      [23052.137256]  [<ffffffffa07afa29>] ? class_manual_cleanup+0x179/0x6e0 [obdclass]
      [23052.137256]  [<ffffffffa04bc118>] ? libcfs_log_return+0x28/0x40 [libcfs]
      [23052.137256]  [<ffffffffa046833e>] ? qmt_device_obd_disconnect+0xee/0x130 [lquota]
      [23052.137256]  [<ffffffffa0a3f4fe>] ? mdt_quota_fini+0xee/0x410 [mdt]
      [23052.137256]  [<ffffffffa0a40c54>] ? mdt_device_fini+0x74/0x500 [mdt]
      [23052.137256]  [<ffffffffa07ad3c7>] ? class_cleanup+0x577/0xdc0 [obdclass]
      [23052.137256]  [<ffffffffa07825dc>] ? class_name2dev+0x7c/0xf0 [obdclass]
      [23052.137256]  [<ffffffffa07aecb5>] ? class_process_config+0x10a5/0x1ca0 [obdclass]
      [23052.137256]  [<ffffffffa04bc118>] ? libcfs_log_return+0x28/0x40 [libcfs]
      [23052.137256]  [<ffffffffa07a8671>] ? lustre_cfg_new+0x391/0x7e0 [obdclass]
      [23052.137256]  [<ffffffffa07afa29>] ? class_manual_cleanup+0x179/0x6e0 [obdclass]
      [23052.137256]  [<ffffffffa07825dc>] ? class_name2dev+0x7c/0xf0 [obdclass]
      [23052.137256]  [<ffffffffa07bbf9c>] ? server_put_super+0x5ec/0x1230 [obdclass]
      [23052.137256]  [<ffffffff8117fabb>] ? generic_shutdown_super+0x5b/0xe0
      [23052.137256]  [<ffffffff8117fba6>] ? kill_anon_super+0x16/0x60
      [23052.137256]  [<ffffffffa07b1846>] ? lustre_kill_super+0x36/0x60 [obdclass]
      [23052.137256]  [<ffffffff81180c35>] ? deactivate_super+0x85/0xa0
      [23052.137256]  [<ffffffff8119ccaf>] ? mntput_no_expire+0xbf/0x110
      [23052.137256]  [<ffffffff8119d75b>] ? sys_umount+0x7b/0x3a0
      [23052.137256]  [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
      

      Much later, this was followed by:

      [26328.160006] BUG: soft lockup - CPU#7 stuck for 68s! [rsyslogd:1134]
      [26328.160007] Modules linked in: lustre obdfilter osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mds mgs lquota obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs ext2 exportfs jbd sha512_generic sha256_generic sunrpc ipv6 microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
      [26328.160017] CPU 7
      [26328.160018] Modules linked in: lustre obdfilter osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mds mgs lquota obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs ext2 exportfs jbd sha512_generic sha256_generic sunrpc ipv6 microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
      [26328.160027]
      [26328.160028] Pid: 1134, comm: rsyslogd Tainted: G        W  ---------------    2.6.32-debug #5 Bochs Bochs
      [26328.160029] RIP: 0010:[<ffffffff8104873a>]  [<ffffffff8104873a>] flush_tlb_others_ipi+0x11a/0x130
      [26328.160033] RSP: 0018:ffff88027ab71d68  EFLAGS: 00000246
      [26328.160034] RAX: 0000000000000000 RBX: ffff88027ab71da8 RCX: 0000000000000008
      [26328.160034] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffff81e09368
      [26328.160035] RBP: ffffffff8100bc0e R08: 0000000000000000 R09: 0000000000000008
      [26328.160036] R10: 0000000000000000 R11: 00007f9700000000 R12: ffff88027ab71d58
      [26328.160037] R13: ffffffff8100bc0e R14: 0000000000000000 R15: ffff88027a02df28
      [26328.160038] FS:  00007f970ffff700(0000) GS:ffff8800283c0000(0000) knlGS:0000000000000000
      [26328.160039] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [26328.160040] CR2: 00000000006dee5c CR3: 000000025a13e000 CR4: 00000000000006e0
      [26328.160043] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [26328.160045] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [26328.160046] Process rsyslogd (pid: 1134, threadinfo ffff88027ab70000, task ffff88025a1c4580)
      [26328.160047] Stack:
      [26328.160048]  ffffffffffffffff ffffffff81e09340 00007f970004e000 ffff8802674b2a30
      [26328.160049] <d> ffffffffffffffff ffff8802674b2d18 00007f970004f000 ffff880259c9d000
      [26328.160050] <d> ffff88027ab71dd8 ffffffff810487c6 ffff88027ab71dd8 ffff8802674b2a30
      [26328.160052] Call Trace:
      [26328.160054]  [<ffffffff810487c6>] ? native_flush_tlb_others+0x76/0x90
      [26328.160056]  [<ffffffff8104899c>] ? flush_tlb_mm+0x5c/0xa0
      [26328.160058]  [<ffffffff81144ce0>] ? mprotect_fixup+0x680/0x820
      [26328.160060]  [<ffffffff8108fd60>] ? autoremove_wake_function+0x0/0x40
      [26328.160062]  [<ffffffff81145005>] ? sys_mprotect+0x185/0x250
      [26328.160064]  [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
      [26328.160065] Code: f8 c9 c3 66 0f 1f 44 00 00 48 8b 05 e1 c9 b5 00 41 8d b5 f0 00 00 00 4c 89 e7 ff 90 e0 00 00 00 eb 09 0f 1f 80 00 00 00 00 f3 90 <8b> 35 84 39 b5 00 4c 89 e7 e8 a8 73 23 00 85 c0 74 ec eb 90 66
      [26328.160074] Call Trace:
      [26328.160075]  [<ffffffff81048748>] ? flush_tlb_others_ipi+0x128/0x130
      [26328.160077]  [<ffffffff810487c6>] ? native_flush_tlb_others+0x76/0x90
      [26328.160078]  [<ffffffff8104899c>] ? flush_tlb_mm+0x5c/0xa0
      [26328.160080]  [<ffffffff81144ce0>] ? mprotect_fixup+0x680/0x820
      [26328.160081]  [<ffffffff8108fd60>] ? autoremove_wake_function+0x0/0x40
      [26328.160083]  [<ffffffff81145005>] ? sys_mprotect+0x185/0x250
      [26328.160085]  [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
      

          People

            wc-triage WC Triage
            green Oleg Drokin