Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.4.0
-
None
-
3
-
4370
Description
Ater recent landings of quota code the first unmount by sanity.sh on my test node (8 cores, 10G RAM) locks up:
[23052.136007] BUG: soft lockup - CPU#5 stuck for 67s! [umount:19443] [23052.136988] Modules linked in: lustre obdfilter osp lod ost mdt osd_ldiskfs f sfilt_ldiskfs ldiskfs mdd mds mgs lquota obdecho mgc lov osc mdc lmv fid fld ptl rpc obdclass lvfs ksocklnd lnet libcfs ext2 exportfs jbd sha512_generic sha256_generic sunrpc ipv6 microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 m bcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_p iix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs] [23052.137256] CPU 5[23052.137256] Modules linked in: lustre obdfilter osp lod ost mdt osd_ldiskfs f sfilt_ldiskfs ldiskfs mdd mds mgs lquota obdecho mgc lov osc mdc lmv fid fld ptl rpc obdclass lvfs ksocklnd lnet libcfs ext2 exportfs jbd sha512_generic sha256_generic sunrpc ipv6 microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 m bcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_p iix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs] [23052.137256] [23052.137256] Pid: 19443, comm: umount Tainted: G W --------------- 2.6.32-debug #5 Bochs Bochs[23052.137256] RIP: 0010:[<ffffffffa078c163>] [<ffffffffa078c163>] lprocfs_remo ve_nolock+0x23/0x140 [obdclass] [23052.137256] RSP: 0018:ffff880158fe18c8 EFLAGS: 00010286 [23052.137256] RAX: ffffffffa0814020 RBX: ffff880158fe18f8 RCX: 0000000000000000[23052.137256] RDX: 0000000000000000 RSI: 0000000000000030 RDI: ffff8802187f9988 [23052.137256] RBP: ffffffff8100bc0e R08: ffff880158fe1908 R09: 0000000000000059 [23052.137256] R10: 0000000000000001 R11: 0000000087654321 R12: ffff880158fe19d8 [23052.137256] R13: ffff8802187f98a8 R14: ffff88027a589df0 R15: ffff88027a589ee8 [23052.137256] FS: 00007fca35fa7740(0000) GS:ffff880028340000(0000) knlGS:0000000000000000 [23052.137256] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [23052.137256] CR2: ffff8802187faf78 CR3: 0000000158247000 CR4: 00000000000006e0 [23052.137256] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [23052.137256] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [23052.137256] Process umount (pid: 19443, threadinfo ffff880158fe0000, task ffff8801b4d4c2c0) [23052.137256] Stack: [23052.137256] ffff880100000010 0000000000000246 ffff8802187f9988 ffff880158fe19d8 [23052.137256] <d> ffff8802187f98a8 ffff88027a589df0 ffff880158fe1918 ffffffffa078c3d5 [23052.137256] <d> ffff880158fe1908 ffff8802187f9930 ffff880158fe1948 ffffffffa046dede [23052.137256] Call Trace: [23052.137256] [<ffffffffa078c3d5>] ? lprocfs_remove+0x25/0x40 [obdclass] [23052.137256] [<ffffffffa046dede>] ? qmt_pool_free+0x3e/0x240 [lquota] [23052.137256] [<ffffffffa046e2fc>] ? qmt_pool_fini+0xbc/0x240 [lquota] [23052.137256] [<ffffffffa0468b00>] ? qmt_device_fini+0x140/0x500 [lquota] [23052.137256] [<ffffffffa07ad3c7>] ? class_cleanup+0x577/0xdc0 [obdclass] [23052.137256] [<ffffffffa07825dc>] ? class_name2dev+0x7c/0xf0 [obdclass] [23052.137256] [<ffffffffa07aecb5>] ? class_process_config+0x10a5/0x1ca0 [obdclass] [23052.137256] [<ffffffffa04bc118>] ? libcfs_log_return+0x28/0x40 [libcfs] [23052.137256] [<ffffffffa07a8671>] ? lustre_cfg_new+0x391/0x7e0 [obdclass] [23052.137256] [<ffffffffa07afa29>] ? class_manual_cleanup+0x179/0x6e0 [obdclass] [23052.137256] [<ffffffffa04bc118>] ? libcfs_log_return+0x28/0x40 [libcfs] [23052.137256] [<ffffffffa046833e>] ? qmt_device_obd_disconnect+0xee/0x130 [lquota] [23052.137256] [<ffffffffa0a3f4fe>] ? mdt_quota_fini+0xee/0x410 [mdt] [23052.137256] [<ffffffffa0a40c54>] ? mdt_device_fini+0x74/0x500 [mdt] [23052.137256] [<ffffffffa07ad3c7>] ? class_cleanup+0x577/0xdc0 [obdclass] [23052.137256] [<ffffffffa07825dc>] ? class_name2dev+0x7c/0xf0 [obdclass] [23052.137256] [<ffffffffa07aecb5>] ? class_process_config+0x10a5/0x1ca0 [obdclass] [23052.137256] [<ffffffffa04bc118>] ? libcfs_log_return+0x28/0x40 [libcfs] [23052.137256] [<ffffffffa07a8671>] ? lustre_cfg_new+0x391/0x7e0 [obdclass] [23052.137256] [<ffffffffa07afa29>] ? class_manual_cleanup+0x179/0x6e0 [obdclass] [23052.137256] [<ffffffffa07825dc>] ? class_name2dev+0x7c/0xf0 [obdclass] [23052.137256] [<ffffffffa07bbf9c>] ? server_put_super+0x5ec/0x1230 [obdclass] [23052.137256] [<ffffffff8117fabb>] ? generic_shutdown_super+0x5b/0xe0 [23052.137256] [<ffffffff8117fba6>] ? kill_anon_super+0x16/0x60 [23052.137256] [<ffffffffa07b1846>] ? lustre_kill_super+0x36/0x60 [obdclass] [23052.137256] [<ffffffff81180c35>] ? deactivate_super+0x85/0xa0 [23052.137256] [<ffffffff8119ccaf>] ? mntput_no_expire+0xbf/0x110 [23052.137256] [<ffffffff8119d75b>] ? sys_umount+0x7b/0x3a0 [23052.137256] [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b [23052.137256] Code: 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 56 41 55 41 54 53 48 83 ec 10 0f 1f 44 00 00 48 8b 1f 48 85 db 74 60 48 c7 07 00 00 00 00 <4c> 8b 73 48 4d 85 f6 75 0f e9 c4 00 00 00 0f 1f 80 00 00 00 00 [23052.137256] Call Trace: [23052.137256] [<ffffffffa04c16d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs] [23052.137256] [<ffffffffa078c3d5>] ? lprocfs_remove+0x25/0x40 [obdclass] [23052.137256] [<ffffffffa046dede>] ? qmt_pool_free+0x3e/0x240 [lquota] [23052.137256] [<ffffffffa046e2fc>] ? qmt_pool_fini+0xbc/0x240 [lquota] [23052.137256] [<ffffffffa0468b00>] ? qmt_device_fini+0x140/0x500 [lquota] [23052.137256] [<ffffffffa07ad3c7>] ? class_cleanup+0x577/0xdc0 [obdclass] [23052.137256] [<ffffffffa07825dc>] ? class_name2dev+0x7c/0xf0 [obdclass] [23052.137256] [<ffffffffa07aecb5>] ? class_process_config+0x10a5/0x1ca0 [obdclass] [23052.137256] [<ffffffffa04bc118>] ? libcfs_log_return+0x28/0x40 [libcfs] [23052.137256] [<ffffffffa07a8671>] ? lustre_cfg_new+0x391/0x7e0 [obdclass] [23052.137256] [<ffffffffa07afa29>] ? class_manual_cleanup+0x179/0x6e0 [obdclass] [23052.137256] [<ffffffffa04bc118>] ? libcfs_log_return+0x28/0x40 [libcfs] [23052.137256] [<ffffffffa046833e>] ? qmt_device_obd_disconnect+0xee/0x130 [lquota] [23052.137256] [<ffffffffa0a3f4fe>] ? mdt_quota_fini+0xee/0x410 [mdt] [23052.137256] [<ffffffffa0a40c54>] ? mdt_device_fini+0x74/0x500 [mdt] [23052.137256] [<ffffffffa07ad3c7>] ? class_cleanup+0x577/0xdc0 [obdclass] [23052.137256] [<ffffffffa07825dc>] ? class_name2dev+0x7c/0xf0 [obdclass] [23052.137256] [<ffffffffa07aecb5>] ? class_process_config+0x10a5/0x1ca0 [obdclass] [23052.137256] [<ffffffffa04bc118>] ? libcfs_log_return+0x28/0x40 [libcfs] [23052.137256] [<ffffffffa07a8671>] ? lustre_cfg_new+0x391/0x7e0 [obdclass] [23052.137256] [<ffffffffa07afa29>] ? class_manual_cleanup+0x179/0x6e0 [obdclass] [23052.137256] [<ffffffffa07825dc>] ? class_name2dev+0x7c/0xf0 [obdclass] [23052.137256] [<ffffffffa07bbf9c>] ? server_put_super+0x5ec/0x1230 [obdclass] [23052.137256] [<ffffffff8117fabb>] ? generic_shutdown_super+0x5b/0xe0 [23052.137256] [<ffffffff8117fba6>] ? kill_anon_super+0x16/0x60 [23052.137256] [<ffffffffa07b1846>] ? lustre_kill_super+0x36/0x60 [obdclass] [23052.137256] [<ffffffff81180c35>] ? deactivate_super+0x85/0xa0 [23052.137256] [<ffffffff8119ccaf>] ? mntput_no_expire+0xbf/0x110 [23052.137256] [<ffffffff8119d75b>] ? sys_umount+0x7b/0x3a0 [23052.137256] [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
much later followed by:
[26328.160006] BUG: soft lockup - CPU#7 stuck for 68s! [rsyslogd:1134][26328.160007] Modules linked in: lustre obdfilter osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mds mgs lquota obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs ext2 exportfs jbd sha512_generic sha256_generic sunrpc ipv6 microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_p iix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs] [26328.160017] CPU 7[26328.160018] Modules linked in: lustre obdfilter osp lod ost mdt osd_ldiskfs f sfilt_ldiskfs ldiskfs mdd mds mgs lquota obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs ext2 exportfs jbd sha512_generic sha256_g eneric sunrpc ipv6 microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 m bcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_p iix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs] [26328.160027] [26328.160028] Pid: 1134, comm: rsyslogd Tainted: G W --------------- 2.6.32-debug #5 Bochs Bochs [26328.160029] RIP: 0010:[<ffffffff8104873a>] [<ffffffff8104873a>] flush_tlb_others_ipi+0x11a/0x130 [26328.160033] RSP: 0018:ffff88027ab71d68 EFLAGS: 00000246 [26328.160034] RAX: 0000000000000000 RBX: ffff88027ab71da8 RCX: 0000000000000008 [26328.160034] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffff81e09368 [26328.160035] RBP: ffffffff8100bc0e R08: 0000000000000000 R09: 0000000000000008 [26328.160036] R10: 0000000000000000 R11: 00007f9700000000 R12: ffff88027ab71d58 [26328.160037] R13: ffffffff8100bc0e R14: 0000000000000000 R15: ffff88027a02df28 [26328.160038] FS: 00007f970ffff700(0000) GS:ffff8800283c0000(0000) knlGS:0000000000000000 [26328.160039] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [26328.160040] CR2: 00000000006dee5c CR3: 000000025a13e000 CR4: 00000000000006e0 [26328.160043] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [26328.160045] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [26328.160046] Process rsyslogd (pid: 1134, threadinfo ffff88027ab70000, task ffff88025a1c4580) [26328.160047] Stack: [26328.160048] ffffffffffffffff ffffffff81e09340 00007f970004e000 ffff8802674b2a30 [26328.160049] <d> ffffffffffffffff ffff8802674b2d18 00007f970004f000 ffff880259c9d000 [26328.160050] <d> ffff88027ab71dd8 ffffffff810487c6 ffff88027ab71dd8 ffff8802674b2a30 [26328.160052] Call Trace: [26328.160054] [<ffffffff810487c6>] ? native_flush_tlb_others+0x76/0x90 [26328.160056] [<ffffffff8104899c>] ? flush_tlb_mm+0x5c/0xa0 [26328.160058] [<ffffffff81144ce0>] ? mprotect_fixup+0x680/0x820 [26328.160060] [<ffffffff8108fd60>] ? autoremove_wake_function+0x0/0x40 [26328.160062] [<ffffffff81145005>] ? sys_mprotect+0x185/0x250 [26328.160064] [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b [26328.160065] Code: f8 c9 c3 66 0f 1f 44 00 00 48 8b 05 e1 c9 b5 00 41 8d b5 f0 00 00 00 4c 89 e7 ff 90 e0 00 00 00 eb 09 0f 1f 80 00 00 00 00 f3 90 <8b> 35 84 39 b5 00 4c 89 e7 e8 a8 73 23 00 85 c0 74 ec eb 90 66 [26328.160074] Call Trace: [26328.160075] [<ffffffff81048748>] ? flush_tlb_others_ipi+0x128/0x130 [26328.160077] [<ffffffff810487c6>] ? native_flush_tlb_others+0x76/0x90 [26328.160078] [<ffffffff8104899c>] ? flush_tlb_mm+0x5c/0xa0 [26328.160080] [<ffffffff81144ce0>] ? mprotect_fixup+0x680/0x820 [26328.160081] [<ffffffff8108fd60>] ? autoremove_wake_function+0x0/0x40 [26328.160083] [<ffffffff81145005>] ? sys_mprotect+0x185/0x250 [26328.160085] [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b