Details
-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
None
-
None
-
3
-
9223372036854775807
Description
Just had this failure on current master indicating there's some strange race on unmount/recovery:
[10141.926287] Lustre: DEBUG MARKER: == recovery-small test 50: failover MDS under load =================================================== 04:37:14 (1501317434) [10152.267289] Lustre: Failing over lustre-MDT0000 [10152.631207] BUG: spinlock already unlocked on CPU#0, loop1/31863 [10152.633002] lock: 0xffff8802f0848d30, .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1 [10152.634247] CPU: 0 PID: 31863 Comm: loop1 Tainted: P OE ------------ 3.10.0-debug #2 [10152.635652] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [10152.636302] 0000000000000000 00000000cce3b7e9 ffff88009724bcd8 ffffffff816fd3e4 [10152.650263] ffff88009724bcf8 ffffffff816fd472 ffff8802f0848d30 ffffffff81aa1aa1 [10152.651754] ffff88009724bd18 ffffffff816fd498 ffff8802f0848d30 ffff8802f0848d30 [10152.653011] Call Trace: [10152.653618] [<ffffffff816fd3e4>] dump_stack+0x19/0x1b [10152.654241] [<ffffffff816fd472>] spin_dump+0x8c/0x91 [10152.654888] [<ffffffff816fd498>] spin_bug+0x21/0x26 [10152.655541] [<ffffffff8138ed0e>] do_raw_spin_unlock+0x8e/0x90 [10152.656204] [<ffffffff8170641c>] _raw_spin_unlock_irqrestore+0x2c/0x50 [10152.662772] [<ffffffff810af504>] __wake_up+0x44/0x50 [10152.663435] [<ffffffffa0b64131>] dio_complete_routine+0xf1/0x210 [osd_ldiskfs] [10152.669336] [<ffffffff8122a084>] bio_endio+0x64/0xa0 [10152.670008] [<ffffffffa012e01a>] loop_thread+0xfa/0x2d0 [loop] [10152.670697] [<ffffffff810a4070>] ? wake_up_atomic_t+0x30/0x30 [10152.671358] [<ffffffffa012df20>] ? do_bio_filebacked+0x330/0x330 [loop] [10152.672046] [<ffffffff810a2eba>] kthread+0xea/0xf0 [10152.672687] [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140 [10152.673355] [<ffffffff8170fb98>] ret_from_fork+0x58/0x90 [10152.674103] [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140 [10152.802652] LustreError: 20975:0:(client.c:1166:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8802cf342c00 x1574243965118512/t0(0) o13->lustre-OST0001-osc-MDT0000@0@lo:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 [10153.990105] LustreError: 20972:0:(client.c:1166:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8802cd351c00 x1574243965120016/t0(0) o13->lustre-OST0000-osc-MDT0000@0@lo:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 [10158.883067] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping) [10158.886333] Lustre: Skipped 2 previous similar messages [10163.882509] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping) [10163.885202] Lustre: Skipped 2 previous similar messages [10168.882441] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping) [10168.884575] Lustre: Skipped 2 previous similar messages [10173.882675] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping) [10173.885062] Lustre: Skipped 2 previous similar messages [10177.987965] BUG: soft lockup - CPU#7 stuck for 22s! [umount:32081] [10177.988759] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate mbcache jbd2 syscopyarea sysfillrect sysimgblt ttm ata_generic drm_kms_helper pata_acpi drm ata_piix i2c_piix4 libata serio_raw virtio_balloon i2c_core virtio_console pcspkr virtio_blk floppy nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs] [10177.999141] CPU: 7 PID: 32081 Comm: umount Tainted: P OE ------------ 3.10.0-debug #2 [10178.000367] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [10178.001014] task: ffff8802d0e14540 ti: ffff8802e7f78000 task.ti: ffff8802e7f78000 [10178.002199] RIP: 0010:[<ffffffff810ebfc2>] [<ffffffff810ebfc2>] smp_call_function_many+0x202/0x260 [10178.003473] RSP: 0018:ffff8802e7f7b9e0 EFLAGS: 00000202 [10178.004119] RAX: 0000000000000000 RBX: 000000003f597982 RCX: ffff88033e419180 [10178.004804] RDX: 0000000000000000 RSI: 0000000000000010 RDI: 0000000000000000 [10178.005478] RBP: ffff8802e7f7ba18 R08: ffff88032c099c00 R09: ffff880000000000 [10178.006167] R10: 0000000000000000 R11: ffff8802e7f7b87e R12: ffffffff81376679 [10178.006869] R13: 0000000000000286 R14: 000000000000002b R15: ffff8802c0bc5d10 [10178.007582] FS: 00007f4fcc6a6880(0000) GS:ffff88033e4e0000(0000) knlGS:0000000000000000 [10178.008839] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [10178.009518] CR2: 00007ffcb48db010 CR3: 00000000ac9df000 CR4: 00000000000006e0 [10178.010180] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [10178.010864] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [10178.011549] Stack: [10178.012221] 000000013f597982 ffffffff81d70260 ffff8802e7f78000 ffffffff81067430 [10178.013481] 0000000000000000 ffffc90008bb9000 0000000000000080 ffff8802e7f7ba30 [10178.014875] ffffffff810ec057 ffff8802eedf5f80 ffff8802e7f7ba60 ffffffff810ec08b [10178.016148] Call Trace: [10178.016745] [<ffffffff81067430>] ? flush_tlb_func+0xb0/0xb0 [10178.017397] [<ffffffff810ec057>] smp_call_function+0x37/0x40 [10178.018049] [<ffffffff810ec08b>] on_each_cpu+0x2b/0x70 [10178.018732] [<ffffffff81067869>] flush_tlb_kernel_range+0x59/0xa0 [10178.019384] [<ffffffff811b042b>] remove_vm_area+0x6b/0xa0 [10178.020022] [<ffffffff811b048e>] __vunmap+0x2e/0x120 [10178.020660] [<ffffffff811b0608>] vfree+0x48/0x90 [10178.021297] [<ffffffffa02a8b6b>] cfs_hash_buckets_free+0x19b/0x220 [libcfs] [10178.021990] [<ffffffffa02ab11b>] cfs_hash_putref+0x2ab/0x500 [libcfs] [10178.022669] [<ffffffff81379078>] ? kobject_release+0x98/0x1b0 [10178.023372] [<ffffffffa0552a2c>] ldlm_namespace_free_post+0x6c/0x2a0 [ptlrpc] [10178.026044] [<ffffffffa0c6ab8e>] mdt_device_fini+0x6fe/0xfb0 [mdt] [10178.026668] [<ffffffffa03b2b64>] class_cleanup+0x7b4/0xcf0 [obdclass] [10178.027138] [<ffffffffa03b50dd>] class_process_config+0x19cd/0x23b0 [obdclass] [10178.027972] [<ffffffff811cd4f9>] ? __kmalloc+0x649/0x660 [10178.028431] [<ffffffffa02a2cc7>] ? libcfs_debug_msg+0x57/0x80 [libcfs] [10178.028937] [<ffffffffa03b5c86>] class_manual_cleanup+0x1c6/0x6d0 [obdclass] [10178.029804] [<ffffffffa03e34de>] server_put_super+0x8ae/0xca0 [obdclass] [10178.030623] [<ffffffff811efa46>] generic_shutdown_super+0x56/0xe0 [10178.031460] [<ffffffff811efe22>] kill_anon_super+0x12/0x20 [10178.032244] [<ffffffffa03b83c2>] lustre_kill_super+0x32/0x50 [obdclass] [10178.033090] [<ffffffff811f0329>] deactivate_locked_super+0x49/0x60 [10178.034232] [<ffffffff811f0926>] deactivate_super+0x46/0x60 [10178.035006] [<ffffffff8120f115>] mntput_no_expire+0xc5/0x120 [10178.035693] [<ffffffff8121029f>] SyS_umount+0x9f/0x3c0 [10178.036324] [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b [10178.036972] Code: 48 63 35 96 58 c8 00 89 c2 39 f0 0f 8d 86 fe ff ff 48 98 49 8b 0f 48 03 0c c5 e0 38 d6 81 f6 41 20 01 74 cd 0f 1f 44 00 00 f3 90 <f6> 41 20 01 75 f8 48 63 35 65 58 c8 00 eb b7 0f b6 4d cc 4c 89 [10178.040955] Kernel panic - not syncing: softlockup: hung tasks [10178.041604] CPU: 7 PID: 32081 Comm: umount Tainted: P OEL ------------ 3.10.0-debug #2 [10178.042845] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [10178.043491] ffffffff81a6f14b 000000003f597982 ffff88033e4e3df8 ffffffff816fd3e4 [10178.044714] ffff88033e4e3e78 ffffffff816f8c34 0000000000000008 ffff88033e4e3e88 [10178.045936] ffff88033e4e3e28 000000003f597982 ffff88033e4e3e47 0000000000000000 [10178.047168] Call Trace: [10178.047745] <IRQ> [<ffffffff816fd3e4>] dump_stack+0x19/0x1b [10178.048400] [<ffffffff816f8c34>] panic+0xd8/0x1e7 [10178.049033] [<ffffffff81121f86>] watchdog_timer_fn+0x1b6/0x1c0 [10178.049683] [<ffffffff81121dd0>] ? watchdog_enable+0xc0/0xc0 [10178.050317] [<ffffffff810a74c9>] __hrtimer_run_queues+0xf9/0x2d0 [10178.050975] [<ffffffff810a7b70>] hrtimer_interrupt+0xb0/0x1e0 [10178.051753] [<ffffffff817115dc>] ? call_softirq+0x1c/0x30 [10178.052385] [<ffffffff81067430>] ? flush_tlb_func+0xb0/0xb0 [10178.053036] [<ffffffff8104ad97>] local_apic_timer_interrupt+0x37/0x60 [10178.053711] [<ffffffff817121cf>] smp_apic_timer_interrupt+0x3f/0x60 [10178.054361] [<ffffffff8171091d>] apic_timer_interrupt+0x6d/0x80 [10178.055014] <EOI> [<ffffffff810ebfc2>] ? smp_call_function_many+0x202/0x260 [10178.056743] [<ffffffff81067430>] ? flush_tlb_func+0xb0/0xb0 [10178.057374] [<ffffffff810ec057>] smp_call_function+0x37/0x40 [10178.058025] [<ffffffff810ec08b>] on_each_cpu+0x2b/0x70 [10178.058664] [<ffffffff81067869>] flush_tlb_kernel_range+0x59/0xa0 [10178.059310] [<ffffffff811b042b>] remove_vm_area+0x6b/0xa0 [10178.059955] [<ffffffff811b048e>] __vunmap+0x2e/0x120 [10178.060588] [<ffffffff811b0608>] vfree+0x48/0x90 [10178.061228] [<ffffffffa02a8b6b>] cfs_hash_buckets_free+0x19b/0x220 [libcfs] [10178.061915] [<ffffffffa02ab11b>] cfs_hash_putref+0x2ab/0x500 [libcfs] [10178.062581] [<ffffffff81379078>] ? kobject_release+0x98/0x1b0 [10178.063303] [<ffffffffa0552a2c>] ldlm_namespace_free_post+0x6c/0x2a0 [ptlrpc] [10178.064530] [<ffffffffa0c6ab8e>] mdt_device_fini+0x6fe/0xfb0 [mdt] [10178.065233] [<ffffffffa03b2b64>] class_cleanup+0x7b4/0xcf0 [obdclass] [10178.065917] [<ffffffffa03b50dd>] class_process_config+0x19cd/0x23b0 [obdclass] [10178.067241] [<ffffffff811cd4f9>] ? __kmalloc+0x649/0x660 [10178.067908] [<ffffffffa02a2cc7>] ? libcfs_debug_msg+0x57/0x80 [libcfs] [10178.068604] [<ffffffffa03b5c86>] class_manual_cleanup+0x1c6/0x6d0 [obdclass] [10178.069298] [<ffffffffa03e34de>] server_put_super+0x8ae/0xca0 [obdclass] [10178.069980] [<ffffffff811efa46>] generic_shutdown_super+0x56/0xe0 [10178.070683] [<ffffffff811efe22>] kill_anon_super+0x12/0x20 [10178.071457] [<ffffffffa03b83c2>] lustre_kill_super+0x32/0x50 [obdclass] [10178.072271] [<ffffffff811f0329>] deactivate_locked_super+0x49/0x60 [10178.073072] [<ffffffff811f0926>] deactivate_super+0x46/0x60 [10178.073776] [<ffffffff8120f115>] mntput_no_expire+0xc5/0x120 [10178.074410] [<ffffffff8121029f>] SyS_umount+0x9f/0x3c0 [10178.076646] [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
Crashdump adn modules in /exports/crashdumps/"192.168.123.150-2017-07-29-04:37:56" on onylx-68