Lustre / LU-9807

BUG: spinlock already unlocked on CPU#0, in recovery


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor

    Description

      Just had this failure on current master, indicating there's some strange race between unmount and recovery:

      [10141.926287] Lustre: DEBUG MARKER: == recovery-small test 50: failover MDS under load =================================================== 04:37:14 (1501317434)
      [10152.267289] Lustre: Failing over lustre-MDT0000
      [10152.631207] BUG: spinlock already unlocked on CPU#0, loop1/31863
      [10152.633002]  lock: 0xffff8802f0848d30, .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1
      [10152.634247] CPU: 0 PID: 31863 Comm: loop1 Tainted: P           OE  ------------   3.10.0-debug #2
      [10152.635652] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [10152.636302]  0000000000000000 00000000cce3b7e9 ffff88009724bcd8 ffffffff816fd3e4
      [10152.650263]  ffff88009724bcf8 ffffffff816fd472 ffff8802f0848d30 ffffffff81aa1aa1
      [10152.651754]  ffff88009724bd18 ffffffff816fd498 ffff8802f0848d30 ffff8802f0848d30
      [10152.653011] Call Trace:
      [10152.653618]  [<ffffffff816fd3e4>] dump_stack+0x19/0x1b
      [10152.654241]  [<ffffffff816fd472>] spin_dump+0x8c/0x91
      [10152.654888]  [<ffffffff816fd498>] spin_bug+0x21/0x26
      [10152.655541]  [<ffffffff8138ed0e>] do_raw_spin_unlock+0x8e/0x90
      [10152.656204]  [<ffffffff8170641c>] _raw_spin_unlock_irqrestore+0x2c/0x50
      [10152.662772]  [<ffffffff810af504>] __wake_up+0x44/0x50
      [10152.663435]  [<ffffffffa0b64131>] dio_complete_routine+0xf1/0x210 [osd_ldiskfs]
      [10152.669336]  [<ffffffff8122a084>] bio_endio+0x64/0xa0
      [10152.670008]  [<ffffffffa012e01a>] loop_thread+0xfa/0x2d0 [loop]
      [10152.670697]  [<ffffffff810a4070>] ? wake_up_atomic_t+0x30/0x30
      [10152.671358]  [<ffffffffa012df20>] ? do_bio_filebacked+0x330/0x330 [loop]
      [10152.672046]  [<ffffffff810a2eba>] kthread+0xea/0xf0
      [10152.672687]  [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
      [10152.673355]  [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
      [10152.674103]  [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
      [10152.802652] LustreError: 20975:0:(client.c:1166:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff8802cf342c00 x1574243965118512/t0(0) o13->lustre-OST0001-osc-MDT0000@0@lo:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
      [10153.990105] LustreError: 20972:0:(client.c:1166:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff8802cd351c00 x1574243965120016/t0(0) o13->lustre-OST0000-osc-MDT0000@0@lo:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
      [10158.883067] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
      [10158.886333] Lustre: Skipped 2 previous similar messages
      [10163.882509] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
      [10163.885202] Lustre: Skipped 2 previous similar messages
      [10168.882441] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
      [10168.884575] Lustre: Skipped 2 previous similar messages
      [10173.882675] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
      [10173.885062] Lustre: Skipped 2 previous similar messages
      [10177.987965] BUG: soft lockup - CPU#7 stuck for 22s! [umount:32081]
      [10177.988759] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate mbcache jbd2 syscopyarea sysfillrect sysimgblt ttm ata_generic drm_kms_helper pata_acpi drm ata_piix i2c_piix4 libata serio_raw virtio_balloon i2c_core virtio_console pcspkr virtio_blk floppy nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
      [10177.999141] CPU: 7 PID: 32081 Comm: umount Tainted: P           OE  ------------   3.10.0-debug #2
      [10178.000367] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [10178.001014] task: ffff8802d0e14540 ti: ffff8802e7f78000 task.ti: ffff8802e7f78000
      [10178.002199] RIP: 0010:[<ffffffff810ebfc2>]  [<ffffffff810ebfc2>] smp_call_function_many+0x202/0x260
      [10178.003473] RSP: 0018:ffff8802e7f7b9e0  EFLAGS: 00000202
      [10178.004119] RAX: 0000000000000000 RBX: 000000003f597982 RCX: ffff88033e419180
      [10178.004804] RDX: 0000000000000000 RSI: 0000000000000010 RDI: 0000000000000000
      [10178.005478] RBP: ffff8802e7f7ba18 R08: ffff88032c099c00 R09: ffff880000000000
      [10178.006167] R10: 0000000000000000 R11: ffff8802e7f7b87e R12: ffffffff81376679
      [10178.006869] R13: 0000000000000286 R14: 000000000000002b R15: ffff8802c0bc5d10
      [10178.007582] FS:  00007f4fcc6a6880(0000) GS:ffff88033e4e0000(0000) knlGS:0000000000000000
      [10178.008839] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [10178.009518] CR2: 00007ffcb48db010 CR3: 00000000ac9df000 CR4: 00000000000006e0
      [10178.010180] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [10178.010864] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [10178.011549] Stack:
      [10178.012221]  000000013f597982 ffffffff81d70260 ffff8802e7f78000 ffffffff81067430
      [10178.013481]  0000000000000000 ffffc90008bb9000 0000000000000080 ffff8802e7f7ba30
      [10178.014875]  ffffffff810ec057 ffff8802eedf5f80 ffff8802e7f7ba60 ffffffff810ec08b
      [10178.016148] Call Trace:
      [10178.016745]  [<ffffffff81067430>] ? flush_tlb_func+0xb0/0xb0
      [10178.017397]  [<ffffffff810ec057>] smp_call_function+0x37/0x40
      [10178.018049]  [<ffffffff810ec08b>] on_each_cpu+0x2b/0x70
      [10178.018732]  [<ffffffff81067869>] flush_tlb_kernel_range+0x59/0xa0
      [10178.019384]  [<ffffffff811b042b>] remove_vm_area+0x6b/0xa0
      [10178.020022]  [<ffffffff811b048e>] __vunmap+0x2e/0x120
      [10178.020660]  [<ffffffff811b0608>] vfree+0x48/0x90
      [10178.021297]  [<ffffffffa02a8b6b>] cfs_hash_buckets_free+0x19b/0x220 [libcfs]
      [10178.021990]  [<ffffffffa02ab11b>] cfs_hash_putref+0x2ab/0x500 [libcfs]
      [10178.022669]  [<ffffffff81379078>] ? kobject_release+0x98/0x1b0
      [10178.023372]  [<ffffffffa0552a2c>] ldlm_namespace_free_post+0x6c/0x2a0 [ptlrpc]
      [10178.026044]  [<ffffffffa0c6ab8e>] mdt_device_fini+0x6fe/0xfb0 [mdt]
      [10178.026668]  [<ffffffffa03b2b64>] class_cleanup+0x7b4/0xcf0 [obdclass]
      [10178.027138]  [<ffffffffa03b50dd>] class_process_config+0x19cd/0x23b0 [obdclass]
      [10178.027972]  [<ffffffff811cd4f9>] ? __kmalloc+0x649/0x660
      [10178.028431]  [<ffffffffa02a2cc7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [10178.028937]  [<ffffffffa03b5c86>] class_manual_cleanup+0x1c6/0x6d0 [obdclass]
      [10178.029804]  [<ffffffffa03e34de>] server_put_super+0x8ae/0xca0 [obdclass]
      [10178.030623]  [<ffffffff811efa46>] generic_shutdown_super+0x56/0xe0
      [10178.031460]  [<ffffffff811efe22>] kill_anon_super+0x12/0x20
      [10178.032244]  [<ffffffffa03b83c2>] lustre_kill_super+0x32/0x50 [obdclass]
      [10178.033090]  [<ffffffff811f0329>] deactivate_locked_super+0x49/0x60
      [10178.034232]  [<ffffffff811f0926>] deactivate_super+0x46/0x60
      [10178.035006]  [<ffffffff8120f115>] mntput_no_expire+0xc5/0x120
      [10178.035693]  [<ffffffff8121029f>] SyS_umount+0x9f/0x3c0
      [10178.036324]  [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
      [10178.036972] Code: 48 63 35 96 58 c8 00 89 c2 39 f0 0f 8d 86 fe ff ff 48 98 49 8b 0f 48 03 0c c5 e0 38 d6 81 f6 41 20 01 74 cd 0f 1f 44 00 00 f3 90 <f6> 41 20 01 75 f8 48 63 35 65 58 c8 00 eb b7 0f b6 4d cc 4c 89 
      [10178.040955] Kernel panic - not syncing: softlockup: hung tasks
      [10178.041604] CPU: 7 PID: 32081 Comm: umount Tainted: P           OEL ------------   3.10.0-debug #2
      [10178.042845] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [10178.043491]  ffffffff81a6f14b 000000003f597982 ffff88033e4e3df8 ffffffff816fd3e4
      [10178.044714]  ffff88033e4e3e78 ffffffff816f8c34 0000000000000008 ffff88033e4e3e88
      [10178.045936]  ffff88033e4e3e28 000000003f597982 ffff88033e4e3e47 0000000000000000
      [10178.047168] Call Trace:
      [10178.047745]  <IRQ>  [<ffffffff816fd3e4>] dump_stack+0x19/0x1b
      [10178.048400]  [<ffffffff816f8c34>] panic+0xd8/0x1e7
      [10178.049033]  [<ffffffff81121f86>] watchdog_timer_fn+0x1b6/0x1c0
      [10178.049683]  [<ffffffff81121dd0>] ? watchdog_enable+0xc0/0xc0
      [10178.050317]  [<ffffffff810a74c9>] __hrtimer_run_queues+0xf9/0x2d0
      [10178.050975]  [<ffffffff810a7b70>] hrtimer_interrupt+0xb0/0x1e0
      [10178.051753]  [<ffffffff817115dc>] ? call_softirq+0x1c/0x30
      [10178.052385]  [<ffffffff81067430>] ? flush_tlb_func+0xb0/0xb0
      [10178.053036]  [<ffffffff8104ad97>] local_apic_timer_interrupt+0x37/0x60
      [10178.053711]  [<ffffffff817121cf>] smp_apic_timer_interrupt+0x3f/0x60
      [10178.054361]  [<ffffffff8171091d>] apic_timer_interrupt+0x6d/0x80
      [10178.055014]  <EOI>  [<ffffffff810ebfc2>] ? smp_call_function_many+0x202/0x260
      [10178.056743]  [<ffffffff81067430>] ? flush_tlb_func+0xb0/0xb0
      [10178.057374]  [<ffffffff810ec057>] smp_call_function+0x37/0x40
      [10178.058025]  [<ffffffff810ec08b>] on_each_cpu+0x2b/0x70
      [10178.058664]  [<ffffffff81067869>] flush_tlb_kernel_range+0x59/0xa0
      [10178.059310]  [<ffffffff811b042b>] remove_vm_area+0x6b/0xa0
      [10178.059955]  [<ffffffff811b048e>] __vunmap+0x2e/0x120
      [10178.060588]  [<ffffffff811b0608>] vfree+0x48/0x90
      [10178.061228]  [<ffffffffa02a8b6b>] cfs_hash_buckets_free+0x19b/0x220 [libcfs]
      [10178.061915]  [<ffffffffa02ab11b>] cfs_hash_putref+0x2ab/0x500 [libcfs]
      [10178.062581]  [<ffffffff81379078>] ? kobject_release+0x98/0x1b0
      [10178.063303]  [<ffffffffa0552a2c>] ldlm_namespace_free_post+0x6c/0x2a0 [ptlrpc]
      [10178.064530]  [<ffffffffa0c6ab8e>] mdt_device_fini+0x6fe/0xfb0 [mdt]
      [10178.065233]  [<ffffffffa03b2b64>] class_cleanup+0x7b4/0xcf0 [obdclass]
      [10178.065917]  [<ffffffffa03b50dd>] class_process_config+0x19cd/0x23b0 [obdclass]
      [10178.067241]  [<ffffffff811cd4f9>] ? __kmalloc+0x649/0x660
      [10178.067908]  [<ffffffffa02a2cc7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [10178.068604]  [<ffffffffa03b5c86>] class_manual_cleanup+0x1c6/0x6d0 [obdclass]
      [10178.069298]  [<ffffffffa03e34de>] server_put_super+0x8ae/0xca0 [obdclass]
      [10178.069980]  [<ffffffff811efa46>] generic_shutdown_super+0x56/0xe0
      [10178.070683]  [<ffffffff811efe22>] kill_anon_super+0x12/0x20
      [10178.071457]  [<ffffffffa03b83c2>] lustre_kill_super+0x32/0x50 [obdclass]
      [10178.072271]  [<ffffffff811f0329>] deactivate_locked_super+0x49/0x60
      [10178.073072]  [<ffffffff811f0926>] deactivate_super+0x46/0x60
      [10178.073776]  [<ffffffff8120f115>] mntput_no_expire+0xc5/0x120
      [10178.074410]  [<ffffffff8121029f>] SyS_umount+0x9f/0x3c0
      [10178.076646]  [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
      

      Crash dump and modules in /exports/crashdumps/"192.168.123.150-2017-07-29-04:37:56" on onylx-68
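
      The combination of "spinlock already unlocked" inside __wake_up() called from dio_complete_routine(), while the umount thread is concurrently tearing down structures, is consistent with the classic wake-after-free pattern: the I/O completion callback calls wake_up() on a wait queue embedded in a structure that the woken (or independently-proceeding) waiter frees immediately, so the spin_unlock inside __wake_up() can land on memory that is no longer valid. The pseudocode below is only an illustrative sketch of that interleaving; the names (dreq, inflight) are hypothetical and not taken from the actual osd_ldiskfs code, and this is one plausible pattern, not a confirmed root cause:

      /* Waiter (thread that issued the direct I/O): */
      wait_event(dreq->wait, atomic_read(&dreq->inflight) == 0);
      kfree(dreq);                  /* structure freed as soon as the wait returns */

      /* Completion callback (cf. dio_complete_routine), racing on another CPU: */
      if (atomic_dec_and_test(&dreq->inflight))
              wake_up(&dreq->wait); /* __wake_up() locks dreq->wait.lock, but the
                                     * waiter can observe inflight == 0 on its own,
                                     * return from wait_event(), and free dreq
                                     * before the spin_unlock inside __wake_up()
                                     * executes, so the unlock touches freed memory */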


          People

            Assignee: WC Triage (wc-triage)
            Reporter: Oleg Drokin (green)
