Lustre / LU-5378

recovery-small test 57 crash on invalid spinlock


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.6.0, Lustre 2.7.0, Lustre 2.11.0
    • Component/s: None
    • Severity: 3
    • Rank: 14985

    Description

      This seems to reproduce from time to time on my systems.
      The crash below is likely due to lctl still reading the proc file while a parallel unmount frees the stats.

      <4>[ 7018.594828] Lustre: DEBUG MARKER: == recovery-small test 57: read procfs entries causes kernel crash == 17:20:23 (1405891223)
      <0>[ 7021.841908] BUG: spinlock bad magic on CPU#0, lctl/27044 (Not tainted)
      <4>[ 7021.842559] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
      <4>[ 7021.843161] last sysfs file: /sys/devices/system/cpu/possible
      <4>[ 7021.844006] CPU 0 
      <4>[ 7021.844006] Modules linked in: lustre ofd osp lod ost mdt mdd mgs nodemap osd_ldiskfs ldiskfs lquota lfsck obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass ksocklnd lnet libcfs exportfs jbd sha512_generic sha256_generic ext4 jbd2 mbcache virtio_balloon virtio_console i2c_piix4 i2c_core virtio_net virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
      <4>[ 7021.844006] 
      <4>[ 7021.844006] Pid: 27044, comm: lctl Not tainted 2.6.32-rhe6.5-debug #2 Red Hat KVM
      <4>[ 7021.844006] RIP: 0010:[<ffffffff81292771>]  [<ffffffff81292771>] spin_bug+0x81/0x100
      <4>[ 7021.844006] RSP: 0018:ffff880089aa1cd8  EFLAGS: 00010002
      <4>[ 7021.844006] RAX: 0000000000000050 RBX: ffff88007aeec348 RCX: 00000000ffffffff
      <4>[ 7021.844006] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000046
      <4>[ 7021.844006] RBP: ffff880089aa1cf8 R08: 0000000000000000 R09: 000000006b6b6b6b
      <4>[ 7021.844006] R10: 0736072e07340735 R11: 073907360720075b R12: 6b6b6b6b6b6b6b6b
      <4>[ 7021.844006] R13: ffffffff817e1c57 R14: 0000000000000000 R15: 6b6b6b6b6b6b6b6b
      <4>[ 7021.844006] FS:  00007f92cc6d4700(0000) GS:ffff880006200000(0000) knlGS:0000000000000000
      <4>[ 7021.844006] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4>[ 7021.844006] CR2: 00007f92cbf57800 CR3: 00000000748fa000 CR4: 00000000000006f0
      <4>[ 7021.844006] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>[ 7021.844006] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>[ 7021.844006] Process lctl (pid: 27044, threadinfo ffff880089aa0000, task ffff8800502183c0)
      <4>[ 7021.844006] Stack:
      <4>[ 7021.844006]  0000006cffffffff ffff88007aeec348 0000000000000000 ffff880089aa1da8
      <4>[ 7021.844006] <d> ffff880089aa1d48 ffffffff81292935 0000000000000000 ffff880089aa1e60
      <4>[ 7021.844006] <d> ffff880089aa1d38 0000000000000292 0000000000000000 ffff880089aa1da8
      <4>[ 7021.844006] Call Trace:
      <4>[ 7021.844006]  [<ffffffff81292935>] _raw_spin_lock+0xa5/0x180
      <4>[ 7021.844006]  [<ffffffff81516894>] _spin_lock_irqsave+0x24/0x30
      <4>[ 7021.844006]  [<ffffffffa09f0a51>] lprocfs_stats_collect+0x161/0x180 [obdclass]
      <4>[ 7021.844006]  [<ffffffffa09f0ae6>] lprocfs_stats_seq_show+0x76/0x150 [obdclass]
      <4>[ 7021.844006]  [<ffffffff81170393>] ? kmem_cache_alloc_trace+0x143/0x250
      <4>[ 7021.844006]  [<ffffffff811ae778>] seq_read+0xf8/0x420
      <4>[ 7021.844006]  [<ffffffff811ae680>] ? seq_read+0x0/0x420
      <4>[ 7021.844006]  [<ffffffff811f4ae5>] proc_reg_read+0x85/0xc0
      <4>[ 7021.844006]  [<ffffffff81189c95>] vfs_read+0xb5/0x1a0
      <4>[ 7021.844006]  [<ffffffff81189dd1>] sys_read+0x51/0x90
      <4>[ 7021.844006]  [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
      <4>[ 7021.844006] Code: 8d 8e a0 06 00 00 49 89 c1 4c 89 ee 31 c0 48 c7 c7 f8 1f 7e 81 65 8b 14 25 d8 e0 00 00 e8 72 06 28 00 4d 85 e4 44 8b 4b 08 74 6b <45> 8b 84 24 a8 04 00 00 49 8d 8c 24 a0 06 00 00 8b 53 04 48 89 
      <1>[ 7021.844006] RIP  [<ffffffff81292771>] spin_bug+0x81/0x100
      

      There's also the parallel unmount that does the freeing:

      PID: 27018  TASK: ffff8800836f80c0  CPU: 5   COMMAND: "umount"
       #0 [ffff880049b216f8] schedule at ffffffff815133ca
       #1 [ffff880049b217c0] schedule_timeout at ffffffff815142b5
       #2 [ffff880049b21870] wait_for_common at ffffffff81513f2b
       #3 [ffff880049b21900] wait_for_completion at ffffffff8151403d
       #4 [ffff880049b21910] remove_proc_entry at ffffffff811fb7a7
       #5 [ffff880049b219b0] lprocfs_remove_nolock at ffffffffa09efa20 [obdclass]
       #6 [ffff880049b219f0] lprocfs_remove at ffffffffa09efc15 [obdclass]
       #7 [ffff880049b21a10] lprocfs_obd_cleanup at ffffffffa09efc84 [obdclass]
       #8 [ffff880049b21a30] osc_precleanup at ffffffffa0ef94ec [osc]
       #9 [ffff880049b21a60] class_cleanup at ffffffffa0a085c3 [obdclass]
      #10 [ffff880049b21ae0] class_process_config at ffffffffa0a0a67a [obdclass]
      #11 [ffff880049b21b70] class_manual_cleanup at ffffffffa0a0ad59 [obdclass]
      #12 [ffff880049b21c30] lov_putref at ffffffffa0f83276 [lov]
      #13 [ffff880049b21cb0] lov_disconnect at ffffffffa0f8a8a2 [lov]
      #14 [ffff880049b21ce0] ll_put_super at ffffffffa07f51ce [lustre]
      #15 [ffff880049b21e30] generic_shutdown_super at ffffffff8118bb0b
      #16 [ffff880049b21e50] kill_anon_super at ffffffff8118bbf6
      #17 [ffff880049b21e70] lustre_kill_super at ffffffffa0a0cbda [obdclass]
      #18 [ffff880049b21e90] deactivate_super at ffffffff8118c397
      #19 [ffff880049b21eb0] mntput_no_expire at ffffffff811ab40f
      #20 [ffff880049b21ee0] sys_umount at ffffffff811abf7b
      #21 [ffff880049b21f80] system_call_fastpath at ffffffff8100b0b2
      

      This is probably somewhat similar to LU-106.

      Sample crashdump in /exports/crashdumps/192.168.10.223-2014-07-20-17:20:28
      tag in my tree: master-20140720

People

    Assignee: wc-triage (WC Triage)
    Reporter: green (Oleg Drokin)