Details
- Bug
- Resolution: Cannot Reproduce
- Major
- None
- Lustre 2.5.3
- None
- 3
- 17712
Description
Hi,
While unmounting an MDT, we hit the following bug, which crashed the server:
<4>------------[ cut here ]------------
<4>WARNING: at lib/list_debug.c:48 list_del+0x6e/0xa0() (Not tainted)
<4>Hardware name: bullx
<4>list_del corruption. prev->next should be ffff88030b092a90, but was 303054534f2d7077
<4>Modules linked in: osp(U) mdd(U) lfsck(U) lod(U) mdt(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) lquota(U) ldiskfs(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic crc32c_intel libcfs(U) nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipmi_devintf ipmi_si ipmi_msghandler acpi_cpufreq freq_table mperf rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ipv6 ib_uverbs(U) ib_umad(U) mlx5_ib(U) mlx5_core(U) mlx4_ib(U) ib_sa(U) mlx4_core(U) ib_mthca(U) ib_mad(U) ib_core(U) dm_round_robin scsi_dh_emc dm_multipath mic(U) uinput raid0 ses enclosure serio_raw compat(U) cxgb3 mdio lpfc scsi_transport_fc scsi_tgt igb i2c_algo_bit i2c_core ptp pps_core sg lpc_ich mfd_core ioatdma dca shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom aacraid ahci usb_storage ata_generic pata_jmicron dm_mirror dm_region_hash dm_log dm_mod megaraid_sas [last unloaded: scsi_wait_scan]
<4>Pid: 24459, comm: umount Not tainted 2.6.32-504.8.1.el6.Bull.70.x86_64 #1
<4>Call Trace:
<4> [<ffffffff81074df7>] ? warn_slowpath_common+0x87/0xc0
<4> [<ffffffff81074ee6>] ? warn_slowpath_fmt+0x46/0x50
<4> [<ffffffff8129f0de>] ? list_del+0x6e/0xa0
<4> [<ffffffff811a8372>] ? d_kill+0x22/0x60
<4> [<ffffffff811a8716>] ? __shrink_dcache_sb+0x366/0x3c0
<4> [<ffffffff811a8962>] ? shrink_dcache_sb+0x12/0x20
<4> [<ffffffffa0e53f76>] ? osd_umount+0x56/0x150 [osd_ldiskfs]
<4> [<ffffffffa0e5b2f7>] ? osd_device_fini+0x147/0x190 [osd_ldiskfs]
<4> [<ffffffffa0745a73>] ? class_cleanup+0x573/0xd30 [obdclass]
<4> [<ffffffffa071c0a6>] ? class_name2dev+0x56/0xe0 [obdclass]
<4> [<ffffffffa074779a>] ? class_process_config+0x156a/0x1ad0 [obdclass]
<4> [<ffffffffa07408f3>] ? lustre_cfg_new+0x2d3/0x6e0 [obdclass]
<4> [<ffffffffa0747e79>] ? class_manual_cleanup+0x179/0x6f0 [obdclass]
<4> [<ffffffffa071a57b>] ? class_export_put+0x10b/0x2c0 [obdclass]
<4> [<ffffffffa0e5d8d5>] ? osd_obd_disconnect+0x1c5/0x1d0 [osd_ldiskfs]
<4> [<ffffffffa074a41b>] ? lustre_put_lsi+0x1ab/0x11a0 [obdclass]
<4> [<ffffffffa07529c8>] ? lustre_common_put_super+0x5d8/0xbf0 [obdclass]
<4> [<ffffffffa07830dd>] ? server_put_super+0x1bd/0xf60 [obdclass]
<4> [<ffffffff811909bb>] ? generic_shutdown_super+0x5b/0xe0
<4> [<ffffffff81190aa6>] ? kill_anon_super+0x16/0x60
<4> [<ffffffffa0749d26>] ? lustre_kill_super+0x36/0x60 [obdclass]
<4> [<ffffffff81191247>] ? deactivate_super+0x57/0x80
<4> [<ffffffff811b0e7f>] ? mntput_no_expire+0xbf/0x110
<4> [<ffffffff811b19cb>] ? sys_umount+0x7b/0x3a0
<4> [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
<4>---[ end trace 08665e30da4e39f3 ]---
<4>------------[ cut here ]------------
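A possible hint for triage: the value reported in prev->next (303054534f2d7077) is not a plausible kernel address. Read as eight little-endian bytes, as it would be stored in memory on x86_64, it is printable ASCII. The Python sketch below shows that decoding. It is only an illustration: treating the value as text, and reading the result as a fragment of a Lustre target name written over the dentry's list pointer, are assumptions that the dump does not confirm.

```python
import struct

# Value the list_del warning found in prev->next (copied from the log).
corrupt_ptr = 0x303054534F2D7077

# x86_64 is little-endian, so pack the 64-bit value in that byte order
# to recover the bytes as they were laid out in memory.
raw = struct.pack("<Q", corrupt_ptr)

# Assumption: the bytes are text rather than a pointer.
text = raw.decode("ascii")

print(raw.hex(), repr(text))
```

If this reading is right, the memory holding the dentry was overwritten with string data, which would suggest a buffer overrun or use-after-free rather than a race on the list itself.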
Here is the backtrace of the faulting process:
crash> bt
PID: 24459 TASK: ffff880338436040 CPU: 1 COMMAND: "umount"
#0 [ffff8800282239c0] machine_kexec at ffffffff8103b71b
#1 [ffff880028223a20] crash_kexec at ffffffff810c9852
#2 [ffff880028223af0] oops_end at ffffffff8152ec30
#3 [ffff880028223b20] no_context at ffffffff8104c80b
#4 [ffff880028223b70] __bad_area_nosemaphore at ffffffff8104ca95
#5 [ffff880028223bc0] bad_area_nosemaphore at ffffffff8104cb63
#6 [ffff880028223bd0] __do_page_fault at ffffffff8104d2bf
#7 [ffff880028223cf0] do_page_fault at ffffffff81530b7e
#8 [ffff880028223d20] page_fault at ffffffff8152df35
[exception RIP: _spin_lock+14]
RIP: ffffffff8152da1e RSP: ffff880028223dd0 RFLAGS: 00010002
RAX: 0000000000010000 RBX: 0000000000006450 RCX: 000000000019ae76
RDX: 0000000000005444 RSI: ffff88030b092740 RDI: 0000000000000040
RBP: ffff880028223dd0 R8: ffff88002822d140 R9: 0001407b60dde621
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000282
R13: ffff88030b092740 R14: ffff88063cd50480 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff880028223dd8] kmem_cache_free at ffffffff81175b0f
#10 [ffff880028223e48] __d_free at ffffffff811a82cf
#11 [ffff880028223e68] d_callback at ffffffff811a8c12
#12 [ffff880028223e78] __rcu_process_callbacks at ffffffff810f05f5
#13 [ffff880028223ec8] rcu_process_callbacks at ffffffff810f083b
#14 [ffff880028223ed8] __do_softirq at ffffffff8107d8b1
#15 [ffff880028223f48] call_softirq at ffffffff8100c30c
#16 [ffff880028223f60] do_softirq at ffffffff8100fb55
#17 [ffff880028223f80] irq_exit at ffffffff8107d765
#18 [ffff880028223f90] smp_apic_timer_interrupt at ffffffff815347ca
#19 [ffff880028223fb0] apic_timer_interrupt at ffffffff8100bb93
--- <IRQ stack> ---
#20 [ffff8802326db798] apic_timer_interrupt at ffffffff8100bb93
[exception RIP: vprintk+593]
RIP: ffffffff81075d51 RSP: ffff8802326db848 RFLAGS: 00000246
RAX: 0000000000010500 RBX: ffff8802326db8d8 RCX: 00000000000049b5
RDX: ffff880028220000 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffffffff8100bb8e R8: 0000000000000000 R9: ffffffff81649020
R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff81074f95
R13: ffff8802326db7d8 R14: 0000000000000046 R15: 0000000000000025
ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
#21 [ffff8802326db8e0] printk at ffffffff81529f80
#22 [ffff8802326db940] warn_slowpath_common at ffffffff81074dae
#23 [ffff8802326db980] warn_slowpath_fmt at ffffffff81074ee6
#24 [ffff8802326db9e0] list_del at ffffffff8129f0de
#25 [ffff8802326dba00] d_kill at ffffffff811a8372
#26 [ffff8802326dba20] __shrink_dcache_sb at ffffffff811a8716
#27 [ffff8802326dbac0] shrink_dcache_sb at ffffffff811a8962
#28 [ffff8802326dbad0] osd_umount at ffffffffa0e53f76 [osd_ldiskfs]
#29 [ffff8802326dbaf0] osd_device_fini at ffffffffa0e5b2f7 [osd_ldiskfs]
#30 [ffff8802326dbb10] class_cleanup at ffffffffa0745a73 [obdclass]
#31 [ffff8802326dbb90] class_process_config at ffffffffa074779a [obdclass]
#32 [ffff8802326dbc20] class_manual_cleanup at ffffffffa0747e79 [obdclass]
#33 [ffff8802326dbce0] osd_obd_disconnect at ffffffffa0e5d8d5 [osd_ldiskfs]
#34 [ffff8802326dbd20] lustre_put_lsi at ffffffffa074a41b [obdclass]
#35 [ffff8802326dbd50] lustre_common_put_super at ffffffffa07529c8 [obdclass]
#36 [ffff8802326dbdc0] server_put_super at ffffffffa07830dd [obdclass]
#37 [ffff8802326dbe30] generic_shutdown_super at ffffffff811909bb
#38 [ffff8802326dbe50] kill_anon_super at ffffffff81190aa6
#39 [ffff8802326dbe70] lustre_kill_super at ffffffffa0749d26 [obdclass]
#40 [ffff8802326dbe90] deactivate_super at ffffffff81191247
#41 [ffff8802326dbeb0] mntput_no_expire at ffffffff811b0e7f
#42 [ffff8802326dbee0] sys_umount at ffffffff811b19cb
#43 [ffff8802326dbf80] system_call_fastpath at ffffffff8100b072
RIP: 00007f42273d0997 RSP: 00007fff3f71d3a8 RFLAGS: 00010202
RAX: 00000000000000a6 RBX: ffffffff8100b072 RCX: 00000000fbad2498
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007f4229e83be0
RBP: 00007f4229e83bc0 R8: 00007f4229e83c00 R9: 0000000000000000
R10: 00007fff3f71d140 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 00007f4229e83c70
ORIG_RAX: 00000000000000a6 CS: 0033 SS: 002b
Sebastien.