Details
Type: Bug
Resolution: Duplicate
Priority: Major
Description
When running a debug kernel in Maloo, ZFS appears to be barely operational: it sleeps in atomic context and then dies, apparently from a spinlock deadlock or something similar (a soft lockup, which leads to a panic because of our kernel config).
05:18:30:[ 590.971378] BUG: sleeping function called from invalid context at kernel/mutex.c:104
05:18:30:[ 590.973218] in_atomic(): 1, irqs_disabled(): 0, pid: 32539, name: mdt00_002
05:18:30:[ 590.974751] CPU: 0 PID: 32539 Comm: mdt00_002 Tainted: P W OE ------------ 3.10.0-327.22.2.el7_lustre.x86_64 #1
05:18:30:[ 590.976645] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
05:18:30:[ 590.978113] ffff88004c77bf58 00000000d641fc8e ffff880039193860 ffffffff8164bed6
05:18:30:[ 590.979760] ffff880039193870 ffffffff810b5639 ffff880039193888 ffffffff81651220
05:18:30:[ 590.981415] ffff88004c77be88 ffff8800391938b0 ffffffffa03e0d80 ffff880025cf5f70
05:18:30:[ 590.983057] Call Trace:
05:18:30:[ 590.984316] [<ffffffff8164bed6>] dump_stack+0x19/0x1b
05:18:30:[ 590.985764] [<ffffffff810b5639>] __might_sleep+0xd9/0x100
05:18:30:[ 590.987246] [<ffffffff81651220>] mutex_lock+0x20/0x40
05:18:30:[ 590.988729] [<ffffffffa03e0d80>] sa_spill_rele+0x20/0xb0 [zfs]
05:18:30:[ 590.990223] [<ffffffffa0fcf39f>] osd_object_sa_dirty_rele+0xaf/0x110 [osd_zfs]
05:18:30:[ 590.991867] [<ffffffffa0fc7d20>] osd_trans_stop+0x2a0/0x530 [osd_zfs]
05:18:30:[ 590.993471] [<ffffffffa0e2da69>] top_trans_stop+0x99/0x8f0 [ptlrpc]
05:18:30:[ 590.995068] [<ffffffffa121cbda>] ? lod_attr_set+0xaa/0x920 [lod]
05:18:30:[ 590.996606] [<ffffffffa1202219>] lod_trans_stop+0x259/0x340 [lod]
05:18:30:[ 590.998149] [<ffffffffa1284ffd>] ? mdd_attr_set_internal+0x11d/0x2a0 [mdd]
05:18:30:[ 590.999737] [<ffffffffa128fa5a>] mdd_trans_stop+0x1a/0x1c [mdd]
05:18:30:[ 591.001275] [<ffffffffa127d85c>] mdd_create+0x104c/0x12b0 [mdd]
05:18:30:[ 591.002815] [<ffffffffa1154f19>] mdt_md_create+0x849/0xba0 [mdt]
05:18:30:[ 591.004373] [<ffffffffa0bac561>] ? lprocfs_job_stats_log+0xd1/0x600 [obdclass]
05:18:30:[ 591.006010] [<ffffffffa11553db>] mdt_reint_create+0x16b/0x350 [mdt]
05:18:30:[ 591.007596] [<ffffffffa11568e0>] mdt_reint_rec+0x80/0x210 [mdt]
05:18:30:[ 591.009238] [<ffffffffa1139e02>] mdt_reint_internal+0x582/0x970 [mdt]
05:18:30:[ 591.010814] [<ffffffffa1144b67>] mdt_reint+0x67/0x140 [mdt]
05:18:30:[ 591.012344] [<ffffffffa0e1a7e5>] tgt_request_handle+0x925/0x1330 [ptlrpc]
05:18:30:[ 591.013948] [<ffffffffa0dc824e>] ptlrpc_server_handle_request+0x22e/0xaa0 [ptlrpc]
05:18:30:[ 591.015621] [<ffffffffa0dc6aee>] ? ptlrpc_wait_event+0xae/0x350 [ptlrpc]
05:18:30:[ 591.017218] [<ffffffff810bcc92>] ? default_wake_function+0x12/0x20
05:18:30:[ 591.018769] [<ffffffff810b2cd8>] ? __wake_up_common+0x58/0x90
05:18:30:[ 591.020298] [<ffffffffa0dcc018>] ptlrpc_main+0xa58/0x1db0 [ptlrpc]
05:18:30:[ 591.021868] [<ffffffffa0dcb5c0>] ? ptlrpc_register_service+0xe60/0xe60 [ptlrpc]
05:18:30:[ 591.023511] [<ffffffff810a8a24>] kthread+0xe4/0xf0
05:18:30:[ 591.024956] [<ffffffff810a8940>] ? kthread_create_on_node+0x140/0x140
05:18:30:[ 591.026521] [<ffffffff8165d3d8>] ret_from_fork+0x58/0x90
05:18:30:[ 591.027984] [<ffffffff810a8940>] ? kthread_create_on_node+0x140/0x140
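For triaging traces like the one above, a small script can pull out the frames and show which sleeping call tripped the check. This is only a rough sketch, not part of any Lustre or ZFS tooling: the names `FRAME_RE`, `parse_frames` and `sleep_site` are made up for illustration, and the sample is a few frames copied from the trace in this ticket. It assumes the `[<address>] symbol+0xoff/0xlen [module]` frame format shown here and does not try to work out which lock put the thread into atomic context.

```python
import re

# Matches one stack frame such as
#   [<ffffffffa03e0d80>] sa_spill_rele+0x20/0xb0 [zfs]
# An optional "?" before the symbol marks a frame the kernel reports as unreliable.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?"
)


def parse_frames(log_text):
    """Return (symbol, module, reliable) tuples in the order they appear."""
    frames = []
    for match in FRAME_RE.finditer(log_text):
        unreliable, symbol, module = match.groups()
        # Frames without a [module] suffix come from the core kernel image.
        frames.append((symbol, module or "kernel", unreliable is None))
    return frames


def sleep_site(frames):
    """Return (sleeping call, its caller) following __might_sleep, or None."""
    reliable = [f for f in frames if f[2]]
    for i, (symbol, _, _) in enumerate(reliable):
        if symbol == "__might_sleep" and i + 2 < len(reliable):
            return reliable[i + 1], reliable[i + 2]
    return None


# A few frames copied from the trace in this ticket.
SAMPLE = """
[<ffffffff8164bed6>] dump_stack+0x19/0x1b
[<ffffffff810b5639>] __might_sleep+0xd9/0x100
[<ffffffff81651220>] mutex_lock+0x20/0x40
[<ffffffffa03e0d80>] sa_spill_rele+0x20/0xb0 [zfs]
[<ffffffffa0fcf39f>] osd_object_sa_dirty_rele+0xaf/0x110 [osd_zfs]
"""

frames = parse_frames(SAMPLE)
site = sleep_site(frames)
if site is not None:
    callee, caller = site
    print(f"{callee[0]} called from {caller[0]} [{caller[1]}]")
```

On the frames in this ticket it reports `mutex_lock` called from `sa_spill_rele` in `zfs`, which is reached from `osd_object_sa_dirty_rele` in `osd_zfs`. The trace alone does not show which lock made the context atomic.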
Examples:
https://testing.hpdd.intel.com/test_sessions/35c51d2c-5e25-11e6-b2e2-5254006e85c2
https://testing.hpdd.intel.com/test_sessions/d4afd7c4-5e48-11e6-b5b1-5254006e85c2