[LU-11998] ASSERTION( req->rq_phase == expected_phase ) failed in sanity 411 Created: 24/Feb/19  Updated: 24/Feb/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Oleg Drokin Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

In my testing this is one of the more frequent failures, so I have a lot of crashdumps if needed.

Basically it unfolds like this:

[17452.428803] Lustre: DEBUG MARKER: == sanity test 411: Slab allocation error with cgroup does not LBUG ================================== 06:38:53 (1550662733)
[17460.153247] SLAB: Unable to allocate memory on node 0 (gfp=0x100050)
[17460.173293]   cache: kmalloc-512(0:osc_slab_alloc), object size: 4096, order: 0
[17460.177182]   node 0: slabs: 95/95, objs: 95/95, free: 0
[17460.338187] SLAB: Unable to allocate memory on node 0 (gfp=0x180050)
[17460.357819]   cache: radix_tree_node(0:osc_slab_alloc), object size: 4096, order: 0
[17460.360806]   node 0: slabs: 20/20, objs: 20/20, free: 0
[17460.404582] SLAB: Unable to allocate memory on node 0 (gfp=0x100050)
[17460.407022]   cache: kmalloc-ac(0:osc_slab_alloc), object size: 64, order: 0
[17460.413337]   node 0: slabs: 2/2, objs: 118/118, free: 0
[17460.606578] SLAB: Unable to allocate memory on node 0 (gfp=0x180050)
[17460.624043]   cache: radix_tree_node(0:osc_slab_alloc), object size: 4096, order: 0
[17460.625738]   node 0: slabs: 37/37, objs: 37/37, free: 0
[17460.906768] SLAB: Unable to allocate memory on node 0 (gfp=0x100000)
[17460.907221]   cache: kmalloc-ac(0:osc_slab_alloc), object size: 64, order: 0
[17460.907221]   node 0: slabs: 3/3, objs: 177/177, free: 0
[17460.932028] SLAB: Unable to allocate memory on node 0 (gfp=0x100000)
[17460.933775]   cache: kmalloc-ac(0:osc_slab_alloc), object size: 64, order: 0
[17460.933775]   node 0: slabs: 3/3, objs: 177/177, free: 0
[17461.005654] SLAB: Unable to allocate memory on node 0 (gfp=0x100050)
[17461.025645]   cache: osc_extent_kmem(0:osc_slab_alloc), object size: 216, order: 0
[17461.030831]   node 0: slabs: 4/4, objs: 72/72, free: 0
[17461.293822] SLAB: Unable to allocate memory on node 0 (gfp=0x100020)
[17461.294008]   cache: ptlrpc_cache(0:osc_slab_alloc), object size: 4096, order: 0
[17461.294008]   node 0: slabs: 1/1, objs: 1/1, free: 0
[17461.316586] LustreError: 20110:0:(events.c:326:request_in_callback()) Can't allocate incoming request descriptor: Dropping mdt_readpage RPC from 12345-0@lo
[17461.319280] SLAB: Unable to allocate memory on node 0 (gfp=0x100050)
[17461.321488]   cache: kmalloc-256(0:osc_slab_alloc), object size: 4096, order: 0
[17461.324562]   node 0: slabs: 0/0, objs: 0/0, free: 0
[17461.325578] LustreError: 20110:0:(client.c:1045:ptlrpc_set_destroy()) ASSERTION( req->rq_phase == expected_phase ) failed: 
[17461.328245] LustreError: 20110:0:(client.c:1045:ptlrpc_set_destroy()) LBUG
[17461.330257] Pid: 20110, comm: dd 3.10.0-7.6-debug #1 SMP Wed Nov 7 21:55:08 EST 2018
[17461.333789] Call Trace:
[17461.335607]  [<ffffffffa029f7dc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[17461.342130]  [<ffffffffa029f88c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[17461.343898]  [<ffffffffa0651891>] ptlrpc_set_destroy+0x331/0x410 [ptlrpc]
[17461.346113]  [<ffffffffa0657c2d>] ptlrpc_queue_wait+0x8d/0x230 [ptlrpc]
[17461.347822]  [<ffffffffa092d82e>] mdc_close+0x1ee/0x990 [mdc]
[17461.348764]  [<ffffffffa05bf8fe>] lmv_close+0x17e/0x2a0 [lmv]
[17461.349849]  [<ffffffffa143b0a1>] ll_close_inode_openhandle+0x2e1/0xcf0 [lustre]
[17461.352105]  [<ffffffffa143fed0>] ll_md_real_close+0xf0/0x1e0 [lustre]
[17461.354109]  [<ffffffffa14405d5>] ll_file_release+0x615/0x8c0 [lustre]
[17461.357071]  [<ffffffff812386bc>] __fput+0xfc/0x300
[17461.360240]  [<ffffffff8123899e>] ____fput+0xe/0x10
[17461.361226]  [<ffffffff810b1885>] task_work_run+0xb5/0xf0
[17461.362386]  [<ffffffff8102bc22>] do_notify_resume+0x92/0xb0
[17461.364703]  [<ffffffff817c5158>] int_signal+0x12/0x17
[17461.366681]  [<ffffffffffffffff>] 0xffffffffffffffff
[17461.368926] Kernel panic - not syncing: LBUG


 Comments   
Comment by Alex Zhuravlev [ 24/Feb/19 ]

Does this happen only in sanity/411?
