[LU-4327] tgt_ses_info()) ASSERTION( env->le_ses != ((void *)0) ) failed Created: 29/Nov/13  Updated: 06/Jan/14  Resolved: 26/Dec/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.6.0

Type: Bug Priority: Blocker
Reporter: Oleg Drokin Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-3467 Unified request handler on OST Resolved
Severity: 3
Rank (Obsolete): 11829

 Description   

I hit this soon after landing unified target support, running racer.

<0>[46139.810353] LustreError: 31320:0:(lu_target.h:129:tgt_ses_info()) ASSERTION( env->le_ses != ((void *)0) ) failed: 
<0>[46139.811605] LustreError: 31320:0:(lu_target.h:129:tgt_ses_info()) LBUG
<0>[46139.812210] Kernel panic - not syncing: LBUG in interrupt.
<0>[46139.812211] 
<4>[46139.813195] Pid: 31320, comm: ll_ost00_007 Not tainted 2.6.32-rhe6.4-debug2 #1
<4>[46139.815359] Call Trace:
<4>[46139.815359]  [<ffffffff814fade7>] ? panic+0xa7/0x16f
<4>[46139.815359]  [<ffffffffa0abbeed>] ? lbug_with_loc+0x8d/0xb0 [libcfs]
<4>[46139.815359]  [<ffffffffa14367d4>] ? tgt_punch_hpreq_lock_match+0x104/0x110 [ptlrpc]
<4>[46139.815359]  [<ffffffffa13c03f8>] ? ldlm_server_blocking_ast+0x1e8/0x880 [ptlrpc]
<4>[46139.815359]  [<ffffffffa1434f5b>] ? tgt_blocking_ast+0x7b/0x5e0 [ptlrpc]
<4>[46139.815359]  [<ffffffffa0ac7685>] ? libcfs_nid2str+0x155/0x160 [libcfs]
<4>[46139.815359]  [<ffffffffa1393e8d>] ? ldlm_work_bl_ast_lock+0xdd/0x290 [ptlrpc]
<4>[46139.815359]  [<ffffffffa13d453f>] ? ptlrpc_set_wait+0x6f/0x830 [ptlrpc]
<4>[46139.815359]  [<ffffffffa13d0ea8>] ? ptlrpc_prep_set+0x38/0x300 [ptlrpc]
<4>[46139.815359]  [<ffffffff81094e64>] ? __init_waitqueue_head+0x24/0x40
<4>[46139.815359]  [<ffffffffa13d0f8f>] ? ptlrpc_prep_set+0x11f/0x300 [ptlrpc]
<4>[46139.815359]  [<ffffffffa1393db0>] ? ldlm_work_bl_ast_lock+0x0/0x290 [ptlrpc]
<4>[46139.815359]  [<ffffffffa1396e3b>] ? ldlm_run_ast_work+0x1bb/0x440 [ptlrpc]
<4>[46139.815359]  [<ffffffffa13ada6f>] ? ldlm_process_extent_lock+0x1af/0xaa0 [ptlrpc]
<4>[46139.815359]  [<ffffffffa13963cc>] ? ldlm_lock_enqueue+0x38c/0x860 [ptlrpc]
<4>[46139.815359]  [<ffffffffa13bef1f>] ? ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
<4>[46139.815359]  [<ffffffffa1438012>] ? tgt_enqueue+0x62/0x1d0 [ptlrpc]
<4>[46139.815359]  [<ffffffffa143c2c4>] ? tgt_request_handle+0x224/0x9f0 [ptlrpc]
<4>[46139.815359]  [<ffffffffa13efdd3>] ? ptlrpc_main+0xcd3/0x1940 [ptlrpc]
<4>[46139.815359]  [<ffffffffa13ef100>] ? ptlrpc_main+0x0/0x1940 [ptlrpc]
<4>[46139.815359]  [<ffffffff81094726>] ? kthread+0x96/0xa0
<4>[46139.815359]  [<ffffffff8100c10a>] ? child_rip+0xa/0x20
<4>[46139.815359]  [<ffffffff81094690>] ? kthread+0x0/0xa0
<4>[46139.815359]  [<ffffffff8100c100>] ? child_rip+0x0/0x20

code branch in my tree: master-20131128
crashdump and modules are in /exports/crashdumps/192.168.10.210-2013-11-28-21\:30\:10

This might be slightly related to LU-2246 too, which failed with the same assertion, though via a totally different path.



 Comments   
Comment by Mikhail Pershin [ 30/Nov/13 ]

That doesn't look like the master branch: there is no tgt_punch_hpreq_lock_match() in master now, it has not landed yet: http://review.whamcloud.com/#/c/7383/26
Did this happen while you were testing that patch as a pre-landing check? Please clarify.

Comment by Mikhail Pershin [ 01/Dec/13 ]

Well, hpreq_lock_match() may be called for requests in the exp_hp_rpcs list, but a request is put on that list earlier, before processing, so it might have no thread and no corresponding lu_env. The patch was refreshed so that hpreq_lock_match() does not use the thread environment but takes everything from the request itself.
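
To illustrate the reasoning above, here is a minimal sketch, not the actual patch. All types and function names (sketch_env, sketch_request, match_via_env, match_via_request) are hypothetical stand-ins rather than real Lustre structures, and the extent overlap test is an assumed matching rule. It only shows why a lookup through the thread environment cannot work for a request that is queued but not yet being processed, while a lookup that reads the request itself can.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; these are NOT the real Lustre types. */
struct sketch_env {
	void *session;                    /* plays the role of env->le_ses */
};

struct sketch_thread {
	struct sketch_env *env;
};

struct sketch_request {
	struct sketch_thread *svc_thread; /* NULL while the request is only queued */
	uint64_t start;                   /* extent carried by the request itself */
	uint64_t end;
};

struct sketch_lock {
	uint64_t start;
	uint64_t end;
};

/* Assumed matching rule: the request extent overlaps the lock extent. */
static bool extents_overlap(const struct sketch_request *req,
			    const struct sketch_lock *lock)
{
	return req->start <= lock->end && req->end >= lock->start;
}

/*
 * Environment-based variant. It needs a service thread with a session,
 * which a queued request does not have yet. It returns -1 in the
 * situation where the code in this ticket hit the assertion.
 */
static int match_via_env(const struct sketch_request *req,
			 const struct sketch_lock *lock)
{
	if (req->svc_thread == NULL || req->svc_thread->env == NULL ||
	    req->svc_thread->env->session == NULL)
		return -1;

	return extents_overlap(req, lock) ? 1 : 0;
}

/*
 * Request-based variant. It reads only data stored in the request,
 * so it is valid whether or not a thread has picked the request up.
 */
static bool match_via_request(const struct sketch_request *req,
			      const struct sketch_lock *lock)
{
	assert(req != NULL && lock != NULL);
	return extents_overlap(req, lock);
}
```

In this sketch a request with svc_thread == NULL models an entry sitting in the high-priority list before processing starts: match_via_env() has nothing to look up, whereas match_via_request() still gives an answer.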

Comment by Oleg Drokin [ 26/Dec/13 ]

The problem was in a patch that had not landed; the patch has since been changed to fix this.

Comment by nasf (Inactive) [ 06/Jan/14 ]

Another failure instance:

https://maloo.whamcloud.com/test_sets/fe5a3d08-76ab-11e3-9ce8-52540035b04c
