[LU-5664] assertion in failure handling of LNetNIInit Created: 25/Sep/14  Updated: 19/Feb/15  Resolved: 19/Feb/15

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Liang Zhen (Inactive) Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-5568 kernel crash when when network initia... Resolved
Severity: 3
Rank (Obsolete): 15875

 Description   

I hit this in my testing. It seems the failure handling in LNetNIInit is not correct: for example, if some NIs have already been initialised before the failure, those NIs should be finalised before calling lnet_unprepare. Otherwise lnet_unprepare finds a non-empty the_lnet.ln_nis list and hits the assertion below.

LNetError: 2843:0:(api-ni.c:1505:lnet_startup_lndnis()) Can't load LND tcp, module ksocklnd, rc=256
LNetError: 2843:0:(api-ni.c:823:lnet_unprepare()) ASSERTION( list_empty(&the_lnet.ln_nis) ) failed: 
LNetError: 2843:0:(api-ni.c:823:lnet_unprepare()) LBUG
Kernel panic - not syncing: LBUG
Pid: 2843, comm: insmod Tainted: P           ---------------    2.6.32.431.lustre #1
Call Trace:
 [<ffffffff8152528a>] ? panic+0xa7/0x16f
 [<ffffffffa041aeeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
 [<ffffffffa04c0d6d>] ? lnet_unprepare+0x2ad/0x320 [lnet]
 [<ffffffffa04c4998>] ? LNetNIInit+0x1f8/0x3f0 [lnet]
 [<ffffffffa052a06e>] ? srpc_startup+0x5e/0x220 [lnet_selftest]
 [<ffffffffa052f585>] ? init_module+0x215/0x500 [lnet_selftest]
 [<ffffffffa052f370>] ? init_module+0x0/0x500 [lnet_selftest]
 [<ffffffff8100204c>] ? do_one_initcall+0x3c/0x1d0
 [<ffffffff810bc511>] ? sys_init_module+0xe1/0x250
 [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
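
The unwind order suggested in the description can be illustrated with a small user-space model. This is only a sketch, not the LNet code and not the patch referenced in the comments: every `sketch_*` name is hypothetical, and the NI list is reduced to a counter. It shows the error path shutting down any NIs that were started before the failure, so that the "no NIs left" invariant holds when the unprepare step runs.

```c
#include <assert.h>

/* Hypothetical stand-in for the global LNet state; the real structure
 * keeps a list of NIs, which is modelled here as a simple counter. */
struct sketch_lnet {
    int nis_started;   /* NIs currently started */
    int prepared;      /* nonzero between prepare and unprepare */
};

static struct sketch_lnet sketch_state;

static void sketch_prepare(void)
{
    sketch_state.nis_started = 0;
    sketch_state.prepared = 1;
}

/* Models the invariant from the log: no NI may remain at unprepare time. */
static void sketch_unprepare(void)
{
    assert(sketch_state.nis_started == 0);
    sketch_state.prepared = 0;
}

/* Start `count` NIs; fail when reaching index `fail_at` (negative: never). */
static int sketch_startup_nis(int count, int fail_at)
{
    for (int i = 0; i < count; i++) {
        if (i == fail_at)
            return -1;   /* e.g. an LND module could not be loaded */
        sketch_state.nis_started++;
    }
    return 0;
}

/* Finalise every NI that was started, including after a partial start. */
static void sketch_shutdown_nis(void)
{
    while (sketch_state.nis_started > 0)
        sketch_state.nis_started--;
}

/* Init path: on failure, unwind started NIs BEFORE calling unprepare. */
int sketch_ni_init(int count, int fail_at)
{
    int rc;

    sketch_prepare();
    rc = sketch_startup_nis(count, fail_at);
    if (rc != 0) {
        sketch_shutdown_nis();
        sketch_unprepare();
        return rc;
    }
    return 0;
}

/* Normal teardown uses the same order. */
void sketch_ni_fini(void)
{
    sketch_shutdown_nis();
    sketch_unprepare();
}

int sketch_nis_started(void)
{
    return sketch_state.nis_started;
}
```

In this model, `sketch_ni_init(3, 2)` starts two NIs, fails on the third, and returns -1 with nothing left started. Removing the `sketch_shutdown_nis()` call from the error path would make the assertion in `sketch_unprepare()` fire, which is the analogue of the LBUG above.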


 Comments   
Comment by Amir Shehata (Inactive) [ 25/Sep/14 ]

I believe this is a duplicate of LU-5568

There is already a patch to fix this issue:
http://review.whamcloud.com/#/c/11718/

Comment by Liang Zhen (Inactive) [ 26/Sep/14 ]

Thanks, Amir!
