[LU-13381] Crash in LNet while configuring net Created: 23/Mar/20  Updated: 07/Jul/22  Resolved: 07/Jul/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Amir Shehata (Inactive) Assignee: Serguei Smirnov
Resolution: Cannot Reproduce Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

Description

LUTF Script: dlc_libconfig_24

Running this script causes LNet to crash with the following stack trace:

#0  delay_tsc (__loops=<optimized out>) at arch/x86/lib/delay.c:77
#1  0xffffffff813907fd in __delay (loops=<optimized out>) at arch/x86/lib/delay.c:108
#2  __const_udelay (xloops=<optimized out>, xloops@entry=4295000) at arch/x86/lib/delay.c:122
#3  0xffffffff81774a96 in panic (fmt=fmt@entry=0x3326 <irq_stack_union+13094> <error: Cannot access memory at address 0x3326>) at kernel/panic.c:279
#4  0xffffffffc05000ab in lbug_with_loc (msgdata=msgdata@entry=0x16080 <cpu_stopper+32>)
    at /home/ashehata/LustreBuild/mr-lutf/libcfs/libcfs/linux-debug.c:104
#5  0xffffffffc0548790 in lnet_ni_unlink_locked (ni=0xffff8800acb4c600) at /home/ashehata/LustreBuild/mr-lutf/lnet/lnet/api-ni.c:2015
#6  lnet_shutdown_lndni (ni=ni@entry=0xffff8800acb4c600) at /home/ashehata/LustreBuild/mr-lutf/lnet/lnet/api-ni.c:2099
#7  0xffffffffc054b477 in lnet_startup_lndni (tun=0xffff8800bb984abc, ni=0xffff8800acb4c600)
    at /home/ashehata/LustreBuild/mr-lutf/lnet/lnet/api-ni.c:2239
#8  lnet_startup_lndnet (net=net@entry=0xffff88003656ff00, tun=0xffff8800bb984abc) at /home/ashehata/LustreBuild/mr-lutf/lnet/lnet/api-ni.c:2375
#9  0xffffffffc054c795 in lnet_add_net_common (net=net@entry=0xffff88003656ff00, tun=tun@entry=0xffff8800bb984aa8)
    at /home/ashehata/LustreBuild/mr-lutf/lnet/lnet/api-ni.c:3122
#10 0xffffffffc054f556 in lnet_dyn_add_ni (conf=conf@entry=0xffff8800bb984000) at /home/ashehata/LustreBuild/mr-lutf/lnet/lnet/api-ni.c:3240
#11 0xffffffffc056d3d8 in lnet_dyn_configure_ni (hdr=0xffff8800bb984000) at /home/ashehata/LustreBuild/mr-lutf/lnet/lnet/module.c:146
#12 lnet_ioctl (nb=<optimized out>, cmd=3233310047, vdata=0xffff8800bb984000) at /home/ashehata/LustreBuild/mr-lutf/lnet/lnet/module.c:208
#13 0xffffffff81788b6f in notifier_call_chain (nl=nl@entry=0xffffffffc050cd20, val=val@entry=3233310047, v=v@entry=0xffff8800bb984000, 
    nr_to_call=nr_to_call@entry=-1, nr_calls=nr_calls@entry=0x0 <irq_stack_union>) at kernel/notifier.c:93
#14 0xffffffff810cc5ad in __blocking_notifier_call_chain (nh=0xffffffffc050cd00, val=3233310047, v=0xffff8800bb984000, 
    nr_to_call=nr_to_call@entry=-1, nr_calls=nr_calls@entry=0x0 <irq_stack_union>) at kernel/notifier.c:314
#15 0xffffffff810cc5e6 in blocking_notifier_call_chain (nh=<optimized out>, val=<optimized out>, v=<optimized out>) at kernel/notifier.c:325
#16 0xffffffffc04ea8c1 in libcfs_psdev_ioctl ()
#17 0xffffffff8125fb40 in vfs_ioctl (arg=<optimized out>, cmd=<optimized out>, filp=<optimized out>) at fs/ioctl.c:43
#18 do_vfs_ioctl (filp=filp@entry=0xffff8800af691400, fd=fd@entry=3, cmd=cmd@entry=3233310047, arg=arg@entry=26972208) at fs/ioctl.c:631
#19 0xffffffff8125fde1 in SYSC_ioctl (arg=26972208, cmd=3233310047, fd=3) at fs/ioctl.c:646
#20 SyS_ioctl (fd=3, cmd=3233310047, arg=26972208) at fs/ioctl.c:637
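
The cmd value 3233310047 that appears in frames #12 through #20 can be split into the standard Linux ioctl fields to see what kind of request reached lnet_ioctl(). The following is a minimal sketch, assuming the generic x86 ioctl encoding (8-bit nr, 8-bit type, 14-bit size, 2-bit direction); the struct and function names are hypothetical and are not part of Lustre.

```c
#include <stdint.h>

/* Hypothetical helper type, not a Lustre structure.
 * Assumed layout (generic Linux ioctl encoding on x86):
 * bits 0-7 nr, bits 8-15 type, bits 16-29 size, bits 30-31 direction. */
struct ioctl_fields {
	unsigned dir;   /* 1 = write, 2 = read, 3 = read and write */
	unsigned size;  /* size in bytes of the argument structure */
	unsigned type;  /* subsystem "magic" character */
	unsigned nr;    /* command number within the subsystem */
};

/* Split a raw ioctl command number into its four fields. */
static struct ioctl_fields decode_ioctl(uint32_t cmd)
{
	struct ioctl_fields f;

	f.nr   = cmd & 0xff;
	f.type = (cmd >> 8) & 0xff;
	f.size = (cmd >> 16) & 0x3fff;
	f.dir  = (cmd >> 30) & 0x3;
	return f;
}
```

Calling decode_ioctl(3233310047u) gives dir 3 (read and write), size 184, type 'e' and nr 95, since 3233310047 is 0xC0B8655F.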

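The assert is reached through this error path in api-ni.c. Frame #7 (api-ni.c:2239) is the lnet_shutdown_lndni() call below; the LBUG itself fires in lnet_ni_unlink_locked() (api-ni.c:2015):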

2230 	if (ni->ni_net->net_tunables.lct_peer_tx_credits == 0 ||
2231 	    ni->ni_net->net_tunables.lct_max_tx_credits == 0) {
2232 		LCONSOLE_ERROR_MSG(0x107, "LNI %s has no %scredits\n",
2233 				   libcfs_lnd2str(net->net_lnd->lnd_type),
2234 				   ni->ni_net->net_tunables.lct_peer_tx_credits == 0 ?
2235 					"" : "per-peer ");
2236 		/* shutdown the NI since if we get here then it must've already
2237 		 * been started
2238 		 */
2239 		lnet_shutdown_lndni(ni);
2240 		return -EINVAL;
2241 	}
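
The condition that leads into this error path can be isolated as a small userspace sketch. It only mirrors the zero-credit test and the message prefix selection from the excerpt above; it does not model lnet_shutdown_lndni() or the LBUG, and the names tunables_sketch and check_credits are hypothetical stand-ins, not LNet symbols.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the two tunables tested in the excerpt. */
struct tunables_sketch {
	int peer_tx_credits;
	int max_tx_credits;
};

/* Returns 0 when both credit values are nonzero, -EINVAL otherwise.
 * On failure, *which (if non-NULL) receives the prefix the excerpt's
 * format string would insert before "credits": "" when
 * peer_tx_credits is zero, "per-peer " otherwise. */
static int check_credits(const struct tunables_sketch *t, const char **which)
{
	if (t->peer_tx_credits == 0 || t->max_tx_credits == 0) {
		if (which != NULL)
			*which = t->peer_tx_credits == 0 ? "" : "per-peer ";
		return -EINVAL;
	}
	return 0;
}
```

With this sketch, a configuration where either credit value is zero takes the -EINVAL branch, which in the real code is the point where the already started NI is shut down again.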


Comments
Comment by Andreas Dilger [ 07/Jul/22 ]

Haven't seen this again.
