[LU-10554] trivial typo on lnetctl command line generates LBUG on lustre client Created: 23/Jan/18  Updated: 07/Jul/22  Resolved: 06/Feb/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.2
Fix Version/s: Lustre 2.11.0

Type: Bug Priority: Minor
Reporter: Alex Kulyavtsev Assignee: Sonia Sharma (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

lustre client 2.10.2
OS: slf6.8
kernel: 2.6.32-642.15.1.el6.x86_64

Custom rpm rebuild:
rpmbuild --rebuild --without servers --with lnet-dlc --with lustre-utils ./lustre-2.10.2-1.src.rpm


Issue Links:
Related
is related to LU-10151 lnetctl gives the worst configration ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

I was trying out lnetctl commands. I specified the interface as ib0 without a --net argument, and the client node crashed with an LBUG.
I think lnetctl should just report an error instead of triggering an LBUG.

[root@tev0509 rpmbuild]# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 192.168.176.72@o2ib
          status: up
          interfaces:
              0: ib0
# lnetctl lnet unconfigure
# lnetctl lnet configure
# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
# lnetctl net add --if ib0

Message from syslogd@tev0509 at Jan 23 15:30:15 ...
kernel:LNetError: 30614:0:(api-ni.c:1499:lnet_startup_lndnet()) ASSERTION( libcfs_isknown_lnd(lnd_type) ) failed:

Message from syslogd@tev0509 at Jan 23 15:30:15 ...
kernel:LNetError: 30614:0:(api-ni.c:1499:lnet_startup_lndnet()) LBUG

==========
kernel.log :

2018-01-23 15:30:15 Pid: 30614, comm: lnetctl
2018-01-23 15:30:15
2018-01-23 15:30:15 Call Trace:
2018-01-23 15:30:15 [<ffffffffa0a33885>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
2018-01-23 15:30:15 [<ffffffffa0a339cf>] lbug_with_loc+0x3f/0x90 [libcfs]
2018-01-23 15:30:15 [<ffffffffa0a95b38>] lnet_startup_lndnet+0x8b8/0x8c0 [lnet]
2018-01-23 15:30:15 [<ffffffffa0a4a65b>] ? cfs_percpt_lock+0x5b/0x110 [libcfs]
2018-01-23 15:30:16 [<ffffffffa0a96cf4>] lnet_add_net_common+0x134/0x480 [lnet]
2018-01-23 15:30:16 [<ffffffffa0a97354>] lnet_dyn_add_ni+0x194/0x1c0 [lnet]
2018-01-23 15:30:16 [<ffffffffa0a3f311>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
2018-01-23 15:30:16 [<ffffffffa0ab1668>] lnet_ioctl+0x268/0x290 [lnet]
2018-01-23 15:30:16 [<ffffffffa0a3d2b8>] libcfs_ioctl+0x118/0x4d0 [libcfs]
2018-01-23 15:30:16 [<ffffffffa0a39231>] libcfs_psdev_ioctl+0x51/0x100 [libcfs]
2018-01-23 15:30:16 [<ffffffff811af742>] vfs_ioctl+0x22/0xa0
2018-01-23 15:30:16 [<ffffffff811af8e4>] do_vfs_ioctl+0x84/0x580
2018-01-23 15:30:16 [<ffffffff811a7bc6>] ? final_putname+0x26/0x50
2018-01-23 15:30:16 [<ffffffff811afe61>] sys_ioctl+0x81/0xa0
2018-01-23 15:30:16 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
2018-01-23 15:30:16
2018-01-23 15:30:16 Kernel panic - not syncing: LBUG
2018-01-23 15:30:16 Pid: 30614, comm: lnetctl Tainted: P -- ------------ 2.6.32-642.15.1.el6.x86_64 #1
2018-01-23 15:30:16 Call Trace:
2018-01-23 15:30:16 [<ffffffff815484e1>] ? panic+0xa7/0x179
2018-01-23 15:30:16 [<ffffffffa0a339e6>] ? lbug_with_loc+0x56/0x90 [libcfs]
2018-01-23 15:30:16 [<ffffffffa0a95b38>] ? lnet_startup_lndnet+0x8b8/0x8c0 [lnet]
2018-01-23 15:30:16 [<ffffffffa0a4a65b>] ? cfs_percpt_lock+0x5b/0x110 [libcfs]
2018-01-23 15:30:16 [<ffffffffa0a96cf4>] ? lnet_add_net_common+0x134/0x480 [lnet]
2018-01-23 15:30:16 [<ffffffffa0a97354>] ? lnet_dyn_add_ni+0x194/0x1c0 [lnet]
2018-01-23 15:30:16 [<ffffffffa0a3f311>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
2018-01-23 15:30:16 [<ffffffffa0ab1668>] ? lnet_ioctl+0x268/0x290 [lnet]
2018-01-23 15:30:16 [<ffffffffa0a3d2b8>] ? libcfs_ioctl+0x118/0x4d0 [libcfs]
2018-01-23 15:30:16 [<ffffffffa0a39231>] ? libcfs_psdev_ioctl+0x51/0x100 [libcfs]
2018-01-23 15:30:16 [<ffffffff811af742>] ? vfs_ioctl+0x22/0xa0
2018-01-23 15:30:16 [<ffffffff811af8e4>] ? do_vfs_ioctl+0x84/0x580
2018-01-23 15:30:16 [<ffffffff811a7bc6>] ? final_putname+0x26/0x50
2018-01-23 15:30:16 [<ffffffff811afe61>] ? sys_ioctl+0x81/0xa0
2018-01-23 15:30:16 [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
2018-01-23 15:30:16 -----------[ cut here ]-----------
2018-01-23 15:30:16 WARNING: at arch/x86/kernel/smp.c:118



 Comments   
Comment by Andreas Dilger [ 24/Jan/18 ]

We should never LASSERT on data from userspace, sent over the network, or read from disk.
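
As an illustration of that rule, here is a minimal userspace sketch, not the actual LNet code. The names known_lnds, lnd_name_is_known() and startup_net() are hypothetical and only stand in for a startup routine that receives an LND name from an ioctl. The point is that an unknown or missing name is rejected with -EINVAL and the caller gets an error, rather than the condition being asserted.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table of known network driver names; not the real LNet list. */
static const char *const known_lnds[] = { "lo", "tcp", "o2ib" };

/* Returns 1 if name matches an entry in known_lnds, 0 otherwise. */
static int lnd_name_is_known(const char *name)
{
	size_t i;

	if (name == NULL)
		return 0;
	for (i = 0; i < sizeof(known_lnds) / sizeof(known_lnds[0]); i++)
		if (strcmp(name, known_lnds[i]) == 0)
			return 1;
	return 0;
}

/*
 * Hypothetical stand-in for a network startup routine fed by an ioctl.
 * Untrusted input is validated and rejected with -EINVAL instead of
 * being asserted on, so a bad request cannot take the node down.
 */
static int startup_net(const char *lnd_name)
{
	if (!lnd_name_is_known(lnd_name))
		return -EINVAL;	/* report the error back to the caller */

	/* ... real setup would happen here ... */
	return 0;
}
```

With this shape, a request that carries no usable network type fails the ioctl with an error code that the command line tool can print.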

Comment by Gerrit Updater [ 31/Jan/18 ]

Sonia Sharma (sonia.sharma@intel.com) uploaded a new patch: https://review.whamcloud.com/31100
Subject: LU-10554 lnet: Remove LASSERT on userspace data
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8ad3ae3c99cb600f6542387990809f79b58bbf85

Comment by Sonia Sharma (Inactive) [ 31/Jan/18 ]

This issue is already resolved by LU-10151, which adds a check for incomplete user data and returns an error when information is missing while adding an NI.
The above patch additionally removes the LASSERT on data from userspace, so that missing information is also checked for and handled gracefully in the kernel.
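
The user-space half of this can be pictured roughly as follows. This is only a sketch under assumptions: struct add_net_request and check_add_net_request() are hypothetical names, and the real lnetctl data structures and messages differ. It shows a request being checked for required fields before anything is sent to the kernel.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical request structure; the real lnetctl data layout differs. */
struct add_net_request {
	const char *net;	/* network name, e.g. "o2ib" */
	const char *intf;	/* interface name, e.g. "ib0" */
};

/*
 * Returns 0 if all required fields are present. Otherwise returns
 * -EINVAL and writes a short explanation into err (if err_len > 0).
 */
static int check_add_net_request(const struct add_net_request *req,
				 char *err, size_t err_len)
{
	if (req->net == NULL || req->net[0] == '\0') {
		snprintf(err, err_len, "missing network name");
		return -EINVAL;
	}
	if (req->intf == NULL || req->intf[0] == '\0') {
		snprintf(err, err_len, "missing interface name");
		return -EINVAL;
	}
	if (err_len > 0)
		err[0] = '\0';	/* no error to report */
	return 0;
}
```

A request built from a command line that names only the interface would be rejected here with a readable message, and the kernel-side check remains as a second line of defence for callers that bypass the tool.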

Comment by Gerrit Updater [ 06/Feb/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31100/
Subject: LU-10554 lnet: Remove LASSERT on userspace data
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 8059dbbe97a61e287efe0ae9d1f7767d362aa2d7

Comment by Peter Jones [ 06/Feb/18 ]

Landed for 2.11
