[LU-11859] panic on lnetctl net add Created: 14/Jan/19  Updated: 15/Jan/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Chris Horn Assignee: Sonia Sharma (Inactive)
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
sles15s01:~ # modprobe lnet
sles15s01:~ # lnetctl lnet configure
sles15s01:~ # lnetctl net add --ip2net "tcp1(eth0) 192.168.2.[21-50]"
sles15s01:~ # lnetctl net add --ip2net "tcp1 192.168.2.[21-50]" --if eth0

That second net add command causes a panic:

[  465.582968] libcfs: loading out-of-tree module taints kernel.
[  465.583233] libcfs: module verification failed: signature and/or required key missing - tainting kernel
[  465.588080] LNet: HW NUMA nodes: 1, HW CPU cores: 8, npartitions: 4
[  465.591029] alg: No test for adler32 (adler32-zlib)
[  480.617562] LNet: Added LNI 192.168.2.21@tcp1 [8/256/0/180]
[  480.617645] LNet: Accept secure, port 988
[  486.991890] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  486.991960] IP: strlen+0x0/0x20
[  486.991984] PGD 80000007f4f90067 P4D 80000007f4f90067 PUD 7f363c067 PMD 0
[  486.992035] Oops: 0000 [#1] SMP PTI
[  486.992064] CPU: 2 PID: 2023 Comm: lnetctl Tainted: G           OE      4.12.14-25.25-default #1 SLE15 (unreleased)
[  486.992145] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
[  486.992218] task: ffff8c4e44420bc0 task.stack: ffffa15f83cb0000
[  486.992260] RIP: 0010:strlen+0x0/0x20
[  486.992287] RSP: 0018:ffffa15f83cb3ca8 EFLAGS: 00010282
[  486.992324] RAX: ffff8c53daa74880 RBX: ffff8c54d047cc00 RCX: ffff8c53daa74cd8
[  486.992372] RDX: ffff8c53daa74cd8 RSI: 0000000000000000 RDI: 0000000000000000
[  486.992419] RBP: ffff8c4e44be9540 R08: 0000350a80010f30 R09: 0000000000000000
[  486.992467] R10: ffff8c54b4019940 R11: ffff8c4e47d0f000 R12: 0000000000000000
[  486.992515] R13: ffff8c53daa748c8 R14: 0000000000020001 R15: ffff8c54d047c400
[  486.993446] FS:  00007f751ba59740(0000) GS:ffff8c54ffc80000(0000) knlGS:0000000000000000
[  486.994381] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  486.995346] CR2: 0000000000000000 CR3: 0000000813774000 CR4: 00000000000406e0
[  486.996320] Call Trace:
[  486.996960]  lnet_ni_unique_net+0x2d/0x60 [lnet]
[  486.997604]  lnet_startup_lndnet+0xd4/0x920 [lnet]
[  486.998245]  lnet_add_net_common+0x116/0x360 [lnet]
[  486.998869]  lnet_dyn_add_ni+0x15f/0x240 [lnet]
[  486.999518]  lnet_ioctl+0x23b/0x250 [lnet]
[  487.000149]  notifier_call_chain+0x47/0x70
[  487.000788]  blocking_notifier_call_chain+0x3e/0x60
[  487.001486]  libcfs_ioctl+0x24e/0x470 [libcfs]
[  487.002162]  ? security_capable+0x47/0x60
[  487.002831]  libcfs_psdev_ioctl+0xbe/0xd0 [libcfs]
[  487.003532]  do_vfs_ioctl+0x90/0x5f0
[  487.004199]  SyS_ioctl+0x74/0x80
[  487.004880]  do_syscall_64+0x74/0x140
[  487.005518]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  487.006171] RIP: 0033:0x7f751ade9467
[  487.006815] RSP: 002b:00007ffd16f8e5c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[  487.007483] RAX: ffffffffffffffda RBX: 00000000007172b0 RCX: 00007f751ade9467
[  487.008139] RDX: 00000000007172b0 RSI: 00000000c0b8655f RDI: 0000000000000003
[  487.008782] RBP: 00000000007172b0 R08: 0000000000000000 R09: 0000000000000003
[  487.009422] R10: 000000000000055a R11: 0000000000000202 R12: 00007ffd16f8e730
[  487.010059] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffd16f8e610
[  487.010665] Code: f8 f6 82 e0 ea c8 8d 20 74 14 48 c7 c1 e0 ea c8 8d 48 83 c0 01 0f b6 10 f6 04 11 20 75 f3 f3 c3 66 66 2e 0f 1f 84 00 00 00 00 00 <80> 3f 00 48 89 f8 74 10 48 83 c7 01 80 3f 00 75 f7 48 29 c7 48
[  487.012853] Modules linked in: ksocklnd(OEN) lnet(OEN) libcfs(OEN) af_packet iscsi_ibft iscsi_boot_sysfs vmw_vsock_vmci_transport vsock sb_edac coretemp vmwgfx crc32_pclmul ttm crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd drm_kms_helper glue_helper drm cryptd joydev ppdev drm_panel_orientation_quirks syscopyarea sysfillrect vmw_balloon sysimgblt vmxnet3 pcspkr i2c_piix4 parport_pc shpchp parport vmw_vmci ac button fb_sys_fops ext4 crc16 jbd2 mbcache sr_mod cdrom ata_generic sd_mod ata_piix ahci libahci serio_raw mptspi libata scsi_transport_spi mptscsih mptbase floppy sg scsi_mod autofs4
[  487.017956] Supported: No, Unreleased kernel
[  487.018666] CR2: 0000000000000000


 Comments   
Comment by Amir Shehata (Inactive) [ 15/Jan/19 ]

I'm able to reproduce this. Investigating.

Comment by Amir Shehata (Inactive) [ 15/Jan/19 ]

There are 3 problems:

  1. The whole management of legacy ip2nets is counterintuitive. For example, if I have a node with three interfaces (192.168.122.110, 192.168.122.111 and 192.168.122.112) and I run lnetctl net add --ip2net "tcp1 192.168.122.111", the rule creates 192.168.122.110@tcp1. One would expect it to match the second interface and create 192.168.122.111@tcp1 instead.
  2. With Multi-Rail we can configure multiple interfaces on the same network. Should a rule like "tcp1 192.168.122.*", applied to the interfaces in the example above, create all three NIs on the same network?
  3. The crash is caused by not handling a NULL iface parameter in lnet_ni_unique_net(). We don't handle the case where there is a mixture of NIs, some with explicitly defined interface names and others without.
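
To illustrate problem 3, here is a minimal sketch of a NULL-safe interface-name uniqueness check. It is not the actual lnet_ni_unique_net() code: struct ni_sketch and ni_iface_is_unique() are hypothetical names made up for this example, and treating a missing interface name as "never collides" is an assumed policy, not a decision taken in this ticket. The point is only that a NULL name must be filtered out before it reaches strlen() or strcmp(), which matches the oops above (strlen called with RDI = 0).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a configured NI; not the real struct lnet_ni. */
struct ni_sketch {
	const char *iface;	/* NULL when no interface name was given */
};

/*
 * Return true if iface does not collide with the interface name of any
 * existing NI. A NULL name on either side means "no explicit interface"
 * and is skipped instead of being handed to a string function.
 */
static bool ni_iface_is_unique(const struct ni_sketch *nis, size_t count,
			       const char *iface)
{
	size_t i;

	if (iface == NULL)
		return true;	/* assumed policy: nothing to compare against */

	for (i = 0; i < count; i++) {
		if (nis[i].iface == NULL)
			continue;	/* NI was added without an interface name */
		if (strcmp(nis[i].iface, iface) == 0)
			return false;
	}
	return true;
}
```

With one NI on "eth0" and one NI without a name, a lookup for "eth0" reports a collision, while a lookup with a NULL name returns without dereferencing anything.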

I think we need to decide whether we want to keep legacy ip2nets. ip2nets has been reimplemented in user space with somewhat more predictable results. And if we do keep legacy ip2nets, do we want to keep the original behavior?
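
For discussion, the behavior one would expect in problem 1 can be sketched as follows: match the rule's address pattern against every local interface address and pick the interface that actually matches. This is only an illustration, not the kernel or user space ip2nets parser. struct iface_sketch, ip_matches() and first_matching_iface() are hypothetical names, the interface names in the usage note are made up, and the pattern grammar is assumed to be limited to dotted fields that are a number, "*" or "[lo-hi]".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical description of a local interface. */
struct iface_sketch {
	const char *name;
	unsigned char ip[4];
};

/* Match an IPv4 address against a pattern such as "192.168.2.[21-50]". */
static bool ip_matches(const char *pat, const unsigned char ip[4])
{
	int i;

	for (i = 0; i < 4; i++) {
		unsigned int lo, hi;
		int used = 0;

		if (*pat == '*') {
			lo = 0;
			hi = 255;
			used = 1;
		} else if (sscanf(pat, "[%u-%u]%n", &lo, &hi, &used) == 2 &&
			   used > 0) {
			/* range field: lo and hi are already set */
		} else if (sscanf(pat, "%u%n", &lo, &used) == 1 && used > 0) {
			hi = lo;	/* plain number */
		} else {
			return false;	/* malformed field */
		}

		if (ip[i] < lo || ip[i] > hi)
			return false;

		pat += used;
		if (i < 3) {
			if (*pat != '.')
				return false;
			pat++;
		}
	}
	return *pat == '\0';
}

/* Return the index of the first interface whose address matches, or -1. */
static int first_matching_iface(const char *pat,
				const struct iface_sketch *ifs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (ip_matches(pat, ifs[i].ip))
			return (int)i;
	return -1;
}
```

With three interfaces at 192.168.122.110, .111 and .112, the pattern "192.168.122.111" selects the second interface rather than the first. For the Multi-Rail question in problem 2, the same ip_matches() helper could be used to collect every matching interface instead of stopping at the first one.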
