Lustre / LU-19498

LNet: fail gracefully on an attempt to exceed lnet_interfaces_max

Details

    • Bug
    • Resolution: Unresolved
    • Medium
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      Creating more NIs than lnet_interfaces_max allows makes lnet_ping_target_install_locked() fail with -34 (-ERANGE), and the assertion on that return code triggers an LBUG:

      [  461.968115] LNet: Added LNI 10.1.2.51@tcp2 [8/256/0/180]
      [  461.968266] LNetError: 68600:0:(api-ni.c:2171:lnet_ping_target_install_locked()) ASSERTION( !rc ) failed: Invalid ping target: -34
      [  461.968425] LNetError: 68600:0:(api-ni.c:2171:lnet_ping_target_install_locked()) LBUG
      [  461.968501] CPU: 1 PID: 68600 Comm: lnetctl Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64 #1
      [  461.968649] Hardware name: Red Hat KVM, BIOS 1.13.0-2.module_el8.5.0+746+bbd5d70c 04/01/2014
      [  461.968740] Call Trace:
      [  461.968829]  dump_stack+0x41/0x60
      [  461.968919]  lbug_with_loc.cold.6+0x5/0x43 [libcfs]
      [  461.969009]  lnet_ping_target_update+0x87e/0x8a0 [lnet]
      [  461.969167]  lnet_add_net_common+0x290/0x4b0 [lnet]
      [  461.969269]  lnet_dyn_add_ni+0x12b/0x1d0 [lnet]
      [  461.969376]  lnet_genl_parse_local_ni.isra.59+0x30a/0x1a50 [lnet]
      [  461.969483]  ? libcfs_str2net_internal+0xee/0x150 [lnet]
      [  461.969586]  lnet_net_cmd+0x53d/0x950 [lnet]
      [  461.969693]  genl_family_rcv_msg_doit.isra.17+0x113/0x150
      [  461.969796]  genl_family_rcv_msg+0xb7/0x170
      [  461.969888]  ? lnet_dyn_del_net+0x220/0x220 [lnet]
      [  461.969993]  ? lnet_udsp_info_send+0x460/0x460 [lnet]
      [  461.970098]  ? lnet_parse_peer_nis.constprop.61+0x690/0x690 [lnet]
      [  461.970206]  ? lnet_ping_event_handler+0x130/0x130 [lnet]
      [  461.970315]  genl_rcv_msg+0x47/0xa0
      [  461.970410]  ? genl_family_rcv_msg+0x170/0x170
      [  461.970507]  netlink_rcv_skb+0x4c/0x130
      [  461.970603]  genl_rcv+0x24/0x40
      [  461.970731]  netlink_unicast+0x19a/0x230
      [  461.970830]  netlink_sendmsg+0x204/0x3d0
      [  461.970928]  sock_sendmsg+0x50/0x60
      [  461.971032]  ____sys_sendmsg+0x22a/0x250
      [  461.971134]  ? copy_msghdr_from_user+0x5c/0x90
      [  461.971237]  ___sys_sendmsg+0x7c/0xc0
      [  461.971342]  ? __raw_spin_unlock+0x5/0x10
      [  461.971448]  ? handle_pte_fault+0x770/0x880
      [  461.971553]  ? __handle_mm_fault+0x453/0x6c0
      [  461.971659]  __sys_sendmsg+0x57/0xa0
      [  461.971769]  do_syscall_64+0x5b/0x1b0
      [  461.971877]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
      [  461.971995] RIP: 0033:0x7fc82f30fa98

      This case should be handled gracefully: the attempt to add the NI should fail with an error returned to the caller instead of crashing the node.
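
      To illustrate the requested behavior, here is a minimal userspace sketch of the pattern (check the limit, return an error, keep assertions for true internal invariants). It is not the LNet code and not a proposed patch: struct ni_table and ni_table_add() are hypothetical names invented for this example, and the max field only stands in for lnet_interfaces_max.

      ```c
      #include <assert.h>
      #include <errno.h>
      #include <stddef.h>

      /* Hypothetical stand-in for NI bookkeeping; not Lustre's real structures. */
      struct ni_table {
      	size_t count;	/* NIs currently configured */
      	size_t max;	/* plays the role of lnet_interfaces_max */
      };

      /*
       * Try to account for one more NI.
       * Returns 0 on success, -EINVAL for a NULL table, or -ERANGE when the
       * configured limit would be exceeded.
       */
      static int ni_table_add(struct ni_table *t)
      {
      	if (t == NULL)
      		return -EINVAL;

      	/* User-triggerable condition: report it to the caller, do not assert. */
      	if (t->count >= t->max)
      		return -ERANGE;

      	t->count++;

      	/* Internal invariant: can only break through a programming error. */
      	assert(t->count <= t->max);
      	return 0;
      }
      ```

      With max set to 2, the first two calls to ni_table_add() return 0 and the third returns -ERANGE while count stays at 2, so the caller can report the failure rather than hit an assertion.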

          People

            ssmirnov Serguei Smirnov