LU-18893

RIP: 0010:lnet_startup_lndnet+0x141/0x790 [lnet]

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.17.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      Bug in lnet_load_lnd(), introduced by https://review.whamcloud.com//48814:

      static const struct lnet_lnd *lnet_load_lnd(u32 lnd_type)
      {
      ...
                      rc = request_module("%s", libcfs_lnd2modname(lnd_type));
                      mutex_lock(&the_lnet.ln_lnd_mutex);
      
                      lnd = lnet_find_lnd_by_type(lnd_type);
                      if (!lnd) {
                              CERROR("Can't load LND %s, module %s, rc=%d\n",
                                     libcfs_lnd2str(lnd_type),
                                     libcfs_lnd2modname(lnd_type), rc);
                              lnd = ERR_PTR(rc);
      

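      request_module() can return a positive value, namely the exit code from modprobe. ERR_PTR() is only meaningful for negative errno values (-MAX_ERRNO through -1), so ERR_PTR(rc) with a positive rc yields a small non-NULL pointer that IS_ERR() does not treat as an error. The caller takes it for a valid struct lnet_lnd pointer, dereferences it, and we end up hitting an oops.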
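
The pointer-encoding problem can be shown in user space. This is a minimal sketch, not the merged patch: ERR_PTR(), IS_ERR() and MAX_ERRNO here are stand-ins modeled on the kernel's linux/err.h, lnd_load_errno() is a hypothetical helper, and the choice of -ENOENT as the fallback errno is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space stand-ins modeled on the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	/* Encode an error code directly in the pointer value. */
	return (void *)error;
}

static inline bool IS_ERR(const void *ptr)
{
	/* True only for the top MAX_ERRNO addresses, i.e. -MAX_ERRNO..-1. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical helper (not from the patch): pass a negative errno
 * through unchanged and map anything else, such as a positive
 * modprobe exit status, to -ENOENT (assumed fallback).
 */
static inline long lnd_load_errno(int rc)
{
	return rc < 0 ? rc : -ENOENT;
}
```

With these definitions, IS_ERR(ERR_PTR(256)) is false, so a caller that only checks IS_ERR() would go on to use address 0x100 as if it were a real object, whereas IS_ERR(ERR_PTR(lnd_load_errno(256))) is true.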

      [ 4081.133394] RIP: 0010:lnet_startup_lndnet+0x141/0x790 [lnet]
      [ 4081.139771] Code: 90 00 00 00 48 29 ce 81 c1 98 00 00 00 49 89 87 24 01 00 00 c1 e9 03 f3 48 a5 41 c6 87 2c 01 00 00 01 49 8b 44 24 48 4c 89 ff <48> 8b 40 08 e8 b6 88 ff e1 48 c7 c7 e8 81 90 c1 89 c3 e8 a8 19 ff
      [ 4081.159202] RSP: 0018:ffffaf54ce79f938 EFLAGS: 00010246
      [ 4081.165112] RAX: 0000000000000100 RBX: 0000000000000100 RCX: 0000000000000000
      [ 4081.172930] RDX: ffff90d943f7d100 RSI: ffff90ca422a50a8 RDI: ffff90e941adfc00
      [ 4081.180747] RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000fffeffff
      [ 4081.188564] R10: ffffffffc18dbcd8 R11: ffffaf54ce79f4e0 R12: ffff90f93a4db680
      [ 4081.196383] R13: ffffaf54ce79f968 R14: ffff90e941adfc00 R15: ffff90e941adfc00
      [ 4081.204200] FS:  00007fc68e562740(0000) GS:ffff91092ef00000(0000) knlGS:0000000000000000
      [ 4081.212970] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 4081.219403] CR2: 0000000000000108 CR3: 0000003077e6e000 CR4: 0000000000350ee0
      [ 4081.227220] Call Trace:
      [ 4081.230359]  <TASK>
      [ 4081.233159]  LNetNIInit+0x6d3/0xd60 [lnet ed0fe992725a42406d79b15a64650da6c0fdc5bb]
      [ 4081.241520]  lnet_configure+0x4c/0x70 [lnet ed0fe992725a42406d79b15a64650da6c0fdc5bb]
      [ 4081.250055]  lnet_net_conf_cmd+0x3b/0xe0 [lnet ed0fe992725a42406d79b15a64650da6c0fdc5bb]
      [ 4081.258843]  genl_family_rcv_msg_doit.isra.15+0x11b/0x150
      [ 4081.264938]  genl_rcv_msg+0xe3/0x1e0
      [ 4081.269211]  ? lnet_mark_ping_buffer_for_update+0x30/0x30 [lnet ed0fe992725a42406d79b15a64650da6c0fdc5bb]
      [ 4081.279479]  ? genl_family_rcv_msg_doit.isra.15+0x150/0x150
      [ 4081.285735]  netlink_rcv_skb+0x50/0x100
      [ 4081.290261]  genl_rcv+0x24/0x40
      [ 4081.294092]  netlink_unicast+0x1b6/0x280
      [ 4081.298701]  netlink_sendmsg+0x320/0x450
      [ 4081.303313]  sock_sendmsg+0x5f/0x70
      [ 4081.307498]  ____sys_sendmsg+0x1ee/0x250
      [ 4081.312108]  ? copy_msghdr_from_user+0x5c/0x90
      [ 4081.317242]  ___sys_sendmsg+0x88/0xd0
      [ 4081.321599]  ? __wake_up_common_lock+0x87/0xc0
      [ 4081.326738]  ? netlink_setsockopt+0x165/0x3d0
      [ 4081.331784]  ? __sys_setsockopt+0xff/0x1e0
      [ 4081.336575]  ? __sys_sendmsg+0x5e/0xa0
      [ 4081.341012]  __sys_sendmsg+0x5e/0xa0
      [ 4081.345278]  ? tb_acpi_retimer_set_power+0x160/0x2d0
      [ 4081.350938]  do_syscall_64+0x5b/0x80
      [ 4081.355210]  ? syscall_exit_to_user_mode+0x1f/0x40
      [ 4081.360694]  ? do_syscall_64+0x67/0x80
      [ 4081.365133]  ? exc_page_fault+0x67/0x150
      [ 4081.369742]  entry_SYSCALL_64_after_hwframe+0x6b/0xd5
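
The register dump appears consistent with a positive rc having been turned into a pointer, on the assumption that RAX holds the bogus lnd pointer at the time of the fault. The bytes marked <48> 8b 40 08 in the Code line decode to mov 0x8(%rax),%rax, so the faulting address should be RAX + 8. If 0x100 (256) is read as a wait(2)-style status, it corresponds to exit code 1. A small sketch of that arithmetic, with hypothetical helper names:

```c
#include <stdint.h>
#include <sys/wait.h>

/* Values copied from the register dump in the oops. */
#define OOPS_RAX 0x100UL
#define OOPS_CR2 0x108UL

/* Address read by "mov 0x8(%rax),%rax" for a given RAX value. */
static inline uint64_t fault_addr(uint64_t rax)
{
	return rax + 0x8;
}

/*
 * Exit code carried by rc, assuming rc is a wait(2)-style status.
 * Returns -1 if the status does not describe a normal exit.
 */
static inline int modprobe_exit_code(int rc)
{
	return WIFEXITED(rc) ? WEXITSTATUS(rc) : -1;
}
```

Here fault_addr(OOPS_RAX) equals OOPS_CR2, matching the CR2 value in the dump, and modprobe_exit_code(0x100) evaluates to 1.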
      

        Activity

          pjones Peter Jones added a comment -

          Merged for 2.17


          gerrit Gerrit Updater added a comment -

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/58669/
          Subject: LU-18893 lnet: Use negative errno for ERR_PTR
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 969a680544e630183c5cbbb5b38effe505459bc0

          gerrit Gerrit Updater added a comment -

          "Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/58669
          Subject: LU-18893 lnet: Use negative errno for ERR_PTR
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 4ff48c4cbf44209d15d94c4026e9be7241531755

          People

            hornc Chris Horn
            hornc Chris Horn