Lustre / LU-15914

BUG: unable to handle kernel NULL pointer dereference at 0000000000000050

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.15.0
    • Labels: None
    • Severity: 3

    Description

      This looks like a regression introduced by the following commit:

      commit 959304eac7ec5b156b4bfa57f47cbbf9ef3c8315
      Author: Alexey Lyashkov <alexey.lyashkov@hpe.com>
      Date:   Mon Feb 7 18:02:14 2022 +0300
      
          LU-15189 lnet: fix memory mapping.
      
      [ 2699.061116] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
      [ 2699.062873] IP: [<ffffffffc053c0eb>] lnet_find_best_ni_on_spec_net+0x6b/0x4d0 [lnet]
      [ 2699.077584] RIP: 0010:[<ffffffffc053c0eb>]  [<ffffffffc053c0eb>] lnet_find_best_ni_on_spec_net+0x6b/0x4d0 [lnet]
      [ 2699.079020] RSP: 0018:ffff8acfd8bcfaa8  EFLAGS: 00010286
      [ 2699.079979] RAX: ffff8acfe93fe000 RBX: ffff8acfea501c00 RCX: 0000000000000000
      [ 2699.080961] RDX: 00000000000002be RSI: 0000000000000000 RDI: 0000000000000000
      [ 2699.082054] RBP: ffff8acfd8bcfb60 R08: 000000000000000a R09: 000000000000fffe
      [ 2699.082981] R10: 0000000000000000 R11: 000000000000000f R12: ffff8acff9812480
      [ 2699.083784] R13: ffff8acfd88ba000 R14: 0000000000000000 R15: 0000000000000000
      [ 2699.084448] FS:  0000000000000000(0000) GS:ffff8acfffc00000(0000) knlGS:0000000000000000
      [ 2699.085469] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 2699.086065] CR2: 0000000000000050 CR3: 000000007aa98000 CR4: 00000000000606f0
      [ 2699.086875] Call Trace:
      [ 2699.087557]  [<ffffffffc053dfcf>] lnet_select_pathway+0x50f/0x18d0 [lnet]
      [ 2699.088830]  [<ffffffffc053f401>] lnet_send+0x71/0x200 [lnet]
      [ 2699.089727]  [<ffffffffc053008b>] lnet_finalize+0x51b/0x9f0 [lnet]
      [ 2699.090606]  [<ffffffffc05d0585>] ksocknal_process_receive+0x665/0xe30 [ksocklnd]
      [ 2699.091929]  [<ffffffffc05d11ca>] ksocknal_scheduler+0x1fa/0xd00 [ksocklnd]
      [ 2699.092817]  [<ffffffff898c7780>] ? wake_up_atomic_t+0x30/0x30
      [ 2699.093869]  [<ffffffffc05d0fd0>] ? ksocknal_recv+0x280/0x280 [ksocklnd]
      [ 2699.094758]  [<ffffffff898c6691>] kthread+0xd1/0xe0
      [ 2699.095614]  [<ffffffff898c65c0>] ? insert_kthread_work+0x40/0x40
      [ 2699.096393]  [<ffffffff89f92d37>] ret_from_fork_nospec_begin+0x21/0x21
      [ 2699.097336]  [<ffffffff898c65c0>] ? insert_kthread_work+0x40/0x40
      
      crash_x86_64> lnet_msg.msg_md ffff8acfd88ba000
        msg_md = 0x0
      crash_x86_64>
      

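      The oops occurs on this line. The crash output above shows msg_md is NULL, so reading md->md_flags faults at address 0x50: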

              bool gpu = md->md_flags & LNET_MD_FLAG_GPU;
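
The faulting address in the oops (0x50) is consistent with reading a field through a NULL base pointer: the CPU touches base plus the field's offset. The sketch below illustrates that arithmetic only. `struct demo_md_layout`, its padding, and `demo_flags_addr` are hypothetical stand-ins chosen so that the flags field sits at offset 0x50; they are not the real `struct lnet_libmd` layout.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in, NOT the real LNet MD structure.
 * The padding is chosen only so that dm_flags lands at offset 0x50.
 */
struct demo_md_layout {
	unsigned char	dm_pad[0x50];
	unsigned int	dm_flags;
};

/* Address that would be accessed when reading base->dm_flags. */
static uintptr_t demo_flags_addr(uintptr_t base)
{
	return base + offsetof(struct demo_md_layout, dm_flags);
}
```

With a base of 0 (a NULL pointer), the computed address equals the field offset, which is the value the kernel reports in CR2 for this kind of fault.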
      

      The oops was hit on an LNet router.
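
A fault like the one on the quoted line can be avoided by checking the MD pointer before reading its flags. The following is a minimal, self-contained sketch of that kind of guard and is not taken from the actual patch for this issue. `struct demo_md`, `struct demo_msg`, `DEMO_MD_FLAG_GPU` (including its bit value), and `demo_msg_is_gpu` are hypothetical stand-ins for the LNet types and flag named in the report.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit value; the real LNET_MD_FLAG_GPU value is not assumed. */
#define DEMO_MD_FLAG_GPU	(1u << 5)

/* Hypothetical stand-ins for the message and MD structures. */
struct demo_md {
	unsigned int	md_flags;
};

struct demo_msg {
	struct demo_md	*msg_md;	/* may be NULL, as in the crash output */
};

/* Treat a message with no MD attached as a non-GPU message. */
static bool demo_msg_is_gpu(const struct demo_msg *msg)
{
	const struct demo_md *md = msg->msg_md;

	return md != NULL && (md->md_flags & DEMO_MD_FLAG_GPU) != 0;
}
```

Whether "no MD means not GPU" is the right policy for the router path is an assumption of this sketch; the point is only that the flags are never read through a NULL pointer.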

            People

              Assignee: hornc Chris Horn
              Reporter: hornc Chris Horn