[LU-15914] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 Created: 06/Jun/22  Updated: 17/Feb/23  Resolved: 17/Feb/23

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Critical
Reporter: Chris Horn Assignee: Chris Horn
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-16211 o2iblnd NULL md deref Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This looks like a regression introduced by:

commit 959304eac7ec5b156b4bfa57f47cbbf9ef3c8315
Author: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Date:   Mon Feb 7 18:02:14 2022 +0300

    LU-15189 lnet: fix memory mapping.

[ 2699.061116] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[ 2699.062873] IP: [<ffffffffc053c0eb>] lnet_find_best_ni_on_spec_net+0x6b/0x4d0 [lnet]
[ 2699.077584] RIP: 0010:[<ffffffffc053c0eb>]  [<ffffffffc053c0eb>] lnet_find_best_ni_on_spec_net+0x6b/0x4d0 [lnet]
[ 2699.079020] RSP: 0018:ffff8acfd8bcfaa8  EFLAGS: 00010286
[ 2699.079979] RAX: ffff8acfe93fe000 RBX: ffff8acfea501c00 RCX: 0000000000000000
[ 2699.080961] RDX: 00000000000002be RSI: 0000000000000000 RDI: 0000000000000000
[ 2699.082054] RBP: ffff8acfd8bcfb60 R08: 000000000000000a R09: 000000000000fffe
[ 2699.082981] R10: 0000000000000000 R11: 000000000000000f R12: ffff8acff9812480
[ 2699.083784] R13: ffff8acfd88ba000 R14: 0000000000000000 R15: 0000000000000000
[ 2699.084448] FS:  0000000000000000(0000) GS:ffff8acfffc00000(0000) knlGS:0000000000000000
[ 2699.085469] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2699.086065] CR2: 0000000000000050 CR3: 000000007aa98000 CR4: 00000000000606f0
[ 2699.086875] Call Trace:
[ 2699.087557]  [<ffffffffc053dfcf>] lnet_select_pathway+0x50f/0x18d0 [lnet]
[ 2699.088830]  [<ffffffffc053f401>] lnet_send+0x71/0x200 [lnet]
[ 2699.089727]  [<ffffffffc053008b>] lnet_finalize+0x51b/0x9f0 [lnet]
[ 2699.090606]  [<ffffffffc05d0585>] ksocknal_process_receive+0x665/0xe30 [ksocklnd]
[ 2699.091929]  [<ffffffffc05d11ca>] ksocknal_scheduler+0x1fa/0xd00 [ksocklnd]
[ 2699.092817]  [<ffffffff898c7780>] ? wake_up_atomic_t+0x30/0x30
[ 2699.093869]  [<ffffffffc05d0fd0>] ? ksocknal_recv+0x280/0x280 [ksocklnd]
[ 2699.094758]  [<ffffffff898c6691>] kthread+0xd1/0xe0
[ 2699.095614]  [<ffffffff898c65c0>] ? insert_kthread_work+0x40/0x40
[ 2699.096393]  [<ffffffff89f92d37>] ret_from_fork_nospec_begin+0x21/0x21
[ 2699.097336]  [<ffffffff898c65c0>] ? insert_kthread_work+0x40/0x40

crash_x86_64> lnet_msg.msg_md ffff8acfd88ba000
  msg_md = 0x0
crash_x86_64>
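The fault address in CR2 (0x50) combined with msg_md = 0x0 is consistent with a field read at offset 0x50 through a NULL pointer. The following is a minimal sketch of that arithmetic only. The struct demo_md layout and the helper demo_fault_addr() are hypothetical and are not taken from the real struct lnet_libmd; the padding is chosen purely so that the flags field lands at 0x50.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration only; NOT the real struct lnet_libmd. */
struct demo_md {
	unsigned char pad[0x50];  /* stand-in for whatever fields precede the flags */
	unsigned int  md_flags;   /* placed at offset 0x50 to mirror CR2 in the oops */
};

/*
 * Address the CPU would access when evaluating md->md_flags
 * for a given value of the md pointer.
 */
static uintptr_t demo_fault_addr(uintptr_t md_base)
{
	return md_base + offsetof(struct demo_md, md_flags);
}
```

With md_base equal to 0 (a NULL md), demo_fault_addr() yields the field offset itself, which is how a small non-zero fault address usually points at a member access through a NULL struct pointer.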

The oops occurs on this line:

        bool gpu = md->md_flags & LNET_MD_FLAG_GPU;

The oops was hit on an LNet router.
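Since the crash dump shows msg_md = 0x0 for a message sent from the lnet_finalize path, one way to avoid the dereference is to treat a missing MD as "not GPU memory". The sketch below is not the merged patch; it only illustrates a NULL-guarded flag test. The names demo_libmd, demo_msg, demo_msg_is_gpu() and the value of DEMO_MD_FLAG_GPU are hypothetical stand-ins, not the LNet definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real definitions live in the LNet headers. */
#define DEMO_MD_FLAG_GPU (1u << 5)   /* assumed bit value, illustration only */

struct demo_libmd {
	unsigned int md_flags;
};

struct demo_msg {
	struct demo_libmd *msg_md;   /* may be NULL, as seen in the crash dump */
};

/* Return true only when an MD is attached and it is flagged as GPU memory. */
static bool demo_msg_is_gpu(const struct demo_msg *msg)
{
	const struct demo_libmd *md = msg->msg_md;

	return md != NULL && (md->md_flags & DEMO_MD_FLAG_GPU) != 0;
}
```

A message with no MD attached then simply reports false instead of faulting, while a message whose MD carries the flag still reports true.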



 Comments   
Comment by Gerrit Updater [ 06/Jun/22 ]

"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/47546
Subject: LU-15914 lnet: Fix null md deref in lnet_get_best_ni
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 190d8794d525ab646448fdfe6c413c1cecd8fb89

Comment by Gerrit Updater [ 30/Jun/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47546/
Subject: LU-15914 lnet: Fix null md deref for finalized message
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: cb0220db3ce517b0e2fce93c864e6c3dbb61b5e0

Comment by Chris Horn [ 27/Jan/23 ]

pjones, this one can be closed.
