[LU-15944] LNet: adding dst udsp rule before peer is discovered causes oops on peer discovery Created: 14/Jun/22 Updated: 08/Mar/23 Resolved: 08/Mar/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Serguei Smirnov | Assignee: | Cyril Bordage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | lnet, udsp | ||
| Severity: | 3 |
| Description |
|
This was found and reported by hornc. The following sequence of commands causes a crash:
# lnetctl peer del --prim_nid=10.1.0.60@o2ib1   # <-- make sure there is no record of this peer
# lnetctl udsp add --dst tcp --prio 1
# lnetctl discover 192.168.122.60@tcp
The trace is as follows:
[5449781.397300] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[5449781.399193] IP: [<ffffffffc0c36ddb>] lnet_udsp_apply_rule_on_lpni+0xbb/0x7b0 [lnet]
[5449781.400130] PGD 8000000055a7f067 PUD 4964e067 PMD 0
[5449781.400717] Oops: 0000 [#1] SMP
[5449781.418329] Call Trace:
[5449781.419109] [<ffffffffc0c35844>] lnet_udsp_apply_single_policy+0xf4/0x540 [lnet]
[5449781.419881] [<ffffffffc0c35cce>] lnet_udsp_apply_policies_helper.part.8+0x3e/0x70 [lnet]
[5449781.420644] [<ffffffffc0c37db6>] lnet_udsp_apply_policies_on_lpni+0x56/0x80 [lnet]
[5449781.421386] [<ffffffffc0c36d20>] ? lnet_udsp_apply_rte_rule_on_nets+0x130/0x130 [lnet]
[5449781.422228] [<ffffffffc0c28231>] lnet_peer_attach_peer_ni+0x161/0x600 [lnet]
[5449781.422987] [<ffffffffc0c2883e>] lnet_peer_ni_traffic_add+0x16e/0x2b0 [lnet]
[5449781.423761] [<ffffffffc0c2de25>] lnet_peerni_by_nid_locked+0xe5/0x140 [lnet]
[5449781.424521] [<ffffffffc0c2df5e>] lnet_nid2peerni_locked+0xde/0xf0 [lnet]
[5449781.425281] [<ffffffffc0bf8713>] LNetCtl+0x14d3/0x1c80 [lnet]
[5449781.426061] [<ffffffffc0bf59fb>] ? LNetNIInit+0x8b/0xd50 [lnet]
[5449781.426818] [<ffffffffc0c18a33>] lnet_ioctl+0x63/0x270 [lnet]
[5449781.427581] [<ffffffff8ad90b6f>] notifier_call_chain+0x4f/0x70
[5449781.428345] [<ffffffff8a6cc15d>] __blocking_notifier_call_chain+0x4d/0x70
[5449781.429083] [<ffffffff8a6cc196>] blocking_notifier_call_chain+0x16/0x20
[5449781.429837] [<ffffffffc0bbc3ad>] libcfs_psdev_ioctl+0x43d/0x5c0 [libcfs]
[5449781.430580] [<ffffffff8a863590>] do_vfs_ioctl+0x3a0/0x5b0
[5449781.431319] [<ffffffff8a863841>] SyS_ioctl+0xa1/0xc0
[5449781.432065] [<ffffffff8ad95f92>] system_call_fastpath+0x25/0x2a |
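The fault address 0000000000000028 is the usual signature of reading a structure member through a NULL pointer: the CPU touches address 0 plus the member's offset. The sketch below shows that arithmetic on an LP64 target (8-byte pointers, as on x86_64). The struct layout and the names peer_net_model and null_deref_fault_addr() are hypothetical; this is not the real struct lnet_peer_net and makes no claim about which field lnet_udsp_apply_rule_on_lpni actually read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption of this sketch: an LP64 target with 8-byte pointers. */
static_assert(sizeof(void *) == 8, "sketch assumes 8-byte pointers");

/* Hypothetical layout, NOT the real struct lnet_peer_net: five
 * pointer-sized members precede the field being read, so that field
 * lands at offset 5 * 8 = 40 = 0x28. */
struct peer_net_model {
	void *pn_list_next;
	void *pn_list_prev;
	void *pn_peer_nis_next;
	void *pn_peer_nis_prev;
	void *pn_peer;
	uint32_t pn_sel_priority;
};

/* Address that a read of pn_sel_priority through a NULL
 * struct peer_net_model pointer would fault on: base 0 + offset. */
uintptr_t null_deref_fault_addr(void)
{
	return (uintptr_t)0 + offsetof(struct peer_net_model, pn_sel_priority);
}
```

Calling null_deref_fault_addr() yields 0x28, the same low address reported in the oops, which is why a small fault address like this usually points at a NULL base pointer rather than a wild one.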
| Comments |
| Comment by Serguei Smirnov [ 14/Jun/22 ] |
|
Temporary fix applied by Chris locally:
diff --git a/lnet/lnet/udsp.c b/lnet/lnet/udsp.c
index 08c1a7fccc..1f55b9289f 100644
--- a/lnet/lnet/udsp.c
+++ b/lnet/lnet/udsp.c
@@ -536,6 +536,9 @@ lnet_udsp_apply_rule_on_lpni(struct udsp_info *udi)
 					    &lp_match->ud_net_id.udn_net_num_range,
 					    &lp_match->ud_addr_range);
 
+		if (!udi->udi_lpn)
+			udi->udi_lpn = lpni->lpni_peer_net;
+
 		/* check if looking for a net match */
 		if (!rc &&
 		    (lnet_get_list_len(&lp_match->ud_addr_range) ||
This prevents the crash, but causes nid priority to be inherited from the previously set net priority for the peer. |
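The temporary fix is a fallback: when the caller leaves udi->udi_lpn unset, the rule borrows the peer NI's own lpni_peer_net. A minimal sketch of that pattern, assuming heavily simplified stand-ins for the LNet structures; the *_model types, the sel_priority field, the udi_lpni field name, and udsp_target_net() are all hypothetical and are not LNet code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a peer net; the real struct has many fields. */
struct peer_net_model {
	unsigned int sel_priority;	/* hypothetical net-level priority */
};

/* Hypothetical stand-in for a peer NI. */
struct peer_ni_model {
	struct peer_net_model *lpni_peer_net;	/* net this NI belongs to */
};

/* Hypothetical stand-in for struct udsp_info. */
struct udsp_info_model {
	struct peer_ni_model *udi_lpni;	/* peer NI the rule is applied on */
	struct peer_net_model *udi_lpn;	/* may be NULL on the attach path */
};

/* Returns the peer net a net-level rule should act on, using the same
 * shape as the temporary fix: if udi_lpn was not supplied, fall back to
 * the peer NI's own net and store it back into udi. */
struct peer_net_model *udsp_target_net(struct udsp_info_model *udi)
{
	assert(udi->udi_lpni != NULL);	/* sketch precondition */

	if (!udi->udi_lpn)
		udi->udi_lpn = udi->udi_lpni->lpni_peer_net;
	return udi->udi_lpn;
}
```

Like the diff, the sketch writes the borrowed net back into udi, so any later net-level step in the same call sees a non-NULL peer net as well.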
| Comment by Gerrit Updater [ 07/Oct/22 ] |
|
"Cyril Bordage <cbordage@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48801 |
| Comment by Gerrit Updater [ 08/Mar/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48801/ |
| Comment by Peter Jones [ 08/Mar/23 ] |
|
Landed for 2.16 |