[LU-13193] lnetctl udsp add OOPS Created: 03/Feb/20  Updated: 17/Feb/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Chris Horn
Assignee: Serguei Smirnov
Resolution: Unresolved
Votes: 0
Labels: None

Severity: 3

Description

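Easy-to-reproduce kernel oops: adding a UDSP rule with lnetctl udsp add after configuring a multi-rail peer triggers a NULL pointer dereference in lnet_udsp_apply_rule_on_lpn.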

sles15s01:/usr/lib64/lustre/tests # lnetctl peer add --prim_nid 192.168.2.50@tcp
sles15s01:/usr/lib64/lustre/tests # lnetctl peer add --prim_nid 192.168.2.50@tcp --nid 192.168.2.50@tcp99
sles15s01:/usr/lib64/lustre/tests # lnetctl peer add --prim_nid 192.168.2.50@tcp --nid 192.168.2.50@tcp40
sles15s01:/usr/lib64/lustre/tests # lnetctl peer show
peer:
    - primary nid: 192.168.2.50@tcp
      Multi-Rail: True
      peer ni:
        - nid: 192.168.2.50@tcp
          state: NA
        - nid: 192.168.2.50@tcp99
          state: NA
        - nid: 192.168.2.50@tcp40
          state: NA
sles15s01:/usr/lib64/lustre/tests # lnetctl udsp add --dst tcp40 --priority 0
[16595633.241627] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[16595633.242021] IP: lnet_udsp_apply_rule_on_lpn+0x31/0xe0 [lnet]
[16595633.242236] PGD 80000003217e0067 P4D 80000003217e0067 PUD 423341067 PMD 0
[16595633.242475] Oops: 0000 [#1] SMP PTI
[16595633.242706] CPU: 3 PID: 9190 Comm: lnetctl Tainted: G           OE      4.12.14-150.14-default #1 SLE15
[16595633.243225] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
[16595633.243815] task: ffff991987880700 task.stack: ffffb20842718000
[16595633.244159] RIP: 0010:lnet_udsp_apply_rule_on_lpn+0x31/0xe0 [lnet]
[16595633.244505] RSP: 0018:ffffb2084271bc40 EFLAGS: 00010246
[16595633.244862] RAX: 00000000ffffffff RBX: 0000000000000000 RCX: ffff9919a2ef3d08
[16595633.245242] RDX: ffff9919a2ef3d20 RSI: ffff9919a2ef3d20 RDI: ffffb2084271bd08
[16595633.245674] RBP: 0000000000000000 R08: 0000000000008000 R09: 0000000000000000
[16595633.246107] R10: 0000000000000001 R11: ffff99198b11f54c R12: ffffb2084271bd08
[16595633.246530] R13: ffff99198b001710 R14: ffff9919a308bb00 R15: 0000000000000000
[16595633.246948] FS:  00007f1923608f00(0000) GS:ffff9919bfcc0000(0000) knlGS:0000000000000000
[16595633.247385] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[16595633.247851] CR2: 0000000000000028 CR3: 0000000337c50000 CR4: 00000000000406e0
[16595633.248377] Call Trace:
[16595633.248871]  lnet_udsp_apply_rule_on_lpnis+0x1bb/0x2e0 [lnet]
[16595633.249384]  lnet_udsp_apply_policies_helper.part.11+0x31/0x60 [lnet]
[16595633.249908]  lnet_udsp_apply_policies+0x60/0x90 [lnet]
[16595633.250436]  ? lnet_udsp_apply_rule_on_lpni+0x740/0x740 [lnet]
[16595633.250990]  ? lnet_udsp_apply_prio_rule_on_net+0xe0/0xe0 [lnet]
[16595633.251544]  ? copy_range_info.part.12+0x140/0x140 [lnet]
[16595633.252111]  LNetCtl+0x1030/0x1460 [lnet]
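
The reproduction steps above can be collected into a small script. This is only a sketch: the lnetctl subcommands and arguments are copied verbatim from the transcript, while the `LNETCTL` and `DRY_RUN` variables and the `run` and `reproduce` helpers are hypothetical names introduced here for illustration. By default the script only prints the commands; running it with `DRY_RUN=0` as root on an affected build is expected to crash the node, so it should only be used on a disposable test VM.

```shell
#!/bin/sh
# Reproducer sketch for LU-13193. The lnetctl invocations are taken from the
# report; LNETCTL and DRY_RUN are hypothetical knobs, not lnetctl features.
LNETCTL="${LNETCTL:-lnetctl}"
DRY_RUN="${DRY_RUN:-1}"          # 1 = print the commands only (default)
PRIM_NID="192.168.2.50@tcp"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$LNETCTL $*"
    else
        "$LNETCTL" "$@"
    fi
}

reproduce() {
    # Create a peer and attach two more NIDs on other networks.
    run peer add --prim_nid "$PRIM_NID"
    run peer add --prim_nid "$PRIM_NID" --nid 192.168.2.50@tcp99
    run peer add --prim_nid "$PRIM_NID" --nid 192.168.2.50@tcp40
    run peer show
    # In the report, this is the command that triggered the oops.
    run udsp add --dst tcp40 --priority 0
}

reproduce
```

With the default `DRY_RUN=1` the script prints the five commands in order, which makes it easy to review them before running anything against a live LNet configuration.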


Comments
Comment by Chris Horn [ 17/Feb/23 ]

I'm not able to reproduce this anymore. Maybe this can be closed? Serguei, do you know what might have fixed this in latest master?
