[LU-5751] misconfiguration crashes cfs_cpt_set_node
Created: 16/Oct/14
Updated: 20/Feb/15
Resolved: 08/Jan/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: Lustre 2.7.0

Type: Bug
Priority: Minor
Reporter: Stephen Champion
Assignee: Liang Zhen (Inactive)
Resolution: Fixed
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 16147

 Description   

We are experimenting with various Lustre configurations and usage patterns on a UV system, and thought we'd try placing everything Lustre-related on the NUMA node adjacent to the IB card, using a single CPU partition on node 3. Loading libcfs then crashed modprobe with a NULL pointer dereference:

<1>[96791.728121] BUG: unable to handle kernel NULL pointer dereference at (null)
<1>[96791.736878] IP: [<ffffffff81260485>] memcpy+0x5/0x120
<4>[96791.742555] PGD 13ced2dc067 PUD 13ced2db067 PMD 0
<0>[96791.747929] Oops: 0000 [#1] SMP

Stack traceback for pid 33047
0xffff88bde41ec380 33047 33046 1 20 R 0xffff88bde41ec9f0 *modprobe
[<ffffffff81260485>] memcpy+0x5/0x120
[<ffffffffa0eb5929>] cfs_cpt_set_node+0xf9/0x120 [libcfs]
[<ffffffffa0eb796e>] cfs_cpt_table_create_pattern+0x19e/0x6a0 [libcfs]
[<ffffffffa0eb88f5>] cfs_cpu_init+0x175/0x4c0 [libcfs]
[<ffffffffa0ec0aeb>] init_libcfs_module+0x9b/0x3b0 [libcfs]
[<ffffffff810001cb>] do_one_initcall+0x3b/0x180
[<ffffffff810a126f>] sys_init_module+0xcf/0x240
[<ffffffff8146a012>] system_call_fastpath+0x16/0x1b
[<00007ffff7b413aa>] 0x7ffff7b413aa
r15 = 0xffff893cc628e459 r14 = 0xffff893cbddec7c0
r13 = 0x0000000000000000 r12 = 0xffff893cbddec7c0
bp = 0x000000000000001e bx = 0xffff893de526ea00
r11 = 0x0000000000000000 r10 = 0x0000000000000025
r9 = 0x000000000000000a r8 = 0x000000000000000a
ax = 0xffff893de526ea00 cx = 0x0000000000000018
dx = 0x0000000000000018 si = 0x0000000000000000
di = 0xffff893de526ea00 orig_ax = 0xffffffffffffffff
ip = 0xffffffff81260485 cs = 0x0000000000000010
flags = 0x0000000000010206 sp = 0xffff88bcd95f1e80
ss = 0x0000000000000018 &regs = 0xffff88bcd95f1de8

The trigger:
/etc/modprobe.d/Lustre.conf:
options lnet accept_port=50 networks=o2ib0(ib0)
options ptlrpc ptlrpcd_bind_policy=4
options libcfs cpu_pattern="N 0[30-39]"

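This system only has 10 NUMA nodes (0-9), so "N 0[30-39]" is a misconfiguration; it should be "N 0[3]" for my desired effect. But the error handling could be improved: a bad cpu_pattern should be rejected with an error rather than oops the kernel in modprobe.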



 Comments   
Comment by Liang Zhen (Inactive) [ 16/Oct/14 ]

I will check this later

Comment by Gerrit Updater [ 30/Dec/14 ]

Liang Zhen (liang.zhen@intel.com) uploaded a new patch: http://review.whamcloud.com/13207
Subject: LU-5751 libcfs: check mask returned by cpumask_of_node
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 15670c31362655e863af3a274e1d2589d2a1a101

Comment by Gerrit Updater [ 07/Jan/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13207/
Subject: LU-5751 libcfs: check mask returned by cpumask_of_node
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2a32996c017cfbbe4260463710f68d1ff91465aa

Comment by Liang Zhen (Inactive) [ 08/Jan/15 ]

patch landed

Generated at Sat Feb 10 07:02:44 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.