Details
Bug
Resolution: Fixed
Minor
Lustre 2.7.0
None
3
16147
Description
We are experimenting with various configurations and usage of Lustre on a UV system, and thought we'd try stuffing all things Lustre onto the node adjacent to the IB card, with a single partition on node 3. Loading the libcfs module then hit this oops:
<1>[96791.728121] BUG: unable to handle kernel NULL pointer dereference at (null)
<1>[96791.736878] IP: [<ffffffff81260485>] memcpy+0x5/0x120
<4>[96791.742555] PGD 13ced2dc067 PUD 13ced2db067 PMD 0
<0>[96791.747929] Oops: 0000 [#1] SMP
Stack traceback for pid 33047
0xffff88bde41ec380 33047 33046 1 20 R 0xffff88bde41ec9f0 *modprobe
[<ffffffff81260485>] memcpy+0x5/0x120
[<ffffffffa0eb5929>] cfs_cpt_set_node+0xf9/0x120 [libcfs]
[<ffffffffa0eb796e>] cfs_cpt_table_create_pattern+0x19e/0x6a0 [libcfs]
[<ffffffffa0eb88f5>] cfs_cpu_init+0x175/0x4c0 [libcfs]
[<ffffffffa0ec0aeb>] init_libcfs_module+0x9b/0x3b0 [libcfs]
[<ffffffff810001cb>] do_one_initcall+0x3b/0x180
[<ffffffff810a126f>] sys_init_module+0xcf/0x240
[<ffffffff8146a012>] system_call_fastpath+0x16/0x1b
[<00007ffff7b413aa>] 0x7ffff7b413aa
r15 = 0xffff893cc628e459 r14 = 0xffff893cbddec7c0
r13 = 0x0000000000000000 r12 = 0xffff893cbddec7c0
bp = 0x000000000000001e bx = 0xffff893de526ea00
r11 = 0x0000000000000000 r10 = 0x0000000000000025
r9 = 0x000000000000000a r8 = 0x000000000000000a
ax = 0xffff893de526ea00 cx = 0x0000000000000018
dx = 0x0000000000000018 si = 0x0000000000000000
di = 0xffff893de526ea00 orig_ax = 0xffffffffffffffff
ip = 0xffffffff81260485 cs = 0x0000000000000010
flags = 0x0000000000010206 sp = 0xffff88bcd95f1e80
ss = 0x0000000000000018 &regs = 0xffff88bcd95f1de8
The trigger:
/etc/modprobe.d/Lustre.conf :
options lnet accept_port=50 networks=o2ib0(ib0)
options ptlrpc ptlrpcd_bind_policy=4
options libcfs cpu_pattern="N 0[30-39]"
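This system has only 10 NUMA nodes, so a pattern naming nodes 30-39 is a misconfiguration. It should be "N 0[3]" for my desired effect. Still, the error handling could be improved: a bad pattern should be rejected with an error rather than crash the kernel.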
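As an illustration of the kind of check that would catch this before the modules are loaded, here is a minimal Python sketch that compares the node IDs in a NUMA-style cpu_pattern against the nodes the system actually has. It is not the libcfs parser: the function names parse_node_pattern and invalid_nodes are hypothetical, and it assumes only the simple "N <partition>[<ids and ranges>]" form used above. On Linux the online node list can usually be read from /sys/devices/system/node/online; here it is passed in by hand.

```python
import re


def parse_node_pattern(pattern):
    """Parse a NUMA-style pattern such as 'N 0[30-39]'.

    Returns {partition number: [node ids]}. Simplified sketch: only
    comma-separated ids and 'lo-hi' ranges are understood.
    """
    body = pattern.strip()
    if not body.startswith("N"):
        raise ValueError("this sketch only handles patterns with a leading 'N'")
    partitions = {}
    # Each partition looks like '<number>[<spec>]'.
    for cpt, spec in re.findall(r"(\d+)\[([^\]]*)\]", body[1:]):
        nodes = []
        for item in spec.split(","):
            item = item.strip()
            if "-" in item:
                lo, hi = (int(x) for x in item.split("-"))
                nodes.extend(range(lo, hi + 1))
            else:
                nodes.append(int(item))
        partitions[int(cpt)] = nodes
    return partitions


def invalid_nodes(pattern, online_nodes):
    """Return {partition: [node ids not present on this system]}."""
    online = set(online_nodes)
    bad = {}
    for cpt, nodes in parse_node_pattern(pattern).items():
        missing = [n for n in nodes if n not in online]
        if missing:
            bad[cpt] = missing
    return bad


if __name__ == "__main__":
    # Assumption: the 10 NUMA nodes are numbered 0 through 9.
    nodes_present = range(10)
    print(invalid_nodes("N 0[30-39]", nodes_present))
    print(invalid_nodes("N 0[3]", nodes_present))
```

With the assumed node numbering, the first pattern reports every node in partition 0 as missing, while the intended "N 0[3]" reports nothing. A check of this shape inside the module would let it fail the load with an error instead of dereferencing a NULL pointer.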