[LU-2519] cfs_cpu_init() Failed to create ptable with npartitions 0 Created: 21/Dec/12  Updated: 18/Nov/13  Resolved: 18/Nov/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Jay Lan (Inactive) Assignee: Liang Zhen (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Environment:

sles11sp2 x86_64


Attachments: File cpuinfo.s331     File cpuinfo.s332    
Severity: 3
Rank (Obsolete): 5934

 Description   

I recently built lustre-2.3.0 client on sles11sp2. When I tried to load the libcfs module, it failed:

  # modprobe libcfs
    FATAL: Error inserting libcfs (/lib/modules/3.0.42-0.7.3.20121219-nasuv/updates/kernel/net/lustre/libcfs.ko): Operation not permitted

The /var/log/messages said:
Dec 21 10:19:05 service331 kernel: [59144.393322] LNetError: 14632:0:(linux-cpu.c:881:cfs_cpt_table_create()) Failed to setup CPU-partition-table with 2 CPU-partitions, online HW nodes: 8, HW cpus: 8.
Dec 21 10:19:05 service331 kernel: [59144.436812] LNetError: 14632:0:(linux-cpu.c:1093:cfs_cpu_init()) Failed to create ptable with npartitions 0

The sles11sp1 version of the lustre 2.3.0 client worked fine for me. I have since scrapped those systems, so a comparison is not available at this point.

There must be an easy answer for this problem, but my search for an answer came up empty. Please help! My testing of lustre-client 2.3.0 on sles11sp2 is stalled. Shouldn't a default value just work?

I thought I tested 2.3.0 on sles11sp2 before, but I was wrong. It was 2.3.0 on sles11sp1 that I tested.



 Comments   
Comment by Peter Jones [ 21/Dec/12 ]

Jay

You have marked this ticket as Sev 1, which is reserved for production sites that are out of service. My understanding is that you are experimenting on a test system and that this issue does not affect production systems.

Is this correct?

Peter

Comment by Peter Jones [ 21/Dec/12 ]

Bob will help with this

Comment by Jay Lan (Inactive) [ 21/Dec/12 ]

Since there is no other message in /var/log/messages, the error can be narrowed down to the for_each_online_node() loop in cfs_cpt_table_create() in libcfs/libcfs/linux/linux-cpu.c.

Comment by Bob Glossman (Inactive) [ 21/Dec/12 ]

Just as a workaround until we have a good solution, there is a modparam for libcfs to turn off CPU partitioning: cpu_npartitions=1.

See section 25.4 of the Lustre Operations Manual.

Comment by Jay Lan (Inactive) [ 21/Dec/12 ]

Still failed.

Dec 21 12:34:47 service331 kernel: [ 758.708600] LNetError: 8058:0:(linux-cpu.c:881:cfs_cpt_table_create()) Failed to setup CPU-partition-table with 1 CPU-partitions, online HW nodes: 8, HW cpus: 8.
Dec 21 12:34:47 service331 kernel: [ 758.751826] LNetError: 8058:0:(linux-cpu.c:1093:cfs_cpu_init()) Failed to create ptable with npartitions 1

Comment by Bob Glossman (Inactive) [ 21/Dec/12 ]

I have been trying to reproduce your failure, but can't. I've tried 4 and 8 cpus, both with and without specifying cpu_npartitions, and it all works for me. However, I only have VMs to work with, not real hardware. Is there anything special about your HW platform?

Comment by Bob Glossman (Inactive) [ 21/Dec/12 ]

What's the kernel version in your sles11 sp2? I update mine frequently with the latest updates; my current version is 3.0.51-0.7.9. I don't know if that would make a difference, just trying to guess how your environment might be different from mine.

Comment by Jay Lan (Inactive) [ 21/Dec/12 ]

There is nothing special about my HW platform afaik.
The kernel is 3.0.42-0.7.3.

Comment by Jay Lan (Inactive) [ 21/Dec/12 ]

It failed here:

libcfs/libcfs/linux/linux-cpu.c:

static struct cfs_cpt_table *
cfs_cpt_table_create(int ncpt)
{
        ...
        for_each_online_node(i) {
                cfs_node_to_cpumask(i, mask);
                CWARN("for_each_online_node: i=%d\n", i);

                while (!cpus_empty(*mask)) {
                        struct cfs_cpu_partition *part;
                        int n;

                        CWARN("!cpus_empty: cpt=%d\n", cpt);
                        if (cpt >= ncpt) {
                                CERROR("cpt %d >= ncpt %d\n", cpt, ncpt);
                                goto failed;
                        }

It failed on the second cpu (i=1), first while-loop (cpt=1).

Since cpt is not reset on each iteration of the for-loop, and I have 8 cpus, cpt will clearly reach 7 by the 8th cpu. So the if statement is guaranteed to fail.

Should cpt be reset to 0 at the beginning of the for-loop?

Comment by Bob Glossman (Inactive) [ 21/Dec/12 ]

I put some extra debug in the success path of cfs_cpt_table_create() and I see:

13085:0:(linux-cpu.c:877:cfs_cpt_table_create()) Setup CPU-partition-table with 2 CPU-partitions, online HW nodes: 1, HW cpus: 8.

Note that even with multiple cpus I have only 1 HW node. This is probably why it works for me.

Comment by Jay Lan (Inactive) [ 21/Dec/12 ]

No, cpt would increment conditionally:

if (num == cpus_weight(*part->cpt_cpumask))
cpt++;

But in my case, even with cpu_npartitions=1, cpt was incremented to 1 and caused the logic to fail.

Comment by Jay Lan (Inactive) [ 21/Dec/12 ]

Hmm, i think my system should be just 1 node.

Comment by Bob Glossman (Inactive) [ 21/Dec/12 ]

yes, that is what I would expect, but your reported error msg says "online HW nodes: 8"

Comment by Jay Lan (Inactive) [ 21/Dec/12 ]

I hacked cfs_cpt_table_create() to assume single node for now. I just confirmed that lustre-2.3.0 libcfs was loaded fine on a similar system running sles11sp1.

I will check whether sles11sp1 returns 1 or 8 from num_online_nodes(). If it returns 8, we will need more debugging on lustre; if it returns 1, it is an issue for the hardware vendor/kernel. Thanks~

Comment by Bob Glossman (Inactive) [ 21/Dec/12 ]

You said your sles11sp1 was on a similar system, not the same system. I was wondering if there might be some BIOS or other firmware level setting on your platform that could deceive the OS about the number of HW nodes. Could you check for different settings on the platforms you are using? If there are varying settings it might be those, not the distro version that makes the difference.

Trying to cover all the bases here. It would be a lot easier if I could reproduce this myself, but no luck so far.

Comment by Jay Lan (Inactive) [ 21/Dec/12 ]

Well, a similar system today, but the same system 2 weeks ago. I ran acc-sm testing on both clients running the lustre-2.3 client on sles11sp1 two weeks ago.

I have acc-sm testing running now on both 2.3 clients, one running the sles11sp1 kernel and the other the sles11sp2 kernel. The 2.3 client running on sles11sp2 has cfs_cpt_table_create() hacked. I do not wish to interrupt the testing right now.

The hardware vendor has a week-long furlough next week, so I will not get any response from them until it is over. But I will try to determine whether sles11sp1 reports num_online_nodes() as 1 or 8 next week.

Comment by Liang Zhen (Inactive) [ 21/Dec/12 ]

Hi Bob, sorry, I think I should take over this bug; it must be something wrong in my code. I will look into it.
I will create a debug patch for this.

Comment by Jay Lan (Inactive) [ 26/Dec/12 ]

Why do you think there is something wrong in your code, Liang Zhen?

I have confirmed that the num_online_nodes() macro in sles11sp1 (2.6.32.54-0.3.1) returned 1 in my 8-cpu test system, and returned 8 in sles11sp2 (3.0.42-0.7.3).

I hacked libcfs to assume a single node so I could continue my testing; however, it will be a real problem when I move my testing to a big SMP system, unless lustre can find a way to do cfs_cpu_init() without calling num_online_nodes().

Comment by Liang Zhen (Inactive) [ 26/Dec/12 ]

What's the CPU topology of your system? I'm wondering why both num_online_cpus() and num_online_nodes() return 8. Does it mean your system has 8 CPU sockets, each with a single core?
I think there is a way to use the "cpu_pattern" parameter of libcfs to work around this, but I need to know how many NUMA nodes, CPU sockets, and CPU cores your system has.

Comment by Jay Lan (Inactive) [ 27/Dec/12 ]

Attached is /proc/cpuinfo from the two systems. s331 runs sles11sp2 and s332 runs sles11sp1. They look almost the same to me.

Comment by Liang Zhen (Inactive) [ 27/Dec/12 ]

So your system has 2 CPU sockets, each with 4 cores, but num_online_nodes() returns 8 on sp2 for some unknown reason. Is this correct? Could you run this under sp2 to see how many online nodes there are:
ls /sys/devices/system/node/
and run this for each node:
cat /sys/devices/system/node/node0/cpulist
cat /sys/devices/system/node/node1/cpulist
...

I think one way to make it work is to put this line in /etc/modprobe.d/lustre.conf:
options libcfs cpu_pattern="0[0,2,4,6] 1[1,3,5,7]"

Comment by Jay Lan (Inactive) [ 28/Dec/12 ]

Liang's workaround worked for me.

I filed a bug report to the hardware vendor but do not expect to get response until next week. Let's keep this LU open until we have a better understanding on the problem on this hardware platform.

Comment by Jay Lan (Inactive) [ 03/Jan/13 ]

I think we can close this ticket. It appears to be a problem with some particular hardware platforms. All newer hardware platforms seem to work correctly. Fortunately, the troubled hardware we use in production is not used as a lustre client (except for the systems I use in testing).

I am happy to use the workaround provided by Liang in my test rack.

Comment by Peter Jones [ 03/Jan/13 ]

ok thanks Jay

Comment by Jay Lan (Inactive) [ 22/Oct/13 ]

Could you reopen this ticket?

This problem happened again, and this time I have a clear picture of what went wrong. It appears libcfs cannot handle the fake-NUMA situation, i.e., adding "numa=fake=<n>" to the boot line.

When a system is booted with fake NUMA, syslog shows an "Operation not permitted" error from libcfs, and none of the lustre modules can load.

The workaround suggested by Liang Zhen worked this time as well. We were able to get lustre mounted on those fake-NUMA sles11sp2 systems.

Comment by Liang Zhen (Inactive) [ 23/Oct/13 ]

There is another ticket (LU-3992) which reported the same issue, and patch link is:
http://review.whamcloud.com/#/c/7724/

Comment by Jay Lan (Inactive) [ 29/Oct/13 ]

The patch failed in Maloo testing though...

Comment by Jay Lan (Inactive) [ 18/Nov/13 ]

LU-3992 is marked closed, and I have cherry-picked the patch into nas-2.4.0-1 and nas-2.4.1. Please close this ticket. Thanks!

Comment by Peter Jones [ 18/Nov/13 ]

ok - thanks Jay!

Generated at Sat Feb 10 01:25:53 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.