[LU-1501] Kernel panic - not syncing: Cannot find space for the kernel page tables Created: 11/Jun/12  Updated: 20/Jun/12  Resolved: 20/Jun/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.7
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Minoru Hamakawa Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: client
Environment:

Clients: SGI UV1000
kernel: 2.6.32-131.0.15.el6.x86_64
lustre: 1.8.7 wc1


Attachments: fat_panic_log, log.txt, messages.0611
Severity: 3
Epic: client
Rank (Obsolete): 6384

 Description   

Hello,

One of our (DDN Japan's) customers hit a kernel panic with "RIP [<ffffffffa07a32e3>] ll_readahead+0x43/0x1d20 [lustre]".
Attached is the console log from this client.

Could you check whether this is a kernel/BIOS issue or a Lustre issue?

Many thanks,
Minoru@DDN Japan



 Comments   
Comment by Peter Jones [ 11/Jun/12 ]

Bobijam

Could you please help with this one?

Thanks

Peter

Comment by Shuichi Ihara (Inactive) [ 13/Jun/12 ]

Bobijam,

Could you please have a look at the log files and give us a first analysis of whether Lustre is involved?

Comment by Zhenyu Xu [ 13/Jun/12 ]

Where does the fat_panic_log come from? Is it from the same client? That part is not a Lustre issue.

Can you get the crash dump of the node?

Comment by Minoru Hamakawa [ 13/Jun/12 ]

Thank you, Bobijam,

Yes, it's the same client, and both files are console output from that client.
I will ask about the crash dump and the timing relationship between the two files.

Many thanks,
Minoru

Comment by Minoru Hamakawa [ 14/Jun/12 ]

Hi Bobijam,

I have uploaded the whole console log and the crash dump under /upload/LU-1501.
Could you take a look?
We can see many MCE log entries starting about two hours before the panic.

Many thanks,
Minoru

Comment by Zhenyu Xu [ 14/Jun/12 ]

The console.log uploaded under /upload/LU-1501 shows the same kernel error as fat_panic_log.

I'm afraid the uploaded "uvdmp_26_2012_6_1_17_22" is not what I need: it contains the header "uvdmp - UV Hub and NL5 Router MMR Capture Utility - Version 2.05". Is it network capture data?

What I meant is the kernel core dump file together with its uncompressed kernel image, which would let us check what happened when the system crashed.

Comment by Minoru Hamakawa [ 14/Jun/12 ]

I'm sorry, I uploaded it without the dump.

Comment by Minoru Hamakawa [ 18/Jun/12 ]

Hi Bobijam,

They couldn't get a kernel dump this time.
Could you check whether the root cause of the panic is related to Lustre using only the console messages?

Many thanks,
Minoru

Comment by Zhenyu Xu [ 18/Jun/12 ]

The log.txt relates to Lustre, but the fat_panic_log does not. Does the system have a memory issue?

Comment by Minoru Hamakawa [ 18/Jun/12 ]

We can see the following in /var/log/messages before the panic happened:
MEMLOG[44377]: r001i23b07 P0D-DIMM1A has a failed DRAM and must be replaced soon. Exposure to Uncorrected Error is high

Comment by Zhenyu Xu [ 18/Jun/12 ]

I'd say it's most likely a hardware problem in this case (memory failure).

Comment by Minoru Hamakawa [ 20/Jun/12 ]

Agreed. If they hit the same panic without a hardware error, I'll reopen this.
So please close this.
Many thanks.

Comment by Peter Jones [ 20/Jun/12 ]

OK, thanks!
