Kernel panic - not syncing: Cannot find space for the kernel page tables

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • None
    • Lustre 1.8.7
    • Clients: SGI UV1000
      kernel: 2.6.32-131.0.15.el6.x86_64
      lustre: 1.8.7 wc1
    • 3
    • 6384

    Description

      Hello,

      One of our (DDN Japan) customers hit a kernel panic, with the RIP reported as "RIP [<ffffffffa07a32e3>] ll_readahead+0x43/0x1d20 [lustre]".
      Attached is the console log of this client.

      Could you check whether this is a kernel/BIOS issue or a Lustre issue?

      Many thanks,
      Minoru@DDN Japan

      Attachments

        1. fat_panic_log
          2 kB
        2. log.txt
          4 kB
        3. messages.0611
          610 kB

          Activity

            [LU-1501] Kernel panic - not syncing: Cannot find space for the kernel page tables
            pjones Peter Jones added a comment -

            ok thanks!


            mhamakawa Minoru Hamakawa added a comment -

            Agreed. If they hit the same panic without a hardware error, I'll reopen this.
            So please close this.
            Many thanks.
            bobijam Zhenyu Xu added a comment -

            I'd say this is most likely a hardware problem (memory failure).


            mhamakawa Minoru Hamakawa added a comment -

            We can see the following in /var/log/messages before the panic happened:
            MEMLOG[44377]: r001i23b07 P0D-DIMM1A has a failed DRAM and must be replaced soon. Exposure to Uncorrected Error is high
            bobijam Zhenyu Xu added a comment -

            The log.txt relates to Lustre, but the fat_panic_log does not. Does the system have a memory issue?


            mhamakawa Minoru Hamakawa added a comment -

            Hi Bobijam,

            They couldn't get a kernel dump this time.
            Could you check whether the root cause of the panic is related to Lustre, using only the console messages?

            Many thanks,
            Minoru

            mhamakawa Minoru Hamakawa added a comment -

            I'm sorry, I uploaded it without the dump.
            bobijam Zhenyu Xu added a comment -

            The console.log uploaded under /upload/LU-1501 shows the same kernel error as fat_panic_log.

            I'm afraid the uploaded "uvdmp_26_2012_6_1_17_22" is not what I need. It says "uvdmp - UV Hub and NL5 Router MMR Capture Utility - Version 2.05" in it; is it captured network data?

            What I meant is the kernel core dump file together with its uncompressed kernel image, so that we can check what happened when the system crashed.


            mhamakawa Minoru Hamakawa added a comment -

            Hi Bobijam,

            I uploaded the whole console log and the crash dump under /upload/LU-1501.
            Could you look at them?
            We can see lots of MCE log entries starting two hours before the panic.

            Many thanks,
            Minoru

            mhamakawa Minoru Hamakawa added a comment -

            Thank you, Bobijam.

            Yes, it's the same client, and both files are console output from that client.
            I will ask about the crash dump and the time relation between the two files.

            Many thanks,
            Minoru

            People

              bobijam Zhenyu Xu
              mhamakawa Minoru Hamakawa