
Kernel panic - not syncing: Cannot find space for the kernel page tables

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • None
    • Lustre 1.8.7
    • Clients: SGI UV1000
      kernel: 2.6.32-131.0.15.el6.x86_64
      lustre: 1.8.7 wc1
    • 3
    • 6384

    Description

      Hello,

      One of our (DDN Japan) customers hit a kernel panic with the RIP at "[<ffffffffa07a32e3>] ll_readahead+0x43/0x1d20 [lustre]".
      Attached is the console log of this client.

      Could you check whether this is a kernel/BIOS issue or a Lustre issue?

      Many thanks,
      Minoru@DDN Japan
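
      For triage, the RIP entry quoted above can be split into its fields. This is a minimal sketch, assuming the usual x86_64 oops layout "[<address>] symbol+0xoffset/0xsize [module]"; the helper name parse_rip and the regular expression are illustrative and not part of Lustre or the kernel.

```python
import re

# Assumed layout of an oops RIP entry: [<addr>] symbol+0xoffset/0xsize [module]
RIP_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_rip(line):
    """Hypothetical helper: return the RIP fields as a dict, or None if absent."""
    m = RIP_RE.search(line)
    if m is None:
        return None
    return {
        "addr": int(m.group("addr"), 16),      # faulting instruction address
        "symbol": m.group("symbol"),           # function containing the address
        "offset": int(m.group("offset"), 16),  # byte offset into the function
        "size": int(m.group("size"), 16),      # total size of the function
        "module": m.group("module"),           # None for built-in kernel code
    }


info = parse_rip("RIP [<ffffffffa07a32e3>] ll_readahead+0x43/0x1d20 [lustre]")
print(info["symbol"], hex(info["offset"]), info["module"])
```

      A non-empty module field only says which module's code the faulting address was in. It does not by itself say whether the root cause is in that module.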

      Attachments

        1. fat_panic_log
          2 kB
        2. log.txt
          4 kB
        3. messages.0611
          610 kB

          Activity

            [LU-1501] Kernel panic - not syncing: Cannot find space for the kernel page tables
            pjones Peter Jones added a comment -

            ok thanks!


            mhamakawa Minoru Hamakawa added a comment -

            Agreed. If they hit the same panic without a hardware error, I'll reopen this.
            So please close this.
            Many thanks.

            bobijam Zhenyu Xu added a comment -

            I'd say it's most likely a hardware problem in this case (memory failure).


            mhamakawa Minoru Hamakawa added a comment -

            We can see the following in /var/log/messages before the panic happened:
            MEMLOG[44377]: r001i23b07 P0D-DIMM1A has a failed DRAM and must be replaced soon. Exposure to Uncorrected Error is high

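
            To check other clients for the same kind of warning, a messages file can be scanned for MEMLOG lines. This is a minimal sketch, assuming the line format matches the single MEMLOG entry quoted in this ticket; the pattern and the helper name find_failed_dimms are illustrative and may not cover other firmware versions.

```python
import re

# Pattern inferred from the one MEMLOG line in this ticket (an assumption):
#   MEMLOG[<pid>]: <node> <dimm> has a failed DRAM ...
MEMLOG_RE = re.compile(
    r"MEMLOG\s*\[(?P<pid>\d+)\]\s*:\s*"
    r"(?P<node>\S+)\s+(?P<dimm>\S+)\s+has a failed DRAM"
)


def find_failed_dimms(lines):
    """Hypothetical helper: return (node, dimm) pairs for matching lines."""
    hits = []
    for line in lines:
        m = MEMLOG_RE.search(line)
        if m:
            hits.append((m.group("node"), m.group("dimm")))
    return hits


sample = [
    "MEMLOG[44377]: r001i23b07 P0D-DIMM1A has a failed DRAM and must be "
    "replaced soon. Exposure to Uncorrected Error is high",
]
for node, dimm in find_failed_dimms(sample):
    print(node, dimm)
```

            In practice the lines would come from an open file handle on /var/log/messages rather than the in-memory sample used here.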
            bobijam Zhenyu Xu added a comment -

            The log.txt relates to Lustre, but the fat_panic_log does not. Does the system have a memory issue?


            mhamakawa Minoru Hamakawa added a comment -

            Hi Bobijam,

            They could not get a kernel dump this time.
            Could you check, from the console messages alone, whether the root cause of the panic is related to Lustre?

            Many thanks,
            Minoru


            mhamakawa Minoru Hamakawa added a comment -

            I'm sorry, I uploaded it without a dump.


            People

              bobijam Zhenyu Xu
              mhamakawa Minoru Hamakawa