Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.10.4
    • None
    • 3
    • 9223372036854775807

    Description

      We are experiencing LRU kernel panics followed by system reboots.

      At the time of the kernel panic, the following is displayed on the terminal:

      kernel:[1333527.166678] LustreError: 68:0:(cl_page.c:410:cl_vmpage_page()) ASSERTION( page->cp_type == CPT_CACHEABLE ) failed:

      This has happened on several different systems:

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-514.el7.x86_64/vmlinux
      DUMPFILE: /var/crash/127.0.0.1-2018-11-07-19:19:44/vmcore [PARTIAL DUMP]
      CPUS: 8
      DATE: Wed Nov 7 19:18:32 2018
      UPTIME: 15 days, 10:25:29
      LOAD AVERAGE: 4.80, 4.70, 3.88
      TASKS: 635
      NODENAME: scdm1804.jlab.org
      RELEASE: 3.10.0-514.el7.x86_64
      VERSION: #1 SMP Tue Nov 22 16:42:41 UTC 2016
      MACHINE: x86_64 (3600 Mhz)
      MEMORY: 95.4 GB
      PANIC: "Kernel panic - not syncing: LBUG"
      PID: 68
      COMMAND: "khugepaged"
      TASK: ffff880c402aedd0 [THREAD_INFO: ffff880c3c294000]
      CPU: 7
      STATE: TASK_RUNNING (PANIC)

      [1333527.166678] LustreError: 68:0:(cl_page.c:410:cl_vmpage_page()) ASSERTION( page->cp_type == CPT_CACHEABLE ) failed:
      [1333527.167005] LustreError: 68:0:(cl_page.c:410:cl_vmpage_page()) LBUG
      [1333527.167173] Pid: 68, comm: khugepaged
      [1333527.167174]
      Call Trace:
      [1333527.167193] [<ffffffffa0ac27ee>] libcfs_call_trace+0x4e/0x60 [libcfs]
      [1333527.167200] [<ffffffffa0ac287c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      [1333527.167234] [<ffffffffa0c8f870>] cl_page_slice_add+0x0/0x140 [obdclass]
      [1333527.167261] [<ffffffffa109dba3>] ll_releasepage+0x73/0x1a0 [lustre]
      [1333527.167266] [<ffffffff81180462>] try_to_release_page+0x32/0x50
      [1333527.167269] [<ffffffff811953a0>] shrink_page_list+0x950/0xb00
      [1333527.167273] [<ffffffff81195bda>] shrink_inactive_list+0x1fa/0x630
      [1333527.167276] [<ffffffff81196775>] shrink_lruvec+0x385/0x770
      [1333527.167279] [<ffffffff810c4e83>] ? wake_up_process+0x23/0x40
      [1333527.167282] [<ffffffff81196bd6>] shrink_zone+0x76/0x1a0
      [1333527.167285] [<ffffffff81196f6d>] zone_reclaim+0x26d/0x2f0
      [1333527.167288] [<ffffffff8118a424>] get_page_from_freelist+0x2c4/0x9f0
      [1333527.167292] [<ffffffff81029569>] ? __switch_to+0xd9/0x4c0
      [1333527.167295] [<ffffffff8168b070>] ? __schedule+0x3b0/0x990
      [1333527.167298] [<ffffffff8118acc6>] __alloc_pages_nodemask+0x176/0x420
      [1333527.167300] [<ffffffff8118e920>] ? __pagevec_lru_add_fn+0x0/0x220
      [1333527.167303] [<ffffffff811e8983>] khugepaged_scan_mm_slot+0x433/0xc70
      [1333527.167306] [<ffffffff811e9417>] khugepaged+0x257/0x480
      [1333527.167310] [<ffffffff810b1600>] ? autoremove_wake_function+0x0/0x40
      [1333527.167312] [<ffffffff811e91c0>] ? khugepaged+0x0/0x480
      [1333527.167315] [<ffffffff810b052f>] kthread+0xcf/0xe0
      [1333527.167317] [<ffffffff810b0460>] ? kthread+0x0/0xe0
      [1333527.167321] [<ffffffff81696518>] ret_from_fork+0x58/0x90
      [1333527.167324] [<ffffffff810b0460>] ? kthread+0x0/0xe0
      [1333527.167326]
      [1333527.167327] Kernel panic - not syncing: LBUG
      [1333527.167490] CPU: 7 PID: 68 Comm: khugepaged Tainted: G W OE ------------ 3.10.0-514.el7.x86_64 #1
      [1333527.167812] Hardware name: Supermicro SYS-2029BT-HNR/X11DPT-B, BIOS 2.0b 02/24/2018
      [1333527.168125] ffffffffa0ae0e8b 00000000a7bd8849 ffff880c3c2976c8 ffffffff81685fac
      [1333527.168453] ffff880c3c297748 ffffffff8167f3b3 ffffffff00000008 ffff880c3c297758
      [1333527.168774] ffff880c3c2976f8 00000000a7bd8849 00000000a7bd8849 0000000000000246
      [1333527.169096] Call Trace:
      [1333527.169252] [<ffffffff81685fac>] dump_stack+0x19/0x1b
      [1333527.169417] [<ffffffff8167f3b3>] panic+0xe3/0x1f2
      [1333527.169586] [<ffffffffa0ac2894>] lbug_with_loc+0x64/0xb0 [libcfs]
      [1333527.169787] [<ffffffffa0c8f870>] cl_vmpage_page+0x140/0x140 [obdclass]
      [1333527.169969] [<ffffffffa109dba3>] ll_releasepage+0x73/0x1a0 [lustre]
      [1333527.170138] [<ffffffff81180462>] try_to_release_page+0x32/0x50
      [1333527.170305] [<ffffffff811953a0>] shrink_page_list+0x950/0xb00
      [1333527.170471] [<ffffffff81195bda>] shrink_inactive_list+0x1fa/0x630
      [1333527.170639] [<ffffffff81196775>] shrink_lruvec+0x385/0x770
      [1333527.170804] [<ffffffff810c4e83>] ? wake_up_process+0x23/0x40
      [1333527.170971] [<ffffffff81196bd6>] shrink_zone+0x76/0x1a0
      [1333527.171135] [<ffffffff81196f6d>] zone_reclaim+0x26d/0x2f0
      [1333527.171300] [<ffffffff8118a424>] get_page_from_freelist+0x2c4/0x9f0
      [1333527.171469] [<ffffffff81029569>] ? __switch_to+0xd9/0x4c0
      [1333527.171634] [<ffffffff8168b070>] ? __schedule+0x3b0/0x990
      [1333527.171799] [<ffffffff8118acc6>] __alloc_pages_nodemask+0x176/0x420
      [1333527.171967] [<ffffffff8118e920>] ? lru_deactivate_fn+0x1d0/0x1d0
      [1333527.172134] [<ffffffff811e8983>] khugepaged_scan_mm_slot+0x433/0xc70
      [1333527.172303] [<ffffffff811e9417>] khugepaged+0x257/0x480
      [1333527.172468] [<ffffffff810b1600>] ? wake_up_atomic_t+0x30/0x30
      [1333527.172633] [<ffffffff811e91c0>] ? khugepaged_scan_mm_slot+0xc70/0xc70
      [1333527.172813] [<ffffffff810b052f>] kthread+0xcf/0xe0
      [1333527.172975] [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
      [1333527.173142] [<ffffffff81696518>] ret_from_fork+0x58/0x90
      [1333527.173306] [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
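For triage across several crashed systems, it can help to pull the assertion site and the reliable call-chain frames out of the console log automatically. The following is a minimal sketch, not a Lustre or crash-utility tool: the names `ASSERT_RE`, `FRAME_RE`, `parse_assertion` and `reliable_frames` are hypothetical, and the regular expressions assume the log format shown above. Frames marked with `?` are skipped because the kernel prints that marker for addresses found on the stack that are not confirmed to be part of the call chain.

```python
import re

# Assumed format: "LustreError: <pid>:<n>:(<file>:<line>:<func>()) ASSERTION( <expr> ) failed"
ASSERT_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"ASSERTION\( (?P<expr>.*?) \) failed"
)

# Assumed format: "[<addr>] [? ]symbol+0xoff/0xsize [module]"
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\] (?P<unreliable>\? )?"
    r"(?P<symbol>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_assertion(log_text):
    """Return a dict describing the first failed Lustre assertion, or None."""
    match = ASSERT_RE.search(log_text)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])
    info["line"] = int(info["line"])
    return info


def reliable_frames(log_text):
    """Return (symbol, module) pairs for frames not marked with '?'."""
    frames = []
    for match in FRAME_RE.finditer(log_text):
        if match["unreliable"]:
            continue
        # Frames without a module suffix belong to the core kernel image.
        frames.append((match["symbol"], match["module"] or "kernel"))
    return frames


# A few lines copied from the log above.
SAMPLE_LOG = """\
[1333527.166678] LustreError: 68:0:(cl_page.c:410:cl_vmpage_page()) ASSERTION( page->cp_type == CPT_CACHEABLE ) failed:
[1333527.169586] [<ffffffffa0ac2894>] lbug_with_loc+0x64/0xb0 [libcfs]
[1333527.169969] [<ffffffffa109dba3>] ll_releasepage+0x73/0x1a0 [lustre]
[1333527.170138] [<ffffffff81180462>] try_to_release_page+0x32/0x50
[1333527.170804] [<ffffffff810c4e83>] ? wake_up_process+0x23/0x40
[1333527.172303] [<ffffffff811e9417>] khugepaged+0x257/0x480
"""

if __name__ == "__main__":
    print(parse_assertion(SAMPLE_LOG))
    for symbol, module in reliable_frames(SAMPLE_LOG):
        print(f"{module:>8}  {symbol}")
```

Running the same functions over the console logs from each affected node would show whether every panic reaches `ll_releasepage` from the `khugepaged` reclaim path, as it does here.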

       

       

      Attachments

        Activity

          [LU-11646] Getting LBUG kernel panics

          rackley David Racily (Inactive) added a comment -

          root@scdm1804:~] cat /sys/kernel/mm/transparent_hugepage/enabled
          [always] madvise never

          root@scdm1804:~] cat /proc/meminfo |grep Huge
          AnonHugePages: 4759552 kB
          HugePages_Total: 0
          HugePages_Free: 0
          HugePages_Rsvd: 0
          HugePages_Surp: 0
          Hugepagesize: 2048 kB
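The output in the comment above can be read mechanically: the bracketed token in the sysfs `enabled` file is the active transparent hugepage mode, and the `/proc/meminfo` fields are reported in kB. The following is a minimal sketch of that interpretation, with hypothetical helper names (`active_thp_mode`, `meminfo_kb`) and the sample text copied from the comment rather than read from a live system.

```python
def active_thp_mode(enabled_text):
    """Return the bracketed (currently active) mode from the sysfs 'enabled' file."""
    for token in enabled_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no bracketed mode found")


def meminfo_value(meminfo_text, field):
    """Return the integer value of a /proc/meminfo field (kB for sized fields)."""
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field:
            return int(rest.split()[0])
    raise KeyError(field)


# Sample text copied from the comment above.
SAMPLE_ENABLED = "[always] madvise never"
SAMPLE_MEMINFO = """\
AnonHugePages: 4759552 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
"""

if __name__ == "__main__":
    mode = active_thp_mode(SAMPLE_ENABLED)
    anon_kb = meminfo_value(SAMPLE_MEMINFO, "AnonHugePages")
    page_kb = meminfo_value(SAMPLE_MEMINFO, "Hugepagesize")
    reserved = meminfo_value(SAMPLE_MEMINFO, "HugePages_Total")
    print(f"THP mode: {mode}")
    print(f"Anonymous THP memory: {anon_kb // 1024} MiB "
          f"({anon_kb // page_kb} pages of {page_kb} kB)")
    print(f"Explicitly reserved huge pages: {reserved}")
```

On this node that reading gives a THP mode of `always`, 4648 MiB of anonymous memory backed by transparent huge pages, and no explicitly reserved huge pages, which is consistent with `khugepaged` being active at the time of the panic.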

          adilger Andreas Dilger added a comment -

          It is interesting that the problematic command is khugepaged. Do you have huge pages configured and in use on the client? It may be that this is interacting badly with the IO handling?

          pjones Peter Jones added a comment -

          Bobijam

          Can you please advise?

          Thanks

          Peter


          People

            bobijam Zhenyu Xu
            rackley David Racily (Inactive)
            Votes: 0
            Watchers: 3
