[LU-11646] Getting LBUG kernel panics Created: 08/Nov/18  Updated: 08/Nov/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.4
Fix Version/s: None

Type: Bug Priority: Major
Reporter: David Racily (Inactive) Assignee: Zhenyu Xu
Resolution: Unresolved Votes: 0
Labels: None
Environment:

CentOS Linux release 7.3.1611 (Core)
Linux scdm1804.jlab.org 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
kmod-lustre-client-2.10.4-1.el7.centos.x86_64
lustre-client-2.10.4-1.el7.centos.x86_64


Epic/Theme: kernel-panic
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We are experiencing LBUG kernel panics, after which the systems reboot.

At the time of the kernel panic, the following is displayed on the terminal:

kernel:[1333527.166678] LustreError: 68:0:(cl_page.c:410:cl_vmpage_page()) ASSERTION( page->cp_type == CPT_CACHEABLE ) failed:

This has happened on several different systems:

KERNEL: /usr/lib/debug/lib/modules/3.10.0-514.el7.x86_64/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2018-11-07-19:19:44/vmcore [PARTIAL DUMP]
CPUS: 8
DATE: Wed Nov 7 19:18:32 2018
UPTIME: 15 days, 10:25:29
LOAD AVERAGE: 4.80, 4.70, 3.88
TASKS: 635
NODENAME: scdm1804.jlab.org
RELEASE: 3.10.0-514.el7.x86_64
VERSION: #1 SMP Tue Nov 22 16:42:41 UTC 2016
MACHINE: x86_64 (3600 Mhz)
MEMORY: 95.4 GB
PANIC: "Kernel panic - not syncing: LBUG"
PID: 68
COMMAND: "khugepaged"
TASK: ffff880c402aedd0 [THREAD_INFO: ffff880c3c294000]
CPU: 7
STATE: TASK_RUNNING (PANIC)

[1333527.166678] LustreError: 68:0:(cl_page.c:410:cl_vmpage_page()) ASSERTION( page->cp_type == CPT_CACHEABLE ) failed:
[1333527.167005] LustreError: 68:0:(cl_page.c:410:cl_vmpage_page()) LBUG
[1333527.167173] Pid: 68, comm: khugepaged
[1333527.167174]
Call Trace:
[1333527.167193] [<ffffffffa0ac27ee>] libcfs_call_trace+0x4e/0x60 [libcfs]
[1333527.167200] [<ffffffffa0ac287c>] lbug_with_loc+0x4c/0xb0 [libcfs]
[1333527.167234] [<ffffffffa0c8f870>] cl_page_slice_add+0x0/0x140 [obdclass]
[1333527.167261] [<ffffffffa109dba3>] ll_releasepage+0x73/0x1a0 [lustre]
[1333527.167266] [<ffffffff81180462>] try_to_release_page+0x32/0x50
[1333527.167269] [<ffffffff811953a0>] shrink_page_list+0x950/0xb00
[1333527.167273] [<ffffffff81195bda>] shrink_inactive_list+0x1fa/0x630
[1333527.167276] [<ffffffff81196775>] shrink_lruvec+0x385/0x770
[1333527.167279] [<ffffffff810c4e83>] ? wake_up_process+0x23/0x40
[1333527.167282] [<ffffffff81196bd6>] shrink_zone+0x76/0x1a0
[1333527.167285] [<ffffffff81196f6d>] zone_reclaim+0x26d/0x2f0
[1333527.167288] [<ffffffff8118a424>] get_page_from_freelist+0x2c4/0x9f0
[1333527.167292] [<ffffffff81029569>] ? __switch_to+0xd9/0x4c0
[1333527.167295] [<ffffffff8168b070>] ? __schedule+0x3b0/0x990
[1333527.167298] [<ffffffff8118acc6>] __alloc_pages_nodemask+0x176/0x420
[1333527.167300] [<ffffffff8118e920>] ? __pagevec_lru_add_fn+0x0/0x220
[1333527.167303] [<ffffffff811e8983>] khugepaged_scan_mm_slot+0x433/0xc70
[1333527.167306] [<ffffffff811e9417>] khugepaged+0x257/0x480
[1333527.167310] [<ffffffff810b1600>] ? autoremove_wake_function+0x0/0x40
[1333527.167312] [<ffffffff811e91c0>] ? khugepaged+0x0/0x480
[1333527.167315] [<ffffffff810b052f>] kthread+0xcf/0xe0
[1333527.167317] [<ffffffff810b0460>] ? kthread+0x0/0xe0
[1333527.167321] [<ffffffff81696518>] ret_from_fork+0x58/0x90
[1333527.167324] [<ffffffff810b0460>] ? kthread+0x0/0xe0
[1333527.167326]
[1333527.167327] Kernel panic - not syncing: LBUG
[1333527.167490] CPU: 7 PID: 68 Comm: khugepaged Tainted: G W OE ------------ 3.10.0-514.el7.x86_64 #1
[1333527.167812] Hardware name: Supermicro SYS-2029BT-HNR/X11DPT-B, BIOS 2.0b 02/24/2018
[1333527.168125] ffffffffa0ae0e8b 00000000a7bd8849 ffff880c3c2976c8 ffffffff81685fac
[1333527.168453] ffff880c3c297748 ffffffff8167f3b3 ffffffff00000008 ffff880c3c297758
[1333527.168774] ffff880c3c2976f8 00000000a7bd8849 00000000a7bd8849 0000000000000246
[1333527.169096] Call Trace:
[1333527.169252] [<ffffffff81685fac>] dump_stack+0x19/0x1b
[1333527.169417] [<ffffffff8167f3b3>] panic+0xe3/0x1f2
[1333527.169586] [<ffffffffa0ac2894>] lbug_with_loc+0x64/0xb0 [libcfs]
[1333527.169787] [<ffffffffa0c8f870>] cl_vmpage_page+0x140/0x140 [obdclass]
[1333527.169969] [<ffffffffa109dba3>] ll_releasepage+0x73/0x1a0 [lustre]
[1333527.170138] [<ffffffff81180462>] try_to_release_page+0x32/0x50
[1333527.170305] [<ffffffff811953a0>] shrink_page_list+0x950/0xb00
[1333527.170471] [<ffffffff81195bda>] shrink_inactive_list+0x1fa/0x630
[1333527.170639] [<ffffffff81196775>] shrink_lruvec+0x385/0x770
[1333527.170804] [<ffffffff810c4e83>] ? wake_up_process+0x23/0x40
[1333527.170971] [<ffffffff81196bd6>] shrink_zone+0x76/0x1a0
[1333527.171135] [<ffffffff81196f6d>] zone_reclaim+0x26d/0x2f0
[1333527.171300] [<ffffffff8118a424>] get_page_from_freelist+0x2c4/0x9f0
[1333527.171469] [<ffffffff81029569>] ? __switch_to+0xd9/0x4c0
[1333527.171634] [<ffffffff8168b070>] ? __schedule+0x3b0/0x990
[1333527.171799] [<ffffffff8118acc6>] __alloc_pages_nodemask+0x176/0x420
[1333527.171967] [<ffffffff8118e920>] ? lru_deactivate_fn+0x1d0/0x1d0
[1333527.172134] [<ffffffff811e8983>] khugepaged_scan_mm_slot+0x433/0xc70
[1333527.172303] [<ffffffff811e9417>] khugepaged+0x257/0x480
[1333527.172468] [<ffffffff810b1600>] ? wake_up_atomic_t+0x30/0x30
[1333527.172633] [<ffffffff811e91c0>] ? khugepaged_scan_mm_slot+0xc70/0xc70
[1333527.172813] [<ffffffff810b052f>] kthread+0xcf/0xe0
[1333527.172975] [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
[1333527.173142] [<ffffffff81696518>] ret_from_fork+0x58/0x90
[1333527.173306] [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
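
For anyone triaging a similar trace, here is a minimal sketch of how the frames above could be reduced to symbol and module names, to make the path from khugepaged into Lustre easier to see. It assumes the console format shown in this report (`[<address>] symbol+0xoff/0xsize [module]`, with `?` marking frames the unwinder considers unreliable). The names `FRAME_RE`, `parse_frames` and `SAMPLE` are made up for this illustration and are not part of Lustre or the crash utility.

```python
import re

# Matches frames such as:
#   [1333527.169586] [<ffffffffa0ac2894>] lbug_with_loc+0x64/0xb0 [libcfs]
# The optional "? " marks a frame the kernel unwinder flagged as unreliable.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(trace_text):
    """Return (symbol, module, reliable) tuples in the order they appear.

    Frames without a trailing [module] tag are attributed to "kernel".
    """
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append((
                m.group("symbol"),
                m.group("module") or "kernel",
                m.group("unreliable") is None,
            ))
    return frames


# A few lines copied from the second call trace in this report.
SAMPLE = """\
[1333527.169586] [<ffffffffa0ac2894>] lbug_with_loc+0x64/0xb0 [libcfs]
[1333527.169787] [<ffffffffa0c8f870>] cl_vmpage_page+0x140/0x140 [obdclass]
[1333527.169969] [<ffffffffa109dba3>] ll_releasepage+0x73/0x1a0 [lustre]
[1333527.170138] [<ffffffff81180462>] try_to_release_page+0x32/0x50
[1333527.170804] [<ffffffff810c4e83>] ? wake_up_process+0x23/0x40
[1333527.172303] [<ffffffff811e9417>] khugepaged+0x257/0x480
"""

if __name__ == "__main__":
    for symbol, module, reliable in parse_frames(SAMPLE):
        marker = "" if reliable else " (unreliable)"
        print(f"{module:10s} {symbol}{marker}")
```

On the sample lines, the non-kernel frames are `lbug_with_loc`, `cl_vmpage_page` and `ll_releasepage`. This matches the reading that the assertion fired while page reclaim, entered from a khugepaged allocation, was trying to release a Lustre page.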

 Comments   
Comment by Peter Jones [ 08/Nov/18 ]

Bobijam

Can you please advise?

Thanks

Peter

Comment by Andreas Dilger [ 08/Nov/18 ]

It is interesting that the problematic command is khugepaged. Do you have huge pages configured and in use on the client? It may be that this is interacting badly with the IO handling?

Comment by David Racily (Inactive) [ 08/Nov/18 ]

root@scdm1804:~] cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

root@scdm1804:~] cat /proc/meminfo |grep Huge
AnonHugePages: 4759552 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
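
For checking the same settings on other affected clients, the two outputs above could be interpreted with something like the sketch below. It assumes the usual sysfs convention that the active transparent hugepage mode is the bracketed token, and that `/proc/meminfo` values are `name: number [kB]`. The helper names `active_thp_mode` and `parse_meminfo` are hypothetical, chosen only for this example.

```python
def active_thp_mode(enabled_text):
    """Return the bracketed (active) mode from the THP 'enabled' file text."""
    for token in enabled_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def parse_meminfo(meminfo_text):
    """Map each field name to its leading integer (kB where a unit is shown)."""
    values = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


# Values copied from the output posted in this comment.
SAMPLE_ENABLED = "[always] madvise never"
SAMPLE_MEMINFO = """\
AnonHugePages: 4759552 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE_MEMINFO)
    print("THP mode:", active_thp_mode(SAMPLE_ENABLED))
    print("explicit huge pages reserved:", info["HugePages_Total"])
    # AnonHugePages is reported in kB; dividing by the huge page size
    # gives the number of transparent huge pages currently mapped.
    print("transparent huge pages in use:",
          info["AnonHugePages"] // info["Hugepagesize"])
```

Read this way, the output indicates that transparent huge pages are in `always` mode, that no explicit (hugetlbfs) huge pages are reserved, and that about 4.5 GiB of anonymous memory is currently backed by transparent huge pages.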
