Details
Bug
Resolution: Fixed
Major
Lustre 2.4.0
Sequoia LAC node, ppc64, Lustre 2.3.58-4chaos
3
6643
Description
Our Sequoia login node hit the assertion:
client_obd_list_unlock()) ASSERTION( lock->task != ((void *)0) ) failed
This happened immediately after a memory allocation failure. Here is the full console output:
2013-02-01 12:57:58 rsync: page allocation failure. order:0, mode:0x50
2013-02-01 12:57:58 Call Trace:
2013-02-01 12:57:58 [c000000164005b50] [c000000000012f04] .show_stack+0x74/0x1c0 (unreliable)
2013-02-01 12:57:58 [c000000164005c00] [c000000000166b20] .__alloc_pages_nodemask+0x640/0x8c0
2013-02-01 12:57:58 [c000000164005da0] [c0000000001ac0bc] .kmem_getpages+0x7c/0x1a0
2013-02-01 12:57:58 [c000000164005e30] [c0000000001ad4c4] .fallback_alloc+0x244/0x310
2013-02-01 12:57:58 [c000000164005f10] [c0000000001ae43c] .__kmalloc+0x1fc/0x260
2013-02-01 12:57:58 [c000000164005fd0] [d00000000c932724] .cfs_alloc+0x34/0x80 [libcfs]
2013-02-01 12:57:58 [c000000164006060] [d000000011896098] .lov_io_init_raid0+0x528/0xdd0 [lov]
2013-02-01 12:57:58 [c000000164006150] [d0000000118870a4] .lov_io_init+0xb4/0x190 [lov]
2013-02-01 12:57:58 [c0000001640061e0] [d00000000ee0a1a4] .cl_io_init0+0x104/0x260 [obdclass]
2013-02-01 12:57:58 [c000000164006290] [d000000010d552f0] .osc_lru_shrink+0x520/0x1360 [osc]
2013-02-01 12:57:58 [c000000164006420] [d000000010d565dc] .osc_lru_del+0x3cc/0x760 [osc]
2013-02-01 12:57:58 [c000000164006550] [d000000010d57fe0] .osc_page_delete+0x150/0x550 [osc]
2013-02-01 12:57:58 [c000000164006620] [d00000000edfc180] .cl_page_delete0+0x140/0x800 [obdclass]
2013-02-01 12:57:58 [c0000001640066f0] [d00000000edfc8c4] .cl_page_delete+0x84/0x250 [obdclass]
2013-02-01 12:57:58 [c0000001640067a0] [d0000000116a0e90] .ll_releasepage+0x190/0x200 [lustre]
2013-02-01 12:57:58 [c000000164006850] [c00000000014c158] .try_to_release_page+0x68/0xa0
2013-02-01 12:57:58 [c0000001640068c0] [c00000000017005c] .shrink_page_list.clone.2+0x67c/0x770
2013-02-01 12:57:58 [c000000164006a80] [c000000000170508] .shrink_inactive_list+0x3b8/0x9a0
2013-02-01 12:57:58 [c000000164006c80] [c000000000170df0] .shrink_mem_cgroup_zone+0x300/0x700
2013-02-01 12:57:58 [c000000164006de0] [c000000000171278] .shrink_zone+0x88/0x120
2013-02-01 12:57:58 [c000000164006eb0] [c0000000001714a8] .do_try_to_free_pages+0x198/0x700
2013-02-01 12:57:58 [c000000164006fd0] [c000000000171bc8] .try_to_free_pages+0xb8/0x190
2013-02-01 12:57:58 [c0000001640070d0] [c000000000166a00] .__alloc_pages_nodemask+0x520/0x8c0
2013-02-01 12:57:58 [c000000164007270] [c0000000001a2650] .alloc_pages_current+0xb0/0x170
2013-02-01 12:57:58 [c000000164007310] [c00000000014dee8] .__page_cache_alloc+0xc8/0xf0
2013-02-01 12:57:58 [c000000164007390] [c00000000014e280] .grab_cache_page_write_begin+0xf0/0x130
2013-02-01 12:57:58 [c000000164007440] [d0000000116a1014] .ll_write_begin+0x94/0x270 [lustre]
2013-02-01 12:57:58 [c000000164007510] [c00000000014f148] .generic_file_buffered_write+0x138/0x3a0
2013-02-01 12:57:58 [c000000164007650] [c00000000014f8b8] .__generic_file_aio_write+0x2c8/0x430
2013-02-01 12:57:58 [c000000164007750] [c00000000014fac0] .generic_file_aio_write+0xa0/0x130
2013-02-01 12:57:58 [c000000164007810] [d0000000116bf92c] .vvp_io_write_start+0xfc/0x3e0 [lustre]
2013-02-01 12:57:58 [c0000001640078e0] [d00000000ee09d3c] .cl_io_start+0xcc/0x220 [obdclass]
2013-02-01 12:57:58 [c000000164007980] [d00000000ee125e4] .cl_io_loop+0x194/0x2c0 [obdclass]
2013-02-01 12:57:58 [c000000164007a30] [d00000001163a2a8] .ll_file_io_generic+0x498/0x670 [lustre]
2013-02-01 12:57:58 [c000000164007b30] [d00000001163a904] .ll_file_aio_write+0x1d4/0x3a0 [lustre]
2013-02-01 12:57:58 [c000000164007c00] [d00000001163ac20] .ll_file_write+0x150/0x320 [lustre]
2013-02-01 12:57:58 [c000000164007ce0] [c0000000001c38cc] .vfs_write+0xec/0x1f0
2013-02-01 12:57:58 [c000000164007d80] [c0000000001c3af8] .SyS_write+0x58/0xb0
2013-02-01 12:57:58 [c000000164007e30] [c000000000008564] syscall_exit+0x0/0x40
2013-02-01 12:57:58 Mem-Info:
2013-02-01 12:57:58 Node 0 DMA per-cpu:
2013-02-01 12:57:58 CPU 0: hi: 6, btch: 1 usd: 1
2013-02-01 12:57:58 CPU 1: hi: 6, btch: 1 usd: 5
2013-02-01 12:57:58 CPU 2: hi: 6, btch: 1 usd: 0
2013-02-01 12:57:58 CPU 3: hi: 6, btch: 1 usd: 2
2013-02-01 12:57:58 CPU 4: hi: 6, btch: 1 usd: 5
2013-02-01 12:57:58 CPU 5: hi: 6, btch: 1 usd: 2
2013-02-01 12:57:58 CPU 6: hi: 6, btch: 1 usd: 0
2013-02-01 12:57:58 CPU 7: hi: 6, btch: 1 usd: 5
2013-02-01 12:57:58 CPU 8: hi: 6, btch: 1 usd: 2
2013-02-01 12:57:58 CPU 9: hi: 6, btch: 1 usd: 2
2013-02-01 12:57:58 CPU 10: hi: 6, btch: 1 usd: 0
2013-02-01 12:57:58 CPU 11: hi: 6, btch: 1 usd: 3
2013-02-01 12:57:58 CPU 12: hi: 6, btch: 1 usd: 0
2013-02-01 12:57:58 CPU 13: hi: 6, btch: 1 usd: 5
2013-02-01 12:57:58 CPU 14: hi: 6, btch: 1 usd: 2
2013-02-01 12:57:58 CPU 15: hi: 6, btch: 1 usd: 5
2013-02-01 12:57:58 CPU 16: hi: 6, btch: 1 usd: 4
2013-02-01 12:57:58 CPU 17: hi: 6, btch: 1 usd: 5
2013-02-01 12:57:58 CPU 18: hi: 6, btch: 1 usd: 1
2013-02-01 12:57:58 CPU 19: hi: 6, btch: 1 usd: 0
2013-02-01 12:57:58 CPU 20: hi: 6, btch: 1 usd: 3
2013-02-01 12:57:58 CPU 21: hi: 6, btch: 1 usd: 5
2013-02-01 12:57:58 CPU 22: hi: 6, btch: 1 usd: 4
2013-02-01 12:57:58 CPU 23: hi: 6, btch: 1 usd: 5
2013-02-01 12:57:58 CPU 24: hi: 6, btch: 1 usd: 5
2013-02-01 12:57:58 CPU 25: hi: 6, btch: 1 usd: 3
2013-02-01 12:57:58 CPU 26: hi: 6, btch: 1 usd: 0
2013-02-01 12:57:58 CPU 27: hi: 6, btch: 1 usd: 1
2013-02-01 12:57:58 CPU 28: hi: 6, btch: 1 usd: 0
2013-02-01 12:57:58 CPU 29: hi: 6, btch: 1 usd: 2
2013-02-01 12:57:58 CPU 30: hi: 6, btch: 1 usd: 2
2013-02-01 12:57:58 CPU 31: hi: 6, btch: 1 usd: 5
2013-02-01 12:57:58 CPU 32: hi: 6, btch: 1 usd: 4
2013-02-01 12:57:58 CPU 33: hi: 6, btch: 1 usd: 1
2013-02-01 12:57:58 CPU 34: hi: 6, btch: 1 usd: 0
2013-02-01 12:57:58 CPU 35: hi: 6, btch: 1 usd: 0
2013-02-01 12:57:58 CPU 36: hi: 6, btch: 1 usd: 0
2013-02-01 12:57:58 CPU 37: hi: 6, btch: 1 usd: 0
2013-02-01 12:57:58 CPU 38: hi: 6, btch: 1 usd: 1
2013-02-01 12:57:58 CPU 39: hi: 6, btch: 1 usd: 0
2013-02-01 12:57:58 CPU 40: hi: 6, btch: 1 usd: 0
2013-02-01 12:57:58 CPU 41: hi: 6, btch: 1 usd: 0
2013-02-01 12:57:58 CPU 42: hi: 6, btch: 1 usd: 2
2013-02-01 12:57:58 CPU 43: hi: 6, btch: 1 usd: 1
2013-02-01 12:57:58 CPU 44: hi: 6, btch: 1 usd: 0
2013-02-01 12:57:58 CPU 45: hi: 6, btch: 1 usd: 2
2013-02-01 12:57:58 CPU 46: hi: 6, btch: 1 usd: 0
2013-02-01 12:57:58 CPU 47: hi: 6, btch: 1 usd: 3
2013-02-01 12:57:58 active_anon:55550 inactive_anon:11352 isolated_anon:0
2013-02-01 12:57:58 active_file:99678 inactive_file:728273 isolated_file:0
2013-02-01 12:57:58 unevictable:0 dirty:286 writeback:16 unstable:7321
2013-02-01 12:57:58 free:1118 slab_reclaimable:25213 slab_unreclaimable:56151
2013-02-01 12:57:58 mapped:2009 shmem:125 pagetables:3732 bounce:0
2013-02-01 12:57:58 Node 0 DMA free:71552kB min:32128kB low:40128kB high:48192kB active_anon:3555200kB inactive_anon:726528kB active_file:6379392kB inactive_file:46609472kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:64694208kB mlocked:0kB dirty:18304kB writeback:1024kB mapped:128576kB shmem:8000kB slab_reclaimable:1613632kB slab_unreclaimable:3593664kB kernel_stack:42528kB pagetables:238848kB unstable:468544kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
2013-02-01 12:57:58 lowmem_reserve[]: 0 0 0
2013-02-01 12:57:58 Node 0 DMA: 43*64kB 165*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 23872kB
2013-02-01 12:57:58 828068 total pagecache pages
2013-02-01 12:57:58 0 pages in swap cache
2013-02-01 12:57:58 Swap cache stats: add 495, delete 495, find 137/164
2013-02-01 12:57:58 Free swap = 4187968kB
2013-02-01 12:57:58 Total swap = 4194176kB
2013-02-01 12:57:58 1011712 pages RAM
2013-02-01 12:57:58 12550 pages reserved
2013-02-01 12:57:59 863674 pages shared
2013-02-01 12:57:59 256887 pages non-shared
2013-02-01 12:57:59 LustreError: 8998:0:(obd.h:120:client_obd_list_unlock()) ASSERTION( lock->task != ((void *)0) ) failed:
2013-02-01 12:57:59 LustreError: 8998:0:(obd.h:120:client_obd_list_unlock()) LBUG
2013-02-01 12:57:59 Call Trace:
2013-02-01 12:57:59 [c0000001640060a0] [c000000000012f04] .show_stack+0x74/0x1c0 (unreliable)
2013-02-01 12:57:59 [c000000164006150] [d00000000c930cb8] .libcfs_debug_dumpstack+0xd8/0x150 [libcfs]
2013-02-01 12:57:59 [c000000164006200] [d00000000c931480] .lbug_with_loc+0x50/0xc0 [libcfs]
2013-02-01 12:57:59 [c000000164006290] [d000000010d560e8] .osc_lru_shrink+0x1318/0x1360 [osc]
2013-02-01 12:57:59 [c000000164006420] [d000000010d565dc] .osc_lru_del+0x3cc/0x760 [osc]
2013-02-01 12:57:59 [c000000164006550] [d000000010d57fe0] .osc_page_delete+0x150/0x550 [osc]
2013-02-01 12:57:59 [c000000164006620] [d00000000edfc180] .cl_page_delete0+0x140/0x800 [obdclass]
2013-02-01 12:57:59 [c0000001640066f0] [d00000000edfc8c4] .cl_page_delete+0x84/0x250 [obdclass]
2013-02-01 12:57:59 [c0000001640067a0] [d0000000116a0e90] .ll_releasepage+0x190/0x200 [lustre]
2013-02-01 12:57:59 [c000000164006850] [c00000000014c158] .try_to_release_page+0x68/0xa0
2013-02-01 12:57:59 [c0000001640068c0] [c00000000017005c] .shrink_page_list.clone.2+0x67c/0x770
2013-02-01 12:57:59 [c000000164006a80] [c000000000170508] .shrink_inactive_list+0x3b8/0x9a0
2013-02-01 12:57:59 [c000000164006c80] [c000000000170df0] .shrink_mem_cgroup_zone+0x300/0x700
2013-02-01 12:57:59 [c000000164006de0] [c000000000171278] .shrink_zone+0x88/0x120
2013-02-01 12:57:59 [c000000164006eb0] [c0000000001714a8] .do_try_to_free_pages+0x198/0x700
2013-02-01 12:57:59 [c000000164006fd0] [c000000000171bc8] .try_to_free_pages+0xb8/0x190
2013-02-01 12:57:59 [c0000001640070d0] [c000000000166a00] .__alloc_pages_nodemask+0x520/0x8c0
2013-02-01 12:57:59 [c000000164007270] [c0000000001a2650] .alloc_pages_current+0xb0/0x170
2013-02-01 12:57:59 [c000000164007310] [c00000000014dee8] .__page_cache_alloc+0xc8/0xf0
2013-02-01 12:57:59 [c000000164007390] [c00000000014e280] .grab_cache_page_write_begin+0xf0/0x130
2013-02-01 12:57:59 [c000000164007440] [d0000000116a1014] .ll_write_begin+0x94/0x270 [lustre]
2013-02-01 12:57:59 [c000000164007510] [c00000000014f148] .generic_file_buffered_write+0x138/0x3a0
2013-02-01 12:57:59 [c000000164007650] [c00000000014f8b8] .__generic_file_aio_write+0x2c8/0x430
2013-02-01 12:57:59 [c000000164007750] [c00000000014fac0] .generic_file_aio_write+0xa0/0x130
2013-02-01 12:57:59 [c000000164007810] [d0000000116bf92c] .vvp_io_write_start+0xfc/0x3e0 [lustre]
2013-02-01 12:57:59 [c0000001640078e0] [d00000000ee09d3c] .cl_io_start+0xcc/0x220 [obdclass]
2013-02-01 12:57:59 [c000000164007980] [d00000000ee125e4] .cl_io_loop+0x194/0x2c0 [obdclass]
2013-02-01 12:57:59 [c000000164007a30] [d00000001163a2a8] .ll_file_io_generic+0x498/0x670 [lustre]
2013-02-01 12:57:59 [c000000164007b30] [d00000001163a904] .ll_file_aio_write+0x1d4/0x3a0 [lustre]
2013-02-01 12:57:59 [c000000164007c00] [d00000001163ac20] .ll_file_write+0x150/0x320 [lustre]
2013-02-01 12:57:59 [c000000164007ce0] [c0000000001c38cc] .vfs_write+0xec/0x1f0
2013-02-01 12:57:59 [c000000164007d80] [c0000000001c3af8] .SyS_write+0x58/0xb0
2013-02-01 12:57:59 [c000000164007e30] [c000000000008564] syscall_exit+0x0/0x40
2013-02-01 12:57:59 Kernel panic - not syncing: LBUG
2013-02-01 12:57:59 Call Trace:
2013-02-01 12:57:59 [c0000001640060c0] [c000000000012f04] .show_stack+0x74/0x1c0 (unreliable)
2013-02-01 12:57:59 [c000000164006170] [c0000000005c4b18] .panic+0xb8/0x1ec
2013-02-01 12:57:59 [c000000164006200] [d00000000c9314e0] .lbug_with_loc+0xb0/0xc0 [libcfs]
2013-02-01 12:57:59 [c000000164006290] [d000000010d560e8] .osc_lru_shrink+0x1318/0x1360 [osc]
2013-02-01 12:57:59 [c000000164006420] [d000000010d565dc] .osc_lru_del+0x3cc/0x760 [osc]
2013-02-01 12:57:59 [c000000164006550] [d000000010d57fe0] .osc_page_delete+0x150/0x550 [osc]
2013-02-01 12:57:59 [c000000164006620] [d00000000edfc180] .cl_page_delete0+0x140/0x800 [obdclass]
2013-02-01 12:57:59 [c0000001640066f0] [d00000000edfc8c4] .cl_page_delete+0x84/0x250 [obdclass]
2013-02-01 12:57:59 [c0000001640067a0] [d0000000116a0e90] .ll_releasepage+0x190/0x200 [lustre]
2013-02-01 12:57:59 [c000000164006850] [c00000000014c158] .try_to_release_page+0x68/0xa0
2013-02-01 12:57:59 [c0000001640068c0] [c00000000017005c] .shrink_page_list.clone.2+0x67c/0x770
2013-02-01 12:57:59 [c000000164006a80] [c000000000170508] .shrink_inactive_list+0x3b8/0x9a0
2013-02-01 12:57:59 [c000000164006c80] [c000000000170df0] .shrink_mem_cgroup_zone+0x300/0x700
2013-02-01 12:57:59 [c000000164006de0] [c000000000171278] .shrink_zone+0x88/0x120
2013-02-01 12:57:59 [c000000164006eb0] [c0000000001714a8] .do_try_to_free_pages+0x198/0x700
2013-02-01 12:57:59 [c000000164006fd0] [c000000000171bc8] .try_to_free_pages+0xb8/0x190
2013-02-01 12:57:59 [c0000001640070d0] [c000000000166a00] .__alloc_pages_nodemask+0x520/0x8c0
2013-02-01 12:57:59 [c000000164007270] [c0000000001a2650] .alloc_pages_current+0xb0/0x170
2013-02-01 12:57:59 [c000000164007310] [c00000000014dee8] .__page_cache_alloc+0xc8/0xf0
2013-02-01 12:57:59 [c000000164007390] [c00000000014e280] .grab_cache_page_write_begin+0xf0/0x130
2013-02-01 12:57:59 [c000000164007440] [d0000000116a1014] .ll_write_begin+0x94/0x270 [lustre]
2013-02-01 12:57:59 [c000000164007510] [c00000000014f148] .generic_file_buffered_write+0x138/0x3a0
2013-02-01 12:57:59 [c000000164007650] [c00000000014f8b8] .__generic_file_aio_write+0x2c8/0x430
2013-02-01 12:57:59 [c000000164007750] [c00000000014fac0] .generic_file_aio_write+0xa0/0x130
2013-02-01 12:57:59 [c000000164007810] [d0000000116bf92c] .vvp_io_write_start+0xfc/0x3e0 [lustre]
2013-02-01 12:57:59 [c0000001640078e0] [d00000000ee09d3c] .cl_io_start+0xcc/0x220 [obdclass]
2013-02-01 12:57:59 [c000000164007980] [d00000000ee125e4] .cl_io_loop+0x194/0x2c0 [obdclass]
2013-02-01 12:57:59 [c000000164007a30] [d00000001163a2a8] .ll_file_io_generic+0x498/0x670 [lustre]
2013-02-01 12:57:59 [c000000164007b30] [d00000001163a904] .ll_file_aio_write+0x1d4/0x3a0 [lustre]
2013-02-01 12:57:59 [c000000164007c00] [d00000001163ac20] .ll_file_write+0x150/0x320 [lustre]
2013-02-01 12:57:59 [c000000164007ce0] [c0000000001c38cc] .vfs_write+0xec/0x1f0
2013-02-01 12:57:59 [c000000164007d80] [c0000000001c3af8] .SyS_write+0x58/0xb0
2013-02-01 12:57:59 [c000000164007e30] [c000000000008564] syscall_exit+0x0/0x40
2013-02-01 12:57:59 Feb 1 12:57:58 seqlac2 kernel: LustreError: 8998:0:(obd.h:120:client_obd_list_unlock()) ASSERTION( lock->task != ((void *)0) ) failed:
2013-02-01 12:57:59 Feb 1 12:57:58 seqlac2 kernel: LustreError: 8998:0:(obd.h:120:client_obd_list_unlock()) LBUG
2013-02-01 12:57:59 Feb 1 12:57:58 seqlac2 kernel: Kernel panic - not syncing: LBUG
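
For readers unfamiliar with this kind of check: the assertion text suggests that the lock records its holder in a task field and that unlocking with no recorded holder is treated as a bug. The traces show an allocation failing under cl_io_init0, called from osc_lru_shrink, followed by the assertion firing in osc_lru_shrink. One possible way to get there, which this report does not confirm, is an error path that reaches a shared unlock without having retaken the lock. The user-space sketch below only illustrates that pattern. It is not Lustre code; OwnerTrackingLock, shrink, setup_ok and retake_on_failure are hypothetical names made up for the example.

```python
import threading


class OwnerTrackingLock:
    """Hypothetical lock that remembers its holder (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.task = None  # identifier of the current holder, None when free

    def lock(self):
        self._lock.acquire()
        self.task = threading.get_ident()

    def unlock(self):
        # Same shape as the failed check: unlocking with no recorded holder is a bug.
        if self.task is None:
            raise AssertionError("unlock called with no recorded holder")
        self.task = None
        self._lock.release()


def shrink(lock, setup_ok, retake_on_failure):
    """Hypothetical routine that drops the lock around a setup step that can fail.

    setup_ok stands in for the result of that step (for example an allocation).
    retake_on_failure selects whether the error path retakes the lock before
    falling through to the shared unlock at the end.
    """
    lock.lock()
    lock.unlock()  # drop the lock before the setup step
    if setup_ok or retake_on_failure:
        lock.lock()  # the shared exit below expects the lock to be held
    lock.unlock()
    return setup_ok


if __name__ == "__main__":
    lk = OwnerTrackingLock()
    shrink(lk, setup_ok=True, retake_on_failure=False)   # normal path
    shrink(lk, setup_ok=False, retake_on_failure=True)   # careful error path
    try:
        shrink(lk, setup_ok=False, retake_on_failure=False)
    except AssertionError as exc:
        print("assertion:", exc)
```

In the sketch only the combination of a failed setup step and an error path that skips the retake trips the check, which would match a failure that appears only under memory pressure.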