
OOM crash: null_alloc_rs()) ASSERTION( rs->rs_size >= rs_size ) failed

Details


    Description

      Hit this running sanity in a loop:

      <4>[80900.195000] ldlm_cn01_000: page allocation failure. order:1, mode:0x40
      <4>[80900.195002] Pid: 17587, comm: ldlm_cn01_000 Not tainted 2.6.32-rhe6.4-debug #2
      <4>[80900.195003] Call Trace:
      <4>[80900.195008]  [<ffffffff8112a666>] ? __alloc_pages_nodemask+0x7c6/0x980
      <4>[80900.195011]  [<ffffffff811658f2>] ? kmem_getpages+0x62/0x170
      <4>[80900.195013]  [<ffffffff8116834a>] ? fallback_alloc+0x1ba/0x270
      <4>[80900.195015]  [<ffffffff81167bf7>] ? cache_grow+0x4d7/0x520
      <4>[80900.195017]  [<ffffffff81168038>] ? ____cache_alloc_node+0xa8/0x200
      <4>[80900.195018]  [<ffffffff81168943>] ? kmem_cache_alloc_trace+0x1c3/0x250
      <4>[80900.195029]  [<ffffffffa099cbc5>] ? osd_key_init+0x25/0x4e0 [osd_ldiskfs]
      <4>[80900.195035]  [<ffffffffa099cbc5>] ? osd_key_init+0x25/0x4e0 [osd_ldiskfs]
      <4>[80900.195060]  [<ffffffffa0bdd27f>] ? keys_fill+0x6f/0x190 [obdclass]
      <4>[80900.195090]  [<ffffffffa0be132e>] ? lu_context_init+0x4e/0x240 [obdclass]
      <4>[80900.195109]  [<ffffffffa0be1383>] ? lu_context_init+0xa3/0x240 [obdclass]
      <4>[80900.195111]  [<ffffffff811665be>] ? cache_free_debugcheck+0x2ae/0x360
      <4>[80900.195130]  [<ffffffffa0be153e>] ? lu_env_init+0x1e/0x30 [obdclass]
      <4>[80900.195140]  [<ffffffffa0e3d69a>] ? ofd_lvbo_update+0x7a/0xea8 [ofd]
      <4>[80900.195164]  [<ffffffffa04ac434>] ? ldlm_resource_putref+0x1d4/0x280 [ptlrpc]
      <4>[80900.195186]  [<ffffffffa04c97b7>] ? ldlm_request_cancel+0x247/0x410 [ptlrpc]
      <4>[80900.195206]  [<ffffffffa04c9abd>] ? ldlm_handle_cancel+0x13d/0x240 [ptlrpc]
      <4>[80900.195226]  [<ffffffffa04cefb9>] ? ldlm_cancel_handler+0x1e9/0x500 [ptlrpc]
      <4>[80900.195250]  [<ffffffffa04ffad1>] ? ptlrpc_server_handle_request+0x3b1/0xc70 [ptlrpc]
      <4>[80900.195260]  [<ffffffffa0a2355e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      <4>[80900.195270]  [<ffffffffa0a34b6f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      <4>[80900.195340]  [<ffffffffa04f6bb1>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc]
      <4>[80900.195345]  [<ffffffff81054613>] ? __wake_up+0x53/0x70
      <4>[80900.195367]  [<ffffffffa0500db2>] ? ptlrpc_main+0xa22/0x1650 [ptlrpc]
      <4>[80900.195437]  [<ffffffffa0500390>] ? ptlrpc_main+0x0/0x1650 [ptlrpc]
      <4>[80900.195441]  [<ffffffff81094606>] ? kthread+0x96/0xa0
      <4>[80900.195444]  [<ffffffff8100c10a>] ? child_rip+0xa/0x20
      <4>[80900.195447]  [<ffffffff81094570>] ? kthread+0x0/0xa0
      <4>[80900.195448]  [<ffffffff8100c100>] ? child_rip+0x0/0x20
      <6>[80900.195449] Mem-Info:
      <4>[80900.195450] Node 0 DMA per-cpu:
      <4>[80900.195451] CPU    0: hi:    0, btch:   1 usd:   0
      <4>[80900.195452] CPU    1: hi:    0, btch:   1 usd:   0
      <4>[80900.195453] CPU    2: hi:    0, btch:   1 usd:   0
      <4>[80900.195454] CPU    3: hi:    0, btch:   1 usd:   0
      <4>[80900.195455] CPU    4: hi:    0, btch:   1 usd:   0
      <4>[80900.195456] CPU    5: hi:    0, btch:   1 usd:   0
      <4>[80900.195458] CPU    6: hi:    0, btch:   1 usd:   0
      <4>[80900.195459] CPU    7: hi:    0, btch:   1 usd:   0
      <4>[80900.195459] Node 0 DMA32 per-cpu:
      <4>[80900.195460] CPU    0: hi:  186, btch:  31 usd:  51
      <4>[80900.195461] CPU    1: hi:  186, btch:  31 usd:  26
      <4>[80900.195462] CPU    2: hi:  186, btch:  31 usd:   0
      <4>[80900.195463] CPU    3: hi:  186, btch:  31 usd:   0
      <4>[80900.195464] CPU    4: hi:  186, btch:  31 usd:  57
      <4>[80900.195465] CPU    5: hi:  186, btch:  31 usd: 174
      <4>[80900.195466] CPU    6: hi:  186, btch:  31 usd: 162
      <4>[80900.195467] CPU    7: hi:  186, btch:  31 usd:  32
      <4>[80900.195470] active_anon:61548 inactive_anon:61459 isolated_anon:0
      <4>[80900.195470]  active_file:94797 inactive_file:74222 isolated_file:0
      <4>[80900.195471]  unevictable:0 dirty:20 writeback:0 unstable:0
      <4>[80900.195471]  free:43025 slab_reclaimable:75111 slab_unreclaimable:271092
      <4>[80900.195472]  mapped:577 shmem:119300 pagetables:383 bounce:0
      <4>[80900.195473] Node 0 DMA free:9692kB min:136kB low:168kB high:204kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:9296kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      <4>[80900.195478] lowmem_reserve[]: 0 2967 2967 2967
      <4>[80900.195479] Node 0 DMA32 free:162408kB min:44916kB low:56144kB high:67372kB active_anon:246192kB inactive_anon:245836kB active_file:379188kB inactive_file:296888kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3039076kB mlocked:0kB dirty:80kB writeback:0kB mapped:2308kB shmem:477200kB slab_reclaimable:300444kB slab_unreclaimable:1084368kB kernel_stack:3296kB pagetables:1532kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      <4>[80900.195485] lowmem_reserve[]: 0 0 0 0
      <4>[80900.195486] Node 0 DMA: 3*4kB 0*8kB 3*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 2*4096kB = 9692kB
      <4>[80900.195490] Node 0 DMA32: 37378*4kB 1032*8kB 32*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 162408kB
      <4>[80900.195495] 129723 total pagecache pages
      <4>[80900.195496] 925 pages in swap cache
      <4>[80900.195497] Swap cache stats: add 376365, delete 375440, find 1323456/1328202
      <4>[80900.195498] Free swap  = 1869104kB
      <4>[80900.195499] Total swap = 2097144kB
      <6>[80900.198861] 774396 pages RAM
      <6>[80900.198861] 38583 pages reserved
      <6>[80900.198861] 11942 pages shared
      <6>[80900.198861] 675636 pages non-shared
      <4>[80900.226747] 129650 total pagecache pages
      <4>[80900.226747] 1136 pages in swap cache
      <4>[80900.226747] Swap cache stats: add 376650, delete 375514, find 1323456/1328202
      <4>[80900.226747] Free swap  = 1867964kB
      <4>[80900.226747] Total swap = 2097144kB
      <6>[80900.226747] 774396 pages RAM
      <6>[80900.226747] 38583 pages reserved
      <6>[80900.226747] 11963 pages shared
      <6>[80900.226747] 668761 pages non-shared
      <0>[80900.502883] LustreError: 17604:0:(sec_null.c:318:null_alloc_rs()) ASSERTION( rs->rs_size >= rs_size ) failed: 
      <0>[80900.504111] LustreError: 17604:0:(sec_null.c:318:null_alloc_rs()) LBUG
      <4>[80900.504782] Pid: 17604, comm: mdt01_002
      <4>[80900.505352] 
      <4>[80900.505353] Call Trace:
      <4>[80900.506312]  [<ffffffffa0a228a5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4>[80900.507011]  [<ffffffffa0a22ea7>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4>[80900.507716]  [<ffffffffa052a382>] null_alloc_rs+0x272/0x390 [ptlrpc]
      <4>[80900.508419]  [<ffffffffa0518f19>] sptlrpc_svc_alloc_rs+0x1d9/0x2a0 [ptlrpc]
      <4>[80900.509166]  [<ffffffffa04ef218>] lustre_pack_reply_v2+0x98/0x2a0 [ptlrpc]
      <4>[80900.509906]  [<ffffffffa04ef4ce>] lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
      <4>[80900.510935]  [<ffffffffa04ef621>] lustre_pack_reply+0x11/0x20 [ptlrpc]
      <4>[80900.511642]  [<ffffffffa0516603>] req_capsule_server_pack+0x53/0x100 [ptlrpc]
      <4>[80900.513248]  [<ffffffffa0d472e5>] mdt_getxattr+0x585/0x13c0 [mdt]
      <4>[80900.514017]  [<ffffffffa0d2570e>] mdt_intent_getxattr+0x9e/0x160 [mdt]
      <4>[80900.514572]  [<ffffffffa0d2265e>] mdt_intent_policy+0x3ae/0x770 [mdt]
      <4>[80900.515391]  [<ffffffffa04a735a>] ldlm_lock_enqueue+0x2ea/0x860 [ptlrpc]
      <4>[80900.516099]  [<ffffffffa04cfc7f>] ldlm_handle_enqueue0+0x4ef/0x10c0 [ptlrpc]
      <4>[80900.518542]  [<ffffffffa0d22b26>] mdt_enqueue+0x46/0xe0 [mdt]
      <4>[80900.519098]  [<ffffffffa0d28ca7>] mdt_handle_common+0x647/0x16d0 [mdt]
      <4>[80900.519529]  [<ffffffffa0d63335>] mds_regular_handle+0x15/0x20 [mdt]
      <4>[80900.519959]  [<ffffffffa04ffad1>] ptlrpc_server_handle_request+0x3b1/0xc70 [ptlrpc]
      <4>[80900.520740]  [<ffffffffa0a2355e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      <4>[80900.521190]  [<ffffffffa0a34b6f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      <4>[80900.521644]  [<ffffffffa04f6bb1>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc]
      <4>[80900.522146]  [<ffffffff81054613>] ? __wake_up+0x53/0x70
      <4>[80900.522670]  [<ffffffffa0500db2>] ptlrpc_main+0xa22/0x1650 [ptlrpc]
      <4>[80900.523097]  [<ffffffffa0500390>] ? ptlrpc_main+0x0/0x1650 [ptlrpc]
      <4>[80900.523532]  [<ffffffff81094606>] kthread+0x96/0xa0
      <4>[80900.526690]  [<ffffffff8100c10a>] child_rip+0xa/0x20
      <4>[80900.527224]  [<ffffffff81094570>] ? kthread+0x0/0xa0
      <4>[80900.527603]  [<ffffffff8100c100>] ? child_rip+0x0/0x20
      <4>[80900.528010] 
      <0>[80900.528950] Kernel panic - not syncing: LBUG
      

          Activity

            green Oleg Drokin added a comment -

            Patch landed to master for 2.6.0, and to b2_4 for 2.4.3 (should it ever happen) and b2_5 for 2.5.1.


            paf Patrick Farrell (Inactive) added a comment -

            Re-pushed patch to get Maloo to test again. If tests pass, I'm planning to invite Oleg to review.


            lixi Li Xi (Inactive) added a comment -

            Patrick,

            Thank you very much for confirming this! I am glad that we've made some progress finally.

            Li Xi


            paf Patrick Farrell (Inactive) added a comment -

            Li,

            It looks like you're right - I must have made a mistake in my build/install process before. With your patch, I am no longer able to hit this bug.

            - Patrick

            paf Patrick Farrell (Inactive) added a comment -

            Li,

            I'm sorry, you're right - I misread my own dump. I'm fairly sure I had the patch in place, but I'm happy to try again. It's always possible I made a mistake in setting things up.

            I should be able to test that tomorrow.

            By the way, if you're interested in trying to reproduce this, I'm just running this script from a client to generate activity:

            for idx in $(seq 0 10000); do
                time ls -laR > /dev/null
                touch somefile
                rm -f somefile
                echo $idx: $(date +%T) $(grep MemFree /proc/meminfo)
            done
            

            Then run this program on the MDS to create memory pressure (you have to hold down Enter; it consumes real memory much more slowly than the rate at which it allocates virtual memory). Once memory gets low, keep it running and you should see the bug within a minute or so:

            #include <stdio.h>
            #include <stdlib.h>
            #include <unistd.h>
            
            int main()
            {
                int i;
                char* junk;
            
            start: i = 0;
            
                while(i < 50) { 
                    printf("Malloc!\n"); 
                    junk = malloc(1024*1024*1024); 
                    junk[0] = i; 
                    i++; 
                }
            
                printf("Mallocced 50 GB. Press enter to malloc another 50.\n");
                printf("Note: This seems to use roughly 10 MB of real memory each time.\n");
                getchar();
                goto start;
            }
            

            lixi Li Xi (Inactive) added a comment -

            Patrick,

            Thanks for your test. I am sorry that this LBUG happened again. You said that this is a different LBUG, but the trace looks the same to me. Is there anything I missed?
            It surprises me that the patch does not help at all. I thought it fixed a problem and would likely eliminate the LBUG. Would you please double-check that the patch is applied properly?
            Since the problem happens when the memory is under heavy pressure, I think the buffer is likely to be allocated by lustre_get_emerg_rs(). And that's why I am surprised that the patch didn't help...

            Thanks!
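
            For context, here is a minimal userspace sketch of the failure mode being discussed, assuming a fixed-size pre-allocated emergency reply buffer that a too-large reply can overflow (names and sizes are illustrative, not the actual Lustre sources):

            #include <stdio.h>
            #include <stdlib.h>

            #define EMERG_BUF_SIZE 4096   /* fixed size of the emergency reply buffer */

            struct reply_state {
                size_t rs_size;           /* total size of this reply state */
                char   rs_buf[];
            };

            static struct reply_state *emerg_rs;  /* pre-allocated at service start */

            /* Allocate a reply state for a reply of msgsize bytes.  "oom" simulates
             * the allocation failure that forces use of the emergency buffer. */
            static int alloc_rs(struct reply_state **out, size_t msgsize, int oom)
            {
                size_t rs_size = sizeof(struct reply_state) + msgsize;
                struct reply_state *rs;

                if (!oom) {
                    rs = malloc(rs_size);
                    if (rs == NULL)
                        return -1;
                    rs->rs_size = rs_size;
                    *out = rs;
                    return 0;
                }

                /* Memory pressure: fall back to the emergency buffer. */
                rs = emerg_rs;
                if (rs->rs_size < rs_size) {
                    /* The old code asserted rs->rs_size >= rs_size here and hit
                     * an LBUG; returning an error fails the request cleanly. */
                    fprintf(stderr, "reply too big (%zu), %zu allowed\n",
                            rs_size, rs->rs_size);
                    return -1;
                }
                *out = rs;
                return 0;
            }

            int main(void)
            {
                struct reply_state *rs;

                emerg_rs = malloc(sizeof(*emerg_rs) + EMERG_BUF_SIZE);
                if (emerg_rs == NULL)
                    return 1;
                emerg_rs->rs_size = sizeof(*emerg_rs) + EMERG_BUF_SIZE;

                /* A large getxattr reply while "out of memory" fails cleanly
                 * instead of panicking. */
                if (alloc_rs(&rs, 64 * 1024, /* oom = */ 1) != 0)
                    printf("request fails with -ENOMEM instead of an LBUG\n");
                return 0;
            }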


            paf Patrick Farrell (Inactive) added a comment - edited

            Li,

            Different LBUG, same function:
            <0>LustreError: 12155:0:(sec_null.c:318:null_alloc_rs()) ASSERTION( rs->rs_size >= rs_size ) failed:
            <0>LustreError: 12155:0:(sec_null.c:318:null_alloc_rs()) LBUG
            <4>Pid: 12155, comm: mdt00_000
            <4>
            <4>Call Trace:
            <4> [<ffffffffa032d895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
            <4> [<ffffffffa032de97>] lbug_with_loc+0x47/0xb0 [libcfs]
            <4> [<ffffffffa06296a2>] null_alloc_rs+0x272/0x390 [ptlrpc]
            <4> [<ffffffffa0617527>] sptlrpc_svc_alloc_rs+0x1e7/0x350 [ptlrpc]
            <4> [<ffffffffa05edfb3>] lustre_pack_reply_v2+0x93/0x280 [ptlrpc]
            <4> [<ffffffffa05ee24e>] lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
            <4> [<ffffffffa05ee3a1>] lustre_pack_reply+0x11/0x20 [ptlrpc]
            <4> [<ffffffffa0614be3>] req_capsule_server_pack+0x53/0x100 [ptlrpc]
            <4> [<ffffffffa0c3ef65>] mdt_getxattr+0x545/0x1490 [mdt]
            <4> [<ffffffffa0c212ae>] mdt_intent_getxattr+0x9e/0x160 [mdt]
            <4> [<ffffffffa046a356>] ? lu_object_find+0x16/0x20 [obdclass]
            <4> [<ffffffffa0c1b659>] mdt_intent_policy+0x499/0xca0 [mdt]
            <4> [<ffffffffa05a5441>] ldlm_lock_enqueue+0x361/0x8c0 [ptlrpc]
            <4> [<ffffffffa05ce17f>] ldlm_handle_enqueue0+0x4ef/0x10a0 [ptlrpc]
            <4> [<ffffffffa0643b12>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
            <4> [<ffffffffa064203f>] tgt_request_handle+0x5ff/0x1200 [ptlrpc]
            <4> [<ffffffffa05ef2bc>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
            <4> [<ffffffffa05fe0d5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
            <4> [<ffffffffa032e4ce>] ? cfs_timer_arm+0xe/0x10 [libcfs]
            <4> [<ffffffffa033f3df>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
            <4> [<ffffffffa05f5779>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
            <4> [<ffffffff81051439>] ? __wake_up_common+0x59/0x90
            <4> [<ffffffffa05ff43d>] ptlrpc_main+0xaed/0x1740 [ptlrpc]
            <4> [<ffffffffa05fe950>] ? ptlrpc_main+0x0/0x1740 [ptlrpc]
            <4> [<ffffffff81096a36>] kthread+0x96/0xa0
            <4> [<ffffffff8100c0ca>] child_rip+0xa/0x20
            <4> [<ffffffff810969a0>] ? kthread+0x0/0xa0
            <4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

            Dump is going up on the WC FTP site momentarily.

            Dump is in /LU-3680, file is named:
            lu-3680-mds-dump 2.tar.gz


            paf Patrick Farrell (Inactive) added a comment -

            Li,

            You're right - I was using a Windows FTP command line client, and now that I've switched to a Unix one, everything is fine.

            Thanks!


            lixi Li Xi (Inactive) added a comment -

            Hi Patrick,

            I've just tested the FTP and it works well. Here are the commands I used:

            lixitekiMacBook-Pro:~ lixi$ ftp ftp.whamcloud.com
            Connected to eric.whamcloud.com.
            220 (vsFTPd 2.2.2)
            Name (ftp.whamcloud.com:lixi): anonymous
            331 Please specify the password.
            Password:
            230 Login successful.
            Remote system type is UNIX.
            Using binary mode to transfer files.
            ftp> ls
            229 Entering Extended Passive Mode (|||38409|).
            150 Here comes the directory listing.
            d-wxrwx--- 68 123 840000001 4096 Nov 05 08:12 uploads
            226 Directory send OK.
            ftp> cd uploads
            250 Directory successfully changed.
            ftp> mkdir LU-3680
            257 "/uploads/LU-3680" created
            ftp> cd LU-3680
            250 Directory successfully changed.
            ftp> lpwd
            Local directory: /Users/lixi
            ftp> put test.rtf
            local: test.rtf remote: test.rtf
            229 Entering Extended Passive Mode (|||53942|).
            150 Ok to send data.
            100% |***********************************| 315 397.43 KiB/s 00:00 ETA
            226 Transfer complete.
            315 bytes sent in 00:00 (0.49 KiB/s)
            ftp> ls
            229 Entering Extended Passive Mode (|||40673|).
            150 Here comes the directory listing.
            -rw-r--r-- 1 123 114 315 Nov 07 18:57 test.rtf
            226 Directory send OK.


            paf Patrick Farrell (Inactive) added a comment -

            Li,

            Just testing the FTP site...
            I'm connected to ftp.whamcloud.com as anonymous, but I can't create files in /upload, and I can't seem to cd to any LU directories. I tried LU-3680, LU-3027, LU-4152 and a few others. Also can't make a new directory.

            Is there something I've missed here?

            - Patrick

            paf Patrick Farrell (Inactive) added a comment -

            Li,

            Sure. I'll probably be testing this tomorrow. I expect you've noticed this, but you've got a small mistake in the current patch that's causing it not to build on RHEL6:

            /* Just return failure if the size is too big */
            CERROR("size of message is too big (%lu), %d allowed",
            msglen + sizeof(struct ptlrpc_reply_state),
            

            You're printing %lu, but msglen + sizeof([...]) is an int. I'd just do a cast to a long unsigned int, I suppose.

            (unsigned long) (msglen + sizeof(struct ptlrpc_reply_state)),
            

            Should do it.
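
            In case it helps, here is a standalone illustration of that cast, with plain printf standing in for CERROR and a stub struct standing in for ptlrpc_reply_state (both hypothetical):

            #include <stdio.h>

            /* Stand-in for struct ptlrpc_reply_state; only its size matters here. */
            struct reply_state_stub { char pad[328]; };

            int main(void)
            {
                int msglen = 4096;
                int allowed = 1024;

                /* msglen + sizeof(...) promotes to size_t, so the explicit cast
                 * keeps %lu correct on both 32-bit and 64-bit targets. */
                printf("size of message is too big (%lu), %d allowed\n",
                       (unsigned long)(msglen + sizeof(struct reply_state_stub)),
                       allowed);
                return 0;
            }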

            - Patrick

            People

              green Oleg Drokin
              green Oleg Drokin
              Votes: 0
              Watchers: 13
