Details
- Bug
- Resolution: Fixed
- Minor
- Lustre 2.5.2
- None
- 3
- 15215
Description
While testing the HSM copytool in a single VM with 512MB of memory, I saw page allocation failures in mdt_readpage, and subsequent I/O errors on the client when trying to read that directory again. It appears the client is caching the error page and not allowing ll_get_dir_page() to try to fetch it again. Here are the errors on the client side:
LustreError: 18907:0:(dir.c:422:ll_get_dir_page()) read cache page: [0x200000402:0x27a:0x0] at 0: rc -12
LustreError: 18907:0:(dir.c:584:ll_dir_read()) error reading dir [0x200000402:0x27a:0x0] at 0: rc -12
LustreError: 18912:0:(dir.c:398:ll_get_dir_page()) dir page locate: [0x200000402:0x27a:0x0] at 0: rc -5
LustreError: 18912:0:(dir.c:584:ll_dir_read()) error reading dir [0x200000402:0x27a:0x0] at 0: rc -5
LustreError: 7358:0:(dir.c:398:ll_get_dir_page()) dir page locate: [0x200000402:0x27a:0x0] at 0: rc -5
LustreError: 7358:0:(dir.c:584:ll_dir_read()) error reading dir [0x200000402:0x27a:0x0] at 0: rc -5
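For illustration only, here is a minimal Python sketch (not Lustre kernel code; all names are hypothetical) of the negative-caching pattern described above: `DirPageCache` stands in for the client's directory page cache, and `backend` stands in for the readpage RPC, which can fail with ENOMEM (modeled as `MemoryError`) when the MDT is short on memory. Caching the failed page turns one transient server-side failure into a permanent -EIO on the client; dropping the entry on failure lets a later read retry the RPC.

```python
# Hypothetical model of the caching bug -- not actual Lustre code.
class DirPageCache:
    def __init__(self, backend):
        self.backend = backend  # callable: offset -> page contents; may raise
        self.pages = {}         # cached pages, keyed by offset

    def read_page_buggy(self, offset):
        # Buggy behaviour: the first failure is cached as an error page,
        # so every later read raises again without retrying the RPC.
        if offset not in self.pages:
            try:
                self.pages[offset] = self.backend(offset)
            except MemoryError:
                self.pages[offset] = IOError("cached error page")  # rc -5 forever
        result = self.pages[offset]
        if isinstance(result, Exception):
            raise result
        return result

    def read_page_fixed(self, offset):
        # Fixed behaviour: a failed read caches nothing, so the next
        # access issues the RPC again and succeeds once memory frees up.
        if offset not in self.pages:
            self.pages[offset] = self.backend(offset)  # may raise; not cached
        return self.pages[offset]
```

With a backend that fails once and then succeeds, `read_page_buggy` keeps raising on every call, while `read_page_fixed` recovers on the second attempt, which is the behaviour the client should have shown here.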
And this is the allocation failure on the MDT:
mdt_rdpg00_001: page allocation failure. order:0, mode:0xc0
Pid: 4794, comm: mdt_rdpg00_001 Not tainted 2.6.32-431.17.1.el6_lustre.x86_64 #1
Call Trace:
 [<ffffffff8112f64a>] ? __alloc_pages_nodemask+0x74a/0x8d0
 [<ffffffffa0653d10>] ? lustre_swab_mdt_body+0x0/0x140 [ptlrpc]
 [<ffffffff8116769a>] ? alloc_pages_current+0xaa/0x110
 [<ffffffffa0c9f3c0>] ? mdt_readpage+0x1d0/0x940 [mdt]
 [<ffffffffa0c8f58a>] ? mdt_handle_common+0x52a/0x1470 [mdt]
 [<ffffffffa0ccb735>] ? mds_readpage_handle+0x15/0x20 [mdt]
 [<ffffffffa0660bc5>] ? ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
 [<ffffffffa03713cf>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
 [<ffffffffa06582a9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
 [<ffffffffa0661f2d>] ? ptlrpc_main+0xaed/0x1740 [ptlrpc]
 [<ffffffffa0661440>] ? ptlrpc_main+0x0/0x1740 [ptlrpc]
 [<ffffffff8109ab56>] ? kthread+0x96/0xa0
 [<ffffffff8100c20a>] ? child_rip+0xa/0x20
 [<ffffffff8109aac0>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
Mem-Info:
Node 0 DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Node 0 DMA32 per-cpu:
CPU 0: hi: 186, btch: 31 usd: 72
active_anon:9625 inactive_anon:10498 isolated_anon:0
active_file:24454 inactive_file:28863 isolated_file:0
unevictable:0 dirty:4370 writeback:0 unstable:0
free:1018 slab_reclaimable:5718 slab_unreclaimable:18294
mapped:2472 shmem:132 pagetables:1268 bounce:0
Node 0 DMA free:2020kB min:84kB low:104kB high:124kB active_anon:464kB inactive_anon:2924kB active_file:1412kB inactive_file:7108kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15368kB mlocked:0kB dirty:408kB writeback:0kB mapped:268kB shmem:308kB slab_reclaimable:264kB slab_unreclaimable:632kB kernel_stack:40kB pagetables:704kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 489 489 489
Node 0 DMA32 free:2052kB min:2784kB low:3480kB high:4176kB active_anon:38036kB inactive_anon:39068kB active_file:96404kB inactive_file:108344kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:500896kB mlocked:0kB dirty:17072kB writeback:0kB mapped:9620kB shmem:220kB slab_reclaimable:22608kB slab_unreclaimable:72544kB kernel_stack:1608kB pagetables:4368kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:64 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2020kB
Node 0 DMA32: 183*4kB 5*8kB 4*16kB 2*32kB 2*64kB 0*128kB 2*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2052kB
53571 total pagecache pages
112 pages in swap cache
Swap cache stats: add 245, delete 133, find 25/32
Free swap  = 834836kB
Total swap = 835576kB
131055 pages RAM
5534 pages reserved
66002 pages shared
75464 pages non-shared