[LU-10446] mds-survey test_1: mdt_out00_001: page allocation failure: order:4, mode:0x10c050 Created: 30/Dec/17  Updated: 02/Jan/18  Resolved: 02/Jan/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Jian Yu Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None
Environment:

Lustre Branch: master
MDSCOUNT=4


Issue Links:
Duplicate
duplicates LU-10331 mds-survey test_1: mds-survey failed Resolved
Severity: 3

Description

mds-survey test 1 failed as follows:

Fri Dec 29 13:39:36 UTC 2017 /usr/bin/mds-survey from trevis-68vm2.trevis.hpdd.intel.com
mdt 4 file   50000 dir    2 thr    2 create  527.34 [    0.00, 2999.66] lookup 3346.87 [  999.91, 4999.57] md_getattr 3272.70 [ 1992.72, 4999.55] setxattr 3290.89 [  999.63, 5999.30] destroy  816.55 [    0.00, 3999.62] 
mdt 4 file   50000 dir    2 thr    4 create  622.84 [    0.00, 5999.36] lookup 4100.74 [    0.00, 8999.24] md_getattr 3896.47 [    0.00, 8998.48] setxattr 3863.03 [    0.00, 8999.13] destroy             ERROR 
starting run for config:  test: create  file: 50000 threads: 2  directories: 2
starting run for config:  test: lookup  file: 50000 threads: 2  directories: 2
starting run for config:  test: md_getattr  file: 50000 threads: 2  directories: 2
starting run for config:  test: setxattr  file: 50000 threads: 2  directories: 2
starting run for config:  test: destroy  file: 50000 threads: 2  directories: 2
starting run for config:  test: create  file: 50000 threads: 4  directories: 2
starting run for config:  test: lookup  file: 50000 threads: 4  directories: 2
starting run for config:  test: md_getattr  file: 50000 threads: 4  directories: 2
starting run for config:  test: setxattr  file: 50000 threads: 4  directories: 2
starting run for config:  test: destroy  file: 50000 threads: 4  directories: 2
starting run for config:  test: create  file: 50000 threads: 2  directories: 2
starting run for config:  test: lookup  file: 50000 threads: 2  directories: 2
starting run for config:  test: md_getattr  file: 50000 threads: 2  directories: 2
starting run for config:  test: setxattr  file: 50000 threads: 2  directories: 2
starting run for config:  test: destroy  file: 50000 threads: 2  directories: 2
starting run for config:  test: create  file: 50000 threads: 4  directories: 2
starting run for config:  test: lookup  file: 50000 threads: 4  directories: 2
starting run for config:  test: md_getattr  file: 50000 threads: 4  directories: 2
starting run for config:  test: setxattr  file: 50000 threads: 4  directories: 2
starting run for config:  test: destroy  file: 50000 threads: 4  directories: 2
 mds-survey test_1: @@@@@@ FAIL: mds-survey failed 

Console log on MDS node:

Lustre: DEBUG MARKER: == mds-survey test 1: Metadata survey with zero-stripe =============================================== 13:39:30 (1514554770)
Lustre: DEBUG MARKER: /usr/sbin/lctl dl
Lustre: Echo OBD driver; http://www.lustre.org/
LustreError: 13893:0:(echo_client.c:1795:echo_md_lookup()) lookup MDT0000-tests: rc = -2
LustreError: 13893:0:(echo_client.c:2027:echo_md_destroy_internal()) Can't find child MDT0000-tests: rc = -2
LustreError: 14052:0:(echo_client.c:1795:echo_md_lookup()) lookup MDT0002-tests: rc = -2
LustreError: 14052:0:(echo_client.c:1795:echo_md_lookup()) Skipped 1 previous similar message
LustreError: 14052:0:(echo_client.c:2027:echo_md_destroy_internal()) Can't find child MDT0002-tests: rc = -2
LustreError: 14052:0:(echo_client.c:2027:echo_md_destroy_internal()) Skipped 1 previous similar message
mdt_out00_001: page allocation failure: order:4, mode:0x10c050
CPU: 1 PID: 9793 Comm: mdt_out00_001 Tainted: G           OE  ------------   3.10.0-693.5.2.el7_lustre.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
 000000000010c050 000000008bfb7ba6 ffff88004a4c78a8 ffffffff816a3e51
 ffff88004a4c7938 ffffffff81188820 0000000000000000 ffff88007ffd8000
 0000000000000004 000000000010c050 ffff88004a4c7938 000000008bfb7ba6
Call Trace:
 [<ffffffff816a3e51>] dump_stack+0x19/0x1b
 [<ffffffff81188820>] warn_alloc_failed+0x110/0x180
 [<ffffffff8169fe2a>] __alloc_pages_slowpath+0x6b6/0x724
 [<ffffffff8118cdb5>] __alloc_pages_nodemask+0x405/0x420
 [<ffffffff811d1078>] alloc_pages_current+0x98/0x110
 [<ffffffff8118761e>] __get_free_pages+0xe/0x40
 [<ffffffff811dca2e>] kmalloc_order_trace+0x2e/0xa0
 [<ffffffff811e05c1>] __kmalloc+0x211/0x230
 [<ffffffffc08b5479>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
 [<ffffffffc0b6dddd>] out_handle+0xa5d/0x1920 [ptlrpc]
 [<ffffffffc0afef22>] ? lustre_msg_get_opc+0x22/0xf0 [ptlrpc]
 [<ffffffffc0b67d09>] ? tgt_request_preprocess.isra.28+0x299/0x7a0 [ptlrpc]
 [<ffffffffc0b68b35>] tgt_request_handle+0x925/0x13b0 [ptlrpc]
 [<ffffffffc0b0ce3e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
 [<ffffffff810ba598>] ? __wake_up_common+0x58/0x90
 [<ffffffffc0b105e2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
 [<ffffffff81029557>] ? __switch_to+0xd7/0x510
 [<ffffffff816a9000>] ? __schedule+0x350/0x8b0
 [<ffffffffc0b0fb50>] ? ptlrpc_register_service+0xe80/0xe80 [ptlrpc]
 [<ffffffff810b099f>] kthread+0xcf/0xe0
 [<ffffffff810b08d0>] ? insert_kthread_work+0x40/0x40
 [<ffffffff816b4fd8>] ret_from_fork+0x58/0x90
 [<ffffffff810b08d0>] ? insert_kthread_work+0x40/0x40
Mem-Info:
active_anon:20651 inactive_anon:19487 isolated_anon:0
 active_file:94123 inactive_file:74066 isolated_file:0
 unevictable:0 dirty:5223 writeback:0 unstable:0
 slab_reclaimable:59778 slab_unreclaimable:52270
 mapped:13531 shmem:22506 pagetables:1807 bounce:0
 free:19187 free_pcp:60 free_cma:0
Node 0 DMA free:7104kB min:416kB low:520kB high:624kB active_anon:132kB inactive_anon:256kB active_file:1324kB inactive_file:1360kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:572kB shmem:0kB slab_reclaimable:436kB slab_unreclaimable:456kB kernel_stack:96kB pagetables:160kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 1663 1663 1663 
Node 0 DMA32 free:69644kB min:44636kB low:55792kB high:66952kB active_anon:82472kB inactive_anon:77692kB active_file:375252kB inactive_file:294820kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080756kB managed:1704756kB mlocked:0kB dirty:20892kB writeback:0kB mapped:53552kB shmem:90024kB slab_reclaimable:238676kB slab_unreclaimable:208624kB kernel_stack:4096kB pagetables:7068kB unstable:0kB bounce:0kB free_pcp:232kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 20*4kB (UEM) 18*8kB (UEM) 16*16kB (UEM) 11*32kB (UEM) 4*64kB (M) 1*128kB (M) 1*256kB (U) 1*512kB (E) 1*1024kB (E) 2*2048kB (EM) 0*4096kB = 7104kB
Node 0 DMA32: 5057*4kB (UEM) 2445*8kB (UEM) 1319*16kB (UEM) 233*32kB (UEM) 9*64kB (UEM) 2*128kB (UE) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 69180kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
190492 total pagecache pages
14 pages in swap cache
Swap cache stats: add 153, delete 139, find 0/0
Free swap  = 3670424kB
Total swap = 3671036kB
524185 pages RAM
0 pages HighMem/MovableOnly
94023 pages reserved 
LustreError: 11-0: lustre-MDT0002-osp-MDT0000: operation out_update to node 0@lo failed: rc = -12
LustreError: Skipped 1 previous similar message
LustreError: 9808:0:(layout.c:2099:__req_capsule_get()) @@@ Wrong buffer for field `object_update_reply' (1 of 1) in format `OUT_UPDATE': 0 vs. 4096 (server)
  req@ffff880064300600 x1588109799891040/t0(0) o1000->lustre-MDT0002-osp-MDT0000@0@lo:24/4 lens 376/192 e 0 to 0 dl 1514555485 ref 2 fl Interpret:ReM/0/0 rc -12/-12
LustreError: 15257:0:(echo_client.c:2048:echo_md_destroy_internal()) Can not unlink child 2147486512: rc = -12
LustreError: 15257:0:(echo_client.c:2109:echo_destroy_object()) Can not unlink child (null): rc = -12
LustreError: 9832:0:(llog_cat.c:795:llog_cat_cancel_records()) lustre-MDT0002-osp-MDT0000: fail to cancel 0 of 1 llog-records: rc = -116
LustreError: 15374:0:(echo_client.c:2048:echo_md_destroy_internal()) Can not unlink child MDT0000-tests: rc = -39
LustreError: 15374:0:(echo_client.c:2048:echo_md_destroy_internal()) Skipped 3 previous similar messages
Lustre: DEBUG MARKER: /usr/sbin/lctl mark  mds-survey test_1: @@@@@@ FAIL: mds-survey failed
Lustre: DEBUG MARKER: mds-survey test_1: @@@@@@ FAIL: mds-survey failed

Maloo report: https://testing.hpdd.intel.com/test_sets/85960c0e-eca3-11e7-a169-52540065bddc
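
Note on the failure mode (illustrative, inferred from the trace above): the call trace shows out_handle() going through __kmalloc() for an order-4 request, i.e. 16 physically contiguous pages (64 KiB). On this low-memory, fragmented VM the allocation fails, and the error then surfaces as rc = -12 (-ENOMEM) on the out_update RPC and the destroy phase, which is what aborts the survey. The snippet below is only a minimal sketch of the usual kernel pattern for avoiding a hard dependence on high-order contiguous allocations (try kmalloc, fall back to vmalloc, similar in spirit to Lustre's OBD_ALLOC_LARGE). It is not the actual LU-10331 patch, and the function names are hypothetical.

#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/mm.h>

/*
 * Illustrative sketch only -- not the LU-10331 fix.  Try a regular
 * kmalloc() first (physically contiguous, cheap to map), and fall back
 * to vmalloc() for larger requests so that a fragmented node cannot
 * fail the allocation outright.
 */
static void *alloc_large_buf(size_t size)
{
	void *buf;

	/* __GFP_NOWARN: a failed contiguous attempt is expected and
	 * handled here, so do not dump a stack trace like the one above. */
	buf = kmalloc(size, GFP_NOFS | __GFP_NOWARN);
	if (buf == NULL && size > PAGE_SIZE)
		buf = vmalloc(size);	/* virtually contiguous is enough */

	return buf;
}

static void free_large_buf(void *buf)
{
	if (is_vmalloc_addr(buf))
		vfree(buf);
	else
		kfree(buf);
}

With such a fallback the OUT handler would get a usable buffer instead of propagating -ENOMEM back to the osp layer, as seen in the "operation out_update ... failed: rc = -12" messages above.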



Comments
Comment by Jian Yu [ 02/Jan/18 ]

More failures on master branch:
https://testing.hpdd.intel.com/test_sets/85ea0694-ef80-11e7-8c23-52540065bddc
https://testing.hpdd.intel.com/test_sets/88b6588e-e632-11e7-9840-52540065bddc
https://testing.hpdd.intel.com/test_sets/ad212386-e5f9-11e7-a066-52540065bddc
https://testing.hpdd.intel.com/test_sets/bc067b44-e5d6-11e7-a066-52540065bddc

Comment by Jian Yu [ 02/Jan/18 ]

This failure was fixed under LU-10331. The patches that hit this failure had not been rebased on top of that fix.
