Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-10446

mds-survey test_1: mdt_out00_001: page allocation failure: order:4, mode:0x10c050

    XMLWordPrintable

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.11.0
    • None
    • Lustre Branch: master
      MDSCOUNT=4
    • 3
    • 9223372036854775807

    Description

      mds-survey test 1 failed as follows:

      Fri Dec 29 13:39:36 UTC 2017 /usr/bin/mds-survey from trevis-68vm2.trevis.hpdd.intel.com
      mdt 4 file   50000 dir    2 thr    2 create  527.34 [    0.00, 2999.66] lookup 3346.87 [  999.91, 4999.57] md_getattr 3272.70 [ 1992.72, 4999.55] setxattr 3290.89 [  999.63, 5999.30] destroy  816.55 [    0.00, 3999.62] 
      mdt 4 file   50000 dir    2 thr    4 create  622.84 [    0.00, 5999.36] lookup 4100.74 [    0.00, 8999.24] md_getattr 3896.47 [    0.00, 8998.48] setxattr 3863.03 [    0.00, 8999.13] destroy             ERROR 
      starting run for config:  test: create  file: 50000 threads: 2  directories: 2
      starting run for config:  test: lookup  file: 50000 threads: 2  directories: 2
      starting run for config:  test: md_getattr  file: 50000 threads: 2  directories: 2
      starting run for config:  test: setxattr  file: 50000 threads: 2  directories: 2
      starting run for config:  test: destroy  file: 50000 threads: 2  directories: 2
      starting run for config:  test: create  file: 50000 threads: 4  directories: 2
      starting run for config:  test: lookup  file: 50000 threads: 4  directories: 2
      starting run for config:  test: md_getattr  file: 50000 threads: 4  directories: 2
      starting run for config:  test: setxattr  file: 50000 threads: 4  directories: 2
      starting run for config:  test: destroy  file: 50000 threads: 4  directories: 2
      starting run for config:  test: create  file: 50000 threads: 2  directories: 2
      starting run for config:  test: lookup  file: 50000 threads: 2  directories: 2
      starting run for config:  test: md_getattr  file: 50000 threads: 2  directories: 2
      starting run for config:  test: setxattr  file: 50000 threads: 2  directories: 2
      starting run for config:  test: destroy  file: 50000 threads: 2  directories: 2
      starting run for config:  test: create  file: 50000 threads: 4  directories: 2
      starting run for config:  test: lookup  file: 50000 threads: 4  directories: 2
      starting run for config:  test: md_getattr  file: 50000 threads: 4  directories: 2
      starting run for config:  test: setxattr  file: 50000 threads: 4  directories: 2
      starting run for config:  test: destroy  file: 50000 threads: 4  directories: 2
       mds-survey test_1: @@@@@@ FAIL: mds-survey failed 
      

      Console log on MDS node:

      Lustre: DEBUG MARKER: == mds-survey test 1: Metadata survey with zero-stripe =============================================== 13:39:30 (1514554770)
      Lustre: DEBUG MARKER: /usr/sbin/lctl dl
      Lustre: Echo OBD driver; http://www.lustre.org/
      LustreError: 13893:0:(echo_client.c:1795:echo_md_lookup()) lookup MDT0000-tests: rc = -2
      LustreError: 13893:0:(echo_client.c:2027:echo_md_destroy_internal()) Can't find child MDT0000-tests: rc = -2
      LustreError: 14052:0:(echo_client.c:1795:echo_md_lookup()) lookup MDT0002-tests: rc = -2
      LustreError: 14052:0:(echo_client.c:1795:echo_md_lookup()) Skipped 1 previous similar message
      LustreError: 14052:0:(echo_client.c:2027:echo_md_destroy_internal()) Can't find child MDT0002-tests: rc = -2
      LustreError: 14052:0:(echo_client.c:2027:echo_md_destroy_internal()) Skipped 1 previous similar message
      mdt_out00_001: page allocation failure: order:4, mode:0x10c050
      CPU: 1 PID: 9793 Comm: mdt_out00_001 Tainted: G           OE  ------------   3.10.0-693.5.2.el7_lustre.x86_64 #1
      Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
       000000000010c050 000000008bfb7ba6 ffff88004a4c78a8 ffffffff816a3e51
       ffff88004a4c7938 ffffffff81188820 0000000000000000 ffff88007ffd8000
       0000000000000004 000000000010c050 ffff88004a4c7938 000000008bfb7ba6
      Call Trace:
       [<ffffffff816a3e51>] dump_stack+0x19/0x1b
       [<ffffffff81188820>] warn_alloc_failed+0x110/0x180
       [<ffffffff8169fe2a>] __alloc_pages_slowpath+0x6b6/0x724
       [<ffffffff8118cdb5>] __alloc_pages_nodemask+0x405/0x420
       [<ffffffff811d1078>] alloc_pages_current+0x98/0x110
       [<ffffffff8118761e>] __get_free_pages+0xe/0x40
       [<ffffffff811dca2e>] kmalloc_order_trace+0x2e/0xa0
       [<ffffffff811e05c1>] __kmalloc+0x211/0x230
       [<ffffffffc08b5479>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
       [<ffffffffc0b6dddd>] out_handle+0xa5d/0x1920 [ptlrpc]
       [<ffffffffc0afef22>] ? lustre_msg_get_opc+0x22/0xf0 [ptlrpc]
       [<ffffffffc0b67d09>] ? tgt_request_preprocess.isra.28+0x299/0x7a0 [ptlrpc]
       [<ffffffffc0b68b35>] tgt_request_handle+0x925/0x13b0 [ptlrpc]
       [<ffffffffc0b0ce3e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
       [<ffffffff810ba598>] ? __wake_up_common+0x58/0x90
       [<ffffffffc0b105e2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
       [<ffffffff81029557>] ? __switch_to+0xd7/0x510
       [<ffffffff816a9000>] ? __schedule+0x350/0x8b0
       [<ffffffffc0b0fb50>] ? ptlrpc_register_service+0xe80/0xe80 [ptlrpc]
       [<ffffffff810b099f>] kthread+0xcf/0xe0
       [<ffffffff810b08d0>] ? insert_kthread_work+0x40/0x40
       [<ffffffff816b4fd8>] ret_from_fork+0x58/0x90
       [<ffffffff810b08d0>] ? insert_kthread_work+0x40/0x40
      Mem-Info:
      active_anon:20651 inactive_anon:19487 isolated_anon:0
       active_file:94123 inactive_file:74066 isolated_file:0
       unevictable:0 dirty:5223 writeback:0 unstable:0
       slab_reclaimable:59778 slab_unreclaimable:52270
       mapped:13531 shmem:22506 pagetables:1807 bounce:0
       free:19187 free_pcp:60 free_cma:0
      Node 0 DMA free:7104kB min:416kB low:520kB high:624kB active_anon:132kB inactive_anon:256kB active_file:1324kB inactive_file:1360kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:572kB shmem:0kB slab_reclaimable:436kB slab_unreclaimable:456kB kernel_stack:96kB pagetables:160kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 1663 1663 1663 
      Node 0 DMA32 free:69644kB min:44636kB low:55792kB high:66952kB active_anon:82472kB inactive_anon:77692kB active_file:375252kB inactive_file:294820kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080756kB managed:1704756kB mlocked:0kB dirty:20892kB writeback:0kB mapped:53552kB shmem:90024kB slab_reclaimable:238676kB slab_unreclaimable:208624kB kernel_stack:4096kB pagetables:7068kB unstable:0kB bounce:0kB free_pcp:232kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 0 0 0
      Node 0 DMA: 20*4kB (UEM) 18*8kB (UEM) 16*16kB (UEM) 11*32kB (UEM) 4*64kB (M) 1*128kB (M) 1*256kB (U) 1*512kB (E) 1*1024kB (E) 2*2048kB (EM) 0*4096kB = 7104kB
      Node 0 DMA32: 5057*4kB (UEM) 2445*8kB (UEM) 1319*16kB (UEM) 233*32kB (UEM) 9*64kB (UEM) 2*128kB (UE) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 69180kB
      Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
      Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      190492 total pagecache pages
      14 pages in swap cache
      Swap cache stats: add 153, delete 139, find 0/0
      Free swap  = 3670424kB
      Total swap = 3671036kB
      524185 pages RAM
      0 pages HighMem/MovableOnly
      94023 pages reserved 
      LustreError: 11-0: lustre-MDT0002-osp-MDT0000: operation out_update to node 0@lo failed: rc = -12
      LustreError: Skipped 1 previous similar message
      LustreError: 9808:0:(layout.c:2099:__req_capsule_get()) @@@ Wrong buffer for field `object_update_reply' (1 of 1) in format `OUT_UPDATE': 0 vs. 4096 (server)
        req@ffff880064300600 x1588109799891040/t0(0) o1000->lustre-MDT0002-osp-MDT0000@0@lo:24/4 lens 376/192 e 0 to 0 dl 1514555485 ref 2 fl Interpret:ReM/0/0 rc -12/-12
      LustreError: 15257:0:(echo_client.c:2048:echo_md_destroy_internal()) Can not unlink child 2147486512: rc = -12
      LustreError: 15257:0:(echo_client.c:2109:echo_destroy_object()) Can not unlink child (null): rc = -12
      LustreError: 9832:0:(llog_cat.c:795:llog_cat_cancel_records()) lustre-MDT0002-osp-MDT0000: fail to cancel 0 of 1 llog-records: rc = -116
      LustreError: 15374:0:(echo_client.c:2048:echo_md_destroy_internal()) Can not unlink child MDT0000-tests: rc = -39
      LustreError: 15374:0:(echo_client.c:2048:echo_md_destroy_internal()) Skipped 3 previous similar messages
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark  mds-survey test_1: @@@@@@ FAIL: mds-survey failed
      Lustre: DEBUG MARKER: mds-survey test_1: @@@@@@ FAIL: mds-survey failed
      

      Maloo report: https://testing.hpdd.intel.com/test_sets/85960c0e-eca3-11e7-a169-52540065bddc

      Attachments

        Issue Links

          Activity

            People

              wc-triage WC Triage
              yujian Jian Yu
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: