recovery-mds-scale test_failover_ost: tar: Cannot write: Cannot allocate memory

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.5.1
    • Labels: None

    • Environment:
      Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/5/
      Distro/Arch: RHEL6.4/x86_64
      FSTYPE=zfs
      TEST_GROUP=failover
    • Severity: 3
    • Rank (Obsolete): 12176

    Description

      While running recovery-mds-scale test failover_ost, the tar operation on one of the client nodes failed as follows:

      tar: etc/libreport/plugins/rhtsupport.conf: Cannot write: Cannot allocate memory
      tar: Exiting with failure status due to previous errors
      

      Maloo report: https://maloo.whamcloud.com/test_sets/e8a2857a-7529-11e3-936d-52540035b04c

          Activity

            [LU-4432] recovery-mds-scale test_failover_ost: tar: Cannot write: Cannot allocate memory

            adilger Andreas Dilger added a comment -

            Shows mode:0x40 == __GFP_IO, but missing __GFP_WAIT from LU-4357.
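
            For reference, decoding the reported mode value against the GFP bits in the RHEL6 2.6.32 kernel headers confirms this: only __GFP_IO is set, so the allocator is not allowed to sleep for reclaim or compaction and fails as soon as no suitably sized contiguous block is free. Below is a minimal user-space sketch (flag values copied from include/linux/gfp.h of that kernel generation; this is not Lustre code):

            #include <stdio.h>

            /* GFP flag bits as defined in the 2.6.32 kernel headers. */
            #define __GFP_WAIT  0x10u  /* allocator may sleep and run reclaim */
            #define __GFP_IO    0x40u  /* allocator may start low-level disk I/O */
            #define __GFP_FS    0x80u  /* allocator may re-enter the filesystem */
            #define GFP_NOFS    (__GFP_WAIT | __GFP_IO)            /* 0x50 */
            #define GFP_KERNEL  (__GFP_WAIT | __GFP_IO | __GFP_FS) /* 0xd0 */

            int main(void)
            {
                unsigned int mode = 0x40;  /* value from the failure message */

                printf("__GFP_IO:   %s\n", (mode & __GFP_IO)   ? "set" : "clear");
                printf("__GFP_WAIT: %s\n", (mode & __GFP_WAIT) ? "set" : "clear");
                /* With __GFP_WAIT clear the allocation cannot block, so a
                 * fragmented free list makes the request fail outright. */
                return 0;
            }
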
            yujian Jian Yu added a comment -

            Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/13/
            Distro/Arch: RHEL6.4/x86_64
            TEST_GROUP=failover

            A similar issue occurred on a client node while running the recovery-double-scale test:

            10:54:02:Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock client-27vm3:client-27vm7:/lustre /mnt/lustre
            10:54:02:mount.lustre: page allocation failure. order:2, mode:0x40
            10:54:02:Pid: 10206, comm: mount.lustre Not tainted 2.6.32-358.18.1.el6.x86_64 #1
            10:54:02:Call Trace:
            10:54:02: [<ffffffff8112c257>] ? __alloc_pages_nodemask+0x757/0x8d0
            10:54:03: [<ffffffff81166d92>] ? kmem_getpages+0x62/0x170
            10:54:03: [<ffffffff811679aa>] ? fallback_alloc+0x1ba/0x270
            10:54:03: [<ffffffff811673ff>] ? cache_grow+0x2cf/0x320
            10:54:03: [<ffffffff81167729>] ? ____cache_alloc_node+0x99/0x160
            10:54:03: [<ffffffffa0706bc6>] ? null_alloc_repbuf+0x66/0x3b0 [ptlrpc]
            10:54:03: [<ffffffff811684f9>] ? __kmalloc+0x189/0x220
            10:54:03: [<ffffffffa0706bc6>] ? null_alloc_repbuf+0x66/0x3b0 [ptlrpc]
            10:54:03: [<ffffffffa06f4f25>] ? sptlrpc_cli_alloc_repbuf+0x175/0x220 [ptlrpc]
            10:54:03: [<ffffffffa06c88ec>] ? ptl_send_rpc+0x93c/0xc40 [ptlrpc]
            10:54:03: [<ffffffff81281734>] ? snprintf+0x34/0x40
            10:54:04: [<ffffffffa03e77b1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
            10:54:04: [<ffffffffa06bd894>] ? ptlrpc_send_new_req+0x454/0x790 [ptlrpc]
            10:54:04: [<ffffffffa06c2e7e>] ? ptlrpc_set_wait+0x5be/0x860 [ptlrpc]
            10:54:04: [<ffffffffa053d8ec>] ? lustre_get_jobid+0xcc/0x380 [obdclass]
            10:54:04: [<ffffffffa06cc316>] ? lustre_msg_set_jobid+0xb6/0x140 [ptlrpc]
            10:54:04: [<ffffffffa06c31a7>] ? ptlrpc_queue_wait+0x87/0x220 [ptlrpc]
            10:54:04: [<ffffffffa06e12d8>] ? llog_client_read_header+0xd8/0x5e0 [ptlrpc]
            10:54:04: [<ffffffffa0533d2c>] ? llog_init_handle+0xcc/0x960 [obdclass]
            10:54:04: [<ffffffffa0565683>] ? class_config_parse_llog+0x1a3/0x330 [obdclass]
            10:54:04: [<ffffffffa09f0302>] ? mgc_process_log+0xd22/0x18e0 [mgc]
            10:54:04: [<ffffffffa09f1630>] ? config_recover_log_add+0x150/0x280 [mgc]
            10:54:04: [<ffffffffa09ea360>] ? mgc_blocking_ast+0x0/0x810 [mgc]
            10:54:04: [<ffffffffa06aa530>] ? ldlm_completion_ast+0x0/0x920 [ptlrpc]
            10:54:05: [<ffffffffa09f24a5>] ? mgc_process_config+0x645/0x11d0 [mgc]
            10:54:05: [<ffffffffa0574626>] ? lustre_process_log+0x256/0xa60 [obdclass]
            10:54:05: [<ffffffff8128ca66>] ? __percpu_counter_init+0x56/0x70
            10:54:05: [<ffffffffa0a732d8>] ? ll_fill_super+0xaa8/0x14d0 [lustre]
            10:54:05: [<ffffffffa057993d>] ? lustre_fill_super+0x34d/0x510 [obdclass]
            10:54:05: [<ffffffffa05795f0>] ? lustre_fill_super+0x0/0x510 [obdclass]
            10:54:05: [<ffffffff811845cf>] ? get_sb_nodev+0x5f/0xa0
            10:54:05: [<ffffffffa0571545>] ? lustre_get_sb+0x25/0x30 [obdclass]
            10:54:05: [<ffffffff81183beb>] ? vfs_kern_mount+0x7b/0x1b0
            10:54:05: [<ffffffff81183d92>] ? do_kern_mount+0x52/0x130
            10:54:05: [<ffffffff811a3ef2>] ? do_mount+0x2d2/0x8d0
            10:54:05: [<ffffffff811a4580>] ? sys_mount+0x90/0xe0
            10:54:05: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
            10:54:05:Mem-Info:
            

            Maloo report: https://maloo.whamcloud.com/sub_tests/d75a7160-7f3d-11e3-94f3-52540035b04c
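
            The failure in this trace comes from a kmalloc() of a ptlrpc reply buffer in null_alloc_repbuf(); the slab backing it needs an order-2 block, i.e. four contiguous pages (16 KiB with 4 KiB pages). A common mitigation, similar in spirit to Lustre's OBD_ALLOC_LARGE macro, is to fall back to virtually contiguous memory when the contiguous attempt fails. The helper below is a hypothetical sketch of that idea, not the actual Lustre code:

            #include <linux/mm.h>
            #include <linux/slab.h>
            #include <linux/vmalloc.h>

            /* Hypothetical fallback allocator: try physically contiguous memory
             * first, then virtually contiguous memory, which only needs
             * order-0 pages and so tolerates fragmentation. */
            static void *repbuf_alloc_fallback(size_t size)
            {
                    void *buf;

                    /* __GFP_NOWARN suppresses the "page allocation failure"
                     * stack dump when the high-order request cannot be met. */
                    buf = kmalloc(size, GFP_NOFS | __GFP_NOWARN);
                    if (buf == NULL)
                            /* __vmalloc() lets us keep GFP_NOFS, so the
                             * fallback stays safe in RPC context. */
                            buf = __vmalloc(size, GFP_NOFS | __GFP_HIGHMEM,
                                            PAGE_KERNEL);
                    return buf;
            }

            The matching free path would need is_vmalloc_addr() to choose between kfree() and vfree().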

            m.magrys Marek Magrys added a comment -

            I think I did report a similar issue some time ago: LU-4034

            yujian Jian Yu added a comment -

            Another instance on the Lustre b2_5 branch: https://maloo.whamcloud.com/test_sets/2df52e18-7ab4-11e3-8b19-52540035b04c
            yujian Jian Yu added a comment -

            Lustre client build: http://build.whamcloud.com/job/lustre-b2_4/70/ (2.4.2)
            Lustre server build: http://build.whamcloud.com/job/lustre-b2_5/8/

            performance-sanity test 8 failed as follows:

            rank 0: stat(f173313) error: Cannot allocate memory
            

            Console log on OSS:

            08:33:51:Lustre: DEBUG MARKER: ===== mdsrate-stat-large.sh
            08:33:51:ldlm_cn00_006: page allocation failure. order:1, mode:0x40
            08:33:51:Pid: 640, comm: ldlm_cn00_006 Not tainted 2.6.32-358.18.1.el6_lustre.g6093be6.x86_64 #1
            08:33:52:Call Trace:
            08:33:52: [<ffffffff8112c257>] ? __alloc_pages_nodemask+0x757/0x8d0
            08:33:52: [<ffffffff8127f72c>] ? put_dec+0x10c/0x110
            08:33:53: [<ffffffff81166d92>] ? kmem_getpages+0x62/0x170
            08:33:53: [<ffffffff811679aa>] ? fallback_alloc+0x1ba/0x270
            08:33:53: [<ffffffff811673ff>] ? cache_grow+0x2cf/0x320
            08:33:54: [<ffffffff81167729>] ? ____cache_alloc_node+0x99/0x160
            08:33:54: [<ffffffff811688f0>] ? kmem_cache_alloc_node_trace+0x90/0x200
            08:33:54: [<ffffffff81168b0d>] ? __kmalloc_node+0x4d/0x60
            08:33:54: [<ffffffffa0457651>] ? cfs_cpt_malloc+0x31/0x60 [libcfs]
            08:33:54: [<ffffffffa0a42b48>] ? ptlrpc_alloc_rqbd+0x1e8/0x670 [ptlrpc]
            08:33:54: [<ffffffffa0a430b5>] ? ptlrpc_grow_req_bufs+0xe5/0x2a0 [ptlrpc]
            08:33:54: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
            08:33:55: [<ffffffffa0a474bd>] ? ptlrpc_main+0xb5d/0x1740 [ptlrpc]
            08:33:55: [<ffffffffa0a46960>] ? ptlrpc_main+0x0/0x1740 [ptlrpc]
            08:33:55: [<ffffffff81096a36>] ? kthread+0x96/0xa0
            08:33:55: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
            08:33:56: [<ffffffff810969a0>] ? kthread+0x0/0xa0
            08:33:56: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
            

            Maloo report: https://maloo.whamcloud.com/test_sets/51d6d872-78ed-11e3-a27b-52540035b04c
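
            For reference, the "order" in these messages is the power-of-two number of contiguous pages being requested, so even the failing requests here are small; they fail because the allocation cannot block to reclaim and no contiguous block of that size happens to be free. A minimal user-space illustration, assuming the 4 KiB x86_64 page size:

            #include <stdio.h>

            int main(void)
            {
                const long page_size = 4096;  /* x86_64 default page size */
                int order;

                /* order:n asks the buddy allocator for 2^n contiguous pages. */
                for (order = 0; order <= 2; order++)
                    printf("order:%d -> %ld KiB contiguous\n",
                           order, (page_size << order) / 1024);
                return 0;
            }

            This prints 4, 8 and 16 KiB for orders 0 through 2, matching the order:1 request above and the order:2 request in the earlier mount.lustre trace.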


            People

              Assignee: wc-triage WC Triage
              Reporter: yujian Jian Yu
              Votes: 0
              Watchers: 5
