Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-5077

insanity test_1: out of memory on MDT in crypto_create_tfm()

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: Lustre 2.6.0, Lustre 2.5.2, Lustre 2.5.3
    • Fix Version/s: None
    • Labels:
      None
    • Severity:
      3
    • Rank (Obsolete):
      14011

      Description

      This issue was created by maloo for wangdi <di.wang@intel.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/4f1c047c-dd60-11e3-9f1e-52540035b04c.

      The sub-test test_1 failed with the following error:

      16:39:44:Lustre: DEBUG MARKER: == insanity test 1: MDS/MDS failure == 15:39:30 (1400279970)
      16:39:44:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
      16:39:44:Lustre: DEBUG MARKER: umount -d /mnt/mds1
      16:39:44:Lustre: Failing over lustre-MDT0000
      16:42:00:LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.1.4.136@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
      16:42:00:LustreError: Skipped 2839 previous similar messages
      16:42:00:Lustre: 11949:0:(client.c:1914:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1400279972/real 1400279972] req@ffff88007a72f000 x1468278858237216/t0(0) o251->MGC10.1.4.144@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1400279978 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
      16:42:00:Lustre: 11949:0:(client.c:1914:ptlrpc_expire_one_request()) Skipped 12 previous similar messages
      16:42:00:Lustre: server umount lustre-MDT0000 complete
      16:42:00:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
      16:42:00:Lustre: DEBUG MARKER: hostname
      16:42:00:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1
      16:42:00:Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P1
      16:42:00:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre -o user_xattr,acl /dev/lvm-Role_MDS/P1 /mnt/mds1
      16:42:00:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
      16:42:00:LustreError: 12262:0:(mdt_coordinator.c:976:mdt_hsm_cdt_start()) lustre-MDT0000: cannot take the layout locks needed for registered restore: -2
      16:42:00:Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 5 clients reconnect
      16:42:00:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
      16:42:00:LNet: 12440:0:(debug.c:218:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
      16:42:00:LNet: 12440:0:(debug.c:218:libcfs_debug_str2mask()) Skipped 1 previous similar message
      16:42:00:Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1 2>/dev/null
      16:42:00:Lustre: mdt: This server is not able to keep up with request traffic (cpu-bound).
      16:42:00:Lustre: 12222:0:(service.c:1511:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=0, svcEst=21, delay=0(jiff)
      16:42:00:Lustre: 12222:0:(service.c:1308:ptlrpc_at_send_early_reply()) @@@ Already past deadline (1s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff88002ef5e400 x1468299694051176/t0(0) o400>ac3e68be-778c-5e81-3dfa-5ce31a3eb36d@10.1.4.142@tcp:0/0 lens 224/0 e 225451 to 0 dl 1400280113 ref 2 fl Complete:H/c0/ffffffff rc 0/-1
      16:42:00:Lustre: 12229:0:(service.c:1308:ptlrpc_at_send_early_reply()) @@@ Already past deadline (1s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff880022bb0000 x1468278870043488/t0(0) o400>lustre-MDT0001-mdtlov_UUID@10.1.4.140@tcp:0/0 lens 224/0 e 1032545 to 0 dl 1400280114 ref 2 fl Complete:/c0/ffffffff rc 0/-1
      16:42:00:mdt_out00_000 invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
      16:42:00:mdt_out00_000 cpuset=/ mems_allowed=0
      16:42:00:Pid: 12229, comm: mdt_out00_000 Not tainted 2.6.32-431.17.1.el6_lustre.g684b38d.x86_64 #1
      16:42:00:Call Trace:
      16:42:00: [<ffffffff810d0211>] ? cpuset_print_task_mems_allowed+0x91/0xb0
      16:42:00: [<ffffffff811225c0>] ? dump_header+0x90/0x1b0
      16:42:00: [<ffffffff8129058d>] ? __bitmap_intersects+0x1d/0xa0
      16:42:00: [<ffffffff8122781c>] ? security_real_capable_noaudit+0x3c/0x70
      16:42:00: [<ffffffff81122a42>] ? oom_kill_process+0x82/0x2a0
      16:42:00: [<ffffffff8112293e>] ? select_bad_process+0x9e/0x120
      16:42:00: [<ffffffff81122e80>] ? out_of_memory+0x220/0x3c0
      16:42:00: [<ffffffff8112f79f>] ? __alloc_pages_nodemask+0x89f/0x8d0
      16:42:01: [<ffffffff8116e082>] ? kmem_getpages+0x62/0x170
      16:42:01: [<ffffffff8116ec9a>] ? fallback_alloc+0x1ba/0x270
      16:42:01: [<ffffffff8116e6ef>] ? cache_grow+0x2cf/0x320
      16:42:01: [<ffffffff8116ea19>] ? ____cache_alloc_node+0x99/0x160
      16:42:01: [<ffffffff8124bc3c>] ? crypto_create_tfm+0x3c/0xe0
      16:42:01: [<ffffffff8116f7e9>] ? __kmalloc+0x189/0x220
      16:42:01: [<ffffffff8124bc3c>] ? crypto_create_tfm+0x3c/0xe0
      16:42:01: [<ffffffff812525d8>] ? crypto_init_shash_ops+0x68/0x100
      16:42:01: [<ffffffff8124bd4a>] ? __crypto_alloc_tfm+0x6a/0x130
      16:42:01: [<ffffffff8124c5ba>] ? crypto_alloc_base+0x5a/0xb0
      16:42:01: [<ffffffffa04a2684>] ? cfs_percpt_unlock+0x24/0xb0 [libcfs]
      16:42:01: [<ffffffffa048d2ca>] ? cfs_crypto_hash_alloc+0x7a/0x290 [libcfs]
      16:42:01: [<ffffffffa048d5da>] ? cfs_crypto_hash_digest+0x6a/0xf0 [libcfs]
      16:42:01: [<ffffffff8116f86c>] ? __kmalloc+0x20c/0x220
      16:42:01: [<ffffffffa082bd73>] ? lustre_msg_calc_cksum+0xd3/0x130 [ptlrpc]
      16:42:01: [<ffffffffa0865881>] ? null_authorize+0xa1/0x100 [ptlrpc]
      16:42:01: [<ffffffffa0854a56>] ? sptlrpc_svc_wrap_reply+0x56/0x1c0 [ptlrpc]
      16:42:01: [<ffffffffa08241ec>] ? ptlrpc_send_reply+0x1fc/0x7f0 [ptlrpc]
      16:42:01: [<ffffffffa083b675>] ? ptlrpc_at_check_timed+0xc05/0x1360 [ptlrpc]
      16:42:01: [<ffffffffa0832c09>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
      16:42:01: [<ffffffffa083cf68>] ? ptlrpc_main+0x1198/0x1980 [ptlrpc]
      16:42:01: [<ffffffffa083bdd0>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
      16:42:01: [<ffffffff8109ab56>] ? kthread+0x96/0xa0
      16:42:01: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
      16:42:01: [<ffffffff8109aac0>] ? kthread+0x0/0xa0
      16:42:01: [<ffffffff8100c200>] ? child_rip+0x0/0x20
      16:42:01:Mem-Info:

      test failed to respond and timed out

      Info required for matching: insanity 1

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                niu Niu Yawei (Inactive)
                Reporter:
                maloo Maloo
              • Votes:
                0 Vote for this issue
                Watchers:
                16 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: