[LU-5077] insanity test_1: out of memory on MDT in crypto_create_tfm() - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Duplicate
Priority: Blocker
Fix Version/s: None
Affects Version/s: Lustre 2.6.0, Lustre 2.5.2, Lustre 2.5.3
Labels:
None

Severity:
3
Rank (Obsolete):
14011

Description

This issue was created by maloo for wangdi <di.wang@intel.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/4f1c047c-dd60-11e3-9f1e-52540035b04c.

The sub-test test_1 failed with the following error:

16:39:44:Lustre: DEBUG MARKER: == insanity test 1: MDS/MDS failure == 15:39:30 (1400279970)
16:39:44:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
16:39:44:Lustre: DEBUG MARKER: umount -d /mnt/mds1
16:39:44:Lustre: Failing over lustre-MDT0000
16:42:00:LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.1.4.136@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
16:42:00:LustreError: Skipped 2839 previous similar messages
16:42:00:Lustre: 11949:0:(client.c:1914:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1400279972/real 1400279972] req@ffff88007a72f000 x1468278858237216/t0(0) o251->MGC10.1.4.144@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1400279978 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
16:42:00:Lustre: 11949:0:(client.c:1914:ptlrpc_expire_one_request()) Skipped 12 previous similar messages
16:42:00:Lustre: server umount lustre-MDT0000 complete
16:42:00:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
16:42:00:Lustre: DEBUG MARKER: hostname
16:42:00:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1
16:42:00:Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P1
16:42:00:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre -o user_xattr,acl /dev/lvm-Role_MDS/P1 /mnt/mds1
16:42:00:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
16:42:00:LustreError: 12262:0:(mdt_coordinator.c:976:mdt_hsm_cdt_start()) lustre-MDT0000: cannot take the layout locks needed for registered restore: -2
16:42:00:Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 5 clients reconnect
16:42:00:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
16:42:00:LNet: 12440:0:(debug.c:218:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
16:42:00:LNet: 12440:0:(debug.c:218:libcfs_debug_str2mask()) Skipped 1 previous similar message
16:42:00:Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1 2>/dev/null
16:42:00:Lustre: mdt: This server is not able to keep up with request traffic (cpu-bound).
16:42:00:Lustre: 12222:0:(service.c:1511:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=0, svcEst=21, delay=0(jiff)
16:42:00:Lustre: 12222:0:(service.c:1308:ptlrpc_at_send_early_reply()) @@@ Already past deadline (~~1s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff88002ef5e400 x1468299694051176/t0(0) o400~~>ac3e68be-778c-5e81-3dfa-5ce31a3eb36d@10.1.4.142@tcp:0/0 lens 224/0 e 225451 to 0 dl 1400280113 ref 2 fl Complete:H/c0/ffffffff rc 0/-1
16:42:00:Lustre: 12229:0:(service.c:1308:ptlrpc_at_send_early_reply()) @@@ Already past deadline (~~1s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff880022bb0000 x1468278870043488/t0(0) o400~~>lustre-MDT0001-mdtlov_UUID@10.1.4.140@tcp:0/0 lens 224/0 e 1032545 to 0 dl 1400280114 ref 2 fl Complete:/c0/ffffffff rc 0/-1
16:42:00:mdt_out00_000 invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
16:42:00:mdt_out00_000 cpuset=/ mems_allowed=0
16:42:00:Pid: 12229, comm: mdt_out00_000 Not tainted 2.6.32-431.17.1.el6_lustre.g684b38d.x86_64 #1
16:42:00:Call Trace:
16:42:00: [<ffffffff810d0211>] ? cpuset_print_task_mems_allowed+0x91/0xb0
16:42:00: [<ffffffff811225c0>] ? dump_header+0x90/0x1b0
16:42:00: [<ffffffff8129058d>] ? __bitmap_intersects+0x1d/0xa0
16:42:00: [<ffffffff8122781c>] ? security_real_capable_noaudit+0x3c/0x70
16:42:00: [<ffffffff81122a42>] ? oom_kill_process+0x82/0x2a0
16:42:00: [<ffffffff8112293e>] ? select_bad_process+0x9e/0x120
16:42:00: [<ffffffff81122e80>] ? out_of_memory+0x220/0x3c0
16:42:00: [<ffffffff8112f79f>] ? __alloc_pages_nodemask+0x89f/0x8d0
16:42:01: [<ffffffff8116e082>] ? kmem_getpages+0x62/0x170
16:42:01: [<ffffffff8116ec9a>] ? fallback_alloc+0x1ba/0x270
16:42:01: [<ffffffff8116e6ef>] ? cache_grow+0x2cf/0x320
16:42:01: [<ffffffff8116ea19>] ? ____cache_alloc_node+0x99/0x160
16:42:01: [<ffffffff8124bc3c>] ? crypto_create_tfm+0x3c/0xe0
16:42:01: [<ffffffff8116f7e9>] ? __kmalloc+0x189/0x220
16:42:01: [<ffffffff8124bc3c>] ? crypto_create_tfm+0x3c/0xe0
16:42:01: [<ffffffff812525d8>] ? crypto_init_shash_ops+0x68/0x100
16:42:01: [<ffffffff8124bd4a>] ? __crypto_alloc_tfm+0x6a/0x130
16:42:01: [<ffffffff8124c5ba>] ? crypto_alloc_base+0x5a/0xb0
16:42:01: [<ffffffffa04a2684>] ? cfs_percpt_unlock+0x24/0xb0 [libcfs]
16:42:01: [<ffffffffa048d2ca>] ? cfs_crypto_hash_alloc+0x7a/0x290 [libcfs]
16:42:01: [<ffffffffa048d5da>] ? cfs_crypto_hash_digest+0x6a/0xf0 [libcfs]
16:42:01: [<ffffffff8116f86c>] ? __kmalloc+0x20c/0x220
16:42:01: [<ffffffffa082bd73>] ? lustre_msg_calc_cksum+0xd3/0x130 [ptlrpc]
16:42:01: [<ffffffffa0865881>] ? null_authorize+0xa1/0x100 [ptlrpc]
16:42:01: [<ffffffffa0854a56>] ? sptlrpc_svc_wrap_reply+0x56/0x1c0 [ptlrpc]
16:42:01: [<ffffffffa08241ec>] ? ptlrpc_send_reply+0x1fc/0x7f0 [ptlrpc]
16:42:01: [<ffffffffa083b675>] ? ptlrpc_at_check_timed+0xc05/0x1360 [ptlrpc]
16:42:01: [<ffffffffa0832c09>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
16:42:01: [<ffffffffa083cf68>] ? ptlrpc_main+0x1198/0x1980 [ptlrpc]
16:42:01: [<ffffffffa083bdd0>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
16:42:01: [<ffffffff8109ab56>] ? kthread+0x96/0xa0
16:42:01: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
16:42:01: [<ffffffff8109aac0>] ? kthread+0x0/0xa0
16:42:01: [<ffffffff8100c200>] ? child_rip+0x0/0x20
16:42:01:Mem-Info:

test failed to respond and timed out

Info required for matching: insanity 1

Attachments

Issue Links

duplicates

LU-5670 replay-vbr test_4c: MDS OOM

Closed

is duplicated by

LU-5121 replay-ost-single test_0b: OOM on the OST

Resolved

LU-5328 Failure on test suite replay-vbr test_7a: MDS oom

Resolved

LU-5593 sanity: OOM during recovery

Resolved

LU-5745 sanity test_17m: mds reboot from OOM killer

Resolved

is related to

LU-5079 conf-sanity test_47 timeout

Resolved

LU-5595 replay-single test_38: MDS 1 OOM

Resolved

is related to

LU-5420 Failure on test suite sanity test_17m: mount MDS failed, Input/output error

Resolved

LU-5079 conf-sanity test_47 timeout

Resolved

LU-4578 Early replies do not honor at_max

Resolved

LU-5121 replay-ost-single test_0b: OOM on the OST

Resolved

(2 is related to, 4 is related to )

Activity

[LU-5077] insanity test_1: out of memory on MDT in crypto_create_tfm()

Andreas Dilger added a comment - 14/Oct/14 5:18 PM

Yu Jian, it isn't clear why you made the patch http://review.whamcloud.com/12290 to revert commit bfae5a4 fortestonly? If this fixes the problem then we would want to land this patch to master.

Andreas Dilger added a comment - 14/Oct/14 5:18 PM Yu Jian, it isn't clear why you made the patch http://review.whamcloud.com/12290 to revert commit bfae5a4 fortestonly? If this fixes the problem then we would want to land this patch to master.

Jian Yu added a comment - 13/Oct/14 11:38 PM

Here is a test only patch to run replay-vbr on master branch: http://review.whamcloud.com/12286
And here is a test only patch to run replay-vbr on master branch with commit bfae5a4 reverted: http://review.whamcloud.com/12290

Jian Yu added a comment - 13/Oct/14 11:38 PM Here is a test only patch to run replay-vbr on master branch: http://review.whamcloud.com/12286 And here is a test only patch to run replay-vbr on master branch with commit bfae5a4 reverted: http://review.whamcloud.com/12290

Jian Yu added a comment - 13/Oct/14 5:22 PM

On master branch:

commit bfae5a4e4a37e4057200bcc70dbff14e18a00797
Author: Jinshan Xiong <jinshan.xiong@intel.com>
Date:   Thu Feb 6 23:03:25 2014 -0800

    LU-4300 ldlm: ELC picks locks in a safer policy
    
    Change the policy of ELC to pick locks that have no dirty pages,
    no page in writeback state, and no locked pages.

$ git revert bfae5a4e4a37e4057200bcc70dbff14e18a00797
error: could not revert bfae5a4... LU-4300 ldlm: ELC picks locks in a safer policy
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add <paths>' or 'git rm <paths>'
hint: and commit the result with 'git commit'

We cannot simply revert the patch. I'm looking at the conflicts.

Jian Yu added a comment - 13/Oct/14 5:22 PM On master branch: commit bfae5a4e4a37e4057200bcc70dbff14e18a00797 Author: Jinshan Xiong <jinshan.xiong@intel.com> Date: Thu Feb 6 23:03:25 2014 -0800 LU-4300 ldlm: ELC picks locks in a safer policy Change the policy of ELC to pick locks that have no dirty pages, no page in writeback state, and no locked pages. $ git revert bfae5a4e4a37e4057200bcc70dbff14e18a00797 error: could not revert bfae5a4... LU-4300 ldlm: ELC picks locks in a safer policy hint: after resolving the conflicts, mark the corrected paths hint: with 'git add <paths>' or 'git rm <paths>' hint: and commit the result with 'git commit' We cannot simply revert the patch. I'm looking at the conflicts.

Andreas Dilger added a comment - 13/Oct/14 6:23 AM

I wonder if this relates to ~~LU-5727~~ ?

Andreas Dilger added a comment - 13/Oct/14 6:23 AM I wonder if this relates to LU-5727 ?

Niu Yawei (Inactive) added a comment - 08/Oct/14 2:17 PM

patch of using crc32_le() directly instead of crypto API: http://review.whamcloud.com/#/c/12230

Niu Yawei (Inactive) added a comment - 08/Oct/14 2:17 PM patch of using crc32_le() directly instead of crypto API: http://review.whamcloud.com/#/c/12230

Niu Yawei (Inactive) added a comment - 08/Oct/14 3:22 AM

Hi, I'm just back from vacation, I'll try to compose a patch to add some debug code and use the direct crc32 function soon.

Another thing confused me is that why all OOM happened just after recovery done? What kind of operation immediately after recovery caused OOM?

Niu Yawei (Inactive) added a comment - 08/Oct/14 3:22 AM Hi, I'm just back from vacation, I'll try to compose a patch to add some debug code and use the direct crc32 function soon. Another thing confused me is that why all OOM happened just after recovery done? What kind of operation immediately after recovery caused OOM?

nasf (Inactive) added a comment - 04/Oct/14 5:19 AM - edited

Another failure instance: replay-single test_48 timeout, the MDS out_of_memory with similar stack.
https://testing.hpdd.intel.com/sub_tests/fe690dc4-4b3a-11e4-b999-5254006e85c2

nasf (Inactive) added a comment - 04/Oct/14 5:19 AM - edited Another failure instance: replay-single test_48 timeout, the MDS out_of_memory with similar stack. https://testing.hpdd.intel.com/sub_tests/fe690dc4-4b3a-11e4-b999-5254006e85c2

Andreas Dilger added a comment - 02/Oct/14 6:54 PM

James, as explained earlier in this ticket, in this particular instance Lustre is only using CRC32 to compute the checksum of the early reply message and does not need to support different checksum types. It is only checksumming 184 bytes of data that is sent if an RPC would otherwise have timed out (i.e. after tens or hundreds of seconds), so it is largely irrelevant whether ARM/PPC or even x86 have acceleration for CRC32 in this case. It may even be faster to call the crc32_le() function directly rather than going to external cryptoAPI code with hardware acceleration for such small messages.

It isn't clear to me yet why this code is failing in this particular case, but simplifying this code is never a bad thing, and maybe removing the cryptoAPI usage will tell us more about why/how it is running out of memory (i.e. do we start to fail in the crc32_le() or some other part of this same stack, or stop failing entirely).

The relevant commit that added the crptoAPI code to lustre_msg_calc_cksum() was http://review.whamcloud.com/3098 (commit 353e3c2f8d9). It looks like the compatibility for the older Lustre 1.6 code could just be deleted at this point, leaving something like:

__u32 lustre_msg_calc_cksum(struct lustre_msg *msg)
{
        switch (msg->lm_magic) {
        case LUSTRE_MSG_MAGIC_V2: {
                struct ptlrpc_body *pb = lustre_msg_ptlrpc_body(msg);
                LASSERTF(pb != NULL, "invalid msg %p: no ptlrpc body!\n", msg);
                return crc32_le(~(__u32)0, (unsigned char *)pb,
                                lustre_msg_buflen(msg, MSG_PTLRPC_BODY_OFF));
        }
        default:
                CERROR("incorrect message magic: %08x\n", msg->lm_magic);
                return 0;
        }
}

Andreas Dilger added a comment - 02/Oct/14 6:54 PM James, as explained earlier in this ticket, in this particular instance Lustre is only using CRC32 to compute the checksum of the early reply message and does not need to support different checksum types. It is only checksumming 184 bytes of data that is sent if an RPC would otherwise have timed out (i.e. after tens or hundreds of seconds), so it is largely irrelevant whether ARM/PPC or even x86 have acceleration for CRC32 in this case. It may even be faster to call the crc32_le() function directly rather than going to external cryptoAPI code with hardware acceleration for such small messages. It isn't clear to me yet why this code is failing in this particular case, but simplifying this code is never a bad thing, and maybe removing the cryptoAPI usage will tell us more about why/how it is running out of memory (i.e. do we start to fail in the crc32_le() or some other part of this same stack, or stop failing entirely). The relevant commit that added the crptoAPI code to lustre_msg_calc_cksum() was http://review.whamcloud.com/3098 (commit 353e3c2f8d9). It looks like the compatibility for the older Lustre 1.6 code could just be deleted at this point, leaving something like: __u32 lustre_msg_calc_cksum(struct lustre_msg *msg) { switch (msg->lm_magic) { case LUSTRE_MSG_MAGIC_V2: { struct ptlrpc_body *pb = lustre_msg_ptlrpc_body(msg); LASSERTF(pb != NULL, "invalid msg %p: no ptlrpc body!\n" , msg); return crc32_le(~(__u32)0, (unsigned char *)pb, lustre_msg_buflen(msg, MSG_PTLRPC_BODY_OFF)); } default : CERROR( "incorrect message magic: %08x\n" , msg->lm_magic); return 0; } }

Andreas Dilger added a comment - 02/Oct/14 4:14 PM

Dmitry, the question is why crypto_alloc_tfm() is always in the callpath that runs out of memory? If this was just a general shortage if memory, one would expect that memory allocation failures would happen in random places in the code instead of always in this same call stack. I don't think it is called frequently enough to just hit it due to statistical reasons.

Andreas Dilger added a comment - 02/Oct/14 4:14 PM Dmitry, the question is why crypto_alloc_tfm() is always in the callpath that runs out of memory? If this was just a general shortage if memory, one would expect that memory allocation failures would happen in random places in the code instead of always in this same call stack. I don't think it is called frequently enough to just hit it due to statistical reasons.

Dmitry Eremin (Inactive) added a comment - 02/Oct/14 9:06 AM

I think a cryptoAPI code is not an issue here. It's just last memory allocation code before OOM.

Dmitry Eremin (Inactive) added a comment - 02/Oct/14 9:06 AM I think a cryptoAPI code is not an issue here. It's just last memory allocation code before OOM.

James A Simmons added a comment - 01/Oct/14 7:46 PM

Please don't remove the crypto wrapper for direct crc32 access. Some platforms like ARM and PowerPC don't have hardware accelerate crc32c but do have other crypto algorithms that are accelerated.

James A Simmons added a comment - 01/Oct/14 7:46 PM Please don't remove the crypto wrapper for direct crc32 access. Some platforms like ARM and PowerPC don't have hardware accelerate crc32c but do have other crypto algorithms that are accelerated.

People

Assignee:: Niu Yawei (Inactive)

Reporter:: Maloo

Votes:: 0 Vote for this issue

Watchers:: 16 Start watching this issue

Dates

Created:: 17/May/14 3:23 AM

Updated:: 11/Nov/14 7:16 PM

Resolved:: 21/Oct/14 6:15 PM