
insanity test_1: out of memory on MDT in crypto_create_tfm()

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Blocker
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.6.0, Lustre 2.5.2, Lustre 2.5.3
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 14011

    Description

      This issue was created by maloo for wangdi <di.wang@intel.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/4f1c047c-dd60-11e3-9f1e-52540035b04c.

      The sub-test test_1 failed with the following error:

      16:39:44:Lustre: DEBUG MARKER: == insanity test 1: MDS/MDS failure == 15:39:30 (1400279970)
      16:39:44:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
      16:39:44:Lustre: DEBUG MARKER: umount -d /mnt/mds1
      16:39:44:Lustre: Failing over lustre-MDT0000
      16:42:00:LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.1.4.136@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
      16:42:00:LustreError: Skipped 2839 previous similar messages
      16:42:00:Lustre: 11949:0:(client.c:1914:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1400279972/real 1400279972] req@ffff88007a72f000 x1468278858237216/t0(0) o251->MGC10.1.4.144@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1400279978 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
      16:42:00:Lustre: 11949:0:(client.c:1914:ptlrpc_expire_one_request()) Skipped 12 previous similar messages
      16:42:00:Lustre: server umount lustre-MDT0000 complete
      16:42:00:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
      16:42:00:Lustre: DEBUG MARKER: hostname
      16:42:00:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1
      16:42:00:Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P1
      16:42:00:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre -o user_xattr,acl /dev/lvm-Role_MDS/P1 /mnt/mds1
      16:42:00:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
      16:42:00:LustreError: 12262:0:(mdt_coordinator.c:976:mdt_hsm_cdt_start()) lustre-MDT0000: cannot take the layout locks needed for registered restore: -2
      16:42:00:Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 5 clients reconnect
      16:42:00:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
      16:42:00:LNet: 12440:0:(debug.c:218:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
      16:42:00:LNet: 12440:0:(debug.c:218:libcfs_debug_str2mask()) Skipped 1 previous similar message
      16:42:00:Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1 2>/dev/null
      16:42:00:Lustre: mdt: This server is not able to keep up with request traffic (cpu-bound).
      16:42:00:Lustre: 12222:0:(service.c:1511:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=0, svcEst=21, delay=0(jiff)
      16:42:00:Lustre: 12222:0:(service.c:1308:ptlrpc_at_send_early_reply()) @@@ Already past deadline (1s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff88002ef5e400 x1468299694051176/t0(0) o400>ac3e68be-778c-5e81-3dfa-5ce31a3eb36d@10.1.4.142@tcp:0/0 lens 224/0 e 225451 to 0 dl 1400280113 ref 2 fl Complete:H/c0/ffffffff rc 0/-1
      16:42:00:Lustre: 12229:0:(service.c:1308:ptlrpc_at_send_early_reply()) @@@ Already past deadline (1s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff880022bb0000 x1468278870043488/t0(0) o400>lustre-MDT0001-mdtlov_UUID@10.1.4.140@tcp:0/0 lens 224/0 e 1032545 to 0 dl 1400280114 ref 2 fl Complete:/c0/ffffffff rc 0/-1
      16:42:00:mdt_out00_000 invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
      16:42:00:mdt_out00_000 cpuset=/ mems_allowed=0
      16:42:00:Pid: 12229, comm: mdt_out00_000 Not tainted 2.6.32-431.17.1.el6_lustre.g684b38d.x86_64 #1
      16:42:00:Call Trace:
      16:42:00: [<ffffffff810d0211>] ? cpuset_print_task_mems_allowed+0x91/0xb0
      16:42:00: [<ffffffff811225c0>] ? dump_header+0x90/0x1b0
      16:42:00: [<ffffffff8129058d>] ? __bitmap_intersects+0x1d/0xa0
      16:42:00: [<ffffffff8122781c>] ? security_real_capable_noaudit+0x3c/0x70
      16:42:00: [<ffffffff81122a42>] ? oom_kill_process+0x82/0x2a0
      16:42:00: [<ffffffff8112293e>] ? select_bad_process+0x9e/0x120
      16:42:00: [<ffffffff81122e80>] ? out_of_memory+0x220/0x3c0
      16:42:00: [<ffffffff8112f79f>] ? __alloc_pages_nodemask+0x89f/0x8d0
      16:42:01: [<ffffffff8116e082>] ? kmem_getpages+0x62/0x170
      16:42:01: [<ffffffff8116ec9a>] ? fallback_alloc+0x1ba/0x270
      16:42:01: [<ffffffff8116e6ef>] ? cache_grow+0x2cf/0x320
      16:42:01: [<ffffffff8116ea19>] ? ____cache_alloc_node+0x99/0x160
      16:42:01: [<ffffffff8124bc3c>] ? crypto_create_tfm+0x3c/0xe0
      16:42:01: [<ffffffff8116f7e9>] ? __kmalloc+0x189/0x220
      16:42:01: [<ffffffff8124bc3c>] ? crypto_create_tfm+0x3c/0xe0
      16:42:01: [<ffffffff812525d8>] ? crypto_init_shash_ops+0x68/0x100
      16:42:01: [<ffffffff8124bd4a>] ? __crypto_alloc_tfm+0x6a/0x130
      16:42:01: [<ffffffff8124c5ba>] ? crypto_alloc_base+0x5a/0xb0
      16:42:01: [<ffffffffa04a2684>] ? cfs_percpt_unlock+0x24/0xb0 [libcfs]
      16:42:01: [<ffffffffa048d2ca>] ? cfs_crypto_hash_alloc+0x7a/0x290 [libcfs]
      16:42:01: [<ffffffffa048d5da>] ? cfs_crypto_hash_digest+0x6a/0xf0 [libcfs]
      16:42:01: [<ffffffff8116f86c>] ? __kmalloc+0x20c/0x220
      16:42:01: [<ffffffffa082bd73>] ? lustre_msg_calc_cksum+0xd3/0x130 [ptlrpc]
      16:42:01: [<ffffffffa0865881>] ? null_authorize+0xa1/0x100 [ptlrpc]
      16:42:01: [<ffffffffa0854a56>] ? sptlrpc_svc_wrap_reply+0x56/0x1c0 [ptlrpc]
      16:42:01: [<ffffffffa08241ec>] ? ptlrpc_send_reply+0x1fc/0x7f0 [ptlrpc]
      16:42:01: [<ffffffffa083b675>] ? ptlrpc_at_check_timed+0xc05/0x1360 [ptlrpc]
      16:42:01: [<ffffffffa0832c09>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
      16:42:01: [<ffffffffa083cf68>] ? ptlrpc_main+0x1198/0x1980 [ptlrpc]
      16:42:01: [<ffffffffa083bdd0>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
      16:42:01: [<ffffffff8109ab56>] ? kthread+0x96/0xa0
      16:42:01: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
      16:42:01: [<ffffffff8109aac0>] ? kthread+0x0/0xa0
      16:42:01: [<ffffffff8100c200>] ? child_rip+0x0/0x20
      16:42:01:Mem-Info:

      test failed to respond and timed out

      Info required for matching: insanity 1


          Activity

            [LU-5077] insanity test_1: out of memory on MDT in crypto_create_tfm()

            adilger Andreas Dilger added a comment -

            It would be worthwhile to try a simple patch to replace the cryptoAPI code with a direct call to the crc32() library functions in the kernel.

            The other alternative is to add more debugging to this code so that we get good logs when a test fails. It would be good to know how often this code is being called and how large the allocation is. It should only be called in rare cases, because it is only used for early replies when the client might time out waiting for a busy server.

            There has to be some strange reason why these tests always hit OOM in crypto_alloc_tfm() instead of in some random place. Is some calculation going badly, so that it tries to allocate a huge amount of memory? Or is this function simply being called so much more often than others that it just fails here more often by statistics?
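
            For illustration, a minimal sketch of what such instrumentation might look like (a hypothetical debugging aid, not the actual Lustre code; the counter name is invented and the placement is illustrative):

                #include <linux/atomic.h>

                /* Hypothetical debugging aid: count hash tfm allocations so
                 * that an OOM report can be correlated with how much cryptoAPI
                 * allocation traffic the server was generating. */
                static atomic_t cfs_hash_tfm_allocs = ATOMIC_INIT(0);

                /* ... then inside cfs_crypto_hash_alloc(), just before the
                 * tfm is allocated: */
                atomic_inc(&cfs_hash_tfm_allocs);
                CDEBUG(D_INFO, "allocating hash tfm, call #%d\n",
                       atomic_read(&cfs_hash_tfm_allocs));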

            yujian Jian Yu added a comment -

            Test results in http://review.whamcloud.com/12065 showed that the failure still occurred with commit ff8087 reverted. So, commit ff8087 is not the culprit.

            yujian Jian Yu added a comment -

            Here is a test-only patch to run replay-vbr on master branch: http://review.whamcloud.com/12064
            And here is a test-only patch to run replay-vbr on master branch with commit ff8087 reverted: http://review.whamcloud.com/12065


            niu Niu Yawei (Inactive) added a comment -

            Here is a test-only patch to run insanity on master branch: http://review.whamcloud.com/11268
            And here is a test-only patch to run insanity on master branch with commit ff8087 reverted: http://review.whamcloud.com/11269
            Testing kept hitting LU-5429, which blocks running the insanity tests with MDSCOUNT=4.

            Hi Yujian,
            I saw lots of OOM failures in replay-vbr and replay-single; it seems we could test replay-vbr instead of insanity to get around LU-5420 and LU-5429. What do you think?

            adilger Andreas Dilger added a comment -

            It looks like all of the cases where this fails are calling into the stack ending in crypto_create_tfm():

            out_of_memory+0x220/0x3c0
            __alloc_pages_nodemask+0x89f/0x8d0
            kmem_getpages+0x62/0x170
            fallback_alloc+0x1ba/0x270
            cache_grow+0x2cf/0x320
            ____cache_alloc_node+0x99/0x160
            crypto_create_tfm+0x3c/0xe0
            __kmalloc+0x189/0x220
            crypto_create_tfm+0x3c/0xe0
            crypto_init_shash_ops+0x68/0x100
            __crypto_alloc_tfm+0x6a/0x130
            crypto_alloc_base+0x5a/0xb0
            cfs_percpt_unlock+0x24/0xb0 [libcfs]
            cfs_crypto_hash_alloc+0x7a/0x290 [libcfs]
            cfs_crypto_hash_digest+0x6a/0xf0 [libcfs]
            lustre_msg_calc_cksum+0xd3/0x130 [ptlrpc]
            null_authorize+0xa1/0x100 [ptlrpc]
            sptlrpc_svc_wrap_reply+0x56/0x1c0 [ptlrpc]
            ptlrpc_send_reply+0x1fc/0x7f0 [ptlrpc]
            ptlrpc_at_check_timed+0xc05/0x1360 [ptlrpc]
            ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
            ptlrpc_main+0x1198/0x1980 [ptlrpc]
            

            The allocation in crypto_create_tfm() is being done using GFP_KERNEL, so it may fail a bit earlier than necessary.

            Since this code is only ever going to use CRC32, and is only used for early reply in an interrupt context, it might make more sense to use the kernel library crc32 routines instead of going through the overhead of setting up the cryptoapi context. The message size is just the ptlrpc_body (only about 100 bytes) and early replies should only be used in relatively rare cases, so getting hardware assist for that is probably not going to affect performance either way. That could eliminate a large amount of code and complexity and also remove this kind of allocation failure.
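
            For illustration, here is a minimal sketch of that direction, assuming the standard kernel <linux/crc32.h> library (the helper name is hypothetical and the actual message-walking code is elided):

                #include <linux/crc32.h>

                /* Hypothetical helper: checksum a reply message with the
                 * kernel crc32 library.  crc32() is table-driven and
                 * allocates no memory, so it cannot trigger the OOM killer
                 * and is safe to call from atomic context. */
                static __u32 lustre_msg_cksum_crc32(const void *buf, size_t len)
                {
                        /* ~0 seed plus final inversion gives the standard
                         * CRC32; whether this matches the convention used by
                         * the existing cryptoAPI path would need checking. */
                        return ~crc32(~0, buf, len);
                }

            Such a change would of course need to preserve wire compatibility with checksums already computed by existing clients via the cryptoAPI path.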

            pjones Peter Jones added a comment -

            Niu

            Could you please help with this issue?

            Thanks

            Peter

            yujian Jian Yu added a comment -

            Lustre Build: https://build.hpdd.intel.com/job/lustre-b2_5/83/
            Distro/Arch: RHEL6.5/x86_64
            MDSCOUNT=2

            replay-vbr test 5c hit the same OOM failure on the MDS:
            https://testing.hpdd.intel.com/test_sets/fb694c46-28f9-11e4-8b4b-5254006e85c2

            By searching on Maloo, I found one instance on Lustre 2.5.2:
            https://testing.hpdd.intel.com/test_sets/4bb8fc7e-018f-11e4-b80d-5254006e85c2

            And the following patch was also in Lustre 2.5.2:

            commit	9aacef521e971dff6097a4ae335bd840521be5dd
            LU-4877 mdt: tolerate zero md_packed in mdt_fix_reply
            
            di.wang Di Wang added a comment -

            Hmm, this seems to occur more often recently on master:
            https://testing.hpdd.intel.com/sub_tests/4e152edc-185c-11e4-be2f-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/64d04970-185e-11e4-9c4c-5254006e85c2

            adilger Andreas Dilger added a comment -

            It seems that almost all of the recent test failures marked LU-5077 were actually caused by the three different versions of the LU-5420 patch that are being tested, and not by the original LU-5077 problem. I've re-annotated them all with LU-5420.

            yujian Jian Yu added a comment - edited

            Here is a test-only patch to run insanity on master branch: http://review.whamcloud.com/11268
            And here is a test-only patch to run insanity on master branch with commit ff8087 reverted: http://review.whamcloud.com/11269

            Testing kept hitting LU-5429, which blocks running the insanity tests with MDSCOUNT=4.


            adilger Andreas Dilger added a comment -

            If this problem happens again and it is possible to log into the node, capturing the output of /proc/slabinfo would be very useful for debugging. Otherwise, getting a crashdump of the MDS would allow information about what is consuming the memory to be extracted afterward.


            People

              Assignee: niu Niu Yawei (Inactive)
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 16

            Dates

              Created:
              Updated:
              Resolved: