Lustre / LU-11934

replay-single test_70c: OOM on client

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.13.0

    Description

      [ 8500.380063] Kernel panic - not syncing: Out of memory and no killable processes...
       
      [ 8500.385004] CPU: 0 PID: 25664 Comm: kworker/u4:0 Kdump: loaded Tainted: G           OE  ------------   3.10.0-862.14.4.el7.x86_64 #1
      [ 8500.390771] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [ 8500.393649] Call Trace:
      [ 8500.395723]  [<ffffffffa0313754>] dump_stack+0x19/0x1b
      [ 8500.398474]  [<ffffffffa030d29f>] panic+0xe8/0x21f
      [ 8500.401112]  [<ffffffff9fd9b50a>] out_of_memory+0x4ea/0x4f0
      [ 8500.403935]  [<ffffffffa030f423>] __alloc_pages_slowpath+0x5d6/0x724
      [ 8500.406940]  [<ffffffff9fda18b5>] __alloc_pages_nodemask+0x405/0x420
      [ 8500.409955]  [<ffffffff9fdec058>] alloc_pages_current+0x98/0x110
      [ 8500.412877]  [<ffffffff9fd9bf3e>] __get_free_pages+0xe/0x40
      [ 8500.415664]  [<ffffffff9fc775b2>] pgd_alloc+0x22/0x150
      [ 8500.418237]  [<ffffffff9fc90958>] mm_init+0x158/0x1b0
      [ 8500.420734]  [<ffffffff9fc90ee0>] mm_alloc+0x80/0x110
      [ 8500.423203]  [<ffffffff9fe279d9>] do_execve_common.isra.24+0x249/0x6e0
      [ 8500.426004]  [<ffffffff9fe34d1c>] ? poll_select_copy_remaining+0xfc/0x150
      [ 8500.428928]  [<ffffffff9fe30900>] ? vfs_unlink+0x170/0x190
      [ 8500.431440]  [<ffffffff9fe27e88>] do_execve+0x18/0x20
      [ 8500.433811]  [<ffffffff9fcb2bef>] ____call_usermodehelper+0xff/0x140
      [ 8500.436484]  [<ffffffff9fcb2c30>] ? ____call_usermodehelper+0x140/0x140
      [ 8500.439191]  [<ffffffff9fcb2c4e>] call_helper+0x1e/0x20
      [ 8500.441533]  [<ffffffffa03255f7>] ret_from_fork_nospec_begin+0x21/0x21
      [ 8500.444160]  [<ffffffff9fcb2c30>] ? ____call_usermodehelper+0x140/0x140
      
      crash-7.2.5> kmem -i
                      PAGES        TOTAL      PERCENTAGE
         TOTAL MEM   945937       3.6 GB         ----
              FREE    21480      83.9 MB    2% of TOTAL MEM
              USED   924457       3.5 GB   97% of TOTAL MEM
            SHARED       64       256 KB    0% of TOTAL MEM
           BUFFERS       35       140 KB    0% of TOTAL MEM
            CACHED      388       1.5 MB    0% of TOTAL MEM
              SLAB    11322      44.2 MB    1% of TOTAL MEM
      
      crash-7.2.5> kmem -p | awk '/head/ { print extent ; extent=1 } /tail/ { extent++ }' | sort -n | uniq -c
            1 
         1734 2
          820 4
          210 8
            2 16
           41 32
            1 64
         1472 512
      
       

      The vmcore shows 1472 blocks of 2 MB each (512 pages). Memory analysis showed that these 2 MB chunks belong to REINT_SETATTR requests:
      crash-7.2.5> ptlrpc_request ffff96af35364000
      > rq_repbuf_len = 2097152,
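
To put the histogram above in perspective, here is a minimal sketch that converts each (count, extent) pair into bytes. It assumes 4 KiB pages, which is consistent with the `kmem -i` totals but is not stated explicitly in the report; the variable and function names are illustrative only.

```python
# Assumption: 4 KiB pages (x86_64 default; matches 945937 pages ~ 3.6 GB).
PAGE_SIZE = 4096

# TOTAL MEM page count from the `kmem -i` output.
TOTAL_PAGES = 945937

# (count, pages per allocation) rows copied from the `kmem -p` histogram.
histogram = [
    (1734, 2),
    (820, 4),
    (210, 8),
    (2, 16),
    (41, 32),
    (1, 64),
    (1472, 512),
]


def bytes_by_extent(hist, page_size=PAGE_SIZE):
    """Return {pages_per_allocation: total bytes held by that extent size}."""
    return {pages: count * pages * page_size for count, pages in hist}


usage = bytes_by_extent(histogram)
for pages, nbytes in sorted(usage.items()):
    print(f"{pages:4d} pages/alloc: {nbytes / 2**20:8.1f} MiB")

# Share of all memory held by the 512-page (2 MiB) allocations.
share = usage[512] / (TOTAL_PAGES * PAGE_SIZE)
print(f"512-page extents: {share:.1%} of total memory")
```

Under that assumption, the 1472 extents of 512 pages account for 2944 MiB, roughly four fifths of the client's memory.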

      This size is set in the mdc_setattr() function:

       
      mdc_setattr()
      {
              ...
              req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER,
                                   req->rq_import->imp_connect_data.ocd_max_easize);
              ...
      }

      ocd_max_easize is 1 MB; the reply buffer is slightly larger than that, so rounding up (to the next power of two) makes it 2 MB.
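
The rounding effect can be illustrated with a small sketch. It assumes the reply buffer length is rounded up to the next power of two, which matches the 1 MB to 2 MB jump described here; `roundup_power2` and the 512-byte overhead are hypothetical stand-ins, not the actual Lustre helper or header size.

```python
MIB = 1 << 20


def roundup_power2(size: int) -> int:
    """Round size up to the nearest power of two (hypothetical helper)."""
    if size <= 1:
        return 1
    return 1 << (size - 1).bit_length()


# Assumption: the ACL field is sized at 1 MiB and the remaining reply
# fields add a small overhead; 512 bytes is an illustrative value only.
acl_size = 1 * MIB
other_fields = 512

repbuf_len = roundup_power2(acl_size + other_fields)
print(repbuf_len)  # 2097152, the rq_repbuf_len seen in the vmcore
```

Any overhead above zero pushes the total past 1 MiB, so the rounded buffer doubles regardless of how small the other reply fields are.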

      Patrick's patch 4f78164f helps here by reducing the size to 64 KB.
      However, mdc_setattr() does not need the ACL buffer at all, because the server does not fill it.
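
As a rough comparison, the sketch below estimates the memory pinned by the same number of in-flight requests with 2 MB versus 64 KB reply buffers. It assumes the 1472 blocks map one-to-one to requests and that 64 KB is the final buffer size after any rounding; neither is confirmed in the report.

```python
KIB = 1024
MIB = 1024 * KIB

# Number of 2 MB blocks found in the vmcore, taken here as the request count.
REQUESTS = 1472


def footprint_mib(requests: int, repbuf_len: int) -> float:
    """Total reply-buffer memory in MiB for the given number of requests."""
    return requests * repbuf_len / MIB


before = footprint_mib(REQUESTS, 2 * MIB)   # reply buffers as seen in the dump
after = footprint_mib(REQUESTS, 64 * KIB)   # assumed size with patch 4f78164f

print(f"2 MiB buffers:  {before:.0f} MiB")
print(f"64 KiB buffers: {after:.0f} MiB")
```

Dropping the ACL field from the request entirely would shrink the buffers further, since the size would then depend only on the fields the server actually fills.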

          People

            aboyko Alexander Boyko