[LU-11934] replay-single test_70c: OOM on client
Created: 06/Feb/19  Updated: 19/Feb/19  Resolved: 19/Feb/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0

Type: Bug
Priority: Minor
Reporter: Alexander Boyko
Assignee: Alexander Boyko
Resolution: Fixed
Votes: 0
Labels: patch

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
[ 8500.380063] Kernel panic - not syncing: Out of memory and no killable processes...
 
[ 8500.385004] CPU: 0 PID: 25664 Comm: kworker/u4:0 Kdump: loaded Tainted: G           OE  ------------   3.10.0-862.14.4.el7.x86_64 #1
[ 8500.390771] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 8500.393649] Call Trace:
[ 8500.395723]  [<ffffffffa0313754>] dump_stack+0x19/0x1b
[ 8500.398474]  [<ffffffffa030d29f>] panic+0xe8/0x21f
[ 8500.401112]  [<ffffffff9fd9b50a>] out_of_memory+0x4ea/0x4f0
[ 8500.403935]  [<ffffffffa030f423>] __alloc_pages_slowpath+0x5d6/0x724
[ 8500.406940]  [<ffffffff9fda18b5>] __alloc_pages_nodemask+0x405/0x420
[ 8500.409955]  [<ffffffff9fdec058>] alloc_pages_current+0x98/0x110
[ 8500.412877]  [<ffffffff9fd9bf3e>] __get_free_pages+0xe/0x40
[ 8500.415664]  [<ffffffff9fc775b2>] pgd_alloc+0x22/0x150
[ 8500.418237]  [<ffffffff9fc90958>] mm_init+0x158/0x1b0
[ 8500.420734]  [<ffffffff9fc90ee0>] mm_alloc+0x80/0x110
[ 8500.423203]  [<ffffffff9fe279d9>] do_execve_common.isra.24+0x249/0x6e0
[ 8500.426004]  [<ffffffff9fe34d1c>] ? poll_select_copy_remaining+0xfc/0x150
[ 8500.428928]  [<ffffffff9fe30900>] ? vfs_unlink+0x170/0x190
[ 8500.431440]  [<ffffffff9fe27e88>] do_execve+0x18/0x20
[ 8500.433811]  [<ffffffff9fcb2bef>] ____call_usermodehelper+0xff/0x140
[ 8500.436484]  [<ffffffff9fcb2c30>] ? ____call_usermodehelper+0x140/0x140
[ 8500.439191]  [<ffffffff9fcb2c4e>] call_helper+0x1e/0x20
[ 8500.441533]  [<ffffffffa03255f7>] ret_from_fork_nospec_begin+0x21/0x21
[ 8500.444160]  [<ffffffff9fcb2c30>] ? ____call_usermodehelper+0x140/0x140

crash-7.2.5> kmem -i
                PAGES        TOTAL      PERCENTAGE
   TOTAL MEM   945937       3.6 GB         ----
        FREE    21480      83.9 MB    2% of TOTAL MEM
        USED   924457       3.5 GB   97% of TOTAL MEM
      SHARED       64       256 KB    0% of TOTAL MEM
     BUFFERS       35       140 KB    0% of TOTAL MEM
      CACHED      388       1.5 MB    0% of TOTAL MEM
        SLAB    11322      44.2 MB    1% of TOTAL MEM

crash-7.2.5> kmem -p | awk '/head/ { print extent ; extent=1 } /tail/ { extent++ }' | sort -n | uniq -c
      1 
   1734 2
    820 4
    210 8
      2 16
     41 32
      1 64
   1472 512
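
As a rough check of the histogram above, the sketch below converts an extent count into bytes and into a share of total memory. It assumes 4 KiB pages (consistent with kmem -i, where 945937 pages are reported as 3.6 GB); the function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as on the x86_64 client in this dump. */
#define PAGE_SIZE_BYTES 4096ULL

/* Bytes covered by `count` extents of `pages_per_extent` pages each. */
static uint64_t extent_bytes(uint64_t count, uint64_t pages_per_extent)
{
	return count * pages_per_extent * PAGE_SIZE_BYTES;
}

/* Share of `total_pages` held by those extents, as a whole percentage
 * (integer division, so the result is rounded down). */
static unsigned int extent_percent(uint64_t count, uint64_t pages_per_extent,
				   uint64_t total_pages)
{
	return (unsigned int)(count * pages_per_extent * 100 / total_pages);
}
```

With the values from the dump, extent_bytes(1472, 512) is 3087007744 bytes (2944 MiB), and extent_percent(1472, 512, 945937) is 79, so the 512-page extents account for most of the 3.6 GB on this client.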

 

The vmcore shows 1472 blocks of 2 MB each. Memory analysis showed that these 2 MB chunks belong to REINT_SETATTR requests:
crash-7.2.5> ptlrpc_request ffff96af35364000
> rq_repbuf_len = 2097152,

This size is set in the mdc_setattr() function:

 
mdc_setattr()
{
	...
	req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER,
			     req->rq_import->imp_connect_data.ocd_max_easize);
	...
}

ocd_max_easize is 1 MB; the reply is slightly larger than that, and the roundup brings the buffer to 2 MB.
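
The rounding effect can be illustrated with a small sketch. It assumes the reply buffer length is rounded up to the next power of two, which is consistent with rq_repbuf_len = 2097152 above but is not stated in this ticket. The helper name and the 1024-byte overhead used in the example are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: smallest power of two that is >= size.
 * Intended for 0 < size <= 2^63; larger values would overflow the shift. */
static uint64_t roundup_pow2(uint64_t size)
{
	uint64_t p = 1;

	while (p < size)
		p <<= 1;
	return p;
}
```

Under that assumption, a buffer of exactly 1 MB stays at 1048576 bytes, but roundup_pow2(1048576 + 1024) returns 2097152: any reply even slightly larger than 1 MB costs a full 2 MB allocation.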

Patrick's patch 4f78164f helps here by setting the limit to 64 KB.
But mdc_setattr does not need the ACL buffer at all, because the server does not fill it.



 Comments   
Comment by Gerrit Updater [ 06/Feb/19 ]

Alexandr Boyko (c17825@cray.com) uploaded a new patch: https://review.whamcloud.com/34194
Subject: LU-11934 mdc: don't use ACL at setattr
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 01a72c09bc95d841f033c182417521cffaebdda6

Comment by Patrick Farrell (Inactive) [ 06/Feb/19 ]

Alex,

You might take a look at:

https://review.whamcloud.com/#/c/34058/

The maximum allowed xattr size in Linux is 64 KiB; beyond that, tar (and other tools) break.

Even if you don't want the whole patch, you might consider just the change to the max xattr size on ldiskfs.  It improves memory behavior with ea_inode a bunch.

Comment by Patrick Farrell (Inactive) [ 06/Feb/19 ]

Oh, wait, I think you found my patch.

"The Patrick's patch 4f78164f helps here and set 64KB."

But your patch is correct and obviously still good.  Saves memory.

 

Comment by Gerrit Updater [ 18/Feb/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34194/
Subject: LU-11934 mdc: don't use ACL at setattr
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e7f6f870c356f2158a01497d50d732d25b1c29ac

Comment by Peter Jones [ 19/Feb/19 ]

Landed for 2.13
