[LU-9514] sanity test 51f times out due to out of memory Created: 16/May/17  Updated: 24/Apr/19  Resolved: 19/Jul/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: Lustre 2.10.1, Lustre 2.11.0

Type: Bug Priority: Critical
Reporter: James Nunez (Inactive) Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

autotest


Issue Links:
Related
is related to LU-7473 Enlarge ACL entry limitation Resolved
is related to LU-10155 File creation with 'lfs setstripe' is... Resolved
is related to LU-5964 test large number of concurrent open ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity test_51f times out with the usual message

'test failed to respond and timed out' 

The only information in the test_log is

== sanity test 51f: check many open files limit ====================================================== 08:37:34 (1494923854)
MDT0 numfree=711956, max=100000
changed ulimit from 1024 to 100020
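
To illustrate the workload: test 51f raises the open-file limit and then keeps a very large number of files open at once via "createmany -o -k". The following standalone C sketch mimics that pattern; the mount point, file names, and counts are illustrative, and it is not the actual createmany source.

/* Illustrative only: mimic "createmany -o -k" by raising RLIMIT_NOFILE
 * and keeping many files open at once.  Paths and counts are examples. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

int main(void)
{
	const long nfiles = 100000;   /* order of magnitude used by test 51f */
	struct rlimit rl = { nfiles + 20, nfiles + 20 };
	char path[256];
	long i;

	/* Raise the open-file limit, as the "changed ulimit" line above shows. */
	if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
		perror("setrlimit");
		return 1;
	}

	for (i = 0; i < nfiles; i++) {
		snprintf(path, sizeof(path), "/mnt/lustre/d51f/f%ld", i);
		/* Create the file and deliberately keep the fd open; each held-open
		 * file pins client-side state for the open. */
		if (open(path, O_CREAT | O_RDWR, 0644) < 0) {
			perror(path);
			break;
		}
	}
	printf("kept %ld files open\n", i);
	pause();   /* hold the opens, like createmany's -k flag */
	return 0;
}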

There are two cases where sanity test 51f times out with an out-of-memory (OOM) panic. The stack trace below is found in the client console:

08:38:17:[ 1565.371695] Lustre: DEBUG MARKER: == sanity test 51f: check many open files limit ====================================================== 08:37:34 (1494923854)
08:38:17:[ 1602.443283] automount invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
08:38:17:[ 1602.446098] automount cpuset=/ mems_allowed=0
08:38:17:[ 1602.448513] CPU: 0 PID: 1010 Comm: automount Tainted: G           OE  ------------   3.10.0-514.16.1.el7.x86_64 #1
08:38:17:[ 1602.451566] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
08:38:17:
08:38:17:[ 1602.454058]  ffff8800794eaf10 000000003457cee2 ffff88007b9e79f8 ffffffff81686ac3
08:38:17:[ 1602.459021]  ffff88007b9e7a88 ffffffff81681a6e ffff88007b9e7af0 ffff88007b9e7a48
08:38:17:[ 1602.461680]  ffffffff81692b6c ffffffff81a33cc0 0000000000000000 0000000000000000
08:38:17:[ 1602.464276] Call Trace:
08:38:17:[ 1602.466395]  [<ffffffff81686ac3>] dump_stack+0x19/0x1b
08:38:17:[ 1602.468694]  [<ffffffff81681a6e>] dump_header+0x8e/0x225
08:38:17:[ 1602.470982]  [<ffffffff81692b6c>] ? notifier_call_chain+0x4c/0x70
08:38:17:[ 1602.473299]  [<ffffffff810b6d98>] ? __blocking_notifier_call_chain+0x58/0x70
08:38:17:[ 1602.475696]  [<ffffffff81184e1e>] check_panic_on_oom+0x2e/0x60
08:38:17:[ 1602.477970]  [<ffffffff8118523b>] out_of_memory+0x23b/0x4f0
08:38:17:[ 1602.480183]  [<ffffffff81682577>] __alloc_pages_slowpath+0x5d7/0x725
08:38:17:[ 1602.482651]  [<ffffffff8118b5c5>] __alloc_pages_nodemask+0x405/0x420
08:38:17:[ 1602.484867]  [<ffffffff811d272a>] alloc_pages_vma+0x9a/0x150
08:38:17:[ 1602.487039]  [<ffffffff811c351b>] read_swap_cache_async+0xeb/0x160
08:38:17:[ 1602.489192]  [<ffffffff811c3638>] swapin_readahead+0xa8/0x110
08:38:17:[ 1602.491293]  [<ffffffff811b189c>] handle_mm_fault+0xb1c/0xfe0
08:38:17:[ 1602.493385]  [<ffffffff81692644>] __do_page_fault+0x154/0x450
08:38:17:[ 1602.495536]  [<ffffffff81692975>] do_page_fault+0x35/0x90
08:38:17:[ 1602.497619]  [<ffffffff8168eb88>] page_fault+0x28/0x30
08:38:17:[ 1602.499598] Mem-Info:
08:38:17:[ 1602.501313] active_anon:350 inactive_anon:385 isolated_anon:0
08:38:17:[ 1602.501313]  active_file:0 inactive_file:525 isolated_file:0
08:38:17:[ 1602.501313]  unevictable:0 dirty:0 writeback:142 unstable:0
08:38:17:[ 1602.501313]  slab_reclaimable:4058 slab_unreclaimable:36202
08:38:17:[ 1602.501313]  mapped:0 shmem:52 pagetables:1613 bounce:0
08:38:17:[ 1602.501313]  free:12916 free_pcp:7 free_cma:0
08:38:17:[ 1602.512345] Node 0 DMA free:7020kB min:416kB low:520kB high:624kB active_anon:400kB inactive_anon:640kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15892kB mlocked:0kB dirty:0kB writeback:36kB mapped:0kB shmem:228kB slab_reclaimable:108kB slab_unreclaimable:712kB kernel_stack:48kB pagetables:20kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:15736 all_unreclaimable? yes
08:38:17:[ 1602.521429] lowmem_reserve[]: 0 1663 1663 1663
08:38:17:[ 1602.523300] Node 0 DMA32 free:44644kB min:44636kB low:55792kB high:66952kB active_anon:1000kB inactive_anon:900kB active_file:0kB inactive_file:2100kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080756kB managed:1705000kB mlocked:0kB dirty:0kB writeback:532kB mapped:0kB shmem:0kB slab_reclaimable:16124kB slab_unreclaimable:144096kB kernel_stack:2656kB pagetables:6432kB unstable:0kB bounce:0kB free_pcp:28kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
08:38:17:[ 1602.534026] lowmem_reserve[]: 0 0 0 0
08:38:17:[ 1602.535888] Node 0 DMA: 39*4kB (EM) 32*8kB (UEM) 25*16kB (UM) 12*32kB (UEM) 3*64kB (EM) 10*128kB (UM) 3*256kB (EM) 3*512kB (UEM) 2*1024kB (UE) 0*2048kB 0*4096kB = 7020kB
08:38:17:[ 1602.540391] Node 0 DMA32: 995*4kB (UEM) 623*8kB (UEM) 319*16kB (UEM) 861*32kB (UEM) 46*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44564kB
08:38:17:[ 1602.544824] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
08:38:17:[ 1602.547006] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
08:38:17:[ 1602.549124] 975 total pagecache pages
08:38:17:[ 1602.550875] 375 pages in swap cache
08:38:17:[ 1602.552582] Swap cache stats: add 26733, delete 26358, find 248/330
08:38:17:[ 1602.554487] Free swap  = 3566956kB
08:38:17:[ 1602.556164] Total swap = 3671036kB
08:38:17:[ 1602.557826] 524185 pages RAM
08:38:17:[ 1602.559435] 0 pages HighMem/MovableOnly
08:38:17:[ 1602.561132] 93962 pages reserved
08:38:17:[ 1602.562747] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
08:38:17:[ 1602.564748] [  343]     0   343     9303        1      20       92             0 systemd-journal
08:38:17:[ 1602.566845] [  365]     0   365    29723        0      26       85             0 lvmetad
08:38:17:[ 1602.568879] [  382]     0   382    10895        1      21      134         -1000 systemd-udevd
08:38:17:[ 1602.570936] [  460]     0   460    13854        0      28      108         -1000 auditd
08:38:17:[ 1602.572986] [  492]     0   492     4793       18      14       37             0 irqbalance
08:38:17:[ 1602.575027] [  493]   998   493   134941        0      53     2131             0 polkitd
08:38:17:[ 1602.577041] [  496]    81   496     9197        0      20      129          -900 dbus-daemon
08:38:17:[ 1602.579085] [  509]   997   509    28962        0      28       98             0 chronyd
08:38:17:[ 1602.581105] [  559]     0   559    67227        0      43      657             0 gssproxy
08:38:17:[ 1602.583118] [  603]     0   603    64343        0      74      266             0 sssd
08:38:17:[ 1602.585096] [  604]     0   604    90858        0     118      570             0 sssd_be
08:38:17:[ 1602.587095] [  605]     0   605   127883        0      68     1074             0 NetworkManager
08:38:17:[ 1602.589144] [  612]     0   612    30625        1      30      207             0 rpc.gssd
08:38:17:[ 1602.591150] [  619]     0   619    65752        5      84      227             0 sssd_nss
08:38:17:[ 1602.593151] [  620]     0   620    60653        0      73      238             0 sssd_pam
08:38:17:[ 1602.595417] [  621]     0   621    59047        5      71      221             0 sssd_ssh
08:38:17:[ 1602.598098] [  622]     0   622    67553        0      83      330             0 sssd_pac
08:38:17:[ 1602.600145] [  635]     0   635     6048        1      16       77             0 systemd-logind
08:38:17:[ 1602.602201] [  641]     0   641    28220        1      58     3120             0 dhclient
08:38:17:[ 1602.604259] [  859]     0   859   138287       62      87     2611             0 tuned
08:38:17:[ 1602.606279] [  860]     0   860    77477        0      46      285             0 rsyslogd
08:38:17:[ 1602.608319] [  861]     0   861    26370        1      54      248         -1000 sshd
08:38:17:[ 1602.610259] [  868]     0   868     6776        1      17       62             0 xinetd
08:38:17:[ 1602.612235] [  875]   995   875    56858        0      23      263             0 munged
08:38:17:[ 1602.614120] [  989]     0   989    22780        0      43      262             0 master
08:38:17:[ 1602.615980] [  991]    89   991    25347        0      45      253             0 pickup
08:38:17:[ 1602.617764] [  992]    89   992    25364        0      47      254             0 qmgr
08:38:17:[ 1602.619536] [ 1007]     0  1007    31555        0      18      155             0 crond
08:38:17:[ 1602.621255] [ 1009]     0  1009   149996        0      63      498             0 automount
08:38:17:[ 1602.623021] [ 1012]     0  1012     6461        0      17       51             0 atd
08:38:17:[ 1602.624633] [ 1043]     0  1043    27509        1      10       32             0 agetty
08:38:17:[ 1602.626296] [ 1045]     0  1045    27509        1      12       32             0 agetty
08:38:17:[ 1602.627878] [ 2247]     0  2247    40560        0      79      347             0 sshd
08:38:17:[ 1602.629439] [ 2249]     0  2249    28283        0      14       58             0 run_test.sh
08:38:17:[ 1602.630986] [ 2461]     0  2461    29424       28      15      671             0 bash
08:38:17:[ 1602.632479] [22429]     0 22429    29424       23      13      675             0 bash
08:38:17:[ 1602.633907] [22430]     0 22430    26975        0       9       27             0 tee
08:38:17:[ 1602.635313] [22600]     0 22600    29793       39      14     1016             0 bash
08:38:17:[ 1602.636706] [17680]     0 17680    29793       59      14      999             0 bash
08:38:17:[ 1602.638107] [17681]     0 17681    26975        1       9       26             0 tee
08:38:17:[ 1602.639460] [17855]     0 17855     2076        9       9       19             0 createmany
08:38:17:[ 1602.640891] Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled

Logs for these two cases are at:
2017-05-16 - https://testing.hpdd.intel.com/test_sets/4f3d47bc-3a42-11e7-b81e-5254006e85c2
2017-05-16 - https://testing.hpdd.intel.com/test_sets/6d1be1a2-3a43-11e7-b861-5254006e85c2



 Comments   
Comment by Bob Glossman (Inactive) [ 17/May/17 ]

More failures on master:
https://testing.hpdd.intel.com/test_sets/2d824ac6-3abc-11e7-91db-5254006e85c2
https://testing.hpdd.intel.com/test_sets/b573cb72-3b56-11e7-b3f6-5254006e85c2
https://testing.hpdd.intel.com/test_sets/f864ac5e-3c18-11e7-b3f6-5254006e85c2

Comment by Gerrit Updater [ 18/May/17 ]

James Nunez (james.a.nunez@intel.com) uploaded a new patch: https://review.whamcloud.com/27189
Subject: LU-9514 test: sanity 51f add to ALWAYS_EXCEPT
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9f6e57118a8af37a0184303740d9d3c43d4fced9

Comment by Jinshan Xiong (Inactive) [ 18/May/17 ]

It seems this bug broke out after the LU-9305 patch landed.

Comment by Peter Jones [ 18/May/17 ]

Jinshan

Do you think that we should revert LU-9305 until this is more fully understood?

Peter

Comment by Jinshan Xiong (Inactive) [ 18/May/17 ]

Another possible culprit would be "LU-7473 acl: increase ACL entries limitation", for which sanity test 51f was not run at all: https://testing.hpdd.intel.com/test_sets/3dfbffd0-e181-11e6-8cc2-5254006e85c2

Comment by Jinshan Xiong (Inactive) [ 18/May/17 ]

Peter - no, it's less likely to be LU-9305.

Comment by Jian Yu [ 18/May/17 ]

After reverting the patch https://review.whamcloud.com/19790/ for LU-7473 from the latest master branch, sanity test 51f passed with FSTYPE=zfs:
https://testing.hpdd.intel.com/test_sessions/1f66dfc9-5d3a-4cf0-a264-6bfabef2d825

Comment by Oleg Drokin [ 18/May/17 ]

I believe the actual problem is test 51f itself - it puts the system under so much strain that the slightest deviation in memory allocation tips it over on our minimal-memory test VMs.
For example, enabling debug in the kernel also kills it: https://testing.hpdd.intel.com/test_sets/e4ce6544-2a8a-11e7-9073-5254006e85c2

Comment by Andreas Dilger [ 19/May/17 ]

I think the root of the problem is that we are probably allocating too much memory for each open file. For a 2GB node opening 100k files, that is about 21KB per file. Not outrageous, but I suspect we could do better with some investigation.

I ran a test on my VM (2GB RAM, 72k files) and didn't have any problems with the patch, but I did get about 88MB into swap during the run. If our test nodes don't have swap space, they would fail when memory runs out.

In the meantime, it might make sense to limit the test to "safer" parameters, such as a maximum of 50000 files, when running in a VM.
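
One way to pick such "safer" parameters would be to scale the file count to the node's RAM. The sketch below shows one possible approach; the ~21 KB per-file budget and the 50000 ceiling are assumptions taken from the numbers in this comment, and the helper is not part of the test framework.

/* Illustrative helper: cap the number of test files based on the client's
 * RAM.  The ~21 KB per-file budget and the 50000 ceiling are assumptions,
 * not values from sanity.sh. */
#include <stdio.h>
#include <string.h>

static long files_cap_from_meminfo(void)
{
	FILE *f = fopen("/proc/meminfo", "r");
	char line[128];
	long memtotal_kb = 0;

	if (!f)
		return 50000;   /* fall back to the suggested ceiling */
	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "MemTotal: %ld kB", &memtotal_kb) == 1)
			break;
	}
	fclose(f);

	/* Budget ~21 KB per open file and leave half of RAM for everything else. */
	long cap = (memtotal_kb / 2) / 21;
	return cap > 50000 ? 50000 : cap;
}

int main(void)
{
	printf("suggested max open files for this node: %ld\n",
	       files_cap_from_meminfo());
	return 0;
}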

Comment by Gerrit Updater [ 19/May/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27189/
Subject: LU-9514 test: sanity 51f add to ALWAYS_EXCEPT
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: cfed34264ac3ab96507a55973e6ffe1d648ca527

Comment by nasf (Inactive) [ 19/May/17 ]

Part of the reason is now understood.
Because the client does not know how many ACL entries the file being opened will have, it allocates a buffer large enough to hold any possible open RPC reply. In addition, the open RPC is replayable, so it is kept on the client until the application closes the file, and its large reply buffer is kept with it. With the large-ACL patch landed, the open RPC reply buffer is larger than it was before. The "createmany -o -k" operation keeps a lot of files open on the client, so a lot of RAM is occupied by these unused reply buffers. That is why the client runs out of RAM in sanity test 51f.

I will make a patch to handle that.
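
To illustrate the direction of the fix (freeing the reply buffer once the open reply has been consumed, while still keeping the request around for replay), here is a conceptual C sketch. The structure and function names are hypothetical and do not reflect the actual ptlrpc code.

/* Conceptual sketch only: names below (replay_req, reply_buf, etc.) are
 * hypothetical and do not correspond to the real ptlrpc structures. */
#include <stdlib.h>

struct replay_req {
	void   *reply_buf;      /* large buffer sized for a worst-case ACL reply */
	size_t  reply_buf_len;
	int     replied;        /* reply has been received and unpacked */
};

/*
 * Once the open reply has been unpacked into client-side state, the raw
 * reply buffer is no longer needed just to keep the request for replay;
 * freeing it means each held-open file no longer pins a large
 * worst-case-sized allocation.
 */
static void release_reply_buffer_after_unpack(struct replay_req *req)
{
	if (req->replied && req->reply_buf) {
		free(req->reply_buf);
		req->reply_buf = NULL;
		req->reply_buf_len = 0;
	}
}

int main(void)
{
	struct replay_req req = { malloc(4096), 4096, 1 };

	release_reply_buffer_after_unpack(&req);   /* buffer dropped, request kept */
	return req.reply_buf != NULL;
}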

Comment by Gerrit Updater [ 19/May/17 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/27208
Subject: LU-9514 ptlrpc: free reply buffer for replay RPC
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0dfb0b541e104c3a9b6cc5db7910052d91a4984b

Comment by Peter Jones [ 20/May/17 ]

If I understand correctly, the failing test has been added to ALWAYS_EXCEPT for now, so this is not a "must have" for 2.10, but the fix can land if it is ready in time. Otherwise it can be fixed in 2.10.1.

Comment by nasf (Inactive) [ 12/Jun/17 ]

Sanity test_51f has been disabled, so it will not affect Maloo test runs. But the issue still needs to be fixed; otherwise it limits the number of files a client can hold open. (The current implementation occupies too much RAM on the client per open file, so fewer files can be opened concurrently by the client.)

Comment by nasf (Inactive) [ 19/Jul/17 ]

Following the discussion with Oleg, there are some other things that can be improved. It is unnecessary for the client to prepare a very large buffer for an RPC reply whose size it does not know in advance; in most cases the real reply is quite small. So we can have the client prepare a relatively small buffer for the RPC reply. If the MDT finds that it is not big enough, it will tell the client via the RPC reply, and the client can then prepare a larger buffer and retry the RPC. In theory, someone else may change the target file and cause the second RPC to fail again, but such a case will be very rare, so retrying the RPC should not cause much performance trouble.
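
A conceptual C sketch of this "small buffer first, grow on demand" scheme follows. All names, the stubbed transport hook, and the error-code convention are assumptions for illustration, not the actual ptlrpc/MDC implementation.

/* Conceptual sketch: send an open RPC with a small reply buffer and retry
 * with a larger one if the server reports the reply did not fit.  All names
 * here are hypothetical, not the real ptlrpc/MDC interfaces. */
#include <errno.h>
#include <stdlib.h>

#define SMALL_REPLY_LEN 1024   /* assumed typical reply size */

struct reply {
	int    rc;           /* server return code */
	size_t needed_len;   /* server's hint when the reply did not fit */
};

/* Stub transport hook for the sketch: pretend the small buffer always fits. */
static int send_open_rpc(const char *path, void *buf, size_t len, struct reply *r)
{
	(void)path; (void)buf; (void)len;
	r->rc = 0;
	r->needed_len = 0;
	return 0;
}

static int open_with_reply_retry(const char *path)
{
	size_t len = SMALL_REPLY_LEN;
	void *buf = malloc(len);
	struct reply r;
	int rc;

	if (!buf)
		return -ENOMEM;

	rc = send_open_rpc(path, buf, len, &r);
	if (rc == 0 && r.rc == -EOVERFLOW) {
		/* The server told us how big the reply really is; reallocate and
		 * retry once.  A concurrent change could still make the retry too
		 * small, but as noted above that should be rare. */
		void *bigger = realloc(buf, r.needed_len);

		if (!bigger) {
			free(buf);
			return -ENOMEM;
		}
		buf = bigger;
		len = r.needed_len;
		rc = send_open_rpc(path, buf, len, &r);
	}
	free(buf);
	return rc;
}

int main(void)
{
	return open_with_reply_retry("/mnt/lustre/d51f/f0") < 0 ? 1 : 0;
}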

Comment by Gerrit Updater [ 19/Jul/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27208/
Subject: LU-9514 ptlrpc: free reply buffer earlier for open RPC
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c8e3992acf3039b2824725d41f90d9a3be3be921

Comment by Peter Jones [ 19/Jul/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 26/Jul/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28217
Subject: LU-9514 ptlrpc: free reply buffer earlier for open RPC
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: be8e18e899c029be0ff85c044c7e7da80c12523d

Comment by Gerrit Updater [ 03/Aug/17 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28217/
Subject: LU-9514 ptlrpc: free reply buffer earlier for open RPC
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: a9d4e4b6a17bec691ce32460fdf9057ec080a738

Comment by Alex Zhuravlev [ 24/Apr/19 ]

still hitting this locally sometimes, on master.
