[LU-13096] incorrect used_mb in max_cached_mb Created: 20/Dec/19  Updated: 13/Jun/20  Resolved: 17/Mar/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Minor
Reporter: Emoly Liu Assignee: Emoly Liu
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The used_mb value reported in max_cached_mb appears to be incorrect on a large-memory client.
The client has 24TB of memory, so max_cached_mb defaults to 12TB, but max_cached_mb reports almost all of that cache as "used_mb".

sdf2{root}1014: free
              total        used        free      shared  buff/cache   available
Mem:    24965051716   139544736 24824112084       39772     1394896 24818988328
Swap:       8388604           0     8388604
sdf2{root}1015: lctl get_param llite.*.max_cached_mb
llite.lustre-ffff96631b3d6800.max_cached_mb=
users: 16
max_cached_mb: 12189966
used_mb: 12157198
unused_mb: 32768
reclaim_count: 0

Dropping caches doesn't help either.

sdf2{root}1016: echo 3 > /proc/sys/vm/drop_caches 
sdf2{root}1017: lctl get_param llite.*.max_cached_mb
llite.lustre-ffff96631b3d6800.max_cached_mb=
users: 16
max_cached_mb: 12189966
used_mb: 12157198
unused_mb: 32768
reclaim_count: 0
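
The figures above can be cross-checked with a little arithmetic. The sketch below is only a consistency check on the numbers pasted in this description; it assumes the `free` columns are in KiB (the tool's default unit) and that the "MB" in max_cached_mb means MiB. The constant and helper names are made up for illustration and are not Lustre identifiers.

```python
# Values copied from the `free` and `lctl get_param` output in the description.
MEM_TOTAL_KIB = 24965051716   # "Mem: total"; assumed to be KiB (free's default unit)
BUFF_CACHE_KIB = 1394896      # "Mem: buff/cache"; same assumption
MAX_CACHED_MB = 12189966
USED_MB = 12157198
UNUSED_MB = 32768


def kib_to_mib(kib: int) -> int:
    """Convert KiB to whole MiB, rounding down."""
    return kib // 1024


# The default limit matches half of total RAM, as the description states.
half_ram_mb = kib_to_mib(MEM_TOTAL_KIB) // 2

# What the kernel itself reports as cached, in MiB.
kernel_cache_mb = kib_to_mib(BUFF_CACHE_KIB)

print("half of RAM (MB):       ", half_ram_mb)
print("used_mb + unused_mb:    ", USED_MB + UNUSED_MB)
print("kernel buff/cache (MB): ", kernel_cache_mb)
print("used_mb exceeds kernel cache:", USED_MB > kernel_cache_mb)
```

Under those assumptions, max_cached_mb equals half of RAM and used_mb + unused_mb adds up to max_cached_mb exactly, so the counters are internally consistent. The discrepancy is that used_mb is several thousand times larger than the buff/cache figure reported by `free`.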


 Comments   
Comment by Gerrit Updater [ 20/Dec/19 ]

Emoly Liu (emoly@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37080
Subject: LU-13096 client: fix incorrect used_mb in max_cached_mb
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f9e0b9f65b3f529e3dacaab53ae12f738ed11945

Comment by Jeremy Filizetti [ 22/Jan/20 ]

I want to make sure I understand things correctly.  This patch doesn't actually attempt to fix the problem, does it?  On a 64-bit platform, as far as I can tell, an atomic_long_t is an atomic64_t.  Today I was able to confirm that the patch had no effect for us.
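
The point about integer width in this comment can be checked numerically. The sketch below assumes 4 KiB pages (the ticket does not state the page size) and uses hypothetical helper names; it only shows which widths can hold the page count implied by the reported max_cached_mb, and does not establish where, if anywhere, a narrower type is used in llite.

```python
PAGE_SIZE = 4096          # assumption: 4 KiB pages; not stated in the ticket
MAX_CACHED_MB = 12189966  # value reported by lctl in the description


def mb_to_pages(mb: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages needed to cover `mb` MiB."""
    return mb * (1024 * 1024 // page_size)


def wrap_signed(value: int, bits: int) -> int:
    """Reinterpret `value` as a two's-complement signed integer of `bits` width."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value -= 1 << bits
    return value


pages = mb_to_pages(MAX_CACHED_MB)

print("pages:            ", pages)
print("fits signed 64bit:", wrap_signed(pages, 64) == pages)
print("fits signed 32bit:", wrap_signed(pages, 32) == pages)
```

With these assumptions the page count is above 2^31 but far below 2^63. A signed 64-bit counter holds it unchanged, which is consistent with the observation that switching between `long` and a 64-bit atomic on a 64-bit platform would make no difference. Any path that passed the value through a signed 32-bit integer would not preserve it.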

Comment by Gerrit Updater [ 25/Feb/20 ]

Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/37707
Subject: LU-13096 llite: fix potential overflow in ll_max_cached_mb_seq_write()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 83576aeab760887fc8fd58290fb22d59641705ce

Comment by Gerrit Updater [ 25/Feb/20 ]

Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/37708
Subject: LU-13096 llite: limit default max lru pages
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 48202d2aad10f75f8d6df3f10400ab4c6c23a85e

Comment by Gerrit Updater [ 25/Feb/20 ]

Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/37710
Subject: LU-13096 llite: serialize max_cached_mb write operation
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0247c40f5ff6884856575199bb74a1e9e815afe0

Comment by Gerrit Updater [ 01/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37707/
Subject: LU-13096 llite: fix potential overflow in ll_max_cached_mb_seq_write()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a7d0e91ea687564d3d5be0eb96bd5b6a260e665b

Comment by Gerrit Updater [ 17/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37710/
Subject: LU-13096 llite: serialize max_cached_mb write operation
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e4d63c854d774792f8a77b8d1e575ccc2d8c3c8b

Comment by Peter Jones [ 17/Mar/20 ]

Landed for 2.14
