Details
Question/Request
Resolution: Fixed
Minor
Lustre 2.5.4
None
RHEL-6.6, lustre-2.5.4
Description
Space utilization on one of our systems is high, and as we work to prune some of this data, we're exploring other space tunings.
One of our admins noted the "cur_grant_bytes" osc parameter. When we looked at a few clients, we saw that this value often exceeds max_dirty_mb, sometimes by an order of magnitude. We usually use 64MB of dirty cache per osc per client. Is there an upper limit to the cur_grant_bytes parameter? What are the side effects of setting it to some lower value (or 0)? Can we reduce this client grant while there is active I/O, and can we do this for all osc connections simultaneously (for a collective of millions of osc connections) on a system? Is this documented well anywhere?
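For reference, the comparison described above can be sketched as a small script. This is only an illustration: it assumes the usual "name=value" lines printed by `lctl get_param osc.*.cur_grant_bytes osc.*.max_dirty_mb`, and the device names and numbers in SAMPLE are made up, not taken from our clients.

```python
import re

MIB = 1024 * 1024

# Hypothetical sample in the "name=value" form printed by `lctl get_param`.
SAMPLE = """\
osc.testfs-OST0000-osc-ffff880000000001.cur_grant_bytes=704643072
osc.testfs-OST0000-osc-ffff880000000001.max_dirty_mb=64
osc.testfs-OST0001-osc-ffff880000000001.cur_grant_bytes=33554432
osc.testfs-OST0001-osc-ffff880000000001.max_dirty_mb=64
"""

LINE_RE = re.compile(
    r"^osc\.(?P<dev>[^=\s]+)\.(?P<param>cur_grant_bytes|max_dirty_mb)"
    r"=(?P<val>\d+(?:\.\d+)?)$"
)


def parse_params(text):
    """Collect cur_grant_bytes and max_dirty_mb per osc device name."""
    values = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            values.setdefault(m["dev"], {})[m["param"]] = float(m["val"])
    return values


def over_granted(values):
    """Return (device, grant_bytes, dirty_bytes, ratio) where grant > dirty limit."""
    flagged = []
    for dev, params in sorted(values.items()):
        if "cur_grant_bytes" not in params or "max_dirty_mb" not in params:
            continue  # skip devices with incomplete data
        grant = params["cur_grant_bytes"]
        dirty = params["max_dirty_mb"] * MIB
        if dirty > 0 and grant > dirty:
            flagged.append((dev, grant, dirty, grant / dirty))
    return flagged


if __name__ == "__main__":
    for dev, grant, dirty, ratio in over_granted(parse_params(SAMPLE)):
        print(f"{dev}: grant {grant / MIB:.0f} MiB is {ratio:.1f}x max_dirty_mb")
```

On a real client the text would come from the `lctl get_param` output rather than SAMPLE; the parsing assumes that output format and has not been checked against every Lustre version.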
Additionally, we are looking into tuning the reserved_blocks_percent parameter. The Lustre manual states that 5% is the minimum, but is that a sane value for all OST sizes?
Thanks,
–
Jesse
Issue Links
- is related to LU-3859 grant shrinker floods OST and produce a large load (Resolved)
I just ran a quick test on our TDS system. I took a newly mounted client and created 50 files striped across OST 0. I backgrounded 50 dd processes against those files and gathered logs with +cache enabled on the client and server.
The first thing I noticed is that the server very quickly increased the grant to the client, maybe even before the client had a chance to realize it.
The server granted it 56MB before the client even reported having a grant.
I haven't read all of the grant-related code, so take this analysis with a grain of salt.
Is the want parameter supposed to be an absolute or relative value?
This looks like want is being used as an absolute value. Assuming want should be absolute, do we also need a check to ensure that fed->fed_grant isn't much larger than want?
This looks like want is a relative value.
So the client repeatedly says "I want 32MB", and the server takes that request, lowers it to grant_chunk (8MB), and grants 8MB each time until the client claims it has at least 32MB.
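To make that loop concrete, here is a toy model of it. It is not Lustre code, and I am assuming the behavior exactly as described above: every request carries the client's own view of its grant, and the server adds min(want, grant_chunk) whenever that reported value is below want. The `lag` parameter is hypothetical; it stands for how many requests are already in flight before a reply updates the client's view.

```python
def simulate_grant(requests, want_mb=32, chunk_mb=8, lag=0):
    """Return the total MB granted by the server in this toy model.

    requests: number of write requests the client sends
    want_mb:  amount the client asks for on each request
    chunk_mb: per-request cap applied by the server (grant_chunk)
    lag:      hypothetical number of requests whose replies the client
              has not yet seen when it sends the next request
    """
    granted = []  # grant added by the server for each request, in order
    for i in range(requests):
        # The client only knows about grants from replies that have arrived.
        reported = sum(granted[:max(0, i - lag)])
        extra = min(want_mb, chunk_mb) if reported < want_mb else 0
        granted.append(extra)
    return sum(granted)


if __name__ == "__main__":
    print(simulate_grant(10, lag=0))  # client view always current
    print(simulate_grant(10, lag=3))  # client view three replies behind
```

With no lag the model stops at 32MB. With lag=3 it reaches 56MB, because the server keeps adding 8MB chunks until the client's stale view catches up. The lag value is arbitrary and chosen only for illustration, so this shows how an overshoot could happen under these assumptions, not what actually happened in the test.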
According to Andreas in LU-3859, OBD_CONNECT_GRANT_SHRINK isn't set, so the extra grant is never cleaned up automatically. Is there a reason this is disabled?