[LU-9133] Use only MDS-local time when determining if directory is active Created: 16/Feb/17  Updated: 24/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: Lustre 2.16.0

Type: Improvement Priority: Minor
Reporter: Steve Guminski (Inactive) Assignee: Mikhail Pershin
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Blocker
is blocked by LU-11435 add contention check and accounting f... Closed
Related
Rank (Obsolete): 9223372036854775807

 Description   

When determining if a directory has recently been active, the client-provided timestamp in la_ctime is compared against the current time on the MDS. If the clocks are not synchronized on the client and the MDS, an incorrect elapsed time can be calculated. The comparison should instead be performed against the lr_contention_time in the resource, which is set by the MDS.

Andreas Dilger provided detailed information about the required changes:

                if (!(child_bits & MDS_INODELOCK_UPDATE) &&
                      mdt_object_exists(child) && !mdt_object_remote(child)) {
                        struct md_attr *ma = &info->mti_attr;

                        ma->ma_valid = 0;
                        ma->ma_need = MA_INODE;
                        rc = mdt_attr_get_complex(info, child, ma);
                        if (unlikely(rc != 0))
                                GOTO(out_child, rc);

                        /* If the file has not been changed for some time, we
                         * return not only a LOOKUP lock, but also an UPDATE
                         * lock and this might save us RPC on later STAT. For
                         * directories, it also let negative dentry cache start
                         * working for this dir. */
                        if (ma->ma_valid & MA_INODE &&
                            ma->ma_attr.la_valid & LA_CTIME &&
                            info->mti_mdt->mdt_namespace->ns_ctime_age_limit +
                                ma->ma_attr.la_ctime < cfs_time_current_sec())
                                child_bits |= MDS_INODELOCK_UPDATE;
                }

I think instead of looking up the inode and checking the client-supplied la_ctime on the directory against the MDS-local clock, this check should look up the resource and compare res->lr_contention_time via ldlm_check_contention() which uses only the clock on the server. That would remove similar, but different code, and unify lock contention behaviour between the OST and MDT somewhat.

 



 Comments   
Comment by Andreas Dilger [ 04/Mar/17 ]

From an email discussion between Steve and I:

It appears that lhc is an mdt_lock_handle() for the child directory entry, and lhp is a lock handle for the parent directory. You can turn these into DLM locks via lock = ldlm_handle2lock(&lhc->mlh_reg_lh);

Thanks for the info, Andreas. I've been trying to follow the instructions you provided, but ldlm_handle2lock(&lhc->mlh_reg_lh) seems to always be returning NULL. Shortly before the code that I'm replacing, there is a call to mdt_lock_handle_init(lhc). This function zeroes out the lhc->mlh_reg_lh.cookie, which seems to be needed by ldlm_handle2lock() to actually locate the lock.

I see that a few lines later, mdt_object_lock() is called for the child. Is this where the handle's cookie gets set? If so, then there's a conflict: I need to know the child_bits before calling mdt_object_lock(), but I can't get the child_bits until I've called mdt_object_lock() to get the lock and retrieve the contention time.

According to the existing code, which is using child and the timestamps thereon, it makes sense to check the lhc data rather than lhp.

If I instead try to locate the lock for the parent (lhp), I get a successful result from ldlm_handle2lock(). I'm not sure if the parent lock is actually useful, however.

I also tried to find the ldlm_resource to directly compare the res->lr_contention_time. I tried to get the ldlm_res_id from the child_fid using fid_build_reg_res_name(), and then called ldlm_resource_get() to find the resource. However, the latter function returned NULL. Is this approach incorrect?

Comment by Andreas Dilger [ 04/Mar/17 ]

I also tried to find the ldlm_resource to directly compare the res->lr_contention_time. I tried to get the ldlm_res_id from the child_fid using fid_build_reg_res_name(), and then called ldlm_resource_get() to find the resource. However, the latter function returned NULL. Is this approach incorrect?

That would have been my next choice. It is possible the LDLM namespace (info->mti_mdt->mdt_namespace) or resource type (LDLM_IBITS) was not correct? Looking down the callpath, it looks like mdt_object_local_lock() is doing the same. I do agree there is a chicken-and-egg problem here, so this may not be as easy as I thought.

On the plus side, there is work being done by tappro for Data-on-MDT in https://review.whamcloud.com/23010 to allow speculatively adding/removing lock bits to a lock, which may allow us to do the lookup of the child lock/resource with the UPDATE bit optional, and then not grant that bit in the DLM if there is resource contention. With that said, I think it makes sense to shelve this change until Mike's work lands.

Comment by Mikhail Pershin [ 07/Mar/17 ]

Andreas, do you mean that 'try_bits' patch: https://review.whamcloud.com/#/c/25262/ ? Or whole combination with lock convert? The first one os ready for landing, just review is required.

Comment by Andreas Dilger [ 08/Mar/17 ]

I guess I meant the patch https://review.whamcloud.com/25262 ("LU-9184 ldlm: selective IBITS lock trying"). The issue here is that we can't check the contention time on the DLM resource before getting the lock, but we don't know if we want the UPDATE bit until after checking the lock contention...

My thought to fix this properly is that the caller will always speculatively try to get the UPDATE bit, but the DLM code (or MDS callback?) can drop the UPDATE bit if the lock is recently contended. That keeps the info about lock contention internal to the DLM instead of the caller, and avoids the chicken-and-egg problem.

Comment by Mikhail Pershin [ 09/Mar/17 ]

I see, so for this particular ticket LU-9133 the only patch for selective lock bits is needed, not whole DoM work.

Comment by Andreas Dilger [ 22/Aug/18 ]

Mike, is there some chance that this could be fixed for 2.12 still? We still get occasional complaints about slow operations when the system clocks are not in sync.

Comment by Mikhail Pershin [ 23/Aug/18 ]

Andreas, I see that contention mechanism is not being used with IBITS locks. This must be done first to update lr_contention_time regularly. The simplest way would be to count each conflicting lock as contended one, maybe you have an better idea about contention for IBITS locks?

Comment by Andreas Dilger [ 28/Aug/18 ]

Mike, I think that is exactly what lr_contention_time is for - to track how recently the resource was contended by multiple clients, so that the server can make better decisions than just ping-pong of the lock. This should apply to conflicting IBITS locks as well, if it doesn't yet.

Comment by Mikhail Pershin [ 28/Sep/18 ]

Andreas, I am not sure that lr_contention_time can replace ctime check in that code, lr_contention_time is updated when amount of contended locks are bigger than ns_contended_locks variable which is 32, therefore this time is updated much rarely than ctime. Is that what we need?

Comment by Andreas Dilger [ 28/Sep/18 ]

Hmm. I thought it was updated on the most recent conflict, but I haven't looked at it recently. If there is some other way to determine if the directory has been idle recently, I'd be fine with a different solution. Using timestamps from the MDS is best, since client and MDS clocks may be out of sync. Also, clocks between clients may be different, so using the MDS clock is best.

Comment by Andreas Dilger [ 12/Oct/18 ]

Quentin, is there a chance you could look into this issue?  It is somewhat related to your work on LU-11199 "mdt: Attempt lookup lock on open"

Comment by Quentin Bouget [ 15/Oct/18 ]

@Andreas, I am pretty sure you meant someone else... Maybe Patrick Farell?

Comment by Andreas Dilger [ 15/Oct/18 ]

Sorry, I meant Dominique's work in patch https://review.whamcloud.com/30880 "U-10235 mdt: mdt_create: check EEXIST without lock"

Generated at Sat Feb 10 02:23:32 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.