[LU-12124] slower unlinks with DOM files due to disabled ELC Created: 27/Mar/19  Updated: 20/Nov/20  Resolved: 20/Nov/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Mikhail Pershin Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11276 racer: mdc_dev.c:1346:mdc_req_attr_se... Resolved
is related to LU-12321 Unlink speed needs to be improved in ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The problem with slower unlinks was reported by Andrew in LU-11276:

 Subject: LU-11276 ldlm: don't apply ELC to converting and DOM locks

With this patch, unlinkmany (after createmany) performance within a DoM directory drops by ~35%:

HEAD is now at 70a01a6... LU-11276 ldlm: don't apply ELC to converting and DOM locks
total: 20000 unlinks in 52 seconds: 384.615387 unlinks/second
HEAD is now at 7237248... LU-11347 osd: do not use pagecache for I/O
total: 20000 unlinks in 34 seconds: 588.235291 unlinks/second
HEAD is now at 8b9105d... LU-11199 mdt: Attempt lookup lock on open
total: 20000 unlinks in 34 seconds: 588.235291 unlinks/second
HEAD is now at 697e8fe... LU-11473 doc: add lfs-getsom man page
total: 20000 unlinks in 34 seconds: 588.235291 unlinks/second
HEAD is now at ed0c19d... LU-1095 misc: quiet console messages at startup 
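The ~35% figure can be checked from the log above: 20000 unlinks took 52 seconds with the patch versus 34 seconds without it. A small C sketch of that arithmetic follows. The helper names are made up for illustration, and the file defines only functions, so call them from a test program.

```c
/* Unlink rate in operations per second, as reported by unlinkmany. */
double unlink_rate(unsigned ops, unsigned seconds)
{
    return (double)ops / (double)seconds;
}

/* Fractional throughput drop of `after` relative to `before`. */
double relative_drop(double before, double after)
{
    return (before - after) / before;
}
```

Because the operation count is the same in both runs, the drop reduces to 1 - 34/52 ≈ 0.346, i.e. about 34.6% fewer unlinks per second. Equivalently, the patched run took about 53% longer (52/34 ≈ 1.53).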

This happens because DOM locks are excluded from ELC to avoid flushing data. A possible solution would be to use lock convert from ELC.



 Comments   
Comment by Patrick Farrell (Inactive) [ 27/Mar/19 ]

So why do we skip ELC for DOM locks?  I understand why for converting locks, but why do we skip for DOM locks?  If ELC is happening, then the lock meets the ELC policy, and flush is OK.  If flush is too soon, we should adjust something, not skip ELC.

Comment by Andrew Perepechko [ 27/Mar/19 ]

Patrick, Mikhail,

If an instant data flush causes deadlocks or inconsistency, it would be nice to cancel at least PR locks (that sounds safe) and PW locks without dirty data (that sounds racier and more complicated to implement atomically with respect to Lustre layering). Does that make sense?
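Andrew's proposed filter can be expressed as a small predicate. This is a hypothetical sketch: `sketch_lock`, its mode enum, and `dirty_pages` stand in for the real LDLM lock and page-cache state. Reading the dirty count is exactly the step that would have to be atomic with respect to new writes, which is the racy part mentioned above and which this sketch does not attempt.

```c
#include <stdbool.h>

/* Hypothetical, simplified lock modes; not Lustre's ldlm_mode values. */
enum sketch_mode { SKETCH_PR, SKETCH_PW };

struct sketch_lock {
    enum sketch_mode mode;
    unsigned long dirty_pages; /* DOM pages cached but not written back */
};

/* True if ELC could cancel this DOM lock without forcing a data flush. */
bool elc_can_cancel_dom_lock(const struct sketch_lock *lk)
{
    if (lk->mode == SKETCH_PR)
        return true;             /* read lock: nothing to write back */
    return lk->dirty_pages == 0; /* write lock: only while it is clean */
}
```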

Comment by Patrick Farrell (Inactive) [ 27/Mar/19 ]

It does, but it worries me if we can't do an instant data flush. A data flush can happen for several reasons, so it shouldn't cause deadlocks when it comes from ELC...?

But if we have to skip locks with data, LDLM already has a lock weighing mechanism, right?  It is used (or it was used previously?) in ELC already, at least for some policies.

Comment by Mikhail Pershin [ 28/Mar/19 ]

A data flush happens as needed by CLIO, and that is not skipped. But ELC is not about data: it is used just to cancel locks in advance, to avoid the extra roundtrip for BL AST + CANCEL. The main problem with ELC vs DOM is an unneeded data flush. It happens when ELC tries to cancel some other bits but the lock also has the DOM bit; e.g. ELC drops the LOOKUP or LAYOUT bits, but that causes a data flush because the lock also holds the DOM bit. I am going to solve that by using lock convert in ELC, so that only the needed bits are dropped instead of cancelling the whole lock.
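To make the cancel-versus-convert distinction concrete, here is a minimal C sketch that treats a lock's inodebits as a plain bitmask. The `IB_*` names and values are illustrative assumptions, not the constants from the Lustre headers, and the helpers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative inodebit values, not copied from Lustre headers. */
#define IB_LOOKUP 0x01u
#define IB_UPDATE 0x02u
#define IB_LAYOUT 0x08u
#define IB_DOM    0x40u

/* Cancelling the whole lock drops every bit, so DOM data must be flushed. */
bool cancel_needs_flush(uint32_t held)
{
    return (held & IB_DOM) != 0;
}

/* Converting drops only the requested bits; the rest stay granted. */
uint32_t convert_bits(uint32_t held, uint32_t drop)
{
    return held & ~drop;
}

/* A convert flushes only if it actually gives up the DOM bit. */
bool convert_needs_flush(uint32_t held, uint32_t drop)
{
    return (held & drop & IB_DOM) != 0;
}
```

For a lock holding LOOKUP, LAYOUT and DOM, cancelling it forces a flush, while converting away only LOOKUP and LAYOUT leaves the DOM bit, and its cached data, in place.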

Comment by Patrick Farrell (Inactive) [ 28/Mar/19 ]

Ohh, OK.  It still seems like maybe we need to change ELC policy to be DOM-aware?  Why is ELC trying to cancel some bits but not DOM?  (I could check, but  )

Comment by Mikhail Pershin [ 28/Mar/19 ]

ELC doesn't always need to cancel all bits; it can be just the UPDATE or LOOKUP bit. The only cases where it takes FULL bits are unlink and rename, but in both cases DOM is excluded to avoid flushing data prior to the metadata operation.

Comment by Mikhail Pershin [ 29/Mar/19 ]

More details about the situation in general:

  • OST object deletion starts with unlink + ELC (FULL bits), and then the MDT deletes the objects on the OST. On object deletion it takes a discard lock, so all client data is dropped in the cancel callback instead of producing a massive write-out for objects being deleted, which is not needed.
  • DOM is more difficult. First, ELC with FULL bits would cause a data flush, though we want the data to be discarded from the server later. OK, so exclude the DOM bit from ELC AND exclude from ELC any lock with the DOM bit, to avoid an early data flush. Now the MDT deletes the object and takes a discard lock to discard client data. Sometimes I think it might be simpler to just allow that data flush to happen; its cost could be similar to all these extra RPCs and the lock discard mechanism. But a massive unlink would cause simultaneous IO to the server and would slow down unlinks even more. Also, not every unlink from a client means a final unlink and object deletion on the MDT, contrary to the OST, where unlink is always deletion. So I think we had better keep the same scheme, with discard from the server and no early data flush.
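The current unlink/rename policy described above (ELC requests FULL bits minus DOM, and any lock carrying the DOM bit is skipped entirely) can be sketched as a selection loop. This is a simplified, hypothetical model: held locks are just bitmasks, the `IB_*` values are illustrative, and `elc_select_for_unlink` is not a Lustre function.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative inodebit values, not copied from Lustre headers. */
#define IB_LOOKUP 0x01u
#define IB_UPDATE 0x02u
#define IB_LAYOUT 0x08u
#define IB_DOM    0x40u
#define IB_FULL   (IB_LOOKUP | IB_UPDATE | IB_LAYOUT | IB_DOM)

struct elc_result {
    size_t cancelled; /* locks ELC would cancel along with the unlink RPC */
    size_t skipped;   /* locks left for a later BL AST + CANCEL */
};

/* Model of the policy: request FULL bits minus DOM and skip any lock
 * that holds the DOM bit, so no DOM data is flushed early. */
struct elc_result elc_select_for_unlink(const uint32_t *held, size_t n)
{
    const uint32_t wanted = IB_FULL & ~IB_DOM;
    struct elc_result r = { 0, 0 };

    for (size_t i = 0; i < n; i++) {
        if (held[i] & IB_DOM)
            r.skipped++;
        else if (held[i] & wanted)
            r.cancelled++;
    }
    return r;
}
```

Every lock counted in `skipped` is one the server must later call back, which is where the extra roundtrips discussed next come from.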

This works, but excluding these locks from ELC produces extra roundtrips for cancel callbacks and cancels, and slows down unlink. I think ELC with lock convert should solve this: it will drop only the needed bits, pack convert records just as it does with cancels, and send them to the server along with the unlink request. Initially, for simplicity, it could even be done with a separate convert RPC instead of packing converts into the same RPC. That should help with unlinks and other related operations.
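A rough way to see why convert should help is to count messages per unlink. The following C sketch is a simplified counting model, not measured behavior. It assumes that each DOM lock skipped by ELC costs one blocking AST plus one CANCEL, that the initial separate-RPC variant of convert costs one extra RPC per lock, and that packed convert records cost nothing beyond the unlink RPC itself. The strategy names are hypothetical.

```c
/* Hypothetical labels for the schemes discussed in this ticket. */
enum dom_elc_strategy {
    SKIP_DOM_LOCKS,       /* current: DOM locks left out of ELC */
    CONVERT_SEPARATE_RPC, /* first step: convert sent in its own RPC */
    CONVERT_PACKED        /* goal: converts packed into the unlink RPC */
};

/* Messages per unlink under the assumptions above: the unlink RPC
 * itself plus any per-lock overhead. */
unsigned rpcs_per_unlink(enum dom_elc_strategy s, unsigned dom_locks)
{
    switch (s) {
    case SKIP_DOM_LOCKS:
        return 1 + 2 * dom_locks; /* BL AST + CANCEL for each lock */
    case CONVERT_SEPARATE_RPC:
        return 1 + dom_locks;     /* one convert RPC for each lock */
    case CONVERT_PACKED:
    default:
        return 1;                 /* converts ride along with the unlink */
    }
}
```

With one DOM lock per file, the model gives three, two, and one message per unlink respectively; real cost also depends on latency overlap and batching, which it ignores.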

Comment by Colin Faber [X] (Inactive) [ 03/May/19 ]

Hi,

Is there any active work taking place here?

Thanks!

Comment by Patrick Farrell (Inactive) [ 03/May/19 ]

Hmmm, Mike - I think you put the patch for 'this' under LU-10894?
https://review.whamcloud.com/#/c/34736/

Comment by Mikhail Pershin [ 03/May/19 ]

Patrick, not quite: that one is about ELC for write operations, while this ticket is about ELC for generic ops like UNLINK.

Colin, there is work under this ticket; it's just not the highest priority right now.

Comment by Andreas Dilger [ 28/May/19 ]

Mike, it might make sense to take an opportunistic approach to ELC. If we know that the file was created locally and the MDC DLM lock has never been cancelled (e.g. a flag set at create time that is cleared on lock cancel/convert and never set again), then the client knows that an unlink is discarding the only copy of the file.
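The opportunistic idea above boils down to a single client-side flag. A minimal C sketch, assuming a hypothetical per-file struct (not an existing Lustre structure): the flag is set only at create time, cleared on lock cancel or convert, and there is no path that sets it again.

```c
#include <stdbool.h>

/* Hypothetical per-file client state for the proposal. */
struct local_file {
    bool lock_held_since_create; /* set at create, cleared on cancel/convert */
};

/* The only place the flag is ever set: a file created by this client. */
struct local_file file_created_here(void)
{
    return (struct local_file){ .lock_held_since_create = true };
}

/* Cancel or convert clears the flag for good. */
struct local_file after_cancel_or_convert(struct local_file f)
{
    f.lock_held_since_create = false;
    return f;
}

/* If true, an unlink from this client discards the only copy of the file. */
bool unlink_discards_only_copy(struct local_file f)
{
    return f.lock_held_since_create;
}
```

When the predicate holds, the client could presumably discard cached data instead of flushing it, since no other client can have seen the file.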

Comment by Andrew Perepechko [ 14/Jun/19 ]

Mike, do you have a draft patch for this issue? I could help with debugging/testing.

Comment by Mikhail Pershin [ 14/Jun/19 ]

Andrew, I have draft code, but it is not complete and is based on other changes; I have to rearrange everything in it properly before sharing. I will commit it as soon as it is ready.

Comment by Andrew Perepechko [ 16/Aug/19 ]

Mike, sorry for being noisy on the same subject. Are you going to push your patch for review soon? Thanks!

Comment by Mikhail Pershin [ 05/Sep/19 ]

Andrew, I will push it as soon as it is at least stable; so far I have a couple of assertions triggered by the code changes.

Comment by Mikhail Pershin [ 18/Dec/19 ]

Andrew, there is a patch in the related LU-12321 which helps with unlink for files without dirty data. Note that it is not a full solution but a workaround, though it should cover the majority of file deletion cases.

Comment by Andrew Perepechko [ 19/Dec/19 ]

Thank you, Mikhail.

Comment by Andreas Dilger [ 20/Oct/20 ]

Is this still a problem, or has it been fixed in master with some other patch?

Comment by Peter Jones [ 20/Nov/20 ]

As per Mike, it is OK to close out this ticket.

Generated at Sat Feb 10 02:49:51 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.