[LU-14711] Canceling lock with a lot of cached data can take a lot of time Created: 26/May/21  Updated: 20/Jan/23  Resolved: 04/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Critical
Reporter: Oleg Drokin Assignee: Oleg Drokin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-14885 sanity test_903: __ptlrpc_prep_bulk_p... Resolved
Related
is related to LU-11290 Batch callbacks in osc_page_gang_lookup Resolved
is related to LU-13134 try to use slab allocation for cl_page Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

On clients with large amounts of RAM, a large thinly-striped file can end up with a single object that has a lot of pages cached.

When such a lock is then canceled, iterating over all of those pages takes a long time, during which there are no RPCs to send (e.g. because we are truncating under the lock, or the lock is PR).

Here's a simple testcase I have:

lfs setstripe /mnt/lustre -c 2
dd if=/dev/zero of=/mnt/lustre/testfile1 bs=4096k count=1
dd if=/dev/zero of=/mnt/lustre/testfile2 bs=4096k count=800
mv /mnt/lustre/testfile1 /mnt/lustre/testfile2

Now the destroy of the 3.2G file causes every page of both stripes to be destroyed, and according to the logs, even at the default log level the process takes 4.7s. So if the file were 30x bigger (100G) we'd already spend 141 seconds just iterating over pages on this particular machine.

00010000:00010000:0.0:1622008589.369887:0:5816:0:(ldlm_request.c:1150:ldlm_cli_cancel_local()) ### client-side cancel ns: lustre-OST0001-osc-ffff880316ae0800 lock: ffff88039a18cd80/0xfe254c0b2e6873ba lrc: 3/0,0 mode: PW/PW res: [0x19:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x428400010000 nid: local remote: 0xfe254c0b2e6873c1 expref: -99 pid: 11550 timeout: 0 lvb_type: 1
00000080:00200000:0.0:1622008589.369896:0:5816:0:(vvp_io.c:1717:vvp_io_init()) [0x200000401:0x18:0x0] ignore/verify layout 1/0, layout version 0 restore needed 0
00000080:00200000:0.0:1622008594.161234:0:5816:0:(vvp_io.c:313:vvp_io_fini()) [0x200000401:0x18:0x0] ignore/verify layout 1/0, layout version 0 need write layout 0, restore needed 0
00010000:00010000:0.0:1622008594.161266:0:5816:0:(ldlm_request.c:1209:ldlm_cancel_pack()) ### packing ns: lustre-OST0001-osc-ffff880316ae0800 lock: ffff88039a18cd80/0xfe254c0b2e6873ba lrc: 2/0,0 mode: --/PW res: [0x19:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x4c69400010000 nid: local remote: 0xfe254c0b2e6873c1 expref: -99 pid: 11550 timeout: 0 lvb_type: 1

We need to send something to the server if cancel is taking a long time, just to prolong the lock and indicate we are still there. This is not ideal: an instant cancel RPC of course sounds better on the surface, but it is trickier to implement in all cases except DESTROY, where we are sure no more data can be added to the mapping.



 Comments   
Comment by Oleg Drokin [ 26/May/21 ]

Tangentially related to speeding up processing is LU-11290, with two patches there (Vitaly quotes a 30% processing-time improvement with both), but I feel it does not fully fix the problem: past a certain size, processing would still take longer than the timeout, so at the very least we still need a way to calm an impatient server.

Comment by Andreas Dilger [ 26/May/21 ]

Per earlier discussion, it may be possible that sending a zero-byte read or write to the OST with the cancelling DLM lock handle would be enough to prolong the lock timeout on the OSS, and avoid eviction.

However, reducing the time that page eviction takes would also be desirable, such as LU-11290, and any other optimizations to reduce the per-page overhead, like LU-13134 which reduces the size/count of allocations per page.

Comment by Oleg Drokin [ 28/May/21 ]

Zero-sized IO sadly does not work, so I'll do a 1-byte IO with a "discard me" flag; old servers not aware of the flag will do the IO, new servers will discard the IO altogether.

As I am adding a patch here, I just realized that prolonging the lock from the client side is still only a half measure: the client that sent the lock cancel is still going to time out after 600 seconds (at_max). It will resend, so at least there are no evictions, but the chatter in the logs will be substantial. Something to keep in mind.

Comment by Gerrit Updater [ 28/May/21 ]

Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43857
Subject: LU-14711 osc: Notify server if cache discard takes a long time
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9cd980cb07838ebb9a543870a3a8e998d567ac8c

Comment by Gerrit Updater [ 29/May/21 ]

Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43869
Subject: LU-14711 tests: Test demonstrating eviction during long cache processing
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e868e6deb017cadf1e75a355fee0639140faf6f8

Comment by Gerrit Updater [ 14/Jun/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43857/
Subject: LU-14711 osc: Notify server if cache discard takes a long time
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 564070343ac4ccf4f97843009e1c36f5130ac19c

Comment by Gerrit Updater [ 13/Aug/21 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/44654
Subject: LU-14711 osc: Do not attempt sending empty pages
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d3e9202944c6760b1269dd78d4043699200cbf38

Comment by Gerrit Updater [ 17/Sep/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/43869/
Subject: LU-14711 tests: Ensure there's no eviction with long cache discard
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c0a7f78529e21c9cafa986abea255925b4b41244

Comment by Gerrit Updater [ 04/Oct/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44654/
Subject: LU-14711 osc: Do not attempt sending empty pages
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1a409a3e6a74685970ee779ebe32917bf51eaf3a

Comment by Peter Jones [ 04/Oct/21 ]

Landed for 2.15

Generated at Sat Feb 10 03:12:07 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.