Details
Bug
Resolution: Fixed
Critical
Lustre 2.15.0
Description
On clients with large amounts of RAM, a large thinly-striped file can end up with a single object that has a lot of pages cached.
When such a lock is then canceled, iterating over all of those pages takes a long time, during which there are no RPCs to be sent (e.g. because we are truncating the lock or because the lock is PR).
Here's a simple testcase I have:
lfs setstripe /mnt/lustre -c 2
dd if=/dev/zero of=/mnt/lustre/testfile1 bs=4096k count=1
dd if=/dev/zero of=/mnt/lustre/testfile2 bs=4096k count=800
mv /mnt/lustre/testfile1 /mnt/lustre/testfile2
Now the destroy of the 3.2G file causes both stripes to be destroyed, and according to the logs, even at the default log level, the process takes 4.7s. So if the file were 30x bigger (100G), we'd already spend 141 seconds just iterating over pages on this particular machine.
00010000:00010000:0.0:1622008589.369887:0:5816:0:(ldlm_request.c:1150:ldlm_cli_cancel_local()) ### client-side cancel ns: lustre-OST0001-osc-ffff880316ae0800 lock: ffff88039a18cd80/0xfe254c0b2e6873ba lrc: 3/0,0 mode: PW/PW res: [0x19:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x428400010000 nid: local remote: 0xfe254c0b2e6873c1 expref: -99 pid: 11550 timeout: 0 lvb_type: 1
00000080:00200000:0.0:1622008589.369896:0:5816:0:(vvp_io.c:1717:vvp_io_init()) [0x200000401:0x18:0x0] ignore/verify layout 1/0, layout version 0 restore needed 0
00000080:00200000:0.0:1622008594.161234:0:5816:0:(vvp_io.c:313:vvp_io_fini()) [0x200000401:0x18:0x0] ignore/verify layout 1/0, layout version 0 need write layout 0, restore needed 0
00010000:00010000:0.0:1622008594.161266:0:5816:0:(ldlm_request.c:1209:ldlm_cancel_pack()) ### packing ns: lustre-OST0001-osc-ffff880316ae0800 lock: ffff88039a18cd80/0xfe254c0b2e6873ba lrc: 2/0,0 mode: --/PW res: [0x19:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x4c69400010000 nid: local remote: 0xfe254c0b2e6873c1 expref: -99 pid: 11550 timeout: 0 lvb_type: 1
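As a rough cross-check of the 4.7s figure, the gap between the vvp_io_init() and vvp_io_fini() entries can be computed straight from the log timestamps. This is only a sketch in Python: it assumes the fourth colon-separated field of each debug log line is the timestamp in seconds, as it appears in the excerpt above, and the helper name log_timestamp is made up for illustration.

```python
from decimal import Decimal


def log_timestamp(line: str) -> Decimal:
    """Return the timestamp field of a debug log line.

    Assumption: fields are colon-separated and the fourth one is the
    timestamp in seconds, as in the excerpt from this ticket.
    """
    return Decimal(line.split(":", 4)[3])


# Prefixes of the two vvp_io lines quoted in the ticket.
init_line = "00000080:00200000:0.0:1622008589.369896:0:5816:0:(vvp_io.c:1717:vvp_io_init())"
fini_line = "00000080:00200000:0.0:1622008594.161234:0:5816:0:(vvp_io.c:313:vvp_io_fini())"

# Time spent between io init and io fini, i.e. walking the cached pages.
elapsed = log_timestamp(fini_line) - log_timestamp(init_line)

# Linear extrapolation to a file 30x bigger, as in the description.
scaled = elapsed * 30

print(f"elapsed: {elapsed}s, extrapolated to 30x: {scaled}s")
```

The extrapolation assumes the cost grows linearly with the number of cached pages, which is the same assumption the 141 second estimate makes.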
We need to send something to the server if the cancel is taking a long time, just to prolong the lock and indicate we are still there. This is not ideal: an instant cancel RPC sounds better on the surface, but it is trickier to implement in all cases except DESTROY, where we are sure no more data can be added to the mapping.
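The idea can be illustrated with a toy model. The sketch below is plain Python, not Lustre code: cancel_with_keepalive, discard_page, send_keepalive and interval_s are all hypothetical names, and the real client would have to decide what RPC to send and how often. It only shows the shape of the loop: while walking the pages, check the clock and notify the server whenever too much time has passed since the last notification.

```python
import time
from typing import Callable, Iterable


def cancel_with_keepalive(
    pages: Iterable[int],
    discard_page: Callable[[int], None],
    send_keepalive: Callable[[], None],
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Discard every page, calling send_keepalive() at most once per interval_s.

    All names here are illustrative. Returns the number of keepalives sent.
    """
    last_sent = clock()
    sent = 0
    for page in pages:
        discard_page(page)
        now = clock()
        # Tell the server we are still making progress on the cancel.
        if now - last_sent >= interval_s:
            send_keepalive()
            sent += 1
            last_sent = now
    return sent


if __name__ == "__main__":
    count = cancel_with_keepalive(
        range(100_000),
        discard_page=lambda page: None,
        send_keepalive=lambda: print("still canceling"),
        interval_s=0.01,
    )
    print(f"keepalives sent: {count}")
```

Checking the clock per page keeps the example short; a real implementation would more likely check once per batch of pages to keep the overhead down.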