[LU-17197] Performance regression with "LU-15947 obdclass: improve precision of wakeups for mod_rpcs" Created: 14/Oct/23  Updated: 22/Dec/23  Resolved: 22/Dec/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug
Priority: Major
Reporter: Shaun Tancheff
Assignee: Shaun Tancheff
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-15947 Spinlock contention during wake_up_al... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

A performance regression:

The LU-15947 patch introduced regressions in the MDTest unique-directory tests: directory (-25%) and file (-11%) removal, and file reads.

mdtest -v -i 5 -p 120 -C -w 32768 -E -e 32768 -T -r -n 3120 -u -d <dir>
...
mdtest -v -i 5 -p 120 -C -E -T -r -n 3120 -u -d  <dir>

Also an IOR 4k sequential read regression (-6%)

ior -F -r -t 4k -D 180 -u -b 8g -o <path>
...
Options: 
api                 : POSIX
apiVersion          : 
test filename       : /mnt/kjlmo2/pkoutoupis/flash/test.05
access              : file-per-process
type                : independent
segments            : 1
ordering in a file  : sequential
ordering inter file : no tasks offsets
nodes               : 21
tasks               : 336
clients per node    : 16
repetitions         : 1
xfersize            : 4096 bytes
blocksize           : 8 GiB
aggregate filesize  : 2.62 TiB
stonewallingTime    : 180
stoneWallingWearOut : 0


 Comments   
Comment by Gerrit Updater [ 14/Oct/23 ]

"Shaun Tancheff <shaun.tancheff@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52698
Subject: LU-17197 obdclass: revert improve precision of wakeups
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8f1dd012fd651a05824919fff8591c3ae73d9cf3

Comment by Gerrit Updater [ 14/Oct/23 ]

"Shaun Tancheff <shaun.tancheff@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52699
Subject: LU-17197 ptlrpc: Sort waiters on close_req completion
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3054382915f534e8fb4ed00e256b66fda2aab5a9

Comment by Neil Brown [ 17/Oct/23 ]

25% is a really big regression.  I think the most likely cause is requests being handled in a different order.

The patch you are reverting claimed to improve fairness but I now see that it does exactly the reverse.

Previously most waiters were queued with wait_event_idle_exclusive, which adds them to the end of the queue.  With my patch __add_wait_queue() is used, which adds them to the front.  That is bad - but easily fixed.

Instead of this patch, could you please try changing the __add_wait_queue() in obd_get_mod_rpc_slot() to __add_wait_queue_entry_tail().  That should restore fairness and preserve all the other good properties of my patch.

Thanks.
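
To illustrate the ordering point above, here is a minimal userspace sketch of head versus tail insertion. It is not the kernel wait-queue API or the Lustre code: every toy_* name is hypothetical, and it assumes a wakeup always takes the waiter at the head of the list.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_WAITERS 8

/* Hypothetical stand-in for a wait-queue entry; id records arrival order. */
struct toy_waiter {
	int id;
	struct toy_waiter *next;
};

struct toy_queue {
	struct toy_waiter *head;
	struct toy_waiter *tail;
};

/* Head insertion: the newest waiter ends up first in line. */
static void toy_add_head(struct toy_queue *q, struct toy_waiter *w)
{
	w->next = q->head;
	q->head = w;
	if (q->tail == NULL)
		q->tail = w;
}

/* Tail insertion: the newest waiter ends up last in line. */
static void toy_add_tail(struct toy_queue *q, struct toy_waiter *w)
{
	w->next = NULL;
	if (q->tail != NULL)
		q->tail->next = w;
	else
		q->head = w;
	q->tail = w;
}

/* Assumption: a wakeup removes and returns the waiter at the head. */
static int toy_wake_one(struct toy_queue *q)
{
	struct toy_waiter *w = q->head;

	if (w == NULL)
		return -1;
	q->head = w->next;
	if (q->head == NULL)
		q->tail = NULL;
	return w->id;
}

/* Queue n waiters in arrival order, then record the order they are woken. */
void toy_wake_order(int n, int at_tail, int *order)
{
	struct toy_waiter waiters[TOY_MAX_WAITERS];
	struct toy_queue q = { NULL, NULL };
	int i;

	assert(n >= 0 && n <= TOY_MAX_WAITERS);
	for (i = 0; i < n; i++) {
		waiters[i].id = i;
		if (at_tail)
			toy_add_tail(&q, &waiters[i]);
		else
			toy_add_head(&q, &waiters[i]);
	}
	for (i = 0; i < n; i++)
		order[i] = toy_wake_one(&q);
}
```

Under these assumptions, calling toy_wake_order() with at_tail set to 0 wakes the most recent arrival first, so early waiters can be starved while new ones keep arriving; with at_tail set to 1 the waiters are woken in arrival order.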

Comment by Gerrit Updater [ 18/Oct/23 ]

"Neil Brown <neilb@suse.de>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52738
Subject: LU-17197 obdclass: preserve fairness when waiting for rpc slot
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d7f66cb67c025519ec743ac3594b9634562e2223

Comment by Gerrit Updater [ 08/Nov/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52738/
Subject: LU-17197 obdclass: preserve fairness when waiting for rpc slot
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b5fde4d6c02324a8511afe30d02eb2cf46ea799d

Comment by Gerrit Updater [ 24/Nov/23 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53232
Subject: LU-17197 obdclass: preserve fairness when waiting for rpc slot
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: 34de852e55233bc13d1e795a0a1912f5ac80e15b

Comment by Peter Jones [ 22/Dec/23 ]

Fix merged for 2.16
