[LU-12609] Limit maximum number of journal credits for an RPC Created: 29/Jul/19  Updated: 29/Oct/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Patrick Farrell (Inactive) Assignee: Alex Zhuravlev
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

It is important to limit the total number of fragments allowed in an RPC, to avoid the issues with the on-disk filesystem that can occur when a single RPC carries too many fragments (a rough credit model is sketched after the example log below).

For example:

May 27 03:21:02 oss2c104 kernel: Lustre: 5711:0:(osd_handler.c:1155:osd_trans_dump_creds())   create: 0/0/0, destroy: 0/0/0
May 27 03:21:02 oss2c104 kernel: Lustre: 5711:0:(osd_handler.c:1155:osd_trans_dump_creds()) Skipped 4 previous similar messages
May 27 03:21:02 oss2c104 kernel: Lustre: 5711:0:(osd_handler.c:1162:osd_trans_dump_creds())   attr_set: 1/1/0, xattr_set: 1/1/0
May 27 03:21:02 oss2c104 kernel: Lustre: 5711:0:(osd_handler.c:1162:osd_trans_dump_creds()) Skipped 4 previous similar messages
May 27 03:21:02 oss2c104 kernel: Lustre: 5711:0:(osd_handler.c:1172:osd_trans_dump_creds())   write: 2/17434/0, punch: 0/0/0, quota 5/5/0
May 27 03:21:02 oss2c104 kernel: Lustre: 5711:0:(osd_handler.c:1172:osd_trans_dump_creds()) Skipped 4 previous similar messages
May 27 03:21:02 oss2c104 kernel: Lustre: 5711:0:(osd_handler.c:1179:osd_trans_dump_creds())   insert: 0/0/0, delete: 0/0/0
May 27 03:21:02 oss2c104 kernel: Lustre: 5711:0:(osd_handler.c:1179:osd_trans_dump_creds()) Skipped 4 previous similar messages
May 27 03:21:02 oss2c104 kernel: Lustre: 5711:0:(osd_handler.c:1186:osd_trans_dump_creds())   ref_add: 0/0/0, ref_del: 0/0/0
May 27 03:21:02 oss2c104 kernel: Lustre: 5711:0:(osd_handler.c:1186:osd_trans_dump_creds()) Skipped 4 previous similar messages
May 27 03:21:02 oss2c104 kernel: LustreError: 5711:0:(osd_internal.h:1117:osd_trans_exec_op()) share1-OST004c-osd: op = 7, rb = 7
May 27 03:21:02 oss2c104 kernel: LustreError: 5711:0:(osd_internal.h:1117:osd_trans_exec_op()) Skipped 3 previous similar messages
May 27 03:21:02 oss2c104 kernel: Pid: 5856, comm: ll_ost_io01_044
May 27 03:21:02 oss2c104 kernel: Call Trace:
May 27 03:21:02 oss2c104 kernel:  [<ffffffffc074680e>] libcfs_call_trace+0x4e/0x60 [libcfs]
May 27 03:21:02 oss2c104 kernel:  [<ffffffffc0746846>] libcfs_debug_dumpstack+0x26/0x30 [libcfs]
May 27 03:21:02 oss2c104 kernel:  [<ffffffffc0e3fcfd>] osd_trans_start+0x36d/0x390 [osd_ldiskfs]
May 27 03:21:02 oss2c104 kernel:  [<ffffffffc0f964fe>] ofd_trans_start+0x6e/0xf0 [ofd]
May 27 03:21:02 oss2c104 kernel:  [<ffffffffc0f9cc63>] ofd_commitrw_write+0x943/0x1d90 [ofd]
May 27 03:21:02 oss2c104 kernel:  [<ffffffffc0fa100b>] ofd_commitrw+0x54b/0xb70 [ofd]
May 27 03:21:02 oss2c104 kernel:  [<ffffffffc0bd39d6>] obd_commitrw.constprop.39+0x2fe/0x341 [ptlrpc]
May 27 03:21:02 oss2c104 kernel:  [<ffffffffc0bc02e0>] tgt_brw_write+0xd60/0x16d0 [ptlrpc]
May 27 03:21:02 oss2c104 kernel:  [<ffffffff810d11c4>] ? update_curr+0x104/0x190
May 27 03:21:02 oss2c104 kernel:  [<ffffffff810cdbce>] ? account_entity_dequeue+0xae/0xd0
May 27 03:21:02 oss2c104 kernel:  [<ffffffffc0b0a650>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
May 27 03:21:02 oss2c104 kernel:  [<ffffffffc0bbc460>] tgt_request_handle+0x900/0x11f0 [ptlrpc]
May 27 03:21:02 oss2c104 kernel:  [<ffffffffc0b5edc0>] ptlrpc_server_handle_request+0x220/0xa90 [ptlrpc]
May 27 03:21:02 oss2c104 kernel:  [<ffffffffc0758818>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
May 27 03:21:02 oss2c104 kernel:  [<ffffffffc0b5b938>] ? ptlrpc_wait_event+0x98/0x330 [ptlrpc]
May 27 03:21:02 oss2c104 kernel:  [<ffffffff810bdc4b>] ? __wake_up_common+0x5b/0x90
May 27 03:21:02 oss2c104 kernel:  [<ffffffffc0b6278f>] ptlrpc_main+0xc3f/0x1fa0 [ptlrpc]
May 27 03:21:02 oss2c104 kernel:  [<ffffffffc0b61b50>] ? ptlrpc_main+0x0/0x1fa0 [ptlrpc]
May 27 03:21:02 oss2c104 kernel:  [<ffffffff810b4031>] kthread+0xd1/0xe0
May 27 03:21:02 oss2c104 kernel:  [<ffffffff810b3f60>] ? kthread+0x0/0xe0
May 27 03:21:02 oss2c104 kernel:  [<ffffffff816c1577>] ret_from_fork+0x77/0xb0
May 27 03:21:02 oss2c104 kernel:  [<ffffffff810b3f60>] ? kthread+0x0/0xe0 
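
The "write: 2/17434/0" line above shows on the order of 17k journal credits being declared for a single bulk write, more than the journal can grant to one transaction, which triggers the credit dump from osd_trans_start(). The following standalone sketch is only an illustrative model (the per-fragment charges, fixed overhead, and journal limit are assumed values, not the actual osd-ldiskfs accounting) of how per-fragment credit charges grow roughly linearly with the fragment count and eventually exceed the per-transaction maximum:

/*
 * Illustrative model only -- NOT the actual osd-ldiskfs credit
 * accounting.  The per-fragment and fixed charges, and the journal
 * limit, are assumed values chosen to show the shape of the problem:
 * credits declared for a bulk write grow roughly linearly with the
 * number of fragments, so an RPC with enough fragments can ask for
 * more credits than the journal can grant to a single transaction.
 */
#include <stdio.h>

#define CREDITS_PER_FRAGMENT    5    /* index blocks, bitmap, group desc (assumed) */
#define CREDITS_FIXED          64    /* superblock, inode, quota, ... (assumed) */
#define JOURNAL_MAX_CREDITS  8192    /* assumed per-transaction maximum */

static int estimate_credits(int fragments)
{
	return CREDITS_FIXED + fragments * CREDITS_PER_FRAGMENT;
}

int main(void)
{
	int frags;

	for (frags = 16; frags <= 4096; frags *= 2) {
		int credits = estimate_credits(frags);

		printf("%4d fragments -> ~%5d credits%s\n", frags, credits,
		       credits > JOURNAL_MAX_CREDITS ?
		       "  (exceeds journal max, transaction cannot start)" : "");
	}
	return 0;
}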


 Comments   
Comment by Patrick Farrell (Inactive) [ 29/Jul/19 ]

https://review.whamcloud.com/#/c/35609/

Comment by Andreas Dilger [ 10/Aug/19 ]

Patrick, based on your recent comments on the patch, you indicate that the number of fragments per RPC is already being limited? Do you have some empirical evidence to confirm this is the case (e.g. logs with 4MB RPCs showing random writes sending <= 256 niobufs)?

Another area of investigation is whether the journal credits can be "slimmed down" in such cases. It is entirely possible that we are over-allocating buffer credits (e.g. superblock, inode, higher-level index blocks, etc. x 256) when they cannot possibly all be needed, even if every block were allocated in a separate group.
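
As a back-of-the-envelope illustration of this "slimming down" idea (all constants below are assumptions for demonstration, not the real osd-ldiskfs/jbd2 declaration logic): charging the shared metadata once instead of once per fragment, and capping the per-group charges at the number of block groups that could actually be touched, shrinks the estimate considerably compared with the worst-case per-fragment charge.

/*
 * Back-of-the-envelope sketch of the "slimming down" idea above.
 * All constants are assumptions for demonstration; the real
 * declaration logic lives in osd-ldiskfs/jbd2.
 */
#include <stdio.h>

#define FRAGMENTS           256   /* niobufs in one 4MB RPC */
#define SHARED_META           8   /* superblock + inode + quota blocks (assumed) */
#define PER_FRAG_INDEX        2   /* extent-tree/index blocks per fragment (assumed) */
#define PER_GROUP_META        2   /* block bitmap + group descriptor */
#define MAX_GROUPS_TOUCHED   32   /* assumed cap on distinct block groups hit */

int main(void)
{
	/* worst case: every fragment charged for everything */
	int naive = FRAGMENTS * (SHARED_META + PER_FRAG_INDEX + PER_GROUP_META);

	/* slimmed: shared metadata charged once, group metadata capped */
	int groups = FRAGMENTS < MAX_GROUPS_TOUCHED ? FRAGMENTS : MAX_GROUPS_TOUCHED;
	int slim = SHARED_META + FRAGMENTS * PER_FRAG_INDEX + groups * PER_GROUP_META;

	printf("naive estimate:   %d credits\n", naive);
	printf("slimmed estimate: %d credits\n", slim);
	return 0;
}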

Comment by Patrick Farrell (Inactive) [ 16/Aug/19 ]

Well, when writing my code I noticed I was successfully limiting the number of fragments/niobufs, but then I realized that what I had done was basically identical to the existing "limit number of extents" code, since every extent is guaranteed to be exactly one fragment.  (I did do some poking to check this.)

I could do some more testing if you'd like, but I'm pretty sure.  (I didn't do random I/O, just strided unaligned I/O from one process with big gaps, so each write generated its own extent.)
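
To illustrate the equivalence described above (a hypothetical sketch, not the actual osc extent-packing code; the function name and cap value are assumptions): if the client stops packing pages into a bulk RPC once a cap on discontiguous extents is hit, the number of fragments/niobufs is implicitly capped as well, since each discontiguous run of pages becomes exactly one fragment.

/*
 * Hypothetical sketch, not the actual osc extent-packing code.
 * If the client stops adding pages to a bulk RPC once a cap on
 * discontiguous extents is reached, the number of fragments
 * (niobufs) is implicitly capped too: each discontiguous run of
 * pages becomes exactly one fragment.
 */
#include <stdio.h>
#include <stddef.h>

#define MAX_EXTENTS_PER_RPC 64    /* assumed cap */

/* accept sorted page indices into one RPC until the extent cap is hit;
 * a gap between consecutive indices starts a new extent */
static size_t pack_pages(const unsigned long *pages, size_t npages)
{
	size_t i, extents = 0;

	for (i = 0; i < npages; i++) {
		if (i == 0 || pages[i] != pages[i - 1] + 1) {
			if (extents == MAX_EXTENTS_PER_RPC)
				break;          /* flush the RPC here */
			extents++;
		}
	}
	return i;       /* pages accepted into this RPC */
}

int main(void)
{
	/* strided, unaligned pattern: every page is its own extent */
	unsigned long pages[128];
	size_t i;

	for (i = 0; i < 128; i++)
		pages[i] = i * 10;

	printf("packed %zu of 128 pages (extent cap %d)\n",
	       pack_pages(pages, 128), MAX_EXTENTS_PER_RPC);
	return 0;
}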

re: Credits, unfortunately, I am totally out of my depth in that area.

Comment by Andreas Dilger [ 29/Oct/19 ]

Alex, it may be that this is avoided by the writeback cache patch, but allocating too many credits for random IO may still be an unneeded performance overhead. Not a huge priority, but maybe worth a few minutes if there is something obvious to fix.
